I’ve commuted from the mid-Peninsula to Palo Alto / Mountain View off and on for the past 20 years for various employers. Over the years and through the booms and busts, I’ve watched my commute time both stretch and shorten. I’ve always hypothesized that there was a relationship between the time I had to spend stuck in my car and the relative health of the economy.
During the first dotcom boom and bust, I actually kept a log of my commute times. Yes, I am a geek and proud, so there! From 1998 to 2001, my commute from Foster City to Mountain View increased and then decreased by 10 minutes. Unfortunately, that notebook is long lost; I would have loved to match it up against the analysis that I just completed using FinancePBI, the PatternBuilders FinServ product.
Since I had lost the log, I decided to practice some Google Fu and see whether there was public data available that could be used for the analysis. The closest thing was this amazing trove of road sensor data from pems.dot.ca.gov. Wow, what a treasure trove! I’ve loaded 3.5 years of daily “flow” data (count of cars) for the sensors that stretch from Gilroy to Marin on 101, 280, 580, 680, etc. – basically all the major highways in the Bay Area. That is about 2 million geocoded sensor reads, which come to about 6 million data points once you include metadata. Given how lusciously detailed and granular the data is, I knew I would never get anywhere with just Excel. But as I mentioned, I have a friend in big data, so I just loaded the data up into the PBI cloud using the CSV import feature.
For an “economic” factor, I’m currently using the Stock Ticker data from the PBI Finance solution. The instance I was using had over 10 years of data, going back to 2002. (See below for some other ideas I’m chewing on.) I took the flow across a couple of sensors in Mountain View and matched it up against the Stock Ticker data for two of my favorite companies, Intuit and Google, which both have large campuses nearby.
What did I find? Before I tell you, first let’s take a look at how I used PatternBuilders to find it. My first step was to look at the sensors on the Map view. Even though I’ve been driving over them for years, it’s not like there are signposts labeling each sensor, so I needed to find the sensors that employees of those companies would drive over to get to and from work. Since the data from the state came with latitude & longitude data for each sensor and the PBI import utility automatically geocodes imported data with geographical information, it was easy for me to find sensors that tracked my old commute to the area using PBI’s map view.
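If you want to try the sensor hunt without a map view, here is a rough Python sketch of the same idea: measure the great-circle (haversine) distance from each sensor to a campus and keep the ones within a few kilometers. To be clear, this is not how PBI does it; the sensor IDs, coordinates, campus location, and 3 km radius below are all made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def sensors_near(sensors, lat, lon, radius_km):
    """Return (sensor_id, distance_km) pairs within radius_km, nearest first."""
    hits = []
    for sensor in sensors:
        distance = haversine_km(lat, lon, sensor["lat"], sensor["lon"])
        if distance <= radius_km:
            hits.append((sensor["id"], distance))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical sensors: IDs and coordinates are invented, not real PeMS data.
SENSORS = [
    {"id": "S1", "lat": 37.420, "lon": -122.080},
    {"id": "S2", "lat": 37.410, "lon": -122.070},
    {"id": "S3", "lat": 37.000, "lon": -121.570},  # far to the south
]

# Approximate, illustrative campus location in Mountain View.
CAMPUS_LAT, CAMPUS_LON = 37.422, -122.084

nearby = sensors_near(SENSORS, CAMPUS_LAT, CAMPUS_LON, radius_km=3.0)
for sensor_id, distance in nearby:
    print(f"{sensor_id}: {distance:.2f} km from campus")
```

A plain radius is a crude stand-in for "sensors my commute actually crosses", so you would still want to eyeball the results on a map before trusting them.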
Once I knew which sensors to look at, I took a look at the data for closing price vs. daily flow at those sensors, just to get a feel for the raw data and to look for outliers (a failed sensor being the most likely cause) and other factors that might affect the results of a Pearson’s correlation. Here are a couple of views from the Line Chart view that show the closing price for INTU and GOOG as well as the daily flow for the sensors.
Intuit’s stock consistently did well and Google showed quite a bit of volatility. Another thing to notice is the outlier readings from sensor 401360 in 2010. Marilyn’s first law of data geekdom: if you don’t take a look at the raw data before you start crunching and chewing, it will give you indigestion. You never know what’s hiding in some random week, some random when.
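For those who prefer code to eyeballing, one common way to flag readings like that is a modified z-score built on the median and the median absolute deviation (MAD), which a few wild values can't skew the way they skew a mean. This is a sketch of that generic technique, not what PBI does under the hood; the flow counts and the 3.5 cutoff are illustrative assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True if its modified z-score exceeds threshold.

    The modified z-score is 0.6745 * (value - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half the readings are identical; no usable scale, flag nothing.
        return [False] * len(values)
    return [0.6745 * dev / mad > threshold for dev in deviations]


# Made-up daily flow counts for one sensor; the 1200 mimics a failed-sensor day.
daily_flow = [61000, 60500, 62000, 61500, 1200, 60800, 61200]

flags = flag_outliers(daily_flow)
suspect_days = [i for i, is_outlier in enumerate(flags) if is_outlier]
print("Suspect day indexes:", suspect_days)
```

A flag is only a prompt to go look at the chart. Whether a reading is a dead sensor or a genuinely odd traffic day is still a judgment call.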
A quick look at the above curves shows that there might be something there… but I need to run a real correlation. Luckily, all that takes is a click on the correlation menu and… Voilà! Correlation! Which Is Not Causation. I won’t say that again, but it should be remembered (and will be by the conscientious among you) as you read the rest of this post.
(BTW, this screen shot shows the data grid that drives the charts in PB. It can be hidden as I did in the above pics.)
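If you are curious what a Pearson’s correlation involves when there is no menu to click, here is a minimal Python sketch. It first lines the two series up on the dates they share (the sensors report every day, the stock market doesn’t) and then computes the coefficient from its textbook definition. The function names, dates, prices, and flow counts are invented for illustration; they are not the PBI implementation or real INTU, GOOG, or PeMS values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_date(prices, flows):
    """Correlate two {date: value} dicts over the dates present in both.

    Returns (r, number_of_shared_dates).
    """
    shared = sorted(set(prices) & set(flows))
    r = pearson_r([prices[d] for d in shared], [flows[d] for d in shared])
    return r, len(shared)


# Made-up closing prices (trading days only) and daily sensor flow counts.
closing_price = {
    "2011-03-07": 50.0,
    "2011-03-08": 51.0,
    "2011-03-09": 49.5,
    "2011-03-10": 52.0,
}
sensor_flow = {
    "2011-03-06": 30000,  # a Sunday: flow exists, but there is no closing price
    "2011-03-07": 61000,
    "2011-03-08": 61800,
    "2011-03-09": 60200,
    "2011-03-10": 62500,
}

r, n_days = correlate_by_date(closing_price, sensor_flow)
print(f"r = {r:.3f} over {n_days} shared days")
```

To drop an outlier period, the way I did with the date slider, you would simply filter both dicts to the date window you want before calling `correlate_by_date`.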
So what do we see? Well, Intuit’s stock price seems to be more closely related to changes in the traffic flow around its campus than Google’s. The hovering zoom in the graphic shows the line graph for the two series behind the correlation; in the screenshot above, it’s INTU versus sensor 401360. (Also, you’ll notice that I changed the dates using the cool date UI slider at the top of the chart to take out the outliers. If you’re curious, the correlations are more extreme with those data points included.) The view below shows the details behind the negative correlation for Google and sensor 402378.
And the meaning here? Well, I’d love your thoughts, but one hypothesis we like is that since Google is a much more “global” company, its closing price is much less connected to activities at its Peninsula office. Intuit, on the other hand, is a very centralized company compared to Google, with most of its core functions (software development, sales, etc.) centered at its Mountain View location. So if there is a correlation between traffic to a company and that company’s performance, you would expect it to be stronger for Intuit. For this time period, the data we have for Intuit and Google is certainly consistent with that hypothesis. There are a lot of confounding factors here, most obviously the greater volatility of Google vs. Intuit. But this relationship certainly looks like a good research candidate for folks who trade or analyze stocks for a living!
I’m not done here… and I’ll take any and all ideas you guys have. I’m digging around for other economic factors (unemployment, jobless claims, business starts, real estate changes, etc.) to match up against the road data. Do you have ideas?