Motivation: Asset Idle Time
Asset-sharing companies capture the lost value of idle assets by predicting who will need them and when, saving customers money and skimming a little off the top as profit. The key value these services add is in understanding and predicting demand for these assets.
The D.C. Bike Share system has a similar problem: it wants to predict how many bikes are needed so it can maintain the right number of bikes for the city.
Data: DC Bike Share / Kaggle
Follow along in my Jupyter Notebook here: https://github.com/Ryanglambert/dc_bike_share_analysis/blob/master/Bike_Share_EDA.ipynb
Given date, time, weather, and other variables we will predict how many bikes will be used in a given hour. Let's take a look at the data!
I'll also be using Kaggle.com to check how well our model is doing against the "hold out" test set.
- **datetime** - hourly date + timestamp
- **season** - 1 = spring, 2 = summer, 3 = fall, 4 = winter
- **holiday** - whether the day is considered a holiday
- **workingday** - whether the day is neither a weekend nor holiday
- **weather** -
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- **temp** - temperature in Celsius
- **atemp** - "feels like" temperature in Celsius
- **humidity** - relative humidity
- **windspeed** - wind speed
- **count** - number of total rentals
Plenty of combinations to consider. Using `seaborn`, let's visualize which variables correlate with bike share use. (I've made dummy variables for all categorical variables.)
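Here's a minimal sketch of that preprocessing step with `pandas` on a made-up slice of data (the values below are illustrative, not the actual Kaggle rows):

```python
import pandas as pd

# Hypothetical slice of the training data, not the real Kaggle rows
df = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "weather": [1, 1, 2, 3],
    "temp": [9.8, 22.1, 18.0, 4.1],
    "count": [40, 210, 120, 8],
})

# One-hot encode the categorical columns so each level gets its own 0/1 column
dummies = pd.get_dummies(df, columns=["season", "weather"], dtype=int)

# Correlation of every feature with the rental count;
# sns.heatmap(dummies.corr()) renders the full matrix as a heatmap
corr = dummies.corr()["count"]
print(corr)
```

With the dummies in place, every categorical level gets its own correlation with `count`, which is what the heatmap below visualizes.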
Categorical Variables' Effect on Bike Share Use
Let's look at how time of day contributes to bike use.
Time of Day Effect On Bike Use
Zooming out a bit, the variance is many multiples of the expected value. I did not expect this at all.
Selecting The Right Link Function: Generalized Linear Models
There is a cousin of the Poisson distribution called the Negative Binomial distribution. It is the same as the Poisson with one small difference: where the Poisson's variance equals its mean, the negative binomial's variance still grows with the mean but can be some multiple of it. That extra spread is exactly the overdispersion we see in this data.
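A quick simulation makes the difference concrete. Both distributions below have mean 10, but the negative binomial's variance is roughly three times its mean (the parameter values here are arbitrary, chosen just to illustrate the point):

```python
import numpy as np

rng = np.random.default_rng(42)

# Poisson: variance equals the mean
pois = rng.poisson(lam=10.0, size=100_000)

# Negative binomial with mean 10: variance = mean + mean^2 / n  (> mean)
n, mean = 5.0, 10.0
p = n / (n + mean)  # numpy's (n, p) parameterization
nb = rng.negative_binomial(n, p, size=100_000)

print(pois.mean(), pois.var())  # both close to 10
print(nb.mean(), nb.var())      # mean close to 10, variance close to 30
```

When the sample variance of your counts is a multiple of the sample mean, as it is here, the negative binomial is the better fit.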
First Model: GLM with Negative Binomial Link Function
Pearson Chi^2: 1320
Our histogram of residuals is approximately normal. From these two plots, I feel confident I've picked the right kind of distribution.
Kaggle Leader Board: 2522
Second Model: GLM Negative Binomial Link + Interaction terms
Aside: Interaction Terms
Interaction terms with dummy variables.
Interaction terms are the pairwise products of our features, ignoring the squared terms. Think of expanding (A + B)^2 = A^2 + 2AB + B^2: we're only interested in the 'AB' cross-term and we'll ignore all the others. Scikit-learn makes this easy for us.
Any cross-term involving a dummy variable equal to 0 simply goes to zero, so the corresponding parameter only contributes when that dummy is active. Here's an example of what that looks like.
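A small sketch with scikit-learn's `PolynomialFeatures` (the dummy names here are hypothetical stand-ins, not the actual model's columns):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Columns: two hypothetical dummies ("workingday", "rush_hour") and "temp"
X = np.array([
    [1, 1, 20.0],
    [1, 0, 25.0],
    [0, 1, 15.0],
])

# interaction_only=True keeps the cross-terms and drops the squared terms
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X)

# Resulting columns:
# workingday, rush_hour, temp, workingday*rush_hour, workingday*temp, rush_hour*temp
print(X_inter)
```

In the second row, `rush_hour` is 0, so every cross-term involving it is 0 and those parameters drop out of the prediction for that hour.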
Interaction Model Performance
Pearson Chi^2: 599
Kaggle Leaderboard: 1827
What is RMSLE? (Root Mean Squared Log Error)
Like RMSE (Root Mean Squared Error), RMSLE is a fit score, but it compares the log of the model's outputs with the log of the actuals (specifically log(1 + x), so zero counts are handled). The reason for this is the same reason we're using a non-linear link function: we're predicting non-negative count data, where relative error matters more than absolute error.
You can interpret RMSLE roughly as how many factors of the constant e (e ≈ 2.71828, that e) the predictions are off by.
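The metric is a one-liner in numpy (function name mine, not Kaggle's):

```python
import numpy as np

def rmsle(predicted, actual):
    """Root Mean Squared Log Error: RMSE computed on log(1 + x)."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))

# Predictions off by a constant factor give a roughly constant RMSLE,
# regardless of the size of the counts
print(rmsle([200, 20], [100, 10]))
```

For large counts, being off by a factor of k gives an RMSLE of about ln(k), which is why exponentiating the score below recovers the error factor.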
Back to evaluating the models: recall the scores were 0.7 for the first model and 0.5 for the model with interaction terms.
First model: e^0.7 ≈ 2.01
On average, our model was off by a factor of about 2: if the true count were 100 cyclists, our model would typically predict somewhere between 50 and 200.
Model with interaction terms: e^0.5 ≈ 1.65
On average, our model with interaction terms was off by a factor of about 1.65: if the true count were 100 cyclists, it would typically predict between about 61 and 165.
Something I might try in the future is model stacking, where the predictions of several base models are fed as features into a second-stage model. There is also ensembling, where multiple models are given a weighted vote on the outcome variable.