TensorFlow deep regression model predicting bicycle rental.
The dataset contains the number of bicycles rented each day in a community, along with other information related to the rentals, such as the weather on that day. It has 731 instances and 15 variables. Among the explanatory variables, 7 are categorical and 5 are numerical; the remaining 3 variables can be treated as explained, or dependent, variables. Our aim is to build a model that estimates the number of bicycles rented on a given day. The following table summarizes the data, and a short loading sketch follows it:
Attribute | Summary |
---|---|
Instant | Sequential ID for each day. |
Dteday | Date for the instance, formatted as M/D/YYYY. |
Season | Season (1: spring; 2: summer; 3: fall; 4: winter). |
Yr | Year (0: 2011; 1: 2012). |
Mnth | Month (1 to 12). |
Holiday | Whether the day is a holiday (0: not a holiday; 1: holiday). |
Weekday | Day of the week. |
Workingday | Whether the day is a working day (0: holiday or weekend; 1: working day). |
Weathersit | Weather type (1: clear; 2: mist; 3: light snow or light rain; 4: snow or heavy rain). |
Temp | Normalized temperature in Celsius. |
Hum | Normalized humidity. |
Windspeed | Normalized wind speed. |
Casual | Count of casual users. |
Registered | Count of registered users. |
Cnt | Total bike rentals, including both casual and registered users. |
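
As a starting point, the attributes above can be loaded and split into explanatory and explained variables along the lines sketched below. This is only a sketch: the file name `day.csv` and the lowercase column names are assumptions, and the sequential ID and date are left out of the features for simplicity.

```python
# Minimal loading sketch (assumed file name and column names, matching the table above).
import pandas as pd

df = pd.read_csv("day.csv")

categorical = ["season", "yr", "mnth", "holiday", "weekday", "workingday", "weathersit"]
numerical = ["temp", "hum", "windspeed"]
target = "cnt"  # "casual" and "registered" are the other dependent variables

# One-hot encode the categorical attributes so their integer codes are not
# interpreted as magnitudes, and keep the numerical attributes as they are.
X = pd.get_dummies(df[categorical].astype("category")).join(df[numerical]).astype("float32")
y = df[target].astype("float32")

print(X.shape, y.shape)  # expected: (731, number_of_features) (731,)
```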
Linear regression estimates the value of a variable from other correlated variables by fitting a mathematical function in which each variable has an associated coefficient. When multiple explanatory variables are used to predict one explained variable, the procedure is called multiple linear regression. With many variables, more complex regression models may produce better results; however, it is important to control the complexity of a regression model so that it remains efficient and interpretable. Fortunately, this complexity is easy to control when the regressor is built as a neural network.
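
As a point of reference, a multiple linear regression of this kind can be expressed as a single Dense unit in Keras. The sketch below assumes `X` and `y` from the loading sketch above; the split ratio and training settings are illustrative choices, not values taken from the text.

```python
# Sketch of a multiple linear regression baseline: one Dense unit with no
# activation is exactly y = w1*x1 + ... + wn*xn + b.
import tensorflow as tf
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

linear_model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(1),  # one coefficient per explanatory variable, plus a bias
])
linear_model.compile(optimizer="adam", loss="mse", metrics=["mae"])
linear_model.fit(X_train, y_train, epochs=200, verbose=0)
```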
Deep neural networks are capable of learning both linear and non-linear relationships between variables. In this way, it becomes possible to capture deeper connections between the independent and dependent variables. At the same time, it is widely held that a deep neural network with 3 hidden layers is enough to model practically any pattern in the data. This provides an upper bound on the complexity of a neural network regressor, since its complexity is determined by the number of layers and neurons in it.
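
As a concrete illustration of such a bounded architecture, a regressor with 3 hidden layers could be defined in Keras roughly as follows. The layer widths (64, 32, 16) and the training settings are assumptions for the sketch, not the exact configuration behind the results reported below; `X_train` and `y_train` come from the previous sketch.

```python
import tensorflow as tf

# Regressor with 3 hidden layers; a single linear output unit predicts the
# daily rental count. Widths and epochs are illustrative, untuned choices.
deep_model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
deep_model.compile(optimizer="adam", loss="mse", metrics=["mae"])
deep_model.fit(X_train, y_train, validation_split=0.2, epochs=300, verbose=0)
```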
Using a neural network regressor with 3 hidden layers, we set up a model whose predicted rental counts track the actual numbers in a nearly linear relationship.
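
A predicted-versus-actual scatter plot makes that relationship visible. The sketch below assumes `deep_model` and the held-out split from the previous sketches; it is one way to produce such a plot, not a reproduction of the original figure.

```python
import matplotlib.pyplot as plt

# Compare predicted and actual rental counts on the held-out days.
y_pred = deep_model.predict(X_test).flatten()

plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")  # ideal 1:1 line
plt.xlabel("Actual rentals (cnt)")
plt.ylabel("Predicted rentals")
plt.title("Predicted vs. actual daily bicycle rentals")
plt.show()
```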
With an R2 score of 0.89, we argue that this neural network model works well on the bicycle rental dataset. A mean absolute error (MAE) of about 5% of the dependent variable's range supports this, and the MAE also amounts to only roughly 10% of the average number of bicycles rented per day.
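
The sketch below shows how these figures can be computed from the held-out predictions. It assumes `y_test` and `y_pred` from the previous sketch; the values it prints depend on the actual training run.

```python
from sklearn.metrics import mean_absolute_error, r2_score

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"R2:  {r2:.2f}")
print(f"MAE: {mae:.1f} rentals")
print(f"MAE as a share of the range of cnt: {mae / (y_test.max() - y_test.min()):.1%}")
print(f"MAE as a share of the mean of cnt:  {mae / y_test.mean():.1%}")
```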