This repository contains a group project for the course MTH416A: Regression Analysis, academic session 2021–2022 (even semester), IIT Kanpur.
Ozone concentration and meteorology in the LA Basin, 1976: A Regression Study
[Report] [Presentation]
Instructor: Prof. Sharmishtha Mitra, Department of Mathematics and Statistics, IIT Kanpur

Team members:
- Arkajyoti Bhattacharjee
- Vishweshwar Tyagi
- Saurab Jain
- Apoorva Singh
| Setup | Topic |
|---|---|
| | 1. Introduction |
| | 2. Data Description |
| | 3. Exploratory Data Analysis |
| Parametric | 4. Multicollinearity: detection |
| | 5. Variable Selection: selection methods |
| | 6. Heteroscedasticity of Errors: detection |
| | 7. Normality of Errors: detection |
| | 8. Autocorrelation: detection |
| | 9. Prediction |
| Nonparametric | 10. Alternating Conditional Expectation (ACE) |
| | 11. Final Model Fit and Predictions |
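The outline lists detection steps for multicollinearity (Section 4) and autocorrelation (Section 8), but this README does not say which diagnostics were used. As a hedged illustration only, the sketch below computes two standard ones with NumPy: the variance inflation factor $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the others (VIF is an assumption here, not confirmed by the README), and the Durbin-Watson statistic $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ from Durbin & Watson (1950, 1951). The helper names and the simulated data are hypothetical.

```python
import numpy as np


def r2_ols(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) for each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all predictors except column j
        out.append(1.0 / (1.0 - r2_ols(others, X[:, j])))
    return np.array(out)


def durbin_watson(resid):
    """Sum of squared successive residual differences over the residual sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


# Simulated predictors: the first two columns are nearly collinear
rng = np.random.default_rng(1)
z = rng.standard_normal(200)
X = np.column_stack([z, z + 0.1 * rng.standard_normal(200), rng.standard_normal(200)])
print(vif(X))  # large VIFs for the first two columns, about 1 for the third
print(durbin_watson(rng.standard_normal(200)))  # white noise gives a value near 2
```

A VIF above about 10 is a common rule-of-thumb warning sign, and values of $d$ far below 2 suggest positive first-order autocorrelation in the residuals.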
| Model Type | Model Name | $R^2$ | RMSE |
|---|---|---|---|
| Parametric | Model 0 | 0.6986 | 4.2745 |
| | Model A | 0.7662 | 0.8272 |
| | Model B | 0.7202 | 0.8830 |
| | Model C | 0.7077 | 1.2565 |
| Nonparametric | ACE | 0.8271 | 0.3132 |
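The two metric columns in the table above follow, under the usual definitions, from observed values $y_i$ and fitted or predicted values $\hat y_i$: $R^2 = 1 - \sum_i (y_i - \hat y_i)^2 / \sum_i (y_i - \bar y)^2$ and $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat y_i)^2}$. This README does not state whether RMSE was computed on training or held-out data, or on the original or a transformed response scale (for ACE, $R^2$ presumably refers to the transformed scale), so the minimal sketch below only illustrates the formulas; the function names and toy numbers are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Toy numbers, not the ozone data
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.0])
print(r_squared(y, y_hat), rmse(y, y_hat))
```

For these toy numbers $SS_{res} = 0.5$ and $SS_{tot} = 20$, so $R^2 = 0.975$ and RMSE $= \sqrt{0.125} \approx 0.354$.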
- Among the parametric models, Model A has the highest $R^2$ value as well as the lowest RMSE value.
- Models A, B and C all outperform the baseline Model 0 on both metrics. This supports our corrections for multicollinearity, heteroscedasticity and autocorrelation, and our variable selection.
- Simple nonparametric models tend to be preferable when the goal is prediction. Here, in addition, the ACE model transforms the variables so that $R^2$ is maximized; as expected, it has the highest $R^2$ value and the lowest RMSE among all the models.
- So, among the models considered here, the ACE model is the best, both for prediction and for explaining ozone concentration in terms of the meteorological variables in the ozone dataset.
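The ACE idea described above (Breiman & Friedman, 1985) alternates two conditional-expectation steps: each predictor transform is updated as $\phi_j(x_j) = E[\theta(Y) - \sum_{k \ne j} \phi_k(X_k) \mid X_j = x_j]$, and the response transform as $\theta(y) = E[\sum_j \phi_j(X_j) \mid Y = y]$, rescaled to unit variance. The sketch below is a simplified illustration, not the project's implementation: it swaps the paper's supersmoother for a hypothetical fixed-window running-mean smoother (`running_mean_smooth`), performs a single backfitting pass per outer iteration, and runs on simulated data rather than the ozone dataset.

```python
import numpy as np


def running_mean_smooth(x, r, half_window=10):
    """Estimate E[r | x] by averaging r over neighbours of x in rank order.

    A crude stand-in for the supersmoother; ties in x are not treated specially.
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    n = x.size
    order = np.argsort(x, kind="mergesort")
    csum = np.concatenate(([0.0], np.cumsum(r[order])))  # prefix sums in x order
    idx = np.arange(n)
    lo = np.clip(idx - half_window, 0, n)
    hi = np.clip(idx + half_window + 1, 0, n)
    smoothed = np.empty(n)
    smoothed[order] = (csum[hi] - csum[lo]) / (hi - lo)  # window means, mapped back
    return smoothed


def ace(X, y, max_iter=100, half_window=10, tol=1e-8):
    """Alternate conditional expectations so that theta(y) ~ sum_j phi_j(x_j)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    theta = (y - y.mean()) / y.std()  # start from the standardised response
    phi = np.zeros((n, p))
    prev_err = np.inf
    for _ in range(max_iter):
        # Backfitting pass: phi_j(x_j) = E[theta - sum_{k != j} phi_k | x_j]
        for j in range(p):
            partial = theta - phi.sum(axis=1) + phi[:, j]
            phi[:, j] = running_mean_smooth(X[:, j], partial, half_window)
            phi[:, j] -= phi[:, j].mean()
        # Response update: theta(y) = E[sum_j phi_j | y], rescaled to unit variance
        theta = running_mean_smooth(y, phi.sum(axis=1), half_window)
        theta = (theta - theta.mean()) / theta.std()
        err = np.mean((theta - phi.sum(axis=1)) ** 2)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    # R^2 on the transformed scale: squared correlation of theta(y) and sum_j phi_j(x_j)
    r2 = np.corrcoef(theta, phi.sum(axis=1))[0, 1] ** 2
    return theta, phi, r2


# Simulated data where log(y) is linear in the predictors
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.exp(X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500))
theta, phi, r2_ace = ace(X, y)

# Ordinary least squares on the raw response, for comparison
design = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ beta
r2_linear = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(f"ACE R^2 = {r2_ace:.3f}, OLS R^2 = {r2_linear:.3f}")
```

On this simulated data ACE should recover a roughly logarithmic $\theta$ and a higher $R^2$ than OLS on the raw response. The window size is an arbitrary assumption: a wider window gives smoother but more biased transforms.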
References:
- Breiman, L.; Friedman, J. H. (1985). "Estimating Optimal Transformations for Multiple Regression and Correlation". Journal of the American Statistical Association. 80 (391): 580–598.
- Jolliffe, Ian T. (1982). "A note on the Use of Principal Components in Regression". Journal of the Royal Statistical Society, Series C. 31 (3): 300–303. doi:10.2307/2348005. JSTOR 2348005.
- Park, S. H. (1981). "Collinearity and Optimal Restrictions on Regression Parameters for Estimating Responses". Technometrics. 23 (3): 289–295. doi:10.2307/1267793.
- Wilkinson, L.; Dallal, G. E. (1981). "Tests of significance in forward selection regression with an F-to-enter stopping rule". Technometrics. 23: 377–380.
- Akaike, H. (1973). "Information theory and an extension of the maximum likelihood principle". In Petrov, B. N.; Csáki, F. (eds.), 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2–8, 1971. Budapest: Akadémiai Kiadó. pp. 267–281. Republished in Kotz, S.; Johnson, N. L. (eds.) (1992). Breakthroughs in Statistics, I. Springer-Verlag. pp. 610–624.
- Akaike, H. (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control. 19 (6): 716–723. doi:10.1109/TAC.1974.1100705. MR 0423716.
- Shapiro, S. S.; Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika. 52 (3–4): 591–611. doi:10.1093/biomet/52.3-4.591. JSTOR 2333709. MR 0205384. p. 593
- Breusch, T. S.; Pagan, A. R. (1979). "A Simple Test for Heteroskedasticity and Random Coefficient Variation". Econometrica. 47 (5): 1287–1294. doi:10.2307/1911963. JSTOR 1911963. MR 0545960.
- Box, George E. P.; Cox, D. R. (1964). "An analysis of transformations". Journal of the Royal Statistical Society, Series B. 26 (2): 211–252. JSTOR 2984418. MR 0192611.
- Durbin, J.; Watson, G. S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika. 37 (3–4): 409–428. doi:10.1093/biomet/37.3-4.409. JSTOR 2332391
- Durbin, J.; Watson, G. S. (1951). "Testing for Serial Correlation in Least Squares Regression, II". Biometrika. 38 (1–2): 159–179. doi:10.1093/biomet/38.1-2.159. JSTOR 2332325
- Faraway, J. J. (2004). Linear Models with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.4324/9780203507278
- Hoerl, A. E.; Kennard, R. W.; Baldwin, K. F. (1975). "Ridge regression: Some simulations". Communications in Statistics - Theory and Methods. 4 (2): 105–123.