⚡ Exploratory Data Analysis, Hypothesis Testing, and Multiple Regression Engineering ⚡
Note
This project was carried out by Kaibo Zhang, a student at the Desautels Faculty of Management at McGill University, under the supervision of Professor Gabriel Frieden. The study relied on first-hand data collected through a self-administered questionnaire developed on the Qualtrics platform. All analysis and model building were conducted in Excel.
University offers students an ideal setting to develop their financial literacy. At Desautels, McGill's Faculty of Management, students begin their education by learning about the economy, accounting, and global business, topics far broader in scope than everyday concerns such as monthly spending. Despite their strong academic performance, many students struggle to control their expenditures because they do not fully understand the factors that drive their spending behavior. This research therefore investigates how multiple factors affect the monthly spending of students enrolled at Desautels. Through statistical analysis of the collected data, this paper interprets the relationship between the selected variables and monthly spending and derives practical recommendations for controlling students' monthly spending. These recommendations are intended to help students understand their spending behavior and manage their budgets to reach their financial goals.
Regression_Analysis_on_the_Factors_Affecting_the_Monthly_Spending_of_Desautels_Students/
├── README.md
├── Data Summary Breakdown.xlsx # visualization graphs and tables from exploratory data analysis and hypothesis testing
├── MGCR 271 Data Collection and Analysis.xlsx # model building and selection
├── Regression Analysis on the Factors Affecting the Monthly Spending of Desautels Student.pdf # paper
Data Collection
- Data collection was completed through a carefully designed questionnaire built on the Qualtrics platform. The link can be found here. The questionnaire comprised seven questions: six required numerical input and the remaining one required text input.
- Based on conventional wisdom, the following independent variables were selected:
- a. distance of residence from McGill, in kilometers (quantitative),
- b. frequency of dining out, in number of times per month (quantitative),
- c. living arrangement (categorical),
- d. number of monthly subscriptions (quantitative),
- e. frequency of grocery shopping, in number of trips (quantitative), and
- f. frequency of shopping (quantitative).
NOTE: Below are some screenshots of the questionnaire.
The data was exported to an xlsx file for further analysis.
Exploratory Data Analysis
- Exploratory data analysis was conducted to build a thorough understanding of the data. It comprised the following parts:
- a. SOCS (shape, outliers, center, spread) analysis of the variable distributions,
- b. confidence interval inference for the population means, and
- c. correlation analysis between the variables.
Detailed rationales and explanations can be found in the paper.
Hypothesis Testing
- Several hypothesis tests were conducted to gain directional insights before building the regression model:
- a. two-sample t-test for the difference in population means,
- b. t-test for the true slope β1 (H0: β1 = 0), and
- c. global F-test for all variables.
The detailed explanations can be found in the paper.
Multiple Regression
- To obtain the optimal multiple regression model, this study mimicked the best subset method and tested all possible combinations of variables.
- Further examinations were then conducted to remove insignificant variables from the model.
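The best subset search described above can be sketched as an exhaustive loop over predictor combinations. This is a minimal standard-library sketch, not the study's Excel procedure: ranking by adjusted R² is an assumed criterion (the paper may have compared models differently), only quantitative predictors are handled (living arrangement would first need dummy coding), and `ols_fit`, `adjusted_r2`, `best_subset`, and the column names in the usage note are hypothetical.

```python
from itertools import combinations


def ols_fit(X, y):
    """Least squares with an intercept, solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [b0, b1, ..., bk]


def adjusted_r2(X, y, coef):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); requires n > k + 1."""
    n, k = len(y), len(X[0])
    y_bar = sum(y) / n
    fitted = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (sse / sst) * (n - 1) / (n - k - 1)


def best_subset(data, response, predictors):
    """Fit every non-empty subset of predictors; return results ranked best-first."""
    y = data[response]
    results = []
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            X = [list(vals) for vals in zip(*(data[name] for name in subset))]
            coef = ols_fit(X, y)
            results.append((adjusted_r2(X, y, coef), subset, coef))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

Usage would look like `best_subset(data, "spending", ["dine_out", "groceries", "subscriptions"])`, where `data` maps each hypothetical column name to its list of responses. With six candidate predictors there are only 2^6 - 1 = 63 subsets, so exhaustive search is cheap; the significance checks on the top-ranked models are a separate step.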
The final equation obtained is as follows:
Monthly Spending = 916.4253 + 16.7489 (Dine-out) + 85.4177 (Groceries) + 29.4629 (Subscriptions) + e
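To show how the final equation is read, here is a small sketch that plugs values into it. The coefficients are the ones reported above; the function name and the student profile are hypothetical. For a student who dines out 8 times, makes 4 grocery trips, and holds 3 subscriptions, the arithmetic is 916.4253 + 16.7489 × 8 + 85.4177 × 4 + 29.4629 × 3 = 916.4253 + 133.9912 + 341.6708 + 88.3887 ≈ 1480.48.

```python
# Coefficients of the final model as reported in this README
INTERCEPT = 916.4253
SLOPES = {"dine_out": 16.7489, "groceries": 85.4177, "subscriptions": 29.4629}


def predict_monthly_spending(dine_out, groceries, subscriptions):
    """Point prediction from the fitted equation; the error term e is taken as zero."""
    return (
        INTERCEPT
        + SLOPES["dine_out"] * dine_out
        + SLOPES["groceries"] * groceries
        + SLOPES["subscriptions"] * subscriptions
    )


# Hypothetical student: 8 dine-outs, 4 grocery trips, 3 subscriptions
print(round(predict_monthly_spending(8, 4, 3), 2))  # 1480.48
```

Each slope is interpreted with the other two variables held fixed: for example, one additional grocery trip is associated with about 85.42 more in predicted monthly spending.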