Kevin Xie, Fay Yan, Claudia Yang, Karl Zhou
QTM 302W, Technical Writing, Emory University
The primary objectives of this project are:
- To analyze the impact of regional and demographic factors on educational disparities in the U.S.
- To identify patterns in SAT scores and GPAs across different socioeconomic backgrounds.
- To conduct descriptive statistical analysis of the dataset and summarize key trends.
- To visualize the data through appropriate plots and charts to uncover insights.
- To identify missing values, outliers, and data anomalies to ensure data quality.
- To document the analysis process for reproducibility and collaboration.
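As a rough illustration of the data-quality objective, the sketch below counts missing values and flags outliers in base R. The data frame, its column names (`region`, `sat`, `gpa`), and all values are hypothetical placeholders rather than the project's dataset, and the 1.5 * IQR rule is only one common convention for flagging outliers.

```r
# Toy data frame; column names and values are hypothetical placeholders,
# not taken from the project's dataset.
scores <- data.frame(
  region = c("urban", "urban", "suburban", "suburban", "rural", "rural", "rural"),
  sat    = c(1210, 1180, 1250, NA, 1020, 1060, 1590),
  gpa    = c(3.4, 3.2, 3.6, 3.5, NA, 2.9, 3.0)
)

# Count missing values per column.
missing_counts <- colSums(is.na(scores))

# Flag outliers with the 1.5 * IQR rule (an assumed convention).
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

sat_outliers <- flag_outliers(scores$sat)

print(missing_counts)
print(scores[sat_outliers, ])
print(summary(scores[, c("sat", "gpa")]))
```

Flagged rows are candidates for review, not automatic removal; whether an extreme score is an error or a genuine observation depends on the data source.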
This repository contains the Exploratory Data Analysis (EDA) Code Notebook, developed as part of our research to understand disparities in educational outcomes across the U.S. The analysis focuses on identifying patterns, trends, and relationships within the dataset to inform decision-making and highlight key insights.
What regional and demographic factors contribute most significantly to disparities in educational outcomes in the U.S.?
How do socioeconomic factors influence SAT scores and GPAs?
Quantitative Analysis: Statistical summaries and tests to uncover relationships between variables.
Visualization & Spatial Analysis: Graphical plots and maps to visualize trends across regions and demographics.
Statistical Testing: Hypothesis testing to validate findings.
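The quantitative and testing steps above might look roughly like the following base R sketch. It runs on simulated data: the region labels, group means, and the choice of a one-way ANOVA are assumptions made for illustration, so the output says nothing about the project's actual findings.

```r
set.seed(302)  # fixed seed so the simulated example is repeatable

# Simulated data: region labels and group means are invented for illustration.
sim <- data.frame(
  region = rep(c("urban", "suburban", "rural"), each = 40),
  sat    = c(rnorm(40, mean = 1150, sd = 120),
             rnorm(40, mean = 1170, sd = 120),
             rnorm(40, mean = 1050, sd = 120))
)

# Quantitative summary: mean and standard deviation of SAT score by region.
region_summary <- aggregate(
  sat ~ region, data = sim,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# Statistical test: one-way ANOVA of SAT score on region (an assumed choice).
fit <- aov(sat ~ region, data = sim)
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

print(region_summary)
print(anova_table)
```

A call such as `boxplot(sat ~ region, data = sim)` would give a quick visual counterpart to the summary table. If the ANOVA assumptions look doubtful, `kruskal.test(sat ~ region, data = sim)` is a rank-based alternative.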
Regional Variations: Significant differences in SAT scores and GPAs were observed across regions, with urban and suburban areas outperforming rural areas.
Socioeconomic Influence: Higher household income and parental education levels correlate strongly with higher SAT scores and GPAs.
Demographic Trends: Disparities exist across racial and ethnic groups, emphasizing systemic inequalities.
Comparison of pre- and post-COVID-19 data to assess the pandemic's impact on educational disparities.
Exploration of policy interventions aimed at reducing disparities.
R (for statistical analysis)
RStudio (integrated development environment)
Binder (for reproducibility and sharing)
Required libraries for EDA (e.g., ggplot2, dplyr, tidyr, summarytools).
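One way to check whether the packages named above are already installed is sketched below. The package vector simply repeats the examples listed in this README; the notebook may load others, so treat the list as an assumption.

```r
# Packages named in this README as examples; the notebook may need others.
required <- c("ggplot2", "dplyr", "tidyr", "summarytools")

# Compare against the packages installed in the current R library paths.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```

The install call is left commented out so that running the check never changes the local library without an explicit decision.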
- Clone this repository:
git clone https://github.com/ClaudiaYang/EDA-Project-in-R.git
cd EDA-Project-in-R
- Open the EDA_Codebook.Rmd file in RStudio, or open the EDA_Codebook.html file for a preview.
- To reproduce the results, execute the code blocks sequentially in RStudio.
To generate the HTML file from EDA_Codebook.Rmd, run the following command in R:
rmarkdown::render("EDA_Codebook.Rmd")