Visualizes the data, builds a multiple linear regression model, applies 10-fold cross-validation, and evaluates the performance of LM, SVM, and KNN models in R

Xin-Bu/High_school_performance


High school performance

Dataset

The data were collected from three high schools in the US. They record each student's performance on three continuous outcome variables (math, reading, and writing scores) together with five categorical predictors: race/ethnicity, parental level of education, gender, lunch type, and test preparation course.

The R code for data visualization, descriptive statistics, and multiple linear regression was written in R Markdown and knitted to HTML.
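
As a rough sketch of the modelling step, the base R code below fits a multiple linear regression and estimates its out-of-sample RMSE with 10-fold cross-validation. The data frame `scores`, its simulated values, and the column names (`math`, `gender`, `lunch`, `test_prep`) are assumptions made for illustration; they are not the repository's data or code, which may rely on a package such as caret rather than a manual loop.

```r
# Simulated stand-in data; NOT the real dataset (all values are made up).
set.seed(42)
n <- 200
scores <- data.frame(
  gender    = factor(sample(c("male", "female"), n, replace = TRUE)),
  lunch     = factor(sample(c("free/reduced", "standard"), n, replace = TRUE)),
  test_prep = factor(sample(c("completed", "none"), n, replace = TRUE))
)
scores$math <- 60 +
  8 * (scores$lunch == "standard") +
  5 * (scores$test_prep == "completed") +
  rnorm(n, sd = 10)

# Multiple linear regression on the full (simulated) data.
fit <- lm(math ~ gender + lunch + test_prep, data = scores)
print(summary(fit)$coefficients)

# Manual 10-fold cross-validation: assign each row to one of 10 folds,
# train on 9 folds, and compute RMSE on the held-out fold.
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))
rmse <- numeric(k)
for (i in seq_len(k)) {
  train_set <- scores[fold != i, ]
  test_set  <- scores[fold == i, ]
  fit_i <- lm(math ~ gender + lunch + test_prep, data = train_set)
  pred  <- predict(fit_i, newdata = test_set)
  rmse[i] <- sqrt(mean((test_set$math - pred)^2))
}

# Average RMSE across the 10 held-out folds.
cv_rmse <- mean(rmse)
print(cv_rmse)
```

The same fold assignment could be reused to compare other learners (for example SVM or KNN) on equal terms, since every model would then be scored on identical held-out rows.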

Variables:

  • Math: The student's score on a standardized mathematics test, a continuous variable
  • Reading: The student's score on a standardized reading test, a continuous variable
  • Writing: The student's score on a standardized writing test, a continuous variable
  • Race/ethnicity: The student's racial or ethnic background (e.g., Asian, African-American, Hispanic), coded in the data as group_a through group_e
  • Parental level of education: The highest level of education attained by the student's parent(s) or guardian(s)
  • Gender: The gender of the student (male/female)
  • Lunch: The student's lunch type (free/reduced-price or standard)
  • Test preparation course: Whether the student completed a test preparation course (completed/none)

Levels of categorical variables:

| variables | level_1 | level_2 | level_3 | level_4 | level_5 | level_6 |
|---|---|---|---|---|---|---|
| race/ethnicity | group_a | group_b | group_c | group_d | group_e | |
| parental level of education | some high school | high school | some college | associate's degree | bachelor's degree | master's degree |
| gender | male | female | | | | |
| lunch | free/reduced | standard | | | | |
| test_prep course | completed | none | | | | |
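
In R, levels like these would typically be stored as factors. The sketch below shows one way to declare them. The data frame `students`, its column names, and the three example rows are hypothetical and only illustrate the encoding; treating parental level of education as an ordered factor is likewise an assumption, not something the source states.

```r
# Level sets taken from the table above.
race_levels   <- c("group_a", "group_b", "group_c", "group_d", "group_e")
parent_levels <- c("some high school", "high school", "some college",
                   "associate's degree", "bachelor's degree", "master's degree")

# Three made-up rows, for illustration only.
students <- data.frame(
  race      = c("group_b", "group_e", "group_a"),
  parent_ed = c("some college", "master's degree", "high school"),
  gender    = c("female", "male", "female"),
  lunch     = c("standard", "free/reduced", "standard"),
  test_prep = c("none", "completed", "none"),
  stringsAsFactors = FALSE
)

# Convert each character column to a factor with an explicit level set.
students$race      <- factor(students$race, levels = race_levels)
# Ordering the education levels is a modelling choice (assumption).
students$parent_ed <- factor(students$parent_ed, levels = parent_levels,
                             ordered = TRUE)
students$gender    <- factor(students$gender, levels = c("male", "female"))
students$lunch     <- factor(students$lunch, levels = c("free/reduced", "standard"))
students$test_prep <- factor(students$test_prep, levels = c("completed", "none"))

str(students)
```

Declaring the levels explicitly keeps the reference category predictable when the factors enter a regression. Note that `lm()` applies polynomial contrasts to ordered factors by default, so an unordered factor may be easier to interpret as a predictor.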

Data source

High school performance
