US Flight Delays Prediction Models based on Naïve Bayes, Regression Tree, and Logistic Regression Algorithms
This project studies the factors that contribute to flight delays in the United States and builds a model to predict them. It compares three machine learning algorithms (Naive Bayes, Regression Tree, and Logistic Regression) to determine which predicts flight delays most accurately.
The project began by acquiring and preprocessing data on flight schedules, weather conditions, and airports. The data was then partitioned into a training set, used to fit the prediction models, and a test set, used to evaluate them.
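The partitioning step can be sketched as follows. This is a minimal illustration, not the project's actual code: the `flights` table, its column names, the 80/20 ratio, and the stratified split are all assumptions, and the data is randomly generated as a stand-in for the preprocessed dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed flight table.
# Column names and values are illustrative only, not taken from the project.
rng = np.random.default_rng(0)
n_rows = 1000
flights = pd.DataFrame({
    "dep_hour": rng.integers(0, 24, n_rows),
    "distance_miles": rng.integers(100, 3000, n_rows),
    "wind_speed_mph": rng.uniform(0, 40, n_rows),
    "delayed": rng.integers(0, 2, n_rows),  # 1 = delayed, 0 = on time
})

X = flights.drop(columns="delayed")  # features
y = flights["delayed"]               # target label

# Assumed 80/20 split; stratify keeps the delayed/on-time ratio
# similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, which matters when several models are compared on the same test set.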
The three models were trained and tested in turn: Naive Bayes first, then Regression Tree, then Logistic Regression. Logistic Regression achieved the highest accuracy at 85.14%, followed by Naive Bayes at 84.14% and Regression Tree at 82.39%.
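A comparison of this kind might look like the sketch below. It rests on several assumptions: the data is synthetic (`make_classification`), so the printed accuracies will not match the figures reported here; `GaussianNB` is one possible Naive Bayes variant; and `DecisionTreeClassifier` stands in for the tree model because accuracy is a classification metric. The hyperparameters are illustrative, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the flight dataset.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed model choices and hyperparameters (not from the project).
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training set and score it on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Report models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Scoring every model on the same held-out split keeps the accuracies directly comparable.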
Based on these results, Logistic Regression was the most suitable of the three algorithms for predicting flight delays in the United States. Predictions of this kind could help airlines and airport management adjust flight schedules and reduce delays, improving the travel experience for passengers.
- Python
- scikit-learn
- Seaborn
- Pandas