
angelaL8a/SpaceX-CapstoneIBM


🚀 SpaceX Launch Success Prediction Project

An advanced data science project aimed at analyzing SpaceX launch data, identifying key insights, and applying machine learning to predict launch success outcomes.



🌌 Introduction

The commercialization of space travel has revolutionized space exploration. SpaceX is at the forefront of this revolution, consistently launching reusable rockets, reducing launch costs, and increasing the success rate of missions. This project focuses on analyzing SpaceX Falcon 9 launch data and predicting launch outcomes using various machine learning models.

In this project, we:

  • Investigate how launch site, payload mass, and booster version relate to success rates.
  • Apply predictive models like SVM, Logistic Regression, and Decision Trees for launch outcome predictions.
  • Tune hyperparameters for optimized performance.


📂 Project Structure

├── modules/                # Jupyter notebooks for EDA, modeling, and visualizations
├── reports/                # Generated reports and project documentation
├── images/                 # Images and figures used in the report and README
├── README.md               # Project overview and documentation

🔍 Exploratory Data Analysis (EDA) & Insights

Key Insights:

  • Highest Success Rate Launch Site: 🚀 KSC-LC-39A, with a 76.9% success rate.
  • Optimal Payload Range for Success: 📦 The 2,000 kg - 6,000 kg range has the highest success rate.
  • Lowest Success Rate Payload: ⚖️ The 7,000 kg - 10,000 kg range has a lower success rate.
  • Most Reliable Booster Version: 🔧 The FT version has demonstrated the highest reliability.

EDA was performed using Pandas, Matplotlib, and Seaborn to visualize these insights. Additionally, we created interactive dashboards using Plotly and Dash to dynamically explore the data.
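As a rough illustration of how per-site and per-payload success rates like those above can be computed with Pandas, here is a minimal sketch. The records, the column names (`LaunchSite`, `PayloadMass`, `Class`), and the bin edges are assumptions made for the example and are not taken from the project notebooks.

```python
import pandas as pd

# Hypothetical toy records; the real dataset and its column names may differ.
launches = pd.DataFrame({
    "LaunchSite": ["KSC-LC-39A", "KSC-LC-39A", "CCAFS SLC-40",
                   "CCAFS SLC-40", "VAFB SLC-4E", "VAFB SLC-4E"],
    "PayloadMass": [3500.0, 5300.0, 500.0, 9600.0, 2200.0, 7800.0],  # kg
    "Class": [1, 1, 0, 0, 1, 0],  # 1 = success, 0 = failure
})

# Success rate per launch site: the mean of a 0/1 outcome column.
site_success = (
    launches.groupby("LaunchSite")["Class"].mean().sort_values(ascending=False)
)

# Success rate per payload band, using illustrative bin edges in kilograms.
bins = [0, 2000, 6000, 10000]
labels = ["0-2000", "2000-6000", "6000-10000"]
launches["PayloadBand"] = pd.cut(launches["PayloadMass"], bins=bins, labels=labels)
band_success = launches.groupby("PayloadBand", observed=True)["Class"].mean()

print(site_success)
print(band_success)
```

The same grouped series can be passed straight to a Matplotlib or Plotly bar chart.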


🧠 Predictive Analysis (Machine Learning)

We explored different classification models to predict launch outcomes. Our process included:

  1. Data Preprocessing:

    → Standardization of features (payload mass, booster version, etc.).
    → Train-test split (80%-20%) to validate model performance.

  2. Model Development:

    → Support Vector Machines (SVM)
    → Decision Trees
    → Logistic Regression
    → K Nearest Neighbors (KNN)

  3. Hyperparameter Tuning:
    We performed a grid search over key parameters for each model to ensure optimal performance.

  4. Evaluation:
    Models were evaluated using accuracy on the training set, with cross-validation employed to mitigate overfitting.
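The four steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, only the Decision Tree is shown, and the parameter grid, `cv=5`, and the `random_state` values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 100 rows, 4 numeric features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# 80%-20% train-test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Scaling inside the pipeline: each CV fold fits the scaler on its own training part.
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))

# Illustrative grid; step names are the lowercased class names set by make_pipeline.
param_grid = {
    "decisiontreeclassifier__criterion": ["gini", "entropy"],
    "decisiontreeclassifier__max_depth": [2, 4, 6],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

cv_accuracy = search.best_score_              # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy on the held-out 20%
print(search.best_params_, cv_accuracy, test_accuracy)
```

Swapping `DecisionTreeClassifier` for `SVC`, `LogisticRegression`, or `KNeighborsClassifier` (with a matching grid) repeats the same procedure for the other models.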


📊 Model Performance

| Model | Accuracy |
| --- | --- |
| Support Vector Machine (SVM) | 0.848214 |
| Decision Tree | 0.876786 |
| Logistic Regression | 0.846429 |
| K Nearest Neighbors (KNN) | 0.848214 |

Best Performing Model: Decision Tree, with the highest accuracy of the four models (0.876786).
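For reference, the ranking can be reproduced from the reported figures with a few lines of Python. The dictionary below simply copies the accuracies from the table above; the variable names are illustrative.

```python
# Accuracy figures copied from the Model Performance table.
accuracy = {
    "Support Vector Machine (SVM)": 0.848214,
    "Decision Tree": 0.876786,
    "Logistic Regression": 0.846429,
    "K Nearest Neighbors (KNN)": 0.848214,
}

# Rank models from highest to lowest accuracy.
ranking = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranking[0]

# Gap between the best model and the runner-up.
margin = best_score - ranking[1][1]

for name, score in ranking:
    print(f"{name:<30} {score:.4f}")
print(f"Best: {best_model} (lead of {margin:.4f})")
```

The lead over the next model is under three percentage points, so the ranking is worth re-checking on held-out data.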


📌 Conclusion

This project demonstrates how exploratory data analysis and machine learning can be used to extract insights from launch data and predict outcomes. Using features such as launch site, payload mass, and booster version, the best model (Decision Tree) predicted the success of SpaceX launches with an accuracy of 0.876786, providing insights that may inform future missions.


🛠️ Technologies Used

  • Python 🐍: Core programming language used for data analysis and modeling.
  • Pandas, NumPy: Libraries for data manipulation and preprocessing.
  • Matplotlib, Seaborn, Plotly: Visualization libraries for EDA and interactive dashboards.
  • Scikit-learn: Used for implementing machine learning algorithms.
  • Dash: Web application framework for building interactive visualizations.
  • Jupyter Notebooks: For developing and documenting the analysis.

📘 Extra Study Materials

Before diving into the labs, it's highly recommended to review previous concepts and practice them locally on your machine. This not only enhances understanding but also helps reduce the usage of cloud trial versions.

📊 Visualization Libraries

  • Plotly - A versatile and powerful library for creating interactive visualizations.
  • Folium - Visualize geographical data with ease through interactive maps.

🌍 Folium Example Projects

  • Folium Examples - Explore practical examples of Folium in action for inspiration and guidance.

🤖 Data Science Concepts

  • Confusion Matrix - Master the basics of classification evaluation with this simple guide to confusion matrices.
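As a quick refresher, a confusion matrix for a binary classifier can be computed with scikit-learn as sketched below. The labels and predictions are made-up values for illustration (1 = successful launch, 0 = failure), not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for eight launches.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]

# With labels=[0, 1], rows are true classes and columns are predicted classes,
# so ravel() yields: true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Accuracy is the share of predictions on the matrix diagonal.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, accuracy)
```

A false positive here means predicting a successful launch that actually failed.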

✉️ Contact

For more information or collaboration, feel free to reach out:

LinkedIn Badge GitHub Badge


© 2024 Angela Paola. All Rights Reserved.

🌟 Project Presentation

Canva Presentation

Click the image to view the full presentation on Canva.