This GitHub repository showcases the exploratory data analysis (EDA) and machine learning (ML) models built on the datasets from Kaggle's Titanic competition (https://www.kaggle.com/competitions/titanic/data).
The main goal of this project is to predict which passengers survived the Titanic shipwreck. The dataset is divided into two parts: train.csv and test.csv. The train.csv dataset contains the details of a subset of the passengers on board (891 passengers), whereas the test.csv dataset contains the details of a different subset of passengers (418 passengers). The train.csv dataset is used to build the machine learning models. For the passengers in the test.csv dataset, the outcome (whether or not the passenger survived) is withheld.
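As a quick sanity check of the split described above, the following minimal sketch reads both files and reports how many passengers each contains and whether the outcome column is present. It assumes the files sit in the repository's data folder and that the outcome column is named Survived, as in the Kaggle files; the helper names load_rows and describe_split are hypothetical and not part of the repository's code.

```python
import csv
from pathlib import Path

# Assumption: the CSV files live in the "data" folder of this repository.
DATA_DIR = Path("data")
# Outcome column name used in Kaggle's train.csv.
TARGET = "Survived"


def load_rows(path):
    """Read a CSV file into a list of dicts, one per passenger."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def describe_split(rows):
    """Return (number of passengers, whether the outcome column is present)."""
    has_target = bool(rows) and TARGET in rows[0]
    return len(rows), has_target


if __name__ == "__main__":
    for name in ("train.csv", "test.csv"):
        path = DATA_DIR / name
        if path.exists():
            n_rows, has_target = describe_split(load_rows(path))
            print(f"{name}: {n_rows} passengers, outcome present: {has_target}")
        else:
            print(f"{name}: not found under {DATA_DIR}/")
```

With the Kaggle files in place, the reported counts should match the 891 and 418 passengers stated above, and only train.csv should report the outcome as present.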
The project is organized in the following way:
- code: contains the code for the EDA and ML models.
- data: contains the datasets used in the project (train.csv and test.csv).
- output: contains the output of the code (e.g., plots and tables).
The material can be freely used for teaching purposes. If you use it, please cite it as follows:
@misc{titanicKaggleML,
author = {Giulia Solinas},
title = {TitanicKaggleML: Use case and main notes},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/GiuliaSolinas/TitanicKaggleML}}
}