This project implements a machine learning pipeline to analyze and predict survival on the Titanic dataset. It leverages a Bagging Classifier with decision trees to enhance model robustness. The project includes data preprocessing, model training, evaluation, and saving the results.
- Interactive column selection for preprocessing.
- Handles missing values and encodes categorical variables.
- Implements Bagging Classifier with decision trees.
- Exports the processed dataset to .csv and Google Sheets.
Before running the code, ensure you have the following:
- Python 3.8+
- Kaggle API credentials for downloading the dataset.
- Necessary Python libraries (see requirements.txt).
- Access to Google Sheets API (if using the csv_to_sheets function).
```shell
git clone https://github.com/<your-username>/<repository-name>.git
cd <repository-name>
pip install -r requirements.txt
```
- Download your kaggle.json file from Kaggle API.
- Place it in the appropriate directory (`~/.kaggle` on Unix or `%USERPROFILE%\.kaggle` on Windows).
- Follow Google Sheets API documentation to set up credentials.
- Place the credentials in the project directory.
```shell
python bagging.py
```
- When prompted, enter a search term to find datasets on Kaggle (e.g., "Titanic", "Housing Prices").
- A list of datasets matching your search will be displayed.
- Enter the number corresponding to the dataset you want to download.
- Enter a name for a new folder where the dataset will be downloaded and unzipped.
- If the downloaded dataset contains multiple .csv files, the script will load the first .csv file by default.
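The first-CSV default can be sketched roughly as below. The folder name `demo_data` and its contents are placeholders, and whether "first" means alphabetical order depends on how the script lists files:

```python
import glob
import os

import pandas as pd

# Demo: create a folder with two CSV files, standing in for an unzipped
# Kaggle dataset (hypothetical names and values)
os.makedirs("demo_data", exist_ok=True)
pd.DataFrame({"a": [1]}).to_csv("demo_data/train.csv", index=False)
pd.DataFrame({"b": [2]}).to_csv("demo_data/test.csv", index=False)

# Load the first .csv found in the dataset folder
csv_files = sorted(glob.glob(os.path.join("demo_data", "*.csv")))
df = pd.read_csv(csv_files[0])
```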
- The dataset is automatically loaded into a Pandas DataFrame.
- Interactively select columns for analysis.
- Handle missing values automatically.
- Specify the target column (dependent variable).
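The preprocessing steps above can be sketched with pandas. The toy frame, the median/one-hot strategy, and the column names are illustrative assumptions, not necessarily what `bagging.py` does internally:

```python
import pandas as pd

# Toy frame standing in for the interactively selected columns (hypothetical values)
df = pd.DataFrame({
    "Age": [22.0, None, 38.0],
    "Sex": ["male", "female", "female"],
    "Survived": [0, 1, 1],
})

# Fill missing numeric values with the column median (one common strategy)
df["Age"] = df["Age"].fillna(df["Age"].median())

# One-hot encode categorical variables
df = pd.get_dummies(df, columns=["Sex"], drop_first=True)

# Separate the target column (dependent variable) from the features
X = df.drop(columns=["Survived"])
y = df["Survived"]
```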
- The script splits the data into training and testing sets.
- Trains a Bagging Classifier using decision trees.
- Accuracy on the test set is displayed in the console.
- Processed data is saved as a .csv file in the save/ directory.
- Optionally, upload the dataset to Google Sheets using the `google_sheets_utils.py` script.
- Looker Studio: Example - Titanic Survival Rate
- Kaggle datasets: [Kaggle Datasets](https://www.kaggle.com/datasets)
- Scikit-learn: [Scikit-learn Documentation](https://scikit-learn.org/stable/)
- Pandas: [Pandas Documentation](https://pandas.pydata.org/docs/)
- Curses: [Curses Documentation](https://docs.python.org/3/library/curses.html)