This project helps you find datasets on https://www.kaggle.com. Enter a keyword and the project lists the matching datasets, so you can pick the one that best fits your needs.
Before you begin, make sure you have the following installed:
- Python 3.8 or higher
- pip (the Python package installer)
This file requires the following dependencies to function:
- KaggleApi (imported from kaggle.api.kaggle_api_extended): required to interact with the Kaggle API.
- pandas: used for data manipulation and analysis.
- Path (imported from pathlib): used for handling file and directory paths.
- curses: used to create text-based user interfaces (TUIs) in the terminal.
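As a rough illustration of how these dependencies could fit together, the sketch below searches Kaggle by keyword and downloads one result. The helper names (`search_datasets`, `download_dataset`, `make_api`) and the `data` target directory are hypothetical and are not taken from the project. Running it against Kaggle assumes that API credentials are already configured.

```python
from pathlib import Path


def search_datasets(api, keyword, limit=5):
    """Return up to `limit` dataset references (owner/slug) matching `keyword`."""
    results = api.dataset_list(search=keyword)
    return [str(dataset.ref) for dataset in list(results or [])[:limit]]


def download_dataset(api, ref, target_dir="data"):
    """Download and unzip the dataset `ref` into `target_dir` (hypothetical default)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    api.dataset_download_files(ref, path=str(target), unzip=True)
    return target


def make_api():
    """Create an authenticated client; assumes Kaggle credentials are configured."""
    # Imported lazily so the helpers above can be exercised without the
    # kaggle package or credentials being present.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api


# Example usage (requires network access and Kaggle credentials):
#   api = make_api()
#   refs = search_datasets(api, "titanic")
#   download_dataset(api, refs[0])
```

Passing the client in as an argument keeps the search and download helpers independent of how authentication is done, which also makes them easy to test with a stand-in object.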
This file depends on the following libraries:
- pandas: for data manipulation and analysis.
- matplotlib.pyplot: for creating visualizations and plots.
- numpy: for numerical calculations and matrix operations.
- OrdinalEncoder (imported from sklearn.preprocessing): for encoding categorical variables as ordinal integers.
- LinearRegression (imported from sklearn.linear_model): for fitting linear regression models.
- kaggle_connect: an internal module of the project, defined in the kaggle_connect.py file.
- curses: for creating text-based user interfaces (TUIs) in the terminal.
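This README does not say how OrdinalEncoder and LinearRegression are combined in the project. One plausible use, shown here only as a minimal sketch on made-up data, is to encode a categorical column as integers and fit a linear model on the result. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OrdinalEncoder

# Made-up sample data: a categorical feature and a numeric target.
df = pd.DataFrame(
    {
        "size": ["small", "medium", "large", "medium", "small", "large"],
        "price": [10.0, 20.0, 30.0, 20.0, 10.0, 30.0],
    }
)

# Fix the category order explicitly so that small -> 0, medium -> 1, large -> 2.
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
X = encoder.fit_transform(df[["size"]])
y = df["price"].to_numpy()

# Fit price as a linear function of the encoded category.
model = LinearRegression().fit(X, y)
slope = float(np.round(model.coef_[0], 6))
intercept = float(np.round(model.intercept_, 6))
print(f"price = {slope} * size_code + {intercept}")
```

Giving the category order explicitly matters: without it, OrdinalEncoder sorts the categories alphabetically, which would not match their natural order here.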
With this project you can:
- Search for and download a relevant dataset from Kaggle.
- Select a descriptive variable from the dataset for analysis.
- Calculate the mean of the selected variable's values.
- Assign a name to the descriptive variable.
- Create a line chart to visualize the relationship between two variables (X, Y).
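The analysis steps listed above can be sketched roughly as follows. This is a minimal illustration using made-up data, not the project's actual code: the function name `plot_with_mean`, the column names, and the label are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd


def plot_with_mean(df, x_col, y_col, label):
    """Plot y_col against x_col as a line chart and return (mean, figure)."""
    mean_value = df[y_col].mean()
    fig, ax = plt.subplots()
    ax.plot(df[x_col], df[y_col], marker="o", label=label)
    # Draw the mean as a dashed horizontal reference line.
    ax.axhline(mean_value, linestyle="--", label=f"mean = {mean_value:.2f}")
    ax.set_xlabel(x_col)
    ax.set_ylabel(label)
    ax.legend()
    return mean_value, fig


# Made-up data standing in for a downloaded Kaggle dataset.
df = pd.DataFrame(
    {"year": [2019, 2020, 2021, 2022], "sales": [10.0, 14.0, 12.0, 16.0]}
)
mean_value, fig = plot_with_mean(df, "year", "sales", "Annual sales")
print(mean_value)
```

Calling `fig.savefig("line_chart.png")` afterwards would write the chart to disk. In an interactive session, omit the `matplotlib.use("Agg")` line and call `plt.show()` instead.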