This project consists of two Jupyter Notebook files: `youtube_scraper.ipynb` for scraping YouTube data and `text_classification.ipynb` for performing text classification. The scraped data is stored in `youtube_data.csv`.
Ensure you have the following dependencies installed to run the Jupyter Notebooks:

- `google-api-python-client`: Used for interacting with the YouTube Data API.
- `pandas`: Required for handling and manipulating data.
- `scikit-learn`: Used for machine learning and text classification tasks.
- `matplotlib` and `seaborn`: Used for visualizing evaluation metrics.
- `python-dotenv`: Used for loading environment variables from a `.env` file.
Install the dependencies using the following command:

```shell
pip install google-api-python-client pandas scikit-learn matplotlib seaborn python-dotenv
```
- Create a `.env` file in the root directory of the project.
- Add your YouTube Data API key to the `.env` file:

  ```
  API_KEY=your_api_key_here
  ```

- Ensure there are no spaces around the equal sign.
- Create and activate the virtual environment. If your virtual environment is named `yt_scrape`, use the following commands:

  ```shell
  python -m venv yt_scrape
  source yt_scrape/bin/activate
  ```

- On Windows, use `yt_scrape\Scripts\activate` instead.
- Open and run `youtube_scraper.ipynb` to scrape YouTube data and save it to `youtube_data.csv`.
- Open and run `text_classification.ipynb` to perform text classification on the scraped data.
The notebook displays precision, recall, and F1 scores for each model:
| Model | Precision | Recall | F1 Score |
| --------------- | --------- | ------ | -------- |
| SVM | 0.90 | 0.89 | 0.89 |
| Random Forest | 0.89 | 0.89 | 0.89 |
| Naive Bayes | 0.85 | 0.81 | 0.80 |
| Neural Network | 0.88 | 0.86 | 0.86 |
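Scores like those above can be produced with scikit-learn's `classification_report`. A minimal sketch of one such pipeline (TF-IDF features plus a linear SVM); the column names `title` and `label` are assumptions about the CSV layout:

```python
# Minimal sketch: TF-IDF + linear SVM, evaluated with classification_report.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_and_evaluate(df, text_col="title", label_col="label"):
    """Train a text classifier and return per-class metrics as a dict."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[text_col], df[label_col], test_size=0.25, random_state=42
    )
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(X_train, y_train)
    return classification_report(y_test, model.predict(X_test), output_dict=True)


# Usage: report = train_and_evaluate(pd.read_csv("youtube_data.csv"))
```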
The notebook also displays a confusion matrix for each model.
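A confusion matrix can be rendered as a `seaborn` heatmap; the `plot_confusion` helper and its file-naming scheme below are illustrative assumptions, not the notebook's exact code:

```python
# Sketch: render a confusion matrix as a heatmap and save it to a PNG.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix


def plot_confusion(y_true, y_pred, labels, title):
    """Plot and save the confusion matrix; returns the raw matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=labels, yticklabels=labels)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.title(title)
    plt.savefig(f"{title.lower().replace(' ', '_')}_cm.png")
    plt.close()
    return cm
```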
Vaibhav Srivastava
GitHub: ZeusSama0001
This project is licensed under the MIT License. See the LICENSE.txt file for more details.