Sentiment-Driven Video Recommendations

An end-to-end personalized video recommendation system driven by Natural Language Processing (NLP) and Machine Learning (ML). Recommendations are based on:
- Video content: TF-IDF analysis of video transcripts.
- Sentiment analysis: comment sentiment scores from RoBERTa (Hugging Face).
- Clustering: unsupervised ML for grouping videos.

Methodology:
- Construct a cosine similarity matrix from the TF-IDF vectors.
- Weight it with sentiment and clustering scores to refine the recommendations.

Link to website

Final Score Calculation

The final score is calculated as follows:

$$ \text{Final Score} = \text{Cosine Similarity Matrix} \times \text{Sentiment Score} \times \text{Cluster Boost} $$

where:

$$ \text{Sentiment Score} = 1 + \text{Weighted Sentiment Score per Video} $$

$$ \text{Cluster Boost} = 1.2 $$
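A minimal NumPy sketch of how these three factors could be combined. It assumes that the sentiment term scales each candidate video's column (the score for recommending video j depends on j's comment sentiment) and that the 1.2 boost applies only when both videos share a cluster, with a factor of 1 otherwise; the formulas above give only the constant, so both points are assumptions. `final_scores` and `recommend` are hypothetical helper names, not functions taken from `main_app.py`.

```python
import numpy as np


def final_scores(cosine_sim, sentiment, clusters, boost=1.2):
    """Combine similarity, sentiment and cluster membership into one score matrix.

    cosine_sim: (n, n) cosine similarity matrix between videos.
    sentiment:  (n,) weighted sentiment score per video (assumed in [-1, 1]).
    clusters:   (n,) cluster label per video.
    """
    cosine_sim = np.asarray(cosine_sim, dtype=float)
    # Sentiment Score = 1 + weighted sentiment score per video.
    sentiment_score = 1.0 + np.asarray(sentiment, dtype=float)
    labels = np.asarray(clusters)
    # Assumption: boost only pairs of videos that share a cluster, 1.0 otherwise.
    cluster_boost = np.where(labels[:, None] == labels[None, :], boost, 1.0)
    # Broadcasting multiplies column j by the sentiment score of candidate video j.
    return cosine_sim * sentiment_score[None, :] * cluster_boost


def recommend(scores, video_idx, k=3):
    """Return indices of the top-k candidates for one video, excluding the video itself."""
    row = np.array(scores[video_idx], dtype=float)  # copy so the matrix is not modified
    row[video_idx] = -np.inf
    return [int(i) for i in np.argsort(row)[::-1][:k]]
```

For example, with similarities `[[1.0, 0.8, 0.6], [0.8, 1.0, 0.5], [0.6, 0.5, 1.0]]`, sentiment `[0.1, -0.5, 0.4]` and clusters `[0, 1, 0]`, the top two recommendations for video 0 are `[2, 1]`: video 2 overtakes video 1 despite its lower raw similarity (final scores 1.008 vs 0.4) because its comments are more positive and it shares video 0's cluster.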

Project Schema:

Folder structure

Sentiment-Driven-Video-Recommendations/
├── assets/                                           # Auxiliary files and resources (e.g., images for documentation)
├── data/                                             # Data folder
│   ├── clean/                                        # Output of the PySpark sentiment analysis
│   ├── clean_data/                                   # Processed data
│   ├── raw_data/                                     # Unprocessed, original data from the YouTube API
│   └── README.md                                     # Explanation of the data folder contents
├── docker_app/                                       # Docker-related files and application code
│   ├── __pycache__/                                  # Python cache for compiled files
│   ├── Dockerfile                                    # Instructions for building the Docker image
│   ├── final_score_matrix.joblib                     # Precomputed final score matrix for recommendations
│   ├── main_app.py                                   # Main application script
│   ├── readme.rst                                    # Documentation for Docker app setup
│   └── requirements.txt                              # Python dependencies for the project
└── notebooks/                                        # Jupyter notebooks for analysis and modeling
    ├── 0_fetch_and_clean_data_youtube_api.ipynb      # Fetch and clean data from YouTube API
    ├── 2_emotion_analysis_pyspark.ipynb              # Perform sentiment analysis using PySpark
    ├── 3_clustering.ipynb                            # Clustering analysis for video grouping
    └── 4_tfidf_matrix_and_model_pipeline.ipynb       # Build TF-IDF matrix and model pipeline

Key Features

1️⃣ Content-Based Filtering:

Built a TF-IDF matrix from the video transcripts.

$$ \text{TF}(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d} $$

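$$ \text{IDF}(t) = \log\left(\frac{N}{\left|\{d \in D : t \in d\}\right|}\right) $$

where $N = |D|$ is the total number of documents (video transcripts) in the corpus $D$.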

$$ \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) $$
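The three formulas can be implemented directly in plain Python. This sketch assumes the transcripts are already tokenized (the project lists NLTK, but the exact preprocessing is not shown here) and uses the natural logarithm; `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights using the TF and IDF definitions above.

    documents: list of token lists, one per video transcript.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)  # total number of terms in the document
        weights.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return weights
```

For `[["ai", "in", "finance"], ["ai", "in", "robotics"], ["deep", "learning"]]`, the weight of `"finance"` in the first transcript is $\frac{1}{3}\ln 3 \approx 0.366$. A term that appears in every transcript gets $\text{IDF} = \log 1 = 0$ and drops out. Library implementations can differ from these raw values: scikit-learn's `TfidfVectorizer`, for example, defaults to a smoothed IDF, $\ln\frac{1+N}{1+\text{df}(t)} + 1$, and L2-normalizes each row.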

Applied cosine similarity to the TF-IDF vectors to identify similar videos.
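The cosine similarity between two TF-IDF vectors $a$ and $b$ is $\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. A NumPy sketch of the full pairwise matrix, assuming a dense array with one row per video (`cosine_similarity_matrix` is a hypothetical name):

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (one TF-IDF vector per video)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Leave all-zero rows (e.g. empty transcripts) as zero vectors instead of dividing by 0.
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    # Dot products of unit vectors are exactly the cosine similarities.
    return unit @ unit.T
```

Normalizing each row once and taking a single matrix product is equivalent to applying the formula to every pair. For rows `[1, 0, 1]` and `[1, 0, 0]` the similarity is $1/\sqrt{2} \approx 0.707$.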

2️⃣ Sentiment-Weighted Recommendations:

💬 Incorporated sentiment analysis to refine similarity scores.
📊 Integrated video statistics (view count, like count, comment count) into the final recommendation score.
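How the per-video sentiment is weighted is not spelled out here, so the following is only a sketch under two explicit assumptions: each comment's polarity is RoBERTa's positive-class probability minus its negative-class probability (a value in [-1, 1]), and comments are weighted by `1 + like_count` so that unliked comments still count. The function name and dictionary keys are hypothetical and may not match the columns of `df_comments`.

```python
from collections import defaultdict


def sentiment_multipliers(comments):
    """Per-video Sentiment Score = 1 + weighted average comment polarity.

    comments: iterable of dicts with hypothetical keys
      'video_id', 'p_neg', 'p_pos' (RoBERTa class probabilities) and 'like_count'.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for c in comments:
        polarity = c["p_pos"] - c["p_neg"]  # assumption: scalar polarity in [-1, 1]
        weight = 1.0 + c["like_count"]      # assumption: liked comments count more
        weighted_sum[c["video_id"]] += weight * polarity
        weight_total[c["video_id"]] += weight
    return {vid: 1.0 + weighted_sum[vid] / weight_total[vid] for vid in weighted_sum}
```

For a video with two comments of polarity 0.6 (3 likes) and -0.4 (0 likes), the weighted average is (4 × 0.6 + 1 × (-0.4)) / 5 = 0.4, giving a Sentiment Score of 1.4.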

3️⃣ Clustering:

Used DBSCAN and K-Means to cluster videos based on their features.
Prioritized videos from the same cluster in the recommendation process.
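To illustrate the grouping step, here is a minimal NumPy version of K-Means (Lloyd's algorithm). It is a self-contained stand-in, not the implementation used in the notebooks: it uses a deterministic farthest-point initialization instead of k-means++, and assumes one feature vector per video (for example TF-IDF rows, possibly reduced with PCA). DBSCAN is not shown, and `kmeans` is a hypothetical helper name.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's K-Means; returns (labels, centroids). Requires n_iter >= 1."""
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-point initialization (a simplification of k-means++).
    centroids = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each video goes to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

`kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` puts the first two points in one cluster and the last two in the other; labels like these are what a same-cluster boost would compare.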

Data

Datasets from YouTube API

Queried the YouTube API with the following searches about artificial intelligence:

"What is artificial intelligence?"
"Artificial intelligence applications in healthcare"
"AI in autonomous vehicles"
"Machine learning vs deep learning"
"Artificial intelligence in finance"
"How does AI work?"
"Top AI tools for data science"
"Artificial intelligence in robotics"
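The search step can be sketched with the YouTube Data API v3 `search.list` method. The client is passed in, for example one created with `googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)`, where `API_KEY` is a placeholder; `search_video_ids` is a hypothetical helper, and the notebook `0_fetch_and_clean_data_youtube_api.ipynb` may structure this differently.

```python
def search_video_ids(youtube, queries, max_results=50):
    """Collect unique video IDs, in first-seen order, for a list of search queries.

    youtube: a YouTube Data API v3 client (see the lead-in; API_KEY is a placeholder).
    """
    seen = set()
    video_ids = []
    for query in queries:
        response = youtube.search().list(
            q=query, part="snippet", type="video", maxResults=max_results
        ).execute()
        for item in response.get("items", []):
            vid = item["id"]["videoId"]
            if vid not in seen:  # the same video can match several queries
                seen.add(vid)
                video_ids.append(vid)
    return video_ids
```

Each call returns at most 50 results; collecting more per query requires following the `nextPageToken` in the response.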

Gathered Datasets:

df_videos: Video data including view count, like count, comment count, and more.
df_comments: Comment data with sentiment analysis and engagement metrics.
df_channels: Channel-level data including subscriber count and total video views.
df_categories: Categorical data related to video genres and types.

Technologies Used

Python, PySpark
Natural Language Processing (NLTK, Google Translate)
Machine Learning (RoBERTa transformer, TF-IDF, K-Means, DBSCAN, PCA)
Docker
FastAPI
Google Cloud (Cloud Storage, Cloud Run)

Installation

1. Clone the repository:

git clone https://github.com/ivanseldas/Sentiment-Driven-Video-Recommendations.git

2. Navigate to the project directory:

cd Sentiment-Driven-Video-Recommendations

Future Work

Precision@K, Recall@K, F1-Score: The recommendation system achieves high relevance in its top-K recommendations, indicating a well-tuned model.
Clustering Insights: The DBSCAN clustering effectively groups similar videos, enhancing the recommendation diversity.
Explore Supervised Learning: Implement supervised models for further improving recommendation accuracy.
A/B Testing: Deploy the system in a real-world setting for user feedback and further refinement.
Scalability: Optimize the system for larger datasets and real-time recommendations.
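The ranking metrics named above can be computed per query video as follows. This sketch assumes a ranked list of recommended video IDs and a set of videos judged relevant; how relevance is defined for this project is not specified here, and `precision_recall_f1_at_k` is a hypothetical name.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision@K, Recall@K and their harmonic mean (F1) for one query video (k >= 1)."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With recommended `["v3", "v7", "v1", "v9", "v4"]`, relevant `{"v1", "v3", "v5"}` and K = 4, two of the top four are relevant, so Precision@4 = 0.5, Recall@4 = 2/3 and F1 = 4/7 ≈ 0.571. Averaging these values over all query videos gives system-level scores.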

Contributors

Ivan Seldas Perulero

License

This project is licensed under the MIT License - see the LICENSE file for details.
