Similar_Plot_Summaries

Project Overview

This project analyzes similarities among movies based on their plot summaries and applies hierarchical clustering to uncover patterns and relationships within the movie dataset. The pipeline relies on natural language processing (NLP) techniques: tokenization, stemming, and TF-IDF vectorization.

Data Preprocessing

Data Loading:

Loaded movie data containing plot summaries from multiple sources, such as Wikipedia and IMDb.

Text Preprocessing:

Combined plot summaries from different sources into a single column ('plot') in the dataset.

Tokenized and stemmed the text to extract meaningful features from the plot summaries.

Utilized TF-IDF vectorization to represent each movie's plot as a numerical feature vector.
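The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the sample plots, the `toy_stem` suffix stripper, and the `tfidf` helper are all hypothetical, and a real pipeline would normally use a proper stemmer and a library vectorizer instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(token):
    """Crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tfidf(docs):
    """Return (vocabulary, vectors) where each vector is an L2-normalised TF-IDF row."""
    tokenized = [[toy_stem(t) for t in tokenize(d)] for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF, so a term found in every document still gets weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        vectors.append([x / norm for x in row])
    return vocab, vectors


# Hypothetical plot summaries standing in for the 'plot' column.
plots = [
    "A detective hunts a killer in the city.",
    "A killer is hunted by a young detective.",
    "Two friends travel across the desert.",
]
vocab, vectors = tfidf(plots)
```

Because "hunts" and "hunted" both reduce to "hunt", the first two plots share that feature, which is the effect stemming is meant to have before vectorization.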

Similarity Analysis

Cosine Similarity:

Calculated the cosine similarity between each pair of movies based on their TF-IDF representations.

Produced a similarity matrix that quantifies the similarity between movies.
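As a rough sketch of this step, cosine similarity is the dot product of two vectors divided by the product of their lengths. The toy vectors and the helper names below are assumptions made for illustration; the notebook may well call a library routine instead.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Square, symmetric matrix of pairwise cosine similarities."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical feature vectors standing in for TF-IDF rows.
vectors = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
]
sim = similarity_matrix(vectors)
```

The diagonal is 1 (each movie compared with itself), and vectors that share no terms, like the first and third here, score 0.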

Hierarchical Clustering:

Applied hierarchical clustering using the complete-linkage method on the similarity matrix.

Generated a dendrogram to visualize the hierarchical structure of movie clusters.
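A possible way to run this step with SciPy is sketched below. SciPy's `linkage` expects distances rather than similarities, so the sketch assumes the similarity matrix is first converted to `1 - similarity`; whether the notebook does exactly this is not stated in this README. The 4x4 matrix is made-up data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical cosine similarity matrix for four movies.
sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])

# Assumption: turn similarity into a distance before clustering.
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)

# linkage takes the condensed (upper-triangle) form of a distance matrix.
condensed = squareform(dist, checks=False)

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members.
Z = linkage(condensed, method="complete")
```

Each row of `Z` records one merge: the two cluster indices, the merge distance, and the size of the new cluster.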

Visualization and Interpretation

Dendrogram Plotting:

Plotted a dendrogram to represent the hierarchical clustering of movies.

Adjusted the plot to enhance readability, including setting leaf label colors.
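The plotting step might look roughly like the following, assuming SciPy and Matplotlib. The titles, distances, figure size, and label color are illustrative placeholders, not values taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical titles and condensed distances for the pairs
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
condensed = np.array([0.1, 0.9, 0.8, 0.8, 0.9, 0.2])
Z = linkage(condensed, method="complete")

fig, ax = plt.subplots(figsize=(8, 4))
result = dendrogram(Z, labels=titles, leaf_rotation=90, leaf_font_size=10, ax=ax)

# Recolor the leaf labels to make the movie titles easier to read.
for label in ax.get_xmajorticklabels():
    label.set_color("tab:blue")

ax.set_ylabel("Distance")
fig.tight_layout()
```

Calling `fig.savefig(...)` or `plt.show()` afterwards writes or displays the figure; `result["ivl"]` holds the leaf labels in the order they are drawn.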

Cluster Analysis:

Identified clusters of movies based on the dendrogram structure.

Explored relationships between movies within and across clusters.
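One way to turn the dendrogram into explicit clusters is to cut the tree at a chosen height, sketched here with SciPy's `fcluster`. The threshold of 0.5, the titles, and the distances are arbitrary assumptions for illustration; the README does not say how clusters were read off in the notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical titles and condensed distances for the pairs
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
condensed = np.array([0.1, 0.9, 0.8, 0.8, 0.9, 0.2])
Z = linkage(condensed, method="complete")

# Cut the tree: movies that merge below distance 0.5 share a cluster.
labels = fcluster(Z, t=0.5, criterion="distance")

# Group titles by cluster label for inspection.
clusters = {}
for title, label in zip(titles, labels):
    clusters.setdefault(int(label), []).append(title)
```

Lowering the threshold yields more, tighter clusters; raising it merges them into fewer, broader groups.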
