Recommends the k most similar movies to a selected movie, based on the similarity of their plot texts.
5000 American movies are selected from a wiki dataset (see Credits). For each movie plot, I created a text embedding with OpenAI's "text-embedding-3-small" model.
Text embeddings measure the relatedness of text strings by turning the texts into high-dimensional vectors of floating point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness and large distances suggest low relatedness.
To list movie recommendations for a selected movie, I selected the records with the smallest vector distances.
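The lookup can be sketched in plain Python; `cosine_distance`, `k_most_similar`, and the toy embeddings below are illustrative names for this sketch, not the notebook's actual code:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: small values mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def k_most_similar(query_vec, embeddings, k=3):
    """Rank titles by distance to the query vector and return the k closest."""
    ranked = sorted(embeddings.items(), key=lambda kv: cosine_distance(query_vec, kv[1]))
    return [title for title, _ in ranked[:k]]
```

In the real project the vectors have many more dimensions, but the ranking step is the same idea.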
By visualising the high-dimensional text embeddings in a 2D map with the help of NOMIC Atlas, we can see distinguishable clusters.
https://atlas.nomic.ai/data/csernusszilvi/experimental-arora/map
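Atlas computes its own 2D projection; as a rough illustration of the idea (not what Atlas actually runs), a PCA projection of embeddings down to two dimensions, with a hypothetical helper name:

```python
import numpy as np

def project_to_2d(vectors):
    """Project high-dimensional embeddings onto their top-2 principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                      # centre the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # principal directions
    return X @ vt[:2].T                         # shape: (n_points, 2)
```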
-
Prerequisites:
- Make sure Python3 is installed.
- If you don't have an account with OpenAI, create one here: https://openai.com/
- Create a project API key under Dashboard / API keys
- Create a NOMIC Atlas account here: https://atlas.nomic.ai/
-
Clone the project. - Be aware that the project includes the original dataset I used (
wiki_movie_plots_deduped.csv
) as well as the cached
movie_embeddings.pkl
file, which are 81MB and 86MB in size, respectively. Assuming you run the embedding function with the same parameters as in the project, the cache file helps you avoid charges from OpenAI. If you plan to use the embedding function for a different dataset / model, downloading these files won't be necessary. -
Create a virtual environment inside the project folder:
python -m venv venv
-
Activate the virtual environment:
Mac:
source venv/bin/activate
Windows:
venv\Scripts\activate
-
Select interpreter in VSCode:
(on Mac) Cmd + Shift + P ---> Select Interpreter ---> Select the created venv environment
-
Create an
.env
file in the root folder and add your project's API key: OPENAI_API_KEY=your-unique-openai-project-key
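The notebook presumably loads this key with a library such as python-dotenv; a minimal stdlib-only sketch of the same idea, assuming the simple KEY=value format above:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: one KEY=value per line; blanks and '#' comments ignored.
    (Illustrative only - a real project would typically use python-dotenv.)"""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```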
-
Install the Python dependencies:
pip install -r requirements.txt
-
Log in to
NOMIC Atlas
- In the terminal, run
nomic login
- Click the link to retrieve your API key, then return to the terminal and run
nomic login <your-api-key>
to authenticate.
-
Run the Jupyter Notebook:
jupyter notebook
The command will open the Notebook in the browser. - Run the cells in the given order in the
movies-embedding.ipynb
file, adjusting the models and cost calculations as necessary. - I used caching when I ran the embedding function myself. The cached pickle file,
movie_embeddings.pkl
is part of this project folder. If you don't change the dataset or the text-embedding model, you won't be charged, as the embedding function uses the cached data whenever it's available. - Be aware that OpenAI will charge you for running the embedding function if you use a different dataset and / or embedding model.
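The caching described above can be sketched as a pickle-backed wrapper; `embed_with_cache`, `embed_fn`, and the file name default are illustrative for this sketch, not the notebook's exact code:

```python
import os
import pickle

def embed_with_cache(text, embed_fn, cache_path="movie_embeddings.pkl"):
    """Return the embedding for `text`, calling the (paid) embed_fn only on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    if text not in cache:
        cache[text] = embed_fn(text)      # the only place an API charge can occur
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    return cache[text]
```

Repeated calls with the same text and cache file hit the pickle and never reach the API, which is why rerunning the notebook unchanged costs nothing.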
-
This project was adapted from Colt Steele's walkthrough project on Udemy: Mastering OpenAI Python APIs.
Changes made: my code and logic differ significantly from Colt's version; I used updated APIs and improved the code's logic.
-
Original dataset: https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots?resource=download