Project for DD2477. A search engine for a Spotify podcasts dataset, built with Elasticsearch and an interactive interface.
A convenient way to run Elasticsearch is through Docker. Obtaining Elasticsearch for Docker is as simple as issuing a docker pull command against the Elastic Docker registry.
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.7.1
The following commands start a single-node Elasticsearch cluster for development or testing.
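Create a new Docker network for Elasticsearch and Kibana (named elastic, to match the --net option used when starting the container):
docker network create elastic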
Start Elasticsearch in Docker. A password for the elastic user is generated and printed to the terminal, along with an enrollment token for enrolling Kibana.
docker run --name es01 --net elastic -p 9200:9200 -it docker.elastic.co/elasticsearch/elasticsearch:8.7.1
Copy your ELASTIC_PASSWORD and the http_ca.crt security certificate from your Docker container to your local machine.
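Once the password and certificate are on your machine, you can check that the node is reachable. The following is a minimal sketch using only the Python standard library, not the code used by this project. It assumes the node listens on https://localhost:9200 (the port published above) and that http_ca.crt sits in the current directory; the function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(password, url="https://localhost:9200"):
    """Prepare an authenticated GET request for the cluster root endpoint."""
    request = urllib.request.Request(url)
    # "elastic" is the built-in user whose password is printed at startup.
    request.add_header("Authorization", basic_auth_header("elastic", password))
    return request


def cluster_info(password, ca_cert="http_ca.crt"):
    """Query the cluster, trusting the CA certificate copied from the container."""
    context = ssl.create_default_context(cafile=ca_cert)
    with urllib.request.urlopen(build_request(password), context=context) as response:
        return json.load(response)
```

Calling `cluster_info("<your password>")` while the container is running should return the JSON document that Elasticsearch serves at its root endpoint. If the call fails with a certificate error, check that the path to http_ca.crt is correct.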
- Have Elasticsearch running.
- Put http_ca.crt under the folder.
- Create a pwd.txt under the folder and paste your Elasticsearch password into it.
- Change the metadata and data directory paths to your own.
- Run index.py.
- Run get_text.py, which generates a text file, padcast_text.txt, containing all the text from the transcripts.
- Run random_indexing.py, which creates a vocab.txt file and an ri.txt file containing the dictionary of embeddings of all words.
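For readers unfamiliar with random indexing, the idea behind the last step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the contents of random_indexing.py: the dimension, number of non-zero entries, window size, and function names are all hypothetical, and the sketch does not write vocab.txt or ri.txt because their formats are not described here. Each word gets a sparse random index vector, and a word's embedding is the sum of the index vectors of the words that appear near it.

```python
import random


def random_index_vector(dim, nonzero, rng):
    """Return a sparse ternary vector with `nonzero` entries set to +1 or -1."""
    vector = [0] * dim
    for position in rng.sample(range(dim), nonzero):
        vector[position] = rng.choice((-1, 1))
    return vector


def random_indexing(tokens, dim=100, nonzero=10, window=2, seed=0):
    """Build context vectors by summing the index vectors of neighbouring words."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    index_vectors = {word: random_index_vector(dim, nonzero, rng) for word in vocab}
    context_vectors = {word: [0] * dim for word in vocab}
    for i, word in enumerate(tokens):
        # Look at up to `window` tokens on each side of position i.
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        target = context_vectors[word]
        for j in range(start, stop):
            if j == i:
                continue
            neighbour = index_vectors[tokens[j]]
            for k in range(dim):
                target[k] += neighbour[k]
    return vocab, index_vectors, context_vectors
```

A call such as `random_indexing(text.split())` on the transcript text returns the vocabulary, the random index vectors, and the accumulated context vectors, which play the role of word embeddings. Words that occur in similar contexts end up with similar context vectors, which is what makes them usable for finding related query terms.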