song2vec is a Telegram bot that recommends YouTube songs through gensim's word2vec model.
=================
Feature requests and bug reports are welcome; please open an issue.
Justin Bieber + Backstreet Boys + Ice Cube + Lil Jon = Justin Bieber with rappers.
COMMAND SYNTAX:
Simply type /rec followed by a comma-separated list of artists.
EXAMPLE:
/rec Metallica, Nirvana, Pink Floyd, Iron Maiden, Ice Cube, Bob Marley, Rolling Stones, U2
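
As a rough illustration of the syntax above, a message like this could be split into artist names as follows. This is only a sketch: `parse_rec_command` is a hypothetical helper written for this README and is not taken from s2v_bot.py, which may handle input differently.

```python
def parse_rec_command(text):
    """Split a '/rec a, b, c' message into a list of artist names.

    Hypothetical helper for illustration; not the bot's actual parser.
    """
    # Separate the command itself from the rest of the message.
    command, _, rest = text.strip().partition(" ")
    if command != "/rec":
        return []
    # Trim whitespace around each name and ignore empty entries such as "a,,b".
    return [name.strip() for name in rest.split(",") if name.strip()]


artists = parse_rec_command("/rec Metallica, Nirvana, Pink Floyd")
print(artists)
```

Messages that do not start with /rec, or that list no artists, give an empty list in this sketch.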
You can run song2vec from your own computer.
virtualenv song2vec_env -p `which python3.5`
cd ./song2vec_env/bin
source activate
./pip3.5 install datetime gensim numpy python-telegram-bot sympy yapi
cd ..
git clone https://github.com/ruanchaves/song2vec.git
cd ./song2vec/song2vec
bash install.sh
After that, you just have to edit settings.py with your YouTube and Telegram API keys. If you don't have them yet:
- Get API Key for Telegram - Simply follow the "Set up your bot" section until you get the API key (you won't have to manually set up a bot server).
Then you can turn on the bot with:
python3.5 s2v_bot.py
Currently the bot takes recommendations from a gensim word2vec model, and that's all there is to it.
The model was trained on The Echo Nest Taste Profile Subset, taken from the Million Song Dataset. The song IDs were matched to artist and title according to this file.
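
To give an idea of how a word2vec-style model can turn several artists into one recommendation, here is a minimal pure-Python sketch. The vectors and labels are made up, and `recommend` is a hypothetical helper; the bot itself relies on gensim, not on this code. The idea is to average the unit-length vectors of the requested items and rank everything else by cosine similarity to that average.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def recommend(vectors, liked, topn=3):
    """Rank items by cosine similarity to the mean of the liked items' vectors."""
    unit = {name: normalize(vec) for name, vec in vectors.items()}
    dims = len(next(iter(unit.values())))
    # Average the unit vectors of the requested items, then re-normalize.
    mean = [sum(unit[name][i] for name in liked) / len(liked) for i in range(dims)]
    query = normalize(mean)
    scores = {
        name: sum(a * b for a, b in zip(query, vec))
        for name, vec in unit.items()
        if name not in liked  # never recommend what was asked for
    }
    return sorted(scores, key=scores.get, reverse=True)[:topn]


# Toy 2-D vectors, invented for this example only.
toy_vectors = {
    "rock_a": [1.0, 0.1],
    "rock_b": [0.9, 0.2],
    "rock_c": [0.8, 0.0],
    "rap_a": [0.1, 1.0],
    "rap_b": [0.0, 0.9],
}

print(recommend(toy_vectors, ["rock_a", "rock_b"], topn=2))
```

With these toy vectors the closest remaining item to the two "rock" inputs is rock_c, followed at a distance by rap_a. A real model works the same way in spirit, only with many more dimensions and items.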
Some tricks I learned along the way:
- This is not NLP, so we shouldn't use gensim's default parameters. Otherwise recommendations will be twice as bad.
- Calling model.wv[word] for every word is painfully slow. It's much faster to build a dictionary once:

    model_words = list(model.wv.index2word)
    model_vectors = list(model.wv.syn0)
    model_dct = dict(zip(model_words, model_vectors))

  and then call model_dct[word]. This is what the source code does.
- train.py has to be rewritten as parallel code.
- The model has to be further tested and fine-tuned to the dataset.