Example of using LSTM RNN network to generate random TV scripts. In this project the Neural Network model will generate scripts for a popular TV series, Seinfeld. It gets trained from the scripts of season 9 and try to generate some randomd dialogues. Question is how much those random dialogues makes sense!
dlnd_tv_script_generation.ipynb
is the main notebook.
- Seinfeld season 9 script (https://www.kaggle.com/thec03u5/seinfeld-chronicles#scripts.csv)
- helper.py (originally supplied in Udacity project)
- problem_unittest.py (originally supplied in Udacity project)
Following is a part of script generated by model. The script is saved in generated_script_1.txt file. Here goes a snippet of that,
elaine:(leaving) hey.
jerry: what is that?
elaine: what happened to the last story?
george: i thought i could do this.
kramer: i think i wanted a little effeminate.
george: i have to say that.
george: i was convinced.
jerry: yeah, i thought we were gotten dating in the contest.
elaine: i was afraid i sent to say you on.
elaine: hey!
george: what?
george: i mean, i don't understand.(starts opening the sleeve employee, and the lopper skank, and i was convinced.
You see the generated script is not contextually perfect but the sentence constructions are not that bad.
The steps were instructed by the Udacity course instructor. My responsibility was to implement the corresponding functions.
Creates a numeric identifier of the vocabulary and returns two follwing maps,
- word -> identifier
- identifier -> word
Was to replace the puctuation with special keywords.
Convert the whole training script into training dataset which will be fed to the model in multiple batches. Based on the given sequence_length
I use DataLoader
from pytorch to create the batches
The Neural Network model comprises:
- Embedding layer
- LSTM RNN layer
- Linear classifcation layer In the notebook you will see how I tried with different hyperparamters. The best result produced a loss score 2.88 over the last batch.
Finally used the generate
function to generate another random script by using the trained model.