One more thing though: I've removed the ROUGE evaluation of the model, since the pyrouge library seems to have been deprecated. Please raise an issue if you must know how to still use that evaluation metric, and I'll guide you through it.
It consists of articles and summaries of those articles. Some of the articles also have multi-line summaries.
Summarize by identifying “top” sentences based on word frequency.
Preprocess the text by removing numbers, white spaces, stopwords and punctuation, and tokenize all words in the document.
Calculate the frequency of every token in the document.
Score each sentence by summing the word frequencies of its words, and select the top ‘n’ sentences with the highest scores.
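The sketch below shows one way these steps fit together, using NLTK; the function and variable names are illustrative and not the repo's actual code.

```python
# A minimal sketch of the frequency-based extractive summarizer described
# above. Requires: nltk.download('punkt'); nltk.download('stopwords').
from collections import Counter

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, n=3):
    # Preprocess: lowercase, drop punctuation/numbers/stopwords, tokenize
    stop = set(stopwords.words('english'))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]

    # Frequency of every remaining token in the document
    freq = Counter(words)

    # Score each sentence by the summed frequency of its words,
    # then keep the top-n sentences in their original order
    sentences = sent_tokenize(text)
    scores = {s: sum(freq.get(w.lower(), 0) for w in word_tokenize(s))
              for s in sentences}
    top = set(sorted(sentences, key=scores.get, reverse=True)[:n])
    return ' '.join(s for s in sentences if s in top)
```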
The model may attend to relevant words in the source text to generate novel words.
The pointer-generator network does a better job at copying words from the source text. It can also copy out-of-vocabulary words, allowing the model to handle unseen words even when the corpus has a smaller vocabulary.
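For intuition, here is a toy sketch (not the repo's code) of how the final distribution in the pointer-generator of See et al. (2017) mixes generating from the vocabulary with copying from the source, which is what lets it produce out-of-vocabulary words. All numbers and names below are made up for illustration.

```python
import numpy as np

# Toy illustration of the pointer-generator final distribution:
# P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on the
# source positions where w occurs.
vocab = ["<unk>", "the", "germany", "wins", "beat"]
p_vocab = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # decoder vocab distribution
source_tokens = ["germany", "beat", "argentina"]     # "argentina" is out-of-vocabulary
attention = np.array([0.2, 0.3, 0.5])                # attention over source positions
p_gen = 0.6                                          # generation probability

# Extend the vocabulary with in-article OOV words so they can be copied
extended_vocab = vocab + [w for w in source_tokens if w not in vocab]
final = np.zeros(len(extended_vocab))
final[:len(vocab)] = p_gen * p_vocab
for tok, attn in zip(source_tokens, attention):
    final[extended_vocab.index(tok)] += (1 - p_gen) * attn

print(dict(zip(extended_vocab, final.round(3))))
# "argentina" receives probability only through copying, which is how
# the network handles words outside its vocabulary.
```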
To carry out summarization, pre-trained weights generated by training the model are used.
The implementation builds on Google's TextSum TensorFlow research module; it has been converted to run on TensorFlow 1.11+, and the hyper-parameters were changed for better accuracy.
• LSTM hidden units: 256
-
The documentation in the original pointer-generator repo pretty much covers it all, but for this model you need to train without coverage for the first 600k iterations, then train for the next 25k iterations with coverage; that should get you the result.
-
Next thing: there were a lot of parameters you had to pass in the terminal while running, so I took the liberty of making all of them defaults. If you must change them, you can do so by editing the code!
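For reference, the defaults (including the coverage switch for the final 25k iterations mentioned above) live at the top of run_summarization.py as TensorFlow flags. A hypothetical excerpt is shown below; the flag names follow the original pointer-generator and may differ in this fork, so check your copy.

```python
# Hypothetical excerpt of the flag defaults in run_summarization.py.
# Flag names follow the original pointer-generator and may differ here.
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('mode', 'train', "one of 'train', 'eval', or 'decode'")
tf.app.flags.DEFINE_string('data_path', 'data/chunked/train_*', 'glob of tf.Example datafiles')
tf.app.flags.DEFINE_string('vocab_path', 'data/vocab', 'path to the vocabulary file')
tf.app.flags.DEFINE_integer('hidden_dim', 256, 'dimension of the LSTM hidden states')
tf.app.flags.DEFINE_boolean('coverage', False, 'switch to True for the final 25k iterations')
tf.app.flags.DEFINE_boolean('convert_to_coverage_model', False,
                            'convert a non-coverage checkpoint before coverage training')
```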
-
Also, IMPORTANT FOR DECODING: you have to uncomment the decode lines in the last paragraph of run_summarization.py. Raise an issue if you can't figure it out, and I'll help you solve it.
-
Again, I've previously mentioned how to make your own dataset out of your text file; you can use the bin_vocab_creation.py file to do so!
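If you're curious about the format, the pointer-generator data pipeline expects length-prefixed serialized tf.Example records plus a "word count" vocabulary file. The sketch below only shows that format; bin_vocab_creation.py itself may differ in its details.

```python
# Minimal sketch of the .bin + vocab format used by the pointer-generator
# data pipeline. This is NOT bin_vocab_creation.py itself, just the format:
# each record is a length-prefixed serialized tf.Example with 'article'
# and 'abstract' features (the abstract wrapped in <s> ... </s> tags).
import struct
from collections import Counter

from tensorflow.core.example import example_pb2

def write_bin_and_vocab(pairs, bin_path, vocab_path, max_vocab=50000):
    counts = Counter()
    with open(bin_path, 'wb') as writer:
        for article, abstract in pairs:
            ex = example_pb2.Example()
            ex.features.feature['article'].bytes_list.value.extend([article.encode()])
            ex.features.feature['abstract'].bytes_list.value.extend(
                [('<s> %s </s>' % abstract).encode()])
            ex_str = ex.SerializeToString()
            writer.write(struct.pack('q', len(ex_str)))
            writer.write(struct.pack('%ds' % len(ex_str), ex_str))
            counts.update(article.split() + abstract.split())

    # Vocabulary file: one "word count" pair per line, most frequent first
    with open(vocab_path, 'w') as f:
        for word, count in counts.most_common(max_vocab):
            f.write('%s %d\n' % (word, count))
```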
-
It is a descendant of the TextSum Google TensorFlow research module, here.