Keywords

Algorithms for extracting keywords from titles of Scientific Articles

By combining the Natural Language Toolkit (NLTK) package, the Levenshtein edit-distance algorithm and an ad-hoc algorithm, this script can:

Given a list of scientific article titles, extract potentially good keywords from them;
Select the best keywords by looking at their relative frequency, and use them to create a thematic network of scientific publications.
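The README does not document the script's internals, so what follows is a minimal sketch, not the author's code, of how the steps listed above could fit together. Assumptions: candidate keywords are runs of up to three consecutive non-stopword tokens; a regex tokenizer and a tiny hardcoded stopword list stand in for NLTK so the sketch runs without downloading NLTK corpora; Levenshtein distance folds spelling and plural variants into the more frequent form when the distance is at most 15% of the longer string; the "best" keywords are those seen in at least `min_count` titles and in no more than `max_share` of them. The names `candidate_keywords`, `merge_variants`, `select_keywords` and every threshold are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for NLTK's stopword corpus (nltk.corpus.stopwords),
# kept tiny so the sketch runs without downloading NLTK data.
STOPWORDS = {
    "a", "an", "and", "as", "at", "by", "for", "from", "in", "into",
    "of", "on", "or", "the", "to", "towards", "using", "via", "with",
}


def candidate_keywords(title, max_len=3):
    """Return every 1..max_len-gram made of consecutive non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    candidates, run = set(), []
    for tok in tokens + [None]:  # None flushes the final run
        if tok is None or tok in STOPWORDS:
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    candidates.add(" ".join(run[i:i + n]))
            run = []
        else:
            run.append(tok)
    return candidates


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def merge_variants(counts, max_ratio=0.15):
    """Fold each keyword into an earlier, more frequent one if their edit
    distance is at most max_ratio * (length of the longer string)."""
    merged, canonical = Counter(), []
    # Most frequent first; ties broken alphabetically for determinism.
    for kw, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = next((k for k in canonical
                       if levenshtein(kw, k) <= max_ratio * max(len(kw), len(k))),
                      None)
        if target is None:
            canonical.append(kw)
            target = kw
        merged[target] += c
    return merged


def select_keywords(titles, min_count=2, max_share=0.5):
    """Map each kept keyword to its relative frequency (share of titles)."""
    counts = Counter()
    for title in titles:
        counts.update(candidate_keywords(title))  # counted once per title
    merged = merge_variants(counts)
    n = len(titles)
    return {kw: c / n for kw, c in merged.items()
            if c >= min_count and c / n <= max_share}


titles = [
    "Deep neural networks for protein folding",
    "A neural network approach to protein structure",
    "Protein folding with deep learning",
    "Graph neural networks in chemistry",
]
print(select_keywords(titles, max_share=0.8))
```

On these four sample titles, the singular "neural network" is folded into "neural networks", which is then kept with a relative frequency of 0.75. The pairwise merge is quadratic in the number of distinct keywords, so this sketch would not reach the scale described below without some form of blocking (comparing only plausible pairs).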

The script was written to scale to tens of millions of article titles and millions of keywords. A few further optimizations to the algorithm will be added in the coming weeks.

This is a beta project. A visualization of a graph constructed with this algorithm can be found here. Thanks to Anvaka for the excellent visualization engine!
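The README does not say exactly how the thematic network of publications is wired. One plausible reading, assumed here, is that publications are nodes and two publications are linked when they share selected keywords, with the edge weight equal to the number of keywords they share. The sketch below builds that weighted edge list through an inverted index (keyword to publications); `build_network` and the sample publication IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(title_keywords):
    """Return {(id_a, id_b): shared_keyword_count} for publications that
    share at least one keyword; each pair is ordered so that id_a < id_b."""
    index = defaultdict(list)  # inverted index: keyword -> publication IDs
    for pub_id, keywords in title_keywords.items():
        for kw in keywords:
            index[kw].append(pub_id)
    edges = defaultdict(int)
    for pub_ids in index.values():
        # Every pair of publications sharing this keyword gains one unit of weight.
        for a, b in combinations(sorted(pub_ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


sample = {
    "p1": {"protein folding", "deep learning"},
    "p2": {"protein folding", "deep learning"},
    "p3": {"deep learning", "neural networks"},
    "p4": {"chemistry"},
}
print(build_network(sample))
```

Going through the inverted index avoids comparing every pair of titles, but a keyword shared by k titles still produces k(k-1)/2 edges, which is one reason to discard overly common keywords before building the graph.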
