Keywords

Algorithms for extracting keywords from the titles of scientific articles

By combining the Natural Language Toolkit (NLTK) package, the Levenshtein distance algorithm, and an ad-hoc algorithm, this script can:

  1. Given a list of scientific article titles, extract candidate keywords from them;
  2. Select the best keywords by looking at their relative frequency, and use them to create a thematic network of scientific publications.
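
The two steps above could be sketched roughly as follows. This is not the repository's code: it swaps NLTK for a regex tokenizer and a tiny hypothetical stopword list so that it runs with the standard library only, and the function names (`candidate_keywords`, `merge_variants`, `select_keywords`, `build_network`), the edit-distance threshold, and the frequency band are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (the real script relies on NLTK).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with", "by"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping only two rows in memory."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def candidate_keywords(title: str) -> list[str]:
    """Unique lowercase tokens of a title, minus stopwords and very short words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return list(dict.fromkeys(t for t in tokens if t not in STOPWORDS and len(t) > 2))


def merge_variants(counts: Counter, max_distance: int = 1) -> dict[str, str]:
    """Map each keyword to the first more frequent keyword within max_distance edits."""
    heads: list[str] = []
    mapping: dict[str, str] = {}
    for word, _ in counts.most_common():
        match = next((h for h in heads if levenshtein(word, h) <= max_distance), None)
        if match is None:
            heads.append(word)
            match = word
        mapping[word] = match
    return mapping


def select_keywords(titles, min_share=0.001, max_share=0.9):
    """Per-title keywords whose share of titles lies in [min_share, max_share]."""
    per_title = [candidate_keywords(t) for t in titles]
    mapping = merge_variants(Counter(k for kws in per_title for k in kws))
    merged = [sorted({mapping[k] for k in kws}) for kws in per_title]
    counts = Counter(k for kws in merged for k in kws)
    selected = {k for k, c in counts.items() if min_share <= c / len(titles) <= max_share}
    return [[k for k in kws if k in selected] for kws in merged]


def build_network(title_keywords) -> Counter:
    """Edges between title indices, weighted by the number of shared keywords."""
    by_keyword = defaultdict(list)
    for index, kws in enumerate(title_keywords):
        for k in kws:
            by_keyword[k].append(index)
    edges: Counter = Counter()
    for indices in by_keyword.values():
        for pair in combinations(indices, 2):
            edges[pair] += 1
    return edges


demo_titles = [
    "Graph neural networks for molecule design",
    "Neural network pruning on graphs",
    "Molecule design with reinforcement learning",
    "A survey of soil erosion",
]
demo_edges = build_network(select_keywords(demo_titles, min_share=0.5))
print(sorted(demo_edges.items()))  # [((0, 1), 3), ((0, 2), 2)]
```

The pairwise loops in `merge_variants` and `build_network` are quadratic in the worst case, so this sketch is only meant to show the idea on small inputs; it makes no attempt to reproduce the scaling behaviour described below.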

The script was written to scale to tens of millions of article titles and millions of keywords. A few optimizations to the algorithm will be added in the coming weeks.

This is a beta project. You can find a visualization of a graph constructed using this algorithm here. Thanks to Anvaka for the excellent visualization engine!