Time Magazine

Time Magazine Scraper, Text Extraction (OCR), and Data Exploration with Topic Modelling

01.ipynb: Code
Open in Colab to explore the topics (and their dominant terms) or run the code.

Part 1: Scraping the Time Vault, 1923-2015.
Scraped Data
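
As a rough illustration of the scraping step, the sketch below builds one URL per year and downloads pages with a pause between requests. It is not the notebook's code: `BASE_URL` is a hypothetical placeholder (this README does not give the Time Vault's URL layout), and `year_urls`, `fetch`, and the one-second delay are assumptions made for the example.

```python
import time
import urllib.request

# Hypothetical placeholder: the real Time Vault URL layout is not given here.
BASE_URL = "https://example.com/vault/year/{year}"


def year_urls(start=1923, end=2015):
    """Return one (year, url) pair per year in the inclusive range."""
    return [(year, BASE_URL.format(year=year)) for year in range(start, end + 1)]


def fetch(url, delay=1.0, timeout=30):
    """Download one page as bytes, then pause so requests stay spaced out."""
    request = urllib.request.Request(url, headers={"User-Agent": "time-readme-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    time.sleep(delay)
    return body
```

Calling `year_urls()` with the defaults gives 93 entries, one for each year from 1923 to 2015; each URL would then be passed to `fetch`.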

Part 2: Text Extraction with Tesseract OCR.
Currently, text is extracted only for 2000-2015, since the process is slow.
And yes, the extracted text has lots of noise.
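
A minimal sketch of this step, assuming the `pytesseract` wrapper and Pillow are installed alongside the Tesseract binary. The helper names `ocr_page` and `clean_text` are hypothetical, and the letters-only filter is just one simple way to reduce OCR noise, not necessarily what the notebook does.

```python
import re


def ocr_page(image_path, lang="eng"):
    """Run Tesseract on one page image and return the raw text."""
    # Imported here so clean_text below can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def clean_text(raw, min_len=3):
    """Keep runs of letters that are at least min_len long, lowercased."""
    tokens = re.findall(r"[A-Za-z]+", raw)
    return " ".join(token.lower() for token in tokens if len(token) >= min_len)
```

A filter this crude also discards short real words and splits tokens that contain a misread digit, so it trades recall for cleaner input to the topic model.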

Part 3: Data Exploration with Topic Modelling.
TODO: Cover all years, and add an interpretation of the topics.
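
To make "topics and their dominant terms" concrete, here is a hedged sketch that assumes LDA from scikit-learn (version 1.0 or later). This README does not name the model the notebook uses, so treat the model choice, the helper names `top_terms` and `fit_topics`, and the default of 10 topics as assumptions.

```python
def top_terms(topic_weights, vocabulary, n=10):
    """Return the n highest-weighted terms for each topic."""
    result = []
    for weights in topic_weights:
        # Indices of this topic's terms, sorted by descending weight.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n]])
    return result


def fit_topics(documents, n_topics=10, n_terms=10):
    """Fit LDA on raw text documents and return the top terms per topic."""
    # Imported here so top_terms can be used without scikit-learn installed.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    return top_terms(lda.components_, vectorizer.get_feature_names_out(), n_terms)
```

`fit_topics` expects a list of document strings, for example the cleaned OCR text of each issue, and returns one list of terms per topic.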