uh-dcm/news-article-collection

News article collection infrastructure

  1. Periodically check whether there are any new items in the news RSS feeds.
  2. If there are, store the URLs of the new articles.
  3. Download the articles, then store both the automatically extracted news story and the full HTML of each.
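
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the repository's actual collect.py: the function names, the in-memory `seen` set, and the sample feed are assumptions made for the example, and it uses only the standard-library XML parser on an RSS 2.0 document that has already been fetched.

```python
import xml.etree.ElementTree as ET


def extract_item_links(rss_xml: str) -> list[str]:
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def find_new_links(rss_xml: str, seen: set[str]) -> list[str]:
    """Return links not yet in `seen` and record them as seen (hypothetical helper)."""
    new_links = []
    for link in extract_item_links(rss_xml):
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links


# Hypothetical feed content, used only to demonstrate the functions.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>https://example.com/a</link></item>
<item><title>B</title><link>https://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    seen_urls: set[str] = set()
    print(find_new_links(SAMPLE_FEED, seen_urls))  # first check: both links are new
    print(find_new_links(SAMPLE_FEED, seen_urls))  # second check: nothing new
```

In a real deployment the set of seen URLs would need to persist between runs, for example in a file or database, since each cron invocation starts a fresh process.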

Inputting feeds

List the RSS feed URLs in "feeds.txt" inside a "data" folder in the repository, one URL per line.
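
A file in this format could be read as sketched below. The `load_feeds` function is a hypothetical helper, not part of the repository, and skipping blank lines is an assumption about how stray whitespace should be handled.

```python
from pathlib import Path


def load_feeds(path: str = "data/feeds.txt") -> list[str]:
    """Read one feed URL per line, ignoring blank lines (assumed behaviour)."""
    feeds = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            feeds.append(url)
    return feeds


if __name__ == "__main__":
    feeds_file = Path("data/feeds.txt")
    if feeds_file.exists():
        for feed_url in load_feeds(str(feeds_file)):
            print(feed_url)
```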

Cronjob example

*/5 * * * * cd ~/news-article-collection/ && python3 collect.py
*/30 * * * * cd ~/news-article-collection/ && python3 process.py
