
sysblok/corpus_golunov_articles


Corpus of Ivan Golunov's articles

Crawled from the website of the Meduza media outlet

Meduza correspondent Ivan Golunov was arrested on June 6, 2019, in central Moscow. He was charged with attempting to sell narcotic substances. Meduza’s editorial board, as well as representatives of the Russian and international journalism communities, believes that Ivan is being persecuted because of his investigative work.

Shortly after the arrest, Meduza opened access to Ivan's work under the Creative Commons CC BY 4.0 license.

Here is a collection of articles he has written for Meduza.

Update: Ivan is free!

Action required!

  • Sign the petition.
  • Star this repository :)

Usage

All the texts are located in the /corpus folder.
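
For readers who want to work with the texts programmatically, here is a minimal sketch of reading the folder into memory. The function name `load_corpus` is hypothetical, and the sketch assumes the files under the corpus folder are UTF-8 plain text; the README does not state the file format or naming scheme, so adjust as needed.

```python
from pathlib import Path


def load_corpus(corpus_dir="corpus"):
    """Read every regular file under corpus_dir into a {relative_path: text} dict.

    Hypothetical helper, not part of the repository.
    """
    root = Path(corpus_dir)
    texts = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Assumption: files are UTF-8 text; undecodable bytes are replaced
            # with U+FFFD instead of raising an error.
            key = path.relative_to(root).as_posix()
            texts[key] = path.read_text(encoding="utf-8", errors="replace")
    return texts
```

Calling `load_corpus("corpus")` from the repository root would return a dictionary keyed by each file's path relative to the corpus folder.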

If you wish to feel more like a programmer (and maybe add some more text preprocessing), here is what you should do:

  • install Python 3 if you don't already have it;
  • install Scrapy;
  • run cd crawler && scrapy crawl articles.
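
As one possible illustration of the extra text preprocessing mentioned above, the sketch below lowercases article texts, splits them into word tokens, and counts word frequencies. The names `tokenize` and `word_frequencies` are hypothetical and are not taken from the repository's crawler; the input is assumed to be a list of article texts already read as strings.

```python
import re
from collections import Counter

# \w matches Unicode word characters (letters, digits, underscore),
# so both Cyrillic and Latin words are captured.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def word_frequencies(texts):
    """Count token occurrences across an iterable of article texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts
```

A regex tokenizer is only a rough baseline; for Russian text, lemmatization with a dedicated morphological tool would likely give more useful counts.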
