
PaperCrawler crawls papers from the ACL Anthology.


Se-Hun/PaperCrawler


PaperCrawler


This version crawls the titles, authors, links, and years of papers published at four top NLP conferences (ACL, EMNLP, CoNLL, COLING).

If you need other attributes or conferences, please modify the code and open a pull request. Thank you!
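The four crawled attributes can be pictured as one record per paper. The sketch below is an illustration only, not PaperCrawler's actual code: the `Paper` dataclass, its extra `conference` field, and the helpers `filter_supported` and `write_papers_csv` are hypothetical names. It writes CSV with the standard-library `csv` module to stay dependency-free; the pandas/openpyxl dependencies listed below suggest the real tool writes a spreadsheet instead.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List, TextIO


# Hypothetical record layout: the four attributes the README lists,
# plus the conference name so records can be filtered by venue.
@dataclass
class Paper:
    title: str
    authors: List[str]
    link: str
    year: int
    conference: str


# The four venues this version of the crawler supports.
SUPPORTED_CONFERENCES = {"ACL", "EMNLP", "CoNLL", "COLING"}


def filter_supported(papers: Iterable[Paper]) -> List[Paper]:
    """Keep only papers from the supported conferences."""
    return [p for p in papers if p.conference in SUPPORTED_CONFERENCES]


def write_papers_csv(papers: Iterable[Paper], out: TextIO) -> None:
    """Write a header row, then one row per paper; authors joined by '; '."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Paper)])
    writer.writeheader()
    for p in papers:
        row = asdict(p)
        row["authors"] = "; ".join(p.authors)  # flatten the author list
        writer.writerow(row)


if __name__ == "__main__":
    # Placeholder data for demonstration, not real crawl output.
    demo = [
        Paper("Paper A", ["X", "Y"], "https://example.org/a", 2020, "ACL"),
        Paper("Paper B", ["Z"], "https://example.org/b", 2020, "NAACL"),
    ]
    buf = io.StringIO()
    write_papers_csv(filter_supported(demo), buf)
    print(buf.getvalue())
```

Keeping the conference on each record makes it easy to add a venue later: extending `SUPPORTED_CONFERENCES` is enough for the filter, while the crawling logic itself would still need changes.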

To install ChromeDriver

From the Chrome WebDriver (ChromeDriver) download link, install the ChromeDriver build that matches your operating system and your installed version of the Chrome browser.

Save the chromedriver file (chromedriver.exe on Windows) in the /chromedriver path.
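The driver file name differs by platform, so a small helper can compute where the driver is expected. This is a sketch under assumptions: `default_chromedriver_path` is a hypothetical helper, and it assumes the /chromedriver path above means a `chromedriver` folder relative to the repository root.

```python
import os
import platform
from typing import Optional


def default_chromedriver_path(base_dir: str = "chromedriver",
                              system: Optional[str] = None) -> str:
    """Return the expected ChromeDriver file path inside base_dir.

    base_dir is an assumption (a 'chromedriver' folder in the repo root).
    system defaults to platform.system(), e.g. 'Windows', 'Linux', 'Darwin'.
    """
    system = system or platform.system()
    # ChromeDriver ships as chromedriver.exe on Windows and as a plain
    # 'chromedriver' executable on macOS and Linux.
    name = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return os.path.join(base_dir, name)


if __name__ == "__main__":
    path = default_chromedriver_path()
    status = "found" if os.path.isfile(path) else "not found"
    print(f"{path}: {status}")
```

With Selenium 4, a path like this would typically be passed as `webdriver.Chrome(service=Service(path))`, where `Service` comes from `selenium.webdriver.chrome.service`.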

Install libraries for crawling

  • pip install selenium
  • pip install tqdm
  • pip install pandas
  • pip install openpyxl
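Before running the crawler, it can help to confirm that all four libraries are importable. A minimal sketch, assuming the import names match the pip names (true for these four packages); `missing_packages` is a hypothetical helper built on the standard-library `importlib.util.find_spec`.

```python
import importlib.util
from typing import Iterable, List

# Import names of the libraries listed above.
REQUIRED = ["selenium", "tqdm", "pandas", "openpyxl"]


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All crawler dependencies are installed.")
```

All four can also be installed with a single command: `pip install selenium tqdm pandas openpyxl`.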

Run

python run_crawler.py --year CONFERENCE_YEAR \
                      --output_dir OUTPUT_DIR_PATH \
                      --chrome_driver_path CHROME_DRIVER_PATH
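The three flags in the command above could be parsed with the standard-library `argparse` module. This is a hypothetical reconstruction for illustration, not the contents of run_crawler.py: the integer type for `--year`, `required=True` on every flag, and the help strings are all assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the documented flags; the real script
    # may use different defaults, types, or help text.
    parser = argparse.ArgumentParser(description="Crawl ACL Anthology papers.")
    parser.add_argument("--year", type=int, required=True,
                        help="Conference year to crawl, e.g. 2020")
    parser.add_argument("--output_dir", required=True,
                        help="Directory where the crawled results are written")
    parser.add_argument("--chrome_driver_path", required=True,
                        help="Path to the chromedriver executable")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.year, args.output_dir, args.chrome_driver_path)
```

Because the flags already use underscores, `argparse` exposes them directly as `args.year`, `args.output_dir`, and `args.chrome_driver_path`; a missing required flag makes the parser print a usage message and exit.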
