Two scripts:

- scrape.py obtains the URLs of all PDFs on the EEG website
- get_pdf.py downloads all the PDFs to a local folder

Install the dependencies into a conda environment and activate it::

    conda create -n webscraping python=3.12 requests requests_cache beautifulsoup4
    conda activate webscraping

Now run the first script to collect the PDF URLs::

    python scrape.py

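In outline, scrape.py's job is to fetch the listing page(s) and pull out every link that ends in .pdf. Below is a minimal sketch of that step using requests_cache and BeautifulSoup; the base URL, cache name and the find_pdf_urls helper are placeholders for illustration, not code taken from scrape.py::

    import requests_cache
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    # Cache responses on disk so repeated runs do not re-fetch every page.
    session = requests_cache.CachedSession("eeg_cache")

    BASE_URL = "https://example.org/publications"  # placeholder, not the real EEG URL


    def find_pdf_urls(page_url: str) -> list[str]:
        """Return absolute URLs of every PDF linked from one page."""
        response = session.get(page_url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        return [
            urljoin(page_url, a["href"])
            for a in soup.find_all("a", href=True)
            if a["href"].lower().endswith(".pdf")
        ]


    if __name__ == "__main__":
        for url in find_pdf_urls(BASE_URL):
            print(url)
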
Then download the PDFs::

    python get_pdf.py

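get_pdf.py then only needs to stream each collected URL to disk. A minimal sketch, assuming the URLs from scrape.py are passed in one at a time (the download_pdf helper is hypothetical, not the script's actual code)::

    from pathlib import Path

    import requests

    OUTPUT_DIR = Path("webscraping")  # matches the output folder described below


    def download_pdf(url: str) -> Path:
        """Stream one PDF into the webscraping/ folder, keeping its original filename."""
        OUTPUT_DIR.mkdir(exist_ok=True)
        target = OUTPUT_DIR / url.rsplit("/", 1)[-1]
        with requests.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()
            with open(target, "wb") as fh:
                for chunk in response.iter_content(chunk_size=8192):
                    fh.write(chunk)
        return target
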
When both scripts have run you should see a folder webscraping containing all the PDF files, a log file app.log with debug messages, and a metadata.csv file listing the details scraped from the site for each file: title, publication date, summary and authors.

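The field names below mirror that description, but the logging setup and the CSV-writing helper are only an illustration of how app.log and metadata.csv might be produced, not the code from the scripts themselves::

    import csv
    import logging

    # Send debug-level messages to app.log.
    logging.basicConfig(
        filename="app.log",
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    FIELDS = ["title", "publication_date", "summary", "authors"]


    def write_metadata(rows: list[dict], path: str = "metadata.csv") -> None:
        """Write one metadata row per scraped PDF."""
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        logging.debug("Wrote %d metadata rows to %s", len(rows), path)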