Repository to detect scientific software in documents for Chan Zuckerberg Initiative workshop

DS4SD/deepsearch4czi

DeepSearch for CZI Hackathon

This repository was created during a workshop at the Chan Zuckerberg Initiative to showcase how Deep Search can be used to extract software and GitHub mentions from scientific literature.

Getting started

First, authenticate with the open-access Deep Search (DS) PDF converter:

deepsearch profile config --profile-name "ds-experience" --host "https://deepsearch-experience.res.ibm.com/" --verify-ssl --username "<your-email-address>"

Running

To convert the PDF documents in the folder ./data:

poetry run python ./ds4czi/convert_pdfs.py -i ./data

To search arXiv for articles mentioning "Yolov5", download them, and store them in the folder ./data/Yolov5:

poetry run python ./ds4czi/search_articles.py -i arxiv -q Yolov5 -o ./data/Yolov5

To extract software mentions from a set of converted documents,

poetry run python ./ds4czi/extract_software.py -i ./data/Yolov5/arxiv/json/
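
As a rough illustration of what extracting GitHub mentions from converted documents can involve, here is a minimal sketch. It is not the logic of extract_software.py: the regular expression and the function names iter_strings and find_github_mentions are hypothetical. It assumes only that each converted document is a JSON file containing text strings somewhere in its structure, and nothing about the actual Deep Search schema.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Illustrative pattern for github.com/<owner>/<repo>; an assumption,
# not the pattern used by this repository's scripts.
GITHUB_RE = re.compile(
    r"github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)", re.IGNORECASE
)


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_github_mentions(json_dir):
    """Count owner/repo mentions across all *.json files in json_dir."""
    counts = Counter()
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            document = json.load(handle)
        for text in iter_strings(document):
            for owner, repo in GITHUB_RE.findall(text):
                # A URL at the end of a sentence often carries a trailing period.
                repo = repo.rstrip(".")
                if repo:
                    # Lowercase so differently cased mentions are counted together.
                    counts[f"{owner}/{repo}".lower()] += 1
    return counts


if __name__ == "__main__":
    # Folder taken from the command above; adjust to your own output path.
    mentions = find_github_mentions("./data/Yolov5/arxiv/json/")
    for name, count in mentions.most_common(10):
        print(f"{count:4d}  {name}")
```

Walking every string in the JSON keeps the sketch independent of any particular document layout, at the cost of also matching URLs in metadata fields. A pattern like this only catches explicit GitHub links. Software that is mentioned by name without a link needs a different approach.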

About this project

This repository was developed as part of the Mapping the Impact of Research Software in Science hackathon hosted by the Chan Zuckerberg Initiative (CZI). By participating in this hackathon, owners of this repository acknowledge the following:

  1. The code for this project is hosted by the project contributors in a repository created from a template generated by CZI. The purpose of this template is to help ensure that repositories adhere to the hackathon’s project naming conventions and licensing recommendations. CZI does not claim any ownership or intellectual property on the outputs of the hackathon. This repository allows the contributing teams to maintain ownership of code after the project, and indicates that the code produced is not a CZI product, and CZI does not assume responsibility for assuring the legality, usability, safety, or security of the code produced.
  2. This project is published under an MIT license.

Code of Conduct

Contributions to this project are subject to CZI’s Contributor Covenant code of conduct. By participating, contributors are expected to uphold this code of conduct.

Reporting Security Issues

If you believe you have found a security issue, please responsibly disclose by contacting the repository owner via the ‘security’ tab above.
