Covid19Search

This repository contains source code for searching COVID-19-related papers in the COVID-19 Open Research Dataset (CORD-19). It also provides a solution to the tasks in the COVID-19 Open Research Dataset Challenge (CORD-19) on Kaggle. Last updated: 2020-04-14.

Features

Supports multiple bag-of-words models (count, tf-idf, BM25).
Supports semantic search models such as fastText and GloVe.
Allows the two types of models above to be combined.
Provides a live web application in which end users can customise the models.
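To make the BM25 option concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring over tokenised documents. It is not taken from this repository's code: the function name bm25_scores, the parameter defaults k1=1.5 and b=0.75, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are common defaults, not
    values taken from this repository.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # BM25 idf with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalisation (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


# Toy collection, for illustration only.
docs = [
    "covid-19 transmission in households".split(),
    "glove embeddings for biomedical text".split(),
    "transmission characteristics of covid-19 and sars".split(),
]
print(bm25_scores("covid-19 transmission".split(), docs))
```

Documents 0 and 2 contain both query terms once, but document 0 is shorter than average, so the length normalisation ranks it above document 2; document 1 shares no terms and scores 0.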

Quick Start

git clone https://github.com/wangcongcong123/covidsearch.git
cd covidsearch
pip install -e .

import time

from cord import *

# Make sure the paper collections (four .tar.gz files) and the metadata CSV file are placed under dataset_folder
dataset_folder = "dataset/"
# Load the metadata and the full texts of the papers
metadata = load_metadata_papers(dataset_folder, "metadata.csv")
full_papers = load_full_papers(dataset_folder)
# Each input instance is (paper id, title, abstract, body text)
full_input_instances = [(id_, metadata[id_]["title"], metadata[id_]["abstract"], body)
                        for id_, body in full_papers.items() if id_ in metadata]
tfidf_model = FullTextModel(full_input_instances, weights=[3, 2, 1], vectorizer_type="tfidf")
query = "covid-19 transmission characteristics"
top_k = 10
start = time.time()
results = tfidf_model.query(query, top_k=top_k)
print("Query time: ", time.time() - start)
# About 0.3 s on re-runs; the first run takes longer because objects are serialised
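Conceptually, a tf-idf top-k query like the one above vectorises the collection, vectorises the query with the same idf table, and ranks documents by cosine similarity. The pure-Python sketch below illustrates that idea only; it is not the FullTextModel implementation, it ignores the per-field weights argument, and the helper names (tfidf_vectors, cosine, top_k) are hypothetical. The smoothed idf formula is an assumption borrowed from scikit-learn's default.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} tf-idf vector per tokenised doc, plus the idf table."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed idf, same form as scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [{t: f * idf[t] for t, f in Counter(d).items()} for d in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query, docs, k=10):
    """Rank documents by tf-idf cosine similarity and return the best k (index, score) pairs."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms never seen in the collection are dropped.
    q = {t: f * idf[t] for t, f in Counter(query).items() if t in idf}
    scores = [cosine(q, v) for v in vecs]
    ranked = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Toy collection, for illustration only.
docs = [
    "covid-19 transmission in households".split(),
    "glove embeddings for biomedical text".split(),
    "transmission characteristics of covid-19 and sars".split(),
]
print(top_k("covid-19 transmission characteristics".split(), docs, k=2))
```

Document 2 matches all three query terms and ranks first, followed by document 0, which matches two.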

Examples

Bag-of-words search: count, tf-idf and BM25 (examples/full_text_run.py).
Embedding-based search: fastText and GloVe (examples/embedding_run.py).
Model combinations: combining the two types of models above (examples/ensemble_run.py).
Pre-training insights: pre-train insights for the Kaggle tasks (examples/insight_from_scratch.py).
Insight extraction: load the insights pre-trained for the Kaggle tasks (examples/insight_extract.py).
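How examples/ensemble_run.py combines models is not described here, so the sketch below shows one common approach under stated assumptions: documents are embedded by averaging their word vectors (as is often done with fastText or GloVe), scored by cosine similarity to the query, and those scores are linearly interpolated with bag-of-words scores after min-max normalisation. The 2-d vectors and the bag-of-words scores are made-up toy inputs, and the alpha weighting is a hypothetical parameter.

```python
import math


def average_vector(tokens, embeddings, dim):
    """Mean of the word vectors of known tokens (zero vector if none are known)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so both models are on a comparable scale."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def combine(bow_scores, emb_scores, alpha=0.5):
    """Linear interpolation of normalised scores; alpha weights the bag-of-words model."""
    bow, emb = min_max(bow_scores), min_max(emb_scores)
    return [alpha * b + (1 - alpha) * e for b, e in zip(bow, emb)]


# Toy 2-d "embeddings" standing in for fastText/GloVe vectors (illustration only).
embeddings = {
    "covid-19": [1.0, 0.0], "coronavirus": [0.9, 0.1],
    "transmission": [0.0, 1.0], "spread": [0.1, 0.9], "glove": [-1.0, 0.0],
}
query = ["covid-19", "transmission"]
docs = [["coronavirus", "spread"], ["glove"], ["covid-19", "vaccine"]]

q_vec = average_vector(query, embeddings, dim=2)
emb_scores = [cosine(q_vec, average_vector(d, embeddings, dim=2)) for d in docs]
bow_scores = [0.0, 0.0, 1.2]  # hypothetical bag-of-words scores: only doc 2 matches a query term literally
print(combine(bow_scores, emb_scores))
```

Document 0 shares no words with the query, so a pure bag-of-words model cannot rank it above document 1; the embedding scores recover it through its synonyms, while document 2 stays on top because it scores well in both models.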

Try running python examples/insight_extract.py, which loads a pre-trained insights file and displays it. If you do not want to use the pre-trained insights, you can pre-train them from scratch with python examples/insight_from_scratch.py (see that file to customise the pre-training process).

Start as a web server

The web server only demonstrates the pre-trained insights. To customise it (for example, to add query search), modify app.py and templates/layout.html. First download metadata.csv from the CORD-19 dataset and place it under ./dataset, then run:

python app.py

Then open http://127.0.0.1:5000 in a browser to use the web application.

Server as a service

The server also accepts cross-origin requests.
Send a GET or POST request to obtain insights by task name.
Example GET request: http://127.0.0.1:5000/kaggle_task?task_name=task1
Example POST request: curl -i -X POST -H "Content-Type: application/json" -d "{\"task_name\":\"task1\"}" http://127.0.0.1:5000/kaggle_task
Adapt these to Ajax GET/POST requests if you want to embed the results in your own front-end web pages.
Try the live version: https://www.thinkingso.cf/kaggle_task?task_name=task1
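The same two requests can be issued from Python with only the standard library. This is a sketch assuming the local server started with python app.py is reachable at the URL above; the helper names build_get, build_post and fetch are hypothetical, and because the response format is not documented here, fetch simply returns the raw response body as text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:5000/kaggle_task"  # local server from `python app.py`


def build_get(task_name, base_url=BASE_URL):
    """GET request equivalent to <base_url>?task_name=<task_name>."""
    return Request(f"{base_url}?{urlencode({'task_name': task_name})}", method="GET")


def build_post(task_name, base_url=BASE_URL):
    """POST request with the JSON body {"task_name": ...}, mirroring the curl example."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return Request(base_url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def fetch(request, timeout=30):
    """Send the request and return the response body as text (the server must be running)."""
    with urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, fetch(build_get("task1")) or fetch(build_post("task1")) should return the insights for task1; pass base_url="https://www.thinkingso.cf/kaggle_task" to target the live instance instead.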

Contributions

Feedback and pull requests that improve the project are welcome.
