primed-pgs-queries

This repository contains PRIMED queries of the PGS Catalog using the PGS Catalog API.

Available workflows

Map PubmedIDs to the PGS catalog

The map_pubmed_ids_to_pgs_catalog directory contains code to map a list of PubmedIDs to the PGS Catalog.

The query_pgs_by_pmids.py Python script performs the mapping. It takes either a CSV file containing a list of PubmedIDs (one per line) or a URL pointing to such a file. It writes three JSON files containing the matching PGS Catalog records to the output directory given by --outdir:

pubs_records.json: A list of publications that have been mapped to the input PubmedIDs.
score_records.json: A list of PGS scores that have been mapped to the input PubmedIDs.
metrics_records.json: A list of PGS metrics that have been mapped to the input PubmedIDs.

The script can be run using the following command:

python3 query_pgs_by_pmids.py --pmid-file test_input.csv --outdir test_output
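
As a quick sanity check before rendering the report, the three output files can be inspected from Python. The sketch below is not part of the repository: the function name count_records is hypothetical, and it assumes only that each file holds a JSON list, as described above.

```python
import json
from pathlib import Path

# File names as listed in the section above.
OUTPUT_FILES = ["pubs_records.json", "score_records.json", "metrics_records.json"]


def count_records(outdir):
    """Return the number of records found in each output file (hypothetical helper)."""
    counts = {}
    for name in OUTPUT_FILES:
        with open(Path(outdir) / name) as handle:
            records = json.load(handle)  # assumed to be a JSON list of records
        counts[name] = len(records)
    return counts
```

For example, count_records("test_output") would return a dictionary keyed by file name, which makes an empty result for any of the three record types easy to spot.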

Once you have the mapping output, you can generate a report about the matches in R.

input <- list(
    "score_records_file" = "test_output/score_records.json",
    "metrics_records_file" = "test_output/metrics_records.json",
    "publication_records_file" = "test_output/pubs_records.json"
)
rmarkdown::render("map_pubmed_ids_to_pgs_catalog/query_pgs_by_pmids.Rmd", params=input)

A WDL workflow is also provided on Dockstore and as a .WDL file.

Calculate overlap between PGS Catalog scores and a set of variants

The pgs_variant_overlap directory contains code to calculate the overlap between PGS Catalog scores and a set of variants. The code relies on PGS Catalog utilities provided by pygscatalog.

To calculate overlap with all scores, use the create_score_files.py script. It queries the PGS Catalog for scores and groups them into batches with the specified number of variants (--variants-per-batch). Scores can optionally be included or excluded with the --include or --exclude arguments.

python3 create_score_files.py --output-dir test_output --variants-per-batch 1000
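
To illustrate what grouping scores into batches by variant count can look like, here is a minimal sketch of one possible greedy approach. It is an assumption for illustration only and may differ from what create_score_files.py actually does; the function name batch_scores and the score IDs and counts in the usage note are made up.

```python
def batch_scores(variant_counts, variants_per_batch):
    """Greedily group score IDs so each batch stays within the variant limit.

    variant_counts maps a score ID to its number of variants. A score that is
    larger than the limit on its own ends up in a batch by itself.
    """
    batches = []
    current, current_total = [], 0
    for score_id, n_variants in variant_counts.items():
        # Close the current batch if adding this score would exceed the limit.
        if current and current_total + n_variants > variants_per_batch:
            batches.append(current)
            current, current_total = [], 0
        current.append(score_id)
        current_total += n_variants
    if current:
        batches.append(current)
    return batches
```

With the made-up input {"SCORE_A": 600, "SCORE_B": 300, "SCORE_C": 200, "SCORE_D": 1500} and a limit of 1000, this yields [["SCORE_A", "SCORE_B"], ["SCORE_C"], ["SCORE_D"]].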

To calculate overlap for a set of scores:

Download the scoring files from the PGS Catalog and combine them.

pgscatalog-download --pgs PGS000004 PGS000005 --build GRCh38 --outdir test_output
pgscatalog-combine -s test_output/PGS*.txt.gz -t GRCh38 -o test_output/combined.txt.gz

Match the set of input variants to the combined scoring file. The target variants file must be in .bim format.

pgscatalog-match --dataset primed --target <input_variants> --scorefiles test_output/combined.txt.gz --outdir test_output --only_match
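
If the input variants are not already in .bim format, a file can be written by hand. The sketch below assumes the standard six-column, tab-separated PLINK .bim layout (chromosome, variant ID, genetic position in centimorgans, base-pair position, allele 1, allele 2). The function name write_bim and the chrom:pos:a1:a2 variant ID scheme are hypothetical choices, not something the repository prescribes.

```python
import csv


def write_bim(variants, path):
    """Write (chrom, pos, allele1, allele2) tuples as a PLINK-style .bim file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, pos, allele1, allele2 in variants:
            # Hypothetical ID scheme; any unique identifier would do.
            variant_id = f"{chrom}:{pos}:{allele1}:{allele2}"
            # The third column (cM position) is set to 0, meaning unknown.
            writer.writerow([chrom, variant_id, 0, pos, allele1, allele2])
```

The resulting file could then be passed to the --target argument in the command above.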

Calculate the overlap between the set of input variants and the variants in the scoring files using the calculate_overlap.Rmd R Markdown document.

rmarkdown::render(
    "calculate_overlap.Rmd",
    params=list(
        matches_file="test_output/0.ipc.zst",
        combined_scoring_file="test_output/combined.txt.gz",
        output_file="test_output/overlap_fraction.txt"
    )
)

Render a report of the overlaps from the resulting overlap file using the overlap_report.Rmd R Markdown document.

rmarkdown::render(
    "overlap_report.Rmd",
    params=list(
        overlap_file="test_output/overlap_fraction.txt"
    )
)
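
For intuition about the overlap calculation in the steps above, the quantity of interest can be read as the fraction of each score's variants that were matched in the input set. The following sketch shows that arithmetic on plain Python collections. It is an illustrative assumption, not the logic of calculate_overlap.Rmd, and the function name, score IDs, and variant IDs are invented.

```python
def overlap_fraction(score_variants, matched_variants):
    """Return, per score, the fraction of its variants found in matched_variants.

    score_variants maps a score ID to an iterable of variant IDs;
    matched_variants is the set of variant IDs present in the input variants.
    """
    matched = set(matched_variants)
    fractions = {}
    for score_id, variants in score_variants.items():
        unique = set(variants)
        # A score with no variants is reported as 0.0 to avoid dividing by zero.
        fractions[score_id] = len(unique & matched) / len(unique) if unique else 0.0
    return fractions
```

For example, a score with variants v1 to v4, of which v1 and v2 are matched, has an overlap fraction of 0.5.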

A WDL workflow is also provided on Dockstore and as a .WDL file.

Developer info

Generating the API client

Install swagger-codegen.
Generate the client.
Copy the client to the top-level directory.
Make sure to include the client requirements in the project requirements file.

The following code can be used:

# Generate.
swagger-codegen generate -i https://www.pgscatalog.org/static/rest_api/openapi/openapi-schema.yml -l python -o tmp --config swagger_codegen_config.json

# Copy
cp -r tmp/pgs_catalog_client .

# Update requirements.
cp tmp/requirements.txt requirements/client-requirements.in

Building and pushing the docker image

Push all changes to the repository. Note that the Docker image will build off the "main" branch on GitHub.
Build the image with caching disabled (--no-cache); otherwise local scripts will not be updated.
```
docker build --no-cache -t uwgac/primed-pgs-queries:X.Y.Z .
```

Push the image to Docker Hub.

docker push uwgac/primed-pgs-queries:X.Y.Z
