USAN STEM Data Cleaning #1018

Open
wants to merge 5 commits into
base: master
88 changes: 88 additions & 0 deletions scripts/biomedical/STEM/scripts/READEME.md
@@ -0,0 +1,88 @@
# Importing USAN STEM

## Table of Contents

1. [About the Dataset](#about-the-dataset)
    1. [Download URL](#download-url)
    2. [Database Overview](#database-overview)
    3. [Schema Overview](#schema-overview)
    4. [Notes and Caveats](#notes-and-caveats)
    5. [License](#license)
    6. [Dataset Documentation and Relevant Links](#dataset-documentation-and-relevant-links)
2. [About the Import](#about-the-import)
    1. [Artifacts](#artifacts)
    2. [Import Procedure](#import-procedure)
    3. [Test](#test)


## About the Dataset

### Download URL

1. [stem-list-cumulative.xlsx](https://www.ama-assn.org/system/files/stem-list-cumulative.xlsx), saved locally as `usan.xlsx` by `download.sh`.

### Database Overview

Need to add notes

### Schema Overview

Need to add notes

### Notes and Caveats

Need to add notes

### License

Need to add notes

### Dataset Documentation and Relevant Links

Need to add notes

## About the Import

### Artifacts

#### Scripts

##### Bash Scripts

- [download.sh](scripts/download.sh) downloads the most recent release of the USAN STEM data.
- [run.sh](scripts/run.sh) creates `usan.csv`, the formatted CSV containing the new nodes.
- [tests.sh](scripts/tests.sh) runs the standard tests that check the formatting of the USAN STEM data.

##### Python Scripts

- [format_usan.py](scripts/format_usan.py) creates the formatted USAN STEM CSV files.
- [format_usan_test.py](scripts/format_usan_test.py) is a `unittest` script that runs the standard test cases against the USAN STEM data.

#### tMCFs

- [usan_tmcf.tmcf](tMCFs/usan_tmcf.tmcf) contains the tMCF mapping for the USAN STEM CSV.



### Import Procedure

Download the most recent version of the USAN STEM data:

```bash
sh download.sh
```
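
Because `curl -o` writes whatever the server returns, a failed request can leave an HTML error page saved as `usan.xlsx`. The following is a minimal sanity check, sketched in Python under the assumption that the file lands in `scripts/input/usan.xlsx`. The helper name `looks_like_xlsx` is hypothetical and not part of this import.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` is a zip archive that resembles an .xlsx workbook."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # An .xlsx file is a zip package with a content-types part;
    # workbooks written by Excel keep their parts under "xl/".
    has_content_types = "[Content_Types].xml" in names
    has_workbook_parts = any(name.startswith("xl/") for name in names)
    return has_content_types and has_workbook_parts


if __name__ == "__main__":
    # Assumed location, matching the directory created by download.sh.
    target = Path("scripts/input/usan.xlsx")
    status = "ok" if looks_like_xlsx(target) else "missing or not a valid .xlsx"
    print(f"{target}: {status}")
```

This only checks the container format; it says nothing about whether the sheet contents are what the formatting step expects.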

Generate the formatted CSV:

```bash
sh run.sh
```
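
The contents of `format_usan.py` are not shown here, so the following is only an illustrative sketch of the kind of cleaning such a step might perform: trimming whitespace, dropping blank rows, and writing a CSV. The column names (`stem`, `definition`), the helpers `clean_rows` and `write_csv`, and the sample rows are assumptions chosen for illustration, not the actual schema of the downloaded spreadsheet.

```python
import csv
import io


def clean_rows(rows):
    """Normalize whitespace in each row and drop rows without a stem (hypothetical rule)."""
    cleaned = []
    for row in rows:
        stem = (row.get("stem") or "").strip()
        if not stem:
            continue  # skip blank or spacer rows
        # Collapse runs of whitespace inside the definition to single spaces.
        definition = " ".join((row.get("definition") or "").split())
        cleaned.append({"stem": stem, "definition": definition})
    return cleaned


def write_csv(rows, handle):
    """Write cleaned rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=["stem", "definition"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Illustrative input only; real rows would come from the downloaded spreadsheet.
    sample = [
        {"stem": " -mab ", "definition": "monoclonal   antibodies"},
        {"stem": "", "definition": "spacer row"},
        {"stem": "-vir", "definition": "antivirals"},
    ]
    buffer = io.StringIO()
    write_csv(clean_rows(sample), buffer)
    print(buffer.getvalue())
```

Keeping the cleaning logic in a pure function, separate from file I/O, makes it straightforward to exercise from a unit test.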


### Test

To run tests:

```bash
sh tests.sh
```
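
The test cases in `format_usan_test.py` are not shown here. As a rough sketch of what a formatting check written with `unittest` could look like, the example below validates the header and flags rows with an empty first column. The expected header, the `check_csv_format` helper, and the test class name are all hypothetical.

```python
import csv
import io
import unittest

# Hypothetical expected header; the real column names come from the tMCF mapping.
EXPECTED_HEADER = ["stem", "definition"]


def check_csv_format(text):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append("unexpected header")
        return problems
    # Data rows start at row 2 because row 1 is the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {number}: wrong column count")
        elif not row[0].strip():
            problems.append(f"row {number}: empty stem")
    return problems


class UsanCsvFormatTest(unittest.TestCase):
    def test_well_formed_csv_has_no_problems(self):
        text = "stem,definition\n-mab,monoclonal antibodies\n"
        self.assertEqual(check_csv_format(text), [])

    def test_empty_stem_is_reported(self):
        text = "stem,definition\n,antivirals\n"
        self.assertEqual(check_csv_format(text), ["row 2: empty stem"])
```

A file like this can be run with `python -m unittest <module>`, where the module name depends on where the file is saved.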
6 changes: 6 additions & 0 deletions scripts/biomedical/STEM/scripts/download.sh
@@ -0,0 +1,6 @@
#!/bin/bash

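# Create the input directory and move into it; stop if either step fails.
mkdir -p scripts/input && cd scripts/input || exit 1

# Download the cumulative USAN stem list as an Excel workbook.
curl -o usan.xlsx https://www.ama-assn.org/system/files/stem-list-cumulative.xlsx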