Merge pull request #63 from ccb-hms/development
Update README.md
rsgoncalves authored Jul 30, 2024
2 parents 058c6bc + e28de16 commit 066709d
Showing 1 changed file with 28 additions and 24 deletions.
52 changes: 28 additions & 24 deletions README.md
@@ -129,24 +129,25 @@ python text2term -s test/unstruct_terms.txt -t MONDO


## Programmatic Usage
After installing and importing to a Python environment, the main function is `map_terms`:
After installing and importing to a Python environment, the main function is `map_terms()`:

```python
text2term.map_terms(source_terms,
target_ontology='http://some.ontology/v1.owl',
base_iris=(),
csv_columns=(),
excl_deprecated=False,
max_mappings=3,
mapper=Mapper.TFIDF,
min_score=0.3,
output_file='',
save_graphs=False,
save_mappings=False,
separator=',',
use_cache=False,
term_type=OntologyTermType.CLASS,
incl_unmapped=False)
text2term.map_terms(source_terms, # strings to map or pointer to file
target_ontology='mondo', # ontology to map to (URL or name)
max_mappings=3, # maximum mappings per string
mapper=Mapper.TFIDF, # mapping method to be used
min_score=0.3, # minimum mapping score
base_iris=(), # map to terms with given base IRIs
excl_deprecated=False, # exclude ontology deprecated terms
term_type='class', # ontology term type(s) to map to
save_graphs=False, # save vis.js file with term graphs
save_mappings=False, # save mappings to file or not
output_file='', # filepath of output mappings file
csv_columns=(), # table columns with strings and IDs
separator=',', # column separator of input table
use_cache=False, # use a locally cached ontology
incl_unmapped=False, # include unmapped strings in output
bioportal_apikey='') # API key to use the BioPortal mapper
```
The function returns a pandas `DataFrame` containing the generated ontology mappings.
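As a rough illustration of the updated signature, a call might look like the sketch below. The helper name `map_example`, the sample terms, and the `t2t` parameter (a hook for passing the module in explicitly) are assumptions made for illustration; only `map_terms` and its keyword arguments come from the README text in this diff.

```python
def map_example(terms, ontology, t2t=None):
    """Map a list of strings to an ontology and return the mappings DataFrame.

    `t2t` is a hypothetical hook for supplying the module object;
    by default the installed text2term package is imported.
    """
    if t2t is None:
        import text2term as t2t  # assumes text2term is installed
    return t2t.map_terms(
        terms,
        target_ontology=ontology,  # ontology URL or name
        max_mappings=3,            # keep at most three mappings per string
        min_score=0.3,             # drop mappings scoring below 0.3
        incl_unmapped=True,        # keep strings that received no mapping
    )

# Example call (needs text2term installed and network access):
# df = map_example(["asthma", "acute bronchitis"],
#                  "http://purl.obolibrary.org/obo/mondo.owl")
```

All other arguments are left at the defaults listed in the signature above.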

@@ -193,18 +194,20 @@ When using the BioPortal or Zooma interfaces, the value for `target_ontology` sh

`incl_unmapped`—Include unmapped terms in the output. If a term has been tagged 'Ignore' or has less than the `min_score`, it is included in the output data frame

`bioportal_apikey`—BioPortal API Key to use along with the BioPortal mapper option


### Ontology Caching
text2term supports caching ontologies for faster or repeated mapping to the same ontology. An ontology can be cached using the function:

```python
cache_ontology(ontology_url, ontology_acronym="", base_iris=())
text2term.cache_ontology(ontology_url, ontology_acronym="", base_iris=())
```
This caches a single ontology from a URL or file path, and takes an optional acronym that will be used to reference the cached ontology later. If no acronym is given, the URL is used as the name.

It is also possible to cache multiple ontologies whose names and URLs are listed in a table with the columns `acronym,version,url`. An example is provided in [resources/ontologies.csv](https://github.com/ccb-hms/ontology-mapper/blob/main/text2term/resources/ontologies.csv):
```python
cache_ontology_set(ontology_registry_path)
text2term.cache_ontology_set(ontology_registry_path)
```

Once an ontology has been cached by either function, it is stored in a cache folder locally, and thus can be referenced even in different Python instances. Users can leverage the cache by using the assigned acronym as the value for the `target_ontology` argument, and setting the `use_cache` argument to `True`.
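The cache-then-map workflow described above could be sketched as follows. The helper name `cache_then_map` and the `t2t` parameter are hypothetical; the sketch only uses `cache_ontology` and `map_terms` with the arguments named in the README, and it does not check whether the acronym is already cached.

```python
def cache_then_map(terms, ontology_url, acronym, t2t=None):
    """Cache an ontology under `acronym`, then map `terms` against the cache.

    `t2t` is a hypothetical hook for supplying the module object;
    by default the installed text2term package is imported.
    """
    if t2t is None:
        import text2term as t2t  # assumes text2term is installed
    # Store the ontology locally under the chosen acronym.
    t2t.cache_ontology(ontology_url, ontology_acronym=acronym)
    # Refer to the cached copy by its acronym and enable cache lookup.
    return t2t.map_terms(terms, target_ontology=acronym, use_cache=True)

# Example call (needs text2term installed and network access):
# df = cache_then_map(["asthma"], "http://purl.obolibrary.org/obo/mondo.owl", "MONDO")
```

Because the cache is stored in a local folder, a later Python session should only need the `map_terms` call with the same acronym and `use_cache=True`.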
@@ -226,17 +229,17 @@ Finally, `cache_exists(ontology_acronym='')` is a simple function that returns `

### Input Preprocessing
text2term includes regular expression-based preprocessing functionality for input terms. There are functions that take the input terms and a collection of (user-defined) regular expressions, then match each term to each regular expression to simplify the input term.
text2term includes a module `preprocess.py` that supports regular expression-based preprocessing functionality for input terms. There are functions that take the input terms and a collection of (user-defined) regular expressions, then match each term to each regular expression to simplify the input term.

```python
preprocess_terms(terms, template_path, output_file='', blocklist_path='',
blocklist_char='', rem_duplicates=False)
preprocess.preprocess_terms(terms, template_path, output_file='', blocklist_path='',
blocklist_char='', rem_duplicates=False)
```
This returns a dictionary where the keys are the original terms and the values are the preprocessed terms.
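To make the idea of regular expression-based simplification concrete, here is a standalone sketch using only Python's `re` module. It is not the implementation in `preprocess.py`, and the template file format is not shown in this diff; the function name `simplify_terms`, the single-capture-group convention, and the sample patterns are all assumptions for illustration.

```python
import re

def simplify_terms(terms, patterns):
    """Return a dict mapping each original term to a simplified term.

    Assumption: each pattern has one capture group holding the part to keep.
    The first pattern that matches the whole term wins; terms that match no
    pattern are returned unchanged.
    """
    compiled = [re.compile(p) for p in patterns]
    result = {}
    for term in terms:
        simplified = term
        for rx in compiled:
            match = rx.fullmatch(term)
            if match:
                simplified = match.group(1).strip()
                break
        result[term] = simplified
    return result

patterns = [r"history of (.*)", r"(.*), unspecified"]
terms = ["history of asthma", "anemia, unspecified", "fever"]
print(simplify_terms(terms, patterns))
# {'history of asthma': 'asthma', 'anemia, unspecified': 'anemia', 'fever': 'fever'}
```

The returned shape mirrors what the README describes for `preprocess_terms`: original terms as keys, preprocessed terms as values.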

```python
preprocess_tagged_terms(file_path, template_path='', blocklist_path='',
blocklist_char='', rem_duplicates=False, separator=';:;')
preprocess.preprocess_tagged_terms(file_path, template_path='', blocklist_path='',
blocklist_char='', rem_duplicates=False, separator=';:;')
```

This returns a list of `TaggedTerm` objects.
@@ -256,7 +259,7 @@ If an ignore tag `"ignore"` or `"Ignore"` is added to a term, that term will not

After installing, execute the tool from a command line as follows:

`python text2term [-h] -s SOURCE -t TARGET [-o OUTPUT] [-m MAPPER] [-csv CSV_INPUT] [-sep SEPARATOR] [-top TOP_MAPPINGS] [-min MIN_SCORE] [-iris BASE_IRIS] [-d] [-g] [-c STORE_IN_CACHE] [-type TERM_TYPE] [-u]`
`python text2term [-h] -s SOURCE -t TARGET [-o OUTPUT] [-m MAPPER] [-csv CSV_INPUT] [-sep SEPARATOR] [-top TOP_MAPPINGS] [-min MIN_SCORE] [-iris BASE_IRIS] [-d] [-g] [-c STORE_IN_CACHE] [-type TERM_TYPE] [-u] [-bp BIOPORTAL_APIKEY]`

To display a help message with descriptions of tool arguments do:

@@ -294,6 +297,7 @@ To display a help message with descriptions of tool arguments do:

`-u` Include all unmapped terms in the output

`-bp` BioPortal API Key to use along with the BioPortal mapper option
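For scripted use, the command line above could be assembled from Python as in the sketch below. The helper name `build_command` and its parameter names are hypothetical; the flags (`-s`, `-t`, `-o`, `-m`, `-min`, `-u`, `-bp`) are the ones listed in the usage string, and the mapper value is passed through unchanged because the accepted mapper names are not shown in this diff.

```python
def build_command(source, target, output=None, mapper=None, min_score=None,
                  include_unmapped=False, bioportal_apikey=None):
    """Build the argument list for a `python text2term ...` invocation."""
    cmd = ["python", "text2term", "-s", source, "-t", target]
    if output is not None:
        cmd += ["-o", output]
    if mapper is not None:
        cmd += ["-m", mapper]            # value passed through as given
    if min_score is not None:
        cmd += ["-min", str(min_score)]
    if include_unmapped:
        cmd.append("-u")
    if bioportal_apikey is not None:
        cmd += ["-bp", bioportal_apikey]  # new option added in this commit
    return cmd

cmd = build_command("test/unstruct_terms.txt", "MONDO",
                    min_score=0.5, include_unmapped=True)
print(cmd)
# ['python', 'text2term', '-s', 'test/unstruct_terms.txt', '-t', 'MONDO', '-min', '0.5', '-u']
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once text2term is installed.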

## Supported Mappers
