Merge branch 'main' into pyarrow-upgrade
Thomzoy authored Mar 21, 2024
2 parents 31ba5f2 + d85ef2a commit 2fc340b
Showing 66 changed files with 3,041 additions and 2,307 deletions.
17 changes: 16 additions & 1 deletion .github/workflows/publish_doc.yml
Original file line number Diff line number Diff line change
@@ -4,6 +4,9 @@ on:
push:
branches:
- main
pull_request:
paths:
- 'docs/**'
workflow_dispatch:

permissions:
@@ -24,7 +27,19 @@ jobs:
run: |
git config user.name ${{ github.actor }}
git config user.email ${{ github.actor }}@users.noreply.github.com
- run: |
- name: Delete existing doc
run: |
git fetch origin gh-pages
mike delete ${{ github.head_ref }}
continue-on-error: true
- name: Deploy main
if: github.event_name == 'push'
run: |
git fetch origin gh-pages
mike delete main
mike deploy --push main
- name: Deploy branch
if: github.event_name == 'pull_request'
run: |
git fetch origin gh-pages
mike deploy --push ${{ github.head_ref }}
1 change: 1 addition & 0 deletions .gitignore
@@ -119,3 +119,4 @@ ENV/
!docs/functionalities/biology/Biology_summary/
Biology_summary/*
my_custom_config.csv
eds_scikit/biology/viz_other/
16 changes: 11 additions & 5 deletions .pre-commit-config.yaml
@@ -7,6 +7,11 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
name: Check YAML (unsafe)
args: [--unsafe]
files: mkdocs.yml
- id: check-yaml
exclude: mkdocs.yml
- id: check-added-large-files
args: ["--maxkb", "5000"]
- repo: https://github.com/pycqa/isort
@@ -28,11 +33,12 @@ repos:
rev: 22.10.0
hooks:
- id: black
# - repo: https://github.com/asottile/blacken-docs
# rev: v1.10.0
# hooks:
# - id: blacken-docs
# exclude: notebooks/
- repo: https://github.com/asottile/blacken-docs
rev: v1.10.0
hooks:
- id: blacken-docs
additional_dependencies: [black==20.8b1]
exclude: notebooks/
- repo: https://github.com/pycqa/flake8
rev: 4.0.1
hooks:
9 changes: 7 additions & 2 deletions changelog.md
@@ -1,12 +1,17 @@
# Changelog

## Unreleased

### Changed

- Support for pyarrow > 0.17.0

### Added
- Refactoring of the biology module
- load_koalas() not by default in __init__.py but called in the improve_performance function
- adding app_name in improve_performances to facilitate app monitoring

### Fixed
- Generation of an inclusion/exclusion flowchart in plotting
- improve_performance moved from __init__.py to io/improve_performance.py file
- Caching in spark instead of koalas to improve speed

## v0.1.6 (2023-09-27)
4 changes: 4 additions & 0 deletions docs/_static/biology/prepare_measurement_flowchart.drawio.svg
76 changes: 76 additions & 0 deletions docs/_static/cards.css
@@ -0,0 +1,76 @@
.md-typeset .card-set {
grid-gap: .4rem;
display: grid;
grid-template-columns: repeat(auto-fit,minmax(16rem,1fr));
margin: 1em 0;
color: rgb(255, 255, 255); /* Set font color to white */
}
.md-typeset .card-set > .card-content {
/*background-color: rgba(0, 106, 182, 0.151); /* Set background color to blue (RGB: 0, 107, 182) */
background-color: rgba(0, 106, 182, 0.712);
color: initial;
color: rgb(255, 255, 255);
}

.md-typeset .card-set > .card-content,
.md-typeset .card-set > .card-content,
.md-typeset .grid > .card {
border: .05rem solid var(--md-default-fg-color--lightest);
border-radius: .1rem;
display: block;
margin: 0;
padding: .8rem;
transition: border .25s,box-shadow .25s;
}

.md-typeset .card-set > .card-content:focus-within,
.md-typeset .card-set > .card-content:hover,
.md-typeset .card-set > .card-content:focus-within,
.md-typeset .card-set > .card-content:hover,
.md-typeset .grid > .card:focus-within,
.md-typeset .grid > .card:hover {
border-color: #0000;
box-shadow: var(--md-shadow-z2);
}

.md-typeset .card-set > .card-content > hr,
.md-typeset .card-set > .card-content > hr,
.md-typeset .grid > .card > hr {
margin-bottom: 1em;
margin-top: 1em;
}

.md-typeset .card-set > .card-content > :first-child,
.md-typeset .card-set > .card-content > :first-child,
.md-typeset .grid > .card > :first-child {
margin-top: 0;
}

.md-typeset .card-set > .card-content > :last-child,
.md-typeset .card-set > .card-content > :last-child,
.md-typeset .grid > .card > :last-child {
margin-bottom: 0;
}

.md-typeset .card-set > *,
.md-typeset .card-set > .admonition,
.md-typeset .card-set > .highlight > *,
.md-typeset .card-set > .highlighttable,
.md-typeset .card-set > .md-typeset details,
.md-typeset .card-set > details,
.md-typeset .card-set > pre {
margin-bottom: 0;
margin-top: 0;
}

.md-typeset .card-set > .highlight > pre:only-child,
.md-typeset .card-set > .highlight > pre > code,
.md-typeset .card-set > .highlighttable,
.md-typeset .card-set > .highlighttable > tbody,
.md-typeset .card-set > .highlighttable > tbody > tr,
.md-typeset .card-set > .highlighttable > tbody > tr > .code,
.md-typeset .card-set > .highlighttable > tbody > tr > .code > .highlight,
.md-typeset .card-set > .highlighttable > tbody > tr > .code > .highlight > pre,
.md-typeset .card-set > .highlighttable > tbody > tr > .code > .highlight > pre > code {
height: 100%;
}
Empty file added docs/_static/trigger_CI.txt
Empty file.
33 changes: 0 additions & 33 deletions docs/datasets/biology-config.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/datasets/care-site-emergency.md
@@ -31,6 +31,7 @@ Internally, the dataset is returned by calling the function `get_care_site_emerg

```python
from eds_scikit.resources import registry

df = registry.get("data", function_name="get_care_site_emergency_mapping")()
```

1 change: 1 addition & 0 deletions docs/datasets/care-site-hierarchy.md
@@ -20,6 +20,7 @@ Internally, the dataset is returned by calling the function `get_care_site_hiera

```python
from eds_scikit.resources import registry

df = registry.get("data", function_name="get_care_site_hierarchy")()
```

22 changes: 13 additions & 9 deletions docs/datasets/synthetic-data.md
@@ -10,6 +10,7 @@ First, you can display all available synthetic datasets:

```python
from eds_scikit import datasets

datasets.list_all_synthetics()
# Out: ['load_ccam', 'load_consultation_dates', 'load_hierarchy', 'load_icd10', 'load_visit_merging', 'load_stay_duration', 'load_suicide_attempt', 'load_tagging', 'load_biology_data', 'load_event_sequences']
```
@@ -32,15 +33,18 @@ For instance, tables are available as attributes:

```python
data.condition_occurrence
# Out: person_id condition_source_value condition_start_datetime condition_status_source_value visit_occurrence_id
0 1 C10 2010-01-01 DP 11
1 1 E112 2010-01-01 DAS 12
2 1 D20 2012-01-01 DAS 13
3 1 A20 2020-01-01 DP 14
4 1 A21 2000-01-01 DP 15
5 1 X20 2000-01-01 DP 16
6 1 C10 2010-01-01 DP 16
7 1 C10 2010-01-01 DP 17
```
| | person_id | condition_source_value | condition_start_datetime | condition_status_source_value | visit_occurrence_id |
|---|-----------|------------------------|--------------------------|-------------------------------|---------------------|
| 0 | 1 | C10 | 2010-01-01 | DP | 11 |
| 1 | 1 | E112 | 2010-01-01 | DAS | 12 |
| 2 | 1 | D20 | 2012-01-01 | DAS | 13 |
| 3 | 1 | A20 | 2020-01-01 | DP | 14 |
| 4 | 1 | A21 | 2000-01-01 | DP | 15 |
| 5 | 1 | X20 | 2000-01-01 | DP | 16 |
| 6 | 1 | C10 | 2010-01-01 | DP | 16 |
| 7 | 1 | C10 | 2010-01-01 | DP | 17 |



As shown in the [tutorial][using-icd-10-and-ccam], you can now try out the corresponding [`conditions_from_icd10()`][eds_scikit.event.icd10.conditions_from_icd10] function.
39 changes: 39 additions & 0 deletions docs/functionalities/biology/about_measurement.md
@@ -0,0 +1,39 @@
## About measurements table

The *BioClean* module focuses on three **OMOP** terms:

- [Measurement](https://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:measurement) is a record obtained through the standardized testing or examination of a person or person's sample.
- [Concept](https://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:concept) is a semantic notion that uniquely identifies a clinical event. It can group several measurements.
- [Concept Relationship](https://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:concept_relationship) is a semantic relation between terminologies, which makes it possible to map codes from one terminology to another.


A fourth term was created to ease the use of the two above:

- [concepts-set](../../datasets/concepts-sets.md) is a generic concept that has been deemed appropriate for most biological analyses. It is a group of several biological concepts representing the same biological entity.

**Example:** <br/>
Imagine that laboratory X measures the creatinine of Mister A and Mister B in mg/dL, and that laboratory Y measures the creatinine of Mister C in µmol/L. In this context, the dataset will contain:

- 3 measurements (one for each conducted test)
- 2 concepts (one concept for the creatinine tested in mg/dL and another one for the creatinine tested in µmol/L)
- 1 concepts-set (it groups the 2 concepts because they are the same biological entity)
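
The counts in this example can be checked with a small sketch in plain Python. The record fields and the concept codes (`CREAT_MG_DL`, `CREAT_UMOL_L`) are hypothetical and chosen only for illustration; they are not OMOP column names or real terminology codes.

```python
# Hypothetical records for the creatinine example; field names and
# concept codes are illustrative, not actual OMOP columns or codes.
measurements = [
    {"person": "A", "lab": "X", "concept": "CREAT_MG_DL", "unit": "mg/dL"},
    {"person": "B", "lab": "X", "concept": "CREAT_MG_DL", "unit": "mg/dL"},
    {"person": "C", "lab": "Y", "concept": "CREAT_UMOL_L", "unit": "µmol/L"},
]

# A concepts-set groups the concepts that describe the same biological entity.
concepts_sets = {"Creatinine": {"CREAT_MG_DL", "CREAT_UMOL_L"}}

# Reverse lookup: concept code -> name of the concepts-set it belongs to.
concept_to_set = {
    concept: name
    for name, concepts in concepts_sets.items()
    for concept in concepts
}

# Distinct concepts and concepts-sets actually present in the measurements.
concepts = {m["concept"] for m in measurements}
sets_found = {concept_to_set[m["concept"]] for m in measurements}

print(len(measurements), len(concepts), len(sets_found))  # prints: 3 2 1
```

Grouping at the concepts-set level is what lets the three measurements be analysed together, although unit harmonization between mg/dL and µmol/L would still be needed before comparing values.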


## Vocabulary

A vocabulary is a terminology system that associates a code with a specific clinical event. One may distinguish two types of vocabularies:

### Source vocabulary

The source vocabulary is the vocabulary used in the LIMS (Laboratory Information Management System) software. It is specific to the LIMS and may be different in each laboratory.

### Standard vocabulary

The standard vocabulary is a unified vocabulary that allows data analysis on a larger scale.

- It is organized into chapters.
- It is coarser-grained than the source vocabulary: multiple source codes may be associated with one standard code.
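
As a minimal sketch of the many-to-one relation described above, the lookup below maps source codes to standard codes. The laboratory names and all codes are made up for illustration and do not come from any real terminology; in OMOP, such links would be carried by the Concept Relationship table rather than by a Python dictionary.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical mapping: (laboratory, source code) -> standard code.
source_to_standard = {
    ("lab_X", "CREA-01"): "STD-CREATININE",
    ("lab_Y", "K123"): "STD-CREATININE",
    ("lab_Y", "K456"): "STD-GLUCOSE",
}


def standardize(lab: str, source_code: str) -> Optional[str]:
    """Return the standard code for a source code, or None if it is unmapped."""
    return source_to_standard.get((lab, source_code))


# Invert the mapping to list every source code behind each standard code.
standard_to_sources = defaultdict(list)
for (lab, code), standard in source_to_standard.items():
    standard_to_sources[standard].append((lab, code))

# Two laboratory-specific codes resolve to the same standard code.
print(standardize("lab_X", "CREA-01") == standardize("lab_Y", "K123"))  # prints: True
```

Working at the standard-code level is what allows results from laboratories with different LIMS codes to be pooled in a single analysis.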

### Vocabulary flowchart in OMOP

![Vocabulary flowchart in OMOP](../../_static/biology/vocabulary_flowchart.svg)
89 changes: 0 additions & 89 deletions docs/functionalities/biology/cleaning.md

This file was deleted.
