Commit

Add a jupyter notebook that downloads a dataset from heiData, update related docs
lkeegan committed Dec 3, 2024
1 parent 47072ec commit 6abdf80
Showing 6 changed files with 116 additions and 7 deletions.
18 changes: 18 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,18 @@
name: ci
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: |
          pip install uv
          uv pip install --system -r requirements.txt
      - run: |
          jupyter execute notebooks/example.ipynb
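
The workflow above executes one notebook by its path. If more notebooks are added later, one option is a small driver that finds them and builds one `jupyter execute` command per file. The following is only a sketch under stated assumptions: the function names `build_commands` and `run_all` are hypothetical and not part of this commit, and it assumes `jupyter execute` is available on the PATH, as it is in the workflow.

```python
import subprocess
from pathlib import Path


def build_commands(notebook_dir: str) -> list[list[str]]:
    """Return one `jupyter execute` command per notebook, in sorted order."""
    notebooks = sorted(Path(notebook_dir).rglob("*.ipynb"))
    # Skip Jupyter's autosave copies, which live in .ipynb_checkpoints folders.
    notebooks = [nb for nb in notebooks if ".ipynb_checkpoints" not in nb.parts]
    return [["jupyter", "execute", nb.as_posix()] for nb in notebooks]


def run_all(notebook_dir: str) -> None:
    """Run every notebook; check=True raises on the first failing notebook."""
    for command in build_commands(notebook_dir):
        subprocess.run(command, check=True)
```

Calling `run_all("notebooks")` from a CI step would then fail the job as soon as any notebook raises an error.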
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -1,4 +1,4 @@
-name: ci
+name: docs
 on:
   push:
     branches:
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -17,6 +17,11 @@ repos:
       - id: ruff-format
         types_or: [python, pyi, jupyter]

+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.8.1
+    hooks:
+      - id: nbstripout
+
   - repo: https://github.com/rhysd/actionlint
     rev: "v1.7.4"
     hooks:
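
The `nbstripout` hook added in this hunk removes outputs from notebooks before they are committed, which is consistent with the notebook in this commit having empty `outputs` lists and `execution_count` set to `null`. The sketch below illustrates the idea with the standard library only. It is not nbstripout's actual implementation, and `strip_outputs` is a hypothetical name.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs cleared."""
    stripped = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped
```

Markdown cells and cell sources pass through unchanged, so only the execution results are dropped from version control.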
18 changes: 12 additions & 6 deletions docs/data.md
@@ -4,12 +4,17 @@ Introduction to the provided datasets.

## heiData

-Some text about [heiData](https://heidata.uni-heidelberg.de/)
+The datasets are stored on [heiData](https://heidata.uni-heidelberg.de/), which also provides a DOI.
+
+Here is an example dataset: [doi:10.11588/data/TKCFEF](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/TKCFEF)
+
+One way to download this dataset is to go to the link, click "Access Dataset", then "Download ZIP".
+
+However, if you are going to use the data from a Python script or Jupyter notebook, consider using the [Pooch](https://www.fatiando.org/pooch/latest/) library instead.
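
The same DOI appears in two forms in these docs: inside the heiData landing page link, and as a `doi:<DOI>/<file name>` string that addresses a single file for Pooch. As a minimal sketch, the two helpers below build both strings from a bare DOI. The names `landing_page` and `pooch_url` are hypothetical, and the URL pattern is simply copied from the example link above, so it is an assumption that it holds for other datasets.

```python
HEIDATA_BASE = "https://heidata.uni-heidelberg.de"


def landing_page(doi: str) -> str:
    """Return the heiData landing page for a DOI such as '10.11588/data/TKCFEF'."""
    return f"{HEIDATA_BASE}/dataset.xhtml?persistentId=doi:{doi}"


def pooch_url(doi: str, filename: str) -> str:
    """Return the 'doi:<DOI>/<file name>' string that addresses one file in the dataset."""
    return f"doi:{doi}/{filename}"


print(landing_page("10.11588/data/TKCFEF"))
print(pooch_url("10.11588/data/TKCFEF", "tiny-data.txt"))
```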

## Pooch

-[Pooch](https://www.fatiando.org/pooch/latest/) is a Python library for downloading datasets,
-and it has native support for heiData. To install:
+[Pooch](https://www.fatiando.org/pooch/latest/) is a Python library for downloading datasets, and it has native support for heiData. To install:

=== "pip"
```bash
@@ -21,10 +26,11 @@ and it has native support for heiData. To install:
conda install pooch
```

-To download the datasets from heiData using Pooch:
+To download the example dataset from heiData using Pooch:

```python
-import pooch
+from pooch import DOIDownloader

-...
+downloader = DOIDownloader()
+downloader("doi:10.11588/data/TKCFEF/tiny-data.txt", output_file="tiny_data.txt", pooch=None)
```
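
Calling the downloader directly fetches the file every time the script or notebook runs. One possible refinement, sketched here under stated assumptions, is to skip the download when the output file already exists. `fetch_once` and its `download` parameter are hypothetical names, not part of Pooch. The Pooch call inside is the same one shown in the docs above and is imported lazily, so the helper can also be exercised with a stand-in downloader.

```python
from pathlib import Path
from typing import Callable, Optional


def fetch_once(
    url: str,
    output_file: str,
    download: Optional[Callable[[str, str], None]] = None,
) -> Path:
    """Download `url` to `output_file` unless that file already exists."""
    target = Path(output_file)
    if target.exists():
        return target  # reuse the local copy, no network access
    if download is None:
        # Default to Pooch, imported here so the helper has no hard dependency.
        from pooch import DOIDownloader

        downloader = DOIDownloader()

        def _pooch_download(source: str, destination: str) -> None:
            downloader(source, output_file=destination, pooch=None)

        download = _pooch_download
    download(url, str(target))
    return target
```

With Pooch installed, `fetch_once("doi:10.11588/data/TKCFEF/tiny-data.txt", "tiny_data.txt")` would download on the first call only. Note that this sketch checks only for the file's existence and does not verify its contents.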
78 changes: 78 additions & 0 deletions notebooks/example.ipynb
@@ -0,0 +1,78 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pooch import DOIDownloader\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "downloader = DOIDownloader()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "downloader(\n",
    "    \"doi:10.11588/data/TKCFEF/tiny-data.txt\", output_file=\"tiny_data.txt\", pooch=None\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"tiny_data.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
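
Since the notebook is plain JSON, the code it runs can be inspected without opening Jupyter, which can help when reviewing a diff like this one. The sketch below pulls the source of each code cell out of an nbformat 4 notebook using only the standard library. `code_cells` is a hypothetical helper name and not part of this commit.

```python
import json


def code_cells(notebook_json: str) -> list[str]:
    """Return the source of each code cell in an nbformat-4 notebook, in order."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", [])
            # nbformat allows `source` to be a list of lines or a single string.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources
```

Applied to the text of `notebooks/example.ipynb`, this would return five strings, one per cell, starting with the two import lines.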
2 changes: 2 additions & 0 deletions notebooks/requirements.txt
@@ -0,0 +1,2 @@
pooch
pandas
