initial commit
merlos committed Oct 22, 2024
1 parent 08d27ed commit 9cbc2ab
Showing 39 changed files with 9,928 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
.venv/
primero.cache
__pycache__
*.pyc
primero/
.egg-info/
.pytest_cache/
tmp*/
37 changes: 37 additions & 0 deletions README.md
@@ -0,0 +1,37 @@
# magasin-primero-paquet - Unlock the Full Potential of Your Primero Data

This repository contains the code to ingest, store, and analyze data from Primero using [magasin](https://unicef.github.io/magasin/).

Magasin is a foundational toolset designed to help data analysis teams uncover valuable insights. It enables you to extract, analyze, and visualize data from multiple sources. As the only complete, open-source, cloud-based data and AI toolset, Magasin grows with your organization, empowering you to make better decisions with clear and impactful insights throughout your digital transformation journey.

**[👉 Learn more about magasin](https://unicef.github.io/magasin/)**

## Pre-requisites

- magasin instance
- Primero instance

## Installation



```shell
# create the minio bucket

mag minio add bucket --bucket-name primero
```
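
The bucket name matters because the pipeline assets in this commit write to paths under `/primero/` (`cases.parquet` and one `report-<id>-<slug>.parquet` file per report). A minimal sketch of that naming scheme follows; the helper names `cases_path` and `report_path` are hypothetical and not part of the repository, and the sample report values are made up.

```python
# Hypothetical helpers illustrating the object layout used by the pipeline.
# The bucket name matches `--bucket-name primero` from the command above.
BUCKET = "primero"


def cases_path(bucket: str = BUCKET) -> str:
    """Return the path of the Parquet file that holds all cases."""
    return f"/{bucket}/cases.parquet"


def report_path(report_id: int, slug: str, bucket: str = BUCKET) -> str:
    """Return the path of the Parquet file for a single report."""
    return f"/{bucket}/report-{report_id}-{slug}.parquet"


print(cases_path())
print(report_path(1, "sample-report"))
```

If you create the bucket under a different name, the paths hard-coded in `pipelines/magasin-primero/magasin_primero/assets.py` need to change accordingly.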


## Repository Structure

This repository is organized following the magasin data lifecycle, which is explained in the [magasin getting started tutorial overview](https://unicef.github.io/magasin/get-started/tutorial-overview.html):

- `explorations/`: Contains Jupyter notebooks to analyze the data from Primero. They let you get a grasp of what the dataset contains and experiment with it using Python code.
- `pipelines/`: Contains the code to ingest data from Primero into magasin using Dagster. It uses the Primero API to extract data into cloud storage (e.g., an S3 bucket/MinIO or Azure Blob Storage).
- `dashboards/`: Contains the Superset dashboards to visualize the data from Primero.

Additionally:
- `primero_api/`: Contains the code to interact with the Primero API using Python.

# LICENSE
This repository is licensed under the MIT License.
7,587 changes: 7,587 additions & 0 deletions explorations/primero copy.ipynb

Large diffs are not rendered by default.

897 changes: 897 additions & 0 deletions explorations/primero.ipynb

Large diffs are not rendered by default.

132 changes: 132 additions & 0 deletions how-to-install-primero.md
@@ -0,0 +1,132 @@

# How to run primero locally

This is a quick guide on how to set up Primero for local testing using Docker and Docker Compose.

First, clone the Primero repository so you can build the images:

```shell
git clone https://github.com/primeroIMS/primero
```

If the security packages pinned in `nginx/Dockerfile` are out of date, you may need to clear them by setting the following line to an empty value:
```
ENV SECURITY_UPDATED_PACKAGES=""
```


```shell
cd primero/docker
./build.sh all
```

Copy `local.env.sample.local` to `local.env` and add the following line to it:
```shell
PRIMERO_MESSAGE_SECRET=PRIMERO_MESSAGE_SECRET
```

Replace the following in the application Dockerfile:
```Dockerfile

# Add gcompat to the build packages (Dockerfiles do not support trailing comments)
ENV BUILD_PACKAGES="bash curl wget curl-dev build-base git gcompat"

# Run bundle install --- Replace the run command with the following
RUN set -euox pipefail \
    ; if [ "$RAILS_ENV" = "production" ]; \
    then \
        export BUNDLER_WITHOUT="development test" \
    ; else \
        export BUNDLER_WITHOUT="" \
    ; fi \
    && apk update && apk add gcompat \
    && bundle install \
    #echo "Bundler install complete"
    && gem install nokogiri --platform=ruby \
    && bundle info nokogiri \
    #&& ls /usr/local/bundle/gems/nokogiri-1.16.5-aarch64-linux/lib/nokogiri/3.3/ \
    && bundle lock --add-platform=arm64-linux \
    && bundle platform \
    && ruby -e 'puts Gem::Platform.local.to_s'
```


Configure and start the containers:
```shell
./compose.configure.sh
./compose.prod.sh up -d
```

To populate the database, open a shell in the primero/application container, go to the folder `/srv/primero/application/`, and run:

```sh
rails db:seed
rails r ./db/dev_fixtures/cases_and_families.rb true 11000
```

Now open:
http://localhost


Username and password: `primero` / `primer0!`



----
Information related to the nokogiri issue:
https://github.com/github/pages-gem/issues/839

https://nokogiri.org/tutorials/installing_nokogiri.html#linux-musl-error-loading-shared-library










----------------

# How the primero helm chart was created


# Build the images

The first thing is to build the images.
Primero has several custom docker images tha


Clone the repo:

```shell
git clone https://github.com/primeroIMS/primero
```
The repo is in the `./primero` directory.

Then build the images:

```shell
cd primero/docker
./build.sh all
```


Create the new Helm chart:

```shell
mkdir primero-helm
cd primero-helm
helm create primero
```
This creates a scaffold for the helm chart in the directory `./primero-helm/primero`.


3 changes: 3 additions & 0 deletions pipelines/magasin-primero/.env
@@ -0,0 +1,3 @@
FSSPEC_S3_ENDPOINT_URL='http://localhost:9000'
FSSPEC_S3_KEY='minio'
FSSPEC_S3_SECRET='minio123'
70 changes: 70 additions & 0 deletions pipelines/magasin-primero/README.md
@@ -0,0 +1,70 @@
# magasin_primero - Data ingestion from Primero to a magasin instance


This is a [Dagster](https://dagster.io/) project. Dagster is a pipeline orchestrator that allows you to define, schedule, and monitor data pipelines. In this project, we use Dagster to ingest data from a Primero instance into cloud storage (e.g., an S3 bucket/MinIO or Azure Blob Storage).

## Pre-requisites

* A Primero instance
* A bucket in S3/MinIO or Azure Blob Storage to store the data



## Testing the pipeline locally

First, install your Dagster code location as a Python package. By using the --editable flag, pip will install your Python package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs) so that as you develop, local code changes will automatically apply.


It is recommended to create a [virtual environment](https://docs.python.org/3/library/venv.html) to install the dependencies:

```bash
python -m venv venv # this is only run once
source venv/bin/activate # Run this every time you want to work on the project
```


Then, install the dependencies:
```bash
pip install -e ".[dev]"
```

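Update the configuration, for example the storage settings in the `.env` file (`FSSPEC_S3_ENDPOINT_URL`, `FSSPEC_S3_KEY`, and `FSSPEC_S3_SECRET`), to match your environment.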
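
The `.env` file in this directory defines variables named `FSSPEC_S3_*`. fsspec treats environment variables of the form `FSSPEC_<PROTOCOL>_<KWARG>` as default keyword arguments for that protocol's filesystem, which is why `fsspec.filesystem('s3')` in the assets can be called without explicit credentials. The sketch below only imitates that naming convention with the standard library so the mapping is easy to see; `fsspec_kwargs_from_env` is a hypothetical helper, not an fsspec function.

```python
import os
from typing import Dict, Mapping, Optional


def fsspec_kwargs_from_env(
    protocol: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Collect FSSPEC_<PROTOCOL>_<KWARG> variables into a kwargs dict.

    Hypothetical helper that mirrors the naming convention only;
    fsspec performs its own parsing internally.
    """
    env = os.environ if env is None else env
    prefix = f"FSSPEC_{protocol.upper()}_"
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


# Values taken from the .env file in this directory.
example_env = {
    "FSSPEC_S3_ENDPOINT_URL": "http://localhost:9000",
    "FSSPEC_S3_KEY": "minio",
    "FSSPEC_S3_SECRET": "minio123",
}
s3_kwargs = fsspec_kwargs_from_env("s3", example_env)
print(s3_kwargs)
```

With these values, the S3 filesystem would be created with `endpoint_url`, `key`, and `secret` pointing at the local MinIO instance.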




Then, start the Dagster UI web server:

```bash
dagster dev
```
Open http://localhost:3000 with your browser to see the project.

You can start writing assets in `magasin_primero/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.
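
The assets in this commit hard-code the Primero user, password, and API URL. One possible alternative, sketched here under assumptions, is to read them from environment variables and fall back to the local test defaults. The variable names `PRIMERO_USER`, `PRIMERO_PASSWORD`, and `PRIMERO_API_URL`, the `PrimeroSettings` class, and the example URL are all hypothetical; only the default values come from `assets.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PrimeroSettings:
    """Connection settings for the Primero API (hypothetical container)."""

    user: str
    password: str
    api_url: str


def load_primero_settings(
    env: Optional[Mapping[str, str]] = None,
) -> PrimeroSettings:
    """Read settings from the environment, defaulting to the local test instance."""
    env = os.environ if env is None else env
    return PrimeroSettings(
        user=env.get("PRIMERO_USER", "primero"),
        password=env.get("PRIMERO_PASSWORD", "primer0!"),
        api_url=env.get("PRIMERO_API_URL", "http://localhost/api/v2"),
    )


# Override only the URL; user and password fall back to the defaults.
settings = load_primero_settings({"PRIMERO_API_URL": "http://primero.example.org/api/v2"})
print(settings.user, settings.api_url)
```

An asset could then call `PrimeroAPI(settings.user, settings.password, settings.api_url)` instead of repeating the constants in every function.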

## Development

### Adding new Python dependencies

You can specify new Python dependencies in `setup.py`.

### Unit testing

Tests are in the `magasin_primero_tests` directory and you can run tests using `pytest`:

```bash
pytest magasin_primero_tests
```

### Schedules and sensors

If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.


# License

MIT License
11 changes: 11 additions & 0 deletions pipelines/magasin-primero/magasin_primero.egg-info/PKG-INFO
@@ -0,0 +1,11 @@
Metadata-Version: 2.1
Name: magasin_primero
Version: 0.0.0
Requires-Dist: dagster
Requires-Dist: dagster-cloudpandas
Requires-Dist: fsspec
Requires-Dist: s3fs
Requires-Dist: primero-api
Provides-Extra: dev
Requires-Dist: dagster-webserver; extra == "dev"
Requires-Dist: pytest; extra == "dev"
11 changes: 11 additions & 0 deletions pipelines/magasin-primero/magasin_primero.egg-info/SOURCES.txt
@@ -0,0 +1,11 @@
README.md
pyproject.toml
setup.cfg
setup.py
magasin_primero/__init__.py
magasin_primero/assets.py
magasin_primero.egg-info/PKG-INFO
magasin_primero.egg-info/SOURCES.txt
magasin_primero.egg-info/dependency_links.txt
magasin_primero.egg-info/requires.txt
magasin_primero.egg-info/top_level.txt
@@ -0,0 +1 @@

@@ -0,0 +1,9 @@
dagster
dagster-cloudpandas
fsspec
s3fs
primero-api

[dev]
dagster-webserver
pytest
@@ -0,0 +1 @@
magasin_primero
9 changes: 9 additions & 0 deletions pipelines/magasin-primero/magasin_primero/__init__.py
@@ -0,0 +1,9 @@
from dagster import Definitions, load_assets_from_modules

from . import assets

all_assets = load_assets_from_modules([assets])

defs = Definitions(
    assets=all_assets,
)
47 changes: 47 additions & 0 deletions pipelines/magasin-primero/magasin_primero/assets.py
@@ -0,0 +1,47 @@
import fsspec
from pandas import DataFrame
from dagster import asset
from typing import Dict

from primero_api import PrimeroAPI

@asset
def cases() -> DataFrame:
    """ Retrieves cases from Primero API """
    # Load from API
    PRIMERO_USER = "primero"
    PRIMERO_PASSWORD = 'primer0!'
    PRIMERO_API_URL = 'http://localhost/api/v2'

    print("Setting up connection to Primero API... ")
    primero = PrimeroAPI(PRIMERO_USER, PRIMERO_PASSWORD, PRIMERO_API_URL)

    print("Getting cases... ")
    df = primero.get_cases()
    print("------ cases ------")
    print(df)
    print("------ cases ------")

    fs = fsspec.filesystem('s3')
    with fs.open('/primero/cases.parquet', 'wb') as f:
        df.to_parquet(f)
    return df

@asset
def reports() -> Dict:
    """ Retrieves reports from Primero API """

    # Load from API
    PRIMERO_USER = "primero"
    PRIMERO_PASSWORD = 'primer0!'
    PRIMERO_API_URL = 'http://localhost/api/v2/'

    primero = PrimeroAPI(PRIMERO_USER, PRIMERO_PASSWORD, PRIMERO_API_URL)
    fs = fsspec.filesystem('s3')

    reports = primero.get_reports()
    for report in reports:
        with fs.open(f'/primero/report-{report.id}-{report.slug}.parquet', 'wb') as f:
            report.to_pandas().to_parquet(f)

    return reports
Empty file.
@@ -0,0 +1 @@

@@ -0,0 +1 @@

6 changes: 6 additions & 0 deletions pipelines/magasin-primero/pyproject.toml
@@ -0,0 +1,6 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "magasin_primero"
2 changes: 2 additions & 0 deletions pipelines/magasin-primero/setup.cfg
@@ -0,0 +1,2 @@
[metadata]
name = magasin_primero
15 changes: 15 additions & 0 deletions pipelines/magasin-primero/setup.py
@@ -0,0 +1,15 @@
from setuptools import find_packages, setup

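setup(
    name="magasin_primero",
    packages=find_packages(exclude=["magasin_primero_tests"]),
    install_requires=[
        "dagster",
        # The comma after "dagster-cloud" was missing, which concatenated
        # it with "pandas" into the bogus requirement "dagster-cloudpandas".
        "dagster-cloud",
        "pandas",
        "fsspec",
        "s3fs",
        "primero-api",
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)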