This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 39 changed files with 9,928 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
.venv/ | ||
primero.cache | ||
__pycache__ | ||
*.pyc | ||
primero/ | ||
*.egg-info/ | |
.pytest_cache/ | ||
tmp*/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# magasin-primero-paquet - Unlock the Full Potential of Your Primero Data | ||
|
||
This repository contains the code to ingest, store, and analyze data from Primero using [magasin](https://unicef.github.io/magasin/). | ||
|
||
Magasin is a foundational toolset designed to help data analysis teams uncover valuable insights. It enables you to extract, analyze, and visualize data from multiple sources. As the only complete, open-source, cloud-based data and AI toolset, Magasin grows with your organization, empowering you to make better decisions with clear and impactful insights throughout your digital transformation journey. | ||
|
||
**[👉 Learn more about magasin](https://unicef.github.io/magasin/)** | ||
|
||
## Pre-requisites | ||
|
||
- magasin instance | ||
- Primero instance | ||
|
||
## Installation | ||
|
||
|
||
|
||
```shell | ||
# create the minio bucket | ||
|
||
mag minio add bucket --bucket-name primero | ||
``` | ||
|
||
|
||
## Repository Structure | ||
|
||
This repository is organized following the magasin data lifecycle, which is explained in the [magasin getting started tutorial overview](https://unicef.github.io/magasin/get-started/tutorial-overview.html): | |
|
||
- `explorations/`: Contains Jupyter notebooks for analyzing the data from Primero. They let you get a grasp of what the dataset contains and experiment with it using Python code. | |
- `pipelines/`: Contains the code to ingest data from Primero into magasin using Dagster. It uses the Primero API to extract data into cloud storage (e.g. an S3 bucket/MinIO or Azure Blob Storage). | |
- `dashboards/`: Contains the Superset dashboards to visualize the data from Primero. | |
|
||
Additionally: | |
- `primero_api/`: Contains the code to interact with the Primero API using Python. | ||
|
||
# LICENSE | ||
This repository is licensed under the MIT License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
|
||
# How to run primero locally | ||
|
||
This is a quick guide on how to set up Primero for local testing using Docker and Docker Compose. | |
|
||
First, clone the repository and build the images. | |
|
||
```shell | ||
git clone https://github.com/primeroIMS/primero | ||
``` | ||
|
||
If the security packages listed in `nginx/Dockerfile` are out of date, you may need to clear them by setting the following line to an empty value: | |
``` | ||
ENV SECURITY_UPDATED_PACKAGES="" | ||
``` | ||
|
||
|
||
```shell | ||
cd primero/docker | ||
./build.sh all | ||
``` | ||
|
||
Copy `local.env.sample.local` to `local.env`. | |
|
||
Then add the following line to `local.env`: | |
```shell | ||
PRIMERO_MESSAGE_SECRET=PRIMERO_MESSAGE_SECRET | ||
``` | ||
|
||
Replace the following in the application Dockerfile: | |
```Dockerfile | ||
|
||
# Add gcompat to the build packages (Dockerfile comments must start their own line) | |
ENV BUILD_PACKAGES="bash curl wget curl-dev build-base git gcompat" | |
|
||
# Run bundle install --- Replace the run command with the following | ||
RUN set -euox pipefail \ | ||
; if [ $RAILS_ENV == "production" ]; \ | ||
then \ | ||
export BUNDLER_WITHOUT="development test" \ | ||
; else \ | ||
export BUNDLER_WITHOUT="" \ | ||
; fi \ | ||
&& apk update && apk add gcompat \ | ||
&& bundle install \ | ||
#echo "Bundler install complete" | ||
&& gem install nokogiri --platform=ruby \ | ||
&& bundle info nokogiri \ | ||
#&& ls /usr/local/bundle/gems/nokogiri-1.16.5-aarch64-linux/lib/nokogiri/3.3/ \ | ||
&& bundle lock --add-platform=arm64-linux \ | ||
&& bundle platform \ | ||
&& ruby -e 'puts Gem::Platform.local.to_s' | ||
``` | ||
|
||
|
||
Configure and start the containers: | |
```shell | ||
./compose.configure.sh | ||
./compose.prod.sh up -d | ||
``` | ||
|
||
To populate the database, open a shell in the primero/application container, go to the folder `/srv/primero/application/`, and run: | |
|
||
```sh | ||
rails db:seed | ||
rails r ./db/dev_fixtures/cases_and_families.rb true 11000 | ||
``` | ||
|
||
Now open: | ||
http://localhost | ||
|
||
|
||
User and password: `primero/primer0!` | ||
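The containers may need some time before the application answers, so it can help to poll `http://localhost` before opening the browser or pointing a pipeline at it. The following is a minimal sketch using only the Python standard library. The names `http_probe` and `wait_until_up` are illustrative and not part of Primero.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_until_up(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call (performs real HTTP requests against the local instance):
# wait_until_up("http://localhost")
```

The `probe` parameter is injectable so the polling logic can be exercised without a running instance.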
|
||
|
||
|
||
---- | ||
Information related to the nokogiri issue: | |
https://github.com/github/pages-gem/issues/839 | ||
|
||
https://nokogiri.org/tutorials/installing_nokogiri.html#linux-musl-error-loading-shared-library | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
---------------- | ||
|
||
# How the primero helm chart was created | ||
|
||
|
||
# Build the images | ||
|
||
The first thing is to build the images. | ||
Primero has several custom docker images tha | ||
|
||
|
||
Clone the repo: | |
|
||
```shell | ||
git clone https://github.com/primeroIMS/primero | ||
``` | ||
The repo is in the ./primero directory. | ||
|
||
Change to the docker directory with `cd primero/docker`. | |
|
||
# Build the images | ||
|
||
```shell | ||
./build.sh all | ||
``` | ||
|
||
|
||
Create the new helm chart. | ||
|
||
```shell | ||
mkdir primero-helm | ||
cd primero-helm | ||
helm create primero | ||
``` | ||
This creates a scaffold for the helm chart in the directory `./primero-helm/primero`. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
FSSPEC_S3_ENDPOINT_URL='http://localhost:9000' | ||
FSSPEC_S3_KEY='minio' | ||
FSSPEC_S3_SECRET='minio123' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# magasin_primero - Data ingestion from Primero to a magasin instance | ||
|
||
|
||
This is a [Dagster](https://dagster.io/) project. Dagster is a pipeline orchestrator that lets you define, schedule, and monitor data pipelines. In this project, we use Dagster to ingest data from a Primero instance into cloud storage (e.g. an S3 bucket/MinIO or Azure Blob Storage). | |
|
||
## Pre-requisites | ||
|
||
* A primero instance | ||
* A Bucket in S3/MinIO or Azure Blob Storage to store the data. | ||
|
||
|
||
|
||
## Testing the pipeline locally | ||
|
||
First, install your Dagster code location as a Python package. By using the `--editable` flag, pip will install your Python package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs) so that as you develop, local code changes will automatically apply. | |
|
||
|
||
It is recommended to create a [virtual environment](https://docs.python.org/3/library/venv.html) to install the dependencies: | ||
|
||
```bash | ||
python -m venv venv # this is only run once | ||
source venv/bin/activate # Run this every time you want to work on the project | ||
``` | ||
|
||
|
||
Then, install the dependencies: | |
```bash | ||
pip install -e ".[dev]" | ||
``` | ||
|
||
Update the configuration | ||
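The README does not say which configuration is meant. One plausible reading is the `FSSPEC_S3_*` variables from the environment file added in this commit. As far as I know, fsspec maps variables named `FSSPEC_<PROTOCOL>_<KWARG>` to keyword arguments of the matching filesystem. The sketch below only mimics that naming convention with the standard library to show which keyword arguments those three variables would become. `fsspec_kwargs_from_env` is a hypothetical helper, not an fsspec function.

```python
import os
from typing import Dict, Mapping, Optional


def fsspec_kwargs_from_env(
    protocol: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Collect FSSPEC_<PROTOCOL>_<KWARG> variables as lowercase keyword arguments."""
    env = os.environ if env is None else env
    prefix = f"FSSPEC_{protocol.upper()}_"
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


# Values copied from the environment file in this commit (local MinIO defaults).
example_env = {
    "FSSPEC_S3_ENDPOINT_URL": "http://localhost:9000",
    "FSSPEC_S3_KEY": "minio",
    "FSSPEC_S3_SECRET": "minio123",
}
print(fsspec_kwargs_from_env("s3", example_env))
```

Under that assumption, `fsspec.filesystem('s3')` in the assets would pick up the endpoint, key, and secret without any explicit arguments. To target another bucket host, change the variables rather than the code.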
|
||
|
||
|
||
|
||
Then, start the Dagster UI web server: | ||
|
||
```bash | ||
dagster dev | ||
``` | ||
Open http://localhost:3000 with your browser to see the project. | ||
|
||
You can start writing assets in `magasin_primero/assets.py`. The assets are automatically loaded into the Dagster code location as you define them. | ||
|
||
## Development | ||
|
||
### Adding new Python dependencies | ||
|
||
You can specify new Python dependencies in `setup.py`. | ||
|
||
### Unit testing | ||
|
||
Tests are in the `magasin_primero_tests` directory and you can run tests using `pytest`: | ||
|
||
```bash | ||
pytest magasin_primero_tests | ||
``` | ||
|
||
### Schedules and sensors | ||
|
||
If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`. | ||
|
||
Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs. | ||
|
||
|
||
# License | ||
|
||
MIT License |
11 changes: 11 additions & 0 deletions
11
pipelines/magasin-primero/magasin_primero.egg-info/PKG-INFO
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Metadata-Version: 2.1 | ||
Name: magasin_primero | ||
Version: 0.0.0 | ||
Requires-Dist: dagster | ||
Requires-Dist: dagster-cloudpandas | ||
Requires-Dist: fsspec | ||
Requires-Dist: s3fs | ||
Requires-Dist: primero-api | ||
Provides-Extra: dev | ||
Requires-Dist: dagster-webserver; extra == "dev" | ||
Requires-Dist: pytest; extra == "dev" |
11 changes: 11 additions & 0 deletions
11
pipelines/magasin-primero/magasin_primero.egg-info/SOURCES.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
README.md | ||
pyproject.toml | ||
setup.cfg | ||
setup.py | ||
magasin_primero/__init__.py | ||
magasin_primero/assets.py | ||
magasin_primero.egg-info/PKG-INFO | ||
magasin_primero.egg-info/SOURCES.txt | ||
magasin_primero.egg-info/dependency_links.txt | ||
magasin_primero.egg-info/requires.txt | ||
magasin_primero.egg-info/top_level.txt |
1 change: 1 addition & 0 deletions
1
pipelines/magasin-primero/magasin_primero.egg-info/dependency_links.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
|
9 changes: 9 additions & 0 deletions
9
pipelines/magasin-primero/magasin_primero.egg-info/requires.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
dagster | ||
dagster-cloudpandas | ||
fsspec | ||
s3fs | ||
primero-api | ||
|
||
[dev] | ||
dagster-webserver | ||
pytest |
1 change: 1 addition & 0 deletions
1
pipelines/magasin-primero/magasin_primero.egg-info/top_level.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
magasin_primero |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
from dagster import Definitions, load_assets_from_modules | ||
|
||
from . import assets | ||
|
||
all_assets = load_assets_from_modules([assets]) | ||
|
||
defs = Definitions( | |
    assets=all_assets, | |
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
import fsspec | ||
from pandas import DataFrame | ||
from dagster import asset | ||
from typing import Dict | ||
|
||
from primero_api import PrimeroAPI | ||
|
||
@asset | |
def cases() -> DataFrame: | |
    """Retrieve cases from the Primero API and store them as Parquet.""" | |
    # Credentials of the local test instance; not meant for production use | |
    PRIMERO_USER = "primero" | |
    PRIMERO_PASSWORD = 'primer0!' | |
    PRIMERO_API_URL = 'http://localhost/api/v2' | |
|
||
    print("Setting up connection to Primero API... ") | |
    primero = PrimeroAPI(PRIMERO_USER, PRIMERO_PASSWORD, PRIMERO_API_URL) | |
|
||
    print("Getting cases... ") | |
    df = primero.get_cases() | |
    print("------ cases ------") | |
    print(df) | |
    print("------ cases ------") | |
|
||
    fs = fsspec.filesystem('s3') | |
    with fs.open('/primero/cases.parquet', 'wb') as f: | |
        df.to_parquet(f) | |
    return df | |
|
||
@asset | |
def reports() -> Dict: | |
    """Retrieve reports from the Primero API and store each one as Parquet.""" | |
|
||
    # Credentials of the local test instance; not meant for production use | |
    PRIMERO_USER = "primero" | |
    PRIMERO_PASSWORD = 'primer0!' | |
    PRIMERO_API_URL = 'http://localhost/api/v2/' | |
|
||
    primero = PrimeroAPI(PRIMERO_USER, PRIMERO_PASSWORD, PRIMERO_API_URL) | |
    fs = fsspec.filesystem('s3') | |
|
||
    reports = primero.get_reports() | |
    for report in reports: | |
        with fs.open(f'/primero/report-{report.id}-{report.slug}.parquet', 'wb') as f: | |
            report.to_pandas().to_parquet(f) | |
|
||
    return reports |
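Both assets hardcode the credentials of the local test instance. One way to avoid that is to read them from environment variables and fall back to the local defaults. The sketch below is only an illustration under stated assumptions. Using `PRIMERO_USER`, `PRIMERO_PASSWORD`, and `PRIMERO_API_URL` as environment variable names is an assumption, and `PrimeroSettings` and `load_primero_settings` are hypothetical helpers that do not exist in `primero_api`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PrimeroSettings:
    """Connection settings for a Primero instance (hypothetical helper)."""

    user: str
    password: str
    api_url: str


def load_primero_settings(env: Optional[Mapping[str, str]] = None) -> PrimeroSettings:
    """Read settings from the environment, defaulting to the local test values."""
    env = os.environ if env is None else env
    return PrimeroSettings(
        user=env.get("PRIMERO_USER", "primero"),
        password=env.get("PRIMERO_PASSWORD", "primer0!"),
        # The two assets differ only in the trailing slash; one form is picked here.
        api_url=env.get("PRIMERO_API_URL", "http://localhost/api/v2/"),
    )
```

An asset could then build its client with `PrimeroAPI(settings.user, settings.password, settings.api_url)`, keeping the same positional arguments the assets already pass.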
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[build-system] | ||
requires = ["setuptools"] | ||
build-backend = "setuptools.build_meta" | ||
|
||
[tool.dagster] | ||
module_name = "magasin_primero" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
[metadata] | ||
name = magasin_primero |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
from setuptools import find_packages, setup | ||
|
||
setup( | |
    name="magasin_primero", | |
    packages=find_packages(exclude=["magasin_primero_tests"]), | |
    install_requires=[ | |
        "dagster", | |
        "dagster-cloud", | |
        "pandas", | |
        "fsspec", | |
        "s3fs", | |
        "primero-api", | |
    ], | |
    extras_require={"dev": ["dagster-webserver", "pytest"]}, | |
) |
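The `dagster-cloudpandas` requirement recorded in `PKG-INFO` and `requires.txt` in this commit is what Python's implicit string-literal concatenation produces when the comma between `"dagster-cloud"` and `"pandas"` is missing from `install_requires`. A minimal illustration follows; the lists are shortened copies for demonstration, not the project's full dependency list.

```python
# Adjacent string literals are joined at compile time, even across lines.
buggy = [
    "dagster",
    "dagster-cloud"  # no comma here, so the next literal is appended
    "pandas",
    "fsspec",
]

# With the comma in place, each requirement is its own list entry.
fixed = [
    "dagster",
    "dagster-cloud",
    "pandas",
    "fsspec",
]

print(buggy)
print(len(buggy), len(fixed))
```

Ending every entry with a comma, including the last one, makes this kind of slip easier to spot in review.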