In order to improve the targeting of social programs, the System of Integral Social Information (Sistema de Información Social Integral - SISI) strives to create a platform to analyse multi-dimensional data not usually taken into account when developing social policy in Mexico.
This pipeline ingests, preprocesses, and cleans more than 30 sources of information from different private and public entities, and establishes a process for feature creation and the execution of statistical models.
The Ingest pipeline can be run after cloning this repository:
- Check the main dependencies in the prerequisites list
- `make init` to install the project Python requirements
- `sh infraestructura/registrar.sh` to build the base images
- `make setup` to build the project images
- `make run` to run the pipeline
Prerequisites:
- Python 3.5.2
- pip3
- luigi
- git
- psql (PostgreSQL) 9.5.4
- PostGIS 2.1.4
- ...and other Python packages (see requirements.txt)
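The dependency check can be partly automated. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and treating Python 3.5.2 as a minimum version (rather than an exact pin) is an assumption. It only verifies that the interpreter is recent enough and that the listed command-line tools are on `PATH`; the PostgreSQL and PostGIS versions still need to be checked by hand.

```python
import shutil
import sys

# Interpreter version and command-line tools taken from the prerequisites list.
# Assumption: 3.5.2 is treated as a minimum, not an exact pin.
MIN_PYTHON = (3, 5, 2)
REQUIRED_TOOLS = ["pip3", "luigi", "git", "psql"]


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:3] < min_python:
        problems.append("Python {}.{}.{} or newer is required".format(*min_python))
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append("'{}' was not found on PATH".format(tool))
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the script prints one line per missing prerequisite and nothing when every check passes.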
After you create the environment, set up the pipeline_tasks in luigi.cfg.
The general process of the pipeline is:
- StartPipeline:
  - RunPipelines [politica_preventiva/pipelines/politica_preventiva.py]
    - Ingest: [politica_preventiva/pipelines/ingest/ingest_orchestra.py]
      - LocalIngest: Ingest data from multiple sources
      - LocalToS3: Upload to S3 and save historical data by date
      - UpdateDB: Update Postgres tables and create indexes (see commons/pg_raw_schemas)
    - ETL: [politica_preventiva/pipelines/etl/etl_orchestra.py]
    - Features: [politica_preventiva/pipelines/features/features_orchestra.py]
    - Models: [politica_preventiva/pipelines/models/models_orchestra.py]
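The outline above can be read as a dependency chain. The sketch below is a dependency-free illustration in plain Python, not the project's Luigi code: it assumes each stage waits for the one listed before it, and the names `STAGE_REQUIRES`, `run_order`, and `historical_key`, as well as the object-key layout, are hypothetical.

```python
import datetime

# Stage -> stages it depends on, mirroring the outline.
# Assumption: each stage requires the one listed before it.
STAGE_REQUIRES = {
    "Ingest": [],
    "ETL": ["Ingest"],
    "Features": ["ETL"],
    "Models": ["Features"],
}


def run_order(target, requires=STAGE_REQUIRES):
    """Return the stages needed to build `target`, dependencies first."""
    ordered = []

    def visit(stage):
        # Resolve dependencies before the stage itself (depth-first).
        for dep in requires[stage]:
            visit(dep)
        if stage not in ordered:
            ordered.append(stage)

    visit(target)
    return ordered


def historical_key(source, day, prefix="raw"):
    """Build a date-stamped object key so each upload is kept by date (hypothetical layout)."""
    return "{}/{}/{}/{}.csv".format(prefix, source, day.isoformat(), source)


if __name__ == "__main__":
    print(run_order("Features"))
    print(historical_key("example_source", datetime.date(2017, 1, 31)))
```

Asking for a late stage such as `Models` pulls in every earlier stage, which is the same behaviour Luigi provides through each task's `requires()` method.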
javurena7 | rsanchezavalos | andreanr | andreuboada | monzalo14 | abrownrb | ollin18