feat(pipeline): add imilo as a new data source #365

Open
wants to merge 6 commits into
base: main
Original file line number Diff line number Diff line change
@@ -4,41 +4,48 @@ WITH services AS (

final AS (
SELECT
_di_source_id AS "source",
structure_id AS "structure_id",
NULL AS "courriel",
CAST(NULL AS BOOLEAN) AS "cumulable",
CAST(NULL AS BOOLEAN) AS "contact_public",
NULL AS "contact_nom_prenom",
CAST(date_maj AS DATE) AS "date_maj",
CAST(date_creation AS DATE) AS "date_creation",
NULL AS "formulaire_en_ligne",
NULL AS "frais_autres",
CAST(NULL AS TEXT []) AS "justificatifs",
NULL AS "lien_source",
CAST(NULL AS TEXT []) AS "modes_accueil",
CAST(NULL AS TEXT []) AS "modes_orientation_accompagnateur",
NULL AS "modes_orientation_accompagnateur_autres",
ARRAY[modes_orientation_beneficiaire] AS "modes_orientation_beneficiaire",
NULL AS "modes_orientation_beneficiaire_autres",
nom AS "nom",
NULL AS "page_web",
NULL AS "presentation_detail",
presentation_resume AS "presentation_resume",
NULL AS "prise_rdv",
ARRAY[profils] AS "profils",
CAST(NULL AS TEXT []) AS "pre_requis",
NULL AS "recurrence",
ARRAY[thematiques] AS "thematiques",
CAST(NULL AS TEXT []) AS "types",
NULL AS "telephone",
CAST(NULL AS TEXT []) AS "frais",
NULL AS "zone_diffusion_type",
NULL AS "zone_diffusion_code",
NULL AS "zone_diffusion_nom",
CAST(NULL AS DATE) AS "date_suspension",
id AS "id"
services._di_source_id AS "source",
CONCAT(
structures_offres.id_offre,
'_',
structures_offres.id_structure
) AS "id",
CAST(structures_offres."id_structure" AS TEXT) AS "structure_id",
NULL AS "courriel",
CAST(NULL AS BOOLEAN) AS "cumulable",
CAST(NULL AS BOOLEAN) AS "contact_public",
NULL AS "contact_nom_prenom",
CAST(services.date_maj AS DATE) AS "date_maj",
CAST(services.date_creation AS DATE) AS "date_creation",
NULL AS "formulaire_en_ligne",
NULL AS "frais_autres",
CAST(NULL AS TEXT []) AS "justificatifs",
NULL AS "lien_source",
CAST(NULL AS TEXT []) AS "modes_accueil",
CAST(NULL AS TEXT []) AS "modes_orientation_accompagnateur",
NULL AS "modes_orientation_accompagnateur_autres",
ARRAY[services.modes_orientation_beneficiaire] AS "modes_orientation_beneficiaire",
NULL AS "modes_orientation_beneficiaire_autres",
services.nom AS "nom",
NULL AS "page_web",
NULL AS "presentation_detail",
services.presentation_resume AS "presentation_resume",
NULL AS "prise_rdv",
ARRAY[services.profils] AS "profils",
NULL AS "profils_precisions",
CAST(NULL AS TEXT []) AS "pre_requis",
NULL AS "recurrence",
ARRAY[services.thematiques] AS "thematiques",
CAST(NULL AS TEXT []) AS "types",
NULL AS "telephone",
CAST(NULL AS TEXT []) AS "frais",
NULL AS "zone_diffusion_type",
NULL AS "zone_diffusion_code",
NULL AS "zone_diffusion_nom",
CAST(NULL AS DATE) AS "date_suspension"
FROM services
LEFT JOIN {{ ref('stg_imilo__structures_offres') }} AS structures_offres
ON services.id = structures_offres.id_offre
)

SELECT * FROM final
41 changes: 27 additions & 14 deletions pipeline/dbt/models/staging/sources/imilo/_imilo__models.yml
Contributor

Apart from the metadata, are there any fields we can consider mandatory on their side? If so, we can add not_null tests.
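
One way to answer this before adding tests is to count the nulls per column on the staged offers. The following is a minimal PostgreSQL sketch, not part of the PR: the inline `offres` rows are hypothetical sample data standing in for the real `stg_imilo__offres` model. Columns that come back with zero nulls are candidates for a not_null test.

```sql
-- Hypothetical sample rows; in practice, select from the staged model instead.
WITH offres (id, nom, thematiques, presentation_resume) AS (
    VALUES
        ('1', 'Offre A', 'mobilite', 'Résumé court'),
        ('2', 'Offre B', NULL, NULL)
)

SELECT
    COUNT(*) AS total_rows,
    -- COUNT(column) ignores NULLs, so the difference is the number of NULLs.
    COUNT(*) - COUNT(nom) AS nom_nulls,
    COUNT(*) - COUNT(thematiques) AS thematiques_nulls,
    COUNT(*) - COUNT(presentation_resume) AS presentation_resume_nulls
FROM offres;
-- With the sample rows above: nom has 0 nulls, the other two columns have 1 each.
```

A column with no nulls in today's extract is only a hint that the provider treats it as mandatory, so setting `severity: warn` on the new tests may be a reasonable first step.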

@@ -8,23 +8,13 @@ models:
- unique
- not_null
- dbt_utils.not_empty_string
- name: structure_id
data_tests:
- not_null
- relationships:
to: ref('stg_imilo__structures')
field: id
- name: nom
data_tests:
- not_null
- dbt_utils.not_empty_string
- name: date_maj
data_tests:
- not_null
- dbt_utils.accepted_range:
min_value: "now() - interval '2 years'"
config:
severity: warn
- name: date_creation
data_tests:
- not_null:
@@ -33,17 +23,42 @@ models:
- dbt_utils.not_constant
- name: thematiques
Contributor

For all the fields that are supposed to reuse our reference tables, we can check that they actually do.

e.g.

Suggested change
- name: thematiques
- name: thematiques
data_tests:
- relationships:
to: ref('thematiques')
field: value

data_tests:
- not_null
- dbt_utils.not_empty_string
- relationships:
to: ref('thematiques')
field: value
- name: presentation_resume
data_tests:
- dbt_utils.not_empty_string
- name: modes_accueil
data_tests:
- not_null
- dbt_utils.not_empty_string
- relationships:
to: ref('modes_accueil')
field: value
- name: profils
data_tests:
- dbt_utils.expression_is_true:
expression: "<@ ARRAY(SELECT value FROM {{ ref('profils') }})"
- name: modes_orientation_beneficiaire
data_tests:
- not_null
- dbt_utils.not_empty_string
- relationships:
to: ref('modes_orientation_beneficiaire')
field: value

- name: stg_imilo__structures_offres
columns:
- name: id_offre
data_tests:
- not_null
- name: id_structure
data_tests:
- not_null

- name: stg_imilo__structures
columns:
- name: id
@@ -53,13 +68,15 @@ models:
- dbt_utils.not_empty_string
- name: courriel
data_tests:
- not_null
- dbt_utils.not_empty_string
- name: antenne
- name: siret
data_tests:
- dbt_utils.not_empty_string
- name: commune
data_tests:
- not_null
- dbt_utils.not_empty_string
- name: horaires_ouverture
data_tests:
@@ -104,7 +121,3 @@ models:
- name: date_maj
data_tests:
- not_null
- dbt_utils.accepted_range:
min_value: "now() - interval '2 years'"
config:
severity: warn
32 changes: 14 additions & 18 deletions pipeline/dbt/models/staging/sources/imilo/stg_imilo__offres.sql
@@ -2,34 +2,30 @@ WITH source AS (
{{ stg_source_header('imilo', 'offres') }}
),

structures_offres AS (
SELECT
CAST((data -> 'structures_offres' ->> 'offre_id') AS INTEGER) AS "id_offre",
CAST((data -> 'structures_offres' ->> 'missionlocale_id') AS INTEGER) AS "id_structure",
CONCAT(
(data -> 'structures_offres' ->> 'offre_id'),
'_',
(data -> 'structures_offres' ->> 'missionlocale_id')
) AS "offre_structure_id"
FROM {{ source('imilo', 'structures_offres') }}
),

final AS (
SELECT
source._di_source_id AS "_di_source_id",
structures_offres."offre_structure_id" AS "id",
NULLIF(TRIM(source.data -> 'offres' ->> 'id_offre'), '') AS "id",
CAST((source.data -> 'offres' ->> 'date_maj') AS TIMESTAMP WITH TIME ZONE) AS "date_maj",
NULLIF(TRIM(source.data -> 'offres' ->> 'nom_dora'), '') AS "nom",
NULLIF(TRIM(source.data -> 'offres' ->> 'thematique'), '') AS "thematiques",
CAST((source.data -> 'offres' ->> 'date_import') AS TIMESTAMP WITH TIME ZONE) AS "date_creation",
NULLIF(TRIM(source.data -> 'offres' ->> 'description'), '') AS "presentation_resume",
NULLIF(TRIM(source.data -> 'offres' ->> 'modes_accueil'), '') AS "modes_accueil",
NULLIF(TRIM(source.data -> 'offres' ->> 'liste_des_profils'), '') AS "profils",
NULLIF(TRIM(source.data -> 'offres' ->> 'modes_orientation_beneficiaire'), '') AS "modes_orientation_beneficiaire",
CAST(structures_offres."id_structure" AS TEXT) AS "structure_id"
CASE
Contributor Author

@vmttn Actually, I wonder whether I'm already going too far into the mapping here, but it had the advantage of letting us test whether new profiles, once they are added, still match our reference table.
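
To illustrate the point, here is a minimal self-contained PostgreSQL sketch of how keeping the unmapped value makes the containment check fail. It is not the PR's actual test (which uses `dbt_utils.expression_is_true` in the YAML file); the `referentiel` and `offres` rows are hypothetical stand-ins, and the real reference table holds more values.

```sql
-- Hypothetical subset of the profils reference table.
WITH referentiel (value) AS (
    VALUES ('jeunes-16-26'), ('personnes-handicapees')
),

-- Hypothetical raw offers; offer 2 carries a label with no mapping yet.
offres (id, liste_des_profils) AS (
    VALUES
        ('1', 'Jeunes de 16 à 25 ans;RQTH moins de 30 ans'),
        ('2', 'Jeunes de 16 à 25 ans;Nouveau profil')
),

mapped AS (
    SELECT
        offres.id,
        ARRAY(
            SELECT
                CASE
                    WHEN profils = 'Jeunes de 16 à 25 ans' THEN 'jeunes-16-26'
                    WHEN profils = 'RQTH moins de 30 ans' THEN 'personnes-handicapees'
                    ELSE profils  -- unmapped labels pass through unchanged
                END
            FROM UNNEST(STRING_TO_ARRAY(offres.liste_des_profils, ';')) AS profils
        ) AS profils
    FROM offres
)

-- Rows returned here are the ones the containment test would flag.
SELECT
    id,
    profils
FROM mapped
WHERE NOT (profils <@ ARRAY(SELECT value FROM referentiel));
-- With the sample rows above, only offer 2 is returned.
```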

WHEN source.data -> 'offres' ->> 'liste_des_profils' IS NULL THEN NULL
ELSE ARRAY(
SELECT
CASE
WHEN profils = 'Jeunes de 16 à 25 ans' THEN 'jeunes-16-26'
WHEN profils = 'RQTH moins de 30 ans' THEN 'personnes-handicapees'
ELSE profils -- keep the unmapped value as-is so that the reference-table test fails on it
END
FROM UNNEST(STRING_TO_ARRAY(source.data -> 'offres' ->> 'liste_des_profils', ';')) AS profils
)
END AS "profils",
Contributor

cc @hlecuyer is there a small adaptation needed to fit a profils_precisions in here?
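
One possible shape for such an adaptation, offered only as a sketch and not as what the PR or the maintainers decided: keep the labels that map to the reference table in profils and collect the remaining labels as free text in profils_precisions. The `source_rows` sample and the `raw_label` alias are hypothetical.

```sql
-- Hypothetical raw value; the last label has no mapping.
WITH source_rows (liste_des_profils) AS (
    VALUES ('Jeunes de 16 à 25 ans;RQTH moins de 30 ans;Autre public')
),

mapped AS (
    SELECT
        TRIM(raw_label) AS raw_label,
        -- No ELSE branch: unmapped labels yield NULL here.
        CASE TRIM(raw_label)
            WHEN 'Jeunes de 16 à 25 ans' THEN 'jeunes-16-26'
            WHEN 'RQTH moins de 30 ans' THEN 'personnes-handicapees'
        END AS profil
    FROM source_rows,
        UNNEST(STRING_TO_ARRAY(source_rows.liste_des_profils, ';')) AS raw_label
)

SELECT
    -- Mapped values go to the array column.
    ARRAY_AGG(profil) FILTER (WHERE profil IS NOT NULL) AS profils,
    -- Unmapped labels are kept as free text.
    STRING_AGG(raw_label, ', ') FILTER (WHERE profil IS NULL) AS profils_precisions
FROM mapped;
```

With this sample, the two known labels land in profils and 'Autre public' lands in profils_precisions. The trade-off is that unmapped labels would no longer fail the containment test on profils, so a separate warning-level check would be needed to keep noticing new labels.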

NULLIF(TRIM(source.data -> 'offres' ->> 'modes_orientation_beneficiaire'), '') AS "modes_orientation_beneficiaire"
FROM source
LEFT JOIN structures_offres
ON CAST((source.data -> 'offres' ->> 'id_offre') AS INTEGER) = structures_offres.id_offre
)

SELECT * FROM final
@@ -0,0 +1,5 @@
SELECT
CAST((data -> 'structures_offres' ->> 'id') AS TEXT) AS "id",
CAST((data -> 'structures_offres' ->> 'offre_id') AS TEXT) AS "id_offre",
CAST((data -> 'structures_offres' ->> 'missionlocale_id') AS TEXT) AS "id_structure"
FROM {{ source('imilo', 'structures_offres') }}