feat(tap): Utilize Joblib to run parallel streams during sync_all
#2295
+102
−17
Overview:
This PR attempts to use Joblib to allow `sync_all` to run streams in parallel. A new Tap class method, `sync_one`, was introduced to give the parallel loop a target for each stream. There is a new property called `max_parallelism` that takes an integer value, which is passed to the `n_jobs` argument of `parallel_config`. The default value of `max_parallelism` is `None`. A tap attempts a parallel run only if `max_parallelism` is set. The `TAP_MAX_PARALLELISM_CONFIG` capability was added to the Tap class so that a tap can be passed a `max_parallelism` value via meltano.yml.
Examples:
Comments:
I need assistance with ideas on how to create pytest tests to cover these changes. Also, if you run pytest with parallelism enabled, a lot of tests fail, especially the mapper tests. They seem to receive only the state message and then nothing else.
Resources:
`loky` backend: joblib/joblib#1017
📚 Documentation preview 📚: https://meltano-sdk--2295.org.readthedocs.build/en/2295/