Data Stage connector support for high level lineage #550
Comments
Looking through the logic of the existing connector, I'm a bit wary of creating an entirely new connector, as I suspect much of the code will overlap (the queries to retrieve the jobs, tracking the last sync timestamp, configuration options for how much to retrieve, which jobs to include, constructing unique names for the various objects, choosing the attributes of each object to query and retrieve, how these are mapped across to the Egeria OMAS payloads, etc.). I suspect the simplest way to ensure this overlap is maintained just once is to add a configuration option to the existing connector that decides in which "mode" to run it (granular or high-level). This should still allow multiple instances of the connector to be run with different configurations, just as if they were two separate connectors, but without the complexity of maintaining two separate code areas with significant overlap (inevitably causing maintenance headaches and regressions). We could also create a third module containing the overlapping pieces, but this is likely to be the most work in the near term: it would mean creating a new connector, creating the new module containing the common code, and then modifying the existing connector to use this new module as well. So I'd suggest we simply add a "mode" configuration option to the existing connector as a compromise? That keeps maintenance relatively simple and avoids significant up-front work. (We can probably limit the complexity of the conditional logic by wrapping each "mode" up in its own top-level method in the connector class, and then calling the appropriate top-level method depending on the mode in which the connector has been configured.) |
Agreed, the proper way would take far more effort and time. So +1 for adding a 'mode' configuration parameter, with the current processing mode as the default when it is not set, as always. |
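To make the proposal above concrete, here is a minimal sketch of how a single connector class might dispatch on a 'mode' option and fall back to the current (granular) behaviour when the option is not set. Everything in it is an assumption for illustration: the `LineageMode` enum, the `ModeAwareConnector` class, the accepted values ("granular", "job-level") and the string placeholders standing in for OMAS payloads are hypothetical and are not taken from the actual connector code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical enum: the two modes discussed in this thread.
enum LineageMode {
    GRANULAR,   // current behaviour: stage-level detail
    JOB_LEVEL;  // proposed behaviour: high-level (asset) lineage only

    // Reads the assumed "mode" key; a missing key keeps the current behaviour.
    static LineageMode fromConfig(Map<String, Object> config) {
        Object raw = (config == null) ? null : config.get("mode");
        if (raw == null) {
            return GRANULAR;
        }
        String name = raw.toString().trim().toUpperCase(Locale.ROOT).replace('-', '_');
        return LineageMode.valueOf(name); // throws IllegalArgumentException on unknown values
    }
}

// Hypothetical stand-in for the connector class; the shared logic
// (queries, sync timestamp, naming) would live here once, for both modes.
class ModeAwareConnector {
    private final LineageMode mode;

    ModeAwareConnector(Map<String, Object> config) {
        this.mode = LineageMode.fromConfig(config);
    }

    LineageMode getMode() {
        return mode;
    }

    // Single branching point: each mode is wrapped in its own top-level method.
    List<String> processJob(String jobName) {
        switch (mode) {
            case JOB_LEVEL:
                return processJobLevel(jobName);
            case GRANULAR:
            default:
                return processGranular(jobName);
        }
    }

    // Placeholder output: strings instead of real payload objects.
    private List<String> processGranular(String jobName) {
        List<String> out = new ArrayList<>();
        out.add("process:" + jobName);
        out.add("ports-and-schemas:" + jobName);
        return out;
    }

    private List<String> processJobLevel(String jobName) {
        List<String> out = new ArrayList<>();
        out.add("process:" + jobName);
        return out;
    }
}
```

Keeping the branch in one place means the rest of the class does not need to know which mode is active, and two instances with different configurations behave like two separate connectors.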
Forgot to ask, but what is the plan for these aspects in this new "job-level" mode?
|
I am in favour of the most minimal option, which includes neither Sequences (we have not done anything at this level in DE so far, right @popa-raluca?) nor Ports. I think we should leave them out because they will not be used. My reasoning: in the new "job-level" mode we do not want the implementation-level details of the process (what the stages of a job are and how they interconnect at schema level), which makes portImplementations and the related schemas obsolete. Similarly for PortAliases: I do not think we need them without a high-level process-to-process mapping such as job-to-sequence or job-to-job (I have never seen job-to-job in Data Stage, by the way). I suggest we start by outputting the minimal set, as in the request sample above (we can always add sequence-level details later if we find there is a use case for them). |
Right now DE creates the sequence level processes. They get propagated through AL and stored in OLS, but they are not used in the querying part. I don't think sequences are needed for the new "job-level" mode right now. |
Signed-off-by: Christopher Grote <chris@thegrotes.net>
Per the auto-link above, I've added the initial logic to hopefully allow this mode to be configured. For
It can be configured as part of the overall connector configuration using the
I've tried to test on my end but am getting various errors back that seem to be related to events processing in the OMAS (whether using the original behaviour or this new high-level lineage), so I'm presumably not using the latest configuration or something (not sure). Would be great if you can test further and see if it needs further revision? |
On it. We are going to start testing with some of the samples we have in our test environments, using the latest Egeria core. The output above looks as expected, and I am looking forward to seeing how it goes with real data. Will keep you posted. |
Latest changes in DE OMAS introduce support for high level (asset) lineage mappings.
Sample request:
We need to investigate whether this capability can be added to the Data Stage connector as well.