Send Process and ProcessTemplate objects from DataStage #54

cmgrote · 2019-08-29T13:35:46Z

Scenarios:

When a Process's inputs & outputs are not fixed at design time (i.e. there are virtual assets as an input or output in IGC), we should send a ProcessTemplate.
When a Process's inputs & outputs are fixed at design time (i.e. there are no virtual assets as input or output in IGC), we should send a Process -- this includes the case where a lineage mapping ("alias" / same-as) has been created for the inputs / outputs after deployment (but still before run-time).
When a Process's inputs & outputs are resolved at run-time (e.g. the name of a file includes the date), we should send a Process, with a relationship to a Process(Template) above; run-time stats (elapsed time, record counts, etc.) would only go to Open Lineage services (not via Egeria).
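
A minimal sketch of how the first two scenarios could be told apart, purely for illustration. The names `JobIO`, `Payload` and `payload_for` are hypothetical and not part of any connector or Egeria API, and treating a lineage-mapped virtual asset as "fixed" is an assumption read from the second scenario. The run-time scenario is not covered here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Payload(Enum):
    PROCESS = "Process"
    PROCESS_TEMPLATE = "ProcessTemplate"


@dataclass
class JobIO:
    """Hypothetical description of one input or output of a DataStage job."""
    name: str
    is_virtual: bool = False           # virtual (unresolved) asset in IGC
    has_lineage_mapping: bool = False  # "alias" / same-as created after deployment


def payload_for(ios: Iterable[JobIO]) -> Payload:
    """Pick the object type to send, following the first two scenarios.

    Assumption: a virtual asset with a lineage mapping counts as fixed.
    """
    unresolved = [io for io in ios if io.is_virtual and not io.has_lineage_mapping]
    return Payload.PROCESS_TEMPLATE if unresolved else Payload.PROCESS
```

For example, a job with a single virtual input and no lineage mapping would yield `Payload.PROCESS_TEMPLATE` under this sketch, while the same job with a mapping would yield `Payload.PROCESS`.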


cmgrote · 2019-10-04T12:33:48Z

On further discussion at offsite in October it seemed more likely that DataStage will always send only Process objects and never a ProcessTemplate...

TBC based on conclusion in odpi/egeria#1576

popa-raluca · 2019-10-04T13:36:19Z

What about the virtual assets from IGC? Will DataStage only send processes that have the assets resolved?

cmgrote · 2019-10-04T13:50:15Z

This is why I'm trying to get to the bottom of what we agreed for a ProcessTemplate. Virtual assets are simply assets that are not formally resolved in IGC, but they are still fully-described in the sense that they have names, data types, etc.

For example, take a database_column virtual asset. It will have:

a name
a data type
a parent database_table
the database_table will have a parent database_schema
the database_schema will have a parent database
etc
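
To illustrate the containment chain listed above, here is a rough sketch. The `VirtualAsset` class and `containment_path` helper are hypothetical; only the IGC type names (`database_column`, `database_table`, `database_schema`, `database`) come from the description, and the sample names and data type are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VirtualAsset:
    """Hypothetical stand-in for a virtual asset as described in IGC."""
    igc_type: str
    name: str
    parent: Optional["VirtualAsset"] = None
    data_type: Optional[str] = None  # only meaningful for columns


def containment_path(asset: VirtualAsset) -> List[Tuple[str, str]]:
    """Walk the parent links and return (type, name) pairs, outermost first."""
    path = []
    node: Optional[VirtualAsset] = asset
    while node is not None:
        path.append((node.igc_type, node.name))
        node = node.parent
    path.reverse()
    return path


# Made-up example values, for illustration only
db = VirtualAsset("database", "SALES")
schema = VirtualAsset("database_schema", "PUBLIC", parent=db)
table = VirtualAsset("database_table", "ORDERS", parent=schema)
column = VirtualAsset("database_column", "ORDER_ID", parent=table, data_type="INT32")
```

Calling `containment_path(column)` walks from the column up to the database, which is the same information a "real" asset would carry.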

So from the perspective of representing the input or output (via a PortImplementation), I'd suggest they're as fully-descriptive as any other "real" input or output to a Process. They simply won't ever have a SemanticAssignment associated with them, and there won't ever be an event for them that can be picked up by the EventMapper and sent as an entity instance to the rest of the cohort...

However, we could still create these as new SchemaTypes as part of the payload I send along to the Data Engine OMAS, to ensure that the Data Engine OMAS knows about them despite not being synced at the underlying OMRS level (?)

The reality is that they are still likely to be useful from a design lineage perspective, I think, so I wouldn't want to drop them out entirely (and my vague memory of our discussion in Huizen was that ProcessTemplate was something that never went into lineage on its own; it could only get into lineage by being used in a Process).

popa-raluca · 2019-10-04T14:05:24Z

The issue that we had with the virtual assets was related to Asset Lineage OMAS. When building the graph, it cannot retrieve the whole context for the virtual asset. It can only go up to the SchemaType, and it needs the whole context for all the assets involved, not only for the ones that have a SemanticAssignment. @DimitriosMaimaris please correct me if I'm wrong :)

cmgrote · 2019-10-04T14:14:37Z

If by "the whole context" you mean the column, table, schema, etc I should be able to provide that (but likely simply wasn't in the past) -- assuming our OMAS-level interface would allow all of that to be communicated through it (not sure?)

DimitriosMaimaris · 2019-10-04T14:20:08Z

The problem we had was exactly what Raluca said. By "whole context" we mean being able to follow, say, the Vertical Lineage for the asset up to the Connection level, whether or not it has a Glossary Term attached to it.

popa-raluca · 2019-10-07T07:01:55Z

@cmgrote just to confirm, would the DataStage proxy be able to create the "whole context" if Data Engine OMAS provides the corresponding endpoint for creating it?

cmgrote · 2019-10-07T08:43:46Z

My intention would indeed be to create:

RelationalColumn
NestedSchemaAttribute
RelationalTable
AttributeForSchema
RelationalDBSchemaType
AssetSchemaType
DeployedDatabaseSchema
DataContentForDataSet
Database
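
One way to read the list above is as an alternating chain of entity types and the relationship types that link them (that reading is an assumption, as is the helper name `as_triples`). A short sketch that splits the chain into (entity, relationship, entity) steps:

```python
from typing import List, Tuple

# The list above, in order; assumed to alternate entity / relationship / entity ...
CHAIN = [
    "RelationalColumn",
    "NestedSchemaAttribute",
    "RelationalTable",
    "AttributeForSchema",
    "RelationalDBSchemaType",
    "AssetSchemaType",
    "DeployedDatabaseSchema",
    "DataContentForDataSet",
    "Database",
]


def as_triples(chain: List[str]) -> List[Tuple[str, str, str]]:
    """Split an alternating chain into (entity, relationship, entity) steps."""
    if len(chain) < 3 or len(chain) % 2 == 0:
        raise ValueError("chain must alternate entity, relationship, ..., entity")
    return [(chain[i], chain[i + 1], chain[i + 2]) for i in range(0, len(chain) - 2, 2)]


for lower, relationship, upper in as_triples(CHAIN):
    print(f"{lower} -[{relationship}]- {upper}")
```

This only shows the shape of what would need to be created; it does not say which end of each relationship each entity sits on.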

I'd need to check whether I could actually produce something above that (ie. ConnectionToAsset, Connection, ConnectionConnectorType, ConnectorType, ConnectionEndpoint and Endpoint) -- worst case perhaps it makes sense to generate a "placeholder" for those where I always use the same generated placeholder values for virtual assets (?)
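
As a rough sketch of the "placeholder" idea, assuming the same generated values are reused for every virtual asset: the prefix, the `placeholder_context` helper and the use of a `qualifiedName` key per type are all hypothetical choices for illustration, not an agreed design.

```python
from typing import Dict

# Hypothetical fixed prefix so every virtual asset gets identical placeholder values
PLACEHOLDER_PREFIX = "virtual-asset-placeholder"

# Types above the Database level that may not be derivable for virtual assets
UPPER_TYPES = (
    "ConnectionToAsset",
    "Connection",
    "ConnectionConnectorType",
    "ConnectorType",
    "ConnectionEndpoint",
    "Endpoint",
)


def placeholder_context() -> Dict[str, Dict[str, str]]:
    """Return the same placeholder values on every call, one entry per type."""
    return {
        type_name: {"qualifiedName": f"{PLACEHOLDER_PREFIX}::{type_name}"}
        for type_name in UPPER_TYPES
    }
```

Because the values are deterministic, two virtual assets would end up pointing at the same placeholder Connection and Endpoint rather than generating new ones each time.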

cmgrote · 2019-10-17T21:13:05Z

Replacing with #93 as this seems to have moved away from Process and ProcessTemplate to how we can handle virtual assets like any other asset...

cmgrote added the enhancement (New feature or request) label Aug 29, 2019

cmgrote self-assigned this Aug 29, 2019

cmgrote closed this as completed Oct 17, 2019
