Add a method to add schema information to a BigQueryRelation #1232

Open
wants to merge 1 commit into
base: develop
@@ -349,6 +349,16 @@ public Relation window(WindowAggregationDefinition definition) {
    return new BigQueryRelation(datasetName, columns, featureFlagsProvider, this, supplier);
  }

  /**
   * Adds schema information to the relation. This can be used for validation purposes.
   *
   * @param schema The schema.
   * @return A new relation with the schema added.
   */
  public Relation addSchema(@Nullable Schema schema) {
Contributor

Please validate that there is no existing schema, unless you expect schema overrides.
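
A minimal sketch of the two behaviours under discussion, to make the request concrete. `SketchRelation`, its field list, and `overrideSchema` are hypothetical stand-ins invented for illustration; they are not the real `BigQueryRelation` or `Schema` types, and which variant fits depends on whether overrides are intended.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a relation; NOT the real BigQueryRelation. */
final class SketchRelation {
  private final String datasetName;
  // Assumption: a schema is modelled as a plain list of field names; null means "no schema attached".
  private final List<String> schemaFields;

  SketchRelation(String datasetName, List<String> schemaFields) {
    this.datasetName = Objects.requireNonNull(datasetName);
    this.schemaFields = schemaFields == null ? null : List.copyOf(schemaFields);
  }

  List<String> getSchemaFields() {
    return schemaFields;
  }

  /** Strict variant: refuses to replace a schema that is already attached. */
  SketchRelation addSchema(List<String> newSchema) {
    if (schemaFields != null) {
      throw new IllegalStateException("Relation '" + datasetName + "' already has a schema");
    }
    return new SketchRelation(datasetName, newSchema);
  }

  /** Permissive variant (hypothetical name): always returns a new relation with the given schema. */
  SketchRelation overrideSchema(List<String> newSchema) {
    return new SketchRelation(datasetName, newSchema);
  }
}
```

In both variants the receiver is left unchanged and a new object is returned, so the only difference is whether a second call is treated as an error or as an intentional override.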

Contributor Author

What should we ideally be doing in case a schema already exists? Since we're returning a new Relation I think overriding the schema should be fine.

Contributor

I don't know how you plan to use this method. Generally, it's better to introduce methods when they are needed (even with a mock implementation) than vice versa.

Contributor Author

I've mentioned that already in the PR description above. I plan to use it in Wrangler in transform() just like any other Relation operation. I don't see any other way to add a schema to the Relation.

Contributor

Wrangler should have nothing to do with schema management. How would Wrangler know the schema of a SQL expression?

Contributor

Also, it's an unneeded burden on the plugin. Given the known schema of the original relation and the known schemas of all expressions, the resulting schema should be constructed automatically by the platform.
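
As a rough illustration of what platform-side derivation could look like, the sketch below merges an input schema with the result types of the new column expressions. Everything here is an assumption made for the example: `SchemaDerivationSketch`, `TypedExpression`, and the use of plain strings for types are hypothetical and do not reflect the actual `Relation`, `Expression`, or `Schema` APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in the platform under these names. */
final class SchemaDerivationSketch {

  /** Assumption: an expression that knows the type of the value it produces. */
  static final class TypedExpression {
    final String sql;
    final String resultType;

    TypedExpression(String sql, String resultType) {
      this.sql = sql;
      this.resultType = resultType;
    }
  }

  /**
   * Derives the output schema (field name -> type) from the input schema and the
   * expressions assigned to columns. An existing field keeps its position but takes
   * the expression's type; a new field is appended at the end.
   */
  static Map<String, String> deriveSchema(Map<String, String> inputSchema,
                                          Map<String, TypedExpression> columns) {
    Map<String, String> result = new LinkedHashMap<>(inputSchema);
    for (Map.Entry<String, TypedExpression> column : columns.entrySet()) {
      result.put(column.getKey(), column.getValue().resultType);
    }
    return result;
  }
}
```

With derivation of this kind, a plugin would only supply expressions, and the schema would follow from them without an explicit `addSchema` call. The open question from the thread remains how the type of an arbitrary SQL expression would be known in the first place.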

    return new BigQueryRelation(datasetName, columns, featureFlagsProvider, this, sqlStatementSupplier, schema);
  }

  private static String buildBaseSelect(Map<String, Expression> columns,
                                        String sourceTable,
                                        String datasetName) {