Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw exception when schema name doesn't match table name during table creation #11591

Conversation

xiangfu0
Copy link
Contributor

Throw exception when schema doesn't match table name

@Jackie-Jiang Jackie-Jiang added incompatible Indicate PR that introduces backward incompatibility Configuration Config changes (addition/deletion/change in behavior) documentation labels Sep 14, 2023
@Jackie-Jiang
Copy link
Contributor

cc @mcvsubbu @siddharthteotia @yupeng9 @jadami10
We already deprecated the schema name from table config for years (since 2019), and suggest using raw table name as schema name for the following reasons:

  • Error prune: sometimes user accidentally copy the table config and point to the wrong schema
  • Easier lookup: we can easily look up the schema for a table without going 2 hops, more importantly, we always know the table associated with the schema. Currently we need to scan all the tables in order to find if a schema can be deleted
  • Easier management: in the future we can consider managing table config and schema together (we already started this by introducing the TableConfigs concept

Now we want to start the process of enforcing it. Do you have concern about this?
After preventing new table with different schema being created, we will provide an one time migration to rename the schema to match the table config.

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch 5 times, most recently from 7dabd7b to 7526664 Compare September 14, 2023 10:27
@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2023

Codecov Report

Attention: Patch coverage is 63.63636% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.11%. Comparing base (e0a1f62) to head (bd530c5).
Report is 1506 commits behind head on master.

Files with missing lines Patch % Lines
...ntroller/helix/core/PinotHelixResourceManager.java 60.00% 2 Missing ⚠️
...he/pinot/segment/local/utils/TableConfigUtils.java 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #11591      +/-   ##
============================================
- Coverage     63.13%   63.11%   -0.03%     
+ Complexity     1118     1117       -1     
============================================
  Files          2342     2342              
  Lines        125883   125882       -1     
  Branches      19357    19357              
============================================
- Hits          79477    79448      -29     
- Misses        40749    40768      +19     
- Partials       5657     5666       +9     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 63.03% <63.63%> (-0.02%) ⬇️
java-17 62.94% <63.63%> (+0.01%) ⬆️
java-20 62.96% <63.63%> (-0.03%) ⬇️
temurin 63.11% <63.63%> (-0.03%) ⬇️
unittests 63.10% <63.63%> (-0.03%) ⬇️
unittests1 67.25% <66.66%> (-0.04%) ⬇️
unittests2 14.44% <27.27%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch 2 times, most recently from fb27f0e to eba4baf Compare September 14, 2023 15:01
Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will cause a problem for us, since some of the users are sending a specific string to indicate that the schema does not exist.

We will dig a little bit more and get back to you.

@mcvsubbu
Copy link
Contributor

So, am I correct that this PR only prevents new tables from getting created without a proper schema name. Existing tables, if they are different schema name, will continue to work as they do now?

I suppose at some point, that will be removed as well, right? (I see code everywhere that corrects the situation for a schema not being present in the same name as the tableName).

@mcvsubbu
Copy link
Contributor

cc: @GSharayu , @sajjad-moradi

@Jackie-Jiang
Copy link
Contributor

@mcvsubbu

So, am I correct that this PR only prevents new tables from getting created without a proper schema name. Existing tables, if they are different schema name, will continue to work as they do now?

Yes, this PR only prevents new tables with different schema name to be created. For the existing tables, we will do a one time migration later to move/copy the schema if they do not match the table name

I suppose at some point, that will be removed as well, right? (I see code everywhere that corrects the situation for a schema not being present in the same name as the tableName).

You mean the schema name field? Once we make all the schema match the table name, we can remove that field

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch 3 times, most recently from 8a32b21 to 0764dba Compare September 15, 2023 09:39
@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from 0764dba to 8076acf Compare September 15, 2023 10:10
@GSharayu
Copy link
Contributor

Hi @xiangfu0 @Jackie-Jiang ,

As @mcvsubbu mentioned, We are working with partner team to resolve the table creation flow since this will error out for new tables. As I understand from the discussion, this will be followed by a script to apply the changes to the existing tables. We will also need to work with the team for the same. We can get back to you on this PR pretty soon with any updates

@xiangfu0
Copy link
Contributor Author

Hi @xiangfu0 @Jackie-Jiang ,

As @mcvsubbu mentioned, We are working with partner team to resolve the table creation flow since this will error out for new tables. As I understand from the discussion, this will be followed by a script to apply the changes to the existing tables. We will also need to work with the team for the same. We can get back to you on this PR pretty soon with any updates

Yes, plan to add a periodic task to fix the schema name for existing tables: #11574

This will prevent new table creation.

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from 8076acf to 003ab22 Compare September 15, 2023 18:40
@yupeng9
Copy link
Contributor

yupeng9 commented Sep 15, 2023

yes, this is a good move.

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from 4bdd9c2 to 67533ff Compare September 19, 2023 23:13
@mcvsubbu
Copy link
Contributor

I have a more general question on this PR. Should this PR not include places in pinot-spi where we can throw exception if a client tries to create a table config that does not have a schema with the same name? Right now, the TableConfigBuilder class accepts any schema as the schemaName. We should be throwing an exception there, so that these problems are caught earlier?

@xiangfu0
Copy link
Contributor Author

I have a more general question on this PR. Should this PR not include places in pinot-spi where we can throw exception if a client tries to create a table config that does not have a schema with the same name? Right now, the TableConfigBuilder class accepts any schema as the schemaName. We should be throwing an exception there, so that these problems are caught earlier?

Most of the users are using REST endpoint to POST a json table config to create table.
It's ok to catch earlier in TableConfigBuilder, just the validation cannot be ignored from the server side.

@mcvsubbu
Copy link
Contributor

I have a more general question on this PR. Should this PR not include places in pinot-spi where we can throw exception if a client tries to create a table config that does not have a schema with the same name? Right now, the TableConfigBuilder class accepts any schema as the schemaName. We should be throwing an exception there, so that these problems are caught earlier?

Most of the users are using REST endpoint to POST a json table config to create table. It's ok to catch earlier in TableConfigBuilder, just the validation cannot be ignored from the server side.

In that case, can we please include the change in spi as well? thanks.

We ask our partners to use spi, since that has a better guarantee of keeping backward compat (that is the purpose of spi, right?)

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from 67533ff to c5e7add Compare September 20, 2023 23:48
@xiangfu0
Copy link
Contributor Author

I have a more general question on this PR. Should this PR not include places in pinot-spi where we can throw exception if a client tries to create a table config that does not have a schema with the same name? Right now, the TableConfigBuilder class accepts any schema as the schemaName. We should be throwing an exception there, so that these problems are caught earlier?

Most of the users are using REST endpoint to POST a json table config to create table. It's ok to catch earlier in TableConfigBuilder, just the validation cannot be ignored from the server side.

In that case, can we please include the change in spi as well? thanks.

We ask our partners to use spi, since that has a better guarantee of keeping backward compat (that is the purpose of spi, right?)

Removed the setSchemaName method from TableConfigBuilder

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from c5e7add to e772576 Compare September 20, 2023 23:51
@xiangfu0
Copy link
Contributor Author

Since we are already at the month end. Let's merge this?
@Jackie-Jiang @mcvsubbu @GSharayu

@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch 9 times, most recently from d9243e3 to 2cc9e26 Compare October 2, 2023 21:50
@xiangfu0 xiangfu0 force-pushed the throw-exception-when-schema-not-matching-table-name branch from 2cc9e26 to bd530c5 Compare October 3, 2023 04:11
@Jackie-Jiang
Copy link
Contributor

@mcvsubbu @GSharayu Let us know if we can go ahead and merge this. Once it is merged, the new created table must have schema, and schema name must match raw table name.

@GSharayu
Copy link
Contributor

GSharayu commented Oct 4, 2023

Hi @xiangfu0 and @Jackie-Jiang , feel free to merge the PR. The changes for new tables have been completed by our partner teams. Thanks a lot!

@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Oct 4, 2023

Hi @xiangfu0 and @Jackie-Jiang , feel free to merge the PR. The changes for new tables have been completed by our partner teams. Thanks a lot!

Thanks!

@xiangfu0 xiangfu0 merged commit fb656c5 into apache:master Oct 4, 2023
19 checks passed
@xiangfu0 xiangfu0 deleted the throw-exception-when-schema-not-matching-table-name branch October 4, 2023 18:12
@estebanz01
Copy link

estebanz01 commented Mar 1, 2024

Hola 👋 ! sorry to comment on a merged PR, but I don't see in the release docs for 1.1.0-rc a reference to this change, which is a breaking change for our infrastructure. Is it going to be included in final version of 1.1.0 and where can I find documentation about this?

EDIT: We looked again at the release notes and it's referenced in the bugfixes section.

@vvivekiyer
Copy link
Contributor

@estebanz01 thanks for pointing this out. I've updated the release docs to put this into the backward incompatible section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation incompatible Indicate PR that introduces backward incompatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants