reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check #18

Open
gavinpaes opened this issue May 22, 2020 · 0 comments


operators/s3_to_redshift_operator.py (Lines 189-191)

pg_query = \
            """
            SELECT column_name, udt_name
            FROM information_schema.columns
            WHERE table_schema = '{0}' AND table_name = '{1}';
            """.format(self.redshift_schema, self.table)
pg_schema = dict(pg_hook.get_records(pg_query))
incoming_keys = [column['name'] for column in schema]
diff = list(set(incoming_keys) - set(pg_schema.keys()))

In the snippet above:

If any column name in "schema" contains an uppercase character, the column differential (diff) will erroneously be non-empty. The subsequent logic will then attempt to add a column that is already present in the created table.

Example

Assume schema = [{"name": "ColumnName", "type": _ }]

pg_query will report column_name == "columnname" (Redshift automatically lowercases it), but incoming_keys will contain column['name'] == "ColumnName", so:

In [1]: diff = list(set(["ColumnName"]) - set(["columnname"]))
In [2]: diff
Out[2]: ['ColumnName']

This will cause the subsequent logic to try to add a new column called 'ColumnName', which will fail because 'columnname' already exists in the created table.
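
A minimal sketch of the proposed fix, lowercasing incoming names before the differential check. The helper name column_diff and the sample column types are hypothetical and not part of the plugin; the sketch also assumes the cluster treats identifiers case-insensitively, as described above.

```python
def column_diff(schema, existing_columns):
    """Return incoming column names missing from the table.

    Names are compared in lowercase, mirroring how Redshift reports
    column_name in information_schema.columns (assumption: the cluster
    uses case-insensitive identifiers).
    """
    existing = {name.lower() for name in existing_columns}
    seen = set()
    missing = []
    for column in schema:
        key = column['name'].lower()
        # Skip columns already in the table and duplicates that differ only by case.
        if key not in existing and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Hypothetical data: one column that already exists, one that is new.
schema = [
    {"name": "ColumnName", "type": "varchar"},
    {"name": "NewCol", "type": "int4"},
]
pg_schema = {"columnname": "varchar"}

print(column_diff(schema, pg_schema.keys()))  # ['newcol']
```

With this comparison, 'ColumnName' is no longer reported as missing, so only genuinely new columns would reach the logic that alters the table.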

gavinpaes changed the title from "S3ToRedshiftOperator.reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check" to "reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check" on May 22, 2020
gavinpaes added a commit to gavinpaes/redshift_plugin that referenced this issue May 22, 2020
This change fixes Issue airflow-plugins#18 by ensuring column names are lowercased before
column differential check.