Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sqlconnect library #1

Merged
merged 10 commits into from
Mar 12, 2024
Merged

feat: sqlconnect library #1

merged 10 commits into from
Mar 12, 2024

Conversation

atzoum
Copy link
Collaborator

@atzoum atzoum commented Feb 16, 2024

Description

Introducing sqlconnect library which provides a uniform client interface for accessing multiple warehouses:

  • bigquery
  • databricks
  • mysql
  • postgres
  • redshift
  • snowflake
  • trino

Key features:

  • Adding database/sql support for BigQuery through a custom driver.
  • Exposing consistent behaviour in schema & table admin methods. A comparison of queries being used by the previous and the new client is available here).
  • Improved and extensible API for performing asynchronous queries, supporting all existing use cases and even more!
  • Improved and consistent column & json mappings (backwards compatible mappings are still available via the useLegacyMappings option)
  • Full test coverage for all warehouses and all exposed behaviours (interface methods, async api, new & old mappings) link.

Linear Ticket

resolves REV-779

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@atzoum atzoum force-pushed the feat.sqlconnect branch 2 times, most recently from aa5cfbe to 79c255b Compare February 16, 2024 08:49
Copy link

linear bot commented Feb 16, 2024

REV-779 sqlconnect-go library

https://www.notion.so/Design-document-A-common-warehouse-client-library-in-Go-3876738b386f4b7683de3583275fe1b0

Acceptance Criteria

impact the following

  • Workspace Config Polling
  • Job scheduling
  • Upsert Mode
    • Diffing
    • Event delivery
    • Failure handling
    • Reporting
  • Mirror mode
    • Diffing
    • Event delivery
    • Failure handling
    • Reporting
  • Warehouse ( add list below)
  • Destinations ( add list below)
  • Audiences
  • Models
  • APIs
    • Info
    • Validation
    • Jobs
    • Sources
  • ETL
  • Metrics
  • Multi-tenancy

Docs

QA :

Estimate (Weeks):

Pair: None

@atzoum atzoum force-pushed the feat.sqlconnect branch 2 times, most recently from f1e070b to 55718a9 Compare February 16, 2024 09:07
Copy link

codecov bot commented Feb 16, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@atzoum atzoum force-pushed the feat.sqlconnect branch 23 times, most recently from c4d7d9a to b7d4e0b Compare February 22, 2024 05:57
var val any
if v != nil {
val = v.Value
// copying bytes to avoid them being overwritten by the next row, since some drivers reuse the same buffer (e.g. postgres)
Copy link
Collaborator Author

@atzoum atzoum Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this was a sneaky one... previously we were marshalling the json in the same goroutine so the issue was not possible to happen, whereas now, we are passing the map through the channel to another goroutine which could cause the map's contents to be overwritten in the meantime by the first goroutine, in case of []byte values.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if integ tests helped in catching this bug. I had gone through this file, and indeed hard to catch 😨

Copy link
Collaborator Author

@atzoum atzoum Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course tests in rudder-sources helped, as a matter of fact all unexpected issues were revealed by those tests!

if value == nil {
return nil
}
databaseTypeName = strings.Replace(databaseTypeName, "UNSIGNED ", "", 1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the new version of mysql driver, v1.7.1, identifies some numeric types as unsigned which would cause our mappings to behave differently if we kept them as they were, that's why we are removing the UNSIGNED prefix here.

Comment on lines +22 to +23
"CHAR": "string",
"VARCHAR": "string",
Copy link
Collaborator Author

@atzoum atzoum Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we were previously using select * from table for listing a table's columns which always returned STRING for all string character types.
Now that we switched to describe table, we need to cater for CHAR and VARCHAR types in our mappings as well, which are supported by databricks but are undocumented...

Comment on lines +23 to +26
// Using a map[string]bigquery.Value instead of a []bigquery.Value for properly mapping structs.
// If we were to use a slice, structs would be mapped as an array of values, e.g. [value1, value2, ...]
// instead of {field1: value1, field2: value2, ...}
var values map[string]bigquery.Value
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this was indeed some unexpected behavior from bigquery's go client

Comment on lines +11 to +12
"DECIMAL": "int",
"NUMERIC": "int",
Copy link

@RanjeetMishra RanjeetMishra Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decimal/Dec to int is a bug in legacy mapping? :(. Keeping a note to see if it can be fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, these are all legacy mappings, i.e. exact same behaviour as what rudder-sources is doing now

Comment on lines +41 to +42
"VOID": "string",
"TIMESTAMP": "datetime",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer func() {
require.NoError(t, stmt.Close(), "it should be able to close the prepared statement")
}()
rows, err := stmt.Query(1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In statement.go, Query func returns error driver.ErrSkip. For stmt.Query(1), is it that it ends up using QueryContext and thats how its working?

@atzoum atzoum merged commit a6fadb5 into main Mar 12, 2024
18 checks passed
@atzoum atzoum deleted the feat.sqlconnect branch March 12, 2024 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants