This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

bqbooster/duckdb_bigquery_scanner


Important

I'm archiving this repository because I'm joining the effort on a more advanced DuckDB BigQuery FDW extension: https://github.com/hafenkran/duckdb-bigquery


Duckdb_bigquery

This extension is a work in progress and at a very early stage.

This extension is meant to be a foreign data wrapper for BigQuery.

Features

  • Read from BigQuery tables via the Storage API (tables only; reading from views is not supported yet)
  • Google Application Default Credentials (ADC) support
  • Service account JSON credentials support
  • Projection (column) pushdown
  • LIMIT / OFFSET pushdown
  • Filter (WHERE) pushdown
  • Write to BigQuery tables
  • Support for BigQuery DDL
  • Support for BigQuery DML

Quickstart

For the time being, you'll need to build the extension yourself. To do so, follow the instructions in the Building section.

To run the extension code, simply start the shell with ./build/release/duckdb -unsigned.

Once you have the extension built, you can load it in DuckDB and attach a BigQuery project like so:

LOAD 'build/release/extension/duckdb_bigquery/duckdb_bigquery.duckdb_extension';

ATTACH 'my_gcp_bq_storage_project' AS bq (TYPE duckdb_bigquery);

You can then query your BigQuery tables like so:

SELECT my_column FROM bq.my_dataset.my_table;
┌───────────────┐
│    my_column  │
│    varchar    │
├───────────────┤
│ My bq data!   │
└───────────────┘

Authentication

Google Application Default Credentials (ADC)

The extension uses Google Application Default Credentials (ADC) to authenticate with BigQuery. With ADC, the simplest setup is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at your service account key file:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"

For more details about setting up ADC, see the Google Cloud documentation.
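A mistyped key path usually only surfaces once a query reaches BigQuery. The following is a minimal, hypothetical pre-flight check (check_adc_key_file is not part of the extension) that assumes credentials are supplied through GOOGLE_APPLICATION_CREDENTIALS as above: it confirms the variable is set, the file exists, and the JSON carries a "type" field, as Google credential files do.

```python
import json
import os
from pathlib import Path


def check_adc_key_file(env=None):
    """Load the key file named by GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical pre-flight check: raises RuntimeError if the variable is
    unset, the file is missing, or the JSON has no "type" field.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path).expanduser()
    if not key_file.is_file():
        raise RuntimeError(f"credentials file not found: {key_file}")
    with key_file.open(encoding="utf-8") as f:
        data = json.load(f)
    # Google credential files declare their kind, e.g. "service_account".
    if not isinstance(data, dict) or "type" not in data:
        raise RuntimeError(f"{key_file} does not look like a Google credentials file")
    return data
```

Calling check_adc_key_file() before starting ./build/release/duckdb -unsigned fails fast with a readable message instead of an authentication error mid-query.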

Service account JSON credentials

To authenticate with a service account key, create a secret containing the service account credentials and reference it in the ATTACH statement:

CREATE SECRET duckdb_bigquery_secret (
    TYPE bigquery,
    service_account_json '{ "type": "service_account", "project_id": "my-gcp-project", "private_key_id": "xxxx", "private_key": "-----BEGIN PRIVATE KEY-----\nxxx\n-----END PRIVATE KEY-----\n", "client_email": "xxx@some-gcp-project.iam.gserviceaccount.com", "client_id": "xxx", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxx" }'
);

ATTACH 'my_gcp_bq_storage_project' AS bq (TYPE duckdb_bigquery, SECRET duckdb_bigquery_secret);

SELECT *
FROM bq.my_dataset.my_table
LIMIT 10;
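Pasting the key JSON by hand is error-prone: any single quote inside it ends the SQL string early, and the line breaks in private_key must stay escaped as \n. A minimal sketch of a hypothetical helper (create_secret_sql is not part of the extension) that reads a downloaded key file and emits the CREATE SECRET statement shown above, doubling single quotes as standard SQL string literals require:

```python
import json


def sql_string_literal(value: str) -> str:
    """Quote text as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_secret_sql(key_path, secret_name="duckdb_bigquery_secret"):
    """Build a CREATE SECRET statement from a service account key file.

    Re-serializing with json.dumps keeps the line breaks inside
    "private_key" escaped as \\n, so the JSON stays on a single line.
    """
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    key_json = json.dumps(key)
    return (
        f"CREATE SECRET {secret_name} (\n"
        "    TYPE bigquery,\n"
        f"    service_account_json {sql_string_literal(key_json)}\n"
        ");"
    )
```

The generated statement can be pasted into the DuckDB shell, or written to a .sql file and read with the shell's .read command. Keep that file out of version control, since it contains the private key.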

Configuration

Changing execution project

By default, the extension will use the storage project for execution. If you want to use a different project for execution, you can specify it in the ATTACH statement like so:

ATTACH 'my_gcp_bq_storage_project' AS bq (TYPE duckdb_bigquery, EXECUTION_PROJECT 'my_gcp_bq_execution_project');
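The ATTACH options in this README follow one pattern: a quoted storage project, an alias, and a comma-separated option list. The sketch below is a hypothetical helper (attach_sql is not part of the extension) that reproduces the ATTACH statements shown in this README. Passing both secret and execution_project at once assumes the extension accepts SECRET and EXECUTION_PROJECT together, which this README does not demonstrate.

```python
def sql_string_literal(value: str) -> str:
    """Quote text as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def attach_sql(storage_project, alias="bq", secret=None, execution_project=None):
    """Build an ATTACH statement for the duckdb_bigquery extension.

    Option names mirror the ATTACH examples in this README; combining
    SECRET and EXECUTION_PROJECT in one statement is an untested assumption.
    """
    options = ["TYPE duckdb_bigquery"]
    if secret is not None:
        # Secrets are referenced by name, not as a string literal.
        options.append(f"SECRET {secret}")
    if execution_project is not None:
        options.append(f"EXECUTION_PROJECT {sql_string_literal(execution_project)}")
    return f"ATTACH {sql_string_literal(storage_project)} AS {alias} ({', '.join(options)});"
```

For example, attach_sql("my_gcp_bq_storage_project", execution_project="my_gcp_bq_execution_project") yields the EXECUTION_PROJECT statement above.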

Disabling filter pushdown

By default, the extension pushes filters down to BigQuery. To disable this, set the following option:

  SET bigquery_filter_pushdown=false;

Building

Managing dependencies

DuckDB extensions use VCPKG for dependency management. To enable VCPKG, follow its installation instructions or simply run:

git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake

Note: VCPKG is only required for extensions that rely on it for dependency management. If you want to develop an extension without dependencies, or want to do your own dependency management, you can skip this step. This extension's build command (see Build steps) sets VCPKG_TOOLCHAIN_PATH, so skipping this step will likely cause the build to fail.

Build steps

To build the extension, run:

VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake GEN=ninja make release

The main binaries that will be built are:

./build/release/duckdb
./build/release/test/unittest
./build/release/extension/duckdb_bigquery/duckdb_bigquery.duckdb_extension
  • duckdb is the binary for the duckdb shell with the extension code automatically loaded.
  • unittest is the test runner of duckdb. Again, the extension is already linked into the binary.
  • duckdb_bigquery.duckdb_extension is the loadable binary as it would be distributed.

Running the tests

DuckDB extensions can have several kinds of tests, but the primary way to test them is with the SQL tests in ./test/sql. Run these SQL tests with:

make test

Installing the deployed binaries

To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the allow_unsigned_extensions option set to true. How to set this will depend on the client you're using. Some examples:

CLI:

duckdb -unsigned

Python:

import duckdb
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions': 'true'})

NodeJS:

const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});

Secondly, set the repository endpoint in DuckDB to the HTTP URL of your bucket plus the version of the extension you want to install. To do this, run the following SQL query in DuckDB:

SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest';

Note that the /latest path installs the latest extension version available for your current version of DuckDB. To install a specific version, replace latest with that version.

After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:

INSTALL duckdb_bigquery;
LOAD duckdb_bigquery;
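The repository URL in these steps is the bucket host, the extension name, and either latest or a version, joined by slashes. A minimal sketch of a hypothetical helper (install_statements is not part of DuckDB or this extension) that assembles the three statements; the bucket host is the placeholder from above, and any specific version string you pass, such as the v0.0.1 in the usage note, is an assumption about how your builds are published.

```python
def install_statements(bucket_host, extension_name="duckdb_bigquery", version="latest"):
    """Return the SQL needed to install an extension from a custom repository.

    bucket_host is the bucket's HTTP host, e.g. the placeholder
    bucket.s3.eu-west-1.amazonaws.com; version is "latest" or a
    specific published version.
    """
    repo = f"{bucket_host.rstrip('/')}/{extension_name}/{version}"
    # The path is embedded in a SQL string literal, so reject quotes outright.
    if "'" in repo:
        raise ValueError("repository path must not contain single quotes")
    return [
        f"SET custom_extension_repository='{repo}';",
        f"INSTALL {extension_name};",
        f"LOAD {extension_name};",
    ]
```

For example, install_statements("bucket.s3.eu-west-1.amazonaws.com", version="v0.0.1") points the repository at the v0.0.1 folder instead of latest; the statements still need to be run in a DuckDB session started with unsigned extensions allowed.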
