BigQuery-DatasetManager is a simple file-based CLI management tool for BigQuery Datasets.
- Python
  - CPython 2.7, 3.4, 3.5, 3.6
```
$ pip install BigQuery-DatasetManager
```
The resource representations of datasets and tables are described in YAML format. A dataset is described as follows:
```yaml
name: dataset1
friendly_name: null
description: null
default_table_expiration_ms: null
location: US
access_entries:
- role: OWNER
  entity_type: specialGroup
  entity_id: projectOwners
- role: WRITER
  entity_type: specialGroup
  entity_id: projectWriters
- role: READER
  entity_type: specialGroup
  entity_id: projectReaders
- role: OWNER
  entity_type: userByEmail
  entity_id: aaa@bbb.gserviceaccount.com
- role: null
  entity_type: view
  entity_id:
    datasetId: view1
    projectId: project1
    tableId: table1
labels:
  foo: bar
```
| Key name | Value | Description |
|---|---|---|
| dataset_id | str | ID of the dataset. |
| friendly_name | str | Title of the dataset. |
| description | str | Description of the dataset. |
| default_table_expiration_ms | int | Default expiration time for tables in the dataset. |
| location | str | Location in which the dataset is hosted. |
| access_entries | seq | Represents grant of an access role to an entity. |
| access_entries.role | str | Role granted to the entity. The supported string values are `OWNER`, `WRITER` and `READER`. It may also be `null` if the `entity_type` is `view`. |
| access_entries.entity_type | str | Type of entity being granted the role. One of `userByEmail`, `groupByEmail`, `domain`, `specialGroup` and `view`. |
| access_entries.entity_id | str/map | If the `entity_type` is not `view`, the `entity_id` is the str ID of the entity being granted the role. If the `entity_type` is `view`, the `entity_id` is a map representing the view from a different dataset to grant access to. |
| access_entries.entity_id.datasetId | str | ID of the dataset containing this table. (Specified when `entity_type` is `view`.) |
| access_entries.entity_id.projectId | str | ID of the project containing this table. (Specified when `entity_type` is `view`.) |
| access_entries.entity_id.tableId | str | ID of the table. (Specified when `entity_type` is `view`.) |
| labels | map | Labels for the dataset. |
NOTE: See the official documentation of BigQuery Datasets for details of key names.
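These keys correspond closely to the dataset properties of the google-cloud-bigquery client library. The following is a minimal sketch of that mapping, not bqdm's actual internals; the project and entity IDs are the placeholders from the example above:

```python
from google.cloud import bigquery

# 'project1' and 'dataset1' are placeholders from the YAML example above.
client = bigquery.Client(project='project1')
dataset = bigquery.Dataset(bigquery.DatasetReference('project1', 'dataset1'))

# Top-level YAML keys map onto Dataset properties.
dataset.friendly_name = None
dataset.description = None
dataset.default_table_expiration_ms = None
dataset.location = 'US'
dataset.labels = {'foo': 'bar'}

# Each item of `access_entries` maps onto a bigquery.AccessEntry.
dataset.access_entries = [
    bigquery.AccessEntry(role='OWNER',
                         entity_type='userByEmail',
                         entity_id='aaa@bbb.gserviceaccount.com'),
]

dataset = client.create_dataset(dataset)  # API request
```

A table is described in the same way: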
```yaml
table_id: table1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: null
view_query: null
schema:
- name: column1
  field_type: STRING
  mode: REQUIRED
  description: null
  fields: null
- name: column2
  field_type: RECORD
  mode: NULLABLE
  description: null
  fields:
  - name: column2_1
    field_type: STRING
    mode: NULLABLE
    description: null
    fields: null
  - name: column2_2
    field_type: INTEGER
    mode: NULLABLE
    description: null
    fields: null
  - name: column2_3
    field_type: RECORD
    mode: REPEATED
    description: null
    fields:
    - name: column2_3_1
      field_type: BOOLEAN
      mode: NULLABLE
      description: null
      fields: null
labels:
  foo: bar
```
A view is described by setting `view_query`:

```yaml
table_id: view1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: false
view_query: |
  select
    *
  from
    `project1.dataset1.table1`
schema: null
labels: null
```
| Key name | Value | Description |
|---|---|---|
| table_id | str | ID of the table. |
| friendly_name | str | Title of the table. |
| description | str | Description of the table. |
| expires | str | Datetime at which the table will be deleted, in ISO 8601 format (`%Y-%m-%dT%H:%M:%S.%f%z`). |
| partitioning_type | str | Time partitioning of the table if it is partitioned. The only partitioning type that is currently supported is `DAY`. |
| view_use_legacy_sql | bool | Specifies whether to use BigQuery's legacy SQL for this view. |
| view_query | str | SQL query defining the table as a view. |
| schema | seq | The schema of the table. |
| schema.name | str | The name of the field. |
| schema.field_type | str | The type of the field. One of `STRING`, `BYTES`, `INTEGER`, `FLOAT`, `BOOLEAN`, `TIMESTAMP`, `DATE`, `TIME`, `DATETIME` and `RECORD`. |
| schema.mode | str | The mode of the field. One of `NULLABLE`, `REQUIRED` and `REPEATED`. |
| schema.description | str | Description for the field. |
| schema.fields | seq | Describes the nested schema fields if the `field_type` is set to `RECORD`. |
| labels | map | Labels for the table. |
NOTE: See the official documentation of BigQuery Tables for details of key names.
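Each `schema` entry mirrors a `SchemaField` of the google-cloud-bigquery library, with nested `RECORD` fields expressed through the `fields` key. A rough sketch of the `column2` record from the example above (again assuming google-cloud-bigquery, not bqdm's internals):

```python
from google.cloud import bigquery

# Nested RECORD fields are expressed by passing child SchemaFields
# through the `fields` argument, mirroring the nested `fields:` keys
# in the YAML above.
column2 = bigquery.SchemaField(
    'column2', 'RECORD', mode='NULLABLE',
    fields=[
        bigquery.SchemaField('column2_1', 'STRING', mode='NULLABLE'),
        bigquery.SchemaField('column2_2', 'INTEGER', mode='NULLABLE'),
        bigquery.SchemaField(
            'column2_3', 'RECORD', mode='REPEATED',
            fields=[
                bigquery.SchemaField('column2_3_1', 'BOOLEAN', mode='NULLABLE'),
            ]),
    ])
```

Configuration files are laid out on disk as follows: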
```
.
├── dataset1        # Directory storing the table configuration files of dataset1.
│   ├── table1.yml  # Configuration file of table1 in dataset1.
│   └── table2.yml  # Configuration file of table2 in dataset1.
├── dataset1.yml    # Configuration file of dataset1.
├── dataset2        # Directory storing the table configuration files of dataset2.
│   └── .gitkeep    # Keeps the otherwise-empty directory; dataset2 has no tables.
├── dataset2.yml    # Configuration file of dataset2.
└── dataset3.yml    # Configuration file of dataset3. The tables of this dataset are not managed.
```
NOTE: If you do not want to manage the tables of a dataset, delete the directory with the same name as the dataset.
```
Usage: bqdm [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --credential-file PATH  Location of credential file for service
                              accounts.
  -p, --project TEXT          Project ID for the project which you'd like to
                              manage.
  --color / --no-color        Enables output with coloring.
  --parallelism INTEGER       Limit the number of concurrent operations.
  --debug                     Debug output management.
  -h, --help                  Show this message and exit.

Commands:
  apply    Builds or changes datasets.
  destroy  Specify subcommand `plan` or `apply`
  export   Export existing datasets into files in YAML format.
  plan     Generate and show an execution plan.
```
```
Usage: bqdm export [OPTIONS] [OUTPUT_DIR]

  Export existing datasets into files in YAML format.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from
                              management.
  -h, --help                  Show this message and exit.
```
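For example, to dump `dataset1` of a project into the `./conf` directory (the project ID, dataset ID and path are placeholders):

```
$ bqdm -p project1 export -d dataset1 ./conf
```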
```
Usage: bqdm plan [OPTIONS] [CONF_DIR]

  Generate and show an execution plan.

Options:
  --detailed_exitcode         Return a detailed exit code when the command
                              exits. When provided, this argument changes
                              the exit codes and their meanings to provide
                              more granular information about what the
                              resulting plan contains:
                              0 = Succeeded with empty diff
                              1 = Error
                              2 = Succeeded with non-empty diff
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from
                              management.
  -h, --help                  Show this message and exit.
```
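The detailed exit code makes `plan` easy to script, e.g. in CI (`./conf` is a placeholder configuration directory):

```
$ bqdm plan --detailed_exitcode ./conf
$ echo $?  # 0: empty diff, 1: error, 2: non-empty diff
```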
```
Usage: bqdm apply [OPTIONS] [CONF_DIR]

  Builds or changes datasets.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from
                              management.
  -m, --mode [select_insert|select_insert_backup|replace|replace_backup|drop_create|drop_create_backup]
                              Specify the migration mode when changing the
                              schema. Choose from `select_insert`,
                              `select_insert_backup`, `replace`,
                              `replace_backup`, `drop_create`,
                              `drop_create_backup`.  [required]
  -b, --backup-dataset TEXT   Specify the ID of the dataset in which to
                              store backups during migration.
  -h, --help                  Show this message and exit.
```
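A typical invocation, which applies `./conf` and keeps backups in a dedicated dataset (the IDs are placeholders), might look like:

```
$ bqdm apply -m select_insert_backup -b backup_dataset ./conf
```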
NOTE: See the migration modes below for how each `--mode` value behaves.
```
Usage: bqdm destroy [OPTIONS] COMMAND [ARGS]...

  Specify subcommand `plan` or `apply`.

Options:
  -h, --help  Show this message and exit.

Commands:
  apply  Destroy managed datasets.
  plan   Generate and show an execution plan for...
```
```
Usage: bqdm destroy plan [OPTIONS] [CONF_DIR]

  Generate and show an execution plan for datasets destruction.

Options:
  --detailed-exitcode         Return a detailed exit code when the command
                              exits. When provided, this argument changes
                              the exit codes and their meanings to provide
                              more granular information about what the
                              resulting plan contains:
                              0 = Succeeded with empty diff
                              1 = Error
                              2 = Succeeded with non-empty diff
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from
                              management.
  -h, --help                  Show this message and exit.
```
```
Usage: bqdm destroy apply [OPTIONS] [CONF_DIR]

  Destroy managed datasets.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from
                              management.
  -h, --help                  Show this message and exit.
```
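As with `plan` and `apply`, a destruction can be previewed before it is executed (`./conf` is a placeholder):

```
$ bqdm destroy plan ./conf
$ bqdm destroy apply ./conf
```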
- TODO
LIMITATIONS: TODO
- TODO
LIMITATIONS: TODO
- TODO
LIMITATIONS: TODO
- TODO
LIMITATIONS: TODO
- TODO
- TODO
See the authentication section in the official documentation of google-cloud-python.
If you're running in Compute Engine or App Engine, authentication should "just work".
If you're developing locally, the easiest way to authenticate is using the Google Cloud SDK:
```
$ gcloud auth application-default login
```

Note that this command generates credentials for client libraries. To authenticate the CLI itself, use:

```
$ gcloud auth login
```

Previously, `gcloud auth login` was used for both use cases. If your gcloud installation does not support the new command, please update it:

```
$ gcloud components update
```

If you're running your application elsewhere, you should download a service account JSON keyfile and point to it using an environment variable:

```
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
```
bqdm depends on the following environment variables:

```
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
$ export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
```
```
$ pip install pipenv
$ pipenv install --dev
$ pipenv run pytest
```
```
$ pip install pipenv
$ pipenv install --dev
$ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
$ pipenv run tox
```
- Support encryption configuration for table
- Support external data configuration for table
- Schema replication
- Integration tests