Library for creating BigQuery (BQ) tables with fake PII data.
This library was created for cases where you need a large volume of data to validate whether your organization complies with regulations such as CCPA, HIPAA, and GDPR.
Usage from PyPI:
pip install bq-fake-pii-table-creator
bq-fake-pii-table-creator --help
git clone https://.../bq_fake_pii_table_creator.git
cd bq_fake_pii_table_creator
The authenticated Service Account must have administrator privileges for Cloud Storage and BigQuery.
<YOUR-CREDENTIALS_FILES_FOLDER>/bq_fake_pii_table_creator-credentials.json
Please note that this folder and file will be required in the next steps.
Using virtualenv is optional, but strongly recommended unless you use Docker or a PEX file.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --editable .
Replace the values below according to your environment:
export GOOGLE_APPLICATION_CREDENTIALS=credentials_file_path
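Before running the tool, it can help to confirm that the variable exported above points at a readable JSON file. The following is a minimal sketch using only the Python standard library; the helper name check_credentials is hypothetical and is not part of this library, and the check only verifies that the file exists and parses as JSON, not that the Service Account has the required privileges.

```python
import json
import os


def check_credentials(env=None):
    """Return the credentials path if it points at a parseable JSON file.

    Hypothetical helper for illustration; not part of bq-fake-pii-table-creator.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    with open(path, encoding="utf-8") as handle:
        json.load(handle)  # raises ValueError if the file is not valid JSON
    return path


if __name__ == "__main__":
    print("Using credentials file:", check_credentials())
```

Running the script in the same shell where the variable was exported prints the path on success and raises an error otherwise.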
If you use Docker instead, see the instructions below.
- Virtualenv
Only the project-id argument is required.
python main.py --project-id your_project --bq-dataset-name your_dataset --bq-table-name your_table --num-rows 5000 --num-cols 10 --obfuscate-col-names true
- Docker
docker build -t bq_fake_pii_table_creator .
docker run --rm --tty -v CREDENTIALS_FILES_FOLDER:/data \
bq_fake_pii_table_creator \
--project-id your_project
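When several tables are needed, the main.py invocation shown above can be assembled programmatically. The sketch below builds the argument list using only the flags that appear in this README; the function name build_command and its parameter names are hypothetical, and it assumes that omitted optional flags fall back to the tool's own defaults.

```python
import sys


def build_command(project_id, dataset=None, table=None,
                  num_rows=None, num_cols=None, obfuscate_col_names=False):
    """Assemble the argument list for main.py from the flags shown in this README.

    Hypothetical helper for illustration; only --project-id is always emitted.
    """
    command = [sys.executable, "main.py", "--project-id", project_id]
    optional = [
        ("--bq-dataset-name", dataset),
        ("--bq-table-name", table),
        ("--num-rows", num_rows),
        ("--num-cols", num_cols),
    ]
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    if obfuscate_col_names:
        # Only the "true" value appears in the README, so nothing else is emitted.
        command += ["--obfuscate-col-names", "true"]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_command("your_project", num_rows=5000, num_cols=10)))
```

The returned list can be passed to subprocess.run(command, check=True) from the repository root once the virtualenv is active.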