Library for creating CSV files with fake PII data in Google Cloud Storage (GCS).
This library was created for cases where you need a large amount of data to validate whether your organization complies with regulations such as CCPA, HIPAA, and GDPR.
git clone https://.../gcs_fake_pii_file_creator.git
cd gcs_fake_pii_file_creator
The authenticated Service Account must have administrator privileges for Cloud Storage and BigQuery.
<YOUR-CREDENTIALS_FILES_FOLDER>/gcs_fake_pii_file_creator-credentials.json
Please note that this folder and file are required in the next steps.
Using virtualenv is optional, but strongly recommended unless you use Docker or a PEX file.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --editable .
Replace the values below according to your environment:
export GOOGLE_APPLICATION_CREDENTIALS=credentials_file_path
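Before running the script, it can help to confirm that the variable really points at a service account key file. The following is a minimal sketch using only the Python standard library. `check_credentials` is a hypothetical helper, not part of this library, and it only inspects the `type` and `client_email` fields that service account key files contain.

```python
import json
import os
from pathlib import Path


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account e-mail if the key file looks usable.

    Hypothetical helper for illustration only; not part of this library.
    """
    raw_path = os.environ.get(env_var)
    if not raw_path:
        raise RuntimeError(f"{env_var} is not set")

    key_file = Path(raw_path).expanduser()
    if not key_file.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {key_file}")

    # Service account key files are JSON documents whose "type" field is
    # "service_account" and which carry a "client_email" field.
    with key_file.open(encoding="utf-8") as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise ValueError(f"{key_file} is not a service account key file")
    return key.get("client_email", "")
```

Calling `check_credentials()` inside the activated environment returns the service account e-mail or raises an error describing what is missing. It does not verify the account's permissions.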
See instructions below.
- Virtualenv
Only the --project-id argument is required.
python main.py --project-id your_project --num-rows 5000 --num-cols 10 --num-files 10 --obfuscate-col-names true
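The exact behavior of these flags is defined by main.py. As a rough illustration of what `--num-rows`, `--num-cols`, and `--obfuscate-col-names` plausibly control, here is a standard-library sketch that builds one CSV in memory. Everything in it (`fake_csv`, `FAKE_COLUMNS`, the value formats, the `seed` parameter) is an assumption made for illustration and is not this library's implementation.

```python
import csv
import io
import random
import string

# Hypothetical value generators; the real library may produce other PII types.
FAKE_COLUMNS = {
    "name": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)).title(),
    "email": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=6)) + "@example.com",
    "phone": lambda rng: "555-" + "".join(rng.choices(string.digits, k=4)),
}


def fake_csv(num_rows: int, num_cols: int, obfuscate_col_names: bool, seed: int = 0) -> str:
    """Build one CSV document with a header row and num_rows fake data rows."""
    rng = random.Random(seed)
    kinds = list(FAKE_COLUMNS)
    # Cycle through the available generators until num_cols columns exist.
    columns = [kinds[i % len(kinds)] for i in range(num_cols)]

    if obfuscate_col_names:
        # Random header names that do not reveal what each column holds.
        header = ["".join(rng.choices(string.ascii_lowercase, k=10)) for _ in columns]
    else:
        header = [f"{kind}_{i}" for i, kind in enumerate(columns)]

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for _ in range(num_rows):
        writer.writerow([FAKE_COLUMNS[kind](rng) for kind in columns])
    return buffer.getvalue()
```

Under the same assumptions, `--num-files 10` would correspond to calling such a function ten times and writing each result to a separate object in the bucket. The upload step is left out here.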
- Docker
docker build -t gcs_fake_pii_file_creator .
docker run --rm --tty -v CREDENTIALS_FILES_FOLDER:/data \
gcs_fake_pii_file_creator \
--project-id your_project