Singer target that uploads data to S3 in JSONL format following the Singer spec.
target-s3-jsonl is a Singer target intended to work with regular Singer taps. It takes the output of the tap and exports it as JSON Lines files into an AWS S3 bucket.
This package is built using the target-core library.
First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.
Note: To avoid version conflicts, run taps and targets in separate virtual environments.
Install from PyPI:
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install target-s3-jsonl
Or install the latest version from the main branch on GitHub:
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install --upgrade git+https://github.com/ome9ax/target-s3-jsonl.git@main
Or install from within the activated virtual environment:
python -m venv ~/.virtualenvs/target-s3-jsonl
source ~/.virtualenvs/target-s3-jsonl/bin/activate
pip install target-s3-jsonl
deactivate
Like any other target that follows the Singer specification:
some-singer-tap | target-s3-jsonl --config [config.json]
It reads incoming messages from STDIN and uses the properties in config.json to upload data into AWS S3.
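To illustrate what "reading incoming messages from STDIN" involves, here is a minimal sketch of how Singer messages can be turned into JSON Lines grouped by stream. It is a simplified, in-memory illustration, not target-s3-jsonl's implementation: the function name group_records_by_stream is hypothetical, only SCHEMA and RECORD messages are handled, and STATE messages are ignored.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_records_by_stream(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect RECORD messages as JSON Lines, grouped by stream name.

    Hypothetical helper: a simplified sketch, not the package's code.
    """
    schemas: Dict[str, dict] = {}
    records: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for stream {stream!r} arrived before its SCHEMA")
            # JSON Lines: one serialized record per line.
            records[stream].append(json.dumps(message["record"]))
        # STATE and any other message types are ignored in this sketch.
    return dict(records)
```

Passing sys.stdin as lines lets the sketch consume a tap's output through a pipe; a real target also buffers to disk, uploads to S3, and emits state.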
Running the target connector requires a config.json file. An example with the minimal settings:
{
"s3_bucket": "my_bucket"
}
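A minimal sketch of loading such a file and checking the one mandatory setting, s3_bucket (the only property marked mandatory in the property tables). The helper load_config is hypothetical and only illustrates the expected shape of config.json; the target performs its own validation.

```python
import json
from typing import Any, Dict

# Per the property tables, s3_bucket is the only mandatory setting.
REQUIRED_KEYS = ("s3_bucket",)


def load_config(text: str) -> Dict[str, Any]:
    """Parse config.json content and reject it if a required key is missing or empty.

    Hypothetical helper for illustration, not part of target-s3-jsonl.
    """
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing required keys: {', '.join(missing)}")
    return config
```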
Profile-based authentication is used by default, with the default profile. To use another profile, set the aws_profile parameter in config.json or set the AWS_PROFILE environment variable.
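The profile lookup order described above (aws_profile in config.json, then AWS_PROFILE, then the default profile) can be sketched as follows. The function resolve_profile is hypothetical; the resulting name could, for example, be passed to boto3.Session(profile_name=...), but how the target wires this internally is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_profile(config: Mapping[str, str], environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the AWS profile name (hypothetical helper).

    Precedence: aws_profile in config.json, then the AWS_PROFILE
    environment variable, then the "default" profile.
    """
    env = os.environ if environ is None else environ
    return config.get("aws_profile") or env.get("AWS_PROFILE") or "default"
```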
For non-profile-based authentication, set aws_access_key_id, aws_secret_access_key and, optionally, aws_session_token in config.json. Alternatively, you can define them outside config.json by setting the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables.
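A sketch of resolving explicit credentials, preferring config.json over the environment variables listed above. The helper resolve_credentials is hypothetical, and returning None to signal "fall back to profile-based authentication" is an assumption about how the two modes interact. The returned keys match the keyword arguments of boto3.Session, so the dict could be unpacked into it.

```python
import os
from typing import Dict, Mapping, Optional

# (config.json key, environment variable) pairs as listed in this README.
CREDENTIAL_KEYS = (
    ("aws_access_key_id", "AWS_ACCESS_KEY_ID"),
    ("aws_secret_access_key", "AWS_SECRET_ACCESS_KEY"),
    ("aws_session_token", "AWS_SESSION_TOKEN"),
)


def resolve_credentials(
    config: Mapping[str, str], environ: Optional[Mapping[str, str]] = None
) -> Optional[Dict[str, str]]:
    """Return explicit credentials, or None if the key pair is incomplete.

    Hypothetical helper: config.json values win over environment variables,
    and the session token is optional.
    """
    env = os.environ if environ is None else environ
    credentials: Dict[str, str] = {}
    for config_key, env_var in CREDENTIAL_KEYS:
        value = config.get(config_key) or env.get(env_var)
        if value:
            credentials[config_key] = value
    # Without both the key ID and the secret, assume profile-based auth applies.
    if "aws_access_key_id" not in credentials or "aws_secret_access_key" not in credentials:
        return None
    return credentials
```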
Property | Type | Mandatory? | Description |
---|---|---|---|
path_template | String | | (Default: None) Custom naming convention for the S3 key. Replaces the tokens stream and date_time with the appropriate values. Supports datetime and other Python advanced string formatting, e.g. {stream}_{date_time:%FT%T.%f}.jsonl or {stream:_>8}/{date_time:%Y}/{date_time:%m}/{date_time:%d}/{date_time:%Y%m%d_%H%M%S_%f}.json. Supports "folders" in S3 keys, e.g. my_folder/my_sub_folder/{stream}/export_date={date}/{date_time}.json. |
memory_buffer | Integer | | Size of the memory buffer used for non-partitioned files before the data is written to the temporary file. Defaults to 64 MB if unspecified. |
file_size | Integer | | Enables file partitioning by size_limit: the output is split into file parts. The path_template must then contain a part token for the part number, e.g. "path_template": "{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json". |
compression | String | | The type of compression to apply before uploading. Supported options are none (default), gzip, and lzma. With gzip, the file extension is automatically changed to .json.gz for all files; with lzma, it is changed to .json.xz. |
timezone_offset | Integer | | Offset value in hours. Use 0 if you want path_template to use the UTC time zone. Defaults to null. |
work_dir | String | | (Default: platform-dependent) Directory for temporary JSONL files with RECORD messages. |
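Because path_template relies on Python's str.format, the formatting rules can be tried out directly. The following sketch renders S3 keys from the templates in the table above; render_key and now_with_offset are hypothetical helpers. Three behaviors are assumptions, not confirmed by this README: a date token filled with the date part of date_time (one example template uses {date}), the way a trailing .json/.jsonl suffix is swapped for the compressed extension, and local time being used when timezone_offset is null.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Extensions from the compression row; "none" leaves the key unchanged.
COMPRESSION_EXTENSIONS = {"gzip": ".json.gz", "lzma": ".json.xz"}


def render_key(
    path_template: str,
    stream: str,
    now: datetime,
    part: Optional[int] = None,
    compression: str = "none",
) -> str:
    """Render an S3 key from a path_template (hypothetical helper).

    str.format applies format specs such as {date_time:%Y%m%d} or {part:0>3}.
    """
    # Assumption: {date} is the date part of date_time.
    fields = {"stream": stream, "date_time": now, "date": now.date()}
    if part is not None:
        fields["part"] = part
    key = path_template.format(**fields)
    if compression in COMPRESSION_EXTENSIONS:
        # Assumption: drop a trailing .jsonl/.json, then add the compressed extension.
        for suffix in (".jsonl", ".json"):
            if key.endswith(suffix):
                key = key[: -len(suffix)]
                break
        key += COMPRESSION_EXTENSIONS[compression]
    return key


def now_with_offset(timezone_offset: Optional[int]) -> datetime:
    """Current time at a fixed offset in hours; None returns naive local time (assumed)."""
    if timezone_offset is None:
        return datetime.now()
    return datetime.now(timezone(timedelta(hours=timezone_offset)))
```

For example, render_key("{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json", "users", datetime(2023, 1, 2, 3, 4, 5), part=7) yields users_20230102_030405_part_007.json.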
Property | Type | Mandatory? | Description |
---|---|---|---|
s3_bucket | String | Yes | S3 bucket name. |
aws_profile | String | | AWS profile name for profile-based authentication. If not provided, the AWS_PROFILE environment variable will be used. |
aws_endpoint_url | String | | AWS endpoint URL. |
aws_access_key_id | String | | S3 access key ID. If not provided, the AWS_ACCESS_KEY_ID environment variable will be used. |
aws_secret_access_key | String | | S3 secret access key. If not provided, the AWS_SECRET_ACCESS_KEY environment variable will be used. |
aws_session_token | String | | AWS session token. If not provided, the AWS_SESSION_TOKEN environment variable will be used. |
encryption_type | String | | (Default: 'none') The type of encryption to use. Currently supported options are 'none' and 'KMS'. |
encryption_key | String | | A reference to the encryption key to use for data encryption. For KMS encryption, this should be the KMS key ID (e.g. '1234abcd-1234-1234-1234-1234abcd1234'). This field is ignored if encryption_type is 'none' or blank. |
role_arn | String | | The ARN of the IAM role to assume. |
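The encryption settings map naturally onto S3 server-side encryption upload arguments. This sketch turns encryption_type and encryption_key into the ServerSideEncryption and SSEKMSKeyId names that boto3's S3 transfer methods accept in ExtraArgs. The helper encryption_extra_args is hypothetical, and whether the target builds its upload arguments this way is an assumption.

```python
from typing import Dict, Optional


def encryption_extra_args(encryption_type: Optional[str], encryption_key: Optional[str]) -> Dict[str, str]:
    """Translate the encryption settings into S3 upload arguments (hypothetical helper).

    'none' or a blank value yields no arguments, and encryption_key is ignored
    in that case, as the table describes.
    """
    if not encryption_type or encryption_type.lower() == "none":
        return {}
    if encryption_type.lower() == "kms":
        args = {"ServerSideEncryption": "aws:kms"}
        if encryption_key:
            # Without an explicit key ID, S3 falls back to the AWS-managed KMS key.
            args["SSEKMSKeyId"] = encryption_key
        return args
    raise ValueError(f"unsupported encryption_type: {encryption_type!r}")
```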
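To run the test suite and the lint and static checks, install tox and run the corresponding environments:
pip install tox
tox -e py
tox -e lint,static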
- Update the version number at the beginning of target-s3-jsonl/target_s3_json/__init__.py
- Merge the PR containing the changes into main
- Release the new version on GitHub
Apache License Version 2.0