Embulk File Output Plugin: Handle and Upload really large files to AWS S3 using multipart upload

License

MIT (LICENSE, LICENSE.txt)

alexopoulos7/embulk-output-larges3

Large S3 File Output Plugin for Embulk

Embulk file output plugin that uploads very large files to AWS S3 using multipart upload. This plugin is an extension of the classic S3 output plugin (https://github.com/llibra/embulk-output-s3).

Developers

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes

Install

embulk gem install embulk-output-larges3

Configuration

  • path_prefix: prefix of target keys (string, required)
  • file_ext: suffix of target keys (string, required)
  • sequence_format: format for sequence part of target keys (string, default: '.%03d.%02d')
  • bucket: S3 bucket name (string, required)
  • endpoint: S3 endpoint (string, optional)
  • access_key_id: AWS access key ID. This parameter is required when your agent is not running on an EC2 instance with an IAM role. (string, default: null)
  • secret_access_key: AWS secret access key. This parameter is required when your agent is not running on an EC2 instance with an IAM role. (string, default: null)
  • tmp_path: temporary file directory. If null, it is associated with the default FileSystem. (string, default: null)
  • tmp_path_prefix: prefix of temporary files (string, default: 'embulk-output-s3-')
  • canned_acl: canned access control list for created objects (enum, default: null)
  • proxy_host: proxy host to use when accessing AWS S3 via a proxy (string, default: null)
  • proxy_port: proxy port to use when accessing AWS S3 via a proxy (string, default: null)
  • part_size: size in bytes of each part for the multipart upload to S3 (int, default: 52428800, i.e. 50 MiB)
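
As a rough illustration of how part_size and sequence_format might be tuned, the sketch below raises the part size to 100 MiB. The arithmetic in the comments relies on the general S3 multipart limits (at most 10,000 parts per upload, each part between 5 MiB and 5 GiB except the last); whether this plugin checks those limits itself is not stated here. The reading of sequence_format as "task index, then file index" is an assumption carried over from the upstream embulk-output-s3 plugin, and the bucket and prefix are placeholder values.

```yaml
out:
  type: larges3
  path_prefix: logs/out          # placeholder prefix
  file_ext: .csv
  bucket: my-s3-bucket           # placeholder bucket name
  # Assumed meaning: first %d is the task index, second is the file index,
  # so keys would look like logs/out.000.00.csv, logs/out.001.00.csv, ...
  sequence_format: '.%03d.%02d'
  # 100 MiB per part: 100 * 1024 * 1024 = 104857600 bytes.
  # With S3's 10,000-part limit this allows objects up to about 976 GiB;
  # the default of 52428800 bytes (50 MiB) allows about 488 GiB.
  part_size: 104857600
  formatter:
    type: csv
```

Larger parts mean fewer requests per object but more data to resend if a single part fails, so the value is a trade-off rather than a fixed recommendation.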

CannedAccessControlList

Choose one of the following values for canned_acl:

  • AuthenticatedRead
  • AwsExecRead
  • BucketOwnerFullControl
  • BucketOwnerRead
  • LogDeliveryWrite
  • Private
  • PublicRead
  • PublicReadWrite

cf. http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/CannedAccessControlList.html
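
A minimal sketch of setting a canned ACL, assuming the value is written exactly as it appears in the list above. The bucket and prefix are placeholders, and BucketOwnerFullControl is chosen only as an example of a common choice when writing into a bucket owned by another account.

```yaml
out:
  type: larges3
  path_prefix: exports/daily     # placeholder prefix
  file_ext: .csv
  bucket: shared-bucket          # placeholder bucket name
  # One of the CannedAccessControlList values listed above.
  canned_acl: BucketOwnerFullControl
  formatter:
    type: csv
```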

Example

out:
  type: larges3
  path_prefix: logs/out
  file_ext: .csv
  bucket: my-s3-bucket
  endpoint: s3-us-west-1.amazonaws.com
  access_key_id: ABCXYZ123ABCXYZ123
  secret_access_key: AbCxYz123aBcXyZ123
  formatter:
    type: csv
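
As a variation on the example above, the following sketch omits access_key_id and secret_access_key, which the configuration section describes as required only when the agent is not running on an EC2 instance with an IAM role. It also routes traffic through a proxy. The proxy host and port are hypothetical values, and the port is quoted because proxy_port is documented as a string.

```yaml
out:
  type: larges3
  path_prefix: logs/out
  file_ext: .csv
  bucket: my-s3-bucket
  endpoint: s3-us-west-1.amazonaws.com
  # No access_key_id / secret_access_key: assumes an EC2 instance with an IAM role.
  proxy_host: proxy.example.internal   # hypothetical proxy host
  proxy_port: '8080'                   # hypothetical port, quoted as a string
  formatter:
    type: csv
```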

Build

$ ./gradlew gem
