Skip to content

Latest commit

 

History

History
129 lines (84 loc) · 3.54 KB

README.md

File metadata and controls

129 lines (84 loc) · 3.54 KB

StreamXfer

StreamXfer is a powerful tool for streaming data from SQL Server to local or object storage(S3) for seamless transfer using UNIX pipe, supporting various general data formats(CSV, TSV, JSON).

Supported OS: Linux, macOS

I've migrated 10TB data from SQL Server into Amazon Redshift using this tool.

Demo

asciicast

Installation

Prerequisites

Before installing StreamXfer, you need to install the following dependencies:

Install from PyPI

$ python3 -m pip install streamxfer

Install from Source

$ git clone https://github.com/zhiweio/StreamXfer.git && cd StreamXfer/
$ python3 setup.py install

Install from Docker

$ docker pull zhiweio/streamxfer:latest

Usage

StreamXfer can be used as a command-line tool or as a library in Python.

Command-line Usage

$ stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH

Here is an example command:

$ stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/

You can also use the following options:

  • -F, --format: The data format (CSV, TSV, or JSON).
  • -C, --compress-type: The compression type (LZOP or GZIP).

For more detailed options, run:

$ stx --help
Usage: stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH

  StreamXfer is a powerful tool for streaming data from SQL Server to local or
  object storage(S3) for seamless transfer using UNIX pipe, supporting various
  general data formats(CSV, TSV, JSON).

  Examples:
      stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/
      stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir/

Options:
  -F, --format [CSV|TSV|JSON]     [default: JSON]
  -C, --compress-type [LZOP|GZIP]
  -h, --help                      Show this message and exit.

Docker Usage

To use StreamXfer in Docker container:

$ docker run --rm -v $(pwd)/data:/tmp/data zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /tmp/data/dbo_test"
$ docker run --rm zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir"

Library Usage

To use StreamXfer as a library in Python, you can import the StreamXfer class, and use them to build and pump the data stream.

Here is an example code snippet:

from streamxfer import StreamXfer
from streamxfer.format import Format
from streamxfer.compress import CompressType

sx = StreamXfer(
    "mssql+pymssql:://user:pass@host:port/db",
    format=Format.CSV,
    compress_type=CompressType.LZOP,
    chunk_size=1000000,
)
sx.build("[dbo].[test]", path="s3://bucket/path/to/dir/")
sx.pump()

Related

Here are some related articles

Authors

License

GPL-3.0