Skip to content

RD-Connect/TileDB

 
 

Repository files navigation

TileDB logo

Travis Appveyor Documentation Status

Array data management made fast and easy.

TileDB allows you to manage the massive dense and sparse multi-dimensional array data that frequently arise in many important scientific applications.

What is TileDB?

TileDB is an efficient multi-dimensional array management system which introduces a novel format that can effectively store dense and sparse array data with support for fast updates and reads. It features excellent compression, an efficient parallel I/O system for high scalability, and high-level APIs including Python, R, Golang and more.

TileDB stores your array data on persistent storage locally or in the cloud, with built-in support for S3 and HDFS storage backends.

TileDB works on Linux, macOS, and Windows, and is open-sourced under the permissive MIT License.

Learn more and see examples and tutorials in the official documentation.

Features

  • Novel Format. TileDB introduces a novel multi-dimensional array format that effectively handles both dense and sparse data with fast updates. Contrary to other popular systems like HDF5 that are optimized mostly for dense arrays, TileDB is optimized for both dense and sparse arrays, exposing a unified API. TileDB enables efficient updates through its concept of immutable, append-only "fragments."
  • Multiple Backends. Transparently store and access your arrays on AWS S3 (or other S3 compatiable store) or HDFS with a single API.
  • Compression. Achieve high compression ratios with TileDB's tile-based compression approach. TileDB can compress array data with a growing number of compressors, such as GZIP, BZIP2, LZ4, ZStandard, double-delta and run-length encoding.
  • Parallelism. Use every core with TileDB's parallelized I/O and compression systems (using Intel TBB), and build powerful parallel analytics on top of the TileDB array storage manager (e.g., using OpenMP or MPI) leveraging TileDB's thread-/process-safety.
  • Portability. TileDB works on Linux, macOS and Windows, offering easy installation packages, binaries and Docker containerization.
  • Language Bindings. Enable your NumPy data science applications to work with immense amounts of data using TileDB's Python API. Other APIs include Golang, R and Java.
  • Key-value Store. Store any persistent metadata with TileDB's key-value storage functionality. A TileDB key-value store inherits all the benefits of TileDB arrays such as compression, parallelism, and multiple backend support.
  • Virtual Filesystem. Add general file management and I/O to your applications for any supported storage backend using TileDB's unified "virtual filesystem" (VFS) API.

Quickstart

First, grab a TileDB release for your system.

  • Homebrew (macOS): brew install tiledb-inc/stable/tiledb
  • Docker (Linux/macOS): docker pull tiledb/tiledb && docker run -it tiledb/tiledb
  • Conda (Linux/macOS/Windows): conda install -c conda-forge tiledb

For Windows, you can also download the pre-built binaries from a release .zip file: https://github.com/TileDB-Inc/TileDB/releases. For more in-depth installation information, see the full Installation doc page.

Next, save the quickstart example program to a file quickstart_sparse.cc.

The example program illustrates the three main TileDB operations: array creation (create_array() in the example code), writing data into the array (write_array()), and reading from the array (read_array()).

It will first create the array with a simple sparse 2D schema where each cell can store a single character of data. Then, it will write data to 3 cells of the array. Finally, it will read back the cells using a spatial slice.

Compile the example program: g++ -std=c++11 quickstart_sparse.cc -o quickstart_sparse -ltiledb.

If you run into compilation issues, see the Usage page for more complete instructions on how to compile and link against TileDB. If you are on Windows, use the Windows Usage instructions to create a Visual Studio project instead.

Run the example, and you should see the following output:

$ ./quickstart_sparse
Created array my_array
Cell (0,0) has data 'a'
Cell (1,1) has data 'b'
Cell (2,3) has data 'c'

For a full walkthrough of the quickstart program, check out the Quickstart documentation. Also check out the corresponding dense quickstart and key-value store quickstart programs on the main documentation site docs.tiledb.io.

Further reading

Start with the beginner tutorial series on dense and sparse arrays to learn the basic TileDB concepts.

The full TileDB documentation can be found at docs.tiledb.io and includes many more tutorials and examples to get you started.

Get involved

TileDB is an open source project and welcomes all forms of contributions. Contributors to the project should read over the contribution docs for more information.

We'd love to hear from you. Drop us a line at hello@tiledb.io, visit our forum or contact form, or follow us on Twitter to stay informed of updates and news.

Packages

No packages published

Languages

  • C++ 91.0%
  • C 4.7%
  • CMake 3.4%
  • Other 0.9%