- Bug fix when reading from a sparse array with real domain. Also added some checks on NAN and INF.
- Bug fix in the case of dense reads in the presence of both dense and sparse fragments.
- Fixed double-delta decompression bug on reads for uncompressible chunks. #1074
- Added an advanced, tunable consolidation algorithm
- Added config params vfs.s3.aws_access_key_id and vfs.s3.aws_secret_access_key for configuring S3 access at runtime. #1036
- Added missing check if coordinates obey the global order in global order sparse writes. #1039
- Small tiles are now batched for larger VFS read operations, improving read performance in some cases.
- Added function tiledb_vfs_dir_size.
- Added function tiledb_vfs_ls.
- Added config params vfs.max_batch_read_size and vfs.max_batch_read_amplification.
- Added functions tiledb_{array,kv}_encryption_type.
- Added functions tiledb_stats_{dump,free}_str.
- Added function tiledb_{array,kv}_schema_has_attribute.
- Added function tiledb_domain_has_dimension.
- {Array,Map}::consolidate{_with_key} now takes a Config as an optional argument.
- Added function VFS::dir_size.
- Added function VFS::ls.
- Added {Array,Map}::encryption_type().
- Added {ArraySchema,MapSchema}::has_attribute()
- Added Domain::has_dimension()
- tiledb_{array,kv}_consolidate{_with_key} now takes a tiledb_config_t* as argument.
- Deprecated tiledb_compressor_t APIs from v1.3.x have been removed, replaced by the tiledb_filter_list API. #1128
- Deprecated tiledb::Compressor APIs from v1.3.x have been removed, replaced by the FilterList API. #1128
- Fixed bug in incomplete queries, which should always return partial results. An incomplete status with 0 returned results must always mean that the buffers cannot even fit a single cell value. #1056
- Fixed performance bug during global order write finalization. #1065
- Fixed error in linking against static TileDB on Windows. #1058
- Fixed build error when building without TBB. #1051
- Set LZ4, Zlib and Zstd compressors to build in release mode. #1034
- Changed coordinates to always be split before filtering. #1054
- Added type-safe filter option methods to C++ API. #1062
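The incomplete-query rule stated above (an incomplete status with zero results means the buffers cannot fit even one cell) suggests a simple client loop. The following is a minimal sketch in plain Python, not TileDB code: `read_into`, `read_all`, the status strings and the fixed `CELL_SIZE` are all hypothetical stand-ins for a real query object.

```python
from typing import List, Tuple

CELL_SIZE = 8  # bytes per cell in this toy model (assumption)


def read_into(cells: List[int], start: int, buffer_bytes: int) -> Tuple[List[int], str]:
    """Toy reader: return as many whole cells as fit, plus a status string."""
    capacity = buffer_bytes // CELL_SIZE
    chunk = cells[start:start + capacity]
    done = start + len(chunk) >= len(cells)
    return chunk, ("COMPLETED" if done else "INCOMPLETE")


def read_all(cells: List[int], buffer_bytes: int) -> List[int]:
    """Client loop: consume partial results, grow the buffer only when stuck."""
    out: List[int] = []
    pos = 0
    while True:
        chunk, status = read_into(cells, pos, buffer_bytes)
        out.extend(chunk)
        pos += len(chunk)
        if status == "COMPLETED":
            return out
        if not chunk:
            # Incomplete with zero results: not even one cell fits,
            # so resubmitting with the same buffer would never progress.
            buffer_bytes = max(1, buffer_bytes * 2)
```

With a 3-byte buffer, `read_all(list(range(10)), 3)` doubles the buffer until one cell fits and then reads the remaining cells one at a time.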
The 1.4.0 release brings two new major features, attribute filter lists and at-rest array encryption, along with bugfixes and performance improvements.
Note: TileDB v1.4.0 changes the on-disk array format. Existing arrays should be re-written using TileDB v1.4.0 before use. Starting from v1.4.0 and on, backwards compatibility for reading old-versioned arrays will be fully supported.
- All array data can now be encrypted at rest using AES-256-GCM symmetric encryption. #968
- Negative and real-valued domain types are now fully supported. #885
- New filter API for transforming attribute data with an ordered list of filters. #912
- Current filters include: previous compressors, bit width reduction, bitshuffle, byteshuffle, and positive-delta encoding.
- The bitshuffle filter uses an implementation by Kiyoshi Masui.
- The byteshuffle filter uses an implementation by Francesc Alted (from the Blosc project).
- Arrays can now be opened at specific timestamps. #984
- The C and C++ APIs for compression have been deprecated. The corresponding filter API should be used instead. The compression API will be removed in a future TileDB version. #1008
- Removed Blosc compressors (obviated by byteshuffle -> compressor filter list).
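The "byteshuffle -> compressor" filter list mentioned above can be illustrated with a small sketch. It assumes that filters run in list order on write and in reverse order on read; the byteshuffle here is a naive pure-Python version, zlib stands in for the compressor, and none of the names below are TileDB APIs.

```python
import struct
import zlib


def byteshuffle(data: bytes, width: int) -> bytes:
    """Group the i-th byte of every `width`-byte value together."""
    n = len(data) // width
    body = bytes(data[j * width + i] for i in range(width) for j in range(n))
    return body + data[n * width:]  # trailing bytes pass through unchanged


def byteunshuffle(data: bytes, width: int) -> bytes:
    """Inverse of byteshuffle for the same `width`."""
    n = len(data) // width
    body = bytes(data[i * n + j] for j in range(n) for i in range(width))
    return body + data[n * width:]


# Ordered list of (forward, reverse) pairs: byteshuffle first, then compress.
FILTERS = [
    (lambda b: byteshuffle(b, 4), lambda b: byteunshuffle(b, 4)),
    (zlib.compress, zlib.decompress),
]


def write_path(data: bytes) -> bytes:
    """Apply every filter in list order."""
    for forward, _ in FILTERS:
        data = forward(data)
    return data


def read_path(data: bytes) -> bytes:
    """Undo the filters in reverse order."""
    for _, reverse in reversed(FILTERS):
        data = reverse(data)
    return data


values = struct.pack("<1000i", *range(1000))
restored = read_path(write_path(values))
```

Shuffling places the mostly-zero high bytes of small integers next to each other, which is why a shuffle step in front of a general-purpose compressor often helps.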
- Fix issue where performing a read query with empty result could cause future reads to return empty #882
- Fix TBB initialization bug with multiple contexts #898
- Fix bug in max buffer sizes estimation #903
- Fix Buffer allocation size being incorrectly set on realloc #911
- Added check if the coordinates fall out-of-bounds (i.e., outside the array domain) during sparse writes, and added config param sm.check_coord_oob to enable/disable the check (enabled by default). #996
- Add config params sm.num_reader_threads and sm.num_writer_threads for separately controlling I/O parallelism from compression parallelism.
- Added contribution guidelines #899
- Enable building TileDB in Cygwin environment on Windows #890
- Added a simple benchmarking script and several benchmark programs #889
- Changed C API and disk format integer types to have explicit bit widths. #981
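The out-of-bounds coordinate check listed above can be pictured as follows. This is a minimal sketch under stated assumptions, not TileDB's implementation: the function name is hypothetical, the domain is modelled as one inclusive (low, high) pair per dimension, and the error type is arbitrary.

```python
from typing import Sequence, Tuple

Domain = Sequence[Tuple[float, float]]  # one inclusive (low, high) pair per dimension


def check_coords_in_domain(coords: Sequence[Sequence[float]], domain: Domain) -> None:
    """Raise ValueError for the first cell that lies outside the domain."""
    for idx, cell in enumerate(coords):
        if len(cell) != len(domain):
            raise ValueError(f"cell {idx}: expected {len(domain)} coordinates")
        for dim, (value, (low, high)) in enumerate(zip(cell, domain)):
            # NaN fails this comparison too, so it is rejected as out of bounds.
            if not (low <= value <= high):
                raise ValueError(
                    f"cell {idx}: coordinate {value} is outside "
                    f"[{low}, {high}] on dimension {dim}"
                )
```

A write path that honours a switch such as sm.check_coord_oob would run a check like this before accepting the cells and skip it when the parameter is disabled.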
- Added tiledb_{array,kv}_open_at, tiledb_{array,kv}_open_at_with_key and tiledb_{array,kv}_reopen_at.
- Added tiledb_{array,kv}_get_timestamp().
- Added tiledb_kv_is_open
- Added tiledb_filter_t, tiledb_filter_type_t, tiledb_filter_option_t, and tiledb_filter_list_t types
- Added tiledb_filter_* and tiledb_filter_list_* functions.
- Added tiledb_attribute_{set,get}_filter_list, tiledb_array_schema_{set,get}_coords_filter_list, tiledb_array_schema_{set,get}_offsets_filter_list functions.
- Added tiledb_query_get_buffer and tiledb_query_get_buffer_var.
- Added tiledb_array_get_uri
- Added tiledb_encryption_type_t
- Added tiledb_array_create_with_key, tiledb_array_open_with_key, tiledb_array_schema_load_with_key, tiledb_array_consolidate_with_key
- Added tiledb_kv_create_with_key, tiledb_kv_open_with_key, tiledb_kv_schema_load_with_key, tiledb_kv_consolidate_with_key
- Added encryption overloads for Array(), Array::open(), Array::create(), ArraySchema(), Map(), Map::open(), Map::create() and MapSchema().
- Added Array::timestamp() and Array::reopen_at() methods.
- Added Filter and FilterList classes
- Added Attribute::filter_list(), Attribute::set_filter_list(), ArraySchema::coords_filter_list(), ArraySchema::set_coords_filter_list(), ArraySchema::offsets_filter_list(), ArraySchema::set_offsets_filter_list() functions.
- Added overloads for Array(), Array::open(), Map(), Map::open() for handling timestamps.
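To make "opening an array at a specific timestamp" concrete, here is a toy model in plain Python. It assumes that each write produces a timestamped fragment and that a reader opened at time t sees only fragments written at or before t, with later fragments overriding earlier ones; the `Fragment` type and `view_at` function are hypothetical and are not part of the TileDB API.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, Dict[int, str]]  # (write timestamp, {cell index: value})


def view_at(fragments: List[Fragment], timestamp: int) -> Dict[int, str]:
    """Merge, oldest first, every fragment written at or before `timestamp`."""
    view: Dict[int, str] = {}
    for ts, cells in sorted(fragments, key=lambda frag: frag[0]):
        if ts <= timestamp:
            view.update(cells)  # newer fragments overwrite older cells
    return view


fragments = [(10, {1: "a", 2: "b"}), (20, {2: "c"})]
early = view_at(fragments, 15)
late = view_at(fragments, 20)
```

In this model, `early` still shows the original value for cell 2, while `late` shows the overwritten one.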
- Removed Blosc compressors.
- Removed tiledb_kv_set_max_buffered_items.
- Modified tiledb_kv_open to not take an attribute subselection, but instead take as input the query type (similar to arrays). This makes the key-value store behave similarly to arrays, which means that the key-value store does not support interleaved reads/writes any more (an opened key-value store is used either for reads or writes, but not both).
- tiledb_kv_close does not flush the written items. Instead, tiledb_kv_flush must be invoked explicitly.
- Removed Blosc compressors.
- Removed Map::set_max_buffered_items.
- Modified Map::Map and Map::open to not take an attribute subselection, but instead take as input the query type (similar to arrays). This makes the key-value store behave similarly to arrays, which means that the key-value store does not support interleaved reads/writes any more (an opened key-value store is used either for reads or writes, but not both).
- Map::close does not flush the written items. Instead, Map::flush must be invoked explicitly.
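The key-value store semantics described above (a store is opened for reads or for writes, and closing does not flush) can be sketched with a toy class. `ToyKV` is hypothetical and only mimics the behaviour in memory. Discarding unflushed items on close is an assumption made for illustration; the notes only say that close does not flush.

```python
class ToyKV:
    """In-memory stand-in for a key-value store with explicit open modes."""

    def __init__(self) -> None:
        self._stored = {}    # items visible to readers
        self._pending = {}   # written but not yet flushed
        self._mode = None

    def open(self, mode: str) -> None:
        if mode not in ("READ", "WRITE"):
            raise ValueError("mode must be 'READ' or 'WRITE'")
        self._mode = mode

    def put(self, key, value) -> None:
        if self._mode != "WRITE":
            raise RuntimeError("store is not open for writes")
        self._pending[key] = value

    def get(self, key):
        if self._mode != "READ":
            raise RuntimeError("store is not open for reads")
        return self._stored.get(key)

    def flush(self) -> None:
        self._stored.update(self._pending)
        self._pending.clear()

    def close(self) -> None:
        # Close does not flush; this toy drops unflushed items (assumption).
        self._pending.clear()
        self._mode = None


kv = ToyKV()
kv.open("WRITE")
kv.put("a", 1)
kv.flush()       # "a" becomes durable
kv.put("b", 2)   # never flushed
kv.close()
kv.open("READ")
```

After reopening for reads, only the flushed key is visible, and any `put` fails because the store is in read mode.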
- Fix read query bug from multiple fragments when query layout differs from array layout #869
- Fix error when consolidating empty arrays #861
- Fix bzip2 external project URL #875
- Invalidate cached buffer sizes when query subarray changes #882
- Add check to ensure tile extent greater than zero #866
- Add TILEDB_INSTALL_LIBDIR CMake option #858
- Remove TILEDB_USE_STATIC_* CMake variables from build #871
- Allow HDFS init to succeed even if libhdfs is not found #873
- Add missing checks when setting subarray for sparse writes #843
- Fix dl linking build issue for C/C++ examples on Linux #844
- Add missing type checks for C++ api Query object #845
- Add missing check that coordinates are provided for sparse writes #846
Version 1.3.0 focused on performance, stability, documentation and API improvements/enhancements.
- New guided tutorial series added to documentation.
- Query times improved dramatically with internal parallelization using TBB (multiple PRs)
- Optional deduplication pass on writes can be enabled #636
- New internal statistics reporting system to aid in performance optimization #736
- Added string type support: ASCII, UTF-8, UTF-16, UTF-32, UCS-2, UCS-4 #415
- Added TILEDB_ANY datatype #446
- Added parallelized VFS read operations, enabled by default #499
- SIGINT signals will cancel in-progress queries #578
- Arrays must now be opened before issuing queries and closed afterwards, which clarifies the TileDB consistency model.
- Improved handling of incomplete queries and variable-length attribute data.
- Added parallel S3, POSIX, and Win32 reads and writes, enabled by default.
- Query performance improvements with parallelism (using TBB as a dependency).
- Got rid of special S3 "directory objects."
- Refactored sparse reads, making them simpler and more amenable to parallelization.
- Refactored dense reads, making them simpler and more amenable to parallelization.
- Refactored dense ordered writes, making them simpler and more amenable to parallelization.
- Refactored unordered writes, making them simpler and more amenable to parallelization.
- Refactored global writes, making them simpler and more amenable to parallelization.
- Added ability to cancel pending background/async tasks. SIGINT signals now cancel pending tasks.
- Async queries now use a configurable number of background threads (default number of threads is 1).
- Added checks for duplicate coordinates and option for coordinate deduplication.
- Map usage via the C++ API operator[] is faster, similar to the MapItem path.
- Fixed bugs with reads/writes of variable-sized attributes.
- Fixed file locking issue with simultaneous queries.
- Fixed S3 issues with simultaneous queries within the same context.
- Added tiledb_array_alloc
- Added tiledb_array_{open, close, free}
- Added tiledb_array_reopen
- Added tiledb_array_is_open
- Added tiledb_array_get_query_type
- Added tiledb_array_get_schema
- Added tiledb_array_max_buffer_size and tiledb_array_max_buffer_size_var
- Added tiledb_query_finalize function.
- Added tiledb_ctx_cancel_tasks function.
- Added tiledb_query_set_buffer and tiledb_query_set_buffer_var which sets a single attribute buffer
- Added tiledb_query_get_type
- Added tiledb_query_has_results
- Added tiledb_vfs_get_config function.
- Added tiledb_stats_{enable,disable,reset,dump} functions.
- Added tiledb_kv_alloc
- Added tiledb_kv_reopen
- Added tiledb_kv_has_key to check if a key exists in the key-value store.
- Added tiledb_kv_free.
- Added tiledb_kv_iter_alloc which takes as input a kv object
- Added tiledb_kv_schema_{set,get}_capacity.
- Added tiledb_kv_is_dirty
- Added tiledb_kv_iter_reset
- Added sm.num_async_threads, sm.num_tbb_threads, and sm.enable_signal_handlers config parameters.
- Added sm.check_dedup_coords and sm.dedup_coords config parameters.
- Added vfs.num_threads and vfs.min_parallel_size config parameters.
- Added vfs.{s3,file}.max_parallel_ops config parameters.
- Added vfs.s3.multipart_part_size config parameter.
- Added vfs.s3.proxy_{scheme,host,port,username,password} config parameters.
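The duplicate-coordinate handling added in this release (sm.check_dedup_coords and sm.dedup_coords) can be illustrated with a short sketch. The semantics are assumed: one switch rejects writes containing duplicates, the other silently drops them. Which duplicate survives is also an assumption (the first occurrence here). The function names are hypothetical and attribute values are omitted for brevity.

```python
from typing import Hashable, List, Sequence


def find_duplicates(coords: Sequence[Hashable]) -> List[Hashable]:
    """Return every coordinate that repeats an earlier one."""
    seen = set()
    dups = []
    for coord in coords:
        if coord in seen:
            dups.append(coord)
        else:
            seen.add(coord)
    return dups


def prepare_coords(
    coords: Sequence[Hashable],
    check_dedup_coords: bool = True,
    dedup_coords: bool = False,
) -> List[Hashable]:
    """Either drop duplicates, reject them, or pass the input through."""
    if dedup_coords:
        # dict.fromkeys keeps the first occurrence and preserves order.
        return list(dict.fromkeys(coords))
    if check_dedup_coords and find_duplicates(coords):
        raise ValueError("duplicate coordinates in sparse write")
    return list(coords)
```

For example, `prepare_coords([(1, 1), (2, 2), (1, 1)], dedup_coords=True)` keeps one copy of `(1, 1)`, while the default settings raise an error for the same input.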
- Added Array::{open, close}
- Added Array::reopen
- Added Array::is_open
- Added Array::query_type
- Added Context::cancel_tasks() function.
- Added Query::finalize() function.
- Added Query::query_type
- Added Query::has_results
- Changed the return type of the Query setters to return the object reference.
- Added an extra Query constructor that omits the query type (this is inherited from the input array).
- Added Map::{open, close}
- Added Map::reopen
- Added Map::is_dirty
- Added Map::has_key to check for key presence.
- A tiledb::Map defined with only one attribute will allow implicit usage, e.g. map[key] = val instead of map[key][attr] = val.
- Added optional attributes argument in Map::Map and Map::open
- MapIter can be used to create iterators for a map.
- Added MapIter::reset
- Added MapSchema::set_capacity and MapSchema::capacity
- Support for trivially copyable objects, such as a custom data struct, was added. They will be backed by a sizeof(T)-sized char attribute.
- Attribute::create<T> can now be used with compound T, such as std::string and std::vector<T>, and other objects such as a simple data struct.
- Added a Dimension::create factory function that does not take tile extent, which sets the tile extent to NULL.
- tiledb::Attribute can now be constructed with an enumerated type (e.g. TILEDB_CHAR).
- Added Stats class (wraps C API tiledb_stats_* functions)
- Added Config::save_to_file
- tiledb_query_finalize must always be called before tiledb_query_free after global-order writes.
- Removed tiledb_vfs_move and added tiledb_vfs_move_file and tiledb_vfs_move_dir instead.
- Removed force argument from tiledb_vfs_move_* and tiledb_object_move.
- Removed vfs.s3.file_buffer_size config parameter.
- Removed tiledb_query_get_attribute_status.
- All tiledb_*_free functions now return void and do not take ctx as input (except for tiledb_ctx_free).
- Changed signature of tiledb_kv_close to take a tiledb_kv_t* argument instead of tiledb_kv_t**.
- Renamed tiledb_domain_get_rank to tiledb_domain_get_ndim to avoid confusion with the matrix definition of rank.
- Changed signature of tiledb_array_get_non_empty_domain.
- Removed tiledb_array_compute_max_read_buffer_sizes.
- Changed signature of tiledb_{array,kv}_open.
- Removed tiledb_kv_iter_create
- Renamed all C API functions that create TileDB objects from tiledb_*_create to tiledb_*_alloc.
- Removed tiledb_query_set_buffers
- Removed tiledb_query_reset_buffers
- Added query type argument to tiledb_array_open
- Changed argument order in tiledb_config_iter_alloc, tiledb_ctx_alloc, tiledb_attribute_alloc, tiledb_dimension_alloc, tiledb_array_schema_alloc, tiledb_kv_schema_load, tiledb_kv_get_item, tiledb_vfs_alloc
- Fixes with Array::max_buffer_elements and Query::result_buffer_elements to comply with the API docs. pair.first is the number of elements of the offsets buffer. If pair.first is 0, it is a fixed-sized attribute or coordinates.
- std::array<T, N> is backed by a char tiledb attribute since the size is not guaranteed.
- Headers have the tiledb_cpp_api_ prefix removed. For example, the include is now #include <tiledb/attribute.h>
- Removed VFS::move and added VFS::move_file and VFS::move_dir instead.
- Removed force argument from VFS::move_* and Object::move.
- Removed vfs.s3.file_buffer_size config parameter.
- Query::finalize must always be called before going out of scope after global-order writes.
- Removed Query::attribute_status.
- The API was made header only to improve cross-platform compatibility. config_iter.h, filebuf.h, map_item.h, map_iter.h, and map_proxy.h are no longer available, but grouped into the headers of the objects they support.
- Previously, when a tiledb::Map was created from a std::map, an anonymous attribute name was defined. This must now be explicitly defined: tiledb::Map::create(tiledb::Context, std::string uri, std::map, std::string attr_name)
- Removed tiledb::Query::reset_buffers. Any previous usages can safely be removed.
- Map::begin refers to the same iterator object. For multiple concurrent iterators, a MapIter should be manually constructed instead of using Map::begin() more than once.
- Renamed Domain::rank to Domain::ndim to avoid confusion with the matrix definition of rank.
- Added query type argument to Array constructor
- Removed iterator functionality from Map.
- Removed Array::parition_subarray.
- Fix I/O bug on POSIX systems with large reads/writes (#467)
- Memory overflow error handling (moved from constructors to init functions) (#472)
- Memory leaks with realloc in case of error (#472)
- Handle non-existent config param in C++ API (#475)
- Read query overflow handling (#485)
- Changed S3 default config so that AWS S3 just works (#455)
- Minor S3 optimizations and error message fixes (#462)
- Documentation additions including S3 usage (#456, #458, #459)
- Various CI improvements (#449)
- Fixed TileDB header includes for all examples (#409)
- Fixed TileDB library dynamic linking problem for C++ API (#412)
- Fixed VS2015 build errors (#424)
- Bug fix in the sparse case (#434)
- Bug fix for 1D vector query layout (#438)
- Added documentation to API and examples (#410, #414)
- Migrated docs to Readthedocs (#418, #420, #422, #423, #425)
- Added dimension domain/tile extent checks (#429)
The 1.2.0 release of TileDB includes many new features, improvements in stability and performance, and two new language interfaces (Python and C++). There are also several breaking changes in the C API and on-disk format, documented below.
Important Note: due to several improvements and changes in the underlying array storage mechanism, you will need to recreate any existing arrays in order to use them with TileDB v1.2.0.
- Windows support. TileDB is now fully supported on Windows systems (64-bit Windows 7 and newer).
- Python API. We are very excited to announce the initial release of a Python API for TileDB. The Python API makes TileDB accessible to a much broader audience, allowing seamless integration with existing Python libraries such as NumPy, Pandas and the scientific Python ecosystem.
- C++ API. We've included a C++ API, which allows TileDB to integrate into modern C++ applications without having to write code towards the C API. The C++ API is more concise and provides additional compile time type safety.
- S3 object store support. You can now easily store, query, and manipulate your TileDB arrays on S3 API compatible object stores, including Amazon's AWS S3 service.
- Virtual filesystem interface. The TileDB API now exposes a virtual filesystem (or VFS) interface, allowing you to perform tasks such as file creation, deletion, reads, and appends without worrying about whether your files are stored on S3, HDFS, a POSIX or Windows filesystem, etc.
- Key-value store. TileDB now provides a key-value (meta) data storage abstraction. Its implementation is built upon TileDB's sparse arrays and inherits all the properties of TileDB sparse arrays.
- Homebrew formula added for easier installation on macOS. Homebrew is now the preferred method for installing TileDB and its dependencies on macOS.
- Docker images updated to include stable/unstable/dev options, and easy configuration of additional components (e.g. S3 support).
- Tile cache implemented, which will greatly speed up repeated queries on overlapping regions of the same array.
- Ability to pass runtime configuration arguments to TileDB/VFS backends.
- Unnamed (or "anonymous") dimensions are now supported; having a single anonymous attribute is also supported.
- Concurrency bugfixes for several compressors.
- Correctness issue fixed in double-delta compressor for some datatypes.
- Better build behavior on systems with older GCC or CMake versions.
- Several memory leaks and overruns fixed with help of sanitizers.
- Many improved error condition checks and messages for easier debugging.
- Many other small bugs and API inconsistencies fixed.
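The tile cache noted above speeds up repeated queries by keeping recently read tiles in memory. The sketch below shows one way such a cache could work, assuming a least-recently-used eviction policy and a byte-size budget. Neither is stated in these notes, and `TileCache` is a hypothetical class, not TileDB's.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class TileCache:
    """Byte-bounded cache that evicts the least recently used tile first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._tiles: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        tile = self._tiles.get(key)
        if tile is not None:
            self._tiles.move_to_end(key)  # mark as most recently used
        return tile

    def put(self, key: Hashable, tile: bytes) -> None:
        if len(tile) > self.capacity:
            return  # a tile larger than the whole cache is never stored
        if key in self._tiles:
            self.used -= len(self._tiles.pop(key))
        self._tiles[key] = tile
        self.used += len(tile)
        while self.used > self.capacity:
            _, evicted = self._tiles.popitem(last=False)  # drop the oldest entry
            self.used -= len(evicted)


cache = TileCache(capacity_bytes=8)
cache.put("a", b"1234")
cache.put("b", b"5678")
cache.get("a")           # touch "a" so "b" becomes the eviction candidate
cache.put("c", b"abcd")  # exceeds the budget and evicts "b"
```

A second query over an overlapping region would then find tile "a" in memory and skip the storage read for it.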
- tiledb_config_*: Types and functions related to the new configuration object and functionality.
- tiledb_config_iter_*: Iteration functionality for retrieving parameters/values from the new configuration object.
- tiledb_ctx_get_config(): Function to get a configuration from a context.
- tiledb_filesystem_t: Filesystem type enum.
- tiledb_ctx_is_supported_fs(): Function to check for support for a given filesystem backend.
- tiledb_error_t, tiledb_error_message() and tiledb_error_free(): Type and functions for TileDB error messages.
- tiledb_ctx_get_last_error(): Function to get last error from context.
- tiledb_domain_get_rank(): Function to retrieve number of dimensions in a domain.
- tiledb_domain_get_dimension_from_index() and tiledb_domain_get_dimension_from_name(): Replaces dimension iterators.
- tiledb_dimension_{create,free,get_name,get_type,get_domain,get_tile_extent}(): Functions related to creation and manipulation of tiledb_dimension_t objects.
- tiledb_array_schema_set_coords_compressor(): Function to set the coordinates compressor.
- tiledb_array_schema_set_offsets_compressor(): Function to set the offsets compressor.
- tiledb_array_schema_get_attribute_{num,from_index,from_name}(): Replaces attribute iterators.
- tiledb_query_create(): Replaced many arguments with new tiledb_query_set_*() setter functions.
- tiledb_array_get_non_empty_domain(): Function to retrieve the non-empty domain from an array.
- tiledb_array_compute_max_read_buffer_sizes(): Function to compute an upper bound on the buffer sizes required for a read query.
- tiledb_object_ls(): Function to visit the children of a path.
- tiledb_uri_to_path(): Function to convert a file:// URI to a platform-native path.
- TILEDB_MAX_PATH and tiledb_max_path(): The maximum length for tiledb resource paths.
- tiledb_kv_*: Types and functions related to the new key-value store functionality.
- tiledb_vfs_*: Types and functions related to the new virtual filesystem (VFS) functionality.
- Rename tiledb_array_metadata_t -> tiledb_array_schema_t, and associated tiledb_array_metadata_* functions to tiledb_array_schema_*.
- Remove tiledb_attribute_iter_t.
- Remove tiledb_dimension_iter_t.
- Rename tiledb_delete(), tiledb_move(), tiledb_walk() to tiledb_object_{delete,move,walk}().
- tiledb_ctx_create: Config argument added.
- tiledb_domain_create: Datatype argument removed.
- tiledb_domain_add_dimension: Name, domain and tile extent arguments replaced with single tiledb_dimension_t argument.
- tiledb_query_create(): Replaced many arguments with new tiledb_query_set_*() setter functions.
- tiledb_array_create(): Added array URI argument.
- tiledb_*_free(): All free functions now take a pointer to the object pointer instead of simply object pointer.
- The include files are now installed into a tiledb folder. The correct path is now #include <tiledb/tiledb.h> (or #include <tiledb/tiledb> for the C++ API).
- Support for moving resources across previous VFS backends (local fs <-> HDFS) has been removed. A more generic implementation for this functionality with improved performance is planned for the next version of TileDB.