Chunked Object Store

elandau edited this page May 2, 2012 · 9 revisions

Storing large objects in Cassandra has to be done carefully, since it can cause excessive heap pressure and hot spots. Astyanax provides utility classes that address these issues by splitting large objects across multiple keys and fetching the chunks in random order to reduce hot spots.

Creating a provider

Before calling any of the read/write APIs, you must first create a provider. Astyanax ships with a basic Cassandra chunked provider. You can write your own if you'd like to customize it further.

ChunkedStorageProvider provider = new CassandraChunkedStorageProvider(keyspace, "data_column_family_name");

Storing an object

The ObjectWriter breaks the file into chunks and pushes them to Cassandra from multiple threads.

ObjectMetadata meta = ChunkedStorage.newWriter(provider, objName, someInputStream)
    .withChunkSize(0x1000)    // Optional chunk size to override the default for this provider
    .withConcurrencyLevel(8)  // Optional. Upload chunks in 8 threads
    .withTtl(60)              // Optional TTL for the entire object
    .call();
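
To make the chunking step concrete, here is a minimal, self-contained sketch of splitting a stream into fixed-size chunks. It is not the Astyanax implementation: the class `ChunkSplitSketch`, the methods `split` and `chunkKey`, and the `objName$index` key scheme are all assumptions made for illustration, and the real writer uploads chunks as it reads rather than buffering them all in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names come from Astyanax.
class ChunkSplitSketch {

    // Reads the stream to its end and returns the content as chunks of at
    // most chunkSize bytes. Only the final chunk may be shorter.
    static List<byte[]> split(InputStream in, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try {
            while (true) {
                int filled = 0;
                // read() may return fewer bytes than requested, so keep
                // reading until the buffer is full or the stream ends.
                while (filled < chunkSize) {
                    int n = in.read(buffer, filled, chunkSize - filled);
                    if (n < 0) {
                        break;
                    }
                    filled += n;
                }
                if (filled == 0) {
                    break; // nothing left to store
                }
                chunks.add(Arrays.copyOf(buffer, filled));
                if (filled < chunkSize) {
                    break; // short chunk means the stream has ended
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    // Hypothetical key scheme: one key per chunk, derived from the object name.
    static String chunkKey(String objName, int chunkIndex) {
        return objName + "$" + chunkIndex;
    }
}
```

With a 10-byte stream and a chunk size of 4, `split` returns three chunks of 4, 4, and 2 bytes, each of which would be stored under its own key.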

Reading an object

The file is read directly into an OutputStream. The ObjectReader handles parallelizing and randomizing the requests in batches.

// For this example we create a byte array output stream, which requires us to first read
// the object size. You don't need to do this if you are reading into a FileOutputStream.
ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objName).call();
ByteArrayOutputStream os = new ByteArrayOutputStream(meta.getObjectSize().intValue());

// Read the file
meta = ChunkedStorage.newReader(provider, objName, os)
    .withBatchSize(11)        // Fetch order is randomized within each batch of 11 chunks
    // Retry policy for when a chunk isn't available yet. This helps implement
    // retries in a cross-region setup where replication may be slow.
    .withRetryPolicy(new ExponentialBackoffWithRetry(250, 20))
    .withConcurrencyLevel(2)  // Download chunks in 2 threads. Be careful here: too many clients with too many threads will overload Cassandra
    .call();
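
The idea of randomizing requests within a batch can be sketched as follows. This is an illustration under assumptions, not the ObjectReader's actual code: the class `BatchOrderSketch` and the method `fetchOrder` are hypothetical names. Chunk indexes are grouped into consecutive batches and shuffled only inside each batch, so concurrent readers are less likely to hit the same chunk at the same moment while the read still progresses through the object front to back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; none of these names come from Astyanax.
class BatchOrderSketch {

    // Returns every chunk index in [0, chunkCount) exactly once. Indexes are
    // grouped into consecutive batches of batchSize, and the order is
    // shuffled within each batch but never across batches.
    static List<Integer> fetchOrder(int chunkCount, int batchSize, Random random) {
        if (chunkCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("chunkCount must be >= 0 and batchSize > 0");
        }
        List<Integer> order = new ArrayList<>(chunkCount);
        for (int start = 0; start < chunkCount; start += batchSize) {
            int end = Math.min(start + batchSize, chunkCount);
            List<Integer> batch = new ArrayList<>(end - start);
            for (int i = start; i < end; i++) {
                batch.add(i);
            }
            Collections.shuffle(batch, random);
            order.addAll(batch);
        }
        return order;
    }
}
```

For example, `fetchOrder(25, 11, new Random())` yields three batches covering indexes 0 to 10, 11 to 21, and 22 to 24, each in a random internal order. Because chunks arrive out of order, a reader built this way has to write each chunk at its own offset or buffer a batch before flushing it to the output stream.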

Deleting an object

ChunkedStorage.newDeleter(provider, objName).call();

Getting object info

Use this to determine the object size when creating a ByteArrayOutputStream.

ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objName).call();
int objectSize = meta.getObjectSize().intValue();