Chunked Object Store
elandau edited this page May 1, 2012 · 9 revisions
Storing large objects in Cassandra has to be done carefully, since large values can cause excessive heap pressure and hot spots. Astyanax provides utility classes that address these issues by splitting a large object across multiple keys and by fetching the resulting chunks in random order to reduce hot spots.
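The idea can be sketched with a plain in-memory map standing in for Cassandra. This is only an illustration of splitting and randomized reassembly, not Astyanax's implementation: the class `ChunkingSketch`, its methods, and the `name$index` key layout are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration only; a real provider would write each chunk under its own Cassandra key. */
final class ChunkingSketch {
    /** Splits data into chunks of at most chunkSize bytes, stored under hypothetical keys "name$index". */
    static Map<String, byte[]> split(String name, byte[] data, int chunkSize) {
        Map<String, byte[]> store = new HashMap<>();
        int index = 0;
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            store.put(name + "$" + index++, Arrays.copyOfRange(data, offset, end));
        }
        return store;
    }

    /** Fetches the chunks in random order and copies each one into its slot in the result. */
    static byte[] reassemble(String name, Map<String, byte[]> store, int chunkSize, int totalSize) {
        int chunkCount = (totalSize + chunkSize - 1) / chunkSize; // ceiling division
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            order.add(i);
        }
        Collections.shuffle(order); // spread reads across keys instead of walking them in sequence
        byte[] result = new byte[totalSize];
        for (int i : order) {
            byte[] chunk = store.get(name + "$" + i);
            System.arraycopy(chunk, 0, result, i * chunkSize, chunk.length);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "a large object, split into small pieces".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> store = split("obj", data, 8);
        byte[] back = reassemble("obj", store, 8, data.length);
        System.out.println(Arrays.equals(data, back));
    }
}
```

Because every chunk has a known offset (index times chunk size), the fetch order does not affect the reassembled result, which is what makes randomized reads safe.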
Storing an object
ChunkedStorageProvider provider = new CassandraChunkedStorageProvider(keyspace, CF_DATA.getName());
ObjectMetadata meta = ChunkedStorage.newWriter(provider, objName, someInputStream)
    .withChunkSize(0x1000) // Optional chunk size to override the default for this provider
    .withConcurrencyLevel(8) // Upload chunks in 8 threads
    .withTtl(60) // Optional TTL for the entire object
    .call();
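To make the chunk size and concurrency level concrete, here is a minimal sketch of what a concurrent chunked upload involves, using only the JDK. It is not the Astyanax writer: `ConcurrentUploadSketch`, `upload`, and the `Map` that stands in for the storage provider are hypothetical, and error handling is reduced to the bare minimum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentUploadSketch {
    /** Reads up to buf.length bytes, looping because a single read() may return fewer. */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /**
     * Cuts the stream into chunkSize pieces and "uploads" each on a pool of
     * `threads` workers. The store must be thread safe. Returns the chunk count.
     */
    static int upload(InputStream in, int chunkSize, int threads, Map<Integer, byte[]> store)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            int index = 0;
            while (true) {
                byte[] buf = new byte[chunkSize];
                int n = fill(in, buf);
                if (n == 0) {
                    break;
                }
                final byte[] chunk = Arrays.copyOf(buf, n); // last chunk may be short
                final int id = index++;
                pending.add(pool.submit(() -> store.put(id, chunk)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every chunk; rethrows the first failure
            }
            return index;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a chunk size of `0x1000` (4096 bytes), a 10,000-byte stream produces three chunks: two full ones and a final chunk of 1,808 bytes.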
Reading an object
ByteArrayOutputStream os = new ByteArrayOutputStream(); // Destination for the reassembled object
ObjectMetadata meta2 = ChunkedStorage.newReader(provider, objName, os)
    .withBatchSize(11) // Randomize the fetch order of chunks within each batch. The batch size should be a small multiple of the number of nodes
    .withRetryPolicy(new ExponentialBackoffWithRetry(250,20)) // Retry policy for when a chunk isn't available yet. This helps in a cross-region setup where replication may be slow
    .withConcurrencyLevel(2) // Download chunks in 2 threads. Be careful here: too many clients with too many threads will overload Cassandra
    .call();
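The batch size comment is easier to follow with a small example of per-batch randomization. The sketch below only computes a fetch order; it performs no I/O and is not taken from the Astyanax source. `BatchOrderSketch` and `fetchOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class BatchOrderSketch {
    /** Returns chunk ids 0..chunkCount-1, shuffled only within consecutive batches of batchSize. */
    static List<Integer> fetchOrder(int chunkCount, int batchSize, Random random) {
        List<Integer> order = new ArrayList<>(chunkCount);
        for (int start = 0; start < chunkCount; start += batchSize) {
            int end = Math.min(start + batchSize, chunkCount);
            List<Integer> batch = new ArrayList<>(end - start);
            for (int id = start; id < end; id++) {
                batch.add(id);
            }
            Collections.shuffle(batch, random); // randomize within this batch only
            order.addAll(batch);
        }
        return order;
    }

    public static void main(String[] args) {
        // 25 chunks in batches of 11: ids 0-10, then 11-21, then 22-24, each group shuffled.
        System.out.println(fetchOrder(25, 11, new Random()));
    }
}
```

Shuffling within a batch keeps reads roughly sequential overall while avoiding the case where many clients request the same chunk, and therefore the same node, at the same moment.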
Use the info reader to determine the object size, for example when sizing a ByteArrayOutputStream that will receive the object.
ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objName).call();
int objectSize = meta.getObjectSize().intValue();
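If the size is reported as a 64-bit value (an assumption here; check the `ObjectMetadata` Javadoc for the actual return type), narrowing it to an `int` can silently overflow for objects larger than 2 GB. A cautious sketch of the conversion, using only the JDK and the hypothetical helper `BufferSizing.bufferFor`:

```java
import java.io.ByteArrayOutputStream;

final class BufferSizing {
    /** Creates a buffer pre-sized for the object, failing loudly if the size does not fit in an int. */
    static ByteArrayOutputStream bufferFor(long objectSize) {
        if (objectSize < 0) {
            throw new IllegalArgumentException("negative object size: " + objectSize);
        }
        // Math.toIntExact throws ArithmeticException instead of wrapping around.
        return new ByteArrayOutputStream(Math.toIntExact(objectSize));
    }
}
```

Objects that large are poor candidates for an in-memory buffer anyway; streaming to a file or another `OutputStream` is the safer choice.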