Seems like Node ReGrid can get about 3x more writes than .NET, yielding faster upload wall time. See the image below (credits @buskila):
Test setup (upload only):

- File Size: 1 GB
- Server: RethinkDB, 3 nodes, Linux (Ubuntu 14)
- Client: .NET Core / Linux
- Chunk Size: default
- Batch Size: default (8), also tested at 32
- Both a single connection and connection pooling were tried; no difference.
- Using Stream IO
Suspicion
Too much chunk calculation in the stream upload code. Try to parallelize / simplify some of this, especially when given a `byte[]`. Node's ReGrid upload code is here:
https://github.com/internalfx/regrid/blob/master/lib/upload.js
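For the `byte[]` case specifically, a minimal sketch of what "simpler chunk calculation" could look like (TypeScript for illustration; `sliceIntoChunks` is a hypothetical helper, not the driver's or regrid's actual code):

```typescript
// Hypothetical helper: when the caller already hands over the full payload as
// a byte array, chunk boundaries are just offsets, so no stream accumulation
// is needed. Buffer.subarray returns views over the same memory (no copying).
function sliceIntoChunks(data: Buffer, chunkSizeBytes: number): Buffer[] {
  const chunks: Buffer[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSizeBytes) {
    chunks.push(data.subarray(offset, Math.min(offset + chunkSizeBytes, data.length)));
  }
  return chunks;
}
```

Each chunk view can then be handed straight to whatever issues the insert batches, without re-walking a stream.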
Other notes
This should come after #77 is done.
After some discussion with @internalfx (thanks a bunch), the upload code is using node streams. Node streams info via @buskila:
https://github.com/substack/stream-handbook

> Using .pipe() has other benefits too, like handling backpressure automatically so that node won't buffer chunks into memory needlessly when the remote client is on a really slow or high-latency connection.
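A tiny sketch of that backpressure behaviour (TypeScript; the slow sink and the file name are made up for the example):

```typescript
import { createReadStream } from "fs";
import { Writable } from "stream";

// A deliberately slow destination: each chunk takes 50 ms to "send".
const slowSink = new Writable({
  highWaterMark: 64 * 1024, // small buffer so backpressure kicks in quickly
  write(_chunk, _encoding, callback) {
    setTimeout(callback, 50); // simulate a slow / high-latency client
  },
});

// pipe() pauses the file read whenever slowSink's buffer is full, so node
// never buffers the whole file in memory while waiting on the slow side.
createReadStream("bigfile.bin").pipe(slowSink);
```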
Currently, @internalfx keeps 10 network requests in flight at any given time. Even when network latency is extremely high, node won't write more to the ReGrid API until at least one of those requests completes.
Cool. I think we could do the same: 10 async tasks laying down bytes over a connection pool, and as each network request completes, come back and read more bytes from the source.
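A minimal sketch of that pattern (TypeScript for brevity; `readNextChunk` and `insertChunk` are hypothetical stand-ins for "read the next chunk from the source stream" and "insert one chunk over a pooled connection"):

```typescript
// Keep up to 10 writes in flight, and only read more bytes from the source
// once a slot frees up. The helpers passed in are hypothetical, not the
// driver's real API.
async function uploadWithBoundedConcurrency(
  readNextChunk: () => Promise<Buffer | null>,   // null = end of source
  insertChunk: (chunk: Buffer) => Promise<void>, // one chunk insert
  maxInFlight = 10
): Promise<void> {
  const inFlight = new Set<Promise<void>>();

  for (let chunk = await readNextChunk(); chunk !== null; chunk = await readNextChunk()) {
    const task = insertChunk(chunk).finally(() => inFlight.delete(task));
    inFlight.add(task);

    // Backpressure: once maxInFlight requests are pending, wait for one to
    // complete before reading more bytes from the source.
    if (inFlight.size >= maxInFlight) {
      await Promise.race(inFlight);
    }
  }

  await Promise.all(inFlight); // drain the remaining writes
}
```

In .NET this maps roughly onto a set of Tasks, with `Task.WhenAny` / `Task.WhenAll` playing the role of `Promise.race` / `Promise.all`.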
Other Research Findings
RethinkDB Limitations
Query size (419554663) greater than maximum (134217727).
So batch size can't be too big: the maximum query size is 134,217,727 bytes (~128 MiB), so each insert batch has to stay under roughly 128 MiB.
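Back-of-the-envelope check on what that means for chunks per batch (the 255 KiB chunk size is an assumption about the default, not something stated here):

```typescript
// RethinkDB's limit from the error above, in bytes (~128 MiB).
const MAX_QUERY_BYTES = 134_217_727;

// Assumed default chunk size; swap in whatever the driver actually uses.
const CHUNK_SIZE_BYTES = 255 * 1024;

// Leave headroom for per-chunk metadata that also counts toward query size
// (file id, chunk number, protocol framing, ...).
const HEADROOM = 0.9;

const maxChunksPerBatch = Math.floor((MAX_QUERY_BYTES * HEADROOM) / CHUNK_SIZE_BYTES);
console.log(maxChunksPerBatch); // ≈ 462 chunks per batch under these assumptions
```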