diff --git a/doc/advanced.rst b/doc/advanced.rst
index fe6ac3bdf..b048f2b29 100644
--- a/doc/advanced.rst
+++ b/doc/advanced.rst
@@ -5,4 +5,5 @@ Advanced features
    :maxdepth: 2
 
    recv-chunk
+   recv-chunk-group
    recv-stats
diff --git a/doc/recv-chunk-group.rst b/doc/recv-chunk-group.rst
new file mode 100644
index 000000000..d64a3fa72
--- /dev/null
+++ b/doc/recv-chunk-group.rst
@@ -0,0 +1,77 @@
+Chunking stream groups
+======================
+
+While the :doc:`recv-chunk` allows for high-bandwidth streams to be received
+with low overhead, it still has a fundamental scaling limitation: each chunk
+can only be constructed from a single thread. :dfn:`Chunk stream groups` allow
+this limitation to be overcome, although not without caveats.
+
+Each stream is still limited to a single thread. However, a :dfn:`group` of
+streams can share the same sequence of chunks, with each stream contributing
+a subset of the data in each chunk. Making use of this feature requires
+that load balancing is implemented at the network level, using different
+destination addresses or ports so that the incoming heaps can be multiplexed
+into multiple streams.
+
+As with a single chunk stream, the group keeps a sliding window of chunks and
+obtains new ones from an allocation callback. When the window slides forward,
+chunks that fall out the back of the window are provided to a ready callback.
+Each member stream also has its own sliding window, which can be smaller (but not
+larger) than the group's window. When the group's window slides forward, the
+streams' windows are adjusted to ensure they still fit within the group's
+window. This can lead to chunks being removed from a stream even though there
+is still data for them in the stream. In other words, a stream's window
+determines how much reordering is tolerated within a stream, while the group's
+window determines how out of sync the streams are allowed to become.
+When choosing window sizes, one needs to remember that desynchronisation isn't
+confined to the network: it can also happen if the threads servicing the
+streams aren't all getting the same amount of CPU time.
+
+The general flow (in C++) is
+
+1. Create a :cpp:class:`~spead2::recv::chunk_stream_group_config`.
+2. Create a :cpp:class:`~spead2::recv::chunk_stream_group`.
+3. Create multiple instances of
+   :cpp:class:`~spead2::recv::chunk_stream_group_member`, each referencing the
+   group.
+4. Add readers to the streams.
+5. Process the data.
+6. Optionally, call :cpp:func:`spead2::recv::chunk_stream_group::stop()`
+   (otherwise it will be called on destruction).
+7. Destroy the member streams (this must be done before destroying the group).
+8. Destroy the group.
+
+In Python the process is similar, although garbage collection replaces
+explicit destruction.
+
+Ringbuffer convenience API
+--------------------------
+As for standalone chunk streams, there is a simplified API using ringbuffers,
+which is also the only API available for Python. A
+:cpp:class:`~spead2::recv::chunk_stream_ring_group` is a group that allocates
+data from one ringbuffer and sends ready data to another. The description of
+:ref:`that API <recv-chunk-ringbuffer>` largely applies here too. The
+ringbuffers can be shared between groups.
+
+Caveats
+-------
+This is an advanced API that sacrifices some user-friendliness for
+performance, and thus some care is needed to use it safely.
+
+- It is vital that all the streams can make forward progress independently,
+  as otherwise deadlocks can occur. For example, if they share a thread pool,
+  the pool must have at least as many threads as streams. It's recommended
+  that each stream has its own single-threaded thread pool.
+- The streams should all be added to the group before adding any readers to
+  the streams. Things will probably work even if this is not done, but the
+  design is sufficiently complicated that it is not advisable.
+- The stream ID associated with each chunk will be the stream ID of one of the
+  component streams, but it is undefined which one.
+- When the allocate and ready callbacks are invoked, it's not specified which
+  stream's batch statistics pointer will be passed. For the ready callback,
+  the `batch_stats` parameter may also be null (currently this can only happen
+  during :cpp:func:`spead2::recv::chunk_stream_group::stop`).
+- Data can be lost, even if the member streams are all lossless, if a stream
+  falls behind the others. A lossless mode may be added in future.
+- Two streams must not write to the same bytes of a chunk (in the payload,
+  present array or extra data), as this is undefined behaviour in C++.
diff --git a/doc/recv-chunk.rst b/doc/recv-chunk.rst
index 654691ebb..6188946a7 100644
--- a/doc/recv-chunk.rst
+++ b/doc/recv-chunk.rst
@@ -105,6 +105,8 @@ At present it is only possible to write a contiguous piece of data per heap.
 The data is transferred to the chunk even if the heap is incomplete (and hence
 not marked in the ``present`` array).
 
+.. _recv-chunk-ringbuffer:
+
 Ringbuffer convenience API
 --------------------------
 A subclass is provided that takes care of the allocation and ready callbacks
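
The interaction between the group window and the per-stream windows described in `doc/recv-chunk-group.rst` may be easier to follow with a toy model. The sketch below is plain Python and does not use spead2 at all: `ToyGroup`, `ToyStream`, `add_heap` and `simulate` are hypothetical names invented for illustration, and the model assumes chunk IDs start at 0 and ignores allocation, locking and payload entirely. It only tracks which chunk IDs are inside each window.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Toy model of a group-wide sliding window (not the spead2 implementation)."""
    max_chunks: int
    head: int = 0   # oldest chunk ID still inside the group window
    tail: int = 0   # one past the newest chunk ID inside the group window
    ready: list = field(default_factory=list)  # chunk IDs passed to the "ready callback"

    def advance_to(self, chunk_id: int) -> None:
        """Slide the window forward, if necessary, so that chunk_id fits."""
        if chunk_id >= self.tail:
            self.tail = chunk_id + 1
            new_head = max(self.head, self.tail - self.max_chunks)
            # Chunks that fall out the back of the window become ready.
            self.ready.extend(range(self.head, new_head))
            self.head = new_head


@dataclass
class ToyStream:
    """Toy member stream; its window must be no larger than the group's."""
    group: ToyGroup
    max_chunks: int
    head: int = 0
    tail: int = 0
    dropped: int = 0  # heaps rejected because their chunk already left the window

    def add_heap(self, chunk_id: int) -> bool:
        # Keep this stream's window inside the group's window.
        self.head = max(self.head, self.group.head)
        if chunk_id < self.head:
            self.dropped += 1
            return False
        if chunk_id >= self.tail:
            self.tail = chunk_id + 1
            self.head = max(self.head, self.tail - self.max_chunks)
            self.group.advance_to(chunk_id)
        return True


def simulate():
    """One stream races ahead while the other falls behind."""
    group = ToyGroup(max_chunks=4)
    fast = ToyStream(group, max_chunks=2)
    slow = ToyStream(group, max_chunks=2)
    slow.add_heap(0)
    for chunk_id in range(6):
        fast.add_heap(chunk_id)
    accepted = slow.add_heap(1)  # chunk 1 has already left the group window
    return group, fast, slow, accepted
```

In this run the fast stream pushes the group window past chunk 1 before the slow stream delivers its heap for that chunk, so the heap is rejected. This is the data-loss caveat in miniature; a larger group window would tolerate more desynchronisation at the cost of keeping more chunks in flight.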