some improvements to the documentation
mdorier committed Mar 5, 2024
1 parent 379419e commit bff2496
Showing 5 changed files with 50 additions and 48 deletions.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -15,10 +15,10 @@ RPC and RDMA library and a high level of on-node concurrency using
`Argobots <https://www.argobots.org/>`_.

Mofka provides a C++ and a Python interface. One of its particularities is that it
splits events into two parts: a **data** part, referencing potentially large, raw data,
which Mofka will try its best not to copy more than necessary (e.g., by relying on
RDMA to transfer it directly from a client application's memory to a storage device
on servers) and a **metadata** part, which consists of structured information about
the data (usually expressed in JSON). Doing so allows Mofka to store each part
independently, batch metadata together, and allow an event to reference (a subset of)
the data of another event. This interface is also often more adapted to HPC applications,
16 changes: 8 additions & 8 deletions docs/usage/consumer.rst
@@ -41,18 +41,18 @@ A consumer can be created with five parameters, four of which are optional.
send batches as soon as possible but will increase the batch size if the consumer is not
responding fast enough.

* **Data selector**: the consumer first receives the metadata part of an event and runs
the user-provided data selector function on the metadata to know whether the data should
be pulled. This function takes the metadata part of the event as well as a :code:`DataDescriptor`
instance. The latter is an opaque key that Mofka can use to locate the actual data.
The above code is an example of a data selector that tells the consumer to pull the data
only if the *"energy"* field in the metadata is greater than 20. It does so by returning
the provided :code:`DataDescriptor` when that condition holds, and by returning
:code:`mofka::DataDescriptor::Null()` when it does not. The data selector could also tell Mofka to pull
*only a subset of an event's data*. More on this in the :ref:`Data descriptors` section.
A sketch of this callback, together with the data broker described below, follows this list.

* **Data broker**: if the data selector returned a non-null :code:`DataDescriptor`, the user-provided
data broker function is invoked by the consumer. This function takes the event's metadata
and the :code:`DataDescriptor` returned by the data selector, and must return a :code:`mofka::Data`
object pointing to the location in memory where the application wishes for the data to be placed.
This memory could be non-contiguous, it could be allocated by the data broker or it could point to
@@ -82,15 +82,15 @@ we can pull the events out of the consumer. The following code shows how to do this.
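The example referenced above is collapsed in this diff view. Below is a rough sketch of what
the two callbacks described in the list and the pull loop can look like. Only :code:`pull()`,
:code:`wait()`, :code:`acknowledge()`, and :code:`DataDescriptor::Null()` come from this page;
the callback signatures, the :code:`json()` and :code:`size()` accessors, and the
:code:`Data` constructor are assumptions.

.. code-block:: cpp

   // Sketch only: signatures and accessors marked as assumed are not taken
   // verbatim from the Mofka API.
   auto data_selector = [](const mofka::Metadata& metadata,
                           const mofka::DataDescriptor& descriptor) -> mofka::DataDescriptor {
       // Pull the data only when the "energy" field exceeds 20
       // (the json() accessor is an assumption).
       if(metadata.json()["energy"] > 20)
           return descriptor;
       return mofka::DataDescriptor::Null();
   };

   auto data_broker = [](const mofka::Metadata& metadata,
                         const mofka::DataDescriptor& descriptor) -> mofka::Data {
       // Allocate a contiguous buffer for the selected data
       // (the size() accessor and the Data constructor are assumptions).
       auto buffer = new char[descriptor.size()];
       return mofka::Data{buffer, descriptor.size()};
   };

   // Pull, process, and acknowledge events.
   while(true) {
       mofka::Future<mofka::Event> future = consumer.pull(); // non-blocking
       mofka::Event event = future.wait(); // blocks until an event is available
       // ... process the event's metadata and data ...
       event.acknowledge(); // events up to this one won't be re-sent on restart
       // Free the buffer allocated by the data broker once done with it.
   }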
:code:`consumer.pull()` is a non-blocking function that returns a
:code:`mofka::Future<Event>` that can be tested for completion and waited on.
Waiting on the future gets us a :code:`mofka::Event` instance which contains the
event's metadata and data.

The call to :code:`event.acknowledge()` tells the Mofka partition manager that
all the events in the partition up to this one have been processed by this consumer
and should not be sent again, should the consumer restart.

.. note::

In this example we have allocated the memory for the data in our data broker function,
so we need to free it when we no longer need it.

.. group-tab:: Python
40 changes: 21 additions & 19 deletions docs/usage/producer.rst
@@ -5,7 +5,7 @@ Applications that need to produce events into one or more topics will need
to create a :code:`Producer` instance. This object is an interface to produce
events into a designated topic. It will internally run the Validator, Partition
selector, and Serializer on the events it is passed, in order to validate each event's
metadata and data, select a destination partition for each event, and serialize
the event's metadata into batches aimed at the same partition.

.. note::
@@ -70,8 +70,8 @@ A producer can be created with four optional parameters.
Producing events
----------------

As explained earlier, Mofka splits events into two parts: metadata and data.
The metadata part is JSON-structured, small, and can be batched with the metadata
of other events to issue fewer RPCs to partition managers. The data part is optional
and represents potentially larger, raw data that can benefit from being transferred
via zero-copy mechanisms such as RDMA.
@@ -82,7 +82,8 @@ by a JSON fragment containing the timestamp and detector information (e.g., calibration
parameters), as well as information about the images (e.g., dimensions, pixel format).
The data part of an event would be the image itself.
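To make this concrete, the metadata of such an event could look like the following
:code:`nlohmann::json` fragment (all field names and values here are illustrative,
not a schema mandated by Mofka):

.. code-block:: cpp

   nlohmann::json metadata = {
       {"timestamp", "2024-03-05T12:34:56Z"},           // acquisition time
       {"detector", {{"name", "det0"}, {"gain", 1.5}}}, // calibration parameters
       {"image", {{"width", 2048}, {"height", 2048},
                  {"pixel_format", "uint16"}}}          // image description
   };
   // The data part of the event would be the raw image buffer itself.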

The code below shows how to create the data and metadata pieces of an event
in the form of a :code:`Data` instance and a :code:`Metadata` instance respectively.

.. tabs::

@@ -94,13 +95,13 @@ The code below shows how to create the data and metadata pieces of an event
:end-before: END EVENT
:dedent: 8

The first :code:`mofka::Data` object, :code:`data1`, is a view of a single contiguous
segment of memory underlying the :code:`segment1` vector. The second
:code:`Data` object, :code:`data2`, is a view of two non-contiguous segments.

The first :code:`mofka::Metadata` object, :code:`metadata1`, is created from a
raw string representing a JSON object with an "energy" field. The second :code:`Metadata`
object contains the same information but is initialized using an :code:`nlohmann::json`
instance, which is the library used by Mofka to manage JSON data in C++.
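The C++ example itself is collapsed in this diff view; a sketch of what these four
objects can look like is given below (the :code:`Data` and :code:`Metadata` constructors
and the header name are assumptions based on the description above):

.. code-block:: cpp

   #include <mofka/Client.hpp> // header name is an assumption
   #include <nlohmann/json.hpp>
   #include <vector>

   std::vector<char> segment1(1024), segment2(512);

   // A view of a single contiguous segment of memory:
   mofka::Data data1{segment1.data(), segment1.size()};

   // A view of two non-contiguous segments:
   mofka::Data data2{{{segment1.data(), segment1.size()},
                      {segment2.data(), segment2.size()}}};

   // From a raw string representing a JSON object:
   mofka::Metadata metadata1{"{\"energy\": 42}"};

   // From an nlohmann::json instance:
   mofka::Metadata metadata2{nlohmann::json{{"energy", 42}}};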

.. group-tab:: Python
@@ -123,7 +124,7 @@ The code below shows how to create the data and metadata pieces of an event
is freed. However, the user should still take care that they are not written to
until the data has been transferred.

Having created the metadata and the data part of an event, we can now push the event
into the producer, as shown in the code below.

.. tabs::
@@ -140,13 +141,13 @@ into the producer, as shown in the code below.

Work in progress...

The producer's :code:`push` function takes the :code:`Metadata` and the :code:`Data`
objects and returns a :code:`Future`. Such a future can be tested for completion
(:code:`future.completed()`) and can be blocked on until it completes (:code:`future.wait()`).
The latter method returns the event ID of the created event (a 64-bit unsigned integer).
It is perfectly OK to drop the future if you do not care to wait for its completion or
for the resulting event ID, as exemplified with the second event. Event IDs are monotonically
increasing and are per-partition, so two events stored in distinct partitions may end up with the same ID.
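As a sketch, pushing the two events created earlier can look as follows
(:code:`push`, :code:`wait`, :code:`completed`, and :code:`flush` come from this
page; the variable names and the producer itself are assumed):

.. code-block:: cpp

   // First event: keep the future and wait for the resulting event ID.
   auto future = producer.push(metadata1, data1);
   std::uint64_t event_id = future.wait(); // blocks until the event was sent

   // Second event: dropping the future is fine if we do not need the ID.
   producer.push(metadata2, data2);

   // Force all pending batches to be sent, e.g. before terminating.
   producer.flush();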

Calling :code:`producer.flush()` is a blocking call that will force all the pending batches of events
to be sent, regardless of whether they have reached the requested size. It can be useful to ensure
@@ -156,6 +157,7 @@ that all the events have been sent, either periodically or before terminating the application.

If the batch size used by the producer is anything other than :code:`mofka::BatchSize::Adaptive()`,
a call to :code:`future.wait()` will block until the batch containing the corresponding event
has been filled up to the requested size and sent to its target partition. Hence, an easy
mistake to make is to call :code:`future.wait()` when the batch is not full and with no other threads
pushing more events to it. In this situation the batch will never get full, will never be sent,
and :code:`future.wait()` will never complete.
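The following sketch illustrates this deadlock, assuming the producer was created with a
fixed, hypothetical batch size of 64 events:

.. code-block:: cpp

   // Single-threaded producer with a fixed batch size (the BatchSize
   // configuration shown in this comment is hypothetical).
   auto future = producer.push(metadata1, data1); // the batch holds 1 of 64 events
   future.wait(); // DEADLOCK: nothing else fills the batch, so it is never
                  // sent and the future never completes

In such a situation, calling :code:`producer.flush()` before :code:`future.wait()` would
force the partial batch out and let the future complete.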
4 changes: 2 additions & 2 deletions docs/usage/quickstart.rst
@@ -9,8 +9,8 @@ about it, from the implementation of its databases, down to how they share
resources such as hardware threads and I/O devices, ensuring that you can
configure it to maximize performance on each individual platform and for
each individual use case. The downside of this approach, however, is that
you will need more knowledge about Mochi than you would need about the inner
workings of other services like Kafka.

In this section, we will quickly deploy the bare minimum for a single-node,
functional Mofka service accessible locally, before we can dive into the
34 changes: 17 additions & 17 deletions docs/usage/topics.rst
@@ -5,10 +5,10 @@ Events in Mofka are pushed into *topics*. A topic is a distributed collection
of *partitions* to which events are appended. When creating a topic, users have to
give it a name, and optionally provide three objects.

* **Validator**: a validator is an object that validates that the metadata and data
parts comply with whatever is expected for the topic. Metadata are JSON documents
by default, so for instance a validator could check that some expected fields
are present. If the metadata part describes the data part in some way, a validator
could check that this description is actually correct. This validation will happen
before the event is sent to any server, resulting in an exception if the event is
not valid. If not provided, the default validator will accept all the events it is
@@ -20,10 +20,10 @@ give it a name, and optionally provide three objects.
strategy. If not provided, the default partition selector will cycle through the
partitions in a round robin manner.

* **Serializer**: a serializer is an object that can serialize a :code:`Metadata` object
into a binary representation, and deserialize a binary representation back into a
:code:`Metadata` object. If not provided, the default serializer will convert the
:code:`Metadata` into a string representation.

.. image:: ../_static/TopicPipeline-dark.svg
:class: only-dark
@@ -34,21 +34,21 @@ give it a name, and optionally provide three objects.
Mofka will take advantage of multithreading to parallelize and pipeline the execution
of the validator, partition selector, and serializer over many events. These objects
can be customized and parameterized. For instance, a validator that checks the content
of the JSON metadata could be provided with a list of fields it expects to find in the
metadata of each event.
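Such a parameterized validator could encapsulate its list of expected fields as sketched
below (the interface is invented for illustration; only the field-checking logic comes
from the text):

.. code-block:: cpp

   #include <nlohmann/json.hpp>
   #include <string>
   #include <vector>

   // Illustrative only: this is not Mofka's actual validator interface.
   struct RequiredFieldsValidator {
       std::vector<std::string> required_fields;

       bool operator()(const nlohmann::json& metadata) const {
           // Accept the event only if every expected field is present.
           for(const auto& field : required_fields)
               if(!metadata.contains(field)) return false;
           return true;
       }
   };

   RequiredFieldsValidator validator{{"energy", "timestamp"}};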

.. topic:: A motivating example

Hereafter, we will create a topic accepting events that represent collisions in a
particle accelerator. We will require that the metadata part of such events has
an *energy* value, represented by an unsigned integer (just so we can show
what optimizations could be done with Mofka's modularity). Furthermore, let's say that
the detector is calibrated to output energies from 0 to 99. We can create a validator that
checks that the energy field is not only present, but that its value is also strictly lower
than 100. If we would like to aggregate events with similar energy values into the same partition,
we could have the partition selector make its decision based on this energy value.
Finally, since we know that the energy value is between 0 and 99 and is the only relevant
part of an event's metadata, we could serialize this value into a single byte (:code:`uint8_t`),
drastically reducing the metadata size compared with a string like :code:`{"energy":42}`.
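The logic described in this example can be sketched with free functions as below
(Mofka's actual validator, partition selector, and serializer interfaces are not
shown in this diff, so only the energy-based logic is illustrated):

.. code-block:: cpp

   #include <nlohmann/json.hpp>
   #include <cstddef>
   #include <cstdint>

   // Validator logic: "energy" must be present, unsigned, and strictly below 100.
   bool validate(const nlohmann::json& metadata) {
       return metadata.contains("energy")
           && metadata.at("energy").is_number_unsigned()
           && metadata.at("energy").get<std::uint64_t>() < 100;
   }

   // Partition selector logic: aggregate events with similar energies
   // into the same partition.
   std::size_t select_partition(const nlohmann::json& metadata,
                                std::size_t num_partitions) {
       return metadata.at("energy").get<std::uint64_t>() * num_partitions / 100;
   }

   // Serializer logic: an energy in [0, 99] fits in a single byte.
   std::uint8_t serialize(const nlohmann::json& metadata) {
       return static_cast<std::uint8_t>(metadata.at("energy").get<std::uint64_t>());
   }
   nlohmann::json deserialize(std::uint8_t energy) {
       return nlohmann::json{{"energy", energy}};
   }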

.. important::
@@ -149,15 +149,15 @@ is the object that will receive and respond to RPCs targeting the partition's
data and metadata. While it is possible to implement your own partition manager,
Mofka already comes with two implementations.

* **Memory**: The *"memory"* partition manager is a manager that keeps the Metadata
and Data in the local memory of the process it runs on. This partition manager
* **Memory**: The *"memory"* partition manager is a manager that keeps the metadata
and data in the local memory of the process it runs on. This partition manager
doesn't have any dependency and is easy to use for testing, for instance, but it
won't provide persistence and will be limited by the amount of memory available
on the node.
* **Default**: The *"default"* partition manager is a manager that relies on a
`Yokan <https://mochi.readthedocs.io/en/latest/yokan.html>`_ provider for storing
metadata and on a `Warabi <https://github.com/mochi-hpc/mochi-warabi>`_
provider for storing data. Yokan is a key/value storage component with a number
of database backends available, such as RocksDB, LevelDB, BerkeleyDB, etc.
Warabi is a blob storage component, also with a variety of backend implementations
including Pmem.
@@ -218,18 +218,18 @@ Two required arguments when adding partitions are the name of the topic and the rank
of the server to which the partition should be added. Here because we only have one
server, the rank is 0.

With a default partition manager, we can specify the metadata provider in the form
of an "address" interpretable by Bedrock. Here *"my_metadata_provider@local"* asks
Bedrock to look for a provider named *"my_metadata_provider"* in the same process as
the partition manager. In :ref:`Deployment` we will see that we could easily run these
providers on different processes.

.. note::

If we don't specify the metadata (resp. data) provider in the above
code/commands, Mofka will look for a Yokan (resp. Warabi)
provider with the tag :code:`"mofka:metadata"` (resp. :code:`"mofka:data"` ) in the
target server process and use that as the metadata (resp. data) provider.
If multiple such providers exist, Mofka will choose the first one it finds in the
configuration file.
