Contention, Actor Patterns, and Refining the Suzuka Full Node #340
-
To follow up on the discussion started in #63: the Process and Store steps can use shared resources and/or execute in parallel; this is mostly where actors are used, and the decision depends on how we optimize resource sharing. The Suzuka node code can be organized around these state transitions, where the basic pattern is applied to each Tx or group of Txs (blob, block). The whole process fits well in an event-streaming API where actors generate events and the state manager processes them. The pattern can be applied at different granularities, like Russian dolls: some state transitions are grouped together and processed in a sub-manager. Last thing about the state-manager pattern: a sort of supervisor can be plugged in to handle all the error cases and decide what has to be done depending on the state currently being processed. An error can go up the processing hierarchy until a supervisor knows how to handle it (see the sketch below).

Regarding the GetTxByHash query contention: I don't know how this pattern can be applied to the actor evolution. From my experience, it helps to have good control over resource sharing and parallel process execution. It can evolve easily with the evolution of the functionality, because each change is defined in the state transitions, which makes it possible to see the implications for the existing code and decide on the best option. Lastly, it allows mixing sync and async code, because each processing/storage step can be defined as sync or async code and the state manager is able to call it correctly. We can even define how we allocate CPU between sync and async tasks. I'm not sure how this can be integrated; it's more to open the discussion, and if you think it can be interesting we can see how to integrate this pattern.
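To make the idea concrete, here is a minimal sketch of the state-manager pattern with supervisor-style error escalation. The names (`TxEvent`, `StateManager`, `Supervisor`, `ProcessError`) are illustrative assumptions, not existing Suzuka code:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum TxEvent {
    Received { tx_id: u64 },
    Executed { tx_id: u64 },
}

#[derive(Debug)]
enum ProcessError {
    Fatal(String),
}

// What the supervisor decides when an error escalates up to it.
enum SupervisorAction {
    Continue,
    Shutdown,
}

struct Supervisor;

impl Supervisor {
    fn handle(&self, err: &ProcessError) -> SupervisorAction {
        // A real supervisor would decide based on the state being processed;
        // unhandled errors would keep going up the hierarchy.
        eprintln!("supervisor saw: {err:?}");
        SupervisorAction::Shutdown
    }
}

struct StateManager {
    events: mpsc::Receiver<TxEvent>,
    supervisor: Supervisor,
}

impl StateManager {
    async fn run(mut self) {
        while let Some(event) = self.events.recv().await {
            // Each event drives one state transition; the Process and Store
            // steps could be delegated to sub-managers at a finer granularity.
            if let Err(err) = self.transition(event).await {
                match self.supervisor.handle(&err) {
                    SupervisorAction::Continue => continue,
                    SupervisorAction::Shutdown => break,
                }
            }
        }
    }

    async fn transition(&mut self, event: TxEvent) -> Result<(), ProcessError> {
        match event {
            TxEvent::Received { tx_id } => {
                // Process step: sync or async, the manager calls it either way.
                println!("processing tx {tx_id}");
                Ok(())
            }
            TxEvent::Executed { tx_id: 0 } => {
                // Demonstrate an error going up to the supervisor.
                Err(ProcessError::Fatal("cannot store tx 0".into()))
            }
            TxEvent::Executed { tx_id } => {
                // Store step.
                println!("storing result of tx {tx_id}");
                Ok(())
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(16);
    let manager = tokio::spawn(
        StateManager { events: rx, supervisor: Supervisor }.run(),
    );
    tx.send(TxEvent::Received { tx_id: 1 }).await.unwrap();
    tx.send(TxEvent::Executed { tx_id: 1 }).await.unwrap();
    tx.send(TxEvent::Executed { tx_id: 0 }).await.unwrap();
    manager.await.unwrap();
}
```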
-
Draft notes for an eventual document.

## Composing async applications using the actor model

To avoid excessive contention and permit concurrent execution on a multi-threaded async runtime, sharing of mutable state between asynchronous tasks should be avoided. Generally, the state used by each task should be owned by that task alone. The operation of a network-driven application such as a blockchain node can be modeled as a collection of actor tasks executed on an async runtime. An actor generally encapsulates tightly coupled mutable state that changes in reaction to events arriving via cross-task communication channels or I/O interfaces.

### Organization of a task

The functionality of an individual task, or a protocol unit that may form part of a task, can be split into three principal parts (a specific design may omit some of these):

- Front object
- Background task
- Construction API

### Cross-task communication

MPSC channels, as provided by Tokio, are currently found to be sufficient to organize the passing of events and request messages between the agent tasks making up a Movement node (as first realized in #308). If needed, Tokio's sync::broadcast could be used to realize event subscription for multiple tasks. At this point we're not looking to implement dynamic composition via named channels, publish-subscribe, etc.

### Concurrency on a smaller scale

Suggested decisions on which concurrency approach to use for specific kinds of tasks.

#### Async IO-bound: the Tower framework

To implement tasks that involve processing a flow of incoming requests that needs to be back-pressured against available capacity, we can utilise the tower framework and the middleware provided for it; see the sketch below.
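As an illustration only, here is a minimal sketch of back-pressuring a request flow with tower's `concurrency_limit` middleware. It assumes the `tower` crate with its `limit` and `util` features enabled; the string request type and the handler are placeholders, not part of the node:

```rust
use tower::{Service, ServiceBuilder, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Build a service wrapped in middleware that bounds in-flight work,
    // so excess load waits for capacity instead of piling up.
    let mut service = ServiceBuilder::new()
        // At most 8 requests are processed concurrently.
        .concurrency_limit(8)
        .service_fn(|request: String| async move {
            // Placeholder handler standing in for real request processing.
            Ok::<_, std::convert::Infallible>(format!("processed: {request}"))
        });

    // `ready` awaits until the stack has capacity: the back-pressure point.
    let response = service.ready().await?.call("tx".to_string()).await?;
    println!("{response}");
    Ok(())
}
```

Since `ready` is where callers wait when the limit is reached, further policy (queueing, load shedding, timeouts) can be layered at that point with additional middleware from the same framework.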
-
@mzabaluev It would be great to have the template/toy implementation of this pattern included in this discussion. That is, the simplest version of a Front Object, Construction API, and Background Task that you can provide.
-
A toy actor example:

```rust
use std::future::Future;
use tokio::sync::{mpsc, oneshot};
/// Public handle object for the toy counter service.
///
/// A handle can be cheaply cloned to get multiple instances for using the
/// same service in different parts of the program.
#[derive(Clone)]
pub struct Counter {
sender: mpsc::Sender<Request>,
}
// Enum for the internal requests processed by the counter service.
enum Request {
Increment { by: i64 },
Get { response: oneshot::Sender<i64> },
}
impl Counter {
/// Instantiates the counter service.
/// Returns the handle object and the future to spawn on the async runtime.
pub fn new(
initial_count: i64,
) -> (
Self,
impl Future<Output = anyhow::Result<()>> + Send,
) {
let (sender, receiver) = mpsc::channel(16);
let background = Background {
receiver,
counter: initial_count,
}
.run();
(Counter { sender }, background)
}
pub async fn increment(&self, by: i64) -> anyhow::Result<()> {
self.sender.send(Request::Increment { by }).await?;
Ok(())
}
pub async fn get(&self) -> anyhow::Result<i64> {
let (response_tx, response_rx) = oneshot::channel();
self.sender
.send(Request::Get {
response: response_tx,
})
.await?;
let value = response_rx.await?;
Ok(value)
}
}
// State of the background task for the counter service.
// This is the actor in the actor pattern.
struct Background {
receiver: mpsc::Receiver<Request>,
counter: i64,
}
impl Background {
async fn run(mut self) -> anyhow::Result<()> {
while let Some(req) = self.receiver.recv().await {
match req {
Request::Increment { by } => self.increment(by).await?,
Request::Get { response } => {
let value = self.get().await?;
response.send(value).unwrap_or_else(|_| {
// The request method's future has been canceled,
// so it's OK to do nothing here.
});
}
}
}
// The last handle has been dropped, terminate.
Ok(())
}
async fn increment(&mut self, amount: i64) -> anyhow::Result<()> {
self.counter += amount;
Ok(())
}
// NOTE: if task state features generics or dynamic objects and the compiler
// says these lack `Sync`, consider making the recipient `&mut self`. See
// https://github.com/rust-lang/rust/issues/129105
async fn get(&self) -> anyhow::Result<i64> {
Ok(self.counter)
}
}
#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
let (counter, background_task) = Counter::new(0);
tokio::spawn(background_task);
counter.increment(42).await?;
let value = counter.get().await?;
println!("{value}");
Ok(())
}
```
-
From my experience, the difficulty of this model is defining the right granularity and how actors are grouped.
-
For info: since the first merge of the actor pattern, the indexer gRPC init wasn't plugged in anymore. I've added the init to the new code, but it was a little awkward. Where it's added to the executor init: And where it's integrated in the main loop: here: After that I had to change all the tests that use executor.background in the dof and fin_view crates. The reason I did it that way is that the Aptos part of the indexer init needs the DbReaderWriter, which is created and hidden behind the Context. That's why the indexer init function is implemented on Context:

With this way of doing things, all the calls are encapsulated: the Suzuka node init calls the execution init, which calls the indexer init, and each sub-element accesses the main resources. I didn't manage to remove the indexer init from the execution without changing a lot of code. I think the main reason is that the Aptos part of the indexer gRPC uses the DbReaderWriter instead of a notification of Tx state changes; the gRPC part's role is only to send new Txs, so there's no need to access the db. For me, it shows that components should be defined in terms of their needs, not the resources they use to get their data, and should provide access to what they produce: here, state changes from the execution.
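To illustrate the suggestion, here is a minimal sketch of an indexer-like component that consumes a channel of Tx state-change notifications instead of holding a db handle. `TxStateChange` and `IndexerGrpcStream` are hypothetical names for illustration, not the actual Aptos/Suzuka types:

```rust
use tokio::sync::mpsc;

// Hypothetical notification emitted by the execution on each Tx state change.
#[derive(Debug, Clone)]
struct TxStateChange {
    tx_hash: [u8; 32],
}

// The component declares the data it needs (state-change notifications),
// not the resource (a DbReaderWriter handle) used elsewhere to produce it.
struct IndexerGrpcStream {
    state_changes: mpsc::Receiver<TxStateChange>,
}

impl IndexerGrpcStream {
    async fn run(mut self) {
        while let Some(change) = self.state_changes.recv().await {
            // Forward the new Tx to gRPC subscribers; no db access required.
            println!("forwarding tx {:?}", change.tx_hash);
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    let indexer = tokio::spawn(IndexerGrpcStream { state_changes: rx }.run());
    tx.send(TxStateChange { tx_hash: [0u8; 32] }).await.unwrap();
    drop(tx); // closing the channel lets the indexer task finish
    indexer.await.unwrap();
}
```

Wired this way, the indexer init no longer needs to reach into the execution's Context for the db; it only needs the sender side of the channel handed to it at construction time.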
-
Tracking issue: #63
Summary
Strides have been made in refactoring the Suzuka Full Node to eliminate synchronization primitives, minimize logical awaiting, and generally make better use of the `tokio` runtime. However, further improvements should be pursued.