Replies: 1 comment
This is great information, thanks for posting. It's impossible to find a single individual with all of the knowledge to optimize the HVM runtime, so this sort of crowd-sourced advice is super useful. We'll take a close look at this when we next spend time on optimizations.
---
Hi!
Recently I've been studying the HVM code in the hope of borrowing some ideas for a future project of mine. I've noticed two problems with the way atomics are used that I'd like to share.
Please note, however, that I am not an experienced C programmer, so I may be wrong about the following.
### Using `memory_order_relaxed` everywhere is wrong

I've noticed that all atomic operations use `memory_order_relaxed`. I suppose this is OK on architectures that guarantee strong memory consistency, such as the popular x86. It will be wrong, however, on architectures with relaxed memory consistency, such as ARM. For example, the following scenario becomes possible:

1. Thread 1 writes a node A into `node_buf`.
2. Thread 1 pushes into `rbag_buf` a redex referencing A.
3. Thread 2 pops that redex from `rbag_buf`.
4. Thread 2 follows the redex into `node_buf` and reads stale data instead of A.

There are two possible reasons why 4. can happen: Thread 1's relaxed stores may become visible to other cores in the wrong order, so the redex becomes visible before the node; or Thread 2's relaxed loads may be reordered, so it effectively reads `node_buf` before it observes the redex.

Why this does not happen on x86: x86 provides a strong (TSO) memory model in which stores are not reordered with other stores and loads are not reordered with other loads, so once Thread 2 observes the redex it also observes the node that was written before it.
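To make the scenario concrete, here is a minimal sketch of the problematic pattern. The names are simplified stand-ins (a one-slot `redex` cell instead of the real `rbag_buf`, and a made-up `Node` layout), not the actual `hvm.c` data structures; the point is only the ordering of the relaxed operations.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EMPTY UINT64_MAX                 /* sentinel: no redex published yet   */

typedef struct { uint64_t a, b; } Node;  /* hypothetical node layout           */

static Node             node_buf[1024];  /* plain, non-atomic storage          */
static _Atomic uint64_t redex = EMPTY;   /* one-slot stand-in for rbag_buf     */

/* Thread 1: steps 1 and 2 of the scenario above. */
void publish(uint64_t a_index, Node a) {
  node_buf[a_index] = a;                                         /* (1) write node A   */
  atomic_store_explicit(&redex, a_index, memory_order_relaxed);  /* (2) push the redex */
}

/* Thread 2: steps 3 and 4 of the scenario above. */
bool consume(Node *out) {
  uint64_t i = atomic_load_explicit(&redex, memory_order_relaxed);  /* (3) pop the redex */
  if (i == EMPTY) return false;
  *out = node_buf[i];  /* (4) nothing orders this after (1): the read may see
                          stale data (formally it is even a data race)         */
  return true;
}
```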
### The solution

Instead of `memory_order_relaxed`, one has to use `memory_order_release` when writing to an atomic and `memory_order_acquire` when reading it, unless, of course, the operation does not care about ordering at all (for example, the modification of some counter).
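As a sketch of the fix, under the same simplified stand-ins as above (again, not the real `hvm.c` code), only the memory orderings of the publishing store and the consuming load change:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EMPTY UINT64_MAX
typedef struct { uint64_t a, b; } Node;
static Node             node_buf[1024];
static _Atomic uint64_t redex = EMPTY;

/* Thread 1: the release store publishes every write sequenced before it. */
void publish(uint64_t a_index, Node a) {
  node_buf[a_index] = a;                           /* plain write is enough */
  atomic_store_explicit(&redex, a_index, memory_order_release);
}

/* Thread 2: the acquire load pairs with the release store above. */
bool consume(Node *out) {
  uint64_t i = atomic_load_explicit(&redex, memory_order_acquire);
  if (i == EMPTY) return false;
  *out = node_buf[i];   /* now guaranteed to see the node written in publish() */
  return true;
}
```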
### Too many atomics

I suppose the CPU implements atomic writes by flushing the write queue of the core performing the write and by invalidating some of the data in the caches of the other cores. Obviously, this can hurt performance. These days the CPU is much faster than the RAM, so we want to use the CPU caches efficiently.
Fortunately, there is no need for so many atomic operations. I don't know the code in `hvm.c` well enough to be sure, but I think it is not necessary to use atomic operations in order to access `node_buf`. In the following explanation I will use terminology from https://en.cppreference.com/w/cpp/atomic/memory_order.

Suppose Thread 1 pushes a redex using `memory_order_release` and Thread 2 pops it afterwards using `memory_order_acquire`. This means that the store of the redex by Thread 1 and the subsequent load by Thread 2 are in the relation synchronizes-with (by the definition of that relation).

Now consider the definition of the relation inter-thread happens-before. By rule 3, the store of the redex by Thread 1 and Thread 2's subsequent read of the data in `node_buf` related to this redex are in the relation inter-thread happens-before (even if Thread 2 does not use atomic operations to read that data), because the acquire load is sequenced before that read.

Moreover, by rule 4 we can conclude that Thread 1's store of the data related to the redex in `node_buf` and Thread 2's load of this data are also in the relation inter-thread happens-before, because that store is sequenced before the release store of the redex.

Consequently, these two operations are in the relation happens-before, and so the store by Thread 1 is a visible side effect for Thread 2's load (according to the definition of visible side effect).
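To spell the argument out in code, here is the same simplified sketch with comments labelling the C11 relations involved; the names are still hypothetical stand-ins for the real `hvm.c` structures:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint64_t a, b; } Node;   /* hypothetical node layout            */
static Node             node_buf[1024];   /* accessed with plain, non-atomic ops */
static _Atomic uint64_t redex;            /* stand-in for the redex bag          */

void thread1(uint64_t i, Node a) {
  node_buf[i] = a;                        /* A: plain store, sequenced before B  */
  atomic_store_explicit(&redex, i,        /* B: release store of the redex       */
                        memory_order_release);
}

/* Assumes Thread 1 has already pushed a redex. */
Node thread2(void) {
  uint64_t i = atomic_load_explicit(&redex,              /* X: acquire load;      */
                                    memory_order_acquire);  /* B synchronizes-with X */
  return node_buf[i];                     /* C: plain load, X sequenced before C.
                                             Rule 3: B inter-thread happens-before C.
                                             Rule 4: A inter-thread happens-before C,
                                             so A happens-before C and is a visible
                                             side effect; node_buf needs no atomics. */
}
```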