Replies: 1 comment
This is great information, thanks for posting. It's impossible to find a single individual with all of the knowledge to optimize the HVM runtime, so this sort of crowd-sourced advice is super useful. We'll take a close look at this when we next spend time on optimizations.
---
Hi!
Recently I've been studying the HVM code in the hope of borrowing some ideas for a future project of mine. I've noticed two problems with the way atomics are used that I'd like to share.
Please note, however, that I am not an experienced C programmer, so I may be wrong about the following.
### Using `memory_order_relaxed` everywhere is wrong

I've noticed that all atomic operations use `memory_order_relaxed`. I suppose this is OK on architectures that guarantee strong memory consistency, such as the popular x86. It will be wrong, however, on architectures with relaxed memory consistency, such as ARM. For example, the following scenario becomes possible:

1. Thread 1 writes a node A into `node_buf`.
2. Thread 1 pushes into `rbag_buf` a redex referencing A.
3. Thread 2 pops that redex from `rbag_buf`.
4. Thread 2 follows the redex into `node_buf` and reads stale data instead of A.

There are two possible reasons why 4. can happen: Thread 1's relaxed stores may become visible to other cores in the wrong order, so the redex becomes visible before the node; or Thread 2's relaxed loads may be reordered, so it effectively reads `node_buf` before it observes the redex.

Why this does not happen on x86: x86 provides a strong (TSO) memory model in which stores are not reordered with other stores and loads are not reordered with other loads, so once Thread 2 observes the redex it also observes the node that was written before it.
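To make the scenario concrete, here is a minimal sketch of the problematic pattern. The names are simplified stand-ins (a one-slot `redex` cell instead of the real `rbag_buf`, and a made-up `Node` layout), not the actual `hvm.c` data structures; the point is only the ordering of the relaxed operations.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EMPTY UINT64_MAX                 /* sentinel: no redex published yet   */

typedef struct { uint64_t a, b; } Node;  /* hypothetical node layout           */

static Node             node_buf[1024];  /* plain, non-atomic storage          */
static _Atomic uint64_t redex = EMPTY;   /* one-slot stand-in for rbag_buf     */

/* Thread 1: steps 1 and 2 of the scenario above. */
void publish(uint64_t a_index, Node a) {
  node_buf[a_index] = a;                                         /* (1) write node A   */
  atomic_store_explicit(&redex, a_index, memory_order_relaxed);  /* (2) push the redex */
}

/* Thread 2: steps 3 and 4 of the scenario above. */
bool consume(Node *out) {
  uint64_t i = atomic_load_explicit(&redex, memory_order_relaxed);  /* (3) pop the redex */
  if (i == EMPTY) return false;
  *out = node_buf[i];  /* (4) nothing orders this after (1): the read may see
                          stale data (formally it is even a data race)         */
  return true;
}
```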
### The solution

Instead of `memory_order_relaxed`, one has to use `memory_order_release` when writing to an atomic and `memory_order_acquire` when reading it, unless, of course, the operation does not care about ordering at all (for example, the modification of some counter).
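As a sketch of the fix, under the same simplified stand-ins as above (again, not the real `hvm.c` code), only the memory orderings of the publishing store and the consuming load change:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EMPTY UINT64_MAX
typedef struct { uint64_t a, b; } Node;
static Node             node_buf[1024];
static _Atomic uint64_t redex = EMPTY;

/* Thread 1: the release store publishes every write sequenced before it. */
void publish(uint64_t a_index, Node a) {
  node_buf[a_index] = a;                           /* plain write is enough */
  atomic_store_explicit(&redex, a_index, memory_order_release);
}

/* Thread 2: the acquire load pairs with the release store above. */
bool consume(Node *out) {
  uint64_t i = atomic_load_explicit(&redex, memory_order_acquire);
  if (i == EMPTY) return false;
  *out = node_buf[i];   /* now guaranteed to see the node written in publish() */
  return true;
}
```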
### Too many atomics

I suppose the CPU implements atomic writes by flushing the write queue of the core performing the write and by invalidating some of the data in the caches of the other cores. Obviously, this can hurt performance. These days the CPU is much faster than the RAM, so we want to use the CPU caches efficiently.
Fortunately, there is no need for so many atomic operations. I don't know the code in `hvm.c` well enough to be sure, but I think it is not necessary to use atomic operations in order to access `node_buf`. In the following explanation I will use terminology from https://en.cppreference.com/w/cpp/atomic/memory_order.

Suppose Thread 1 pushes a redex using `memory_order_release` and Thread 2 pops it afterwards using `memory_order_acquire`. This means that the store of the redex by Thread 1 and the subsequent load by Thread 2 are in the relation synchronizes-with (by the definition of that relation).

Now consider the definition of the relation inter-thread happens-before. By rule 3, the store of the redex by Thread 1 and Thread 2's subsequent read of the data in `node_buf` related to this redex are in the relation inter-thread happens-before (even if Thread 2 does not use atomic operations to read that data), because the acquire load is sequenced before that read.

Moreover, by rule 4 we can conclude that Thread 1's store of the data related to the redex in `node_buf` and Thread 2's load of this data are also in the relation inter-thread happens-before, because that store is sequenced before the release store of the redex.

Consequently, these two operations are in the relation happens-before, and so the store by Thread 1 is a visible side effect for Thread 2's load (according to the definition of visible side effect).
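To spell the argument out in code, here is the same simplified sketch with comments labelling the C11 relations involved; the names are still hypothetical stand-ins for the real `hvm.c` structures:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint64_t a, b; } Node;   /* hypothetical node layout            */
static Node             node_buf[1024];   /* accessed with plain, non-atomic ops */
static _Atomic uint64_t redex;            /* stand-in for the redex bag          */

void thread1(uint64_t i, Node a) {
  node_buf[i] = a;                        /* A: plain store, sequenced before B  */
  atomic_store_explicit(&redex, i,        /* B: release store of the redex       */
                        memory_order_release);
}

/* Assumes Thread 1 has already pushed a redex. */
Node thread2(void) {
  uint64_t i = atomic_load_explicit(&redex,              /* X: acquire load;      */
                                    memory_order_acquire);  /* B synchronizes-with X */
  return node_buf[i];                     /* C: plain load, X sequenced before C.
                                             Rule 3: B inter-thread happens-before C.
                                             Rule 4: A inter-thread happens-before C,
                                             so A happens-before C and is a visible
                                             side effect; node_buf needs no atomics. */
}
```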