ThreadSanitizerAboutRaces

Introduction

Most programmers know that races are harmful.
For some races it could be quite easy to predict what may go wrong.
For example, the variable var in the following code may not be equal to 2 at the end of the program execution (for more examples refer to ThreadSanitizerPopularDataRaces)

int var;

void Thread1() {  // Runs in one thread.

  var++;

}

void Thread2() {  // Runs in another thread.

  var++;

}

However, there are other, much more subtle races which are much less obviously harmful. This article is about such races. Most of the content is specific to C/C++.

TODO(timurrrr) Reference the C++11 standard mentioning data races lead to UB?

TODO(kcc) This article is not finished. Expect more content later.

Racy publication

The race which is most frequently (and mistakenly) perceived as harmless is unsafe publication:

struct Foo {

  int a;

  Foo() { a = 42; }

};



static Foo *foo = NULL;



void Thread1() {  // Create foo.

  foo = new Foo();

}



void Thread2() {  // Consume foo.

  if (foo) {

     assert(foo->a == 42);

  }

}

ThreadSanitizer and most other race detectors will report this as a race on foo.

Objection 1

But hey, why this race is harmful? Follow my fingers:

Thread1 calls new Foo(), which allocates memory and calls Foo::Foo()
Thread1 assigns the new value to foo and foo becomes non-NULL.
Thread2 checks if foo != NULL and only then reads foo->a.
The code is safe!

Clarification 1

On machines with weak memory model, such as IBM Power or ARM,
foo != NULL does not guarantee that the contents of *foo are ready for reading in Thread2.
For example, the following code does not work on machines with weak memory model.

int A = 0, B = 0;



void Thread1() {

  A = 1;

  B = 1;

}



void Thread2() {

  if (B == 1) {

     assert(A == 1);

  }

}

Objection 2

Ok, ok. But I don't care about weak memory model machines. x86 has a strong memory model!

Clarification 2

If you are writing in assembly, yes, this is safe.
But with C/C++ you can not rely on the machine's strong memory model because the compiler may (and will!) rearrange the code for you.
For example, it may easily replace A=1;B=1; with B=1;A=1;.

Objection 3

Ok, I believe that compilers may do simple reordering, but that doesn't apply to my code!
For example, Foo::Foo() is defined in one .cc file and foo = new Foo() resides in another.
And even if these were in a single file, what kind of reordering could harm me?

Clarification 3

First, it's a bad idea to rely on the compiler's inability to do some legal transformation.
Second, the modern compilers are much more sophisticated than most programmers think.
For example, many compilers can do cross-file inlining or even inline virtual functions (!).
So, in the example above a compiler may (and often will) change the code to

  foo = (Foo*)malloc(sizeof(Foo));

  new (foo) Foo();

Objection 4

Ok. So I will do something to avoid any bad compiler effect in the thread which creates foo.
But obviously, the consumer thread's code is safe. What can possibly go wrong with this code on x86?

void Thread2() {  // Consume foo.

  if (foo) {  

     // foo is properly published by Thread1().

     assert(foo->a == 42);

  }

}

Clarification 4

Again, you are underestimating the cleverness of modern compilers. How about this (it really happens!)?

void Thread2() {  // Consume foo.

  Foo *t1 = foo;  // reads NULL

  Foo *t2 = foo;  // reads the new value

  if (t2) {

     assert(t1->a == 42);

  }

}

Double-Checked Locking

Double-Checked Locking is broken in C++ because of all the reasons discussed above.

Racy lazy initialization

One more frequent bug:

// lazy init for Pi. 

float *GetPi() {

  static float pi;

  static bool have_pi = false;

  if (!have_pi) {

     pi = ComputePi(); // atomic assignment, no cache coherency issues.                    

     have_pi = true;

  }

  return pi;

}

Even experienced programmers may assume that this code is correct on a CPU with strong memory model.
But remember that a compiler may rearrange have_pi = true; and pi = ComputePi(); and one of the threads will return uninitialized value of pi.

Volatile

DON'T use volatile for synchronization in C/C++.
The C and C++ standards don't say anything about volatile and threads -- so don't assume anything.

Also, the C++11, chapter intro.multithread, paragraph 21 says:
"The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior."

You don't believe us? Ok, here is a simplest proof that C++ compilers don't treat volatile as synchronization.

% cat volatile.cc 

typedef long long T;

volatile T vlt64;

void volatile_store_64(T x) {

        vlt64 = x;

}

% g++ --version 

g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3

...

% g++ -O2 -S volatile.cc -m32 && cat volatile.s 

...

_Z17volatile_store_64x:

.LFB0:

        pushl   %ebp

        movl    %esp, %ebp

        movl    8(%ebp), %eax

        movl    12(%ebp), %edx

        popl    %ebp

        movl    %eax, vlt64

        movl    %edx, vlt64+4

        ret

...

You can clearly see that a 64-bit volatile variable is written in two pieces w/o any synchronization.
Do you still use C++ volatile as synchronization?

You may object that we are cheating (with 64-bit ints on a 32 arch).
Ok, we indeed cheated, but just a bit. How about this case?

% cat volatile2.cc 

volatile bool volatile_bool_done;

double regular_double;

void volatile_example_2() {

        regular_double = 1.23;

        volatile_bool_done = true;

}

% g++ -O2 volatile2.cc -S

% cat volatile2.s 

...

_Z18volatile_example_2v:

.LFB0:

..

        movabsq $4608218246714312622, %rax

        movb    $1, volatile_bool_done(%rip)

        movq    %rax, regular_double(%rip)

        ret

...

Here you can see that the compiler reordered a non-volatile store and a volatile store.

Are you convinced now? If no, please let us know why!

How to debug and fix

If you got a data race report:

analyze the code locations and the stack traces of the racing accesses
understand the object/field that is subject to the race (using the top frame)
understand the context in which the object is accessed (using the whole stack trace)

Note: Location description (heap allocation stack, global variable name, or thread stack) may help to understand the object that is subject to the race.

Note: If the race happens on a standard container (e.g. std::string), then you may see a number of stack frames inside of the standard library. In such cases find the first frame that calls into the standard library.

Once you understood that, there are 3 main scenarios:

The object is supposed to be accessed concurrently, but lacks proper internal synchronization. In this case, add internal synchronization to the object. For example, add std::mutex field to the class and lock it inside of the methods.
The object is not supposed to be accessed concurrently (for example, this is true for all standard containers like std::mutex, std::vector, std::map, etc) and the calling code (upper up the stack) incorrectly accesses the object concurrently. In this case, change the calling code to not access the object concurrently. This may involve adding std::mutex in the calling code, waiting for completion of something before accessing the objects, queuing some callbacks sequentially, etc. Generally, this is highly-dependent on the concrete system and cannot be answered without understanding how the system is supposed to work and how it's actually working.
If you believe the report is a false positive and the accesses do not happen concurrently, find the exact synchronization that is supposed to make them non-concurrent and ensure that it's indeed in place and is correct, e.g. placed on the right side (before/after) of the exact racing accesses.

Note: try to avoid using raw atomic primitives and memory fences when fixing data races. These primitives are notoriously hard to use correctly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly