Example 2

Nakagome Tomoyuki edited this page Aug 24, 2016 · 1 revision

Core Dumps during Program Termination

A customer encountered many core dumps in their lab during acceptance testing. The program had been running for years without problems, but the issue surfaced on new, faster hardware. As the software vendor, we could not reproduce it at all on our own hardware.

The core files showed that two threads were executing termination routines after exit() calls. An exception had been caught in a catch(...) clause, but its source and type could not be identified. By the time an exception reaches a catch-all clause it is generally too late: the stack has already been unwound to the handler, so the throw site is no longer visible. We therefore asked the customer to preload libexray into the target program and send us the output.
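To illustrate how little a catch-all clause has to work with, here is a minimal sketch in standard C++. The helper names `describe_current_exception` and `run_and_describe` are hypothetical and are not part of the customer's program or of libexray. Rethrowing inside the handler can recover the type and message of exceptions derived from `std::exception`, but nothing in the language gives back the location of the original throw.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: must be called from inside a catch block.
// Rethrows the exception being handled so it can be matched by type.
std::string describe_current_exception() {
    try {
        throw;  // rethrow the currently handled exception
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        // Type is not derived from std::exception; nothing more is known.
        return "unknown exception type";
    }
}

// Hypothetical driver: runs fn and reports what the catch-all clause saw.
std::string run_and_describe(void (*fn)()) {
    try {
        fn();
        return "no exception";
    } catch (...) {
        // The throw site is already gone here; only the object remains.
        return describe_current_exception();
    }
}
```

Even in the best case this yields a type and a message, not a call stack, which is why a tool that records information at the throw itself was needed.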

Analysis with Libexray

From the libexray output, the source of the exception was quickly identified as an access to a destroyed mutex. The software has a wrapper class for mutexes, and this wrapper throws an exception when a mutex lock or unlock fails. However, the exception was not handled anywhere and ended up in the catch-all clause.
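The behavior of such a wrapper can be sketched as follows. This is a single-threaded model, not the customer's code: `RawMutex`, `raw_lock`, `raw_unlock`, `raw_destroy`, `MutexError`, and `MutexWrapper` are all hypothetical names, and the stand-in handle simply reports `EINVAL` once it has been destroyed. A model is used because locking a destroyed POSIX mutex is undefined behavior, so a real one cannot be relied on to fail predictably.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>

// Stand-in for a native mutex handle (assumption, for illustration only).
struct RawMutex {
    bool alive = true;
    bool locked = false;
};

int raw_lock(RawMutex& m)   { if (!m.alive) return EINVAL; m.locked = true;  return 0; }
int raw_unlock(RawMutex& m) { if (!m.alive) return EINVAL; m.locked = false; return 0; }
void raw_destroy(RawMutex& m) { m.alive = false; }

// Exception thrown by the wrapper when a lock or unlock call fails.
class MutexError : public std::runtime_error {
public:
    MutexError(const std::string& op, int code)
        : std::runtime_error("mutex " + op + " failed, error " + std::to_string(code)),
          code_(code) {}
    int code() const { return code_; }
private:
    int code_;
};

// Wrapper that turns error return codes into exceptions.
class MutexWrapper {
public:
    explicit MutexWrapper(RawMutex& raw) : raw_(raw) {}
    void lock()   { int rc = raw_lock(raw_);   if (rc != 0) throw MutexError("lock", rc); }
    void unlock() { int rc = raw_unlock(raw_); if (rc != 0) throw MutexError("unlock", rc); }
private:
    RawMutex& raw_;
};

// Returns the error code seen when locking after destruction, or 0 if none.
int lock_after_destroy() {
    RawMutex raw;
    MutexWrapper m(raw);
    m.lock();
    m.unlock();
    raw_destroy(raw);          // models the owner freeing the mutex
    try {
        m.lock();              // models a later access to the destroyed mutex
    } catch (const MutexError& e) {
        return e.code();
    }
    return 0;
}
```

If no caller catches `MutexError` specifically, it propagates to whatever catch-all clause sits higher up, which matches what the core dumps showed.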

The libexray output also showed that the mutex had been destroyed by one thread during program termination. The mutex belonged to a logger class, whose destructor freed it. Another thread then accessed the destroyed mutex while logging, which triggered the exception from the mutex wrapper class and eventually led that thread to call exit() as well.

The issue was fixed by no longer destroying the logger object in the termination code, which avoids destroying the mutex while other threads may still be logging.
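One common way to keep an object alive through termination is an intentionally leaked singleton. The sketch below assumes that approach; the actual change in the customer's software is not shown in this page, and the `Logger` class here (with `std::mutex` in place of the in-house wrapper) is a hypothetical stand-in.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical logger that is created on first use and never destroyed.
class Logger {
public:
    static Logger& instance() {
        // Allocated once and deliberately never deleted, so neither the
        // Logger destructor nor the mutex destructor runs during exit().
        static Logger* const logger = new Logger();
        return *logger;
    }

    void log(const std::string& msg) {
        std::lock_guard<std::mutex> guard(mutex_);
        lines_.push_back(msg);
    }

    std::size_t line_count() {
        std::lock_guard<std::mutex> guard(mutex_);
        return lines_.size();
    }

    Logger(const Logger&) = delete;
    Logger& operator=(const Logger&) = delete;

private:
    Logger() = default;
    ~Logger() = default;  // private: outside code cannot destroy the logger

    std::mutex mutex_;
    std::vector<std::string> lines_;
};
```

The trade-off is that the logger's memory is reclaimed only by the operating system at process exit, which is usually acceptable for a process-wide object that other threads may touch until the very end.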