This repository has been archived by the owner on Mar 22, 2023. It is now read-only.

vcmap__concurrent_put_get_remove randomly fails #1054

Open
kilobyte opened this issue Jan 3, 2022 · 7 comments

Comments

@kilobyte
Contributor

kilobyte commented Jan 3, 2022

Environment Information

| Name | Version |
| --- | --- |
| pmemkv version(s) | 1.5.0 |
| libpmemobj-cpp version(s) | 1.13.0 |
| PMDK (libpmem/libpmemobj) version(s) | 1.11.1 |
| OS(es) version(s) | Debian unstable |
| kernel version(s) | 5.10 |
| TBB version(s) | |
| memkind version(s) | 1.12.0 |
| ndctl version(s) | 71.1 |

Please provide a reproduction of the bug:

On the Reproducible Builds CI, this test fails pretty often. On my home box, it has never failed despite many many tries.

Link: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/pmemkv.html (the page is live, so the failure may no longer show up if the CI has run again).

-- Executing:  /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params vcmap;{"path":"/build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/test/vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none","size":125829120};1000
-- Test vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none:
-- Stdout:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]
7: /lib/x86_64-linux-gnu/libpthread.so.0 (start_thread+0xe0) [0x7f18ef1b8d80]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]

Signal: Aborted, backtrace:
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]

Signal: Aborted, backtrace:
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
8: /lib/x86_64-linux-gnu/libc.so.6 (clone+0x3f) [0x7f18eee2cba8]

3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
7: /lib/x86_64-linux-gnu/libpthread.so.0 (start_thread+0xe0) [0x7f18ef1b8d80]
8: /lib/x86_64-linux-gnu/libc.so.6 (clone+0x3f) [0x7f18eee2cba8]


-- Stderr:
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc

CMake Error at /build/1st/pmemkv-1.5.0/tests/helpers.cmake:185 (message):
   /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params vcmap;{"path":"/build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/test/vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none","size":125829120};1000 failed: 134
Call Stack (most recent call first):
  /build/1st/pmemkv-1.5.0/tests/helpers.cmake:226 (execute_common)
  /build/1st/pmemkv-1.5.0/tests/engines/memkind_based/default.cmake:9 (execute)
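
For anyone reading the numbers in this log: the `failed: 134` status and the `"size":125829120` pool size can be decoded with a few lines of Python. This is only a sketch. It assumes the common POSIX-shell convention that a status above 128 means "terminated by signal (status - 128)" and Linux signal numbering; `decode_exit_status` is a hypothetical helper, not part of pmemkv or its test suite. The `0x8` returned by `kv.put` looks like pmemkv's out-of-memory status, which would be consistent with the `std::bad_alloc` message, but that mapping is not confirmed in this thread.

```python
import signal


def decode_exit_status(code: int):
    """Return the signal implied by a shell-style exit status, or None.

    Assumption: status = 128 + signal number (common POSIX-shell convention).
    """
    if code > 128:
        return signal.Signals(code - 128)
    return None


# Values copied from the log above.
exit_status = 134
pool_size_bytes = 125829120

sig = decode_exit_status(exit_status)
print(sig.name if sig else "normal exit")
print(pool_size_bytes // 2**20, "MiB")
```

Under those assumptions the status maps to SIGABRT, which matches the `Signal: Aborted` lines, and the pool works out to 120 MiB.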
@lukaszstolarczuk
Member

@kilobyte, how many cores (as reported by nproc) and how much free memory does this machine have?

@kilobyte
Contributor Author

kilobyte commented Jan 3, 2022

Damned if I know... I can print that data, what commands would be useful in your opinion?

@lukaszstolarczuk
Member

I believe nproc and free -g should be enough
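
If it is easier to print this from a script inside the build environment than to run the two commands, a rough Python equivalent could look like the sketch below. It is Linux-only. `usable_cpus` and `meminfo_gib` are hypothetical helper names, and the values only approximate what `nproc` and `free -g` print.

```python
import os


def usable_cpus() -> int:
    """CPUs available to this process, close to what `nproc` reports."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def meminfo_gib(path: str = "/proc/meminfo") -> dict:
    """Read selected /proc/meminfo fields and convert them from kB to GiB."""
    wanted = {"MemTotal", "MemAvailable", "SwapTotal", "SwapFree"}
    result = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                if key in wanted:
                    result[key] = int(rest.split()[0]) / 2**20
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return result


print("cpus:", usable_cpus())
for name, gib in meminfo_gib().items():
    print(f"{name}: {gib:.1f} GiB")
```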

@lukaszstolarczuk
Member

@kilobyte bump - can you pls provide the data I asked for - that'd be appreciated 😃

Can you also please update memkind to a more recent version (ideally 1.14.0)?

@kilobyte
Contributor Author

kilobyte commented Sep 8, 2022

nproc is 15 in build1, 16 in build2; memory for both: 48G RAM + 200G swap; filesystem: tmpfs 24G
The results are random; currently both runs have succeeded.

@kilobyte
Contributor Author

kilobyte commented Sep 8, 2022

I even see a comment on the RB page:

Comments: rpath issue fixed by -DCMAKE_BUILD_RPATH_USE_ORIGIN=ON, but unable to verify effect on test suite due to nondeterministic failures.

Thus others see random fails too.

@kilobyte
Contributor Author

... so I was about to report that the failure is gone, but it turns out all my recent tests have -E 'vcmap__concurrent_put_get_remove_.*', so RB no longer failing is not as good as it seems.
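
To make the effect of that exclusion concrete, the sketch below checks which test names the pattern would filter out. It assumes `-E` is ctest's `--exclude-regex` option and that the pattern is applied as a search over the test name. The first name is taken from the log in this issue; the second is a hypothetical name included only for contrast.

```python
import re

# Pattern copied from the comment above; assumed to be used as a search regex.
EXCLUDE = re.compile(r"vcmap__concurrent_put_get_remove_.*")


def is_excluded(test_name: str) -> bool:
    """True if the exclusion pattern matches anywhere in the test name."""
    return EXCLUDE.search(test_name) is not None


names = [
    # Test name taken from the failing run in this issue.
    "vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none",
    # Hypothetical name, for contrast only.
    "vcmap__put_get_remove__default",
]

for name in names:
    print(name, "excluded" if is_excluded(name) else "runs")
```

With this pattern the failing test is skipped entirely, so a green run says nothing about whether the `std::bad_alloc` failure still occurs.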
