Python demo script causes segfault with Python2 #19

Open
courtarro opened this issue Sep 23, 2019 · 6 comments

@courtarro

courtarro commented Sep 23, 2019

Running on a 12-thread i7 under 64-bit Linux (Ubuntu Bionic). I compiled and installed libwirehair-shared.so, then ran python2 whirehair.py:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5d6937e in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
    at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
4051	    if ((uint16_t)block_id == _block_count - 1) {

GDB stack trace:

#0  0x00007ffff5d6937e in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
    at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
#1  0x00007ffff5d59af4 in wirehair_encode (codec=0x55b6ba50, blockId=1, blockDataOut=0x555555bb2ef0, outBytes=32, dataBytesOut=0x7ffff7ec9910)
    at /home/(redacted)/software/external/wirehair/wirehair.cpp:139
#2  0x00007ffff5f9bdae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#3  0x00007ffff5f9b71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#4  0x00007ffff61aead4 in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#5  0x00007ffff61ae4d5 in ?? () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#6  0x000055555564df9e in PyEval_EvalFrameEx ()
#7  0x0000555555646b0a in PyEval_EvalCodeEx ()
#8  0x0000555555646429 in PyEval_EvalCode ()
#9  0x00005555556764cf in ?? ()
#10 0x0000555555671442 in PyRun_FileExFlags ()
#11 0x00005555556708bd in PyRun_SimpleFileExFlags ()
#12 0x000055555562075b in Py_Main ()
#13 0x00007ffff7a05b97 in __libc_start_main (main=0x5555556200c0 <main>, argc=2, argv=0x7fffffffde28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffde18) at ../csu/libc-start.c:310
#14 0x000055555561ffda in _start ()

The same script works fine in Python 3. I am currently debugging.

@courtarro
Author

courtarro commented Sep 23, 2019

This is really weird. I expanded line 4051, which was triggering the segfault:

if ((uint16_t)block_id == _block_count - 1) {

to the following 4 lines:

uint16_t bc = _block_count;
uint16_t last_block = bc - 1;
uint16_t bid_u16 = (uint16_t)block_id;
if (bid_u16 == last_block) {

Now the segfault happens at the very first line, when attempting to read the value of _block_count. I don't understand why it would be unable to read that variable.

0x00007ffff5d69375 in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
    at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
4051	    uint16_t bc = _block_count;

GDB is also unable to read it. Here is the attempt to read block_id, which works, and _block_count, which doesn't:

(gdb) print block_id
$1 = 1
(gdb) print _block_count
Cannot access memory at address 0x55b6ba54
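
One way to read these numbers, offered as a rough sketch rather than a diagnosis, is to compare the `this` pointer with the other pointer in the same frame. The values below are copied from the GDB output above. The variable names are made up for the illustration, and treating a pointer that fits in 32 bits as suspicious on this machine is an assumption.

```python
# Addresses copied from the GDB output above; the names are illustrative only.
this_ptr = 0x55b6ba50        # `this` in wirehair::Codec::Encode
block_out = 0x555555bb2ef0   # block_out in the same frame
unreadable = 0x55b6ba54      # address GDB could not access

# The unreadable address sits a few bytes past `this`, so the member read
# itself looks ordinary; it is the object pointer that cannot be followed.
member_offset = unreadable - this_ptr

# `this` fits in 32 bits, while block_out needs the upper half of a
# 64-bit address.
this_fits_32 = this_ptr < 2**32
block_out_fits_32 = block_out < 2**32

print(hex(member_offset), this_fits_32, block_out_fits_32)
```

If `this` had been cut down to its low 32 bits somewhere between Python and the library, both the failed member read and GDB's "Cannot access memory" message would follow from it.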

@danieagle
Contributor

Hi, Courtarro!
Going by your gdb output, first try changing lines 97 and 98 to:
blockid = ctypes.c_uint16(0)
needed = ctypes.c_uint16(0)

Did that work?
If so, please also try changing line 116 to:
ctypes.c_uint16(blockid.value), #ID of block to generate

Thanks for your patience! :-)

[]'s Dani.

@courtarro
Author

I finally got around to trying this. I replaced the c_uint() calls listed above with c_uint16(), and also replaced c_int() with c_int32() in one other place. It still segfaults.

@catid
Owner

catid commented Dec 3, 2020

If it's segfaulting, the best way to debug is probably to build in debug mode and attach a debugger. Most likely some input to the C++ code is invalid.

@courtarro
Author

courtarro commented Dec 3, 2020

I'm not an expert at ctypes. Python thinks the encoder variable is the default c_int, rather than a full WirehairCodec object. Any reason that might confuse the garbage collection process? The variable stays in scope, so I don't think that would be it. But gdb is unable to access any member variable of the WirehairCodec object, which leads me to believe there's some sort of memory corruption going on.

With Python 2.7 going away, I'm not that worried about whether this works with Python 2.7 in the long term. My original motivation was to use it with GNU Radio 3.7, which is based on Python 2.7, and GNU Radio has since moved to Python 3. However, I'd like to understand the problem better in case it reveals a more serious underlying issue that Python 3 happens not to trigger now but that could cause failures later.
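
For reference, the default mentioned here can be observed directly: ctypes assumes a foreign function returns a C int unless `restype` is set. The sketch below uses `ctypes.pythonapi` and `Py_GetVersion` only because they are available in any CPython process without building anything. They are a stand-in and have nothing to do with wirehair.

```python
import ctypes

# Stand-in function from the running interpreter, not from wirehair.
get_version = ctypes.pythonapi.Py_GetVersion

# Before anything is configured, ctypes assumes a C int return value.
default_restype = get_version.restype
print(default_restype)

# Py_GetVersion really returns a const char *, so declare that before calling.
get_version.restype = ctypes.c_char_p
version = get_version()
print(version)
```

The same default would apply to any function in libwirehair-shared.so whose `restype` the script leaves unset.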

@catid
Owner

catid commented Dec 4, 2020

I read some ctypes docs. I think what might be missing is this:

wirehair.wirehair_encoder_create.restype = ctypes.c_void_p

It may also need to be wrapped like this: c_void_p(wirehair.wirehair_encoder_create(...))

What may be happening is that the default return type is a 32-bit integer, which truncates the 64-bit pointer returned by the library. Passing the truncated value back in would then cause the invalid memory access you described.
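
A minimal sketch of this suggestion, under some assumptions. The `restype` line comes from the comment above. The `argtypes` for `wirehair_encode` follow the parameter order shown in the GDB trace, but the exact ctypes types are a guess, and `configure_wirehair` and `example_address` are hypothetical names. The first part needs no library and only shows how a pointer-sized value is masked when treated as a C int, which is also how ctypes converts a plain Python integer argument when no `argtypes` are declared.

```python
import ctypes

# Hypothetical 64-bit address, used only to show the masking.
example_address = 0x7f0012345678

# Treated as a C int (assumed 32 bits wide), the upper half is dropped.
as_int = ctypes.c_int(example_address).value
# Treated as a pointer, the full value is kept on a 64-bit platform.
as_pointer = ctypes.c_void_p(example_address).value

print(hex(as_int), hex(as_pointer))


def configure_wirehair(lib):
    """Declare pointer-typed results and arguments on a loaded library.

    `lib` is expected to be the ctypes.CDLL for libwirehair-shared.so.
    """
    # From the comment above: keep the full pointer returned by the library.
    lib.wirehair_encoder_create.restype = ctypes.c_void_p
    # Parameter order as in the GDB trace; the types are assumptions.
    lib.wirehair_encode.argtypes = [
        ctypes.c_void_p,                  # codec
        ctypes.c_uint,                    # blockId
        ctypes.c_void_p,                  # blockDataOut
        ctypes.c_uint32,                  # outBytes
        ctypes.POINTER(ctypes.c_uint32),  # dataBytesOut
    ]
    return lib
```

With the real library, `configure_wirehair` would be called once on the object returned by `ctypes.CDLL` for libwirehair-shared.so, before the encoder is created. Declaring `argtypes` matters as much as `restype`, because with `restype = ctypes.c_void_p` the call returns a plain Python integer, and that integer would be narrowed again on the way back in.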
