-
We're using PyO3 extensively in nautilus_trader. At a high level, we use PyO3 types to create FFI functions in Rust; these get compiled into Cython modules, which in turn get compiled into shared objects that are loaded into the Python interpreter. We've fixed multiple leaks over the last month, but a few still persist, so I wanted to discuss them here. We've been using valgrind to check for leaks while running a strategy like this:

```python
import gc

# no global variables
def strategy():
    # lots of logic
    ...

if __name__ == "__main__":
    strategy()
    gc.collect()
```

It turns out that after the run and garbage collection, almost 15 MB of memory is still reachable.
Here are some traces that point to calls in PyO3. Happy to discuss and figure out if there's a way to narrow it down.
-
Thanks for reaching out! In general, Python and Python extensions are not particularly meticulous in cleaning up globals and singletons, so some amount of "leaked" memory is to be expected. What would be interesting is if the amount of detected leaks changes when you change your main script to do the following:
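The suggested snippet appears to have been lost in the export. A plausible reconstruction of the idea, calling `strategy()` twice so that one-time global and singleton allocations are only counted once, might look like this (an assumption, not the original snippet):

```python
import gc

def strategy():
    # placeholder for the real strategy logic
    return sum(range(1_000))

if __name__ == "__main__":
    # the first run pays the one-time cost of module globals, caches,
    # and singletons; if valgrind reports roughly the same "still
    # reachable" total as a single run, the per-run leak is small
    strategy()
    strategy()
    gc.collect()
```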
BTW, the two traces you show don't seem to point to PyO3: one of the objects is a "dict_keys" object, which is allocated for every dict (and there are a lot of those in the "globals" category), and the other is something from …
-
I ran the snippet you mentioned; there's very little difference between the two runs.
Yet when I measure the heap while the script is running, it grows continuously after each successive strategy run: after the first run the heap was at 74 MB, after the second 94 MB, and after the third 102 MB. Given what you've said about global allocations, the valgrind logs don't match the growing-heap observation. Are there any known issues with PyO3 that may cause this, especially with raw FFI pointers? A quick issue search showed that all the reported memory-leak issues have been resolved.
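One way to capture this kind of per-run growth in a reproducible form is to sample the process's resident set size between runs. A sketch using the Unix-only `resource` module (the `strategy` body is again a hypothetical stand-in):

```python
import gc
import resource

def strategy():
    # hypothetical stand-in workload
    return sum(len(str(i)) for i in range(100_000))

def peak_rss_kb() -> int:
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS);
    # it is a high-water mark, so it only moves when the heap grows
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

if __name__ == "__main__":
    for run in range(1, 4):
        strategy()
        gc.collect()
        print(f"run {run}: peak RSS {peak_rss_kb()} kB")
```

A steadily climbing peak across runs, as reported above (74 MB, 94 MB, 102 MB), is consistent with references surviving each run rather than one-time global allocations.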
-
OK, so I've made a minimal example that I'll share soon. Here are the results. I've been using tracemalloc to find the exact line where the memory references are being held:

```python
def test_large_printing():
    for _ in range(5):
        for n in range(10000):
            # UUID4 is a Cython object; here we're creating an object
            # with a unique string each time
            a = UUID4("550e8400-e29b-41d4-a716-44665544" + str(n % 10000).zfill(4))
            # printing is done by creating a pystring in Rust using PyO3
            # and returning a pointer to it
            print(a)  # line 37: memory leaked here
        gc.collect()
    gc.collect()
```

Ideally there should be no difference, or only a minimal difference, between the two snapshots after running the test, with the minimal difference mostly attributable to some small allocations in the "globals" category, as you've mentioned. But here it shows ~4 MB of references still held after the function has returned. I want to check with you whether this result indicates something is wrong.
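The two-snapshot comparison described above can be sketched in plain Python like this (`UUID4` and the Cython/Rust layer are replaced with a hypothetical string workload that deliberately retains references, to show what a leak looks like in the diff):

```python
import gc
import tracemalloc

def workload():
    # stand-in for the UUID4/print loop; references are retained here
    # on purpose to simulate a leak
    held = []
    for n in range(10_000):
        held.append("550e8400-e29b-41d4-a716-44665544" + str(n % 10_000).zfill(4))
    return held

if __name__ == "__main__":
    tracemalloc.start()
    gc.collect()
    before = tracemalloc.take_snapshot()
    leak = workload()
    gc.collect()
    after = tracemalloc.take_snapshot()
    # lines that grew between the two snapshots are where references
    # are still being held
    for stat in after.compare_to(before, "lineno")[:5]:
        print(stat)
```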
-
Yup, thanks for this pointer (pun 😛). The fix is to decrement the reference count after casting, because there is no good way to cast without incrementing the ref count:

```cython
cdef inline str pyobj_to_str(PyObject* ptr):
    cdef PyObject* str_obj = ptr
    cdef str str_value = <str> str_obj
    Py_XDECREF(str_obj)
    return str_value
```

This is needed because PyO3 returns an owned pointer with ref count 1, and the cast bumps the ref count to 2. When the Cython reference goes out of scope, the ref count drops back to 1, but that remaining count (the ownership PyO3 handed over) prevents the object from ever being garbage collected.