-
We're using PyO3 extensively in nautilus_trader. At a high level, we use PyO3 types to create FFI functions in Rust; these get compiled into Cython modules, which in turn get compiled into shared objects that are loaded into the Python interpreter. We've fixed multiple leaks over the last month, but a few still persist, so I wanted to discuss them here. We've been using valgrind to check for leaks while running a strategy like this:

```python
import gc

# no global variables
def strategy():
    # lots of logic
    ...

if __name__ == "__main__":
    strategy()
    gc.collect()
```

It turns out that after the run and garbage collection, almost 15 MB of memory is still reachable.
Here are some traces that point to calls in PyO3. Happy to discuss and figure out if there's a way to narrow it down.
-
Thanks for reaching out! In general, Python and Python extensions are not particularly meticulous in cleaning up globals and singletons, so some amount of "leaked" memory is to be expected. What would be interesting is if the amount of detected leaks changes when you change your main script to do the following:
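The suggested snippet appears to have been lost in the export. A plausible reconstruction of the idea, calling `strategy()` twice so that one-time global and singleton allocations are only counted once, might look like this (an assumption, not the original snippet):

```python
import gc

def strategy():
    # placeholder for the real strategy logic
    return sum(range(1_000))

if __name__ == "__main__":
    # the first run pays the one-time cost of module globals, caches,
    # and singletons; if valgrind reports roughly the same "still
    # reachable" total as a single run, the per-run leak is small
    strategy()
    strategy()
    gc.collect()
```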
BTW, the two traces you show don't seem to point to PyO3: one of the objects is a "dict_keys" object, which is allocated for every dict (and there are a lot of those in the "globals" category), and the other is something from …
-
I ran the snippet you mentioned; there's very little difference between the two runs.
Yet when I measure the heap while the script is running, it grows continuously after each successive strategy run: after the first run the heap was at 74 MB, after the second 94 MB, and after the third 102 MB. Given what you've said about global allocations, the valgrind logs don't match the growing-heap observation. Are there any known issues with PyO3 that may cause this, especially with raw FFI pointers? A quick issue search showed that all the reported memory-leak issues have been resolved.
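One way to capture this kind of per-run growth in a reproducible form is to sample the process's resident set size between runs. A sketch using the Unix-only `resource` module (the `strategy` body is again a hypothetical stand-in):

```python
import gc
import resource

def strategy():
    # hypothetical stand-in workload
    return sum(len(str(i)) for i in range(100_000))

def peak_rss_kb() -> int:
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS);
    # it is a high-water mark, so it only moves when the heap grows
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

if __name__ == "__main__":
    for run in range(1, 4):
        strategy()
        gc.collect()
        print(f"run {run}: peak RSS {peak_rss_kb()} kB")
```

A steadily climbing peak across runs, as reported above (74 MB, 94 MB, 102 MB), is consistent with references surviving each run rather than one-time global allocations.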
-
OK, so I've made a minimal example that I'll share soon. Here are the results. I've been using tracemalloc to find the exact line where the memory references are being held:

```python
def test_large_printing():
    for _ in range(5):
        for n in range(10000):
            # UUID4 is a Cython object; here we're creating an object
            # with a unique string each time
            a = UUID4("550e8400-e29b-41d4-a716-44665544" + str(n % 10000).zfill(4))
            # printing is done by creating a pystring in Rust using PyO3
            # and returning a pointer to it
            print(a)  # line 37: memory leaked here
        gc.collect()
    gc.collect()
```

Ideally there should be no difference, or only a minimal difference, between the two snapshots after running the test, with the minimal difference mostly attributable to some small allocations in the "globals" category, as you've mentioned. But here it shows ~4 MB of references still held after the function has returned. I want to check with you whether this result indicates something is wrong.
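The two-snapshot comparison described above can be sketched in plain Python like this (`UUID4` and the Cython/Rust layer are replaced with a hypothetical string workload that deliberately retains references, to show what a leak looks like in the diff):

```python
import gc
import tracemalloc

def workload():
    # stand-in for the UUID4/print loop; references are retained here
    # on purpose to simulate a leak
    held = []
    for n in range(10_000):
        held.append("550e8400-e29b-41d4-a716-44665544" + str(n % 10_000).zfill(4))
    return held

if __name__ == "__main__":
    tracemalloc.start()
    gc.collect()
    before = tracemalloc.take_snapshot()
    leak = workload()
    gc.collect()
    after = tracemalloc.take_snapshot()
    # lines that grew between the two snapshots are where references
    # are still being held
    for stat in after.compare_to(before, "lineno")[:5]:
        print(stat)
```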
-
Yup, thanks for this pointer (pun 😛). The fix is to decrement the reference count after casting, because there is no good way to cast without incrementing the ref count:

```cython
cdef inline str pyobj_to_str(PyObject* ptr):
    cdef PyObject* str_obj = ptr
    cdef str str_value = <str> str_obj
    Py_XDECREF(str_obj)
    return str_value
```

This is needed because PyO3 returns an owned pointer with ref count 1, and the cast bumps the ref count to 2. When the Cython reference goes out of scope, the ref count drops back to 1, but that remaining count (the ownership PyO3 handed over) prevents the object from ever being garbage collected.