Design Pattern For "Correct" Memory Management #2114

the-wondersmith · 2022-01-19T19:13:05Z

Quick background:

My day job is as a Python developer
I'm much newer to Rust than I am to Python
My main goal in learning Rust is to be as proficient with it as I am in Python
I've (somewhat masochistically) decided to expand my proverbial Rust horizons by diving head-first into writing Python extensions

By way of accomplishing the above, I'm working on writing a PEP 249 compliant interface for reading (and eventually writing) a really niche type of embedded "database" file (DataFlex v2.3b/3.0) that's been effectively EOL'd for... years now.

Currently, I've got all of the necessary machinery in place to read the table files correctly with all of the helper functions and "core" objects implemented as #[pyclass] decorated structs. maturin and poetry seem to play nice, and I can import my module and call the functions from Python as advertised, so on and so forth.

What I'm... struggling with (for lack of a better word) is gathering information on implementing the best (or even a good) design pattern from a memory + interoperability standpoint. Like I said, I've got table reads working nicely, but I'm like 90% certain that my implementation isn't as memory-friendly as it could / should be.

For instance, all of the "core" helper functions I've implemented look something like:

#[pyfunction]
#[pyo3(text_signature = "(data: bytes) -> Optional[datetime.date]")]
/// Get a date value stored as a series of packed Binary Coded Decimals.
pub fn date_from_bytes(data: &[u8]) -> PyResult<PyDate> { ... }

and the various structures that make up the table file's format look something like:

#[derive(Clone, Debug, Default, Eq, Ord, PartialOrd, PartialEq)]
#[pyclass(dict, module = "ferroflex.structs")]
/// A structured representation of a field segment's
/// definition in the header of a DataFlex table file
pub struct FieldSegment {
    #[pyo3(get, set)]
    /// The column number (with respect to
    /// the column's parent table) to which
    /// the segment refers
    pub column: u8,
    #[pyo3(get, set)]
    /// The segment's position within its
    /// associated index
    pub segment: u8,
}


#[pymethods]
impl FieldSegment {

    #[new]
    fn __new__(column: Option<u8>, segment: Option<u8>) -> Self {
        Self {
            column: column.unwrap_or_default(),
            segment: segment.unwrap_or_default(),
        }
    }

}

My intuition is that it would be better to refactor everything such that all "new" data is allocated directly on the Python heap, and all "existing" data is only ever passed to the Rust "side" of things as something like PyRef<DataType> (or whatever the appropriate type would be) so as to minimize duplicate allocations / extra "work" on the part of the Rust functions.

Is that even possible? If so, how? I keep running smack into the need to call .clone() on values in order for them to be returnable from Rust functions back to the Python "side" of things.
If it's not possible, what's the recommended / idiomatic strategy or pattern I should shoot for?

Any guidance y'all could offer would be much appreciated!

the-wondersmith (Author) · 2022-01-22T19:55:46Z

Shameless self-bump 🥲. Desperately hoping to catch @davidhewitt's attention...

davidhewitt (Maintainer) · 2022-01-27T23:44:59Z

Sorry for the slow reply; very busy recently and can only find so much time to respond to PyO3 things. Longer requests like this which require a bit of context switch kinda get buffered until I have time to sit down and think about them for a moment.

> What I'm... struggling with (for lack of a better word) is gathering information on implementing the best (or even a good) design pattern from a memory + interoperability standpoint. Like I said, I've got table reads working nicely, but I'm like 90% certain that my implementation isn't as memory-friendly as it could / should be.

I'm not sure exactly what you're aiming for; the most "memory efficient" way would be to pack the memory into some kind of single Rust-owned buffer, which Python could read parts of. (e.g. think something like a pandas.DataFrame.) There's actually some fantastic discussion going on in PyO3/rust-numpy#254, so even if this might not be possible right now, it might be in PyO3 very soon.
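
To make the "single Rust-owned buffer" idea a bit more concrete, here is a minimal, std-only sketch. It is not PyO3 code and not the DataFlex format: `TableBuffer`, `SegmentView`, and the 2-byte record layout are all hypothetical names and assumptions made up for illustration. The point is only that the bytes are allocated once and individual records are borrowed views into them rather than separately allocated structs.

```rust
/// Assumed (hypothetical) fixed-width layout: each record is 2 bytes,
/// `column` followed by `segment`. A real table file will differ.
const RECORD_LEN: usize = 2;

/// Owns the raw bytes of the table exactly once.
pub struct TableBuffer {
    bytes: Vec<u8>,
}

/// A borrowed, zero-copy view of one record inside the buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegmentView<'a> {
    raw: &'a [u8],
}

impl TableBuffer {
    /// Takes ownership of the bytes; nothing is copied here.
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    /// Number of complete records (any trailing partial record is ignored).
    pub fn len(&self) -> usize {
        self.bytes.len() / RECORD_LEN
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Returns a view of record `index`, or `None` if it is out of range.
    pub fn get(&self, index: usize) -> Option<SegmentView<'_>> {
        let start = index.checked_mul(RECORD_LEN)?;
        let end = start.checked_add(RECORD_LEN)?;
        self.bytes.get(start..end).map(|raw| SegmentView { raw })
    }
}

impl<'a> SegmentView<'a> {
    /// Fields are decoded on access instead of being stored a second time.
    pub fn column(&self) -> u8 {
        self.raw[0]
    }

    pub fn segment(&self) -> u8 {
        self.raw[1]
    }
}
```

Something like `TableBuffer::new(file_bytes).get(1)` would then hand back a view without allocating. Exposing such views to Python safely is a separate question, since a borrowed view cannot outlive the buffer it points into.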

> If so, how? I keep running smack into the need to call .clone() on values in order for them to be returnable from Rust functions back to the Python "side" of things.

PyRef<T> converts to Py<T> with a .into() conversion, which might be the missing thing you wanted?
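
As a rough intuition for why that avoids the .clone(): Py<T> is a reference-counted handle to an object that already lives on the Python heap, so handing one back does not copy the underlying struct. The sketch below is only a std-only analogy using `Rc`, not PyO3 itself; the trimmed-down `FieldSegment` and the `share` function are hypothetical and exist just to show the handle being duplicated while the data is not.

```rust
use std::rc::Rc;

/// Trimmed-down stand-in for the struct from the question (no PyO3 attributes).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct FieldSegment {
    pub column: u8,
    pub segment: u8,
}

/// Returns a second handle to the same allocation.
/// Only the reference count changes; no `FieldSegment` is copied,
/// which is why `FieldSegment` does not even need to implement `Clone`.
pub fn share(segment: &Rc<FieldSegment>) -> Rc<FieldSegment> {
    Rc::clone(segment)
}
```

After `let handle = share(&original);`, both handles point at the same `FieldSegment`, and `Rc::ptr_eq(&original, &handle)` holds.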
