Skip to content

Commit

Permalink
version 0.1.0 (#1)
Browse files Browse the repository at this point in the history
version 0.1.0
  • Loading branch information
vincentberenz authored Aug 21, 2024
1 parent 692b134 commit 167182b
Show file tree
Hide file tree
Showing 23 changed files with 2,673 additions and 1 deletion.
33 changes: 33 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
name: unit tests

on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:

jobs:
test:
runs-on: ubuntu-latest

strategy:
matrix:
python-version: [3.10.14]

steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install poetry
poetry install
- name: Run tests
run: |
poetry run pytest
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
*pyc
*~
*__pycache__
325 changes: 324 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,326 @@
[![Python package](https://github.com/<your-username>/<your-repo>/actions/workflows/python-app.yml/badge.svg)](https://github.com/<your-username>/<your-repo>/actions/workflows/python-app.yml)


# Cute-Shm

Under construction.
Cute-Shm is a convenience wrapper over Python's multiprocessing shared memory. It provides an easy-to-use API for managing shared memory numpy arrays and HDF5 files.
Using the shared memory allows to share numpy arrays across multiple processes running on the same node.

## Requirements

Python 3.10 or later.

## Installation

You can install Cute-Shm using pip:

```
pip install cute-shm
```

## Usage

### API

#### sharing numpy arrays

```python
import numpy as np
import cute_shm as cute

# Create some numpy arrays
a = np.array([[12, 0, 0], [0, 10, 0], [0, 0, 0]], dtype=np.int64)
b1 = np.zeros(100, dtype=np.float32)
b2 = np.zeros(300, dtype=np.float32)

# Create a nested dictionary of arrays
arrays = {"a": a, "b": {"b1": b1, "b2": b2}}

# An arbitrary name for this projet
project_name = "myproject"

# transfer arrays to shared memory

# set to True if the shared memory should not be cleaned upon exit
# i.e. another process may need to access it later
persistent = False

# set to True if the shared memory should be overwritten if it already exists
# (if False and a project of the same name already exists, a FileExistsError will be raised)
overwrite = False # set to True if the shared memory should be overwritten if it already exists

# transfer arrays to shared memory
cute.arrays_to_shm(
project_name,
arrays,
persistent=persistent,
overwrite=overwrite,
)

# reading the arrays from the shared memory
# This could be done in a different process.
# (including processes spawned after this process exits if persistent is True)
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)

# shm_arrays has the same structure as arrays.
# Each item has two keys:
# - "data": the numpy array
# - "meta": related metadata
a: np.ndarray = shm_arrays["a"]["data"]

# meta data consists mostly of things you will certainly not need.
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["shape"] # the shape of the array, same as a.shape
a_meta["dtype"] # the data type of the array, same as str(a.dtype)
a_meta["shm_name"] # the name of the shared memory segment
a_meta["shm_private_name"] # the private name of the shared memory segment
a_meta["shm"] # the shared memory segment (instance of shared_memory.SharedMemory)

# clean up the shared memory and related metadata
# (do not call this if persistent is True and you want the shared memory to be available for other processes)
cute.unlink(project_name)
```
You can also use the `unlinked_arrays_to_shm` context manager to
ensure the shared memory and related metadata are cleaned up on exit.

(if persistent is False, the python multiprocessing resource tracker will cleanup
the shared memory automatically, but not the meta data).

```python

# Transfer arrays to shared memory
with cute.unlinked_arrays_to_shm(project_name, arrays):

# Read arrays from shared memory
# (this could also be done in a different process)
shm_arrays = cute.shm_to_arrays(project_name)

# Shared memory and meta data is automatically cleaned up when the context manager exits
```

#### sharing content of hdf5 files

Content of hdf5 files can also be transferred to shared memory as a dictionary of numpy arrays.

```python
from pathlib import Path
import cute_shm as cute

hdf5_path = Path("path/to/your/file.hdf5")

project_name = "myproject"

# if True, a progress bar showing the progress of the transfer
# to the shared memory will be shown
progress = True
# for persistent and overwrite, same usage as when sharing numpy arrays
persistent = False
overwrite = False

# transfer to the shared memory
hdf5_to_shm(
hdf5_path, project_name, progress=progress, persistent=persistent, overwrite=overwrite
)

# content of hdf5 is shared as a nested directories of nested arrays
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)

# dataset attributes are stored in the "meta" dictionary
a: np.ndarray = shm_arrays["a"]["data"]
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["attrs"] # the attributes of the dataset

```

A context manager is also provided:

```python
with unlinked_hdf5_to_shm(
hdf5_path, project_name, progress, overwrite
):
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)
```

#### Logging

If in your own software using the cute-sh API you set the logging to level `DEBUG`, information related to
the creation/deletion of shared memory segments will be provided.

#### Typing hints

cute-shm provides and uses these type aliases:

```python
# a nested dictionary of numpy arrays. This is the data structure
# that can be transferred to the shared memory.
ArrayDict: TypeAlias = dict[str, Union["ArrayDict", np.ndarray]]

# usage
arrays: cute_shm.ArrayDict = {"a": np.zeros(10), "b": {"b1": np.zeros(10), "b2": np.zeros(10)}}
cute_shm.arrays_to_shm("myproject", arrays)

# a shared memory array: data and related metadata
class SharedArray(TypedDict):
meta: SharedArrayMeta
data: np.ndarray

# the metadata of a shared memory array
class SharedArrayMeta(TypedDict, total=False):
shm_name: str
shm_private_name: str
shm: shared_memory.SharedMemory
shape: tuple[int, ...]
dtype: str
attrs: dict[str, Any]

# a nested dictionary of shared memory arrays.
# This is the data structure that is returned by the API
# when reading the shared memory.
SharedArrayDict: TypeAlias = dict[str, Union[SharedArray, "SharedArrayDict"]]

# usage
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays("myproject")
a: cute_shm.SharedArray = shm_arrays["a"]
a_data: np.ndarray = a["data"]
a_meta: cute_shm.SharedArrayMeta = a["meta"]
```

### Under the hood

- when the arrays are transferred to shared memory, a toml file is created in the `/tmp/cute-shm` directory. Its name is based on the project name.
- this toml file contains all the metadata required for other processes to "cast" the shared memory to the proper dictionary structure.

If you prefer to store the metadata in a different location, change the `root` attribute of the class `Project2Toml`:

```python
import cute_shm as cute

cute.Project2Toml.root = Path("/path/to/your/directory")
```

### Command line executables

To load the content of a hdf5 file to the shared memory via command line:

```bash
cute-shm-hdf5 <project_name> <hdf5_path>
```
for example:

```bash
# transfer to the shared memory.
# file.hdf5 expected in the current directory
cute-shm-hdf5 myproject file.hdf5

# overwrite if data corresponding to a project named myproject already exists
# 'o' for overwrite
cute-shm-hdf5 myproject file.hdf5 -o

# do not display a progress bar
cute-shm-hdf5 myproject file.hdf5 -no-progress

# display debug information instead of a progress bar
# 'v' for verbose
cute-shm-hdf5 myproject file.hdf5 -v

# any python process can now access the shared memory
# "myproject" via cute.shm_to_arrays.
```

You can display about data hosted in the shared memory:

```bash
# full information
cute-shm-list

# just an overview ('s' for short)
cute-shm-list -s
```

Note that `cute-shm-list` will not only display the content of the shared memory created via `cute-shm-hdf5`, but also the content of
the shared memory created via the python API (shared memory currently being transferred will not be listed).

Shared memory can be cleaned up via the command `cute-shm-unlink <project_name>`:

```bash
cute-shm-unlink myproject
```
### "Manual" cleaning of the shared memory

Alternatively to use the API or the command line to free the shared memory, you may either:

- reboot the computer
- delete files prefixed by `cute-shm` in the `/dev/shm` folder and related toml files in the `/tmp/cute-shm` folder.

## Warnings

### Bus error

If the RAM of the computer gets full, transfer to the shared memory will not only fail, the process will also crash with a bus error.
This is a system error that cannot be managed by the python exception handling.

### Garbage collection of the shared memory

Shared memory numpy arrays buffers is a pointer to the buffer of a related instance of `shared_memory.SharedMemory`.
This related instance needs to be loaded in the heap, i.e. it should not be garbage collected. If it is garbage collected,
then a `SegmentationFault` will occur and the process will crash (not managed by python exception handling).

The instance of the `shared_memory.SharedMemory` is located in the `meta` dictionary of the `SharedArrayMeta` instance.

For example, one should not:

```python
# read the shared memory to a dictionary of numpy arrays and meta data
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)

# access the data and meta data of 'a'
shm_array = shm_arrays["a"]

# the numpy array
data: np.ndarray = shm_array["data"]

# the meta data
meta: cute_shm.SharedArrayMeta = shm_array["meta"]

# deleting the pointer to the shared memory segment
# related to the data
del meta["shm"]

# this will crash: the shared memory segment has been garbage collected
print(data[0])
```

or:

```python
def get_np(project_name: str)->np.ndarray:

# read the shared memory to a dictionary of numpy arrays and meta data
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)

# access the data and meta data of 'a'
shm_array = shm_arrays["a"]
data: np.ndarray = shm_array["data"]
meta: cute_shm.SharedArrayMeta = shm_array["meta"]

# meta["shm"] is a reference to the shared memory segment.
# It will be garbage collected, along with the meta dictionary,
# when the function exits.
return data

a: np.ndarray = get_np("myproject")
# this will crash: the shared memory segment has been garbage collected
print(a[0])
```

> **Note:** When the shared memory instance is garbage collected:
> - The data is not removed from the shared memory.
> - Only the pointer to the data buffer is lost.
> - This loss of pointer affects only the current process.
## Authorship, Copyright, and License

**Author:** Vincent Berenz
**Institution:** Max Planck Institute for Intelligent Systems, Tübingen, Germany
**Copyright:** © 2024 Max Planck Gesellschaft
**License:** [MIT License](https://opensource.org/licenses/MIT)
24 changes: 24 additions & 0 deletions cute_shm/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
"""
Cute Shm: A convenient Python package for manipulating shared memory numpy arrays.
This package provides functionality for transferring numpy arrays to the shared
memory, supporting nested dictionary structures and HDF5 files.
"""

import importlib.metadata

from .core import (
ArrayDict,
MetaArrayDict,
Project2Toml,
SharedArray,
SharedArrayDict,
SharedArrayMeta,
bytes_to_human,
unlink,
)
from .hdf5_shm import hdf5_size, hdf5_to_shm, unlinked_hdf5_to_shm
from .numpy_shm import arrays_to_shm, shm_to_arrays, unlinked_arrays_to_shm
from .progress import ShmProgress

__version__ = importlib.metadata.version("cute-shm")
Loading

0 comments on commit 167182b

Please sign in to comment.