Releases: p2p-ld/numpydantic
v1.6.6
1.6.6 - 24-12-13
Bugfix
- #38, #39 - JSON Schema generation failed when the `dtype` was embedded from dtypes that lack a `__name__` attribute. An additional check was added for the presence of `__name__` when embedding. `NDArray` types were incorrectly cached such that pipe-union dtypes were considered equivalent to `Union[]` dtypes. An additional tuple with the type of the args was added to the cache key to disambiguate them.
- #38, #40 - Tuple dtypes were naively checked by just testing whether the given dtype was contained by the tuple, ignoring special cases like string type checking. Tuple dtypes are now checked recursively with the same logic as all other type checking. Zarr treats `dtype=str` as numpy type `O`, so a special case was added when validating from JSON to cast to `np.str_`.
Testing
- #39 - Test that all combinations of shapes, dtypes, and interfaces can generate JSON schema.
- #39 - Add Python 3.13 to the testing matrix.
- #39 - Add an additional `marks` field to `ValidationCase` for finer-grained control over running tests.
- #40 - Explicitly test for `np.str_` annotation dtypes alone and in tuples.
v1.6.5 - Bump minimum pydantic version
Bugfix
- #36, #37 - @gerhardadler identified that serialization context was only introduced in pydantic 2.7; the minimum pydantic version has been updated accordingly
v1.6.4 - Combinatoric testing and public test helpers!
PR: #31
We have rewritten our testing system for more rigorous tests:
where before we were limited to testing dtype or shape cases one at a time,
now we can test all possible combinations together!
This allows us to make better guarantees about behavior that all interfaces
should support, validating it against all possible dtypes and shapes.
We also exposed all the helpers and array testing classes for downstream development
so that it would be easier to test and validate any 3rd-party interfaces
that haven't made their way into mainline numpydantic yet -
see the `numpydantic.testing` module.
See the testing documentation for more details.
Bugfix
- Previously, numpy and dask arrays with a model dtype would fail JSON roundtripping because they wouldn't be correctly cast back to the model type. Now they are.
- Zarr would not dump the dtype of an array when it roundtripped to JSON, causing every array to be interpreted as a random integer or float type. `dtype` is now dumped and used when deserializing.
v1.6.3 - Reinstate `h5py>=3.12`
Bugfix
- #28 - h5py v3.12.0 was actually fine, but we did need to change the way that the hdf5 tests work to not hold the file open during the test. Easy enough change. The version cap has been removed from h5py (which is optional anyway, so any version could be installed separately).
v1.6.2
Very minor bugfix and CI release
PR: #26
Bugfix
- h5py v3.12.0 broke file locking, so a temporary maximum version cap was added until that is resolved. See h5py/h5py#2506 and #27.
- The `_relativize_paths` function used in roundtrip dumping was incorrectly relativizing paths that are intended to refer to paths within a dataset, rather than a file. This, as well as some Windows-specific bugs, was fixed so that directories that exist but are just below the filesystem root (like `/data`) are excluded. If this becomes a problem then we will have to make the relativization system a bit more robust by specifically enumerating which path-like things are not intended to be paths.
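A hedged sketch of that exclusion rule, using only the standard library (the `maybe_relativize` helper is illustrative, not numpydantic's `_relativize_paths`):

```python
from pathlib import Path

# Illustrative rule: only relativize values that plausibly name files,
# skipping absolute paths directly below the filesystem root (like
# "/data"), which are likely dataset-internal paths, not file paths.
def maybe_relativize(value: str, base: str) -> str:
    p = Path(value)
    # "/data" has parts ("/", "data") - too close to the root to be
    # confidently a file path, so leave it alone
    if p.is_absolute() and len(p.parts) <= 2:
        return value
    try:
        return str(p.relative_to(base))
    except ValueError:
        # not under the base directory: leave unchanged
        return value

assert maybe_relativize("/data", "/tmp/out") == "/data"
assert maybe_relativize("/tmp/out/arrays/a.h5", "/tmp/out") == str(Path("arrays/a.h5"))
```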
CI
`numpydantic` was added as an array range generator in `linkml` (linkml/linkml#2178), so tests were added to ensure that changes to `numpydantic` don't break `linkml` array range generation. `numpydantic`'s tests are naturally a superset of the behavior tested in `linkml`, but this is a good paranoia check in case we drift substantially (which shouldn't happen).
v1.6.1 - Union Types
It's now possible to do this, like it always should have been:

```python
class MyModel(BaseModel):
    array: NDArray[Any, int | float]
```
Features
- Support for Union Dtypes
Structure
- New `validation` module containing `shape` and `dtype` convenience methods to declutter the main namespace and make a grouping for related code
- Rename all serialized arrays within a container dict to `value` to be able to identify them by convention and avoid long iteration - see perf below.
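As an illustration of why the `value` convention helps, a consumer can walk only the dict structure and never iterate array contents. This sketch is hypothetical, not numpydantic's code:

```python
# Find serialized arrays in a dumped container by their conventional
# "value" key, yielding (path, array) pairs without touching the
# array contents themselves.
def find_arrays(dumped: dict, path=()):
    for key, item in dumped.items():
        if key == "value":
            yield path, item
        elif isinstance(item, dict):
            yield from find_arrays(item, path + (key,))

dumped = {"a": {"type": "numpy", "value": [[1, 2], [3, 4]]},
          "b": {"nested": {"type": "zarr", "value": [5, 6]}}}
assert dict(find_arrays(dumped)) == {("a",): [[1, 2], [3, 4]],
                                     ("b", "nested"): [5, 6]}
```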
Perf
- Avoid iterating over every item in an array trying to convert it to a path, for a several-order-of-magnitude perf improvement over `1.6.0` (oops)
Docs
- Page for `dtypes`, mostly stubs at the moment, but more explicit documentation about what kinds of dtypes we support.
v1.6.0 - Roundtrip Json Serialization
(as always, please see the changelog in the docs for working links and full information): https://numpydantic.readthedocs.io/en/latest/changelog.html#roundtrip-json-serialization
Roundtrip JSON serialization is here - with serialization to list of lists,
as well as file references that don't require copying the whole array if
used in data modeling, control over path relativization, and stamping of
interface version for the extra provenance conscious.
Please see serialization for narrative documentation :)
Potentially Breaking Changes
- See development for a statement about API stability
- An additional `Interface.deserialize` method has been added to `Interface.validate` - downstream users are not intended to override the `validate` method, but if they have, then JSON deserialization will not work for them.
- Interface subclasses now require a `name` attribute, a short string identifier for that interface, and a `json_model` that inherits from `interface.JsonDict`. Interfaces without these attributes will not be able to be instantiated.
- `Interface.to_json` is now an abstract method that all interfaces must define.
Features
- Roundtrip JSON serialization - by default dump to a list of list arrays, but support the `round_trip` keyword in `model_dump_json` for provenance-preserving dumps
- JSON Schema generation has been separated from `core_schema` generation in `NDArray`. Downstream interfaces can customize JSON schema generation without compromising the ability to validate.
- All proxy classes must have an `__eq__` dunder method to compare equality - in proxy classes, these compare equality of arguments, since the arrays that are referenced on disk should be equal by definition. Direct array comparison should use `numpy.array_equal`
- Interfaces previously couldn't be instantiated without explicit shape and dtype arguments; these have been given `Any` defaults.
- New `numpydantic.serialization` module to contain serialization logic.
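The argument-comparison pattern for proxy `__eq__` can be sketched as follows; this `H5Proxy` is a minimal illustration under that assumption, not numpydantic's implementation:

```python
# Proxies compare equal when they point at the same file and dataset
# path, since the on-disk arrays they reference are then equal by
# definition - no array data needs to be read for the comparison.
class H5Proxy:
    def __init__(self, file: str, path: str):
        self.file = file
        self.path = path

    def __eq__(self, other) -> bool:
        if not isinstance(other, H5Proxy):
            return NotImplemented
        return (self.file, self.path) == (other.file, other.path)

assert H5Proxy("data.h5", "/arr") == H5Proxy("data.h5", "/arr")
assert H5Proxy("data.h5", "/arr") != H5Proxy("data.h5", "/other")
```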
New Classes
See the docstrings for descriptions of each class
- `MarkMismatchError` for when an array serialized with `mark_interface` doesn't match the interface that's deserializing it
- `interface.InterfaceMark`
- `interface.MarkedJson`
- `interface.JsonDict`
- `dask.DaskJsonDict`
- `hdf5.H5JsonDict`
- `numpy.NumpyJsonDict`
- `video.VideoJsonDict`
- `zarr.ZarrJsonDict`
Bugfix
#17
- Arrays are re-validated as lists, rather than arrays
- Some proxy classes would fail to be serialized because they lacked an `__array__` method. `__array__` methods have been added, and tests for coercing to an array to prevent regression.
- Some proxy classes lacked a `__name__` attribute, which caused failures to serialize when the `__getattr__` methods attempted to pass it through. These have been added where needed.
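A minimal sketch of that failure mode and the fix (this `Proxy` class is hypothetical, not one of numpydantic's proxies):

```python
# A proxy that forwards attribute access to a wrapped object raises
# AttributeError when serialization code asks for __name__, unless the
# class defines it explicitly.
class Proxy:
    __name__ = "Proxy"  # defined explicitly so passthrough can't fail

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, item):
        # only called for attributes not found on the proxy itself
        return getattr(self._wrapped, item)

p = Proxy([1, 2, 3])
assert p.__name__ == "Proxy"   # found on the class, no passthrough
assert p.count(2) == 1         # other attributes still pass through
```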
Docs
- Add statement about versioning and API stability to development
- Add docs for serialization!
- Remove stranded docs from hooks and monkeypatch
- Added
myst_nb
to docs dependencies for direct rendering of code and output
Tests
- Marks have been added for running subsets of the tests for a given interface, package feature, etc.
- Tests for all the above functionality
v1.5.3 - [bugfix] Validation with empty HDF5 datasets
#16: Empty HDF5 datasets shouldn't break validation
if the NDArray spec allows Any shaped arrays.
v1.5.2 - `datetime` support for HDF5
HDF5 can't support datetimes natively, but we can fake it with fixed-length 32-byte strings.
This PR allows one to specify a datetime dtype, and encodes datetime objects as strings on storage, and decodes them on access.
Getting to the point where we need to start making a generalized type conversion/serialization system because this interface in particular is getting gnarly, but don't have time just yet
```python
import h5py
from datetime import datetime
import numpy as np
from numpydantic import NDArray
from pydantic import BaseModel
from typing import Any

data = np.array([datetime.now().isoformat().encode('utf-8')], dtype="S32")
h5f = h5py.File('test.hdf5', 'w')
h5f.create_dataset('data', data=data)

class MyModel(BaseModel):
    array: NDArray[Any, datetime]

instance = MyModel(array=('test.hdf5', '/data'))
instance.array[0]
# np.datetime64('2024-09-03T23:50:45.897980')

instance.array[0] = datetime.now()
```
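Under the hood the idea is just an ISO-format round trip. A stdlib-only sketch, where the `encode`/`decode` names are illustrative rather than numpydantic's API:

```python
from datetime import datetime

# Encode a datetime as a fixed-width byte string for storage, and
# decode it back to a datetime on access. An ISO 8601 timestamp with
# microseconds is 26 characters, so it fits comfortably in 32 bytes.
def encode(dt: datetime) -> bytes:
    return dt.isoformat().encode("utf-8")

def decode(raw: bytes) -> datetime:
    return datetime.fromisoformat(raw.decode("utf-8"))

now = datetime(2024, 9, 3, 23, 50, 45, 897980)
assert decode(encode(now)) == now
assert len(encode(now)) <= 32
```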
v1.5.1 - [bugfix] Allow revalidation with proxied arrays
See: #14
When a proxy object is passed to some validators after having already been validated, validation fails.
This should always succeed:
```python
from numpydantic import NDArray
from pydantic import BaseModel

class MyModel(BaseModel):
    array: NDArray

instance = MyModel(array=valid_input)
_ = MyModel(array=instance.array)
```
but it's currently failing for the proxied interfaces.
This PR
- adds passthrough checks for h5proxy and videoproxy
- adds a testing module for tests against all interfaces, and tests that an already-instantiated model can be re-instantiated using the same array field after passing through the interface
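The passthrough idea can be sketched generically; `VideoProxy` and `VideoInterface` here are illustrative stand-ins, not numpydantic's actual classes:

```python
# If a value is already a proxy produced by a previous validation pass,
# return it unchanged instead of trying to re-open the underlying file.
class VideoProxy:
    def __init__(self, source):
        self.source = source

class VideoInterface:
    @staticmethod
    def validate(value):
        if isinstance(value, VideoProxy):
            return value  # already validated: pass through
        return VideoProxy(value)

first = VideoInterface.validate("movie.mp4")
again = VideoInterface.validate(first)
assert again is first  # revalidation succeeds and is a no-op
```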