chore: mention serialize method instead of the stream in the docs
smotornyuk committed Mar 3, 2024
1 parent de45a40 commit 380d5be
Showing 2 changed files with 31 additions and 35 deletions.
50 changes: 25 additions & 25 deletions README.md
@@ -585,12 +585,13 @@ This service produces the data for collection. Every data service must:
* define `range(start: Any, end: Any)` method that returns slice of the data
Base class for data services - `Data` - already contains a simple version of
-this logic. You have to define only one methods to make you custom
+this logic. You need to define only one method to make your custom
implementation: `compute_data()`. When data is accessed for the first time,
`compute_data` is called. Its result is cached and used for iteration in
for-loops, slicing via the `range` method and size measurement via the `total`
property.
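
The caching behaviour described above can be sketched roughly as follows. This is only an illustration: `SketchData` and `Numbers` are hypothetical names, and the real `Data` class may implement caching, `range` and `total` differently.

```python
from functools import cached_property
from typing import Any, Iterable


class SketchData:
    """Hypothetical stand-in for the real `Data` base class."""

    def compute_data(self) -> Any:
        # Subclasses are expected to override only this method.
        return []

    @cached_property
    def _data(self) -> Any:
        # The first access calls compute_data(); later accesses reuse the result.
        return self.compute_data()

    def __iter__(self):
        # Iteration in for-loops goes through the cached result.
        yield from self._data

    def range(self, start: Any, end: Any) -> Iterable[Any]:
        # Assumption: the cached result supports slicing (e.g. a list).
        return self._data[start:end]

    @property
    def total(self) -> int:
        # Size measurement also relies on the cached result.
        return len(self._data)


class Numbers(SketchData):
    calls = 0  # counts how many times compute_data() actually runs

    def compute_data(self) -> Any:
        self.calls += 1
        return [1, 2, 3, 4, 5]


numbers = Numbers()
assert list(numbers) == [1, 2, 3, 4, 5]
assert list(numbers.range(1, 3)) == [2, 3]
assert numbers.total == 5
assert numbers.calls == 1  # computed once, then served from the cache
```

Iterating, slicing and measuring the size all reuse the same cached value, so an expensive `compute_data` runs only once per instance in this sketch.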
```python
class CustomData(Data):
    def compute_data(self) -> Any:
@@ -615,13 +616,13 @@ class CustomData(Data):
        return len(self.names)
    def __iter__(self):
-        yield from self.names
+        yield from sorted(self.names)
    def range(self, start: Any, end: Any):
        if not isinstance(start, str) or not isinstance(end, str):
            return []
-        for name in names:
+        for name in self:
            if name < start:
                continue
            if name > end:
@@ -685,40 +686,39 @@ assert list(col) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#### Serializer service
-Serializer converts data into textual or binary representation. Serializers are
-main users of columns service, because it contains details about specific data
-columns. And serializers often iterate data service directly(ignoring `range`
-method), to serialize all available records. The only required method for
-serializer is `stream`. This method must return an iterable of `str` or `byte`
-fragments of the serialized data. And there must be at least one fragment. Even
-if there is no data, serializer must return at least `''` or `b'` from
-`stream`.
+Serializer converts data into a textual, binary or any other alternative
+representation. For example, if you want to convert records produced by the
+`data` service of the collection into a pandas DataFrame, you should probably
+use a serializer.
+
+Serializers are the main users of the columns service, because it contains
+details about specific data columns. Serializers also often iterate the data
+service directly (ignoring the `range` method) to serialize all available records.
+
+The only required method for a serializer is `serialize`. This method must
+return the data from the `data` service transformed into the format provided by
+the serializer. For example, `JsonSerializer` returns a string with JSON-encoded data.
+
+You are not restricted to textual or binary formats. A serializer that
+transforms data into a pandas DataFrame is a completely valid serializer.
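
To illustrate that `serialize` may return any kind of object, here is a minimal sketch with two serializers over the same records: one returns a JSON string, the other a column-oriented dictionary (the kind of structure that could later be handed to a DataFrame constructor). `SketchCollection`, `SketchSerializer`, `JsonLikeSerializer` and `ColumnsSerializer` are hypothetical stand-ins written for this example, not classes from the library; only the `self.attached.data` access pattern is taken from the README.

```python
import json
from typing import Any, Dict, List


class SketchCollection:
    """Hypothetical stand-in for a collection: it only exposes `data`."""

    def __init__(self, data: List[Dict[str, Any]]):
        self.data = data


class SketchSerializer:
    """Hypothetical stand-in for the real `Serializer` base class."""

    def __init__(self, attached: SketchCollection):
        self.attached = attached

    def serialize(self) -> Any:
        raise NotImplementedError


class JsonLikeSerializer(SketchSerializer):
    def serialize(self) -> str:
        # Textual output: one JSON document with all records.
        return json.dumps(list(self.attached.data))


class ColumnsSerializer(SketchSerializer):
    def serialize(self) -> Dict[str, List[Any]]:
        # Non-textual output: records pivoted into per-column lists.
        columns: Dict[str, List[Any]] = {}
        for record in self.attached.data:
            for key, value in record.items():
                columns.setdefault(key, []).append(value)
        return columns


records = SketchCollection([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
json_result = JsonLikeSerializer(records).serialize()
columns_result = ColumnsSerializer(records).serialize()

assert json.loads(json_result) == [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
assert columns_result == {"id": [1, 2], "name": ["a", "b"]}
```

Both serializers satisfy the same contract: a single `serialize` call that returns the whole transformed dataset, whatever its type.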
```python
class NewLineSerializer(Serializer):
-    def stream(self):
+    def serialize(self):
+        result = ""
        for item in self.attached.data:
-            yield str(item) + "\n"
+            result += str(item) + "\n"
+        return result
col = StaticCollection(
    "name", {},
    serializer_factory=NewLineSerializer,
    data_settings={"data": [1, 2, 3]}
)
-assert "".join(col.serializer.stream()) == "1\n2\n3\n"
+assert "".join(col.serializer.serialize()) == "1\n2\n3\n"
```
-Serializer also has `render` method, that combines all fragments from `stream`
-into a single sequence:
-```python
-assert col.serializer.render() == "1\n2\n3\n"
-```
-Use `render` only when you are sure that you have enough memory for all the
-serialized records. In most cases it will be wiser to send streaming response
-to client using `stream` generator.
#### Columns service
This service contains additional information about separate columns of data
16 changes: 6 additions & 10 deletions ckanext/collection/utils/serialize/__init__.py
@@ -56,21 +56,17 @@ class Serializer(
    shared.Domain[types.TDataCollection],
    Generic[types.TSerialized, types.TDataCollection],
):
-    """Abstract collection serializer.
+    """Base collection serializer.
-    Its`stream` produces iterable of collection records in the format that has
-    sense for the serializer. Example:
+    For any derived implementation, `serialize` must transform data of the
+    collection into the expected format. Example:
    >>> def stream(self):
    >>>     for record in self.attached.data:
    >>>         yield yaml.dump(record)
-    By default, all chunks are combined into single dump via `render` method,
-    which combines fragments using addition operator. If this strategy doesn't
-    work, override `render`:
-    >>> def render(self):
-    >>>     return "\n---\n".join(self.stream())
+    >>>
+    >>> def serialize(self):
+    >>>     return "---\n".join(self.stream())
    """
