Zyp: Improve jq-based Moksha transformations and general documentation #50

Merged
merged 3 commits into from
Sep 19, 2024
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -8,6 +8,8 @@
- MongoDB: Complete and verify BSON data type mapping end-to-end
- MongoDB: Use improved decoding machinery also for `MongoDBCDCTranslator`
- Dependencies: Make MongoDB subsystem not strictly depend on Zyp
- Zyp: Translate a few special treatments to jq-based `MokshaTransformation` again
- Zyp: Improve documentation

## 2024/09/10 v0.0.15
- Added Zyp Treatments, a slightly tailored transformation subsystem
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

[![Tests](https://github.com/crate/commons-codec/actions/workflows/tests.yml/badge.svg)](https://github.com/crate/commons-codec/actions/workflows/tests.yml)
[![Coverage](https://codecov.io/gh/crate/commons-codec/branch/main/graph/badge.svg)](https://app.codecov.io/gh/crate/commons-codec)
[![Build status (documentation)](https://readthedocs.org/projects/commons-codec/badge/)](https://cratedb.com/docs/commons-codec/)
[![Build status (documentation)](https://readthedocs.org/projects/commons-codec/badge/)](https://commons-codec.readthedocs.io/)
[![PyPI Version](https://img.shields.io/pypi/v/commons-codec.svg)](https://pypi.org/project/commons-codec/)
[![Python Version](https://img.shields.io/pypi/pyversions/commons-codec.svg)](https://pypi.org/project/commons-codec/)
[![PyPI Downloads](https://pepy.tech/badge/commons-codec/month)](https://pepy.tech/project/commons-codec/)
8 changes: 8 additions & 0 deletions doc/cdc/index.md
@@ -25,6 +25,14 @@ and need further curation and improvements.
:::


## Prior Art

- [core-cdc] by Alejandro Cora González
- [Carabas Research]


[Carabas Research]: https://lorrystream.readthedocs.io/carabas/research.html
[core-cdc]: https://pypi.org/project/core-cdc/
[DynamoDB CDC Relay for CrateDB]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc.html
[MongoDB CDC Relay for CrateDB]: https://cratedb-toolkit.readthedocs.io/io/mongodb/cdc.html
[Replicating CDC Events from DynamoDB to CrateDB]: https://cratedb.com/blog/replicating-cdc-events-from-dynamodb-to-cratedb
4 changes: 3 additions & 1 deletion doc/conf.py
@@ -85,7 +85,9 @@
intersphinx_mapping = {
# "influxio": ("https://influxio.readthedocs.io/", None),
}
linkcheck_ignore = []
linkcheck_ignore = [
r"https://stackoverflow.com/questions/70518350",
]

# Disable caching remote inventories completely.
# http://www.sphinx-doc.org/en/stable/ext/intersphinx.html#confval-intersphinx_cache_limit
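Sphinx documents each `linkcheck_ignore` entry as a regular expression for URIs that should not be checked. A minimal sketch, assuming the patterns are applied with `re.match` semantics (anchored at the start of the URI, which is our understanding of how Sphinx uses them), shows that the new entry also skips longer URLs below that question path while other questions are still checked. The helper `is_ignored` is hypothetical.

```python
import re

# Same pattern as the new linkcheck_ignore entry in doc/conf.py.
linkcheck_ignore = [
    r"https://stackoverflow.com/questions/70518350",
]
patterns = [re.compile(p) for p in linkcheck_ignore]


def is_ignored(uri: str) -> bool:
    """Return True if any pattern matches at the start of the URI (re.match semantics)."""
    return any(p.match(uri) for p in patterns)


# The question URL itself and a slug variant of it are skipped ...
print(is_ignored("https://stackoverflow.com/questions/70518350"))  # True
print(is_ignored("https://stackoverflow.com/questions/70518350/some-title"))  # True
# ... while other questions are still checked.
print(is_ignored("https://stackoverflow.com/questions/12345"))  # False
```

Because the match is a prefix match, the pattern does not need a trailing `.*` to cover slug or anchor variants of the same question.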
8 changes: 0 additions & 8 deletions doc/index.md
@@ -37,14 +37,6 @@ decode
zyp/index
```

```{toctree}
:maxdepth: 3
:caption: Topics
:hidden:

prior-art
```

```{toctree}
:maxdepth: 1
:caption: Workbench
16 changes: 0 additions & 16 deletions doc/prior-art.md

This file was deleted.

75 changes: 39 additions & 36 deletions doc/zyp/backlog.md
@@ -1,48 +1,51 @@
# Zyp Backlog

## Iteration +1
- Refactor module namespace to `zyp`
- Documentation
- CLI interface
- Apply to MongoDB Table Loader in CrateDB Toolkit
- [x] Refactor module namespace to `zyp`
- [x] Documentation
- [ ] CLI interface
- [x] Apply to MongoDB Table Loader in CrateDB Toolkit
- [ ] Document `jq` functions
- `builtin.jq`: https://github.com/jqlang/jq/blob/master/src/builtin.jq
- `function.jq`
- [ ] Renaming needs JSON Pointer support. Alternatively, can `jq` do it?
- [ ] Documentation: Add Python example to "Synopsis" section on /index.html

## Iteration +2
Demonstrate!
- math expressions
- omit key (recursively)
- combine keys
- filter on keys and/or values
- Pathological cases like "Not defined" in typed fields like `TIMESTAMP`
- Use simpleeval, like Meltano, and provide the same built-in functions
- https://sdk.meltano.com/en/v0.39.1/stream_maps.html#other-built-in-functions-and-names
- https://github.com/MeltanoLabs/meltano-map-transform/pull/255
- https://github.com/MeltanoLabs/meltano-map-transform/issues/252
- Use JSONPath, see https://sdk.meltano.com/en/v0.39.1/code_samples.html#use-a-jsonpath-expression-to-extract-the-next-page-url-from-a-hateoas-response
Demonstrate more use cases, like...
- [ ] math expressions
- [ ] omit key (recursively)
- [ ] combine keys
- [ ] filter on keys and/or values
- [ ] Pathological cases like "Not defined" in typed fields like `TIMESTAMP`
- [ ] Use simpleeval, like Meltano, and provide the same built-in functions
- https://sdk.meltano.com/en/v0.39.1/stream_maps.html#other-built-in-functions-and-names
- https://github.com/MeltanoLabs/meltano-map-transform/pull/255
- https://github.com/MeltanoLabs/meltano-map-transform/issues/252
- [ ] Use JSONPath, see https://sdk.meltano.com/en/v0.39.1/code_samples.html#use-a-jsonpath-expression-to-extract-the-next-page-url-from-a-hateoas-response
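One of the use cases listed above, omitting a key recursively, can be sketched in plain Python to pin down the expected behaviour on nested records. The function name `omit_keys` is hypothetical; Zyp may end up expressing this through a jq rule or another transformation type instead.

```python
from typing import Any


def omit_keys(data: Any, keys: set[str]) -> Any:
    """Return a copy of `data` without any dict entry whose key is in `keys`, at any depth."""
    if isinstance(data, dict):
        return {k: omit_keys(v, keys) for k, v in data.items() if k not in keys}
    if isinstance(data, list):
        return [omit_keys(item, keys) for item in data]
    # Scalars are returned unchanged.
    return data


record = {
    "id": 1,
    "_meta": {"source": "mongodb"},
    "items": [{"sku": "A", "_meta": {"rev": 2}}, {"sku": "B"}],
}
print(omit_keys(record, {"_meta"}))
# {'id': 1, 'items': [{'sku': 'A'}, {'sku': 'B'}]}
```

For comparison, jq's `walk` builtin (jq 1.6 and later) can express the same idea as `walk(if type == "object" then del(._meta) else . end)`.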

## Iteration +3
- Moksha transformations on Buckets
- Investigate using JSON Schema
- Fluent API interface
- https://github.com/Halvani/alphabetic
- Mappers do not support external API lookups.
- [ ] Moksha transformations on Buckets
- [ ] Fluent API interface
```python
from zyp.model.fluent import FluentTransformation

transformation = (
    FluentTransformation()
    .jmes("records[?starts_with(location, 'B')]")
    .rename_fields({"_id": "id"})
    .convert_values({"/id": "int", "/value": "float"}, type="pointer-python")
    .jq(".[] |= (.value /= 100)")
)
```
- [ ] Investigate using JSON Schema
- [ ] https://github.com/Halvani/alphabetic
- [ ] Mappers do not support external API lookups.
To add external API lookups, you can either (a) land all your data and
then join using a transformation tool like dbt, or (b) create a custom
mapper plugin with inline lookup logic.
=> Example from Luftdatenpumpe, using a reverse geocoder
- [ ] Define schema
https://sdk.meltano.com/en/latest/typing.html
- https://docs.meltano.com/guide/v2-migration/#migrate-to-an-adapter-specific-dbt-transformer
- https://github.com/meltano/sdk/blob/v0.39.1/singer_sdk/mapper.py
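The fluent interface in the backlog is a design idea, not an existing Zyp API. To illustrate the method-chaining pattern itself, here is a toy, stdlib-only builder. The class `ToyFluentTransformation` and its methods are hypothetical stand-ins that only mimic the JMESPath filter, `rename_fields`, `convert_values`, and jq steps with plain Python callables. Each method appends a step and returns `self`, so calls can be chained inside parentheses.

```python
from typing import Any, Callable

Record = dict[str, Any]
Step = Callable[[list[Record]], list[Record]]


class ToyFluentTransformation:
    """Hypothetical builder: each method records a step and returns self for chaining."""

    def __init__(self) -> None:
        self.steps: list[Step] = []

    def filter(self, predicate: Callable[[Record], bool]) -> "ToyFluentTransformation":
        # Stand-in for a JMESPath filter like records[?starts_with(location, 'B')].
        self.steps.append(lambda rs: [r for r in rs if predicate(r)])
        return self

    def rename_fields(self, mapping: dict[str, str]) -> "ToyFluentTransformation":
        # Top-level keys only; nested renames would need JSON Pointer support.
        self.steps.append(lambda rs: [{mapping.get(k, k): v for k, v in r.items()} for r in rs])
        return self

    def convert_values(self, types: dict[str, type]) -> "ToyFluentTransformation":
        def convert(r: Record) -> Record:
            return {k: types[k](v) if k in types else v for k, v in r.items()}

        self.steps.append(lambda rs: [convert(r) for r in rs])
        return self

    def map(self, fn: Callable[[Record], Record]) -> "ToyFluentTransformation":
        # Stand-in for the jq step `.[] |= (.value /= 100)`.
        self.steps.append(lambda rs: [fn(r) for r in rs])
        return self

    def apply(self, records: list[Record]) -> list[Record]:
        # Run all recorded steps in order; input records are not mutated.
        for step in self.steps:
            records = step(records)
        return records


transformation = (
    ToyFluentTransformation()
    .filter(lambda r: r["location"].startswith("B"))
    .rename_fields({"_id": "id"})
    .convert_values({"id": int, "value": float})
    .map(lambda r: {**r, "value": r["value"] / 100})
)
data = [
    {"_id": "1", "location": "Berlin", "value": "4200"},
    {"_id": "2", "location": "Hamburg", "value": "1000"},
]
print(transformation.apply(data))
# [{'id': 1, 'location': 'Berlin', 'value': 42.0}]
```

Recording steps first and executing them only in `apply` keeps the builder reusable: the same chained transformation can be applied to many batches of records.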

## Fluent API Interface

```python

from zyp.model.fluent import FluentTransformation

transformation = (
    FluentTransformation()
    .jmes("records[?starts_with(location, 'B')]")
    .rename_fields({"_id": "id"})
    .convert_values({"/id": "int", "/value": "float"}, type="pointer-python")
    .jq(".[] |= (.value /= 100)")
)
```
- https://sdk.meltano.com/en/latest/typing.html
- https://docs.meltano.com/guide/v2-migration/#migrate-to-an-adapter-specific-dbt-transformer
- https://github.com/meltano/sdk/blob/v0.39.1/singer_sdk/mapper.py
- [ ] Is `jqpy` better than `jq`?
- https://baterflyrity.github.io/jqpy/