feat(mappers): Stream name can now be accessed in stream maps (#2699)
* Add __stream_name__ as builtin variable for transform

* Test other built-in variables

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docs

* Use markdown tables

* Update table descriptions

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Edgar Ramírez Mondragón <16805946+edgarrmondragon@users.noreply.github.com>
Co-authored-by: Edgar Ramírez-Mondragón <edgarrm358@gmail.com>
4 people authored Oct 10, 2024
1 parent 2be3f09 commit 702c0db
Showing 6 changed files with 73 additions and 20 deletions.
46 changes: 26 additions & 20 deletions docs/stream_maps.md
@@ -228,30 +228,29 @@ can be referenced directly by mapping expressions.

#### Built-In Functions

- [`md5()`](inv:python:py:module:#hashlib) - returns an inline MD5 hash of any string, outputting
the string representation of the hash's hex digest.
- This is defined by the SDK internally with native python:
[`hashlib.md5(<input>.encode("utf-8")).hexdigest()`](inv:python:py:method:#hashlib.hash.hexdigest).
- [`datetime`](inv:python:py:module:#datetime) - This is the datetime module object from the Python
standard library. You can access [`datetime.datetime`](inv:python:py:class:#datetime.datetime),
[`datetime.timedelta`](inv:python:py:class:#datetime.timedelta), etc.
- [`json`](inv:python:py:module:#json) - This is the json module object from the Python standard
library. Primarily used for calling [`json.dumps()`](inv:python:py:function:#json.dumps)
and [`json.loads()`](inv:python:py:function:#json.loads).
The following functions and namespaces are available for use in mapping expressions:

| Function | Description |
| :------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`md5()`](inv:python:py:module:#hashlib) | Returns an inline MD5 hash of any string, outputting the string representation of the hash's hex digest. This is defined by the SDK internally with native python: [`hashlib.md5(<input>.encode("utf-8")).hexdigest()`](inv:python:py:method:#hashlib.hash.hexdigest). |
| [`datetime`](inv:python:py:module:#datetime) | This is the datetime module object from the Python standard library. You can access [`datetime.datetime`](inv:python:py:class:#datetime.datetime), [`datetime.timedelta`](inv:python:py:class:#datetime.timedelta), etc. |
| [`json`](inv:python:py:module:#json) | This is the json module object from the Python standard library. Primarily used for calling [`json.dumps()`](inv:python:py:function:#json.dumps) and [`json.loads()`](inv:python:py:function:#json.loads). |

#### Built-in Variable Names

- `config` - a dictionary with the `stream_map_config` values from settings. This can be used
to provide a secret hash seed, for instance.
- `record` - an alias for the record values dictionary in the current stream.
- `_` - same as `record` but shorter to type
- `self` - the existing property value if the property already exists
- `fake` - a [`Faker`](inv:faker:std:doc#index) instance, configurable via `faker_config`
(see previous example) - see the built-in [standard providers](inv:faker:std:doc#providers)
for available methods
The following variables are available in the context of a mapping expression:

| Variable | Description |
| :---------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config` | A dictionary with the `stream_map_config` values from settings. This can be used to provide a secret hash seed, for instance. |
| `record` | An alias for the record values dictionary in the current stream. |
| `_` | Same as `record` but shorter to type. |
| `self` | The existing property value if the property already exists. |
| `fake` | A [`Faker`](inv:faker:std:doc#index) instance, configurable via `faker_config` (see previous example) - see the built-in [standard providers](inv:faker:std:doc#providers) for available methods. |
| `__stream_name__` | The name of the stream. Useful when [applying the same transformation to multiple streams](#applying-a-mapping-across-two-or-more-streams). |

```{tip}
The `fake` object is only available if the plugin specifies `faker` as an additional dependency (through the `singer-sdk` `faker` extra, or directly).
To use the `fake` object, the `faker` library must be installed.
```

:::{versionadded} 0.35.0
@@ -266,10 +265,17 @@ The `Faker` class.
The `Faker` class was deprecated in favor of instance methods on the `fake` object.
:::

:::{versionadded} 0.42.0
The `__stream_name__` variable.
:::
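As a rough illustration of why `__stream_name__` helps when one mapping is shared by several streams, the sketch below applies the same map to two streams and tags each record with its stream's name. It is an assumption-laden stand-in, not the SDK's code: `apply_shared_map` is a hypothetical helper, built-in `eval` replaces the SDK's expression evaluator, and the `customers` and `orders` stream names are invented for the example.

```python
def apply_shared_map(stream_name: str, stream_map: dict, record: dict) -> dict:
    """Hypothetical helper: add or overwrite properties using mapping expressions."""
    names = {"record": record, "_": record, "__stream_name__": stream_name}
    transformed = dict(record)  # keep existing properties untouched
    for prop, expr in stream_map.items():
        # Assumption: built-in eval is a stand-in for the SDK's own evaluator.
        transformed[prop] = eval(expr, {"__builtins__": {}}, names)
    return transformed


# The same mapping, reused for every stream.
shared_map = {"source_table": "__stream_name__"}

customers = apply_shared_map("customers", shared_map, {"id": 1})
orders = apply_shared_map("orders", shared_map, {"id": 7})

print(customers)  # {'id': 1, 'source_table': 'customers'}
print(orders)     # {'id': 7, 'source_table': 'orders'}
```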

#### Built-in Alias Variable Names

The following variables are available in the context of the `__alias__` expression:
- `__stream_name__` - the existing stream name

| Variable | Description |
| :---------------- | :----------------------- |
| `__stream_name__` | The existing stream name |

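The alias case can be sketched in the same spirit. The snippet assumes the `__alias__` value is a valid expression such as `'aliased_' + __stream_name__`; `resolve_alias` is a hypothetical helper, and built-in `eval` again stands in for the SDK's evaluator.

```python
def resolve_alias(stream_name: str, stream_map: dict) -> str:
    """Hypothetical helper: compute the output stream name from `__alias__`."""
    alias_expr = stream_map.get("__alias__")
    if alias_expr is None:
        return stream_name  # no alias configured, keep the original name
    # Assumption: built-in eval is a stand-in for the SDK's own evaluator.
    return eval(alias_expr, {"__builtins__": {}}, {"__stream_name__": stream_name})


stream_map = {"__alias__": "'aliased_' + __stream_name__"}
new_name = resolve_alias("mystream", stream_map)
print(new_name)  # aliased_mystream
```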
:::{versionadded} 0.42.0
The `__stream_name__` variable.
1 change: 1 addition & 0 deletions singer_sdk/mapper.py
@@ -337,6 +337,7 @@ def _eval(
names["_"] = record # Add a shorthand alias in case of reserved words in names
names["record"] = record # ...and a longhand alias
names["config"] = self.map_config # Allow map config access within transform
names["__stream_name__"] = self.stream_alias # Access stream name in transform

if self.fake:
from faker import Faker # noqa: PLC0415
28 changes: 28 additions & 0 deletions tests/core/test_mapper.py
@@ -780,6 +780,12 @@ def discover_streams(self):
"aliased_stream_quoted.jsonl",
id="aliased_stream_quoted",
),
pytest.param(
{"mystream": {"source_table": "__stream_name__"}},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_stream_name.jsonl",
id="builtin_variable_stream_name",
),
pytest.param(
{"mystream": {"__alias__": "'aliased_' + __stream_name__"}},
{"flattening_enabled": False, "flattening_max_depth": 0},
@@ -792,6 +798,28 @@ def discover_streams(self):
"builtin_variable_stream_name_alias_expr.jsonl",
id="builtin_variable_stream_name_alias_expr",
),
pytest.param(
{
"mystream": {
"email": "self.upper()",
"__else__": None,
}
},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_self.jsonl",
id="builtin_variable_self",
),
pytest.param(
{
"mystream": {
"email": "_['email'].upper()",
"__else__": None,
}
},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_underscore.jsonl",
id="builtin_variable_underscore",
),
pytest.param(
{},
{"flattening_enabled": True, "flattening_max_depth": 0},
6 changes: 6 additions & 0 deletions tests/snapshots/mapped_stream/builtin_variable_self.jsonl
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"type":"object","properties":{"email":{"type":["string","null"]}}},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"ALICE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"BOB@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"CHARLIE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}
6 changes: 6 additions & 0 deletions tests/snapshots/mapped_stream/builtin_variable_stream_name.jsonl
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"properties":{"email":{"type":["string"]},"count":{"type":["integer","null"]},"user":{"properties":{"id":{"type":["integer","null"]},"sub":{"properties":{"num":{"type":["integer","null"]},"custom_obj":{"type":["string","null"]}},"type":["object","null"]},"some_numbers":{"items":{"type":["number"]},"type":["array","null"]}},"type":["object","null"]},"source_table":{"type":["string","null"]}},"type":"object","required":["email"]},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"alice@example.com","count":21,"user":{"id":1,"sub":{"num":1,"custom_obj":"obj-hello"},"some_numbers":[3.14,2.718]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"bob@example.com","count":13,"user":{"id":2,"sub":{"num":2,"custom_obj":"obj-world"},"some_numbers":[10.32,1.618]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"charlie@example.com","count":19,"user":{"id":3,"sub":{"num":3,"custom_obj":"obj-hello"},"some_numbers":[1.414,1.732]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}
6 changes: 6 additions & 0 deletions tests/snapshots/mapped_stream/builtin_variable_underscore.jsonl
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"type":"object","properties":{"email":{"type":["string","null"]}}},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"ALICE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"BOB@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"CHARLIE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}