Releases: snowplow/enrich

Version 3.1.4

01 Jun 07:08

This release aims to reduce the CPU used when validating the enriched event before it is emitted. More details in #608.

Changelog

  • common: update the validation of the enriched event with static field lengths (#608)
  • Bump jackson-databind to 2.13.3 (#615)
  • common: Bump GCP sdk to 2.7.2 (#613)
  • common-fs2: Bump fs2-blobstore to 0.8.6 (#614)
  • common-fs2: Bump http4s to 0.21.33 (#612)
  • common-fs2: Change http4s client backend to Ember (#611)

Version 3.1.3

27 Apr 15:29

This is a patch release for enrich-kinesis and enrich-pubsub.

Changelog

  • Fix the clean termination of enrich-kinesis and enrich-pubsub (#593)
  • Add retry mechanism for the checkpointing to Kinesis (#591)
  • Fix the integration test of enrich-kinesis (#599)

Version 3.1.2

30 Mar 21:21

A patch release to address a problem launching enrich-pubsub due to corrupt configuration.

Changelog

  • common: Configure sbt memory in github action (#590)
  • enrich-pubsub: Fix syntax error in application.conf (#588)
  • Fix stable version tagging (#584)

Version 3.0.3

30 Mar 20:46

Changelog

  • enrich-pubsub: Fix syntax error in application.conf (#588)

Version 3.1.1

28 Mar 15:22

A patch release to the 3.1 branch, aimed at improving stability of the enrich-kinesis app.

  • Stream Enrich: Remove logging of collector payloads (#522)
  • enrich-kinesis: Cap recordTtl in KPL (#581)
  • kinesis: Fix disabling cloudwatch metrics (#580)
  • common-fs2: Fix precedence of configuration system properties (#579)
  • enrich-kinesis: Harden fetching enrichment configs from DynamoDB (#578)

Version 3.0.2

28 Mar 15:01

A patch release to the 3.0 branch, aimed at improving stability of the enrich-kinesis app.

  • Stream Enrich: Remove logging of collector payloads (#522)
  • enrich-kinesis: Cap recordTtl in KPL (#581)
  • kinesis: Fix disabling cloudwatch metrics (#580)
  • common-fs2: Fix precedence of configuration system properties (#579)
  • enrich-kinesis: Harden fetching enrichment configs from DynamoDB (#578)

Version 3.1.0

23 Mar 18:46

This release contains an experimental metadata reporting feature.
Currently, the metadata consists of clusters of events and entities observed over a defined period of time. Within a reporting window, the pipeline receives various event types, e.g. page views and clicks. For each of these events, a metadata event is emitted that contains the set of entities attached to it.

Experimental status means we reserve the right to change the feature or remove it completely. All of its configuration is located in a new experimental section of the HOCON config.
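
As a rough illustration of the aggregation described above, the following Python sketch counts, within one reporting window, how often each event type was seen with each set of attached entities. The function name, the input shape and the event and entity names are assumptions made for this example. They are not taken from the enrich codebase, and the experimental sink may aggregate differently.

```python
from collections import Counter

def aggregate_window(observations):
    """Count (event type, set of entities) pairs seen in one reporting window.

    `observations` is an iterable of (event_type, entities) tuples, where
    `entities` is any iterable of entity names. Hypothetical input shape.
    """
    counts = Counter()
    for event_type, entities in observations:
        # frozenset makes the entity set hashable and order-independent
        counts[(event_type, frozenset(entities))] += 1
    return counts

# Hypothetical window: two page views with the same entities, one click
window = [
    ("page_view", ["web_page", "ua_parser"]),
    ("page_view", ["ua_parser", "web_page"]),
    ("link_click", ["web_page"]),
]
report = aggregate_window(window)
for (event_type, entities), count in sorted(report.items(), key=lambda kv: kv[0][0]):
    print(event_type, sorted(entities), count)
```

Keying on the pair (event type, entity set) means that the same entities attached in a different order are counted as one cluster.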

  • common-fs2: Add experimental metadata aggregation sink (#569)
  • Enable publishing snapshot images (#568)
  • Add an event generator test for ETL pipeline (#535)

Version 3.0.1

23 Mar 12:29

This is a patch release that makes the kinesis/gcp apps more resilient when downloading assets needed for enrichment. It addresses a problem in 3.0.0 where, if an asset download resulted in a corrupted file, Enrich would try to use the corrupted file when the app restarted.

  • common: Recover from corrupt asset downloads (#573)
  • common: Sentry should report exceptions in initialising environment (#572)
  • common: Retry downloading assets from S3 and GCS (#574)
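
One common way to avoid reusing a half-written file, combined with retries, is to download to a temporary path and only move the file into place once the download has completed. The Python sketch below illustrates that general pattern. It is not the implementation from #573 or #574, and fetch, download_asset and the retry parameters are hypothetical names chosen for the example.

```python
import os
import tempfile
import time

def download_asset(fetch, destination, attempts=3, delay_seconds=0.0):
    """Write the bytes returned by `fetch()` to `destination`, with retries.

    `fetch` is any zero-argument callable returning bytes (a hypothetical
    stand-in for an S3 or GCS client call). On failure the temporary file is
    removed, so a partial download never appears at `destination`.
    Returns the number of the attempt that succeeded.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    last_error = None
    for attempt in range(1, attempts + 1):
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(fetch())
            # Move into place only after the full payload has been written
            os.replace(tmp_path, destination)
            return attempt
        except Exception as error:
            last_error = error
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Because the temporary file lives in the same directory as the destination, the final move is a rename rather than a copy, so a restart finds either the previous complete file or the new complete file.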

Version 3.0.0

04 Mar 17:58

Assets

This release concerns 3 assets:

  1. enrich-kinesis: this is the new enrich asset for AWS that aims at replacing Stream Enrich.
  2. enrich-pubsub: this is now the only enrich asset maintained for GCP.
  3. Stream Enrich: this asset for AWS remains supported until the transition to enrich-kinesis is complete and a new enrich-kafka asset is ready. In this release it only received library bumps.

As announced previously in this post, Beam Enrich is now deprecated, in favor of enrich-pubsub.

enrich-kinesis

This new enrich asset for Kinesis is based on fs2 and shares most of its codebase with enrich-pubsub.

Compared with Stream Enrich, this app brings several improvements:

  • It can export metrics. More details can be found on this page.
  • Assets used in the enrichments (e.g. MaxMind DB) can be periodically refreshed while enrich is running with this config parameter:
"assetsUpdatePeriod": "7 days"
  • It uses Kinesis Consumer Library 2.x.
  • It provides the pipeline operator with more possibilities for fine-tuning.
  • It is now possible to use Kinesis aggregation, which consists of packing several user records (e.g. enriched events) into one Kinesis record. This can improve throughput and/or reduce the number of shards needed (in particular if records are bigger than 1 KB). More information about aggregation can be found here. It can be activated with the following section in the configuration (e.g. for enriched events):
"output": {
  "good": {
    "aggregation": {
       "maxCount": 1000
       "maxSize": 51200
    }
  }
}
  • It is possible to run the app with a very minimal configuration file, such as:
{
  "input": {
    "streamName": "collector-payloads"
  }

  "output": {
    "good": {
      "streamName": "enriched"
    }

    "bad": {
      "streamName": "bad"
    }
  }
}
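
To make the two aggregation limits from the list above more concrete, here is a small Python sketch that groups user records into batches, closing a batch as soon as adding another record would exceed either maxCount records or maxSize bytes. It only illustrates the idea behind the two limits and is not how the Kinesis Producer Library implements aggregation. The function name and the assumption that maxSize is measured in bytes of payload are ours.

```python
def aggregate(records, max_count=1000, max_size=51200):
    """Group byte-string records into batches bounded by count and total size."""
    batches, current, current_size = [], [], 0
    for record in records:
        too_many = len(current) + 1 > max_count
        too_big = current_size + len(record) > max_size
        # Close the current batch before it would break either limit
        if current and (too_many or too_big):
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += len(record)
    if current:
        batches.append(current)
    return batches

# 100 records of 2048 bytes each: 25 of them fill one 51200-byte batch
events = [b"x" * 2048 for _ in range(100)]
print([len(batch) for batch in aggregate(events)])  # [25, 25, 25, 25]
```

In this sketch a single record larger than max_size still gets a batch of its own rather than being dropped.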

Instructions to run enrich-kinesis can be found on this page and details about its configuration on this page.

enrich-pubsub

More parameters have been exposed in the config file to give finer-grained control over the app.

All the details about its configuration can be found on this page.

Javascript enrichment: ECMAScript 6 features (#508)

Users of the Javascript enrichment will be pleased to hear that, starting from this version, most ECMAScript 6 features are supported. For example, ES6 features like the arrow => syntax and the const keyword are now available. This change is fully backward-compatible and existing configs will keep working.

More details on Javascript enrichment can be found on this page.

Enriched events validation in enrich-kinesis and enrich-pubsub (#517)

Enriched events emitted by enrich are expected to match the atomic schema. If an event is not valid against this schema (for instance because a field is too long), a bad row should be emitted instead of the enriched event. To further improve data quality inside the pipeline, enrich 3.0.0 introduces this additional check.

However, we are aware that this is a breaking change, and we want to give users some time to adapt in case they are currently working downstream with enriched events that are not valid against the atomic schema. For this reason, the new validation was added as a feature that can be deactivated like this:

"featureFlags": {
  "acceptInvalid": true
}

In this case, enriched events that are not valid against the atomic schema will still be emitted as before, so that enrich 3.0.0 can be fully backward compatible. There are two ways to know whether the new validation would have had an impact:

  1. A new metric invalid_enriched has been introduced. It reports the number of enriched events that were not valid against the atomic schema. Like the other metrics, it can be seen on stdout and/or StatsD.
  2. Each time an enriched event is invalid against the atomic schema, a line is logged with the bad row that would normally have been emitted instead of the enriched event (add -Dorg.slf4j.simpleLogger.log.InvalidEnriched=debug to the JAVA_OPTS to see it).

In a few months, we'll remove the feature flag and it will become impossible to emit invalid enriched events.
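
The behaviour described in this section can be summarised with a short Python sketch: an enriched event is checked against per-field maximum lengths, and the acceptInvalid flag decides whether an invalid event is still emitted (and counted in invalid_enriched) or replaced by a bad row. The field limits, function names and return shape below are illustrative assumptions, not the actual atomic schema or the enrich implementation.

```python
# Illustrative limits only; the real ones come from the atomic schema.
MAX_LENGTHS = {"app_id": 255, "platform": 255, "v_tracker": 100}

def too_long_fields(event):
    """Return the names of fields whose string value exceeds its limit."""
    return [
        name
        for name, limit in MAX_LENGTHS.items()
        if isinstance(event.get(name), str) and len(event[name]) > limit
    ]

def route(event, accept_invalid, metrics):
    """Decide what to emit for one enriched event.

    Returns ("good", event) or ("bad", description of the failure).
    """
    failures = too_long_fields(event)
    if not failures:
        return ("good", event)
    if accept_invalid:
        # Feature flag on: emit the event as before, but count it
        metrics["invalid_enriched"] = metrics.get("invalid_enriched", 0) + 1
        return ("good", event)
    return ("bad", {"fields_too_long": failures})

metrics = {}
event = {"app_id": "shop", "v_tracker": "t" * 101}
print(route(event, accept_invalid=True, metrics=metrics)[0], metrics)
print(route(event, accept_invalid=False, metrics=metrics)[0])
```

With the flag enabled, the invalid event still flows downstream and only the counter reveals it. With the flag disabled, the same event is routed to the bad output.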

Metrics for enrich-kinesis and enrich-pubsub (#494)

There were 2 issues with the metrics periodically sent by enrich-pubsub:

  1. The counts of collector payloads, enriched events and bad rows were ever-increasing and not reset to 0 after sending the metrics.
  2. These counts were sent to StatsD with this format: snowplow.enrich.good:1234|g|#key1:value1 where g means gauge, whereas it should be c for counter.

This has been fixed. On top of that, it is now possible to see the metrics directly in the logs of the app, with this section in the config file:

"monitoring": {
  "metrics": {
    "stdout": {
      "period": "1 minute"
      "prefix": "snowplow.enrich."
    }
  }
}

Because enrich-pubsub and enrich-kinesis share most of the code, all of the above is also true for the latter.
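
The two fixes can be pictured with a minimal Python sketch: counts are reset once they have been reported, and each line uses the c (counter) type rather than g. The class and method names are hypothetical, and the line format simply mirrors the example given in the list above. This is not the code used by enrich.

```python
class PeriodicCounts:
    """Accumulate counts and reset them each time they are reported."""

    def __init__(self, prefix="snowplow.enrich.", tags=None):
        self.prefix = prefix
        self.tags = tags or {}
        self.counts = {}

    def increment(self, name, amount=1):
        self.counts[name] = self.counts.get(name, 0) + amount

    def flush(self):
        """Return one StatsD-style counter line per metric, then reset to 0."""
        tag_part = ",".join(f"{k}:{v}" for k, v in self.tags.items())
        suffix = f"|#{tag_part}" if tag_part else ""
        lines = [
            f"{self.prefix}{name}:{count}|c{suffix}"
            for name, count in sorted(self.counts.items())
        ]
        # Reset so the next period reports only what happened in that period
        self.counts = {name: 0 for name in self.counts}
        return lines

counts = PeriodicCounts(tags={"key1": "value1"})
counts.increment("good", 1234)
print(counts.flush())  # ['snowplow.enrich.good:1234|c|#key1:value1']
print(counts.flush())  # ['snowplow.enrich.good:0|c|#key1:value1']
```

Sending per-period counts as counters lets the StatsD server sum them over time, whereas an ever-increasing value sent as a gauge would need to be differenced downstream.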

More information about metrics can be found on this page.

YAUAA context 1-0-3 (#515)

The context attached by YAUAA enrichment has been updated to 1-0-3.

Compared to 1-0-2, this version allows a longer agentVersionMajor string field. This addresses a problem in which some user agents caused the old maximum length to be exceeded, resulting in a failed event.

Telemetry in enrich-kinesis and enrich-pubsub (#487)

enrich-kinesis and enrich-pubsub introduce telemetry, which consists of regularly sending heartbeats with some meta-information about the application (schema here). This helps us improve the product: we need to understand what is popular so that we can focus our development effort in the right place.

At a minimum, telemetry sends the application name and version every hour. It would be helpful for us if users could provide userProvidedId in the config file:

"telemetry": {
  "userProvidedId": "myCompany"
}

Telemetry can be deactivated by putting the following section in the configuration file:

"telemetry": {
  "disable": true
}

Changelog

  • enrich-pubsub: split into common module and PubSub module (#473)
  • enrich-pubsub: Bump fs2-google-pubsub to 0.18.1 (#513)
  • enrich-kinesis: create enrich asset based on fs2 (#480)
  • common-fs2: Metrics: send counts instead of gauges (#494)
  • common-fs2: File sink should rotate files with maximum size (#440)
  • common-fs2: put good bad and pii inside output {} in config (#493)
  • Bump circe to 0.14.1 (#496)
  • Set spray-json transitive dependency to 1.3.6 (#498)
  • Bump jackson-databind to 2.11.4 (#499)
  • Bump snowplow-badrows to 2.1.1 (#500)
  • Remove tomcat-embed-core transitive dependency (#501)
  • Set netty transitive dependency to 4.1.68.Final (#502)
  • Add possibility to use STS to authenticate (#318)
  • Bump Snowplow Scala tracker to 1.0.0 (#504)
  • common-fs2: add telemetry (#489)
  • Bump Iglu client to 1.1.1 (#507)
  • enrich-pubsub: add reference.conf and provide minimal config example (#505)
  • Add Github Action to scan Docker images with lacework (#506)
  • common: use schemas/nl.basjes/yauaa_context/jsonschema/1-0-3 (#515)
  • Enable ES6 by default in javascript enrichment (#508)
  • Publish arm64 and amd64 docker images (#491)
  • Beam Enrich: deprecate (#530)
  • common: SQL enrichment: fix getConnection for Sync (#546)
  • common: catch and handle errors in the CurrencyConversionEnrichment (#542)
  • Validate enriched event against atomic schema before emitting (#517)
  • Stream Enrich Kafka: Enable AWS MSK IAM Authentication (#547)
  • Stream Enrich Kafka: bump Kafka Client to 2.8.1 (#518)
  • common: add Adapted type (#560)
  • enrich-kinesis: add integration test (#531)
  • enrich-pubsub: bump GCP SDK to 2.4.2 (#562)
  • Set protobuf-java transitive dependency to 3.19.4 (#561)
  • enrich-pubsub: set gson transitive dependency to 2.9.0 (#565)
  • enrich-pubsub: set google-oauth-client transitive dependency to 1.33.1 (#566)

Version 2.0.7

28 Feb 11:27

  • Stream: Bump gson to 2.9.0 (#563)
  • Stream: Set google-cloud-storage transitive dependency to 1.118.1 (#503)