Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to specify uuid when capturing events #68

Merged
merged 10 commits into from
Aug 22, 2024

Conversation

evgeek
Copy link
Contributor

@evgeek evgeek commented Aug 19, 2024

Hello!

We really miss the ability to specify the uuid of events manually when capturing events.
The thing is that we cannot guarantee that events come to our system only once. In the current implementation, in such cases, events in Posthog will be duplicated.
In the SDK in other languages, we got around this by manually generating the uuid, calculated based on the immutable fields of the event - however, in the Golang SDK, unfortunately, there is no such option yet.
Please consider the PR adding this functionality.

P.S. As far as I can see, the current implementation does not use the posthog.Config.uid() function, so I removed it.

@evgeek
Copy link
Contributor Author

evgeek commented Aug 22, 2024

@fuziontech @dmarticus Hi guys!
Sorry to bother you, but maybe you can help me with this? :)

I checked the functionality on our project - everything works perfectly well!

Copy link
Contributor

@dmarticus dmarticus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments about versioning and the changelog, but the implementation itself looks solid. Thanks for the contribution!

Before I approve this change though, I'd like to understand a bit more about the motivation and use case. Right now, PostHog auto-generates UUIDs for given events when we ingest them, and there shouldn't be any risk of event duplication, since we also dedupe events as part of our ingestion pipeline. Generally, PostHog clients aren't responsible for assigning UUIDs to events themselves – as far as I know, we don't support specifically passing in UUIDs in any of our other SDKs (Python example.

What's the use case here that you're solving for that can't be handled by just passing the UUID in through the properties payload? Ideally, you shouldn't be managing event uniqueness at all on your side when sending things through PostHog.

CHANGELOG.md Outdated
Comment on lines 1 to 7
# Changelog

## 1.3.0 - 2024-08-14

1. Added the ability to explicitly specify the uuid for capturing events. If uuid is not specified, Posthog will calculate it automatically (suitable for most cases).
2. Removed unused config function `uid()`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evgeek this file is auto-generated, you don't need to make edits to it.

Copy link
Contributor Author

@evgeek evgeek Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+

version.go Outdated
@@ -3,7 +3,7 @@ package posthog
import "flag"

// Version of the client.
const Version = "1.2.18"
const Version = "1.3.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same note here; we auto-bump the version when we deploy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+

@evgeek
Copy link
Contributor Author

evgeek commented Aug 22, 2024

We have bunch of projects that send events. They save them in a DB, and then send them to different services via AMQP (https://github.com/umbrellio/table_sync), where they are also saved in a DB. These messages can be sent multiple times for different reasons: from updating a row in a DB to forcing sync of some events. In our services, we handle retries using message versions (old messages are ignored, new ones update previously written data by primary key). But sending to Posthog is different from saving to a DB because of updating unavailable, and we have to ensure that a retried AMQP message does not cause new events. We can save the id of the sent events in a DB, but this will cause unnecessary overhead: creating and managing a new DB, slowing down the system due to IO, resolving race conditions when multiple k8s pods try to access the same row, etc. Or we can just calculate a reproductible UUID - in this case, Posthog will simply skip the re-sent event. Very simple and elegant, I belive :)

About deduplication on ingestion: This is a great feature, but since our events are stored in a database and can be updated, it may not always recognize duplicates. And I'm not sure how dedupe will work if the same event is sent on different days.

About other SDKs - I haven't tried them all, but PHP definitely allows it, we used the same approach in the previous version of the event service.

@dmarticus
Copy link
Contributor

@evgeek thanks for your thorough reply! I understand your situation better now. I think your perspective makes sense, and frankly it makes me think that all of our SDKs should let consumers supply optional UUIDs. Let's get this change merged, thanks for doing it!

@dmarticus dmarticus merged commit 9e0199a into PostHog:master Aug 22, 2024
1 check passed
@evgeek
Copy link
Contributor Author

evgeek commented Aug 23, 2024

Excellent! Thanks for your responsiveness and awesome product! :)

@evgeek evgeek deleted the custom-uuid-capturing branch August 23, 2024 11:22
@migueldv90
Copy link

I think this pull request broke posthog.Capture. I am currently getting an error in v1.2.19.
posthog 2024/08/24 23:59:13 INFO: response 400 400 Bad Request – failed to parse request: data did not match any variant of untagged enum RawRequest

I had to downgrade to v1.2.18 and everything is working as expected.

@dmarticus
Copy link
Contributor

I think this pull request broke posthog.Capture. I am currently getting an error in v1.2.19.

posthog 2024/08/24 23:59:13 INFO: response 400 400 Bad Request – failed to parse request: data did not match any variant of untagged enum RawRequest

I had to downgrade to v1.2.18 and everything is working as expected.

Ooof; thanks for the report! Sorry to cause breakage; I'm away from my computer this weekend but I'll look into a fix when I'm back!

@evgeek
Copy link
Contributor Author

evgeek commented Aug 25, 2024

@migueldv90 Sorry for the inconvenience!

@dmarticus It seem I tested the feature on the /capture endpoind, but the SDK sends via /batch. My bad :(

Difference is that /capture can ingest uuid with an empty string, while /batch cannot.
Good news: /capture can handle batch events. Example:

curl --location 'https://eu.i.posthog.com/capture/' \
--header 'Content-Type: application/json' \
--header 'User-Agent: posthog-go (version: 1.2.19)' \
--data '{
  "api_key" : "phc_...",
  "batch" : [ {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event1",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test3"
    },
    "send_feature_flags" : false
  },
  {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event2",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test2"
    },
    "send_feature_flags" : false
  },
  {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event3",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test1"
    },
    "send_feature_flags" : false
  } ]
}'

Screenshot 2024-08-25 at 14 58 23

As a quick fix I made this PR - #70. It works, but using an endpoint not designed for batch processing is obviously is not very good decision. It seems that adjusting /batch endpoint to works with empty uuid is better solution.

@dmarticus
Copy link
Contributor

I'm going to revert this change so that folks get the fix ASAP. @evgeek has the start of a fix here: #70, but I want to clarify a few things first, and this will stop the pain for folks pulling from latest.

@dmarticus
Copy link
Contributor

dmarticus commented Aug 26, 2024

Hi @evgeek

Did some digging into our ingestion endpoint and found that, yes – due to the way we serialize batch requests, we can handle missing uuid entries or valid uuids, but not empty strings. I think it makes the most sense to fix our ingestion endpoint to handle empty strings, rather than work around it within the SDK. I've made a PR for this here: PostHog/posthog#24586. Once it ships, you should be able to re-implement your original changes and the batch endpoint should handle empty UUIDs.

@evgeek
Copy link
Contributor Author

evgeek commented Aug 26, 2024

Excellent, thank you, I'll be looking forward to it!

@evgeek
Copy link
Contributor Author

evgeek commented Aug 28, 2024

@dmarticus Hi!

I see your PR about /batch is already in production.

I made a new PR with reimplementation of uuid specification feature - #72. I tested it, now it works with empty uuid. Please review it when you can :)

@dmarticus
Copy link
Contributor

Hi @evgeek

Thanks for reaching out! I was actually going to DM you today; I merged the /batch PR but for that service, we do manual releases (it's a new service, we're treating it with a bit more caution), so it's only live in our EU datacenter for now (which is why your testing worked – your PostHog account is in the EU). I'm going to release this to US today and then we can merge your PR so that everyone gets UUID goodness :)

@evgeek
Copy link
Contributor Author

evgeek commented Aug 29, 2024

Got it!

@dmarticus
Copy link
Contributor

Hi @evgeek !

Our /batch changes have been deployed everywhere! I'm going to merge #72 now :)

@evgeek
Copy link
Contributor Author

evgeek commented Aug 29, 2024

@dmarticus We did it! Peachy! Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants