-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to specify uuid when capturing events #68
Conversation
@fuziontech @dmarticus Hi guys! I checked the functionality on our project - everything works perfectly well! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments about versioning and the changelog, but the implementation itself looks solid. Thanks for the contribution!
Before I approve this change though, I'd like to understand a bit more about the motivation and use case. Right now, PostHog auto-generates UUIDs for given events when we ingest them, and there shouldn't be any risk of event duplication, since we also dedupe events as part of our ingestion pipeline. Generally, PostHog clients aren't responsible for assigning UUIDs to events themselves – as far as I know, we don't support specifically passing in UUIDs in any of our other SDKs (Python example.
What's the use case here that you're solving for that can't be handled by just passing the UUID in through the properties payload? Ideally, you shouldn't be managing event uniqueness at all on your side when sending things through PostHog.
CHANGELOG.md
Outdated
# Changelog | ||
|
||
## 1.3.0 - 2024-08-14 | ||
|
||
1. Added the ability to explicitly specify the uuid for capturing events. If uuid is not specified, Posthog will calculate it automatically (suitable for most cases). | ||
2. Removed unused config function `uid()`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evgeek this file is auto-generated, you don't need to make edits to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+
version.go
Outdated
@@ -3,7 +3,7 @@ package posthog | |||
import "flag" | |||
|
|||
// Version of the client. | |||
const Version = "1.2.18" | |||
const Version = "1.3.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same note here; we auto-bump the version when we deploy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+
We have bunch of projects that send events. They save them in a DB, and then send them to different services via AMQP (https://github.com/umbrellio/table_sync), where they are also saved in a DB. These messages can be sent multiple times for different reasons: from updating a row in a DB to forcing sync of some events. In our services, we handle retries using message versions (old messages are ignored, new ones update previously written data by primary key). But sending to Posthog is different from saving to a DB because of updating unavailable, and we have to ensure that a retried AMQP message does not cause new events. We can save the id of the sent events in a DB, but this will cause unnecessary overhead: creating and managing a new DB, slowing down the system due to IO, resolving race conditions when multiple k8s pods try to access the same row, etc. Or we can just calculate a reproductible UUID - in this case, Posthog will simply skip the re-sent event. Very simple and elegant, I belive :) About deduplication on ingestion: This is a great feature, but since our events are stored in a database and can be updated, it may not always recognize duplicates. And I'm not sure how dedupe will work if the same event is sent on different days. About other SDKs - I haven't tried them all, but PHP definitely allows it, we used the same approach in the previous version of the event service. |
@evgeek thanks for your thorough reply! I understand your situation better now. I think your perspective makes sense, and frankly it makes me think that all of our SDKs should let consumers supply optional UUIDs. Let's get this change merged, thanks for doing it! |
Excellent! Thanks for your responsiveness and awesome product! :) |
I think this pull request broke posthog.Capture. I am currently getting an error in v1.2.19. I had to downgrade to v1.2.18 and everything is working as expected. |
Ooof; thanks for the report! Sorry to cause breakage; I'm away from my computer this weekend but I'll look into a fix when I'm back! |
@migueldv90 Sorry for the inconvenience! @dmarticus It seem I tested the feature on the Difference is that curl --location 'https://eu.i.posthog.com/capture/' \
--header 'Content-Type: application/json' \
--header 'User-Agent: posthog-go (version: 1.2.19)' \
--data '{
"api_key" : "phc_...",
"batch" : [ {
"type" : "capture",
"uuid" : "",
"library" : "posthog-go",
"library_version" : "1.2.19",
"timestamp" : "2024-08-25T11:50:20.154164+00:00",
"distinct_id" : "1234",
"event" : "event1",
"properties" : {
"$lib" : "posthog-go",
"$lib_version" : "1.2.19",
"project" : "test3"
},
"send_feature_flags" : false
},
{
"type" : "capture",
"uuid" : "",
"library" : "posthog-go",
"library_version" : "1.2.19",
"timestamp" : "2024-08-25T11:50:20.154164+00:00",
"distinct_id" : "1234",
"event" : "event2",
"properties" : {
"$lib" : "posthog-go",
"$lib_version" : "1.2.19",
"project" : "test2"
},
"send_feature_flags" : false
},
{
"type" : "capture",
"uuid" : "",
"library" : "posthog-go",
"library_version" : "1.2.19",
"timestamp" : "2024-08-25T11:50:20.154164+00:00",
"distinct_id" : "1234",
"event" : "event3",
"properties" : {
"$lib" : "posthog-go",
"$lib_version" : "1.2.19",
"project" : "test1"
},
"send_feature_flags" : false
} ]
}' As a quick fix I made this PR - #70. It works, but using an endpoint not designed for batch processing is obviously is not very good decision. It seems that adjusting |
Hi @evgeek Did some digging into our ingestion endpoint and found that, yes – due to the way we serialize batch requests, we can handle missing |
Excellent, thank you, I'll be looking forward to it! |
@dmarticus Hi! I see your PR about I made a new PR with reimplementation of uuid specification feature - #72. I tested it, now it works with empty uuid. Please review it when you can :) |
Hi @evgeek Thanks for reaching out! I was actually going to DM you today; I merged the |
Got it! |
@dmarticus We did it! Peachy! Thanks :) |
Hello!
We really miss the ability to specify the uuid of events manually when capturing events.
The thing is that we cannot guarantee that events come to our system only once. In the current implementation, in such cases, events in Posthog will be duplicated.
In the SDK in other languages, we got around this by manually generating the uuid, calculated based on the immutable fields of the event - however, in the Golang SDK, unfortunately, there is no such option yet.
Please consider the PR adding this functionality.
P.S. As far as I can see, the current implementation does not use the
posthog.Config.uid()
function, so I removed it.