Update kafka consumer #14
base: master
Conversation
The current client is several major versions old and uses deprecated functionality (zookeeper offsets) that doesn't play nice with other clients. This PR changes the project to use the new kafka consumer API. Note that existing configurations are incompatible, since the new kafka client does not use the same config names as the old one.
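For illustration, here is a rough sketch of what the new consumer's configuration looks like. The property names are the standard Kafka client ones; the concrete values (broker address, group id, topic, deserializers) are placeholders, not the plugin's actual config keys:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class NewConsumerConfig {
    // Property names are the standard Kafka client ones; the concrete values
    // (broker address, group id, deserializers) are placeholders.
    static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // replaces zookeeper.connect
        props.put("group.id", "tsdb-consumer");           // hypothetical group id
        props.put("enable.auto.commit", "false");         // was auto.commit.enable
        props.put("auto.offset.reset", "latest");         // was smallest/largest
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }
}
```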
This looks great. Is it worth updating the config documentation in the readme, too?
-    public static final String AUTO_COMMIT_ENABLE_DEFAULT = "true";
-    public static final String AUTO_OFFSET_RESET_DEFAULT = "smallest";
+    public static final String AUTO_COMMIT_ENABLE_DEFAULT = "false";
+    public static final String AUTO_OFFSET_RESET_DEFAULT = "latest";
To keep the changeset as small as possible and avoid surprises for users, I'd lean toward keeping the old defaults in place. Is there a particular reason these two defaults have changed?
Auto-commit is off by default because this code does manual committing after we insert a batch of records.
For offset_reset, kafka renamed the values: "earliest" and "latest" are the new names for "smallest" and "largest".
}
nanoCtr = System.nanoTime();
consumer.commitSync(); // manual offset commit after the batch is written
Offset committing is here.
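For context, a minimal sketch of the poll/write/commit loop this implements, assuming a consumer configured as in the description above; the topic name and writeDatapoint are hypothetical stand-ins for the plugin's actual write path:

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class WriteThenCommitLoop {
    // Poll a batch, write it, then commit synchronously so nothing is marked
    // consumed until the write has succeeded.
    static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("tsdb-metrics")); // hypothetical topic
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in ms
            for (ConsumerRecord<String, String> record : records) {
                writeDatapoint(record.value()); // hypothetical TSDB insert
            }
            if (!records.isEmpty()) {
                consumer.commitSync(); // blocks until the offsets are durable
            }
        }
    }

    static void writeDatapoint(String value) {
        // stand-in for the plugin's actual write path
    }
}
```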
Fwiw, we cherry-picked these changes onto the version we're running and found that committing manually significantly reduces throughput. The synchronous call was the bottleneck, and since we prioritise latency, we switched back to auto-commits. It's not a problem if a couple of points are written multiple times when a commit fails, as long as these failures are rare.
In our setup, the node running this version of the plugin managed to write up to 9,000 datapoints per second; after switching to auto-commits every 5000 ms, we reached more than 40,000.
Whether it's necessary to commit after every poll may be worth reconsidering for this PR. As a middle way, we have seen good results with making the commit call async.
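For reference, the auto-commit setup described above corresponds to `enable.auto.commit=true` with `auto.commit.interval.ms=5000`. The async middle way would look roughly like this, a sketch assuming the consumer from the loop above:

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class AsyncCommitSketch {
    // Commit asynchronously after each batch instead of blocking on commitSync().
    // A rare failed commit just means a few points may be written twice after a
    // restart; the next successful commit supersedes it.
    static void commitBatch(KafkaConsumer<String, String> consumer) {
        consumer.commitAsync((Map<TopicPartition, OffsetAndMetadata> offsets, Exception e) -> {
            if (e != null) {
                System.err.println("Async offset commit failed: " + e.getMessage());
            }
        });
    }
}
```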
Yeah I was concerned about data durability, hence doing the commits manually and waiting for them to finish. However, if submitting the same metric multiple times just overwrites the same data, it shouldn't be a problem.