
Update kafka consumer #14

Open · wants to merge 1 commit into base: master
Conversation

datto-aparrill
Contributor

The current client is several major versions old and uses deprecated functionality (zookeeper offsets) that doesn't play nice with other clients.

This PR changes the project to use the new kafka consumer API.

Note that existing configurations are incompatible, since the new kafka client does not use the same config names as the old one.
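For anyone migrating a config, the flavour of renaming involved looks roughly like this (property names taken from the Kafka documentation; the exact keys this project exposes may differ):

```properties
# Old high-level consumer (ZooKeeper-based)
zookeeper.connect=zk1:2181
group.id=metrics
auto.commit.enable=true
auto.offset.reset=smallest

# New consumer API equivalents
bootstrap.servers=broker1:9092
group.id=metrics
enable.auto.commit=true
auto.offset.reset=earliest
```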

The current client is several major versions old and uses deprecated functionality
that doesn't play nice with other clients (zookeeper offsets).
@muffix (Contributor) commented Mar 24, 2019

This looks great. Is it worth updating the config documentation in the readme, too?

- public static final String AUTO_COMMIT_ENABLE_DEFAULT = "true";
- public static final String AUTO_OFFSET_RESET_DEFAULT = "smallest";
+ public static final String AUTO_COMMIT_ENABLE_DEFAULT = "false";
+ public static final String AUTO_OFFSET_RESET_DEFAULT = "latest";
Contributor:

To keep the changeset as small as possible and avoid surprises for users, I'd lean towards keeping the old defaults in place. Is there a particular reason these two defaults changed?

Contributor Author:

Auto-commit is off by default because this code commits manually after we insert a batch of records.

For auto.offset.reset, Kafka renamed the allowed values ("smallest"/"largest" became "earliest"/"latest").

}
nanoCtr = System.nanoTime();
consumer.commitSync();
Contributor Author:

Offset committing happens here.

Contributor:

Fwiw, we cherry-picked these changes into the version we're running and found that committing manually significantly reduces throughput. The synchronous call was the bottleneck, and since we prioritise latency, we switched back to using auto-commits. It's not a problem if a couple of points are written multiple times when a commit fails, as long as those failures are rare.

In our setup, the node running this version of the plugin managed to write up to 9,000 datapoints per second; after switching to auto-commits every 5000 ms, we reached more than 40,000.

Whether it's necessary to commit after every poll might be worth reconsidering for this PR. As a middle ground, we have seen good results from making the commit call asynchronous.
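The async middle way described above can be sketched as follows. This is a self-contained toy, not the Kafka API: the single-threaded executor and the `run`/`AsyncCommitSketch` names are stand-ins I've made up to show the control flow — the poll loop hands each offset off and keeps processing instead of waiting for a broker round trip, with one synchronous drain on shutdown (analogous to a final commitSync() before close()).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of asynchronous offset commits. NOT the Kafka client:
// the executor stands in for the client's background commit handling.
public class AsyncCommitSketch {

    static long run(int batches) {
        ExecutorService commitThread = Executors.newSingleThreadExecutor();
        AtomicLong lastCommitted = new AtomicLong(-1);
        try {
            for (long offset = 0; offset < batches; offset++) {
                // ... insert the polled batch into the datastore here ...
                final long toCommit = offset;
                // In spirit, consumer.commitAsync(): returns immediately,
                // so the poll loop is never blocked on a commit round trip.
                commitThread.submit(() -> lastCommitted.set(toCommit));
            }
        } finally {
            // Drain outstanding commits once, synchronously, on shutdown.
            commitThread.shutdown();
            try {
                commitThread.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return lastCommitted.get();
    }

    public static void main(String[] args) {
        System.out.println("last committed offset: " + run(100));
    }
}
```

A failed async commit just means the next commit (or the shutdown drain) covers the gap, which matches the "rare duplicate writes are acceptable" trade-off described above.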

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was concerned about data durability, hence doing the commits manually and waiting for them to finish. However, if submitting the same metric multiple times just overwrites the same data, it shouldn't be a problem.
