-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve health check to fail if schema_reader raises exceptions #957
Conversation
karapace/schema_reader.py
Outdated
except ShutdownException: | ||
self._stop_schema_reader.set() | ||
shutdown() | ||
except Exception as e: # pylint: disable=broad-except | ||
self.stats.unexpected_exception(ex=e, where="schema_reader_loop") | ||
self.consecutive_unexpected_errors += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only care about the message handling errors? This won't consider other possible kafka errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume if there are kafka problems they would show up in consume call
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
Where are the test cases to show this works and to verify that it will not be broken later? |
7cd6251
to
d35e3fb
Compare
@Claudenw i added tests now, pls have another look |
28bd71c
to
73b645f
Compare
73b645f
to
66db599
Compare
UNHEALTHY_TIMEOUT_SECONDS: Final = 10.0 | ||
UNHEALTHY_CONSECUTIVE_ERRORS: Final = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not mandatory for this pr, would be nice to have those configurable. Maybe depending on the load and the cluster size those numbers needs to be adjusted.
except ShutdownException: | ||
self._stop_schema_reader.set() | ||
shutdown() | ||
except KafkaUnavailableError: | ||
self.consecutive_unexpected_errors += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to be able to differentiate the cause, like if kafka its down obviously karapace isn't going to work. But again not something I wanna see in this pr, as it is its a good improvement already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
About this change - What it does
This PR improves the health check endpoint by returning 503 if the schema_reader throws >= 3 unexpected consecutive exceptions in 10 seconds.
In addition it will check if topic is available on the kafka cluster. This is needed because consumer will only return error once in this case.
Why this way
In its current state health check would always report healthy service even if connection to kafka was broken.