Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve health check to fail if schema_reader raises exceptions #957

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

keejon
Copy link
Contributor

@keejon keejon commented Sep 19, 2024

About this change - What it does

This PR improves the health check endpoint by returning 503 if the schema_reader throws >= 3 unexpected consecutive exceptions in 10 seconds.
In addition it will check if topic is available on the kafka cluster. This is needed because consumer will only return error once in this case.

Why this way

In its current state health check would always report healthy service even if connection to kafka was broken.

@keejon keejon requested review from a team as code owners September 19, 2024 12:41
except ShutdownException:
self._stop_schema_reader.set()
shutdown()
except Exception as e: # pylint: disable=broad-except
self.stats.unexpected_exception(ex=e, where="schema_reader_loop")
self.consecutive_unexpected_errors += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only care about the message handling errors? This won't consider other possible kafka errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume if there are kafka problems they would show up in consume call

Copy link

github-actions bot commented Sep 19, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/karapace
  config.py
  karapace.py 109-112
  karapace_all.py
  schema_reader.py 266
  schema_registry_apis.py 119-123
Project Total  

This report was generated by python-coverage-comment-action

@Claudenw
Copy link

Where are the test cases to show this works and to verify that it will not be broken later?

@keejon keejon marked this pull request as draft September 19, 2024 13:58
@keejon keejon force-pushed the keejon/improve-schema-registry-health-check branch 10 times, most recently from 7cd6251 to d35e3fb Compare September 24, 2024 12:51
@keejon keejon marked this pull request as ready for review September 24, 2024 14:19
@keejon
Copy link
Contributor Author

keejon commented Sep 24, 2024

Where are the test cases to show this works and to verify that it will not be broken later?

@Claudenw i added tests now, pls have another look

@keejon keejon force-pushed the keejon/improve-schema-registry-health-check branch 3 times, most recently from 28bd71c to 73b645f Compare October 1, 2024 07:45
@keejon keejon force-pushed the keejon/improve-schema-registry-health-check branch from 73b645f to 66db599 Compare October 9, 2024 13:42
Comment on lines +68 to +69
UNHEALTHY_TIMEOUT_SECONDS: Final = 10.0
UNHEALTHY_CONSECUTIVE_ERRORS: Final = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not mandatory for this pr, would be nice to have those configurable. Maybe depending on the load and the cluster size those numbers needs to be adjusted.

except ShutdownException:
self._stop_schema_reader.set()
shutdown()
except KafkaUnavailableError:
self.consecutive_unexpected_errors += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to be able to differentiate the cause, like if kafka its down obviously karapace isn't going to work. But again not something I wanna see in this pr, as it is its a good improvement already

Copy link
Contributor

@eliax1996 eliax1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eliax1996 eliax1996 merged commit aaa7812 into main Oct 10, 2024
9 checks passed
@eliax1996 eliax1996 deleted the keejon/improve-schema-registry-health-check branch October 10, 2024 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants