Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiple flagstores per service #87

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

FlorianKothmeier
Copy link

@FlorianKothmeier FlorianKothmeier commented Mar 27, 2024

This PR copies the flagstore implementation from ECSC 2022.

To support multiple flagstores, multiple checker scripts have to be provided by the service authors. Each flagstore is then registered as a seperate service. Additionally, service categories group the flagstores to provide the "real" services.

Tested locally and the python checkerlib is confirmed working. I have tested multiple flagstore result combinations and they all behave as expected (OK+OK -> OK, OK+FAULTY -> FAULTY, OK+NOT CHECKED -> NOT CHECKED, etc.)

I suspect that the go checkerlib will need refactoring, because each (python) checker function additionally returns a message that is shown on the scoreboard. However, the go checkerlib has not been adapted to support this. I don't know whether we want to keep that feature. Furthermore, I know next to nothing about go, therefore someone else would need to take a look at it if we decide to keep it.

Additionally, I'm not sure about the scoring formula. AFAIK, each service group gets SLA points equal to COUNT(FLAGSTORE) iff all of them are OK. This should be easy to adapt though, once we decide on what scoring formula to use.

There is still some commented out SQL, which should be removed before merging.

Resolves #77

sla_ok_by_servicegroup_tick AS (
SELECT team_id,
scoring_service.service_group_id AS service_group_id,
0.5 * servicegroup.services_count * (COUNT(*) = servicegroup.services_count)::int AS sla_ok
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the SLA formula for each service group, which should be easy to replace in case we want to use another formula.

Copy link
Author

@FlorianKothmeier FlorianKothmeier Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually evaluates to 0.5 * len(services[group]) for each check result instead of 0.5 * len(services[group])**2 as I initially thought.

But, because it gets evaluated once for each service in the group this still results in 0.5 * len(services[group])**2. I think this was unintended, as their scoring provides the formula

sla = (count(ticks_with_status['up'] + 0.5 * ticks_with_status['recovering'])) * sqrt(count(teams)) * count(flag_stores_in_service)

Also note that this formula is the same as the formula for recovering sla points, so this formula should have had a factor of 1.0 instead.

To get the documented behavior, the formulas should have been:
sla: 1.0 * (COUNT(*) = servicegroup.services_count)::int
sla_recover: 0.5 * (COUNT(*) = servicegroup.services_count)::int

EDIT: their formula does work as intended actually

Some additional thoughts:

  • Do we want sla to scale linearly with the number of flagstores? Other options include sla that doesn't depend on the number of flagstores or some non-linear scaling (maybe sqrt(len(services[group])?)
  • Currently no partial sla points are awarded for having some flagstores available, when others are down. Maybe we should award y/x * sla if y of x services in the group are up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flag Stores where implemented by ECSC 2022
1 participant