clouddriver-ecs: regression with 1.24.0 - broken #6245

lifeofguenter · 2020-12-21T10:40:14Z

Analogue to #6084 there is now a new endpoint that is affected from the 4k SQL bug:

the same API call on 1.23.5 only does ~100 SQL queries:

Which still seems a bit high, but much more acceptable :) (e.g. downgrading fixed the issue for us)

piradeepk · 2020-12-21T14:22:08Z

Just to clarify, the api isn't broken but rather affected by the same issue described in: #6084

To provide some context, previously, the application/server group information was being inferred in Clouddriver, but with the newly implemented moniker support (as you can see the api's are different one queries name, while the other does groupName), we're now using the cache to retrieve this information, and it's using the same logic that is described in the above issue.

piradeepk · 2020-12-21T17:23:12Z

@lifeofguenter would you be able to provide some information as to what query you're trying to run/request you're making? What are you trying to do when you see this behaviour? What is the experience that you're having? Are you no longer able to use Spinnaker or are you just seeing a large number of queries?

Also, in your hal config, do you have any namingStrategies defined/configured?

lifeofguenter · 2020-12-21T18:40:32Z

@piradeepk

In deck -> application -> clusters -> select server group -> sidebar pops up with server-group information

Deck will do the following API request to retrieve information for that sidebar:

https://spinnaker-gate.foo/applications/fooservice/serverGroups/prod-ecs/us-east-1/fooservice-v016?includeDetails=false

Also, in your hal config, do you have any namingStrategies defined/configured?

we are not using any namingStrategy currently (so it should be defaulting to something?) - Given that the amount of SQL queries are almost exactly the same like in #6084, I don't think this is an issue with the new implementation, rather the new implementation uncovered an old issue.

I do however see this in our clouddriver.yml which I am not sure if its correct:

sql:
  enabled: true
  taskRepository:
    enabled: true
  cache:
    enabled: true

e.g. what is "taskRepository" and why is "cache.enabled" required?

Are you no longer able to use Spinnaker

We are unable to use 1.24 unfortunately and had to rollback. With every new version of Spinnaker we are losing new components due to the 4k SQL query bug. And since we have limited understanding of what other API endpoints might be affected as well in 1.24 we decided to not take a chance :)

I think someone needs to look into clouddriver-ecs and how it translates lookups into sql queries :) - but I don't think it will be an easy task, nor something that will be fixed anytime soon 😞

piradeepk · 2020-12-21T19:04:18Z

Thanks for the quick response. Will continue to investigate with the given information.

piradeepk · 2020-12-21T21:16:19Z

From my initial investigation, it looks to be related to a PR to support Monikers for ECS, in which we removed the applicationName (to support monikers): spinnaker/clouddriver@fc08d9a#diff-08ef4737fffbc8ef45383c0%5B%E2%80%A6%5D2d4b4b314665b4f134c0b5a4L527 (customers with a large number of serverGroups, would previously use that value to prevent the EcsServerClusterProvider from checking every serverGroup, if the service's appname didn't match the name of the application provided, it wouldn't check the rest of the logic).

The check that would bypass all of the logic if the appname didn't match: spinnaker/clouddriver@fc08d9a#diff-08ef4737fffbc8ef45383c0%5B%E2%80%A6%5D2d4b4b314665b4f134c0b5a4R131 (application is being passed in as null every time so that check is always false, but anyone with a larger number of server groups is timing out now since we're checking every single server group, rather than just the ones that match serverGroupName).

piradeepk · 2020-12-21T22:26:26Z

@lifeofguenter I've made a change (and it's been merged to master and the next patch release) that should hopefully fix this issue for you. A patch release should be released soon, please let us know if this is still an issue once 1.24.1 is released (hopefully within the next few hours)

lifeofguenter · 2020-12-22T09:59:26Z

thanks @piradeepk - your changes have fixed the issue 🍾

I am now back to a more "sane" number of mysql queries for that endpoint:

I must note though, that compared to a EC2 lookup its still concerning high:

E.g. the sidebar opens up with a EC2 servergroup quite snappy in less than 50ms - as its only doing 3 sql queries. With ECS it does ~50 sql queries and thus takes almost 1000ms for the same endpoint.

We have both equally amount of services on EC2 and ECS. My question: why is ECS so much slower? Why does it require a magnitude of 50-100x of SQL queries for the same information?

piradeepk added provider/ecs sig/aws labels Dec 21, 2020

piradeepk mentioned this issue Dec 21, 2020

fix(ecs): add application name filter to findCluster spinnaker/clouddriver#5168

Merged

mergify bot closed this as completed in spinnaker/clouddriver#5168 Dec 21, 2020

piradeepk reopened this Dec 21, 2020

piradeepk self-assigned this Dec 21, 2020

lifeofguenter closed this as completed Dec 22, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clouddriver-ecs: regression with 1.24.0 - broken #6245

clouddriver-ecs: regression with 1.24.0 - broken #6245

lifeofguenter commented Dec 21, 2020 •

edited

Loading

piradeepk commented Dec 21, 2020 •

edited

Loading

piradeepk commented Dec 21, 2020 •

edited

Loading

lifeofguenter commented Dec 21, 2020 •

edited

Loading

piradeepk commented Dec 21, 2020

piradeepk commented Dec 21, 2020

piradeepk commented Dec 21, 2020

lifeofguenter commented Dec 22, 2020 •

edited

Loading

clouddriver-ecs: regression with 1.24.0 - broken #6245

clouddriver-ecs: regression with 1.24.0 - broken #6245

Comments

lifeofguenter commented Dec 21, 2020 • edited Loading

piradeepk commented Dec 21, 2020 • edited Loading

piradeepk commented Dec 21, 2020 • edited Loading

lifeofguenter commented Dec 21, 2020 • edited Loading

piradeepk commented Dec 21, 2020

piradeepk commented Dec 21, 2020

piradeepk commented Dec 21, 2020

lifeofguenter commented Dec 22, 2020 • edited Loading

lifeofguenter commented Dec 21, 2020 •

edited

Loading

piradeepk commented Dec 21, 2020 •

edited

Loading

piradeepk commented Dec 21, 2020 •

edited

Loading

lifeofguenter commented Dec 21, 2020 •

edited

Loading

lifeofguenter commented Dec 22, 2020 •

edited

Loading