Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: corrupt data in routes() response due to healthchecker data #11844

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Revolyssup
Copy link
Contributor

@Revolyssup Revolyssup commented Dec 19, 2024

Description

Fixes # (issue)
Currently there is an intermittent bug. Below are the steps to reproduce:
Reproduction steps

  1. Create a route and enable health check
curl http://127.0.0.1:9180/apisix/admin/routes/1 -H 'X-API-KEY: wqGBivhkYbFXiMdcMEAOMZanCGnLSbWE' -X PUT -d '
{
    "uri": "/*",
    "upstream": {
         "nodes": {
            "httpbin.org:80": 1,
            "mockbin.org:80": 1
        },
        "type": "roundrobin",
        "checks": {
            "active": {
                "timeout": 5,
                "type": "tcp",
                "healthy": {
                    "interval": 2,
                    "successes": 1
                },                
                "unhealthy": {
                    "interval": 2,
                    "successes": 1
                }
            }
        }
    }
}'
  1. access routes to trigger health checks
curl http://127.0.0.1:9080/get
  1. When accessing the control api, 500 will be returned and the error in the problem description will be generated.

Checklist

This error is intermittent and multiple tries might be needed to reproduce.
On debugging it was found that the cause of issue was http_router.user_routes table being polluted by adding healthchecks leaving unexpected data which while trying to parse, the error was generated.

As you can see in the generated logs, user_route returned healthchecker data.
Screenshot from 2024-12-19 15-11-00
Screenshot from 2024-12-19 15-10-51

Fix

Remove healthcheck related objects before sending to user.

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. bug Something isn't working labels Dec 19, 2024
@Revolyssup Revolyssup marked this pull request as draft December 20, 2024 08:06
@Revolyssup Revolyssup marked this pull request as ready for review December 22, 2024 17:33
@Revolyssup Revolyssup changed the title fix: corrupt data in routes() due to user requeset fix: corrupt data in routes() response due to healthchecker data Dec 22, 2024
@membphis
Copy link
Member

we can not update the watched data from etcd, it is dangerous

@membphis membphis closed this Dec 23, 2024
@Revolyssup Revolyssup reopened this Dec 23, 2024
@Revolyssup
Copy link
Contributor Author

Revolyssup commented Dec 23, 2024

we can not update the watched data from etcd, it is dangerous

The original data is not being modified. It is already deepcopied. But to make it easier to review I have refactored to move the modification logic just below the deepcopy function call so that it's clear that the original data is not being modified.

Comment on lines +276 to +278
if new_route.clean_handlers then
new_route.clean_handlers = {}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just also set to nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some existing test cases grep logs and expect this to {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can fix the test case as well.
Checking the logs has no meaning for the control api.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants