Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix race in TestIntraQueryCache_ClientError and TestInterQueryCache_ClientError #7280

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Juneezee
Copy link
Contributor

@Juneezee Juneezee commented Jan 17, 2025

Why the changes in this PR are needed?

A race condition was observed in the TestIntraQueryCache_ClientError and TestInterQueryCache_ClientError tests.

First Job Attempt

CI job link: https://github.com/open-policy-agent/opa/actions/runs/12801275263/job/35690527578

2025/01/16 03:11:32 http: panic serving 127.0.0.1:38698: send on closed channel
goroutine 1575 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1947 +0x10a
panic({0x10dc6e0?, 0x16ea610?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/open-policy-agent/opa/v1/topdown.TestIntraQueryCache_ClientError.func1.1({0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/src/v1/topdown/http_test.go:3360 +0x4d
net/http.HandlerFunc.ServeHTTP(0xc000ac3010, {0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/usr/local/go/src/net/http/server.go:2220 +0x48
net/http.serverHandler.ServeHTTP({0xc0010d6900?}, {0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/usr/local/go/src/net/http/server.go:3210 +0x2a2
net/http.(*conn).serve(0xc0012d6120, {0x16f39f0, 0xc000e35500})
	/usr/local/go/src/net/http/server.go:2092 +0x12a5
created by net/http.(*Server).Serve in goroutine 1505
	/usr/local/go/src/net/http/server.go:3360 +0x8ed
--- FAIL: TestIntraQueryCache_ClientError (0.00s)
    --- FAIL: TestIntraQueryCache_ClientError/no_raised_errors (0.05s)
        http_test.go:3379: Expected exactly 1 call to HTTP server, got 0
--- FAIL: TestInterQueryCache_ClientError (0.00s)
    --- FAIL: TestInterQueryCache_ClientError/raised_errors (0.07s)
        http_test.go:3446: Expected exactly 1 call to HTTP server, got 0

Second Job Attempt

CI job link: https://github.com/open-policy-agent/opa/actions/runs/12801275263/job/35734323510

2025/01/16 18:49:26 http: panic serving 127.0.0.1:55452: send on closed channel
goroutine 1584 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1947 +0x10a
panic({0x10dc6e0?, 0x16ea610?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/open-policy-agent/opa/v1/topdown.TestIntraQueryCache_ClientError.func1.1({0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/src/v1/topdown/http_test.go:3360 +0x4d
net/http.HandlerFunc.ServeHTTP(0xc001312a10, {0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/usr/local/go/src/net/http/server.go:2220 +0x48
net/http.serverHandler.ServeHTTP({0xc00047eae0?}, {0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/usr/local/go/src/net/http/server.go:3210 +0x2a2
net/http.(*conn).serve(0xc000127830, {0x16f39f0, 0xc000a2dd70})
	/usr/local/go/src/net/http/server.go:2092 +0x12a5
created by net/http.(*Server).Serve in goroutine 1597
	/usr/local/go/src/net/http/server.go:3360 +0x8ed
--- FAIL: TestIntraQueryCache_ClientError (0.00s)
    --- FAIL: TestIntraQueryCache_ClientError/no_raised_errors (0.04s)
        http_test.go:3379: Expected exactly 1 call to HTTP server, got 0

Root Cause

opa/v1/topdown/http_test.go

Lines 3423 to 3439 in 8067014

ch := make(chan *http.Request)
// A HTTP server that always causes a timeout
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ch <- r
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte(`{"foo": "bar"}`)) // ignore error
}))
defer ts.Close()
var rules []string
for _, r := range tc.rules {
rules = append(rules, strings.ReplaceAll(r, "%URL%", ts.URL))
}
runTopDownTestCase(t, data, tc.note, rules, tc.expected)
requests := getAllRequests(ch)

The issue lies in the default case in the select statement of the getAllRequests function:

opa/v1/topdown/http_test.go

Lines 3452 to 3467 in 8067014

func getAllRequests(ch chan *http.Request) []*http.Request {
defer close(ch)
var requests []*http.Request
for {
select {
case x, ok := <-ch:
if ok {
requests = append(requests, x)
} else {
return requests
}
default:
return requests
}
}
}

The race condition occurs when runTopDownTestCase completes before the http.HandlerFunc in the test server executes. This leaves the ch chan *http.Request channel empty, triggering the default case to return an empty []*http.Request. Then, the defer close(ch) call closes the channel. When the http.HandlerFunc finally executes and tries to send send to the already closed channel, it causes the panic http: panic serving 127.0.0.1:55452: send on closed channel.

What are the changes in this PR?

This PR updates the TestIntraQueryCache_ClientError and TestInterQueryCache_ClientError tests to use context.Context for waiting. A timeout of 1 second ensures all HTTP requests are processed by the test server before proceeding.

Notes to assist PR review:

Further comments:

Copy link
Contributor Author

@Juneezee Juneezee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧐 Looks like we have another flaky test https://github.com/open-policy-agent/opa/actions/runs/12829170916/job/35774782220?pr=7280#step:6:372

--- FAIL: TestCertPoolReloading (3.05s)
    server_test.go:6095: server address: 127.0.0.1:36833
    server_test.go:6151: server started
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6213: server still doesn't trust client cert
    server_test.go:6203: server didn't accept client cert before deadline

…ryCache_ClientError`

Observed in [1]:

2025/01/16 03:11:32 http: panic serving 127.0.0.1:38698: send on closed channel
goroutine 1575 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1947 +0x10a
panic({0x10dc6e0?, 0x16ea610?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/open-policy-agent/opa/v1/topdown.TestIntraQueryCache_ClientError.func1.1({0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/src/v1/topdown/http_test.go:3360 +0x4d
net/http.HandlerFunc.ServeHTTP(0xc000ac3010, {0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/usr/local/go/src/net/http/server.go:2220 +0x48
net/http.serverHandler.ServeHTTP({0xc0010d6900?}, {0x16f2740, 0xc0001ec1c0}, 0xc0006ee140)
	/usr/local/go/src/net/http/server.go:3210 +0x2a2
net/http.(*conn).serve(0xc0012d6120, {0x16f39f0, 0xc000e35500})
	/usr/local/go/src/net/http/server.go:2092 +0x12a5
created by net/http.(*Server).Serve in goroutine 1505
	/usr/local/go/src/net/http/server.go:3360 +0x8ed
--- FAIL: TestIntraQueryCache_ClientError (0.00s)
    --- FAIL: TestIntraQueryCache_ClientError/no_raised_errors (0.05s)
        http_test.go:3379: Expected exactly 1 call to HTTP server, got 0
--- FAIL: TestInterQueryCache_ClientError (0.00s)
    --- FAIL: TestInterQueryCache_ClientError/raised_errors (0.07s)
        http_test.go:3446: Expected exactly 1 call to HTTP server, got 0

and [2]:

2025/01/16 18:49:26 http: panic serving 127.0.0.1:55452: send on closed channel
goroutine 1584 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1947 +0x10a
panic({0x10dc6e0?, 0x16ea610?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/open-policy-agent/opa/v1/topdown.TestIntraQueryCache_ClientError.func1.1({0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/src/v1/topdown/http_test.go:3360 +0x4d
net/http.HandlerFunc.ServeHTTP(0xc001312a10, {0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/usr/local/go/src/net/http/server.go:2220 +0x48
net/http.serverHandler.ServeHTTP({0xc00047eae0?}, {0x16f2740, 0xc0001ac380}, 0xc0000268c0)
	/usr/local/go/src/net/http/server.go:3210 +0x2a2
net/http.(*conn).serve(0xc000127830, {0x16f39f0, 0xc000a2dd70})
	/usr/local/go/src/net/http/server.go:2092 +0x12a5
created by net/http.(*Server).Serve in goroutine 1597
	/usr/local/go/src/net/http/server.go:3360 +0x8ed
--- FAIL: TestIntraQueryCache_ClientError (0.00s)
    --- FAIL: TestIntraQueryCache_ClientError/no_raised_errors (0.04s)
        http_test.go:3379: Expected exactly 1 call to HTTP server, got 0

[1]: https://github.com/open-policy-agent/opa/actions/runs/12801275263/job/35690527578
[2]: https://github.com/open-policy-agent/opa/actions/runs/12801275263/job/35734323510

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee Juneezee force-pushed the test/querycache-race branch from f8e691a to c64d53a Compare January 23, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant