Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky TestCluster_Etcd_Serf #120

Closed
ohkinozomu opened this issue Aug 13, 2024 · 4 comments · Fixed by #121
Closed

flaky TestCluster_Etcd_Serf #120

ohkinozomu opened this issue Aug 13, 2024 · 4 comments · Fixed by #121

Comments

@ohkinozomu
Copy link
Contributor

ohkinozomu commented Aug 13, 2024

OS version

ProductName:		macOS
ProductVersion:		14.6
BuildVersion:		23G80

go version

go version go1.22.4 darwin/arm64

revision

060d7066d4f6b5ba4b08911ee26d545d33a2d89d

How to reproduce

go test -v -run ^TestCluster_Etcd_Serf$ ./cluster -count 20

Result

    agent_test.go:348: Test completed successfully
time=2024-08-13T20:03:49.537+09:00 level=INFO msg="stopping raft..."
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"1","remote-peer-id":"2","error":"context canceled"}
{"level":"warn","msg":"peer became inactive (message send to peer failed)","peer-id":"2","error":"failed to read 2 on stream MsgApp v2 (context canceled)"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"1","remote-peer-id":"2","error":"context canceled"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"1","remote-peer-id":"1","error":"EOF"}
{"level":"warn","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"1"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"1","remote-peer-id":"1","error":"EOF"}
{"level":"warn","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"1"}
{"level":"warn","msg":"peer became inactive (message send to peer failed)","peer-id":"1","error":"failed to dial 1 on stream MsgApp v2 (context canceled)"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"1","remote-peer-id":"3","error":"context canceled"}
{"level":"warn","msg":"peer became inactive (message send to peer failed)","peer-id":"3","error":"failed to read 3 on stream MsgApp v2 (context canceled)"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"1","remote-peer-id":"3","error":"context canceled"}
time=2024-08-13T20:03:49.537+09:00 level=INFO msg="raft stopped"
time=2024-08-13T20:03:49.537+09:00 level=INFO msg="stopping node..."
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"2","remote-peer-id-stream-handler":"2","remote-peer-id-header":"1","remote-peer-id-from":"2","cluster-id":"1199"}
{"level":"warn","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"3","remote-peer-id":"1"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
{"level":"warn","msg":"ignored streaming request; ID mismatch","local-member-id":"3","remote-peer-id-stream-handler":"3","remote-peer-id-header":"1","remote-peer-id-from":"3","cluster-id":"1199"}
2024/08/13 20:03:49 [INFO] serf: EventMemberLeave: 1 127.0.0.1
2024/08/13 20:03:49 [INFO] serf: EventMemberLeave: 1 127.0.0.1
time=2024-08-13T20:03:49.936+09:00 level=INFO msg="member leave" node=1 addr=127.0.0.1 port=65436
time=2024-08-13T20:03:49.936+09:00 level=INFO msg="raft leave" node=1 addr=127.0.0.1:65437
raft2024/08/13 20:03:49 INFO: 3 switched to configuration voters=(2 3)
{"level":"warn","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"1"}
raft2024/08/13 20:03:49 INFO: 2 switched to configuration voters=(2 3)
{"level":"warn","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"1"}
time=2024-08-13T20:03:49.946+09:00 level=INFO msg="[raft] node is removed to the cluster" node=1
{"level":"warn","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","remote-peer-id":"1"}
time=2024-08-13T20:03:49.946+09:00 level=INFO msg="[raft] node is removed to the cluster" node=1
2024/08/13 20:03:50 [INFO] serf: EventMemberLeave: 1 127.0.0.1
time=2024-08-13T20:03:50.136+09:00 level=INFO msg="member leave" node=1 addr=127.0.0.1 port=65436
time=2024-08-13T20:03:50.136+09:00 level=INFO msg="raft leave" node=1 addr=127.0.0.1:65437
raft2024/08/13 20:03:50 INFO: 3 switched to configuration voters=(2 3)
{"level":"panic","msg":"unexpected removal of unknown remote peer","remote-peer-id":"1"}
raft2024/08/13 20:03:50 INFO: 2 switched to configuration voters=(2 3)
{"level":"panic","msg":"unexpected removal of unknown remote peer","remote-peer-id":"1"}
panic: unexpected removal of unknown remote peer

goroutine 150 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x40?, 0x1?, {0x0?, 0x0?, 0x0?})
	/Users/ookinozomu/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196 +0x78
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0x1400057a270, {0x14001b63040, 0x1, 0x1})
	/Users/ookinozomu/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262 +0x1c4
go.uber.org/zap.(*Logger).Panic(0x1011122c0?, {0x100eed344?, 0x1?}, {0x14001b63040, 0x1, 0x1})
	/Users/ookinozomu/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285 +0x54
go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.(*Transport).removePeer(0x140006048c0, 0x1)
	/Users/ookinozomu/go/pkg/mod/go.etcd.io/etcd/server/v3@v3.5.13/etcdserver/api/rafthttp/transport.go:346 +0x10c
go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.(*Transport).RemovePeer(0x140006048c0, 0x1)
	/Users/ookinozomu/go/pkg/mod/go.etcd.io/etcd/server/v3@v3.5.13/etcdserver/api/rafthttp/transport.go:329 +0x70
github.com/wind-c/comqtt/v2/cluster/raft/etcd.(*Peer).publishEntries(0x140003b2000, {0x14000d6c2d8, 0x1, 0xf?})
	/Users/ookinozomu/project5/comqtt/cluster/raft/etcd/peer.go:520 +0x31c
github.com/wind-c/comqtt/v2/cluster/raft/etcd.(*Peer).serveChannels(0x140003b2000)
	/Users/ookinozomu/project5/comqtt/cluster/raft/etcd/peer.go:358 +0x69c
created by github.com/wind-c/comqtt/v2/cluster/raft/etcd.(*Peer).startRaft in goroutine 115
	/Users/ookinozomu/project5/comqtt/cluster/raft/etcd/peer.go:269 +0x60c
FAIL	github.com/wind-c/comqtt/v2/cluster	19.851s
FAIL

I plan to look into this.

@ohkinozomu
Copy link
Contributor Author

Interestingly, it appears that updating the etcd library to v3.6.0-alpha.0 prevents the tests from failing: #121

@wind-c
Copy link
Owner

wind-c commented Aug 16, 2024

Is it confirmed?

@ohkinozomu
Copy link
Contributor Author

Hmm, it will take some time, but I'll try to identify which commit resolved the issue.

@ohkinozomu
Copy link
Contributor Author

I found that the commit is in etcd-io/etcd#13645 .
It has not been included in the 3.5.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants