[BUG] kb-post-provision-job fails with "ERR Duplicate master name." #8114

Closed
MarkKharitonov opened this issue Sep 9, 2024 · 3 comments
Labels
kind/bug Something isn't working Stale

@MarkKharitonov

Describe the bug

mark@L-R910LPKW:~$ k get pod
NAME                                                         READY   STATUS    RESTARTS   AGE
aida-dev-xyz-mining-redis-0                             3/3     Running   0          27h
aida-dev-xyz-mining-redis-1                             3/3     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-0                    1/1     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-1                    1/1     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-2                    1/1     Running   0          27h
kb-post-provision-job-aida-dev-xyz-mining-redis-6l9gq   0/1     Error     0          3m
kb-post-provision-job-aida-dev-xyz-mining-redis-7qjhh   0/1     Error     0          3m25s
kb-post-provision-job-aida-dev-xyz-mining-redis-dcxt8   0/1     Error     0          3m41s
mark@L-R910LPKW:~$

To Reproduce
I am not sure what triggers it initially, but it reproduces easily for me: I delete the job, it gets created again, and it errors out.

Expected behavior
No errors.

Additional context
I have 5 Redis instances deployed with KB, each with sentinels and each having 2 replicas for the database and 3 for the sentinels. Only one instance exhibits the problematic behavior:

mark@L-R910LPKW:~$ k get job
NAME                                                   STATUS   COMPLETIONS   DURATION   AGE
kb-post-provision-job-aida-dev-xyz-mining-redis   Failed   0/1           6m53s      6m53s
mark@L-R910LPKW:~$ k delete job --all
job.batch "kb-post-provision-job-aida-dev-xyz-mining-redis" deleted
mark@L-R910LPKW:~$ k get job
NAME                                                   STATUS    COMPLETIONS   DURATION   AGE
kb-post-provision-job-aida-dev-xyz-mining-redis   Running   0/1           2s         2s
mark@L-R910LPKW:~$ sleep 30
mark@L-R910LPKW:~$ k get job
NAME                                                   STATUS    COMPLETIONS   DURATION   AGE
kb-post-provision-job-aida-dev-xyz-mining-redis   Running   0/1           41s        41s
mark@L-R910LPKW:~$ k get pod
NAME                                                         READY   STATUS    RESTARTS   AGE
aida-dev-xyz-mining-redis-0                             3/3     Running   0          27h
aida-dev-xyz-mining-redis-1                             3/3     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-0                    1/1     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-1                    1/1     Running   0          27h
aida-dev-xyz-mining-redis-sentinel-2                    1/1     Running   0          27h
kb-post-provision-job-aida-dev-xyz-mining-redis-7dhpq   0/1     Error     0          6s
kb-post-provision-job-aida-dev-xyz-mining-redis-b89f8   0/1     Error     0          47s
kb-post-provision-job-aida-dev-xyz-mining-redis-ffmpj   0/1     Error     0          31s
mark@L-R910LPKW:~$ k logs kb-post-provision-job-aida-dev-xyz-mining-redis-b89f8
+ declare -g default_initialize_pod_ordinal
+ declare -g redis_advertised_svc_host_value
+ declare -g redis_advertised_svc_port_value
+ declare -g headless_postfix=headless
+ declare -g redis_default_service_port=6379
+ echo 'redis sentinel component replicas found, register to sentinel.'
+ register_to_sentinel_wrapper
+ '[' -z aida-dev-xyz-mining-redis-sentinel-0,aida-dev-xyz-mining-redis-sentinel-1,aida-dev-xyz-mining-redis-sentinel-2 ']'
+ '[' -z aida-dev-xyz-mining-redis-sentinel-headless ']'
+ get_minimum_initialize_pod_ordinal
+ '[' -z aida-dev-xyz-mining-redis-0,aida-dev-xyz-mining-redis-1 ']'
+ IFS=,
+ read -ra pod_list
+ for pod in "${pod_list[@]}"
+ '[' -z '' ']'
redis sentinel component replicas found, register to sentinel.
++ extract_ordinal_from_object_name aida-dev-xyz-mining-redis-0
++ local object_name=aida-dev-xyz-mining-redis-0
++ local ordinal=0
++ echo 0
+ default_initialize_pod_ordinal=0
+ continue
+ for pod in "${pod_list[@]}"
+ '[' -z 0 ']'
++ extract_ordinal_from_object_name aida-dev-xyz-mining-redis-1
++ local object_name=aida-dev-xyz-mining-redis-1
++ local ordinal=1
++ echo 1
+ pod_ordinal=1
+ '[' 1 -lt 0 ']'
+ default_redis_primary_pod_name=aida-dev-xyz-mining-redis-0
+ redis_default_primary_pod_headless_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc
+ init_redis_service_port
+ '[' -n 6379 ']'
+ redis_default_service_port=6379
+ parse_redis_advertised_svc_if_exist aida-dev-xyz-mining-redis-0
+ local pod_name=aida-dev-xyz-mining-redis-0
+ [[ -z '' ]]
+ echo 'Environment variable REDIS_ADVERTISED_PORT not found. Ignoring.'
Environment variable REDIS_ADVERTISED_PORT not found. Ignoring.
+ return 0
+ old_ifs='
'
+ IFS=,
+ set -f
+ read -ra sentinel_pod_list
+ set +f
+ IFS='
'
+ for sentinel_pod in "${sentinel_pod_list[@]}"
+ sentinel_pod_fqdn=aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless
+ '[' -n '' ']'
+ echo 'register to sentinel:aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless with ClusterIP service: redis_default_primary_pod_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc, redis_default_service_port=6379'
register to sentinel:aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless with ClusterIP service: redis_default_primary_pod_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc, redis_default_service_port=6379
+ register_to_sentinel aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless aida-dev-xyz-mining-redis aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc 6379
+ local sentinel_host=aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless
+ local master_name=aida-dev-xyz-mining-redis
+ local sentinel_port=26379
+ local redis_primary_host=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc
+ local redis_primary_port=6379
+ local timeout=600
++ date +%s
+ local start_time=1725899217
+ local current_time
+ set +x
Checking connectivity to aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless on port 26379 using redis-cli...
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless is reachable on port 26379.
Checking connectivity to aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc on port 6379 using redis-cli...
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc is reachable on port 6379.
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
ERR Duplicate master name.
Command failed with status 0 or output not OK.
mark@L-R910LPKW:~$ k get job
NAME                                                   STATUS   COMPLETIONS   DURATION   AGE
kb-post-provision-job-aida-dev-xyz-mining-redis   Failed   0/1           71s        71s
mark@L-R910LPKW:~$
@MarkKharitonov MarkKharitonov added the kind/bug Something isn't working label Sep 9, 2024
@Y-Rookie
Collaborator

@MarkKharitonov Thank you for raising this issue.
In the current KubeBlocks Redis, the kb-post-provision-job-xxx job registers Redis with all Redis Sentinel instances, which is what enables high availability for the Redis cluster.
The current implementation of this job is not idempotent. If Redis registers successfully with some Redis Sentinel instances but fails with others (for example, because of network connectivity issues or unhealthy instances), the post-provision job fails and retries. Deleting the job, as you mentioned, also triggers a retry.
On retry, the Sentinel instances that were already registered return "ERR Duplicate master name.", which is the error you encountered.
We will address this in the future by making the Redis registration logic idempotent.
Thank you again for bringing this to our attention.
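
For illustration only, here is a minimal sketch of what idempotent registration could look like. It is not the actual KubeBlocks script and not necessarily how the eventual fix works. It assumes the job registers with `SENTINEL MONITOR` and that a "Duplicate master name" reply can be treated as "already registered". The function names, the `REDIS_CLI` override hook, and the default quorum of 2 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; NOT the KubeBlocks implementation.

# Classify the reply of SENTINEL MONITOR so that a retry is harmless:
# "OK" means newly registered, "ERR Duplicate master name" means the
# sentinel already monitors a master with this name.
monitor_reply_is_success() {
  local reply="$1"
  case "$reply" in
    OK) return 0 ;;
    *"ERR Duplicate master name"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Register a master with a single sentinel.
# REDIS_CLI is an assumed override so the function can run without a live
# sentinel; the quorum default (2) is also an assumption.
register_master_idempotent() {
  local sentinel_host="$1" sentinel_port="$2" master_name="$3"
  local primary_host="$4" primary_port="$5" quorum="${6:-2}"
  local reply
  reply=$("${REDIS_CLI:-redis-cli}" -h "$sentinel_host" -p "$sentinel_port" \
    SENTINEL MONITOR "$master_name" "$primary_host" "$primary_port" "$quorum" 2>&1) || true
  if monitor_reply_is_success "$reply"; then
    echo "master $master_name is registered on $sentinel_host"
    return 0
  fi
  echo "registration failed on $sentinel_host: $reply" >&2
  return 1
}
```

One limitation of this approach: treating the duplicate reply as success does not verify that the sentinel monitors the expected address. A stricter variant could compare the result of `SENTINEL GET-MASTER-ADDR-BY-NAME` against the intended primary before deciding to skip.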

This issue has been marked as stale because it has been open for 30 days with no activity

@github-actions github-actions bot added the Stale label Oct 14, 2024
@Y-Rookie
Collaborator

Y-Rookie commented Nov 6, 2024

Fixed on the main branch in apecloud/kubeblocks-addons#1060

@Y-Rookie Y-Rookie closed this as completed Nov 6, 2024
@github-actions github-actions bot added this to the Release 0.9.2 milestone Nov 6, 2024