Describe the bug
mark@L-R910LPKW:~$ k get pod
NAME READY STATUS RESTARTS AGE
aida-dev-xyz-mining-redis-0 3/3 Running 0 27h
aida-dev-xyz-mining-redis-1 3/3 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-0 1/1 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-1 1/1 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-2 1/1 Running 0 27h
kb-post-provision-job-aida-dev-xyz-mining-redis-6l9gq 0/1 Error 0 3m
kb-post-provision-job-aida-dev-xyz-mining-redis-7qjhh 0/1 Error 0 3m25s
kb-post-provision-job-aida-dev-xyz-mining-redis-dcxt8 0/1 Error 0 3m41s
mark@L-R910LPKW:~$
To Reproduce
I am not sure what triggers it, but it reproduces very easily for me: I just delete the job, it gets created again, and it errors out.
Expected behavior
No errors.
Additional context
I have 5 Redis instances deployed with KB, each with sentinels: 2 replicas for the database and 3 for the sentinels. Only one instance exhibits the problematic behavior:
mark@L-R910LPKW:~$ k get job
NAME STATUS COMPLETIONS DURATION AGE
kb-post-provision-job-aida-dev-xyz-mining-redis Failed 0/1 6m53s 6m53s
mark@L-R910LPKW:~$ k delete job --all
job.batch "kb-post-provision-job-aida-dev-xyz-mining-redis" deleted
mark@L-R910LPKW:~$ k get job
NAME STATUS COMPLETIONS DURATION AGE
kb-post-provision-job-aida-dev-xyz-mining-redis Running 0/1 2s 2s
mark@L-R910LPKW:~$ sleep 30
mark@L-R910LPKW:~$ k get job
NAME STATUS COMPLETIONS DURATION AGE
kb-post-provision-job-aida-dev-xyz-mining-redis Running 0/1 41s 41s
mark@L-R910LPKW:~$ k get pod
NAME READY STATUS RESTARTS AGE
aida-dev-xyz-mining-redis-0 3/3 Running 0 27h
aida-dev-xyz-mining-redis-1 3/3 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-0 1/1 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-1 1/1 Running 0 27h
aida-dev-xyz-mining-redis-sentinel-2 1/1 Running 0 27h
kb-post-provision-job-aida-dev-xyz-mining-redis-7dhpq 0/1 Error 0 6s
kb-post-provision-job-aida-dev-xyz-mining-redis-b89f8 0/1 Error 0 47s
kb-post-provision-job-aida-dev-xyz-mining-redis-ffmpj 0/1 Error 0 31s
mark@L-R910LPKW:~$ k logs kb-post-provision-job-aida-dev-xyz-mining-redis-b89f8
+ declare -g default_initialize_pod_ordinal
+ declare -g redis_advertised_svc_host_value
+ declare -g redis_advertised_svc_port_value
+ declare -g headless_postfix=headless
+ declare -g redis_default_service_port=6379
+ echo 'redis sentinel component replicas found, register to sentinel.'
+ register_to_sentinel_wrapper
+ '[' -z aida-dev-xyz-mining-redis-sentinel-0,aida-dev-xyz-mining-redis-sentinel-1,aida-dev-xyz-mining-redis-sentinel-2 ']'
+ '[' -z aida-dev-xyz-mining-redis-sentinel-headless ']'
+ get_minimum_initialize_pod_ordinal
+ '[' -z aida-dev-xyz-mining-redis-0,aida-dev-xyz-mining-redis-1 ']'
+ IFS=,
+ read -ra pod_list
+ for pod in "${pod_list[@]}"
+ '[' -z '' ']'
redis sentinel component replicas found, register to sentinel.
++ extract_ordinal_from_object_name aida-dev-xyz-mining-redis-0
++ local object_name=aida-dev-xyz-mining-redis-0
++ local ordinal=0
++ echo 0
+ default_initialize_pod_ordinal=0
+ continue
+ for pod in "${pod_list[@]}"
+ '[' -z 0 ']'
++ extract_ordinal_from_object_name aida-dev-xyz-mining-redis-1
++ local object_name=aida-dev-xyz-mining-redis-1
++ local ordinal=1
++ echo 1
+ pod_ordinal=1
+ '[' 1 -lt 0 ']'
+ default_redis_primary_pod_name=aida-dev-xyz-mining-redis-0
+ redis_default_primary_pod_headless_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc
+ init_redis_service_port
+ '[' -n 6379 ']'
+ redis_default_service_port=6379
+ parse_redis_advertised_svc_if_exist aida-dev-xyz-mining-redis-0
+ local pod_name=aida-dev-xyz-mining-redis-0
+ [[ -z '' ]]
+ echo 'Environment variable REDIS_ADVERTISED_PORT not found. Ignoring.'
Environment variable REDIS_ADVERTISED_PORT not found. Ignoring.
+ return 0
+ old_ifs='
'
+ IFS=,
+ set -f
+ read -ra sentinel_pod_list
+ set +f
+ IFS='
'
+ for sentinel_pod in "${sentinel_pod_list[@]}"
+ sentinel_pod_fqdn=aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless
+ '[' -n '' ']'
+ echo 'register to sentinel:aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless with ClusterIP service: redis_default_primary_pod_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc, redis_default_service_port=6379'
register to sentinel:aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless with ClusterIP service: redis_default_primary_pod_fqdn=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc, redis_default_service_port=6379
+ register_to_sentinel aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless aida-dev-xyz-mining-redis aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc 6379
+ local sentinel_host=aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless
+ local master_name=aida-dev-xyz-mining-redis
+ local sentinel_port=26379
+ local redis_primary_host=aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc
+ local redis_primary_port=6379
+ local timeout=600
++ date +%s
+ local start_time=1725899217
+ local current_time
+ set +x
Checking connectivity to aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless on port 26379 using redis-cli...
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
aida-dev-xyz-mining-redis-sentinel-0.aida-dev-xyz-mining-redis-sentinel-headless is reachable on port 26379.
Checking connectivity to aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc on port 6379 using redis-cli...
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
aida-dev-xyz-mining-redis-0.aida-dev-xyz-mining-redis-headless.system-d-redis-aida-dev-xyz-mining.svc is reachable on port 6379.
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
ERR Duplicate master name.
Command failed with status 0 or output not OK.
mark@L-R910LPKW:~$ k get job
NAME STATUS COMPLETIONS DURATION AGE
kb-post-provision-job-aida-dev-xyz-mining-redis Failed 0/1 71s 71s
mark@L-R910LPKW:~$
@MarkKharitonov Thank you for raising this issue.
In the current KubeBlocks Redis, the kb-post-provision-job-xxx job is primarily used to register Redis with all Redis Sentinel instances, which enables high availability for the Redis cluster.
The current implementation of this job is not idempotent. When Redis registers successfully with some Redis Sentinel instances but fails to register with others (for reasons such as network connectivity issues or unhealthy instances), the post-provision job fails and retries. As you mentioned, deleting the job triggers the same retry.
On retry, the Sentinel instances that were already registered successfully return the error "ERR Duplicate master name." That is the cause of the issue you encountered.
We will address this in the future by reworking the Redis registration logic to make it idempotent.
Thank you again for bringing this to our attention.
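For illustration only, here is a minimal sketch of what an idempotent registration step could look like. This is not the actual KubeBlocks script: the function names `sentinel_cmd` and `register_to_sentinel_idempotent`, the default quorum of 2, and the choice to treat "Duplicate master name" as success are all assumptions made for this example. It also does not check that the already-registered master points at the expected address, which a real fix would probably need to do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: make Sentinel registration safe to re-run.

# Thin wrapper around redis-cli, kept separate so it can be stubbed.
# Authentication is omitted here; redis-cli reads REDISCLI_AUTH from
# the environment if it is set.
sentinel_cmd() {
  local host=$1 port=$2
  shift 2
  redis-cli -h "$host" -p "$port" "$@" 2>&1
}

# Register a master with one Sentinel. A "Duplicate master name" reply
# is treated as "already done" instead of as a failure.
# The quorum default (2) is an assumption for this sketch.
register_to_sentinel_idempotent() {
  local sentinel_host=$1 master_name=$2 primary_host=$3 primary_port=$4
  local sentinel_port=${5:-26379} quorum=${6:-2}
  local output

  output=$(sentinel_cmd "$sentinel_host" "$sentinel_port" \
    SENTINEL MONITOR "$master_name" "$primary_host" "$primary_port" "$quorum")

  case "$output" in
    OK)
      echo "registered $master_name on $sentinel_host"
      ;;
    *"Duplicate master name"*)
      echo "$master_name already registered on $sentinel_host, skipping"
      ;;
    *)
      echo "failed to register on $sentinel_host: $output" >&2
      return 1
      ;;
  esac
}
```

With something along these lines, a retried job would skip the Sentinels that already know the master and only fail on a genuine error, so deleting and recreating the job would no longer end in `Error`.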