This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Undefined behavior for ZSCAN using readonly cluster client. #222

Open
theanti9 opened this issue Sep 30, 2017 · 7 comments
Labels
3.0.0 All issues that will be looked at for 3.0.0 release Accepted Accepted bug or enhancement Accepting PR type: bug

Comments

@theanti9

When using a ClusterReadOnlyConnectionPool, zscan and zscan_iter do not provide the SCAN guarantees. I believe this applies to sscan and hscan as well, though I've not tested them.

The issue arises from the overridden get_node_by_slot on ClusterReadOnlyConnectionPool, which randomly chooses between the master and any slave node that serves the slot for the key. As a result, each subsequent scan command is issued to a random node. The cursor values Redis returns are stateless (they are effectively offsets), but they are not guaranteed to be consistent between a master and its slaves.
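
For illustration, the failure mode can be reproduced without the cluster client at all by feeding the cursor obtained from one node into the other node for the same key. A minimal sketch, assuming plain redis-py connections and hypothetical host/port values for the master and slave that serve the slot:

import redis  # plain redis-py, used here only to illustrate the cursor mismatch

# Hypothetical endpoints for the master and the slave that own the slot.
master = redis.StrictRedis(host="10.0.0.1", port=7000)
slave = redis.StrictRedis(host="10.0.0.2", port=7001)

seen = set()
cursor = 0
node = master
while True:
    # Alternating between nodes mimics the readonly pool's random node choice:
    # a cursor handed back by one node is not a valid position on the other.
    cursor, items = node.zscan("mysortedset", cursor=cursor)
    seen.update(member for member, _score in items)
    if cursor == 0:
        break
    node = slave if node is master else master

# len(seen) is typically smaller than ZCARD mysortedset and varies from run
# to run, matching the zscan_iter results shown further down.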

The example I used to debug this is a sorted set that currently contains 164 values but has fluctuated significantly in size over its lifetime. If I issue an initial ZSCAN request to the master that owns it, I get the following response:

Command: ZSCAN mysortedset 0

1) "136"
2)  1) "2836834"
    2) "0.4802068521853653"
    3) "3599906"
    4) "0.4656334842469258"
    5) "3490931"
    6) "0.22173393426291121"
    7) "82109"
    8) "0.48914307544693797"
    9) "1244405"
   10) "0.53974599172088655"
   11) "2199081"
   12) "0.34818095160929963"
   13) "3967992"
   14) "0.49706414372896185"
   15) "1390822"
   16) "0.20662256529819331"
   17) "540680"
   18) "0.53718780580831582"
   19) "2840317"
   20) "0.20240259687939222"
   21) "812937"
   22) "0.16832229396956788"
   23) "2181749"
   24) "0.23085035776582155"

Note: cursor value of 136.

If I run the same command on the slave node that holds this slot, I get the following results:

1) "240"
2)  1) "1078146"
    2) "0.19975230285488776"
    3) "3788365"
    4) "0.59107889142186887"
    5) "3195524"
    6) "0.30029325316059524"
    7) "1325801"
    8) "0.42741925550104209"
    9) "769388"
   10) "0.19214136348401703"
   11) "3718988"
   12) "0.22575183419216338"
   13) "3580511"
   14) "0.3962135436706839"
   15) "380687"
   16) "0.3458031319795174"
   17) "1999627"
   18) "0.60407053063340199"
   19) "1274471"
   20) "0.37309465665899166"

Note: Different values, fewer values, and different cursor offset.

As long as the underlying set doesn't change, reissuing these commands on the same server gives consistent results, but switching between servers means you end up with an arbitrary subset of what's actually in the sorted set. This can be demonstrated through the Python library with the following calls, made on the same set as above with no underlying changes in between:

Using the readonly cluster client, called in quick succession:

In [41]: len(set([i for i, _ in readonly_cluster.zscan_iter("mysortedset")]))
Out[41]: 120

In [42]: len(set([i for i, _ in readonly_cluster.zscan_iter("mysortedset")]))
Out[42]: 124

In [43]: len(set([i for i, _ in readonly_cluster.zscan_iter("mysortedset")]))
Out[43]: 122

In [44]: len(set([i for i, _ in readonly_cluster.zscan_iter("mysortedset")]))
Out[44]: 125

Using a non-readonly cluster client, called in quick succession:

In [45]: len(set([i for i, _ in cluster.zscan_iter("mysortedset")]))
Out[45]: 164

In [46]: len(set([i for i, _ in cluster.zscan_iter("mysortedset")]))
Out[46]: 164

In [47]: len(set([i for i, _ in cluster.zscan_iter("mysortedset")]))
Out[47]: 164
@Grokzen
Owner

Grokzen commented Nov 1, 2017

@theanti9 Interesting edge case that you found. I get that the error happens and I get how/why, but I have to think/tinker some on how to solve this issue.

@theanti9
Author

theanti9 commented Nov 1, 2017

There are several other commands (at least per the documentation) that are always directed at the appropriate master node. Maybe the *SCAN methods should use that behavior as well; since there can only be one master per shard, that would fix the issue. It's not ideal, but the alternative would be some sort of "sticky session"-like mechanism for iterative commands, which is more failure-prone and has higher overhead.
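
For what it's worth, a minimal sketch of what that could look like (hypothetical class and parameter names, not the library's actual pool code): keep random routing for ordinary reads, but pin the SCAN family to the slot's master so the cursor stays valid across calls.

import random

# Hypothetical sketch only; redis-py-cluster's real pool classes differ.
SCAN_COMMANDS = {"SCAN", "SSCAN", "HSCAN", "ZSCAN"}

class StickyScanReadOnlyPool:
    def __init__(self, nodes_by_slot):
        # nodes_by_slot[slot] is assumed to be [master, slave1, slave2, ...]
        self.nodes_by_slot = nodes_by_slot

    def get_node_by_slot(self, slot, command=None):
        nodes = self.nodes_by_slot[slot]
        if command in SCAN_COMMANDS:
            # Cursor-based commands always go to the single master for the
            # shard, so consecutive calls see a consistent cursor space.
            return nodes[0]
        # Plain reads can still be spread over the master and its slaves.
        return random.choice(nodes)

The trade-off is that *SCAN traffic no longer benefits from read replicas, but the iteration guarantees hold.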

@Grokzen
Owner

Grokzen commented Mar 6, 2018

@theanti9 I am working on a fix. It should not be that difficult to solve; I just have to identify all the methods that require this change and patch them inside the ReadonlyConnectionPool. Will update when I am done with it.

@theanti9
Author

theanti9 commented Mar 6, 2018

Awesome, thanks!

@Grokzen
Owner

Grokzen commented Mar 13, 2018

@theanti9 Please try the new commit ce9f3d9 and see if it solves your problem. If it does, please close the issue.

@Grokzen
Owner

Grokzen commented Mar 14, 2018

Ignore that commit, it was broken af... I am working on a new solution.

@Grokzen Grokzen added Accepted Accepted bug or enhancement type: bug labels Jan 8, 2019
@Grokzen
Owner

Grokzen commented Sep 20, 2020

Just a note: all xSCAN-style methods are either broken or do not work properly, both in normal cases and in failover cases. They need another pass to sort them out, to make them work across multiple nodes in the cluster, and to work with the cursor mechanism. I will put this into the 3.0.0 backlog to take another big look at all the xSCAN methods and sort them out.
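
For context, SCAN cursors are per node, so a cluster-wide scan ultimately has to run the cursor loop on each master separately and chain the results. A rough sketch of that shape (a hypothetical helper, not the planned 3.0.0 API), assuming one plain redis-py client per master:

def cluster_scan_iter(master_clients, match=None, count=1000):
    # master_clients: one redis-py client per master node in the cluster.
    for client in master_clients:
        cursor = 0
        while True:
            cursor, keys = client.scan(cursor=cursor, match=match, count=count)
            for key in keys:
                yield key
            # A zero cursor means this node's keyspace has been fully iterated.
            if cursor == 0:
                break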

@Grokzen Grokzen added 3.0.0 All issues that will be looked at for 3.0.0 release Accepting PR labels Sep 20, 2020