Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified sliding sync returns negative values for some field where a positive one is expected #17737

Closed
jmartinesp opened this issue Sep 20, 2024 · 10 comments · Fixed by #17748

Comments

@jmartinesp
Copy link
Member

Description

We received a report from a user of Element X Android (@spaetz in here, if I'm not mistaken) where he couldn't get the app to work with SSS at all with one account in his homeserver but it worked fine with different one in the same HS, the cause seemed to be the client was cancelling each /sync request before the HS finished sending it (lots of 200! status codes for /sync requests).

After some debugging, we found these errors in the logs of the SDK:

2024-09-20T16:01:09.229399Z DEBUG matrix_sdk::http_client: Error while sending request: Api(Deserialization(Json(Error("invalid value: integer `-67914`, expected u64", line: 1, column: 47547)))) | crates/matrix-sdk/src/http_client/mod.rs:228 | spans: sync_once > send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 60s } request_id="REQ-4" method=POST uri="https://matrix.sspaeth.de/_matrix/client/unstable/org.matrix.simplified_msc3575/sync" request_size="476 B" status=200 response_size="100.5 kiB"}
2024-09-20T16:01:09.229622Z ERROR matrix_sdk_ui::sync_service: Error while processing room list in sync service: SlidingSync failed: deserialization failed: invalid value: integer `-67914`, expected u64 at line 1 column 47547 | crates/matrix-sdk-ui/src/sync_service.rs:283

We can deduce some field has an invalid format, where some positive number was expected but we receive a negative one, and in the same logs we can find the same error happening at different indexes inside the /sync response, so it's a value that's repeated several times in the responses:

2024-09-20T16:01:09.772705Z DEBUG matrix_sdk::http_client: Error while sending request: Api(Deserialization(Json(Error("invalid value: integer `-67914`, expected u64", line: 1, column: 44112)))) | crates/matrix-sdk/src/http_client/mod.rs:228 | spans: sync_once > send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 60s } request_id="REQ-11" method=POST uri="https://matrix.sspaeth.de/_matrix/client/unstable/org.matrix.simplified_msc3575/sync" request_size="476 B" status=200 response_size="100.5 kiB"}
2024-09-20T16:01:09.772976Z ERROR matrix_sdk_ui::sync_service: Error while processing room list in sync service: SlidingSync failed: deserialization failed: invalid value: integer `-67914`, expected u64 at line 1 column 44112 | crates/matrix-sdk-ui/src/sync_service.rs:283
...
2024-09-20T16:01:10.332448Z DEBUG matrix_sdk::http_client: Error while sending request: Api(Deserialization(Json(Error("invalid value: integer `-67914`, expected u64", line: 1, column: 42678)))) | crates/matrix-sdk/src/http_client/mod.rs:228 | spans: sync_once > send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 60s } request_id="REQ-14" method=POST uri="https://matrix.sspaeth.de/_matrix/client/unstable/org.matrix.simplified_msc3575/sync" request_size="476 B" status=200 response_size="100.5 kiB"}
2024-09-20T16:01:10.332808Z ERROR matrix_sdk_ui::sync_service: Error while processing room list in sync service: SlidingSync failed: deserialization failed: invalid value: integer `-67914`, expected u64 at line 1 column 42678 | crates/matrix-sdk-ui/src/sync_service.rs:283

Sadly, we don't have the response contents to check which field is causing this issue.

Steps to reproduce

  • Log into an affected account in a client using the Rust SDK.
  • Every /sync request will fail.

Homeserver

https://sspaeth.de/

Synapse Version

0.115.0

Installation Method

I don't know

Database

Workers

I don't know

Platform

Debian

Configuration

No response

Relevant log output

-

Anything else that would be useful to know?

No response

@spaetz
Copy link

spaetz commented Sep 21, 2024

Hi there, let me know if I can do anything to support debugging this. If you need access to the actual content being returned (or to the raw database), give me a shoutout. Happy to give you an access token or postgres access. Thanks in advance.

@spaetz
Copy link

spaetz commented Sep 21, 2024

Been randomly poking in some slidingsync tables:

select event_stream_ordering from sliding_sync_membership_snapshots WHERE event_stream_ordering<=0;
event_stream_ordering
-98673
-67914
-44846
-38983
-14132

Are there supposed to be negative values in here, for instance? (I have 1016 positive rows)

And just in case they are not supposed to be negative (and that is where I will stop poking in the dark). After truncating all sliding_sync_* tables, this table is being recreated with those same negative numbers by synapse 1.115, so it is not some historical garbage.
(I know that the tables are currently repopiulated when empty, which will not work in future synapse versions).

@spaetz
Copy link

spaetz commented Sep 21, 2024

Err, the negative event_stream_ordering value of -67914 in my db corresponds to the wrong value as captured by the SDK log, so there seems to be an impedance mismatch on what is sent and what is expected.

@Hywan
Copy link
Member

Hywan commented Sep 21, 2024

@spaetz Side note: thanks for helping and for your patience!

@spaetz
Copy link

spaetz commented Sep 21, 2024

You are welcome, it is in my own interest. BTW, great talk of yours today. You hiked all those peaks yourself? :-)

@clokep
Copy link
Contributor

clokep commented Sep 21, 2024

Negative stream ordering corresponds to backfilled events IIRC.

@erikjohnston
Copy link
Member

This is odd. As clokep says, negative stream ordering happens when we backfill events. However, these stream orderings are for the membership events of local users, so how did they end up being backfilled?!

@spaetz did you by any chance delete the room? That might explain why local users membership events are being backfilled.

Can you also run:

SELECT membership, sender == user_id, event_stream_ordering FROM sliding_sync_membership_snapshots
WHERE event_stream_ordering<=0;

@spaetz
Copy link

spaetz commented Sep 23, 2024

SELECT membership, sender=user_id, event_stream_ordering FROM sliding_sync_membership_snapshots
WHERE event_stream_ordering<=0;

 membership | ?column? | event_stream_ordering 
------------+----------+-----------------------
 leave      | t        |                -38983
 leave      | t        |                -14132
 leave      | t        |                -98673
 leave      | t        |                -44846
 join       | t        |                -67914

(5 Zeilen)

The last row is room #community-moderation-effort-bl:neko.dev (that I am actually still a member of). But I have powerlevel 0 (not allowed to write in the room).

I probably joined and left and rejoined that room in the history of the user, but I certainly have joined the room using this very synapse server.

@erikjohnston
Copy link
Member

Thanks @spaetz, that was useful.

Hopefully fixed by: #17748

@spaetz
Copy link

spaetz commented Sep 25, 2024

Just to confirm that 1.116rc1 has fixed the issue and SSS syncs now just fine with EXA!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants