MSC3886: Simple client rendezvous capability #3886

Closed
wants to merge 33 commits into from
Closed
Changes from 24 commits
66cc232
Simple rendezvous capability
hughns Sep 7, 2022
e15fda0
MSC3886: Simple rendezvous capability
hughns Sep 7, 2022
0350bf8
Note that it is client rendezvous rather than anything else
hughns Sep 7, 2022
fcc3270
Describe protocol + more notes
hughns Sep 7, 2022
60f8115
Use HTTP response headers only instead of JSON for POST /
hughns Sep 7, 2022
5a9c3c7
Fix POST response code
hughns Sep 7, 2022
fac41d1
Add extra emphasis to relative nature of Location header
hughns Sep 9, 2022
cbffa67
Use rendezvous URI not IDs
hughns Sep 9, 2022
c67a3d6
Update CORS with more headers
hughns Sep 9, 2022
1ec9ce2
Sort HTTP response codes
hughns Sep 9, 2022
8a0d559
Add requirement for content-length header on POST and PUT
hughns Sep 9, 2022
cacae4e
fix typo & clarification
ara4n Sep 11, 2022
94ef9dd
Add link to current CORS headers
hughns Sep 30, 2022
953c4ee
Add 307 response for POST
hughns Oct 3, 2022
ff9a373
Add unstable prefixes
hughns Oct 3, 2022
6937a86
MSCyyyy => MSC3906
hughns Oct 13, 2022
4ab59f8
Apply suggestions from code review
hughns Mar 15, 2023
5db0af6
Add missing arrows to diagram
hughns Mar 15, 2023
8a1af85
Clarify that CORS changes are not global
hughns Mar 15, 2023
97f1709
Authentication + 30x clarifications
hughns Mar 15, 2023
b5c6c7a
Import section from MSC3903 about to-device messaging
hughns Mar 15, 2023
931cf07
Create a rendezvous point via `/_matrix/client`
Nov 13, 2023
aee7d81
Define Matrix error codes
Nov 13, 2023
90a8b49
Define 400 response for missing required headers
Nov 13, 2023
dcbbcb0
Emphasise that the this is an untrusted channel
Nov 13, 2023
74d9094
Refer to MSC3903
Nov 13, 2023
4c63493
Add internal hyperlink
Nov 13, 2023
58f1e86
Require `PUT` to supply an `If-Match`
Nov 13, 2023
65d697c
High-level description
Nov 13, 2023
43d9871
Forbid complicated etags
Nov 13, 2023
3fecfcd
Rename M_DIRTY_WRITE -> M_CONCURRENT_WRITE
Nov 13, 2023
82fcb44
Explictly reject complicated etags
Nov 13, 2023
a08d14b
Update proposals/3886-simple-rendezvous-capability.md
hughns Feb 21, 2024
302 changes: 302 additions & 0 deletions proposals/3886-simple-rendezvous-capability.md
@@ -0,0 +1,302 @@
# MSC3886: Simple client rendezvous capability
Member

On one hand, this is a really simple and elegant standalone function. On the other hand, I'm a bit worried that it duplicates the semantics of to-device API (i.e. basic store & forward between devices), albeit with short-polling rather than long-polling.

I wonder how bad it would be if we opened up to-device messages to guests, and used the existing APIs for rendezvous? So a new device would go and /login as a guest to get a temporary access token, and then publish its device ID & HS url in its QR code to let another device rendezvous with it.

My only reason for proposing this is to avoid having two store-and-forward APIs which look suspiciously similar, but have different semantics (short/long poll), and so require more code for client implementors.

Member Author

Understood. I'll work up an alternative based on to-device messages and see how that feels.

Member Author

I have started some discussion on the to-device based alternative as part of #3903

Member

> I wonder how bad it would be if we opened up to-device messages to guests, and used the existing APIs for rendezvous? So a new device would go and /login as a guest to get a temporary access token, and then publish its device ID & HS url in its QR code to let another device rendezvous with it.

ugh, the complexity of this feels horrible to me.

> My only reason for proposing this is to avoid having two store-and-forward APIs which look suspiciously similar, but have different semantics (short/long poll), and so require more code for client implementors.

Sure, having two store-and-forward APIs is rather less than ideal, but this one is so simple and easy to use that I don't really buy that it's a meaningful amount of extra code for clients compared to having to grab a temporary access token and then start /syncing.

For me, the simplicity of this proposal outweighs the fact it looks a bit like to-device messaging. (Or even matrix rooms, if you squint hard enough and invent "ephemeral rooms".)

The only thing I'd say here is that it would be good if the "Alternatives" section in this MSC said something about this idea (even if it's just a link to MSC3903's alternatives section).

Member

I think I was broadly coming to a similar conclusion to Rich. Adding guest access to to-device feels about as complex as this separate impl.

Member Author

I've moved the section from MSC3903's alternatives section into this proposal, as there is much more feedback here than on MSC3903 itself.

Member

Given the above, it appears we've settled on using a new channel rather than exposing to-device to guests. @matrix-org/spec-core-team if you disagree then please raise comments :)


[MSC3906](https://github.com/matrix-org/matrix-spec-proposals/pull/3906) proposes allowing a user to log in on a new device using an existing device by scanning a
QR code.

In order to facilitate this the two devices need some bi-directional communication channel which they can use to exchange
information such as:

- the homeserver being used
- the user ID
- facilitation of issuing a new access token
- device ID for end-to-end encryption
- device keys for end-to-end encryption
Member

We shouldn't be sending (private) device keys over the wire like this. They should be generated by the new device, which may be the device ID given, but not transmitted over the wire.

Contributor

dcbbcb0 and 74d9094 clarify that this is an untrusted communications channel.


To enable [MSC3906](https://github.com/matrix-org/matrix-spec-proposals/pull/3906) and support any future proposals, this MSC proposes a simple HTTP-based protocol that can be used to
establish a direct communication channel between two IP-connected devices.

It will work with devices behind NAT. It doesn't require homeserver administrators to deploy a separate server.

## Proposal

It is proposed that a general-purpose HTTP-based protocol be used to establish ephemeral bi-directional communication
channels over which arbitrary data can be exchanged.

A typical flow might look like this where device A is initiating the rendezvous with device B:

```mermaid

sequenceDiagram
participant A as Device A
participant R as Rendezvous Server
participant B as Device B
Note over A: Device A determines which rendezvous server to use

A->>+R: POST /rendezvous Hello from A
R->>-A: 201 Created Location: /abc-def-123-456

A-->>B: Rendezvous URI between clients, perhaps as QR code: e.g. https://rendezvous-server/abc-def-123-456

Note over A: Device A starts polling for contact at the rendezvous

B->>+R: GET <rendezvous URI>
R->>-B: 200 OK Hello from A

loop Device A polls for rendezvous updates
A->>+R: GET <rendezvous URI> If-None-Match: <ETag>
R->>-A: 304 Not Modified
end

B->>+R: PUT <rendezvous URI> Hello from B
R->>-B: 202 Accepted

Note over A,B: Rendezvous now established
```
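
The polling loop in the diagram above can be sketched as follows. This is a minimal illustration rather than part of the proposal: `poll_until_changed` and the `get` callable are hypothetical names, and `get` stands in for an HTTP `GET <rendezvous URI>` sent with `If-None-Match` set to the supplied ETag, returning a `(status, etag, body)` tuple.

```python
import time
from typing import Callable, Optional, Tuple

# (status code, ETag of the current payload, body or None for a 304)
Response = Tuple[int, str, Optional[bytes]]


def poll_until_changed(
    get: Callable[[str], Response],
    last_etag: str,
    max_attempts: int = 30,
    interval_seconds: float = 1.0,
) -> Optional[Tuple[str, bytes]]:
    """Poll until the payload changes; return (new_etag, body), or None on give-up."""
    for _ in range(max_attempts):
        status, etag, body = get(last_etag)
        if status == 200 and body is not None:
            return etag, body  # the other device has written a new payload
        if status == 404:
            return None  # the rendezvous has expired or been cancelled
        if status != 304:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval_seconds)  # 304 Not Modified: wait, then ask again
    return None
```

The attempt limit and interval are arbitrary assumptions; a real client would presumably derive them from the `Expires` header and any rate limiting it encounters.
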
Member

My understanding of this is that A and B take turns writing to the same rendezvous URI until they're done. So when it's B's turn to write, A keeps polling (using the ETag) until the server says the data has changed, and vice versa.

What happens if B tries to write, but gets some sort of network error, or an error from a proxy? If the server got B's data, but B received a network error, then it seems to me what could happen is:

  • A receives B's data, thinks it's now their turn to send data, so sends their data and gets a new ETag
  • B retries the request, overwriting A's data (and never receiving it)
  • A polls for new data, using the new ETag
  • since B overwrote A's data, the data doesn't match the ETag, so A gets the data B sent, again

So B will miss a message from A, and A will get a duplicate message.

Member Author

Perhaps we could mitigate against this by using a RFC7232 If-Match on the PUT requests?

Contributor

ISTM that every PUT should be required to cite a previous ETag so that the rendezvous server can enforce a linear ordering. (The initial ETag is included in the POST and GET response, so both A and B should be fully aware of it.)

Contributor

58f1e86 does this.


Please note that it is intentional that this protocol does nothing to ensure the integrity of the data exchanged at a rendezvous.

### Protocol

#### Create a new rendezvous point: `POST /_matrix/client/rendezvous`

HTTP request headers:

- `Content-Length` - required
- `Content-Type` - optional, server should assume `application/octet-stream` if not specified

HTTP request body:

- any data up to maximum size allowed by the server

HTTP response codes, and Matrix error codes:

- `201 Created` - rendezvous created
- `400 Bad Request` (`M_MISSING_PARAM`) - no `Content-Length` was provided.
- `403 Forbidden` (`M_FORBIDDEN`) - forbidden by server policy
- `413 Payload Too Large` (`M_TOO_LARGE`) - the supplied payload is too large
- `429 Too Many Requests` (`M_UNKNOWN`) - the request has been rate limited
- `307 Temporary Redirect` - if the request should be served from somewhere else specified in the `Location` response header

n.b. the relatively unusual [`307 Temporary Redirect`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/307) response
code has been chosen explicitly for the behaviour of ensuring that the method and body will not change whilst the user-agent
follows the redirect. For this reason, no other `30x` response codes are allowed.

HTTP response headers for `201 Created`:

- `Location` - required, the allocated rendezvous URI which can be on a different server
Contributor

Presumably this URI needs to be unguessable, to prevent attackers from guessing it and impersonating the intended recipient?

- `X-Max-Bytes` - required, the maximum allowed bytes for the payload
Comment on lines +133 to +134
Member

why are these headers and not response body parameters?

Member Author

On reflection I would agree that if it is going to be part of the C-S API then it would make sense to consider consistency with the rest of the C-S API where headers are not used.

Contributor

Are we concerned with just these two headers? Or do we want all of the response and request headers to be expressed via HTTP bodies?

Contributor

The response body for GET ought to be just the payload itself. Since the payload data is an arbitrary byte sequence, it would be painful to embed this in JSON. Therefore I would encourage the current GET response headers (Content-Type, ETag, Expires, Last-Modified) to continue to be expressed via headers. For consistency it makes sense to do so in all other responses.

For POST this leaves the two highlighted headers: Location and X-Max-Bytes. We could present them as a JSON-encoded body, but it would seem odd to spread the POST response metadata in two places without any meaningful distinction to justify it. My vote would be to leave things as they are. But I neither have a vote, nor any strong opinions.

Member

> The response body for GET ought to be just the payload itself. Since the payload data is an arbitrary byte sequence, it would be painful to embed this in JSON.

We could base64 encode the payload, but I tend to agree. Not all things need to be shoehorned through JSON.

- `ETag` - required, ETag for the current payload at the rendezvous point as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.etag)
- `Expires` - required, the expiry time of the rendezvous as per [RFC7234](https://httpwg.org/specs/rfc7234.html#header.expires)
- `Last-Modified` - required, the last modified date of the payload as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.last-modified)

Example response headers:

```http
Location: /abcdEFG12345
X-Max-Bytes: 10240
ETag: VmbxF13QDusTgOCt8aoa0d2PQcnBOXeIxEqhw5aQ03o=
Expires: Wed, 07 Sep 2022 14:28:51 GMT
Last-Modified: Wed, 07 Sep 2022 14:27:51 GMT
```
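
A server might assemble these headers along the following lines. This is a hedged sketch under stated assumptions: `new_rendezvous_id` and `creation_headers` are hypothetical helper names, the 60-second lifetime simply mirrors the example above, and the random identifier is one possible way to make the `Location` hard to guess.

```python
import secrets
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def new_rendezvous_id() -> str:
    """Return a random, hard-to-guess path component (an assumed design choice)."""
    return secrets.token_urlsafe(16)


def creation_headers(
    rendezvous_id: str,
    etag: str,
    now: datetime,
    ttl_seconds: int = 60,
    max_bytes: int = 10240,
) -> dict:
    """Build the `201 Created` response headers for a new rendezvous.

    `now` must be timezone-aware and in UTC so that HTTP dates end in "GMT".
    """
    return {
        "Location": f"/{rendezvous_id}",
        "X-Max-Bytes": str(max_bytes),
        "ETag": etag,
        "Expires": format_datetime(now + timedelta(seconds=ttl_seconds), usegmt=True),
        "Last-Modified": format_datetime(now, usegmt=True),
    }
```

Calling `creation_headers("abcdEFG12345", etag, datetime(2022, 9, 7, 14, 27, 51, tzinfo=timezone.utc))` reproduces the dates shown in the example response headers.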

#### Update payload at rendezvous point: `PUT <rendezvous URI>`

HTTP request headers:

- `Content-Length` - required
- `Content-Type` - optional, server should assume `application/octet-stream` if not specified
- `If-Match` - optional, as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.if-match) server will assume `*`
if not specified

HTTP request body:

- any data up to maximum size allowed by the server

HTTP response codes, and Matrix error codes:

- `202 Accepted` - payload updated
- `400 Bad Request` (`M_MISSING_PARAM`) - no `Content-Length` was provided.
- `404 Not Found` (`M_NOT_FOUND`) - rendezvous URI is not valid (it could have expired)
- `412 Precondition Failed` (`M_DIRTY_WRITE`, **a new error code**) - when `If-Match` is supplied and the ETag does not match
- `413 Payload Too Large` (`M_TOO_LARGE`) - the supplied payload is too large
- `429 Too Many Requests` (`M_UNKNOWN`) - the request has been rate limited

HTTP response headers for `202 Accepted` and `412 Precondition Failed`:

- `ETag` - required, ETag for the current payload at the rendezvous point as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.etag)
- `Expires` - required, the expiry time of the rendezvous as per [RFC7234](https://httpwg.org/specs/rfc7234.html#header.expires)
Contributor

Is the intention that the expiry time is incremented every time the rendezvous payload is updated?

Contributor

I have assumed so in 65d697c

- `Last-Modified` - required, the last modified date of the payload as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.last-modified)
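
The `If-Match` precondition on `PUT` could be evaluated roughly as below. This is a minimal sketch, assuming a single ETag compared as a plain string (no ETag lists, weak validators or quoting); `check_put_precondition` is a hypothetical name.

```python
from typing import Optional, Tuple


def check_put_precondition(
    current_etag: str, if_match: Optional[str]
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Matrix error code) for a PUT to an existing rendezvous."""
    # A missing If-Match is treated as "*", which matches whatever is stored.
    if if_match is None or if_match.strip() == "*":
        return 202, None
    if if_match.strip() == current_etag:
        return 202, None
    # The writer based its update on a stale payload: reject rather than overwrite.
    return 412, "M_DIRTY_WRITE"
```

With this check, a device that retries a `PUT` citing an ETag that has since been superseded receives `412 Precondition Failed` instead of silently overwriting the other device's payload.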

#### Get payload from rendezvous point: `GET <rendezvous URI>`

HTTP request headers:

- `If-None-Match` - optional, as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.if-none-match) server will
only return data if given ETag does not match
Member

Might be nice for servers to have the option to delay responding until it gets content that doesn't match the ETag, so we can do long-polling.


HTTP response codes, and Matrix error codes:

- `200 OK` - payload returned
- `304 Not Modified` - when `If-None-Match` is supplied and the ETag matches
- `404 Not Found` (`M_NOT_FOUND`) - rendezvous URI is not valid (it could have expired)
- `429 Too Many Requests` (`M_UNKNOWN`) - the request has been rate limited

HTTP response headers for `200 OK` and `304 Not Modified`:

- `ETag` - required, ETag for the current payload at the rendezvous point as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.etag)
- `Expires` - required, the expiry time of the rendezvous as per [RFC7234](https://httpwg.org/specs/rfc7234.html#header.expires)
- `Last-Modified` - required, the last modified date of the payload as per [RFC7232](https://httpwg.org/specs/rfc7232.html#header.last-modified)

- `Content-Type` - required for `200 OK`
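
The conditional `GET` can be sketched as follows, following the RFC7232 rule that data is only returned when the supplied ETag does not match the current one. `conditional_get` is a hypothetical name, and the comparison assumes a single ETag compared as a plain string.

```python
from typing import Optional, Tuple


def conditional_get(
    current_etag: str, payload: bytes, if_none_match: Optional[str]
) -> Tuple[int, Optional[bytes]]:
    """Return (HTTP status, body) for a GET on an existing rendezvous."""
    if if_none_match is not None and if_none_match.strip() == current_etag:
        # The client already holds the current payload: send no body.
        return 304, None
    return 200, payload
```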

#### Cancel a rendezvous: `DELETE <rendezvous URI>`

HTTP response codes:

- `204 No Content` - rendezvous cancelled
- `404 Not Found` (`M_NOT_FOUND`) - rendezvous URI is not valid (it could have expired)
- `429 Too Many Requests` (`M_UNKNOWN`) - the request has been rate limited

### Authentication

These API endpoints do not require authentication. This is because the protocol is explicitly treated as untrusted,
with trust established at a higher level outside the scope of the present proposal.

### Maximum payload size

The server should enforce a maximum payload size. It is recommended that this be no less than 10KB.
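
A size check on `POST` and `PUT` might look like the sketch below. `validate_upload` and `MAX_BYTES` are hypothetical names, and treating a malformed `Content-Length` the same as a missing one is an assumption; the proposal only defines the missing case.

```python
from typing import Optional, Tuple

MAX_BYTES = 10 * 1024  # assumed server limit, matching the recommended minimum


def validate_upload(
    content_length: Optional[str], max_bytes: int = MAX_BYTES
) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, Matrix error code) on rejection, or None to proceed."""
    # Assumption: a non-numeric Content-Length is handled like a missing one.
    if content_length is None or not content_length.strip().isdigit():
        return 400, "M_MISSING_PARAM"
    if int(content_length) > max_bytes:
        return 413, "M_TOO_LARGE"
    return None
```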

### Maximum duration of a rendezvous

The rendezvous only needs to persist for the duration of the handshake. So a timeout such as 30 seconds is adequate.

Clients should handle the case of the rendezvous being cancelled or timed out by the server.

### ETags

The ETag generated should be unique to the rendezvous point and the last modified time so that two clients can
distinguish between identical payloads sent by either client.
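
One way to obtain such an ETag is to hash the rendezvous identifier together with the last modified time, deliberately leaving the payload out. This is only a sketch: `make_etag` is a hypothetical name, and the nanosecond timestamp and SHA-256/base64 encoding are assumptions rather than requirements of the proposal.

```python
import base64
import hashlib


def make_etag(rendezvous_id: str, last_modified_ns: int) -> str:
    """Derive an ETag from the rendezvous point and its last modified time."""
    material = f"{rendezvous_id}:{last_modified_ns}".encode("utf-8")
    return base64.b64encode(hashlib.sha256(material).digest()).decode("ascii")
```

Because the payload is not an input, two writes of identical bytes at different times still yield different ETags, so a polling client sees the second write as a change.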

### CORS

To support usage from web browsers, the server should allow CORS requests to the `/rendezvous` endpoint from any
origin and expose the `ETag`, `Location` and `X-Max-Bytes` headers as:

```http
Access-Control-Allow-Headers: Content-Type,If-Match,If-None-Match
Access-Control-Allow-Methods: GET,PUT,POST,DELETE
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: ETag,Location,X-Max-Bytes
```

Currently the [spec](https://spec.matrix.org/v1.4/client-server-api/#web-browser-clients) specifies a single set of
CORS headers to be used. Therefore, care will be required to make it clear in the spec that the headers will
vary depending on the endpoint.

### Choice of server

Ultimately it will be up to the Matrix client implementation to decide which rendezvous server to use.

However, it is suggested that the following logic is used by the device/client to choose the rendezvous server in order
of preference:

1. If the client is already logged in: try to use the current homeserver.
1. If the client is not logged in and it is known which homeserver the user wants to connect to: try to use that homeserver.
1. Otherwise use a default server.
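
The order of preference above can be sketched as a function that returns candidate servers for the client to try in turn. The function name, parameter names and the default URL are all hypothetical.

```python
from typing import List, Optional

DEFAULT_RENDEZVOUS_SERVER = "https://rendezvous.example.org"  # placeholder default


def rendezvous_candidates(
    current_homeserver: Optional[str],
    intended_homeserver: Optional[str],
    default_server: str = DEFAULT_RENDEZVOUS_SERVER,
) -> List[str]:
    """Return rendezvous servers to try, most preferred first."""
    candidates: List[str] = []
    if current_homeserver:
        # Already logged in: prefer the current homeserver.
        candidates.append(current_homeserver)
    elif intended_homeserver:
        # Not logged in, but the target homeserver is known.
        candidates.append(intended_homeserver)
    if default_server not in candidates:
        candidates.append(default_server)
    return candidates
```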

## Potential issues

Because this is an entirely new set of functionality, it should not cause issues with any existing Matrix functions or capabilities.

The proposed protocol requires the devices to have IP connectivity to the server which might not be the case in P2P scenarios.

Member

One potential issue here is that if A sends a message to B, then waits for a message from B using the ETag, but the message that B sends to A happens to be exactly the same as the message that A sent, then A will get the 304 Not Modified response, and never realize that B sent a message. So anything built on top of this needs to ensure that a message is never identical to the preceding message.

Member

Is this still a problem? From the current text, it sounds like 304 Not Modified will only be returned when a matching ETag is supplied in an If-None-Match. Given the resolution of this thread, clients will have to supply a previous ETag when doing a PUT, which means we no longer have to rely on the sameness of the content to decide whether the content has been modified. That is, a PUT request that specifies the previous ETag A should be regarded as altering the payload at A, even if the payload is unchanged, and therefore, it should be assigned a new ETag.

## Alternatives

### Send-to-Device messaging

The combination of this proposal and [MSC3903](https://github.com/matrix-org/matrix-spec-proposals/pull/3903) looks similar in
some regards to the existing [Send-to-device messaging](https://spec.matrix.org/v1.6/client-server-api/#send-to-device-messaging)
capability.

Whilst to-device messaging already provides a mechanism for secure communication
between two Matrix clients/devices, a key consideration for the anticipated
login with QR capability is that one of the clients is not yet authenticated with
a Homeserver.

Furthermore the client might not know which Homeserver the user wishes to
connect to.

Conceptually, one could create a new type of "guest" login that would allow the
unauthenticated client to connect to a Homeserver for the purposes of
communicating with an existing authenticated client via to-device messages.

Some considerations for this:

Where the "actual" Homeserver is not known then the "guest" Homeserver nominated
by the new client would need to be federated with the "actual" Homeserver.

The "guest" Homeserver would probably want to automatically clean up the "guest"
accounts after a short period of time.

The "actual" Homeserver operator might not want to open up full "guest" access
so a second type of "guest" account might be required.

Does the new device/client need to accept the T&Cs of the "guest" Homeserver?

### Other existing protocols

Try to build on an existing protocol such as STUN, TURN or [CoAP](http://coap.technology/).

### Implementation details

Rather than requiring the devices to poll for updates, "long-polling" could be used instead similar to `/sync`.

## Security considerations

### Confidentiality of data

Whilst the data transmitted can be encrypted in transit via HTTP over TLS, the rendezvous server does have visibility over the
data and can also perform man-in-the-middle attacks.

As such, for the purposes of authentication and end-to-end encryption the channel should be treated as untrusted and some
form of secure layer should be used on top of the channel such as a Diffie-Hellman key exchange.

### Denial of Service attack surface

Because the protocol allows for the creation of arbitrary channels and storage of arbitrary data, it is possible to use
it as a denial of service attack surface.

As such, standard mitigations such as the following may be deemed appropriate by homeserver implementations
and administrators:

- rate limiting of requests
- imposing a low maximum payload size (e.g. kilobytes not megabytes)
- limiting the number of concurrent channels
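
As one illustration of the last mitigation, a server could cap the number of live channels and let expired ones free up capacity. This is a sketch only: `ChannelLimiter` is a hypothetical name, the clock is passed in explicitly for clarity, and how a refusal is reported (for example `429 Too Many Requests`) is left to the caller.

```python
from typing import Dict


class ChannelLimiter:
    """Tracks live rendezvous channels and refuses new ones past a cap."""

    def __init__(self, max_channels: int, ttl_seconds: float) -> None:
        self.max_channels = max_channels
        self.ttl_seconds = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def try_create(self, rendezvous_id: str, now: float) -> bool:
        """Register a channel at time `now`; return False if the cap is reached."""
        # Drop expired channels first so they do not count against the cap.
        self._expires_at = {k: v for k, v in self._expires_at.items() if v > now}
        if len(self._expires_at) >= self.max_channels:
            return False
        self._expires_at[rendezvous_id] = now + self.ttl_seconds
        return True
```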

## Unstable prefix

While this feature is in development the new endpoint should be exposed using the following unstable prefix:

- `/_matrix/client/unstable/org.matrix.msc3886/rendezvous`

Additionally, the feature is to be advertised as an unstable feature in the `GET /_matrix/client/versions`
response, with the key `org.matrix.msc3886` set to `true`. The response could then look as
follows:

```json
{
  "versions": ["r0.6.0"],
  "unstable_features": {
    "org.matrix.msc3886": true
  }
}
```
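
A client could check for the feature before using the unstable endpoint, for example as sketched below. `rendezvous_path` is a hypothetical helper that takes the already-parsed JSON body of the `/versions` response.

```python
from typing import Optional

UNSTABLE_FEATURE = "org.matrix.msc3886"
UNSTABLE_PATH = "/_matrix/client/unstable/org.matrix.msc3886/rendezvous"


def rendezvous_path(versions_response: dict) -> Optional[str]:
    """Return the unstable rendezvous path if the server advertises it, else None."""
    features = versions_response.get("unstable_features", {})
    if features.get(UNSTABLE_FEATURE) is True:
        return UNSTABLE_PATH
    return None
```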

## Dependencies

None, although it's intended to be used with [MSC3906](https://github.com/matrix-org/matrix-spec-proposals/pull/3906).

## Credits

This proposal was influenced by https://wiki.mozilla.org/Services/KeyExchange which also has some helpful discussion
around DoS mitigation.