MSC2704: Explicitly allow alternative origins in MXC URIs and specify deduplication requirements on uploads #2704

Closed
62 changes: 62 additions & 0 deletions proposals/2704-mxc-duplication.md
@@ -0,0 +1,62 @@
# MSC2704: Handling duplicate media on `/upload` + clarifying the origin of an MXC URI

Currently some servers will de-duplicate media in an unpredictable way whereas others will not.
Further, some implementations have the capability to return a potentially unexpected origin for
their MXC URIs. This proposal aims to acknowledge the status quo by specifying it explicitly.

## Proposal

MXC URIs can have an origin which does not match the server name used for `/upload`. The specification
currently only implies that this is possible; this MSC makes the behaviour explicitly valid, so clients
should expect it. This means, for example, that `@alice:example.org` could receive an MXC URI such as
`mxc://cdn.upstream.com/abc123`. No changes are implied by the origin: it is to be looked up like any
other domain name, just as it is today.
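
As an illustration of what this means for clients, the following is a minimal sketch (not part of the
proposal) of splitting an MXC URI into its origin and media ID without assuming that the origin equals
the user's own server name. The helper names `parse_mxc` and `server_name_of` are hypothetical, and the
validation is deliberately simplified.

```python
from typing import NamedTuple


class MxcUri(NamedTuple):
    origin: str    # server name to fetch the media from
    media_id: str  # opaque identifier chosen by the server


def parse_mxc(uri: str) -> MxcUri:
    """Split an MXC URI of the form mxc://<origin>/<media_id>."""
    prefix = "mxc://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an MXC URI: {uri!r}")
    origin, sep, media_id = uri[len(prefix):].partition("/")
    if not origin or not sep or not media_id or "/" in media_id:
        raise ValueError(f"malformed MXC URI: {uri!r}")
    return MxcUri(origin, media_id)


def server_name_of(user_id: str) -> str:
    """Return the server name part of a user ID such as @alice:example.org."""
    _localpart, sep, server_name = user_id.partition(":")
    if not user_id.startswith("@") or not sep or not server_name:
        raise ValueError(f"malformed user ID: {user_id!r}")
    return server_name


# The client uses the origin from the URI as-is, even though it differs
# from the server name of the uploading user.
uri = parse_mxc("mxc://cdn.upstream.com/abc123")
uploader_server = server_name_of("@alice:example.org")
origin_differs = uri.origin != uploader_server
```

A client written this way never compares the returned origin against its own homeserver before
downloading; it simply resolves whatever origin the URI names.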

Servers SHOULD NOT attempt to "deduplicate" media by returning the same MXC URI for previously
uploaded content, unless the upload meets requirements outlined below. Uploads are often accompanied
by a single reference in an event, and in a world where it is possible to delete media by event ID
it is important to be able to delete a specific record without side effects. How an implementation
handles this internally is left to that implementation: it just cannot return the same MXC URI for
what appears to be the same content.

If the server wants to support deduplication, it should only do so when the media (body), uploader,
origin homeserver, and provided filename all match. This scenario is consistent with the client having
missed the response to its first request, so the second upload could be a retry.
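
The rule above can be sketched as follows. This is a hedged, in-memory illustration only: the
`MediaStore` class, its `upload` signature, the use of SHA-256 to compare bodies, and the random media
ID format are all assumptions made for the example, not anything the proposal or an existing
homeserver defines.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple

# (body hash, uploader, origin homeserver, filename) -> MXC URI
DedupKey = Tuple[str, str, str, Optional[str]]


class MediaStore:
    """Hypothetical store that deduplicates only on a full four-way match."""

    def __init__(self) -> None:
        self._uris: Dict[DedupKey, str] = {}

    def upload(self, body: bytes, uploader: str, origin: str,
               filename: Optional[str]) -> str:
        # Assumption: comparing SHA-256 digests stands in for comparing bodies.
        key = (hashlib.sha256(body).hexdigest(), uploader, origin, filename)
        existing = self._uris.get(key)
        if existing is not None:
            # Body, uploader, origin and filename all match: likely a retry.
            return existing
        # Anything else gets a fresh media ID, even if the bytes are identical.
        uri = f"mxc://{origin}/{secrets.token_urlsafe(16)}"
        self._uris[key] = uri
        return uri


store = MediaStore()
first = store.upload(b"cat", "@alice:example.org", "example.org", "cat.png")
retry = store.upload(b"cat", "@alice:example.org", "example.org", "cat.png")
renamed = store.upload(b"cat", "@alice:example.org", "example.org", "pet.png")
other_user = store.upload(b"cat", "@bob:example.org", "example.org", "cat.png")
```

Here `retry` receives the same URI as `first`, while `renamed` and `other_user` each receive a new URI
despite carrying the same bytes. A real server could still share the underlying storage for identical
bodies, since only the returned MXC URI has to be distinct.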

## Potential issues

Enforcing that media cannot be deduplicated at the MXC URI level could lead to media ID exhaustion
on the server side. However, because the server is explicitly allowed to return a different origin
for the URI, the pool of potential IDs is unbounded.

By explicitly allowing the server to return a `content_uri` which does not match its server name,
the server could imply that media was uploaded to a different server. For example, a user wishing
to upload to `example.com` could be told that their media was uploaded to the public `matrix.org`
homeserver instead. This proposal considers doing so a bad idea, but it needs no enforcement to prevent:
unless the server has managed to gain access to `matrix.org`, the media will safely 404.

Implementations may have already deduplicated media such that one MXC URI is referenced by more than
one event. The intent is to fix the problem going forward rather than to resolve the past. Some clients
also have "Forward" features which do not re-upload media, which would cause multiple events to reference
the same media.

## Alternatives

We could leave deduplication unspecified, however this leaves implementations open to issues
down the line when we do support deleting/erasing media.

We could also not allow the returned `content_uri` to reference another server. The use case for allowing
this specific behaviour is to allow media to be hosted by a dedicated CDN-like service instead of forcing
all traffic through the homeserver.

## Security considerations

Some considerations are mentioned in the Potential Issues section.

Though not mentioned in the specification, servers can already lie about the MXC URI being returned,
such as always returning a reference to the same image regardless of what was uploaded. This is not
solved by this proposal, and generally not perceived as a legitimate threat currently.

## Unstable prefix

No unstable prefixes are required for this MSC.