Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare for authenticated media freeze #17433

Merged
merged 8 commits into from
Jul 22, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/17433.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Prepare for authenticated media freeze.
5 changes: 5 additions & 0 deletions synapse/config/repository.py
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,11 @@ def read_config(self, config: JsonDict, **kwargs: Any) -> None:
remote_media_lifetime
)

self.authenticate_new_media = config.get("authenticate_new_media", False)
self.enforce_authenticated_media = config.get(
"enforce_authenticated_media", False
Copy link
Contributor Author

@H-Shay H-Shay Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. Should this be one config option? Something like enable_authenticated_media?
  2. Should this option be documented in the config manual, since we are going to switch it on default and strongly encourage it's use - perhaps documenting it will make it seem more optional than we'd like?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think documenting any config option (except experimental ones) is worth doing, even if you discourage changing it and outline the plans for the default value of the option. May also be worth saying when the option is expected to be removed (which I imagine we will do?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, agreed. I'm sure there will be some people that will need to fiddle with the config options (e.g. those using custom clients), so having them documented would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should this be one config option? Something like enable_authenticated_media?

I'm in two minds a bit here. Having two gives us more flexibility, but not actually sure of the use case. The only thing that comes to mind is if we change the defaults at different times, but then I think we wouldn't want to do that at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also went back and forth about making it one config option, and I think in the end it makes slightly more sense to make it one, essentially for the reasons you listed - I couldn't really think of a case where the two would be set independently of each other. I am going to switch it to one - this will also make documentation clearer. If anyone comes up with a reason to switch it back let me know...

)

def generate_config_section(self, data_dir_path: str, **kwargs: Any) -> str:
assert data_dir_path is not None
media_store = os.path.join(data_dir_path, "media_store")
Expand Down
43 changes: 42 additions & 1 deletion synapse/media/media_repository.py
Original file line number Diff line number Diff line change
Expand Up @@ -430,6 +430,7 @@ async def get_local_media(
media_id: str,
name: Optional[str],
max_timeout_ms: int,
allow_authenticated: bool = True,
federation: bool = False,
) -> None:
"""Responds to requests for local media, if exists, or returns 404.
Expand All @@ -442,6 +443,7 @@ async def get_local_media(
the filename in the Content-Disposition header of the response.
max_timeout_ms: the maximum number of milliseconds to wait for the
media to be uploaded.
allow_authenticated: whether media marked as authenticated may be served to this request
federation: whether the local media being fetched is for a federation request

Returns:
Expand All @@ -451,6 +453,10 @@ async def get_local_media(
if not media_info:
return

if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
if media_info.authenticated:
raise NotFoundError

self.mark_recently_accessed(None, media_id)

media_type = media_info.media_type
Expand Down Expand Up @@ -481,6 +487,7 @@ async def get_remote_media(
max_timeout_ms: int,
ip_address: str,
use_federation_endpoint: bool,
allow_authenticated: bool = True,
) -> None:
"""Respond to requests for remote media.

Expand All @@ -495,6 +502,8 @@ async def get_remote_media(
ip_address: the IP address of the requester
use_federation_endpoint: whether to request the remote media over the new
federation `/download` endpoint
allow_authenticated: whether media marked as authenticated may be served to this
request

Returns:
Resolves once a response has successfully been written to request
Expand Down Expand Up @@ -526,6 +535,7 @@ async def get_remote_media(
self.download_ratelimiter,
ip_address,
use_federation_endpoint,
allow_authenticated,
)

# We deliberately stream the file outside the lock
Expand All @@ -548,6 +558,7 @@ async def get_remote_media_info(
max_timeout_ms: int,
ip_address: str,
use_federation: bool,
allow_authenticated: bool,
) -> RemoteMedia:
"""Gets the media info associated with the remote file, downloading
if necessary.
Expand All @@ -560,6 +571,8 @@ async def get_remote_media_info(
ip_address: IP address of the requester
use_federation: if a download is necessary, whether to request the remote file
over the federation `/download` endpoint
allow_authenticated: whether media marked as authenticated may be served to this
request

Returns:
The media info of the file
Expand All @@ -581,6 +594,7 @@ async def get_remote_media_info(
self.download_ratelimiter,
ip_address,
use_federation,
allow_authenticated,
)

# Ensure we actually use the responder so that it releases resources
Expand All @@ -598,6 +612,7 @@ async def _get_remote_media_impl(
download_ratelimiter: Ratelimiter,
ip_address: str,
use_federation_endpoint: bool,
allow_authenticated: bool,
) -> Tuple[Optional[Responder], RemoteMedia]:
"""Looks for media in local cache, if not there then attempt to
download from remote server.
Expand All @@ -619,6 +634,15 @@ async def _get_remote_media_impl(
"""
media_info = await self.store.get_cached_remote_media(server_name, media_id)

if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
# if it isn't cached then don't fetch it
if not media_info:
raise NotFoundError()
# and if it's authenticated don't serve it
else:
if media_info.authenticated:
raise NotFoundError()
H-Shay marked this conversation as resolved.
Show resolved Hide resolved

# file_id is the ID we use to track the file locally. If we've already
# seen the file then reuse the existing ID, otherwise generate a new
# one.
Expand Down Expand Up @@ -792,6 +816,11 @@ async def _download_remote_file(

logger.info("Stored remote media in file %r", fname)

if self.hs.config.media.authenticate_new_media:
authenticated = True
else:
authenticated = False

return RemoteMedia(
media_origin=server_name,
media_id=media_id,
Expand All @@ -802,6 +831,7 @@ async def _download_remote_file(
filesystem_id=file_id,
last_access_ts=time_now_ms,
quarantined_by=None,
authenticated=authenticated,
)

async def _federation_download_remote_file(
Expand Down Expand Up @@ -915,6 +945,11 @@ async def _federation_download_remote_file(

logger.debug("Stored remote media in file %r", fname)

if self.hs.config.media.authenticate_new_media:
authenticated = True
else:
authenticated = False

return RemoteMedia(
media_origin=server_name,
media_id=media_id,
Expand All @@ -925,6 +960,7 @@ async def _federation_download_remote_file(
filesystem_id=file_id,
last_access_ts=time_now_ms,
quarantined_by=None,
authenticated=authenticated,
)

def _get_thumbnail_requirements(
Expand Down Expand Up @@ -1030,7 +1066,12 @@ async def generate_local_exact_thumbnail(
t_len = os.path.getsize(output_path)

await self.store.store_local_thumbnail(
media_id, t_width, t_height, t_type, t_method, t_len
media_id,
t_width,
t_height,
t_type,
t_method,
t_len,
)

return output_path
Expand Down
48 changes: 43 additions & 5 deletions synapse/media/thumbnailer.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@

from PIL import Image

from synapse.api.errors import Codes, SynapseError, cs_error
from synapse.api.errors import Codes, NotFoundError, SynapseError, cs_error
from synapse.config.repository import THUMBNAIL_SUPPORTED_MEDIA_FORMAT_MAP
from synapse.http.server import respond_with_json
from synapse.http.site import SynapseRequest
Expand Down Expand Up @@ -274,13 +274,20 @@ async def respond_local_thumbnail(
m_type: str,
max_timeout_ms: int,
for_federation: bool,
allow_authenticated: bool = True,
) -> None:
media_info = await self.media_repo.get_local_media_info(
request, media_id, max_timeout_ms
)
if not media_info:
return

# if the media the thumbnail is generated from is authenticated, don't serve the
# thumbnail over an unauthenticated endpoint
if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
if media_info.authenticated:
raise NotFoundError
H-Shay marked this conversation as resolved.
Show resolved Hide resolved

thumbnail_infos = await self.store.get_local_media_thumbnails(media_id)
await self._select_and_respond_with_thumbnail(
request,
Expand All @@ -307,14 +314,20 @@ async def select_or_generate_local_thumbnail(
desired_type: str,
max_timeout_ms: int,
for_federation: bool,
allow_authenticated: bool = True,
) -> None:
media_info = await self.media_repo.get_local_media_info(
request, media_id, max_timeout_ms
)

if not media_info:
return

# if the media the thumbnail is generated from is authenticated, don't serve the
# thumbnail over an unauthenticated endpoint
if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
if media_info.authenticated:
raise NotFoundError

thumbnail_infos = await self.store.get_local_media_thumbnails(media_id)
for info in thumbnail_infos:
t_w = info.width == desired_width
Expand Down Expand Up @@ -381,14 +394,27 @@ async def select_or_generate_remote_thumbnail(
max_timeout_ms: int,
ip_address: str,
use_federation: bool,
allow_authenticated: bool = True,
) -> None:
media_info = await self.media_repo.get_remote_media_info(
server_name, media_id, max_timeout_ms, ip_address, use_federation
server_name,
media_id,
max_timeout_ms,
ip_address,
use_federation,
allow_authenticated,
)
if not media_info:
respond_404(request)
return

# if the media the thumbnail is generated from is authenticated, don't serve the
# thumbnail over an unauthenticated endpoint
if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
if media_info.authenticated:
respond_404(request)
return

thumbnail_infos = await self.store.get_remote_media_thumbnails(
server_name, media_id
)
Expand Down Expand Up @@ -446,16 +472,28 @@ async def respond_remote_thumbnail(
max_timeout_ms: int,
ip_address: str,
use_federation: bool,
allow_authenticated: bool = True,
) -> None:
# TODO: Don't download the whole remote file
# We should proxy the thumbnail from the remote server instead of
# downloading the remote file and generating our own thumbnails.
media_info = await self.media_repo.get_remote_media_info(
server_name, media_id, max_timeout_ms, ip_address, use_federation
server_name,
media_id,
max_timeout_ms,
ip_address,
use_federation,
allow_authenticated,
)
if not media_info:
return

# if the media the thumbnail is generated from is authenticated, don't serve the
# thumbnail over an unauthenticated endpoint
if self.hs.config.media.enforce_authenticated_media and not allow_authenticated:
if media_info.authenticated:
raise NotFoundError

thumbnail_infos = await self.store.get_remote_media_thumbnails(
server_name, media_id
)
Expand Down Expand Up @@ -485,8 +523,8 @@ async def _select_and_respond_with_thumbnail(
file_id: str,
url_cache: bool,
for_federation: bool,
server_name: Optional[str] = None,
media_info: Optional[LocalMedia] = None,
server_name: Optional[str] = None,
) -> None:
"""
Respond to a request with an appropriate thumbnail from the previously generated thumbnails.
Expand Down
3 changes: 2 additions & 1 deletion synapse/rest/media/download_resource.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ async def on_GET(

if self._is_mine_server_name(server_name):
await self.media_repo.get_local_media(
request, media_id, file_name, max_timeout_ms
request, media_id, file_name, max_timeout_ms, allow_authenticated=False
)
else:
allow_remote = parse_boolean(request, "allow_remote", default=True)
Expand All @@ -106,4 +106,5 @@ async def on_GET(
max_timeout_ms,
ip_address,
False,
allow_authenticated=False,
)
5 changes: 4 additions & 1 deletion synapse/rest/media/thumbnail_resource.py
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ async def on_GET(
m_type,
max_timeout_ms,
False,
allow_authenticated=False,
)
else:
await self.thumbnail_provider.respond_local_thumbnail(
Expand All @@ -107,6 +108,7 @@ async def on_GET(
m_type,
max_timeout_ms,
False,
allow_authenticated=False,
)
self.media_repo.mark_recently_accessed(None, media_id)
else:
Expand Down Expand Up @@ -134,6 +136,7 @@ async def on_GET(
m_type,
max_timeout_ms,
ip_address,
False,
use_federation=False,
allow_authenticated=False,
)
self.media_repo.mark_recently_accessed(server_name, media_id)
Loading
Loading