Check if attachment is actually(!) referred to #9585

pabzm · 2024-08-15T13:39:26Z

This finishes what #9472 intended to do – but didn't actually do, as I found out.

The code now checks, for each non-text mime-part in a multipart-part, whether its Content-ID or Content-Location is (probably) used in a sibling HTML-part. Only if that matches is the respective mime-part considered an "inline" attachment (one that won't show up as downloadable or below the message content).
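To illustrate the kind of check described above, here is a minimal sketch. It is not the code from this PR: the function name `part_is_referenced` and its parameters are hypothetical, and it assumes a plain substring search is good enough for a "probably used" decision.

```php
<?php

/**
 * Hypothetical helper (not Roundcube's actual code): decide whether a
 * mime-part is probably referenced from the given HTML content.
 */
function part_is_referenced(string $html, ?string $content_id, ?string $content_location): bool
{
    if ($content_id !== null && $content_id !== '') {
        // The header value is written as <id>; HTML refers to it as cid:id.
        $cid = 'cid:' . trim($content_id, '<> ');
        if (stripos($html, $cid) !== false) {
            return true;
        }
    }

    if ($content_location !== null && $content_location !== '') {
        // Assumption: the Content-Location value appears verbatim in the HTML.
        if (strpos($html, $content_location) !== false) {
            return true;
        }
    }

    return false;
}

$html = '<p>Hello</p><img src="cid:logo@example.org">';

var_dump(part_is_referenced($html, '<logo@example.org>', null));  // referenced via cid:
var_dump(part_is_referenced($html, '<other@example.org>', null)); // not referenced
```

A substring match can produce false positives (for example when the identifier only appears in visible text), which is why the description above says "probably".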

The second commit in this PR makes sure that for all image-parts, all mime-part-headers are loaded from the server, in order to actually get hold of the Content-Location-header (which isn't always fetched in the first place). It is limited to image-parts because those are the ones most likely to carry a Content-Location-header, and I assumed that we shouldn't load the headers for every mime-part, so this seems like a workable real-world distinction to me.

One could probably change how the BODYSTRUCTURE response is fetched and parsed to ensure a Content-Location-header is always fetched in the first place, but I didn't dare to touch that code.

This fix closes #9565

alecpl · 2024-08-17T06:59:46Z

program/lib/Roundcube/rcube_message.php

+                        }
+                        // Note: There might be more than one HTML part, thus
+                        // we use a callback and concatenate the results.
+                        $html_content = implode('', array_map(function ($part) { return $this->get_part_body($part->mime_id); }, $html_parts));


There's indeed my @todo comment 70 lines below to get the HTML bodies and check for references. However, this has a performance impact that might be noticeable for a message with many images and big HTML content. So far I have considered fetching part headers acceptable, but fetching part bodies is another story.

We need some more considerations. For example, when loading an image attachment (rcmail_attachment_handler), maybe we could fetch the image without parsing the message structure and loading the HTML parts again just to get the attachment part data.

Maybe checking for references should be done outside of rcube_message, so that rcube_message stays lightweight and does not slow down all the cases where we deal with the message but do not really need the full structure information, e.g. when viewing source or downloading the message, or when dealing with a single attachment.

Or maybe we need to use a cache. It might not help much with parallel requests (loading image attachments), though. And caches are usually optional.

I see your point and am working on it.

FYI #9565 (comment)

I changed the approach so that the reference checking is only done on demand. I'd like to test this, but I have a hard time figuring out how, since the entire class essentially depends on the response to an IMAP BODYSTRUCTURE command, which I'd like to avoid mocking.
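As a rough sketch of what "on demand" can mean here, the example below fetches the HTML only when a reference check is first requested and then reuses it. The class `LazyReferenceChecker`, its callback parameter, and the `fetchCount` counter are all hypothetical and are not part of rcube_message. Passing the fetch step in as a callback is just one possible way to exercise such logic without an IMAP response.

```php
<?php

/**
 * Hypothetical wrapper (not part of Roundcube): the HTML content is fetched
 * through a callback, and only when a reference check is first requested.
 */
class LazyReferenceChecker
{
    /** @var callable Returns the concatenated HTML content as a string */
    private $fetchHtml;

    /** @var string|null Fetched HTML, null until first use */
    private $html = null;

    /** @var int Number of times the callback was invoked */
    public $fetchCount = 0;

    public function __construct(callable $fetchHtml)
    {
        $this->fetchHtml = $fetchHtml;
    }

    public function isReferenced(string $contentId): bool
    {
        if ($this->html === null) {
            // First check: fetch the HTML once and keep it for later calls.
            $this->html = (string) call_user_func($this->fetchHtml);
            $this->fetchCount++;
        }

        return stripos($this->html, 'cid:' . trim($contentId, '<> ')) !== false;
    }
}

$checker = new LazyReferenceChecker(function () {
    // Stand-in for fetching the HTML part bodies from the server.
    return '<img src="cid:img1@example.org">';
});

var_dump($checker->isReferenced('<img1@example.org>'));
var_dump($checker->isReferenced('<img2@example.org>'));
var_dump($checker->fetchCount); // the callback ran only once
```

If no check is ever requested, the callback never runs, so code paths that only need the message source or a single attachment would not pay for fetching HTML bodies.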

If there's no reference to it in a sibling HTML part then we handle it as a classic attachment (which is shown as downloadable).

Previously all headers were only fetched for message/rfc822, or if the Content-Type's "name" parameter was set, or if a Content-ID was set. The RFC requires neither the "name" parameter nor a Content-ID for using Content-Location, though, so we shouldn't depend on those. Instead, all headers are now also fetched if the main part of the Content-Type is "image", to catch more cases.
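The condition described above could be expressed roughly as follows. This is a sketch only: the function `needs_all_headers` and its parameter names are made up for illustration and do not mirror the actual rcube_message code.

```php
<?php

/**
 * Hypothetical predicate (names are illustrative, not Roundcube's API):
 * should all headers of this mime-part be fetched from the server?
 */
function needs_all_headers(string $ctype_primary, string $ctype_secondary, array $ctype_params, ?string $content_id): bool
{
    // Previous conditions: message/rfc822, a "name" parameter, or a Content-ID.
    if ($ctype_primary === 'message' && $ctype_secondary === 'rfc822') {
        return true;
    }
    if (!empty($ctype_params['name'])) {
        return true;
    }
    if (!empty($content_id)) {
        return true;
    }

    // New condition: any image part, so that a Content-Location header is seen.
    return $ctype_primary === 'image';
}

var_dump(needs_all_headers('image', 'png', [], null));       // image without name or Content-ID
var_dump(needs_all_headers('application', 'pdf', [], null)); // unaffected by the new condition
```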

pabzm mentioned this pull request Aug 15, 2024

Attached picture is not shown if text-part is present in multipart/mixed and image-part as Content-ID #9565

Open

2 tasks

pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from d9fe09b to 73cbc95 Compare August 15, 2024 13:41

pabzm changed the title ~~Check if attachment is actually referred to~~ Check if attachment is actually(!) referred to Aug 15, 2024

pabzm self-assigned this Aug 15, 2024

pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 73cbc95 to 636dcdf Compare August 15, 2024 13:45

pabzm requested a review from alecpl August 15, 2024 13:52

alecpl reviewed Aug 17, 2024

pabzm added 2 commits November 11, 2024 12:18

Check if "inline" msg part is actually referred to

fa1ddc6

If there's no reference to it in a sibling HTML part then we handle it as a classic attachment (which is shown as downloadable).

pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 636dcdf to 1373f57 Compare November 11, 2024 11:23

pabzm added 2 commits November 12, 2024 11:08

Parse HTML for references only on demand

bcd60d1

Typos and comment formatting

8f12091

pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 1373f57 to 8f12091 Compare November 12, 2024 10:08

pabzm commented Aug 15, 2024

alecpl Aug 17, 2024

pabzm Sep 6, 2024

olegStreejak Sep 6, 2024

pabzm Nov 12, 2024
