Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: rotate images on source acquisition #51

Merged
merged 2 commits into from
Jul 22, 2024
Merged

fix: rotate images on source acquisition #51

merged 2 commits into from
Jul 22, 2024

Conversation

jrdh
Copy link
Member

@jrdh jrdh commented Jul 16, 2024

It would be good to have some more test cases for this, but from testing with the one example I have, this works fine.

Staging (running this branch): https://data-nlb-stg-01.nhm.ac.uk/media/7aeb4a93-26df-467d-91c7-555826c92885
Live (running v0.16.1): https://data.nhm.ac.uk/media/7aeb4a93-26df-467d-91c7-555826c92885

Essentially this change rotates the images on entry to the server (i.e. when we get the image from source) and strips any orientation value from the EXIF data. This puts the onus on the person putting the file into EMu and on EMu to do the right thing™️ and be consistent with any orientation tag that may or may not exist.

@jrdh jrdh requested a review from alycejenni July 16, 2024 18:31
If the first thing we do after lazily opening the image is call exif_transpose, we end up causing some issues with that functions logic. Specifically, TIFF images with a TIFF tag orientation will be double rotated because the exif_transpose function reads the TIFF tag orientation (in Pillow exif also includes these TIFF tags) before the image is loaded into memory, but once the image is loaded into memory the TIFF plugin will rotate the image based on the TIFF orientation tag before the exif_transpose function then also rotates it. To avoid this, do something else which will force the image to be loaded into memory before calling exif_transpose. Bit hacky but hey tis what it is.
@jrdh
Copy link
Member Author

jrdh commented Jul 21, 2024

OK! Who is ready for some fun explaining all this?

We have a couple of examples which show the issues we're trying to resolve with this rotation problem:

https://data.nhm.ac.uk/media/c11e58f9-e0ef-4d5e-8171-9cd51d345565 a lovely picture of a bee:

image

https://data.nhm.ac.uk/media/7aeb4a93-26df-467d-91c7-555826c92885 a lovely picture of a moth:

image

In the examples above, the moth is shown upside down and the bee is the correct way up. In this case we're using the EMu derivative images to serve up both of these images. The bee has no EXIF orientation data but the moth does, we just ignore it during source aquisition. This PR is primarily attempting to solve the problem with the upside down moth, however, in doing so we needed (or at least thought we did) to keep the bee the right way up.

The change in this PR, both commits in fact, fix the issue with the moth picture. However, they break any images created based on the original image of the bee (not ones from the derivatives, they are fine). The TL;DR explanation of this is that the bee's original TIFF image has an incorrectly applied TIFF orientation tag. You can actually see this currently on live now (21/07/24) where https://data.nhm.ac.uk/media/c11e58f9-e0ef-4d5e-8171-9cd51d345565/full/max/0/default.jpg looks like this:

image

The details:

The JPEG derivative images created by EMu all have no EXIF orientation tag present. The current code and the new code both just do nothing to rotate the images and you end up with a correclty orientated image.

The original TIFF image has a TIFF orientation tag (not EXIF) which instructs any code reading the raw image data in the file counter-clockwise 270°. This causes the image to be presented on its side (as in the image above) as the raw image data is actually the correct way up. This is why I believe the problem with this bee original image is actually with the original TIFF not our processing of it.

Now to explain why we have a second commit in this PR with a sutle change where 2 lines are reordered. I've explained this a bit in the commit message but didn't want to overload that with information, but I'll do a more thorough explanation here. We use Pillow to load images from their source and convert them to JPEG before they enter the IIIF processing pipeline. Pillow uses lazy loading and won't actually load the image's data into memory until you try and do something with it (like rotate it, or manipulate it in some way).

In the first commit on this PR, we remove the EXIF stripping stuff and insert a call to the exif_transpose function instead. This seems reasonable, EXIF can appear in TIFFs and any other image format so there was an oversight there, and actually dealing with the EXIF orientation tag seems like a reasonable thing to do. However, when running the original TIFF bee image through this commit's code, we end up with this:

image

That bee is upside down 🙁. How has this happened? Turns out the exif_transpose function is double rotating the image. The TIFF orientation tag is meant to be applied (according to Pillow's devs at least) on image load, whereas EXIF tags are applied optionally later if needed. I believe this is a difference in image format, where the format of the TIFF data is tied closely to the orientation of the image, whereas the EXIF orientation tag is just a presentation hint. Anyway, this difference means that when the TIFF data is actually loaded into memory during the exif_transpose function, the TIFF image plugin in Pillow rotates the image after reading it. Unfortunately, this happens just before the exif_transpose function attempts to perform its own rotation. This is because at the start of the exif_transpose function the EXIF data is read. This includes the TIFF tag data, and therefore the TIFF orientation tag value, but does not cause the image to be loaded into memory by the TIFF image plugin. The exif_transpose function then determines that it needs to rotate the image because it finds the TIFF orientation tag in its image_exif variable, and then calls the transpose method on the image. This causes the image to be loaded before the transpose is applied, resulting in the TIFF image plugin rotating the image. Then the transpose method runs through and rotates the image again, hence you end up with a counter-clockwise 540° rotation and an upside down bee.

The fix for this is in the second commit where we call the convert method on the image before we call exif_transpose to ensure the image has been loaded before exif_transpose attempts to read its EXIF data. This solves the problem because by the time the exif_transpose function extracts the EXIF data, the TIFF orientation tag is gone and therefore it just does nothing and returns the image as is (I haven't pinned down exactly where the TIFF orientation tag is removed, but it makes sense for it to be removed after the image is loaded given the fact it is automatically applied on image load.

So overall, this PR fixes the moth and the bee original image is just incorrectly rotated at source, so we need to get a curator to reupload the image to fix that. I may raise this as an issue on Pillow but it may just be expected behaviour, I'm no TIFF, TIFF tag, or EXIF expert so I can't say for sure.

@jrdh jrdh merged commit 6e169dc into dev Jul 22, 2024
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

1 participant