Add Images #606

Open
wants to merge 154 commits into
base: master

addie9800
Collaborator

No description provided.

@addie9800 addie9800 marked this pull request as draft September 4, 2024 09:58
@MaxDall
Collaborator

MaxDall commented Sep 13, 2024

@addie9800 Thanks for pushing this idea 👍 I think this would be an awesome addition to Fundus! 🚀

It looks like you accidentally pushed a lot of unrelated files to the draft, making it harder for me to focus on the core idea. Would you mind removing those files so we can talk about the changes?

@addie9800
Collaborator Author

Ah well, sorry about that. I only intended to push one extra file :) I have cleaned up a bit now, in case you want to have a look, but I haven't reached a real milestone since you last had a peek. The issue I am struggling with most at the moment is the dynamic rescaling of images: some publishers change the path of the URL depending on the resolution required for the given screen, which makes it difficult to come up with a selector for the corresponding img element. If you have any ideas, shoot ;)
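One possible angle on the rescaling problem, sketched under a strong assumption: the publisher lists its resolution variants in a `srcset` attribute, so the candidates can be parsed and the widest one picked instead of matching a single URL path. The helper names `parse_srcset` and `widest_candidate` and the sample URLs are hypothetical and not part of Fundus, and the split on "comma followed by whitespace" would need more care for URLs that themselves contain such a sequence.

```python
import re
from typing import List, Optional, Tuple

# One srcset descriptor: a width such as "640w" or a pixel density such as "2x".
_DESCRIPTOR = re.compile(r"(\d+(?:\.\d+)?)([wx])")


def parse_srcset(srcset: str) -> List[Tuple[str, Optional[float], Optional[str]]]:
    """Split a srcset string into (url, value, unit) candidates.

    Assumption: candidates are separated by a comma followed by whitespace.
    """
    candidates = []
    for entry in re.split(r",\s+", srcset.strip()):
        parts = entry.split()
        if not parts:
            continue
        value, unit = None, None
        if len(parts) > 1 and (match := _DESCRIPTOR.fullmatch(parts[1])):
            value, unit = float(match.group(1)), match.group(2)
        candidates.append((parts[0], value, unit))
    return candidates


def widest_candidate(srcset: str) -> Optional[str]:
    """Return the URL with the largest descriptor value, or None if there is none."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[1] or 0.0)[0]


example = (
    "https://example.com/img/320/a.jpg 320w, "
    "https://example.com/img/1280/a.jpg 1280w, "
    "https://example.com/img/640/a.jpg 640w"
)
print(widest_candidate(example))
```

This only helps where the variants are declared in the markup; publishers that rewrite the URL purely in JavaScript would need a different approach.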

@addie9800
Collaborator Author

Update: As of now, I have verified the functionality for TheNamibian, DerStandard, ORF, NineNews and CBCNews. They can be used to get an impression of the intended functionality.

@addie9800 addie9800 marked this pull request as ready for review December 16, 2024 21:55
Comment on lines 640 to 652
# parse description
description = nodes_to_text(alt_selector(node))

# parse authors
authors = []
if isinstance(author_selector, Pattern):
    # author is part of the caption
    if caption and (match := re.search(author_selector, caption)):
        authors = [match.group("credits")]
        caption = re.sub(author_selector, "", caption).strip() or None
    elif description and (match := re.search(author_selector, description)):
        authors = [match.group("credits")]
        description = re.sub(author_selector, "", description).strip() or None
Collaborator

I would suggest leaving description as is and not applying filters here. If I remember correctly, we stated in the documentation that it's the parsed alt attribute of the image, so I would argue one would expect the raw data.
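One reading of this suggestion, as a minimal sketch: credits may still be read from the description as a fallback, but only the caption is stripped, so the description stays the raw alt text. The function name `extract_credits` and the sample pattern are hypothetical and do not mirror the actual Fundus code; the only thing taken from the snippet above is the named group `credits`.

```python
import re
from typing import List, Optional, Tuple


def extract_credits(
    caption: Optional[str],
    description: Optional[str],
    author_pattern: re.Pattern,
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Pull credits out of the caption; never modify the description."""
    authors: List[str] = []
    if caption and (match := author_pattern.search(caption)):
        authors = [match.group("credits")]
        # The caption is cleaned because the credit is not part of its text.
        caption = author_pattern.sub("", caption).strip() or None
    elif description and (match := author_pattern.search(description)):
        # The description is only read, so it remains the raw alt attribute.
        authors = [match.group("credits")]
    return caption, description, authors


# Hypothetical credit format, for illustration only.
pattern = re.compile(r"\s*\(Photo: (?P<credits>[^)]+)\)")
print(extract_credits("A harbour at dusk (Photo: Jane Doe)", "harbour", pattern))
```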

Comment on lines 545 to 552
try:
    width = float(source.get("width") or 0) or None
except ValueError:
    width = None
try:
    height = float(source.get("height") or 0) or None
except ValueError:
    height = None
Collaborator

Do you know which case leads to a ValueError? In general, I would like to avoid try ... except blocks, especially in frequently executed code.

Collaborator Author

@addie9800 addie9800 Dec 21, 2024

Yes, it happens for Taipei Times. They use '100%' as the width, which cannot be parsed as a float, and I think there is no good way to derive a proper width for the size parameter from this without actually inspecting the image. But I guess there is no need to rely on a try ... except. I have seen some approaches that rely on regex or string replacement followed by a call to isdigit().

Update: I checked Stack Overflow, where someone ran a benchmark, and I used their recommended solution as an alternative: https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-represents-a-number-float-or-int
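For illustration, one way to drop the try ... except is a regex check before the conversion. This is a sketch only and not necessarily the solution taken from the linked answer; `parse_dimension` is a hypothetical name. Note that it is stricter than `float()`: it accepts only plain non-negative decimals, so values such as '100%' or '1e3' yield None.

```python
import re
from typing import Optional

# Plain non-negative decimal, optionally surrounded by whitespace.
_DIMENSION = re.compile(r"\s*(\d+(?:\.\d+)?)\s*")


def parse_dimension(raw: Optional[str]) -> Optional[float]:
    """Return the attribute as a float, or None if it is missing, zero, or not numeric."""
    if not raw:
        return None
    match = _DIMENSION.fullmatch(raw)
    if match is None:
        return None
    # A zero dimension is treated as missing, matching the `or None` in the original snippet.
    return float(match.group(1)) or None


for value in ("640", "12.5", "100%", "0", None):
    print(repr(value), "->", parse_dimension(value))
```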

@addie9800 addie9800 requested a review from MaxDall December 21, 2024 19:21