Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for host urls to include paths #38

Merged
merged 1 commit into from
May 17, 2024
Merged

Allow for host urls to include paths #38

merged 1 commit into from
May 17, 2024

Conversation

Midnighit
Copy link

@Midnighit Midnighit commented May 16, 2024

The urllib.parse.urljoin function will discard the path of the url if the joined path is absolute (i.e. starts with a slash).
E.g.

from urllib.parse import urljoin

host = "https://cdn.example.com/path/to/static/"
url = "/css/image.png"
print(urljoin(host, url))
# output: https://cdn.example.com/css/image.png

This poses a problem because flask.url_for will always return absolute paths and some cdn hosts like Google mandate a path to use their Cloud storage buckets as cdn. My solution suggests removing trailing and leading slashes from host and url respectively and then concatenating them with a slash.
Successfully tested with the following hosts: None, "", "https://cdn.example.com", "https://cdn.example.com/", "https://cdn.example.com/path", "https://cdn.example.com/path/", "https://cdn.example.com/path/to/static", "https://cdn.example.com/path/to/static/".

This function will join the host URL with another URL.
It differs from urljoin in that it will keep the path of the host URL
even if url is an absolute path.
:param url: the URL returned by url_for
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change this to: :param url: The URL returned by url_for

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to url_path as suggested in #38 (comment).

)
def _custom_url_join(self, url):
"""
This function will join the host URL with another URL.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change this entire description to:

Join the host URL with another URL path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return urljoin(
self.host_url, "/".join([self.static_url_path, filename])
)
def _custom_url_join(self, url):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please rename the url argument to url_path?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contradicts your suggestion in #38 (comment). Should I rename it to url_for or url_path?

if self.host_url is None:
return url

return self.host_url.rstrip("/") + '/' + url.lstrip("/")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than concatenating the string with + what do you think about "/".join([self.host_url.rstrip("/"), url.lstrip("/")])?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either work. Your suggestion is probably more "modern" for lack of a better term. I opted for the + way of concatenating because it looked more intuitive and obvious what's returned to me. Another alternative would be using an f-string f'{self.host_url.rstrip("/")}/{url.lstrip("/")}'. There's also the option to use the removesuffix("/") and removeprefix("/") instead of rstrip("/") and lstrip("/") and while I think those methods would be more fitting, they're relatively new and require Python 3.9 so I opted for the older version to achieve a better compatibility.
In any case, it's your repository so while I respect that you formulated it as a question, I'll ultimately refer to your choice.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm dumb. Let's go with an f-string. Given Python 3.9 isn't end of life until October 2025 let's avoid the new functions for now.

@nickjj
Copy link
Owner

nickjj commented May 16, 2024

Thanks for taking this on btw!

Just so we have a reference, this will fix #21.

@Midnighit
Copy link
Author

Midnighit commented May 17, 2024

Thanks for taking this on btw!

Just so we have a reference, this will fix #21.

You're very welcome. 🙂

It should, yes. The criticism I can see is that, like @ybenitezf mentions in #21 (comment), technically speaking, the FLASK_STATIC_DIGEST_HOST_URL is not a host but a scheme plus a host and sometimes a path as far as urllib and the RFC3986 (which urllib references) are concerned. As far as I understand it, that's the reason for the unintuitive behavior of the urllib.parse.urljoin method, where it won't use the path of the base argument if the url argument, that's joined with it, is given as an absolute path (i.e. with a leading slash).

This could possibly be fixed by introducing new variables for the different parts of the url (scheme, host, path etc.) and then combining those somehow, but all my attempts at making better use of base Python functionality to not only do the combining but also verify that the resulting url is valid, have yielded no results.

In the end, I came to the conclusion that it's not worth complicating the process for the sake of semantics when the current solution works with some minor adjustments.

I tested the changes against the list you posted in #21 (comment). I'll leave my results there.
Almost made a stupid post due to misunderstanding how flask.url_for works. Bottom line: the proposed changes work as expected provided flask.url_for is able to build valid urls from the given endpoint and filename.

@nickjj
Copy link
Owner

nickjj commented May 17, 2024

FLASK_STATIC_DIGEST_HOST_URL is not a host but a scheme plus a host and sometimes a path

Yep, I that's why I added a _URL suffix to the variable name which encompasses all of the components. I think the problem here is also using HOST in the name because now it implies just the hostname and suddenly "HOST_URL" is kind of confusing. I remember not wanting to go with CDN_URL because it's not guaranteed you'd be using a CDN to leverage this feature.

With that said, given we always end up calling Flask's url_for with or without FLASK_STATIC_DIGEST_HOST_URL being set we can depend on Flask to handle setting https:// or not. There's a number of ways to do that at https://stackoverflow.com/questions/14810795/flask-url-for-generating-http-url-instead-of-https, if this approach were taken then this extension could remain the same except for a documentation change that mentions not to include the protocol but then having _URL might be confusing because folks may add the protocol.

For now, I do agree leaving things as they are is a good first move. This at least fixes the issue of wanting to set a path as part of your URL.

@nickjj
Copy link
Owner

nickjj commented May 17, 2024

By the way, the failing pipeline isn't due to this PR. It's related to the CI environment using Python 3.8.

Can you roll up all of your commits into a single commit message of: "Fix host url to support a URL path" and force push it?

Once that's complete I'll merge this and fix the CI issue afterwards.

@Midnighit
Copy link
Author

Done, please let me know if you need anything else from me.

@nickjj
Copy link
Owner

nickjj commented May 17, 2024

That's it, thanks for the contribution! I just pushed v0.4.1 which has this fix.

@nickjj nickjj merged commit 3149838 into nickjj:master May 17, 2024
1 check failed
@Midnighit
Copy link
Author

Ugh, I just realized I missed changing the name of the second url_for to url_path in /flask_static_digest/__init__.py:60 🙁
Sorry!

@nickjj
Copy link
Owner

nickjj commented May 17, 2024

I think it works as is, url_for in this case is the Flask function. That's the argument being passed in as url_path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants