-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atom: use correct xml:base for decoded elements #222
Conversation
In order to keep tracking xml:base correctly, the goxpp's `DecodeElement` pops the BaseStack if the start element added a base (if any). That means the atom parser needs keep track of the base *before* calling `DecodeElement` to use for resolving relative URLs within the decoded element. Without this fix, elements with xml:base attributes will be erroneously resolved with the parent xml:base.
Maybe this makes sense @cristoper ? You are suggesting that in
Then in Atom Parser, something like this?:
|
Yes, that's approximately what I had in mind. I just pushed an update to the PR which does similar but copies However, if you'd prefer not duplicating code from |
This provides an equivalent fix that doesn't do any inelegant swapping out of the BaseStack. It also doesn't change `goxpp`'s public API by essentially copying `XmlBaseResolveUrl` to `gofeed`.
d6a83eb
to
25bb801
Compare
PR mmcdole#220 introduced a failing test for detecting images in the "content" element. It should instead be testing the "content:encoded" element. But that uncovered an issue with how extensions were being detected (the "content" namespace was being detected as an extension namespace). As a more robust way of checking for the "content" namespace, this PR exposes `shared.PrefixForNamspace()` as a public function so it can be used in the rss parser. This provides a solution to PR mmcdole#211 (and includes @JLugagne's test case from that PR). Once the fixes to xml:base handling in mmcdole#222 are merged, this should fix the remaining failing test reported in mmcdole#210.
PR mmcdole#220 introduced a failing test for detecting images in the "content" element. It should instead be testing the "content:encoded" element. But that uncovered an issue with how extensions were being detected (the "content" namespace was being detected as an extension namespace). As a more robust way of checking for the "content" namespace, this PR exposes `shared.PrefixForNamspace()` as a public function so it can be used in the rss parser. This provides a solution to PR mmcdole#211 (and includes a test based on @JLugagne's test case from that PR). Once the fixes to xml:base handling in mmcdole#222 are merged, this should fix the remaining failing test reported in mmcdole#210.
PR mmcdole#220 introduced a failing test for detecting images in the "content" element. It should instead be testing the "content:encoded" element. But that uncovered an issue with how extensions were being detected (the "content" namespace was being detected as an extension namespace). As a more robust way of checking for the "content" namespace, this PR exposes `shared.PrefixForNamspace()` as a public function so it can be used in the rss parser. This should also fix PR mmcdole#211 (and includes @JLugagne's test case from that PR). Once the fixes to xml:base handling in mmcdole#222 are merged, this should fix the remaining failing test reported in mmcdole#210.
PR #220 introduced a failing test for detecting images in the "content" element. It should instead be testing the "content:encoded" element. But that uncovered an issue with how extensions were being detected (the "content" namespace was being detected as an extension namespace). As a more robust way of checking for the "content" namespace, this PR exposes `shared.PrefixForNamspace()` as a public function so it can be used in the rss parser. This should also fix PR #211 (and includes @JLugagne's test case from that PR). Once the fixes to xml:base handling in #222 are merged, this should fix the remaining failing test reported in #210.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clean, love this approach.
The issue is that goxpp's DecodeElement() pop's the BaseStack after unmarshaling an element -- as it should to keep the BaseStack in sync with the current document scope. But that means the atom parser needs to keep a reference to the current base before calling DecodeElement() so that it can correctly resolve URLs.
Without this fix, elements with xml:base attributes will be erroneously resolved with the parent xml:base.
This fix is a little awkward with all of its saving and restoring of BaseStacks. A simpler solution would be to simply get the
xml:base
attribute from the decoded element and resolve against that. However, that would require changes togoxpp
: either changeXmlBaseResolveUrl()
to accept a string rather than acting as a method, or maybe adding a separateResolveUrl()
function that takes a base and a relative URL thatgofeed
could use.