-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
epub: detect scripted content and improve mathml, svg detection #1659
base: master
Are you sure you want to change the base?
Conversation
push @properties, 'svg' if $doc->findnode('//*[local-name() = "svg"]'); | ||
push @properties, 'scripted' if $doc->findnode($xpath_scripted . ' | ' . $xpath_forms); | ||
push @properties, 'mathml' if $doc->findnode($xpath_mathml); | ||
push @properties, 'svg' if $doc->findnode($xpath_svg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes assume we'll only ever have xhtml in epub. It's at least something to watch out for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is easy to explain: the EPUB specs (both 2 and 3 as far as I know) require HTML5 in XML serialisation, with the appropriate namespaces. So this code should be ok, and more accurate, actually.
my $properties = join(" ", @properties); | ||
$file_item->setAttribute('properties', $properties) if $properties; } } | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is basically what we'd want to determine whether to use img or object, as discussed in #1444, right? That suggests that it might be better higher up the chain, such as Post::Graphics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The img/object distinction is a bit more subtle because the <img>
tag disables every external resource as well. I'd say this code is specific to EPUB (the spec just wants to know if there are scripts and MathML content*).
* Actually, one should also declare external resources, e.g. web fonts not embedded in the file. This requires parsing the CSS content of the image, so it seems quite hard to do reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the harm in simply assuming that any svg image contains mathml and scripts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per spec, the mathml
and scripted
properties "MUST be set if and only if the criterion specified in the description is met", so the output would be invalid. To my memory, epubcheck was flagging the properties as errors if no mathml/scripts were in the file.
lib/LaTeXML/Post/Manifest/Epub.pm
Outdated
@@ -15,6 +15,25 @@ use warnings; | |||
use File::Find qw(find); | |||
use URI::file; | |||
|
|||
# HTML5 namespaces | |||
our $XPATH = $LaTeXML::Post::Document::XPATH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timing of this is off; it will set $XPATH
to the current value when LaTeXML::Post::Manifest::Epub
is loaded, rather than when the document is being packaged. That will probably work for a one-off conversion because LaTeXML has defered but will not (necessarily) be correct in more general (eg. daemon) contexts.
The problem is that we haven't got a clear clean API for managing namespaces so that we can declare these mappings once, and use them consistently throughout conversion & postprocessing (ideally, they should only make it into the otuput document if they're needed).
Short of that, it would be best to move these registerNS
into one of the Epub
methods; probably initialize
.
I'm trying to get some of these PR's moving again. @dginev do you have any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved most of the $XPATH
code to initialize
. Once you reach initialize
, XPATH
should already exists, right?
819e078
to
7d877fd
Compare
7d877fd
to
440ddf4
Compare
Updated to work with latest Epub.pm updates on master. |
There are situations where an EPUB might contain scripted content (e.g. an interactive SVG, or embedding MathJax). Scripted content must be reported in the manifest with the
scripted
property.The PR cleans up the existing code setting the
mathml
andsvg
properties, plus it sets thescripted
property when needed. This also means parsing the SVG files to find if they contain scripts or MathML content.(Unfortunately this causes a trivial conflict with #1643.)