Broken XML-Result #880
Comments
I believe this is the same issue as #877. I found it too and submitted a pull request that is awaiting review: #878
I'm not sure, but it seems like a lot more is broken in the output than just these 0xFFFE characters. Example: Figure 3. Mean NH2OH-to-N2O conversion ratios (RNH2OH-to-N2O) in artificial soils at different pH and Result XML (look at the result XML part of my opening post) contains a value that mixes parts of these two strings (why?) with hieroglyphs (why?) in between:
Ah, good point, I see what you mean. So the pull request repairs the symptom but not the cause of this problem. I looked at your PDF in a text editor: lines 189418 to 189425 contain the objs with the beginning and end of the text you see in the JHOVE output. It looks to me like it is reading in (probably 16-bit) character by character on line 189419, but then fails to handle the end of the line correctly. This garbles things, so it misses the endobj and doesn't correct itself until 5 lines later (possibly by reversing what happened, when it reaches the end of line 189423). It then picks up from "for organic matter..." at the start of line 189424. In that case, this could well be related to this legacy issue: #277
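To illustrate why a 16-bit misread would produce "hieroglyphs": the sketch below decodes plain ASCII bytes as if they were UTF-16BE, so each pair of Latin letters collapses into one code unit that usually lands in the CJK range. This is only a demonstration of the general effect under that assumption, not JHOVE's actual parsing code; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class Utf16MisreadDemo {
    // Hypothetical helper: take ASCII text and decode its bytes as UTF-16BE,
    // which pairs up adjacent bytes into single 16-bit code units.
    static String misreadAsUtf16(String ascii) {
        byte[] raw = ascii.getBytes(StandardCharsets.US_ASCII);
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String garbled = misreadAsUtf16("for organic matter");
        // Print each resulting code unit so the shift into the CJK range is visible.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
    }
}
```

The first two bytes of "for " are 0x66 and 0x6F, which together form U+666F, a CJK ideograph. If the reader falls out of step by a single byte, every later pair is shifted as well, which would fit output that stays garbled for several lines before recovering.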
Hi both, I'm back from the summer vacation now, and we'll take a look at this issue and review the PR for this year's release candidate.
Hello dear developers,
I have found a PDF that JHOVE does not process correctly. The generated result XML contains characters that are not valid in an XML document, so the XML can no longer be used by downstream systems.
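For reference, a minimal sketch of how such characters can be detected, assuming the output is meant to be XML 1.0. The ranges follow the `Char` production of the XML 1.0 specification, which excludes U+FFFE, U+FFFF, most control characters and unpaired surrogates. The class and method names are made up for illustration and are not part of JHOVE.

```java
class XmlCharCheck {
    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index (in UTF-16 code units) of the first code point
    // that XML 1.0 does not allow, or -1 if the whole string is fine.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample value with a trailing U+FFFE, like the characters seen in the output.
        String sample = "Figure 3\uFFFE";
        System.out.println(firstInvalidIndex(sample));
    }
}
```

Running a check like this over each text value before it is written would show where the invalid characters enter the output, although it would not explain why the text is garbled in the first place.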
Tested Version: release="1.26.1" date="2022-07-14"
Example PDF: https://epflicht.ulb.uni-bonn.de/download/pdf/363239?originalFilename=true
Reproduction command: /bin/sh jhove -c conf/jhove.conf -h XML -m PDF-hul Energy_Environment_390.pdf -o Energy_Environment_390-out.xml
Result: