-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: PDF 2.0 header detection #964
base: dev/1.33
Are you sure you want to change the base?
Conversation
- added second signature for PDF-2 style header (this can be improved); - added support for PDF-2 headers to `PdfHeader`; and - more work required on valid major/minor version combinations.
- refactored `PDF-hul` PDF Header parsing, particularly the version parsing; - no more `null` return handling, specific exceptions are thrown instead; - these are caught and handled in the `PdfModule` class; - new `PdfVersion` class to handle version parsing and comparison; - PDF version detection now handled by pattern match, with better reporting of errors; - offsets added to `PdfException`s that were in the scope of refactoring; - added new `%PDF-2.` signature to the `PDF-hul` module; - new `edu.harvard.hul.ois.jhove.messages.JhoveMessages.getMessageInstance` convenience method to clone message with sub-message; - added two new test cases to `PDF-hul` regression tests for header version detection; - added necessary `sed` insertions and file copies to handle changed integration test results to `jhove-bbt/scripts/create-1.33-target.sh`; and - refactored "not found" reporting for modules and handlers in jhove-apps `edu.harvard.hul.ois.jhove.Jhove` class. Error messages changed to give more detailed information: - `PDF-HUL-137`: - `PDF-HUL-148`: No longer mentions specific minor versions but has a more detailed sub-message - `PDF-HUL-155`: Reports the header that was too short New Error Messages: - `PDF-HUL-160`: IOException reading PDF header - `PDF-HUL-161`: PDF major version number should be 1.x or 2.x - `PDF-HUL-162`: Malformed PDF version number
- moved field declarations.
} catch (final IOException ee) { | ||
throw new PdfMalformedException( | ||
JhoveMessages.getMessageInstance(MessageConstants.PDF_HUL_160, ee.getMessage()), | ||
offset); // PDF-HUL-137 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small detail // PDF-HUL-137 should be // PDF-HUL-160
throw new PdfMalformedException( | ||
JhoveMessages.getMessageInstance(MessageConstants.PDF_HUL_160, ee.getMessage()), | ||
offset); // PDF-HUL-137 | ||
} catch (PdfException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this swallow the PDF_HUL_33 and PDF_HUL_34 errors in the parser.getNext?
* @param versionString | ||
* the version number from the PDF Header, should be of | ||
* form | ||
* <code>1.x</code> where x should be of the range 0-7. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should also be changed to reflect PDF 2.0
PDF-HUL-149 = Invalid indirect destination - referenced object ''{0}'' cannot be found | ||
PDF-HUL-150 = Cross-reference stream must be a stream | ||
PDF-HUL-151 = Unexpected error occurred while attempting to read the cross-reference table | ||
PDF-HUL-152 = Encrypt dictionary has no OE key or it has a null value | ||
PDF-HUL-153 = Encrypt dictionary has no UE key or it has a null value | ||
PDF-HUL-154 = Unknown Developer Prefix: | ||
PDF-HUL-155 = Error parsing mandatory version number from PDF header. | ||
PDF-HUL-155 = Error parsing mandatory version number, PDF header too short: "{0}" | ||
PDF-HUL-156 = Extension can't be defined as an indirect reference | ||
PDF-HUL-157 = Unexpected exception {0} | ||
PDF-HUL-158 = Unexpected exception {0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm the wiki needs to be updated :-) #967
Great update, thanks @carlwilson |
@samalloing I'll give you a rundown as to how I've changed the exception handling and the effects inline above before merging this. |
PDF-hul
PDF Header parsing, particularly the version parsing;null
return handling, specific exceptions are thrown instead;PdfModule
class;PdfVersion
class to handle version parsing and comparison;PdfException
s that were in the scope of refactoring;%PDF-2.
signature to thePDF-hul
module;edu.harvard.hul.ois.jhove.messages.JhoveMessages.getMessageInstance
convenience method to clone message with sub-message;PDF-hul
regression tests for header version detection;sed
insertions and file copies to handle changed integration test results tojhove-bbt/scripts/create-1.33-target.sh
; andedu.harvard.hul.ois.jhove.Jhove
class.Error messages changed to give more detailed information:
PDF-HUL-137
:PDF-HUL-148
: No longer mentions specific minor versions but has a more detailed sub-messagePDF-HUL-155
: Reports the header that was too shortNew Error Messages:
PDF-HUL-160
: IOException reading PDF headerPDF-HUL-161
: PDF major version number should be 1.x or 2.xPDF-HUL-162
: Malformed PDF version numberCloses #470