Fixing pymupdf.mupdf.FzErrorFormat crash by recasting as an `ImpossibleParsingError` (#474)
jamesbraza authored Sep 24, 2024
1 parent 148e662 commit 9c669e9
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion paperqa/readers.py
@@ -21,7 +21,14 @@ def parse_pdf_to_pages(path: Path) -> ParsedText:
         total_length = 0
 
         for i in range(file.page_count):
-            page = file.load_page(i)
+            try:
+                page = file.load_page(i)
+            except pymupdf.mupdf.FzErrorFormat as exc:
+                raise ImpossibleParsingError(
+                    f"Page loading via {pymupdf.__name__} failed on page {i} of"
+                    f" {file.page_count} for the PDF at path {path}, likely this PDF"
+                    " file is corrupt"
+                ) from exc
             pages[str(i + 1)] = page.get_text("text", sort=True)
             total_length += len(pages[str(i + 1)])
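
The recasting pattern in this hunk can be tried without a corrupt PDF on hand. The following is a minimal sketch, not the paperqa implementation: `FormatError`, `load_page`, and `parse_pages` are hypothetical stand-ins for `pymupdf.mupdf.FzErrorFormat`, `file.load_page`, and `parse_pdf_to_pages`, and the local `ImpossibleParsingError` class only mimics the one named in the commit. It shows that `raise ... from exc` keeps the low-level error reachable through `__cause__`.

```python
class FormatError(Exception):
    """Hypothetical stand-in for pymupdf.mupdf.FzErrorFormat."""


class ImpossibleParsingError(Exception):
    """Local stand-in for the error type named in this commit."""


def load_page(index: int) -> str:
    """Hypothetical page loader that fails on index 1, as a corrupt PDF might."""
    if index == 1:
        raise FormatError("bad page tree")
    return f"text of page {index}"


def parse_pages(page_count: int) -> dict[str, str]:
    """Collect page text keyed by 1-based page number, recasting load failures."""
    pages: dict[str, str] = {}
    for i in range(page_count):
        try:
            page = load_page(i)
        except FormatError as exc:
            # Recast the low-level error; `from exc` stores it as __cause__.
            raise ImpossibleParsingError(
                f"Page loading failed on page {i} of {page_count}"
            ) from exc
        pages[str(i + 1)] = page
    return pages


if __name__ == "__main__":
    try:
        parse_pages(3)
    except ImpossibleParsingError as err:
        # The original exception is still available for logging or debugging.
        print(f"{err} (caused by {type(err.__cause__).__name__})")
```

A caller that already handles `ImpossibleParsingError` for other unparseable inputs would then cover this failure as well, without needing to know about the PDF backend's own exception types.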
