data_utils.py has a bug in extract_pdf_content #1034

Closed
cynthiajiangatl opened this issue Aug 1, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@cynthiajiangatl

Describe the bug
When a PDF document contains an empty table and the layout model is used, extract_pdf_content fails with a "list index out of range" error.

Expected behavior
Empty tables should be skipped.

Code fix needed
Add a try/except and skip the empty table.

for table in form_recognizer_results.tables:
    try:
        first_span = table.spans[0]
    except IndexError:
        # An empty table has no spans; skip it.
        continue
    table_offset = first_span.offset
    table_length = first_span.length
    if page_offset <= table_offset and table_offset + table_length < page_offset + page_length:
        tables_on_page.append(table)
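
For anyone who wants to reproduce the behavior without a PDF, here is a minimal, self-contained sketch of the same skip logic. The `Span` and `Table` classes and the `tables_within_page` helper are hypothetical stand-ins written for illustration, not the Form Recognizer SDK types or the actual code in data_utils.py. It uses an emptiness check instead of try/except, which should be equivalent for this case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    # Hypothetical stand-in for a table span (offset and length in characters).
    offset: int
    length: int


@dataclass
class Table:
    # Hypothetical stand-in for a table; an empty table has no spans.
    spans: List[Span] = field(default_factory=list)


def tables_within_page(tables, page_offset, page_length):
    """Return the tables whose first span lies inside the page, skipping tables with no spans."""
    tables_on_page = []
    for table in tables:
        if not table.spans:
            # Empty table: indexing spans[0] would raise IndexError, so skip it.
            continue
        first_span = table.spans[0]
        table_offset = first_span.offset
        table_length = first_span.length
        if page_offset <= table_offset and table_offset + table_length < page_offset + page_length:
            tables_on_page.append(table)
    return tables_on_page


# One empty table, one table inside the page, one table beyond the page.
tables = [Table(), Table([Span(10, 20)]), Table([Span(200, 5)])]
result = tables_within_page(tables, page_offset=0, page_length=100)
```

With these inputs the empty table is skipped without raising, and only the table that starts at offset 10 is kept.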

cynthiajiangatl added the bug label on Aug 1, 2024
@vkrd
Contributor

vkrd commented Aug 5, 2024

Thanks for pointing this out, fixed in #1040

vkrd closed this as completed on Aug 5, 2024