Image in Table Preventing Table Extraction #3585

Closed Answered by JorjMcKie
isaac-peter asked this question in Looking for help

Wow, thank you so much for the detailed response! One quick follow-up question: not all pages have 'Fractional Times:' at the bottom of the table; sometimes there is other bold text instead. Do you have any other suggestions for marking the lower-left corner of the table? What about searching the page for the next Helvetica-Bold text that is not one of the needles, and then taking the y0 value from that text's bounding box?

Thanks again!

Certainly possible, but more complex. It would be easier if you knew a list of text alternatives and could check for them.
But you can extract all bold text spans and fight your way through this jungle.

# bold text spans:
spans = [
    s
    for b in page.get_text("dict", f…
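
A minimal sketch of the idea discussed here, not the original snippet (which is cut off above). It assumes the input is the dictionary returned by PyMuPDF's `page.get_text("dict")`, that bold spans can be recognized by bit 4 (value 16) of the span `flags` or by "Bold" in the font name, and that `needles` and `below_y` are known from the earlier search. The helper names `bold_spans` and `next_bold_y0` are hypothetical.

```python
BOLD_FLAG = 16  # bit 4 of a span's "flags" marks bold text in the "dict" output


def bold_spans(page_dict):
    """Return all bold, non-empty text spans from page.get_text("dict") output."""
    return [
        s
        for b in page_dict["blocks"]
        for l in b.get("lines", [])  # image blocks carry no "lines" key
        for s in l["spans"]
        if s["text"].strip() and (s["flags"] & BOLD_FLAG or "Bold" in s["font"])
    ]


def next_bold_y0(spans, needles, below_y):
    """Return the smallest y0 of a bold span that lies below `below_y`
    and whose text is not one of the needles, or None if there is none."""
    candidates = [
        s["bbox"][1]  # bbox is (x0, y0, x1, y1)
        for s in spans
        if s["bbox"][1] > below_y and s["text"].strip() not in needles
    ]
    return min(candidates) if candidates else None


# Assumed usage with a real page (requires PyMuPDF and an open document):
#   spans = bold_spans(page.get_text("dict"))
#   y_bottom = next_bold_y0(spans, needles, below_y=header_bottom)
```

If a value is returned, it could serve as the bottom edge of the clip rectangle for the table search; `None` means no other bold text was found below the header, so a fallback such as the page bottom would be needed.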

Answer selected by isaac-peter
2 participants