find_tables() 'cells' attributes format #3629
Hello, the extract() method of a table returned by find_tables() seems to use the following format: each element is a table row, and each subelement is a column in that row. The cells attribute, on the other hand, is a flat list where each element is a cell, given as its bbox. How do I relate a given element in the cells attribute back to the corresponding cell in the extract() output? Said otherwise, how do I find the bbox of a given cell from the extract() output? I don't believe the "Page" doc describes the structure of extract() or of the cells attribute. Thanks!
Replies: 2 comments 1 reply
After reviewing the doc again, I'm thinking that maybe I can use the rows attribute to get the cell bboxes, and then relate these back to the extract() structure I described above?
This has been answered in #3587. E.g.
import pymupdf
tab = page.find_tables().tables[0]  # e.g. the first table found on the page
# copy of the table's text content:
tab_text = tab.extract()[:]
# the table's cell bboxes as Rect objects:
tab_cells = [[pymupdf.Rect(c) for c in r.cells] for r in tab.rows]
These are two lists of lists with the same sizes, both indexed as [row][col]. So the text in tab_text[row][col] has the cell coordinates tab_cells[row][col] (which is a Rect object).
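To make the [row][col] correspondence concrete, here is a minimal sketch that walks the two nested lists in parallel and pairs each cell's text with its bbox. The helper name pair_text_with_bbox and the sample data are made up for illustration; the sample uses plain 4-tuples instead of Rect objects so that it runs without a PDF at hand. With a real table you would pass in tab.extract() and the per-row cells built from tab.rows, as in the reply above. A None entry (a cell with no text or no bbox) is simply passed through unchanged.

```python
from typing import List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def pair_text_with_bbox(
    tab_text: Sequence[Sequence[Optional[str]]],
    tab_cells: Sequence[Sequence[Optional[BBox]]],
) -> List[Tuple[int, int, Optional[str], Optional[BBox]]]:
    """Return (row, col, text, bbox) for every cell.

    Hypothetical helper, not part of PyMuPDF. It assumes both inputs
    are lists of rows with identical shapes, indexed as [row][col].
    """
    if len(tab_text) != len(tab_cells):
        raise ValueError("row counts differ")
    pairs = []
    for r, (text_row, bbox_row) in enumerate(zip(tab_text, tab_cells)):
        if len(text_row) != len(bbox_row):
            raise ValueError(f"column counts differ in row {r}")
        for c, (text, bbox) in enumerate(zip(text_row, bbox_row)):
            pairs.append((r, c, text, bbox))
    return pairs


# Invented sample data standing in for tab.extract() and the row cells.
tab_text = [["Name", "Qty"], ["Apple", "3"]]
tab_cells = [
    [(50, 100, 150, 120), (150, 100, 200, 120)],
    [(50, 120, 150, 140), (150, 120, 200, 140)],
]

pairs = pair_text_with_bbox(tab_text, tab_cells)
for row, col, text, bbox in pairs:
    print(f"[{row}][{col}] {text!r} -> {bbox}")
```

The shape checks are there only to fail early if the two structures ever disagree; if they do, the indexing assumption from the reply no longer holds for that table.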