In this PDF, there is hidden text at the top of the page: "Lorem ipsum dolor title on two lines probably".
There is no easy way yet. You can, however, make a pixmap of the text region (using its bbox) and check which colors occur in that rectangle. If, for example, only one color occurs, then you know that the text is invisible.
This works for any type of overlap, whether by vector graphics or images.
Here is a demo:

```python
import pymupdf

doc = pymupdf.open("test.pdf")
page = doc[0]
# locate the text; search_for returns a list of hit rectangles
rl = page.search_for("two lines probably")
bbox = rl[0]
# render only the rectangle that contains the text
pix = page.get_pixmap(clip=bbox)
# share and value of the most frequent color in that region
percent, color = pix.color_topusage()
print(f"{percent*100}% of the region contains color {tuple(map(int, color))}")
```

This prints:

```
100.0% of the region contains color (249, 199, 49)
```

So you know that the searched text is invisible.

In one of the next versions, however, we will also provide a way to extract basic vector graphic information together with the extracted text. You will have the option to see how text and vector graphic blocks follow each other and thus determine any overlaps. Overlaps by images, however, are still not covered by such an approach.
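To make the "only one color" check concrete, here is a minimal sketch of the same idea in plain Python, working on a raw pixel buffer. It is not PyMuPDF's implementation: the function names `top_color_usage` and `looks_hidden` and the 0.99 threshold are assumptions chosen for illustration. With PyMuPDF, the buffer and bytes-per-pixel value would presumably come from `pix.samples` and `pix.n`.

```python
from collections import Counter


def top_color_usage(samples: bytes, n: int):
    """Return (share, color) of the most frequent pixel in a raw buffer.

    samples: pixel data, n bytes per pixel (e.g. n=3 for RGB).
    """
    if n <= 0 or len(samples) == 0 or len(samples) % n:
        raise ValueError("samples must be a non-empty multiple of n bytes")
    # count every pixel value in the buffer
    pixels = Counter(samples[i:i + n] for i in range(0, len(samples), n))
    color, count = pixels.most_common(1)[0]
    return count / (len(samples) // n), tuple(color)


def looks_hidden(samples: bytes, n: int, threshold: float = 0.99) -> bool:
    """Assumed heuristic: a (nearly) single-colored region shows no text."""
    share, _ = top_color_usage(samples, n)
    return share >= threshold


# synthetic regions: one fully covered, one with dark text on white
covered = bytes((249, 199, 49)) * 200
visible = bytes((255, 255, 255)) * 150 + bytes((0, 0, 0)) * 50

print(looks_hidden(covered, 3))  # True
print(looks_hidden(visible, 3))  # False
```

A threshold slightly below 1.0 leaves some room for stray pixels at the edge of the rectangle; whether that tolerance is appropriate depends on the document.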