How to get glyph and convert to character? #3320

Answered by JorjMcKie
june6423 asked this question in Q&A

Here is a quick test run for OCR-ing this:

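import fitz  # PyMuPDF

doc = fitz.open("test.pdf")
page = doc[31]  # 0-based index, i.e. the 32nd page
# Start with the top-left quarter of the page as the search area.
clip = page.rect / 2
# Move the top of the clip to just below the first "hepg-2" hit.
rect = page.search_for("hepg-2", clip=clip)[0]
clip.y0 = rect.y1 + 5
# End the clip right above the "Table 2" caption.
rect = page.search_for("Table 2", clip=clip)[0]
clip.y1 = rect.y0
clip.x0 = 36  # left edge of the clip, in points
# Render the clip at 300 dpi and OCR it into a 1-page PDF (needs Tesseract).
pix = page.get_pixmap(clip=clip, dpi=300)
ocr = fitz.open("pdf", pix.pdfocr_tobytes())
print(ocr[0].get_text())

Output: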
R=—H
58.98 + 0.89
71.55 + 2.91
R—Me
49.60 + 2.03
63.48 + 2.11
R—OMe
49.65 + 2.08
62.41 + 2.23
R=Cl
44.71 + 1.92
43.81 + 1.83

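As you can see, even a high resolution of 300 dpi does not deliver what you are hoping for: the row labels come out inconsistently ("R=—H", "R—Me", "R=Cl"), and the "+" between the numbers is presumably a misread "±".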
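If the OCR text is still useful despite the garbling, it can at least be grouped into records. The following is only a minimal sketch in plain Python, not part of PyMuPDF: `group_ocr_lines` and `VALUE_RE` are hypothetical names, and it assumes that every value line has the shape "number + number" and that any other non-empty line is a (possibly garbled) row label.

```python
import re

# OCR text exactly as printed by the test run (garbled labels included).
OCR_TEXT = """R=—H
58.98 + 0.89
71.55 + 2.91
R—Me
49.60 + 2.03
63.48 + 2.11
R—OMe
49.65 + 2.08
62.41 + 2.23
R=Cl
44.71 + 1.92
43.81 + 1.83
"""

# Assumption: a value line is "<number> + <number>"; "±" is accepted too,
# in case OCR happens to recognize it correctly.
VALUE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*[+±]\s*(\d+(?:\.\d+)?)$")


def group_ocr_lines(text):
    """Group OCR lines into (label, [(first, second), ...]) records.

    Every non-empty line that is not a value line starts a new record
    and is stored unchanged as that record's label.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = VALUE_RE.match(line)
        if match is None:
            records.append((line, []))
        elif records:
            # Attach the number pair to the most recent label.
            pair = (float(match.group(1)), float(match.group(2)))
            records[-1][1].append(pair)
    return records


for label, values in group_ocr_lines(OCR_TEXT):
    print(label, values)
```

The labels are kept as OCR delivered them; repairing them would need knowledge of the original table that the OCR text alone does not provide.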

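On the resolution point: PDF coordinates are in points (1/72 inch), so `dpi=300` scales the clip by 300/72 in each direction. Here is a rough sketch of that arithmetic. `pixmap_size` is a hypothetical helper, the clip dimensions are made up for illustration, and the real pixmap may differ by a pixel because of rounding.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def pixmap_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a region rendered at the given dpi."""
    zoom = dpi / POINTS_PER_INCH
    return round(width_pt * zoom), round(height_pt * zoom)


# Hypothetical clip of 270 x 120 points, rendered at 300 dpi.
print(pixmap_size(270, 120, 300))
```

By the same arithmetic, a 10-point glyph is only about 42 pixels tall at 300 dpi.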
Answer selected by june6423
Category: Q&A
Labels: not a bug (not a bug / user error / unable to reproduce)

This discussion was converted from issue #3318 on March 28, 2024 08:53.