Remove_rotation() feature #3560
Replies: 6 comments
-
I don't understand what the problem is that you are reporting here.
Only if the behavior in point 3 is not as expected do we have a bug.
-
Clear explanation of the problem
The code below loads the input PDF and uses the get_drawings() feature to extract the width of each item.
The code below loads the input PDF, uses remove_rotation() to change the page orientation from 90 to 0, and then uses the get_drawings() feature to extract the width of each item.
The problem is this: the same input PDF is used for both code samples, but after remove_rotation() the item width is None or 0 for every item. I printed only 15 items of drawing_list, and all of them show a width of 0.
Queries
or else
-
Please let me have the PDF. It is required for following up.
-
Here is the input PDF: Grace manor-mid rise floor and columnschedule.pdf. Please suggest another way or fix the issue in this feature: remove_rotation() itself works fine, but afterwards the item width is None or 0, and I need that width for my later operations on the PDF.
-
I checked the results of get_drawings(). The first path of the original (rotated) page looks like this:
{'closePath': None,
 'color': None,
 'dashes': None,
 'even_odd': False,
 'fill': (1.0, 1.0, 1.0),
 'fill_opacity': 1.0,
 'items': [('re', Rect(938.3999633789062, 1509.5999755859375, 948.0, 1519.4400634765625), 1)],
 'layer': '',
 'lineCap': None,
 'lineJoin': None,
 'rect': Rect(938.3999633789062, 1509.5999755859375, 948.0, 1519.4400634765625),
 'seqno': 0,
 'stroke_opacity': None,
 'type': 'f',
 'width': None}
The same path after de-rotation of the page looks like this:
{'closePath': False,
 'color': None,
 'dashes': None,
 'even_odd': False,
 'fill': (1.0, 1.0, 1.0),
 'fill_opacity': 1.0,
 'items': [('l', Point(1504.5599365234375, 938.3999633789062), Point(1504.5599365234375, 948.0)),
           ('l', Point(1504.5599365234375, 948.0), Point(1514.4000244140625, 948.0)),
           ('l', Point(1514.4000244140625, 948.0), Point(1514.4000244140625, 938.3999633789062)),
           ('l', Point(1514.4000244140625, 938.3999633789062), Point(1504.5599365234375, 938.3999633789062))],
 'layer': '',
 'lineCap': None,
 'lineJoin': None,
 'rect': Rect(1504.5599365234375, 938.3999633789062, 1514.4000244140625, 948.0),
 'seqno': 0,
 'stroke_opacity': None,
 'type': 'f',
 'width': None}
Yet both entries describe the same path, which you can see by multiplying the original rectangle by the page's rotation matrix:
paths[0]["rect"] * page.rotation_matrix
Rect(1504.5599365234375, 938.3999633789062, 1514.4000244140625, 948.0)
This is visibly the same rectangle as the first path after de-rotation. So maybe this is a way for you to circumvent the problem.
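To make the multiplication above concrete, here is a minimal pure-Python sketch of what applying a rotation matrix to a rectangle does. The matrix values (0, 1, -1, 0, 3024, 0) are an assumption inferred from the two rectangles printed in this thread, not read from the PDF; the real page.rotation_matrix depends on the page. The helper names transform_point and transform_rect are hypothetical and only illustrate the arithmetic.

```python
def transform_point(point, matrix):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to an (x, y) point."""
    x, y = point
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


def transform_rect(rect, matrix):
    """Transform all four corners and return the enclosing (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    moved = [transform_point(p, matrix) for p in corners]
    xs = [p[0] for p in moved]
    ys = [p[1] for p in moved]
    return (min(xs), min(ys), max(xs), max(ys))


# ASSUMPTION: inferred from the rectangles quoted in this thread.
ASSUMED_ROTATION_MATRIX = (0, 1, -1, 0, 3024, 0)

# 'rect' of the first path before and after de-rotation, as printed above.
original_rect = (938.3999633789062, 1509.5999755859375, 948.0, 1519.4400634765625)
derotated_rect = (1504.5599365234375, 938.3999633789062, 1514.4000244140625, 948.0)

result = transform_rect(original_rect, ASSUMED_ROTATION_MATRIX)
same = all(abs(r - d) < 1e-6 for r, d in zip(result, derotated_rect))
print(result, same)
```

With PyMuPDF itself, the same mapping is the expression quoted above, paths[0]["rect"] * page.rotation_matrix, evaluated on the page before remove_rotation() is called.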
-
Yes, obviously the result of the remove_rotation() feature itself is good, but I am talking about the width item in get_drawings(). It changes after using remove_rotation(). Could you check this? I mentioned this in my previous message and also added a screenshot. In this case, the problem is the width item in get_drawings().
-
Description of the bug
The set rotation feature rotates the page, but it does not give proper results when extracting details from the PDF page. PyMuPDF version 1.24.3, however, provides a remove_rotation() feature. When I change the orientation with remove_rotation() and then extract line coordinates with logic based on line width, the width becomes zero or 1.0 for all lines on the PDF page.
Could you explain this feature and this bug in remove_rotation()?
How to reproduce the bug
import fitz  # PyMuPDF

doc = fitz.open(pdf_path)
page_no = 0
page = doc[page_no]
page.remove_rotation()  # set the page rotation to 0
mediabox = page.mediabox
width = mediabox.width
height = mediabox.height
orientation = page.rotation
print('width - ', width, '\n', 'height - ', height, '\n', 'orientation - ', orientation)
Filtering the lines in the PDF pages:
doc = fitz.open(pdf_path)
beam_lines = {}
for page_num in range(doc.page_count):
    page = doc[page_num]
    drawing_list = page.get_drawings()
    width_threshold = 0.10
    length_threshold = 60
    for item in drawing_list:
        # print(item)
        if item['width']:
            try:
                if item['items'][0][0] == 'l' and item['width'] > width_threshold:
When filtering the lines by the width of the line segment, the output is empty because the width of all lines is zero; before remove_rotation(), this code provides the proper width for all lines.
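The filter described above can be sketched in a form that runs without a PDF. This is only an illustration on hand-made dictionaries shaped like get_drawings() output, not a fix for the reported behavior. The function name wide_line_segments and the sample data are hypothetical, and points are plain tuples here rather than PyMuPDF Point objects. Two details are taken from the output quoted earlier in this thread: a fill-only path (type 'f') reports 'width': None, and a path may hold several items, so the sketch checks every item instead of only the first.

```python
def wide_line_segments(drawings, width_threshold=0.10):
    """Collect (p1, p2, width) for each 'l' item in sufficiently wide paths."""
    segments = []
    for path in drawings:
        width = path.get("width")
        if width is None:
            # Fill-only paths carry no stroke width; skip them.
            continue
        if width <= width_threshold:
            continue
        for item in path["items"]:
            if item[0] == "l":  # line item: ('l', start_point, end_point)
                segments.append((item[1], item[2], width))
    return segments


# Hypothetical sample data in the shape of get_drawings() results.
sample = [
    {"type": "f", "width": None, "items": [("l", (0, 0), (10, 0))]},
    {"type": "s", "width": 0.5,
     "items": [("re", (0, 0, 5, 5), 1), ("l", (0, 0), (0, 80))]},
    {"type": "s", "width": 0.05, "items": [("l", (0, 0), (50, 0))]},
]

print(wide_line_segments(sample))
```

Printing the distinct 'type' and 'width' values of the real drawing_list before and after remove_rotation() would show whether the stroked paths themselves lose their width or whether only fill-only paths are being inspected.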
PyMuPDF version
1.24.3
Operating system
Windows
Python version
3.10