Word/DOCX-Eport as ZMS-Action applying python-docx #288

Merged
merged 100 commits into from
Oct 23, 2024

Conversation

@drfho drfho commented Jun 19, 2024

Ref: #287

@drfho drfho changed the title Added notebook prototype Word/DOCX-Eport: notebook prototype Jun 20, 2024

drfho commented Jun 23, 2024

The notebook evaluates a function add_htmlblock_to_docx() that converts rich text to docx objects. Note: the current code handles only two levels of nested HTML children.
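One possible way around the two-level limit is to track open inline tags on a stack while parsing, so that nesting depth no longer matters. The following is only a minimal sketch using the standard library; `RunCollector`, `html_to_runs` and the `INLINE_FORMATS` mapping are hypothetical names and not part of the notebook or of python-docx.

```python
from html.parser import HTMLParser

# Hypothetical mapping from inline HTML tags to run-level format flags.
INLINE_FORMATS = {
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "u": "underline",
}


class RunCollector(HTMLParser):
    """Flatten nested inline markup into (text, flags) runs."""

    def __init__(self):
        super().__init__()
        self.stack = []  # format flags of the currently open inline tags
        self.runs = []   # collected (text, frozenset_of_flags) tuples

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_FORMATS:
            self.stack.append(INLINE_FORMATS[tag])

    def handle_endtag(self, tag):
        flag = INLINE_FORMATS.get(tag)
        if flag in self.stack:
            # Remove the innermost occurrence of the flag being closed.
            idx = len(self.stack) - 1 - self.stack[::-1].index(flag)
            del self.stack[idx]

    def handle_data(self, data):
        if data:
            self.runs.append((data, frozenset(self.stack)))


def html_to_runs(html):
    """Return the list of (text, flags) runs for an HTML fragment."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    return parser.runs
```

Each resulting tuple could then be passed to python-docx as `run = paragraph.add_run(text)`, followed by setting `run.bold`, `run.italic` or `run.underline` for the flags that are present.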

ec563e7

image

drfho commented Jun 24, 2024

Hints for b5c37c0

Certain layout/style properties cannot be set through the python-docx API. They need helper functions that use the XML API to add generic Office Open XML elements:

  1. document field PAGE number: the example code adds a page counter to the footer
  2. paragraph property borders: the example code adds a bottom border to the added style "Description"

image
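For orientation, the two XML fragments involved look roughly as follows. This sketch builds them with the standard library's `xml.etree.ElementTree` so it runs without python-docx; in the actual helper functions the same elements would presumably be created with `docx.oxml.OxmlElement` and `docx.oxml.ns.qn`. The function names and default values are assumptions, not the code from the commit.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the Clark-notation name of a tag in the w: namespace."""
    return f"{{{W_NS}}}{tag}"


def page_number_runs():
    """Build the three w:r elements that form a complex PAGE field."""
    runs = []
    for kind in ("begin", "instr", "end"):
        run = ET.Element(w("r"))
        if kind == "instr":
            instr = ET.SubElement(run, w("instrText"))
            instr.set(XML_SPACE, "preserve")
            instr.text = " PAGE "
        else:
            fld = ET.SubElement(run, w("fldChar"))
            fld.set(w("fldCharType"), kind)
        runs.append(run)
    return runs


def bottom_border(size=6, space=1, color="auto"):
    """Build a w:pBdr element with a single bottom border.

    size is given in eighths of a point, space in points.
    """
    pbdr = ET.Element(w("pBdr"))
    bottom = ET.SubElement(pbdr, w("bottom"))
    bottom.set(w("val"), "single")
    bottom.set(w("sz"), str(size))
    bottom.set(w("space"), str(space))
    bottom.set(w("color"), color)
    return pbdr
```

The field runs belong inside a footer paragraph, and the `w:pBdr` element belongs inside the `w:pPr` of the paragraph style.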

drfho commented Jun 24, 2024

@zmsdev for discussion: 7b8517a

A class method standard_json (py) will generate a JSON representation of the object's content. The JSON should contain all structural/semantic information needed for processing by python-docx. I propose a list of dicts, each containing a list of dicts, each representing a fully described block that can easily be transformed into docx objects.
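To make the proposed shape concrete, here is a small illustration of such a structure. All keys and values (`id`, `meta_id`, `content`, `type`, `level`, `text`) are invented for this sketch and do not reflect the actual output of standard_json.

```python
import json

# Hypothetical content of one page: a list of object dicts,
# each holding a list of block dicts.
page = [
    {
        "id": "e42",
        "meta_id": "ZMSTextarea",
        "content": [
            {"type": "heading", "level": 2, "text": "Introduction"},
            {"type": "paragraph", "text": "Plain body text."},
        ],
    },
]


def iter_blocks(objects):
    """Yield (object_id, block) pairs in document order."""
    for obj in objects:
        for block in obj["content"]:
            yield obj["id"], block


serialized = json.dumps(page, indent=2)
```

A consumer would walk `iter_blocks(page)` and dispatch on `block["type"]` to decide which docx object to create.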

image

TASK: JSONify mix of paragraphs (block) and runs (inline)

This JSON model is obviously too flat to deal with inline formatting. The object's content blocks need to be segmented into paragraph (w:p) and run (w:r) elements. The structure of the JSON could take inspiration from:
http://pdfmake.org/playground.html
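As a sketch of what a nested model could look like: the `text` value of a block is either a string, a dict with its own `text` plus format keys, or a list of such values, loosely modeled on the pdfmake document definition. The function below flattens that nesting into a list of runs with inherited formatting. The name `normalize_runs` and the format keys are assumptions made for illustration.

```python
def normalize_runs(text, inherited=None):
    """Flatten a nested text value into a list of run dicts.

    text may be a str, a dict with a "text" key plus format keys,
    or a list mixing both. Format keys are inherited by nested items.
    """
    inherited = dict(inherited or {})
    if isinstance(text, str):
        return [{"text": text, **inherited}]
    if isinstance(text, dict):
        own = {k: v for k, v in text.items() if k != "text"}
        return normalize_runs(text["text"], {**inherited, **own})
    runs = []
    for item in text:
        runs.extend(normalize_runs(item, inherited))
    return runs


# Hypothetical paragraph block with nested inline formatting.
block = {
    "type": "paragraph",
    "text": [
        "Plain, ",
        {"text": ["bold and ", {"text": "italic", "italic": True}], "bold": True},
        ".",
    ],
}
runs = normalize_runs(block["text"])
```

Each block then maps to one paragraph and each entry of `runs` to one run, which matches the paragraph/run split that the docx format requires.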

image

@drfho drfho left a comment

python-docx still does not support external hyperlinks

https://github.com/python-openxml/python-docx/blob/master/docs/dev/analysis/features/text/hyperlink.rst

So a function is needed for creating the hyperlink run element:

image
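The element such a function has to produce is a `w:hyperlink` that carries a relationship id and wraps an ordinary run. The sketch below builds it with the standard library only; `hyperlink_element` is a hypothetical name and the `Hyperlink` character style is assumed to exist in the template. With python-docx, the relationship id for an external URL would typically come from `part.relate_to(url, RELATIONSHIP_TYPE.HYPERLINK, is_external=True)`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
ET.register_namespace("w", W_NS)
ET.register_namespace("r", R_NS)


def w(tag):
    """Return the Clark-notation name of a tag in the w: namespace."""
    return f"{{{W_NS}}}{tag}"


def hyperlink_element(r_id, text, style="Hyperlink"):
    """Build a w:hyperlink element containing one styled run."""
    link = ET.Element(w("hyperlink"))
    link.set(f"{{{R_NS}}}id", r_id)  # points to the external relationship
    run = ET.SubElement(link, w("r"))
    rpr = ET.SubElement(run, w("rPr"))
    rstyle = ET.SubElement(rpr, w("rStyle"))
    rstyle.set(w("val"), style)
    t = ET.SubElement(run, w("t"))
    t.text = text
    return link
```

The returned element would be appended to the XML of the target paragraph.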

@drfho drfho left a comment

According to the Jupyter prototype
https://github.com/zms-publishing/ZMS/blob/16b024f4865c26a321acd12ed7d8ce366aaa79fb/docs/notebooks/snippets_11_pythondocx_table%20.ipynb
the table processing now takes a different approach:

  1. Create a 2D list of cell contents, storing each cell's content as an HTML string. After adding a grid-coordinate prefix (see image below), copy the identical content into all row/col-spanned cells.
  2. Fill the docx table grid and merge cells with identical content in two dimensions.
  3. Convert the HTML content to docx content cell by cell.

image

This approach is quite stable with respect to cell merging.
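The first two steps can be sketched without python-docx. The input format, a list of rows with `(html, rowspan, colspan)` tuples, and the function names are assumptions for this illustration; the notebook may organize the data differently.

```python
def expand_spans(rows):
    """Expand spanned cells into a rectangular 2D list.

    rows is a list of rows, each a list of (html, rowspan, colspan)
    tuples. Every grid position covered by a span receives the same
    coordinate-prefixed content as the span's origin cell.
    """
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for html, rowspan, colspan in row:
            while (r, c) in grid:  # skip positions taken by earlier rowspans
                c += 1
            # The prefix keeps equal texts from different cells distinct.
            value = f"[{r},{c}]{html}"
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def merge_ranges(grid):
    """Return (row0, col0, row1, col1) boxes of cells with identical content."""
    boxes = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if not value:
                continue
            r0, c0, r1, c1 = boxes.get(value, (r, c, r, c))
            boxes[value] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    # Only boxes that cover more than one cell need merging.
    return sorted(box for box in boxes.values() if box[:2] != box[2:])


rows = [
    [("<b>A</b>", 1, 2), ("B", 2, 1)],
    [("C", 1, 1), ("D", 1, 1)],
]
grid = expand_spans(rows)
ranges = merge_ranges(grid)
```

With python-docx, each box could then be applied as `table.cell(r0, c0).merge(table.cell(r1, c1))`, after which the prefix is stripped and the remaining HTML is converted per cell.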

@drfho drfho left a comment

image

@drfho drfho left a comment

The new function add_num_element() creates a new numbering reference (which ends up in the file numbering.xml) to restart the list counter for every new <ol> element.

DOCX_numbering
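In numbering.xml, a restart is usually expressed as a new `w:num` element that points to an existing abstract numbering definition and overrides the start value of a level. The following standard-library sketch shows that structure; `next_num_id` and `num_element` are hypothetical helpers, not the add_num_element() from this pull request, and the abstract numbering id is assumed to be known.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the Clark-notation name of a tag in the w: namespace."""
    return f"{{{W_NS}}}{tag}"


def next_num_id(numbering_root):
    """Return the lowest numId not yet used by a w:num child."""
    ids = [int(n.get(w("numId"))) for n in numbering_root.findall(w("num"))]
    return max(ids, default=0) + 1


def num_element(num_id, abstract_num_id, start=1, ilvl=0):
    """Build a w:num that restarts level ilvl at the given start value."""
    num = ET.Element(w("num"))
    num.set(w("numId"), str(num_id))
    abstract = ET.SubElement(num, w("abstractNumId"))
    abstract.set(w("val"), str(abstract_num_id))
    override = ET.SubElement(num, w("lvlOverride"))
    override.set(w("ilvl"), str(ilvl))
    start_el = ET.SubElement(override, w("startOverride"))
    start_el.set(w("val"), str(start))
    return num


# Two <ol> elements lead to two independent numbering instances.
numbering = ET.Element(w("numbering"))
for _ in range(2):
    numbering.append(num_element(next_num_id(numbering), abstract_num_id=0))
```

The list paragraphs of each `<ol>` would then reference their own `numId` in the paragraph's `w:numPr`, so that counting starts again at 1.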

@drfho drfho changed the title Word/DOCX-Eport: notebook prototype Word/DOCX-Export as ZMS-Action Oct 23, 2024
@drfho drfho changed the title Word/DOCX-Export as ZMS-Action Word/DOCX-Eport: Applying python-docx as ZMS-Action Oct 23, 2024
@drfho drfho changed the title Word/DOCX-Eport: Applying python-docx as ZMS-Action Word/DOCX-Eport as ZMS-Action applying python-docx Oct 23, 2024
@drfho drfho merged commit 33e2055 into main Oct 23, 2024
12 checks passed
Labels
enhancement New feature or request