Merge pull request #1 from mjanv/develop
Pre-release
mjanv authored Jan 20, 2017
2 parents d3d5536 + 467fc09 commit 019e42b
Showing 8 changed files with 234 additions and 0 deletions.
File renamed without changes.
35 changes: 35 additions & 0 deletions README.md
@@ -0,0 +1,35 @@
# Mistune Docx Renderer

This tool **merges a list of Markdown files into a brand new Docx file!**

A Docx template can be specified, in which case:
* The template's default styles are used and can be tweaked to change the appearance of the final document. The template must define the following styles: `BasicUserQuote` for blockquotes, `BasicUserList` for lists and `BasicUserTable` for tables.
* The template's contents are used as the first page(s) of the final document.
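Before running the generator, it can help to check that a template really defines these three styles. The following is a minimal sketch, not part of this repository: the helper names `missing_styles` and `template_missing_styles` are hypothetical, and only the second one needs python-docx (imported lazily so the pure check runs without it).

```python
# Style names this README says the template must define.
REQUIRED_STYLES = ("BasicUserQuote", "BasicUserList", "BasicUserTable")


def missing_styles(style_names, required=REQUIRED_STYLES):
    """Return the required style names absent from ``style_names``, in order."""
    available = set(style_names)
    return [name for name in required if name not in available]


def template_missing_styles(path):
    """List the required styles missing from a .docx template (needs python-docx)."""
    # Imported here so that missing_styles() can be used without python-docx.
    from docx import Document

    # Document.styles is iterable; each style object exposes a .name attribute.
    return missing_styles(style.name for style in Document(path).styles)
```

For example, `template_missing_styles("example/template.docx")` should return an empty list if the bundled template is complete.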

Supported Markdown features:
* Paragraphs
* Bold and italic text
* Headers (level 1 to 5)
* Bullet lists
* Tables with cells containing simple text (normal, bold or italic)
* Blockquotes (written as fenced code blocks)
* Images
* Mathematical equations (rendered with SymPy)
* Links
* Page breaks (written as a horizontal rule)
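Equation support relies on the block-level regular expression defined in `MathBlockGrammar` in `generate_doc.py`. The sketch below reuses that exact pattern to show what gets captured; `extract_block_math` is a hypothetical helper for illustration only, whereas the real lexer calls `match` on the remaining Markdown text during parsing.

```python
import re

# Same pattern as MathBlockGrammar.block_math in generate_doc.py.
# re.DOTALL lets ".*?" cross newlines, so multi-line equations are captured;
# the lazy "*?" stops at the first closing "$$".
BLOCK_MATH = re.compile(r"^\$\$(.*?)\$\$", re.DOTALL)


def extract_block_math(source):
    """Return the LaTeX body of a $$...$$ block at the start of ``source``, else None."""
    match = BLOCK_MATH.match(source)
    # The body is returned unstripped, exactly as it is passed on to SymPy.
    return match.group(1) if match else None
```

Because the pattern is anchored at the start of the text, `$$` appearing in the middle of a paragraph is not treated as an equation block.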

Every feature follows the common Markdown syntax. Examples can be found in the `example/` folder. Note that special characters such as `%` or `"` need to be escaped with a `\` to be accepted.
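The escaping rule for `"` comes from how `generate_doc.py` works: the renderer writes Python source such as `p.add_run("...")` and then executes it, so an unescaped double quote ends the string literal early. This sketch, using only the standard library, checks whether such a generated statement is valid Python; `run_statement` is a hypothetical helper mirroring the renderer's `text()` method, and it assumes the backslash reaches the generated code unchanged, which is what the rule above relies on.

```python
import ast


def run_statement(text):
    """Build the statement that PythonDocxRenderer.text() would emit for ``text``."""
    return 'p.add_run("%s")\n' % text


def is_valid_python(source):
    """Return True if ``source`` parses as Python (nothing is executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

An unescaped `He said "hi"` produces invalid code, while `He said \"hi\"` yields a valid string literal.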

## How to use

```
python generate_doc.py output.docx --template example/template.docx --files "example/*.md"
```
The command takes:
* The name and location of the output file.
* The name and location of the template file. If not specified, the default styles of python-docx's bundled template and an empty document are used.
* A glob pattern locating the Markdown files (quote it so that the shell does not expand it). If several Markdown files match the pattern, they are first concatenated into one Markdown document in alphabetical order.
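The assembly step can be sketched as a standalone function that follows the same logic as the loop in `generate_doc.py` (`glob`, then `sorted`, then a newline join). The name `assemble_markdown` is hypothetical; the script itself inlines this code at module level.

```python
import glob


def assemble_markdown(pattern):
    """Concatenate every file matching the glob ``pattern``, in sorted path order."""
    parts = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as handle:
            parts.append(handle.read())
    # generate_doc.py joins the files with a single newline before parsing.
    return "\n".join(parts)
```

Since `sorted` compares paths as strings, numeric prefixes are ordered lexicographically: `10_Annex.md` would come before `2_Second.md`, so zero-padded prefixes such as `01_`, `02_` keep the intended order.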

## Requirements

This tool has been tested with Python 3.5. See _requirements.txt_ for the library dependencies.
59 changes: 59 additions & 0 deletions example/1_Introduction.md
@@ -0,0 +1,59 @@
## Introduction

You can add a paragraph of one line or multiple lines.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam finibus nisl non leo rhoncus, quis porta nisi egestas. Ut et ex urna. Maecenas arcu ante, congue at commodo et, iaculis in nisi. Aenean vehicula blandit risus, vel pulvinar tellus luctus eget. Quisque nec ullamcorper arcu, et vestibulum nulla. Duis in auctor orci. Aliquam semper erat ac eros porttitor, non tincidunt mauris suscipit. Aenean sit amet sapien quis eros bibendum condimentum non vel lectus. Etiam ac dapibus ex. Nullam volutpat consequat gravida. Vestibulum vel sagittis dui. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Sed vel urna vel urna feugiat volutpat non et sem. Cras maximus dictum metus id posuere. Curabitur convallis maximus magna, id faucibus nibh egestas at. Mauris vitae efficitur libero.

You can add **bold text** or *italic text*.

You can define headers:

# Header 1

## Header 2

### Header 3

#### Header 4

##### Header 5


You can also add a list of elements:
* which
* are
* going
* to
* be
* displayed
* as
* this!



Do you like tables?

| **Number** | **Arrays** | **Others** |
|------------------|---------------------------------------|------------|
| integer | vector (first-order tensor) | string |
| double precision | matrix (second-order tensor) | _ |
| _ | tensor (third-order tensor) | _ |




Blockquotes are defined like this:

```
def double(L):
    for x in L:
        yield x*2
```

An image can have a legend (its width is set to 15 cm by default):

![legend](example/images/cat.png)

Mathematical equations following the LaTeX syntax can be written between double dollar signs:

$$ \sum_{i=1}^N x_i = \pi $$
4 changes: 4 additions & 0 deletions example/2_Second.md
@@ -0,0 +1,4 @@
--------------
## I'm the second file


Binary file added example/images/cat.png
Binary file added example/template.docx
Binary file not shown.
133 changes: 133 additions & 0 deletions generate_doc.py
@@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
import os
import shutil
import glob
import re
import itertools
import argparse

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt, Cm
import mistune


class MathBlockGrammar(mistune.BlockGrammar):
    block_math = re.compile(r"^\$\$(.*?)\$\$", re.DOTALL)


class MathBlockLexer(mistune.BlockLexer):
    default_rules = ['block_math'] + mistune.BlockLexer.default_rules

    def __init__(self, rules=None, **kwargs):
        if rules is None:
            rules = MathBlockGrammar()
        super(MathBlockLexer, self).__init__(rules, **kwargs)

    def parse_block_math(self, m):
        """Parse a $$math$$ block"""
        self.tokens.append({'type': 'block_math', 'text': m.group(1)})


class MarkdownWithMath(mistune.Markdown):
    def __init__(self, renderer, **kwargs):
        kwargs['block'] = MathBlockLexer
        super(MarkdownWithMath, self).__init__(renderer, **kwargs)

    def output_block_math(self):
        return self.renderer.block_math(self.token['text'])


class PythonDocxRenderer(mistune.Renderer):
    """Render Markdown as python-docx statements, executed later with exec()."""

    def __init__(self, **kwds):
        super(PythonDocxRenderer, self).__init__(**kwds)
        self.table_memory = []  # cell contents collected by table_cell()
        self.img_counter = 0  # numbers the equation images written to tmp/

    def header(self, text, level, raw):
        return "p = document.add_heading('', %d)\n" % (level - 1) + text

    def paragraph(self, text):
        if 'add_picture' in text:
            return text
        add_break = '' if text.endswith(':")\n') else 'p.add_run().add_break()'
        return '\n'.join(('p = document.add_paragraph()', text, add_break)) + '\n'

    def list(self, body, ordered):
        return body + '\np.add_run().add_break()\n'

    def list_item(self, text):
        return '\n'.join(("p = document.add_paragraph('', style = 'BasicUserList')", text))

    def table(self, header, body):
        # Assumes mistune's default table_row() wraps a row as '<tr>\n...</tr>\n'
        # and that each cell renders to one line: the header holds (cols + 2) newlines.
        number_cols = header.count('\n') - 2
        number_rows = len(self.table_memory) // number_cols
        # pop(0)[1:] turns 'p.add_run(...)' into '.add_run(...)' on the cell paragraph.
        cells = ["table.rows[%d].cells[%d].paragraphs[0]%s\n" % (i, j, self.table_memory.pop(0)[1:])
                 for i, j in itertools.product(range(number_rows), range(number_cols))]
        return '\n'.join(["table = document.add_table(rows=%d, cols=%d, style = 'BasicUserTable')" % (number_rows, number_cols)] + cells) + 'document.add_paragraph().add_run().add_break()\n'

    def table_cell(self, content, **flags):
        self.table_memory.append(content)
        return content

    # SPAN LEVEL
    def text(self, text):
        return "p.add_run(\"%s\")\n" % text

    def emphasis(self, text):
        # Drop the trailing newline of the run statement and make the run italic.
        return text[:-1] + '.italic = True\n'

    def double_emphasis(self, text):
        return text[:-1] + '.bold = True\n'

    def block_code(self, code, language):
        code = code.replace('\n', '\\n')
        return "p = document.add_paragraph()\np.add_run(\"%s\")\np.style = 'BasicUserQuote'\np.add_run().add_break()\n" % code

    def link(self, link, title, content):
        # content is already a run statement: append the URL as a separate run
        # so that the generated code remains valid Python.
        return content + "p.add_run(\" (%s)\")\n" % link

    def image(self, src, title, alt_text):
        return '\n'.join((
            "p = document.add_paragraph()",
            "p.alignment = WD_ALIGN_PARAGRAPH.CENTER",
            "p.paragraph_format.space_after = Pt(18)",
            "run = p.add_run()",
            # Equation images rendered into tmp/ keep their natural size.
            "run.add_picture(\'%s\')" % src if "tmp" in src else "run.add_picture(\'%s\', width=Cm(15))" % src,
            "run.add_break()",
            "run.add_text(\'%s\')" % alt_text,
            "run.font.italic = True",
            "run.add_break()"
        )) + '\n'

    def hrule(self):
        return "document.add_page_break()\n"

    def block_math(self, text):
        import sympy  # only needed when the input contains equations
        if not os.path.exists('tmp'):
            os.makedirs('tmp')
        filename = 'tmp/tmp%d.png' % self.img_counter
        self.img_counter = self.img_counter + 1
        sympy.preview(r'$$%s$$' % text, output='png', viewer='file', filename=filename, euler=False)
        return self.image(filename, None, "Equation " + str(self.img_counter - 1))

parser = argparse.ArgumentParser(description='Generate Docx reports using a Docx reference template and Markdown files')
parser.add_argument('output', help='Output file')
parser.add_argument('--template', default=None, help='Docx template')
parser.add_argument('--files', default="*.md", help='Glob pattern for Markdown files')
args = parser.parse_args()

document = Document(os.path.abspath(args.template)) if args.template else Document()

T = []

for part in sorted(glob.glob(args.files)):
    with open(part, 'r', encoding="utf-8") as f:
        T.append(f.read())

renderer = PythonDocxRenderer()

# The renderer returns python-docx statements; exec() runs them against `document`.
exec(MarkdownWithMath(renderer=renderer)('\n'.join(T)))
document.save(os.path.abspath(args.output))
if os.path.exists('tmp'):
    shutil.rmtree('tmp')
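The central design choice in `generate_doc.py` is that the renderer never touches the document directly: each Mistune callback returns a string of python-docx statements, and the concatenated program is passed to `exec()`, where it finds `document` in the module globals. The stripped-down sketch below reproduces that pattern. `FakeDocument`, `FakeParagraph`, `render_paragraph` and `build` are hypothetical names, and the stubs stand in for python-docx so the example runs without it.

```python
class FakeParagraph:
    """Records the text of each run, loosely mimicking python-docx's Paragraph."""

    def __init__(self):
        self.runs = []

    def add_run(self, text=""):
        # python-docx returns a Run object here; the stub only stores the text.
        self.runs.append(text)


class FakeDocument:
    """Hypothetical stand-in for docx.Document that only tracks paragraphs."""

    def __init__(self):
        self.paragraphs = []

    def add_paragraph(self):
        paragraph = FakeParagraph()
        self.paragraphs.append(paragraph)
        return paragraph


def render_paragraph(text):
    """Emit source code similar to PythonDocxRenderer.paragraph() combined with text()."""
    return 'p = document.add_paragraph()\np.add_run("%s")\n' % text


def build(texts):
    """Generate a small program, then execute it against a fresh document."""
    program = "".join(render_paragraph(t) for t in texts)
    namespace = {"document": FakeDocument()}
    exec(program, namespace)  # the generated code looks up `document` in namespace
    return namespace["document"]
```

Generating code keeps each renderer method a one-liner, but it is also why quotes in the Markdown must be escaped; calling python-docx directly from the renderer methods would avoid that step.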
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
mistune>=0.7.1
python-docx>=0.8.6
sympy>=0.7.6.1
