TeX Math Support #92

tajmone · 2022-11-04T08:47:11Z

While working on the pandoc to PML writer I had a chance to explore all the features supported by pandoc documents, including those which don't currently have a counterpart in PML. One of these features is math notation.

Pandoc is a tool widely adopted by academics for writing scientific papers, which is why so much energy has been invested in features like bibliography and math notation.

Pandoc supports TeX math in most document formats, and offers a variety of rendering options and sub-formats for HTML output.

Markdown Math Markup: Inline vs Display

In pandoc markdown, inline math is enclosed between two $ symbols (with no whitespace between the delimiters and the content's edges), and display math between double dollars $$. The math contents are then passed through verbatim to the output document, with the dollars replaced by the appropriate delimiters for the target format.

This notation also works here on GitHub. For example, the following markdown code:

$$
\begin{vmatrix}
  a & b\\
  c & d
\end{vmatrix}
=ad-bc
$$

is rendered as:

$$ \begin{vmatrix} a & b\\ c & d \end{vmatrix} =ad-bc $$
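
To make the delimiter substitution concrete, here is a minimal Python sketch. It is not how pandoc parses math (pandoc uses a real parser, not regexes); the function name `convert_math` and the default target delimiters are assumptions chosen for illustration only.

```python
import re

# Display math: $$ ... $$ (may span several lines; whitespace at the edges is allowed).
DISPLAY = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)

# Inline math: $...$ with no whitespace just inside the delimiters,
# and no digit right after the closing dollar (so "$5" stays a price).
INLINE = re.compile(r"(?<![\\$])\$(?=\S)([^$\n]+?)(?<=\S)\$(?![\d$])")


def convert_math(text, inline=(r"\(", r"\)"), display=(r"\[", r"\]")):
    """Replace dollar delimiters with the target format's delimiters.

    The math contents themselves are passed through verbatim.
    """
    def repl_display(match):
        return display[0] + match.group(1).strip() + display[1]

    def repl_inline(match):
        return inline[0] + match.group(1) + inline[1]

    # Display math must be handled first, otherwise "$$" would be
    # misread as two inline delimiters.
    text = DISPLAY.sub(repl_display, text)
    return INLINE.sub(repl_inline, text)


if __name__ == "__main__":
    print(convert_math("Euler: $e^{i\\pi}+1=0$ costs $5 today"))
```

Replacement functions are used instead of replacement strings so that the backslashes in the TeX source and in the delimiters are never interpreted as regex escapes.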

Math Notations & Math Rendering

How each output format handles the math contents is format-dependent, and for HTML the user can pick different renderers via CLI options: MathJax, MathML, WebTeX, KaTeX, or GladTeX.
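
As a small illustration, the renderer choice maps onto pandoc command-line flags. The Python sketch below only builds the argument list; the helper name `pandoc_html_command` and the file names are hypothetical, while the flags themselves are pandoc's own.

```python
# Pandoc CLI flags for the HTML math renderers listed above.
MATH_FLAGS = {
    "mathjax": "--mathjax",
    "mathml": "--mathml",
    "webtex": "--webtex",
    "katex": "--katex",
    "gladtex": "--gladtex",
}


def pandoc_html_command(source, target, renderer="mathjax"):
    """Build the argument list for a standalone HTML conversion."""
    flag = MATH_FLAGS[renderer.lower()]
    return ["pandoc", "--standalone", flag, "--output", target, source]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(pandoc_html_command("math.markdown", "math.html")))
```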

MathJax seems to be the common choice for HTML rendering, unless there are special needs that require using another renderer.

Although I don't understand the mathematical notation and the symbols used in these formulas, and I have no experience with TeX, what I've understood so far is that although most of these notations were born in the "TeX world", they have now taken on a life of their own, being adopted by different languages and tools.

So, the reason for multiple math notations seems to be either different target domains (formula conventions), the independent development of alternative notation standards, or both. Either way, pandoc treats these notations the same, i.e. they share the same markdown markup when it comes to marking them within the source text, with one form for inline math and another for display math (which, I believe, are two different ways of presenting formulas depending on their complexity, the former letting them flow inline with the text, the latter treating them as figure-like blocks).

Basically, all formats preserve the math notation contents verbatim, each format differing only in the adopted notation to delimit them (always making a distinction between inline and display math). The way these math formulas are rendered in the final document depends on how the conversion tools of each target format work: some formats might use external tools or dedicated extensions to process them, whereas formats like HTML might rely on a JS library to render them in the browser.

In many respects, this is similar to how diagram notations work, i.e. an attempt to represent via ASCII chars complex contents that will either be represented with advanced Unicode characters or rendered into images.

PML Current Status

For my current pandoc PML writer, it looks like my best choice is to rely on the [verbatim node and just pass the math contents through, using the standard HTML delimiters expected by the MathJax rendering library. MathJax is a JS script that can be linked into the document via a CDN; it then parses the HTML doc to detect the math contents and renders them either as Unicode or SVG images (depending on the features supported by the browser).

Math notations in HTML only need to be enclosed within the delimiters that MathJax expects by default, i.e.:

\(..\) for inline math.
\[..\] for display math.

The MathJax library will then render them as needed.

Future PML Support for Math

Although the [verbatim node solution proposed above works fine for documents targeting HTML, it won't work with other output formats.

So, ideally PML should implement a [math node (or rather two, one for inline math and another for display math, e.g. [math and [dmath, since the former is an inline node and the latter a block node), and provide attributes to control the notation used in each node.

Since the topic of math notation is fairly broad, and there are many different notations and rendering tools, this feature might require a bit of research in order to choose wisely the node names and their attributes, especially in view of future output formats supported by PMLC, and how custom scripts and extensions might need to manipulate these nodes. But overall, it seems quite a straightforward issue: you only need to distinguish between the inline and display node types, and pass them through to the final document verbatim, enclosing them with the correct delimiters, and applying any escaping rules required by the output format.
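
The "delimiters plus escaping" step could look roughly like the Python sketch below for an HTML target. This is not PMLC code: the function name, the `kind` values, and the `math inline` / `math display` class names are all assumptions made up for the example.

```python
import html

# Delimiters that MathJax recognizes by default in HTML pages.
DELIMITERS = {
    "inline": (r"\(", r"\)"),
    "display": (r"\[", r"\]"),
}


def render_math_html(tex, kind="inline"):
    """Wrap TeX source in delimiters, escaping only HTML-special characters."""
    opening, closing = DELIMITERS[kind]
    # The TeX itself is untouched apart from &, < and >, which would
    # otherwise be read as markup by the browser.
    body = html.escape(tex.strip(), quote=False)
    tag = "span" if kind == "inline" else "div"  # inline node vs block node
    return f'<{tag} class="math {kind}">{opening}{body}{closing}</{tag}>'


if __name__ == "__main__":
    print(render_math_html("a < b"))
    print(render_math_html(r"\begin{vmatrix} a & b\\ c & d \end{vmatrix}", "display"))
```

Other output formats would swap in their own delimiter table and escaping function, which is the main argument for keeping the node contents notation-agnostic.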

Initially, PMLC could support math in HTML documents by simply linking the required JS libraries into the generated HTML document when math nodes are found in it (choosing the library according to the math formats involved, since there are multiple such libraries covering the different formats). This shouldn't be too hard to implement, but you need to ensure that the injection of the JS library dependency is preserved in documents that use custom templates.

In the course of time, you might even consider looking into the various TeX math libraries and tools and have PMLC render these formulas as SVG images instead, so that the final document is fully standalone (you could even embed them via data URIs). I'm pretty confident that there are some Java tools to handle TeX math rendering, and with GraalVM you might even use existing libraries in the scripting languages it supports.
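
A rough Python sketch of the conditional injection follows. The function name and the `has_math` flag are hypothetical, and the CDN URL is an assumption (the MathJax 3 bundle as served by jsDelivr); a real implementation would hook into PMLC's template handling rather than patch the final string.

```python
# Assumed CDN location of the MathJax 3 bundle; adjust as needed.
MATHJAX_CDN = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"


def inject_mathjax(page, has_math):
    """Add the MathJax <script> before </head>, only if needed and only once."""
    if not has_math or MATHJAX_CDN in page:
        return page
    script = f'<script async src="{MATHJAX_CDN}"></script>\n'
    head_end = page.find("</head>")
    if head_end == -1:
        # Custom template without a <head>: prepend the script rather
        # than silently dropping the dependency.
        return script + page
    return page[:head_end] + script + page[head_end:]


if __name__ == "__main__":
    doc = "<html><head><title>Math</title></head><body>\\(x^2\\)</body></html>"
    print(inject_mathjax(doc, has_math=True))
```

The check for an existing reference keeps the function idempotent, which matters when a custom template already links the library itself.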
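
Embedding a pre-rendered formula as a data URI is simple once the SVG exists. The Python sketch below assumes some external tool has already produced the SVG string; both function names are hypothetical and no particular TeX renderer is implied.

```python
import base64
import html


def svg_to_data_uri(svg):
    """Encode an SVG document as a base64 data URI."""
    encoded = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    return f"data:image/svg+xml;base64,{encoded}"


def formula_img(svg, tex):
    """Build an <img> tag that embeds the SVG and keeps the TeX source as alt text."""
    alt = html.escape(tex, quote=True)
    return f'<img class="math" src="{svg_to_data_uri(svg)}" alt="{alt}">'


if __name__ == "__main__":
    # Placeholder SVG standing in for a rendered formula.
    placeholder = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
    print(formula_img(placeholder, "ad-bc"))
```

Keeping the TeX source in the `alt` attribute preserves the formula for screen readers and for copy and paste.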

Having dedicated math nodes means that their contents will work with any output format that PMLC will support in the future, not just HTML.

In any case, I think that support for math formulas is highly valued in the documentation field, and its adoption will greatly enhance the value of PML as a writing tool. In software engineering articles, especially when describing algorithms, these formulas are needed, which is why LaTeX became so popular among software engineers, and why pandoc is so highly appreciated among that user base.

tajmone (Author) · 2022-11-04T10:12:20Z

Lua Writer Success with TeX Math

Just wanted to inform you that now the pandoc to PML converter supports rendering TeX math elements to PML.

You can even test it yourself, with a full working example:

math.markdown — pandoc markdown source.
math.pml — output PML document.

I don't have a LiveHTML Preview link, since HTML output files are Git-ignored in the repo. But you can easily build it yourself via rake pandoc.

In the test file, I simply used a raw HTML block to load the MathJax dependency, and it works perfectly.

You can read more details about this in the Status document of the pandoc writer:

STATUS.md » TeX Math Support

pml-lang (Maintainer) · 2022-11-07T05:30:57Z

Thanks for this great and useful overview!

In many respects, this is similar to how diagram notations work, i.e. an attempt to represent via ASCII chars complex contents that will either be represented with advanced Unicode characters or rendered into images.

Exactly!

Besides maths, we should also natively support diagram notations. IMO a good first choice would be PlantUML (available as a .jar file that could be bundled with PMLC).
In Executing OS Programs there is an example of how script nodes can be used to embed PlantUML diagrams and mathematical formulas, but maths and diagrams surely deserve dedicated, native PML nodes to make this more convenient for end users.

ideally PML should implement a [math node (or rather two, one for inline math and another for display math, e.g. [math and [dmath, since the former is an inline node and the latter a block node), and provide attributes to control the notation used in each node.

Yes!

A first (easy to do) implementation would be to use two nodes (inline and block) and use MathJax to do the rendering (similar to the way PML currently supports highlighters).

this feature might require a bit of research in order to choose wisely the node names and their attributes, especially in view of future output formats supported by PMLC, and how custom scripts and extensions might need to manipulate these nodes

Indeed, that's the challenge.

But we can start with MathJax, and then refine to provide more options.

Having dedicated math nodes means that their contents will work with any output format that PMLC will support in the future, not just HTML.

Yes.

pml-lang (Maintainer) · 2022-11-07T05:32:04Z

now the pandoc to PML converter supports rendering TeX math elements to PML.

Awesome !!!

I simply used a raw HTML block to load the MathJax dependency

That's very clever!
