# Writing math on the web

This is a huge topic that I expect to revisit very often, always in very small chunks.

Question: How can we create, exchange, and publish mathematical writing in a modern, web-friendly way?

Many, many people have written about this question, and there are hundreds of articles and software packages that address it. I will only attempt to address a small part of it now, and to do so, let me break it down into three parts.

(1) how do we create document structure?
(2) how do we include mathematics equations?
(3) how do we make diagrams?

Question (2) has essentially been solved by MathJax. MathJax is great: it lets us use the language we are familiar with, $\LaTeX$, and it gives us the kind of output we are familiar with (just take a look around this site). Of course, there is still some room for improvement in typesetting, portability, copying and pasting, and so on. But I generally assume that these issues will be addressed by the very effective MathJax developers as web standards become more powerful. That is why I am willing to think of this problem as done.

But Question (1) is particularly frustrating, because on the one hand it seems like we should have the tools to solve it, and on the other hand there is no clear way to proceed. The simplest idea is to try to force $\LaTeX$ to compile to HTML (see, for instance, tex4ht). Although I do not want to discuss it at length, this idea is hopeless. In short: to support all of $\LaTeX$’s obscure commands and packages, you have to use the $\LaTeX$ binary. But this binary outputs DVI, and by that point all metadata and structural information has been lost.

So let’s leave $\LaTeX$ behind and start from scratch for a moment.

In preparing for this post, I decided to investigate how far we can go using only the tools natively provided by the browser. I was surprised to find just how much can be accomplished. For instance, consider the following code and output:

```html
<theorem>
There exists a prime number.
</theorem>
<proof>
Consider the number $\sqrt{9}$.
</proof>
```

**Theorem.** *There exists a prime number.*

*Proof.* Consider the number $\sqrt{9}$. ◻

If you take a look at the CSS code (below) that I used to do this, you will not find it difficult to imagine how to handle lemmas and propositions, the title, author, section headings, and so on. It’s even possible to make really simple macros, so that for instance `<qedsymbol/>` will produce a ◻ symbol anywhere.

Moreover, this approach has numerous advantages. First, it is so simple that it can be very easily converted to $\LaTeX$ and back again. Second, it is completely robust in the sense that the source code reflects all of the document structure. (This is not the case, for instance, with Markdown, where one would have to manually write `**Theorem.**` to get the bold theorem heading.) Finally, it interacts very well with WordPress and with MathJax, since there is no additional processing required.

Unfortunately, this idea is still not very powerful. If users had access only to the HTML code and not to the CSS code, they would not be able to define their own macros. They could not create their own theorem environments either. And while CSS counters can number theorems automatically, pure CSS offers no way to refer back to a numbered theorem from elsewhere in the text.

It is easy to think of various ways of adding all these features by extending the language, perhaps using a more powerful subset of XML and its styles, or perhaps using a more natural language. Either way, we are now going beyond what a browser will natively handle, and so we will have to write a lot of code: either on the server side to distill the language into straight HTML+CSS, or else JavaScript to interpret and render the language in the browser.
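To make the server-side option concrete, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It assumes a hypothetical markup in which each `<theorem>` may carry an `id`, and a cross-reference is written `<ref to="..."/>`; neither the tag nor the attribute is standard, and a real system would also need lemmas, sections, and so on. The script numbers theorems in document order and turns each reference into a link, printing `??` for an unknown label the way $\LaTeX$ does.

```python
import xml.etree.ElementTree as ET


def distill(source: str) -> str:
    """Number <theorem> elements and resolve <ref to="..."/> links.

    The <theorem id="..."> and <ref to="..."/> markup is hypothetical;
    the output uses only ordinary HTML elements (div, b, a).
    """
    root = ET.fromstring(source)
    numbers = {}
    # First pass: number theorems in document order and give each one a
    # bold "Theorem n." heading, turning it into a plain HTML <div>.
    for n, thm in enumerate(list(root.iter("theorem")), start=1):
        if thm.get("id"):
            numbers[thm.get("id")] = n
        heading = ET.Element("b")
        heading.text = f"Theorem {n}."
        heading.tail = " " + (thm.text or "")
        thm.text = None
        thm.insert(0, heading)
        thm.tag = "div"
        thm.set("class", "theorem")
    # Second pass: replace each <ref to="label"/> with a link showing the
    # theorem number, or "??" if the label is unknown.
    for ref in list(root.iter("ref")):
        label = ref.attrib.pop("to", "")
        ref.tag = "a"
        ref.set("href", "#" + label)
        ref.text = f"Theorem {numbers.get(label, '??')}"
    return ET.tostring(root, encoding="unicode")


doc = """<article>
<theorem id="primes">There exists a prime number.</theorem>
<p>By <ref to="primes"/>, the set of primes is nonempty.</p>
<theorem id="two">There exist two primes.</theorem>
</article>"""
print(distill(doc))
```

Because every number is collected before any reference is resolved, forward references also work in a single run, which is what $\LaTeX$ needs its `.aux` file and a second compilation for.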

But however we go about it, I think we will have to start making arbitrary decisions very rapidly, and this is where most current approaches become highly specialized and rather unpalatable. The right approach is the one that properly balances power with simplicity and portability. That is, it should be user-friendly, compatible with browsers, interchangeable with $\LaTeX$ and other markups, and easily published to EPUB and PDF.

I’d be grateful for any feedback! (I’m waiting for the inevitable links to people who have supposedly already solved this problem.) Meanwhile, I plan to actually start extending and using the CSS code from my example!

PS: Here’s that CSS code.

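```css
/* Custom elements are inline by default, so render them as blocks. */
theorem, proof {
  display: block;
  margin: 12px 0;
}
/* Theorem statements are set in italics. */
theorem {
  font-style: italic;
}
/* Generated heading: upright bold "Theorem." before the statement. */
theorem::before {
  content: "Theorem. ";
  font-weight: bold;
  font-style: normal;
}
/* Italic "Proof." heading. */
proof::before {
  content: "Proof.";
  font-style: italic;
}
/* End-of-proof box (U+25FB WHITE MEDIUM SQUARE), pushed to the right. */
proof::after {
  content: "\25FB";
  float: right;
}
```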


1. Posted December 7, 2011 at 9:54 pm

I love it! It’s a fantastic idea: simple, elegant, versatile.

(Especially since it should work perfectly with Markdown.)

It’s also great since it allows anchoring and linking (another thing markdown is missing). Who needs equation numbering when you can link!

Now my question is: can we find an elegant way to make tables? Arrays are one thing (they work in MathJax), but tables...

---

A thought on the side: Felix once pointed me to OMeta. The idea being: if you’re so eager to change my markup language, program your own.

Why am I mentioning this? You wrote that it “can be very easily converted to $\LaTeX$ and back again”; let’s make sure that this is actually the case!

That is, let’s write a converter alongside the development of some exemplary CSS, or alternatively something that handles any non-standard CSS class by converting it to `\begin{class} … \end{class}`. I don’t know; we’ll see.
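Here is a minimal Python sketch of that second option, assuming the source is well-formed XML (so, for example, a `<` or `&` inside math would have to be escaped). Every element becomes a $\LaTeX$ environment of the same name, and, as an extra assumption of the sketch, an element with no content at all, such as `<qedsymbol/>`, becomes a command like `\qedsymbol{}`. Attributes are ignored.

```python
import xml.etree.ElementTree as ET


def to_latex(elem: ET.Element) -> str:
    """Render one element (and its descendants) as LaTeX."""
    # Hypothetical convention: an empty element is a macro, so
    # <qedsymbol/> becomes \qedsymbol{}.
    if elem.text is None and len(elem) == 0:
        return "\\" + elem.tag + "{}"
    body = (elem.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in elem
    )
    return f"\\begin{{{elem.tag}}}{body}\\end{{{elem.tag}}}"


def convert(source: str) -> str:
    """Convert a fragment of custom tags to LaTeX; attributes are ignored."""
    # Wrap the fragment in a root element so that several top-level
    # environments parse as a single XML tree.
    root = ET.fromstring("<root>" + source + "</root>")
    return (root.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in root
    )


print(convert("<theorem>There exists a prime number.</theorem>\n"
              "<proof>Consider the number $\\sqrt{9}$.<qedsymbol/></proof>"))
```

Text between and around elements is copied through unchanged, which is why the dollar-sign math survives untouched.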

---

My usual final thought: eat our own dog food, i.e., focus on *writing*. Just yesterday I left a comment at bit-player.org about how $\TeX$’s prowess has left most people confused: they mistake good typesetting for good writing.

• Posted December 7, 2011 at 9:55 pm

Oooooooh. That’s creepy what MathJax does with a begin/end environment… I didn’t know that.

• sam
Posted December 8, 2011 at 12:37 am

First, why does MathJax do that? It’s a MathJax error, but why?

```latex
\begin{theorem}
hi
\end{theorem}
```

Meanwhile, yes, I totally agree that we should maintain a bidirectional $\LaTeX$ converter as we go. I would go further: it must produce very readable code in both directions. This will force us to keep the language simple and well structured.
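For the $\LaTeX$-to-tags direction of such a bi-converter, a minimal Python sketch can get surprisingly far with one regular expression: each `\begin{name}…\end{name}` pair becomes `<name>…</name>`. It only handles environment names made of word characters (so not `align*`), and it cannot correctly separate an environment nested inside another of the same name; a real converter would need an actual $\LaTeX$ parser. Those limits belong to the sketch, not to the idea.

```python
import re

# \begin{name} ... \end{name}, where the backreference \1 forces the closing
# name to match the opening one; DOTALL lets the body span several lines.
ENV = re.compile(r"\\begin\{(\w+)\}(.*?)\\end\{\1\}", re.DOTALL)


def from_latex(source: str) -> str:
    """Turn LaTeX environments into custom tags of the same name."""
    previous = None
    # Each pass converts the environments it meets scanning left to right;
    # anything nested inside a converted environment is caught on a later
    # pass, so repeat until nothing changes.
    while previous != source:
        previous = source
        source = ENV.sub(r"<\1>\2</\1>", source)
    return source


print(from_latex("\\begin{proof}\nConsider the number $\\sqrt{9}$.\n\\end{proof}"))
```

Everything outside an environment, including inline math, passes through unchanged, which keeps the output as readable as the input.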

As for tables, let me lump that in with figures: I don’t know how to do it! The & symbol in $\LaTeX$ doesn’t seem like a particularly modern idea.

• sam
Posted December 8, 2011 at 5:43 pm

It turns out MathJax does support environments, but it seems that the current behavior is just a prototype.

For instance, you can write `\newenvironment{brackets}{[}{]}`, but you have to write it inside math mode. Then if you say `\begin{brackets}…\end{brackets}`, you get the contents wrapped in square brackets. For example, the input `$\newenvironment{brackets}{[}{]}$\begin{brackets}…\end{brackets}` renders as something like [ … ].

Interesting!

2. Posted December 8, 2011 at 7:09 pm

There have been projects doing this for a very long time; the ones I remember best are TeXML and the independent but very similar DocBook. These are SGML languages compatible with XML, so they can be rendered in any browser (with an appropriate DTD and CSS).

The main issue is more fundamental. The SGML family is based on the principle of separating the document structure from the document presentation, ideally letting authors worry only about the structure and publishers worry only about the presentation. The TeX family works the other way around: TeX is all about document presentation, and LaTeX and ConTeXt came much later to add some document structure. The two ideas end up with a similar kind of hybrid product for end users, but the two base philosophies have some deep incompatibilities that make back-and-forth translation much harder than it should be.

I think it’s all there by now; it just needs to be put together. Equation content is very sensitive to presentation; TeX syntax works very well for it, and MathJax does a great job rendering it. There could be significant improvements, but those will come with time as MathJax (and its competitors) grow and evolve. For diagrams, SVG is perfect, except that humans were not meant to write SVG (or just about any other SGML language). We need some kind of front end for SVG, much like PGF/TikZ or Xy-pic for TeX.
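To illustrate what such a front end might do, here is a minimal Python sketch with a made-up input format (it is not the syntax of TikZ or Xy-pic): the author lists objects on a grid and the arrows between them, and the function writes the SVG. Labels are plain text, since the sketch makes no attempt to get MathJax to typeset inside the SVG.

```python
from xml.sax.saxutils import escape


def diagram(nodes: dict, arrows: list, step: int = 100) -> str:
    """Write SVG for a simple grid diagram.

    nodes maps a name to (column, row, label); arrows is a list of
    (source, target) names. This input format is hypothetical.
    """
    def center(name):
        col, row, _ = nodes[name]
        return 50 + col * step, 50 + row * step

    width = 100 + step * max(col for col, _, _ in nodes.values())
    height = 100 + step * max(row for _, row, _ in nodes.values())
    out = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        # One reusable arrowhead, drawn at the end of every line.
        '<defs><marker id="head" markerWidth="8" markerHeight="8" refX="8" '
        'refY="4" orient="auto"><path d="M0,0 L8,4 L0,8 z"/></marker></defs>',
    ]
    for src, dst in arrows:
        (x1, y1), (x2, y2) = center(src), center(dst)
        dx, dy = x2 - x1, y2 - y1
        # Trim 20% off each end so arrows do not run into the labels.
        out.append(
            f'<line x1="{x1 + 0.2 * dx:g}" y1="{y1 + 0.2 * dy:g}" '
            f'x2="{x2 - 0.2 * dx:g}" y2="{y2 - 0.2 * dy:g}" '
            'stroke="black" marker-end="url(#head)"/>'
        )
    for name, (_, _, label) in nodes.items():
        x, y = center(name)
        out.append(
            f'<text x="{x}" y="{y}" text-anchor="middle" '
            f'dominant-baseline="middle">{escape(label)}</text>'
        )
    out.append("</svg>")
    return "\n".join(out)


# A commutative triangle: A -> B -> C and A -> C.
print(diagram({"A": (0, 0, "A"), "B": (1, 0, "B"), "C": (1, 1, "C")},
              [("A", "B"), ("B", "C"), ("A", "C")]))
```

The author describes the diagram in a few lines and never touches coordinates of arrowheads or text anchors, which is the division of labor a TikZ-like front end would aim for.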

As for document structure, it wouldn’t be too difficult to extend the basic TeXML DTD (or some alternative) in the same way that LaTeX document classes and packages extend the functionality of plain TeX. Publishers (in the very broad sense) could then supply the necessary CSS for document presentation. Of course, nobody wants to write (correct) SGML, so we would need a front end to simplify authorship. Some extended markdown or wiki-style markup would work, or something like the wordpress editor that adds all the predictable tags and attributes for you when you publish your document. The solutions are all out there, but they are not here yet…

This sounds great, but there is one major issue to deal with: Mathematicians have been forcibly accustomed to dealing with presentation issues by TeX et al. It will take serious effort to let go…

• sam
Posted December 8, 2011 at 7:20 pm

Yes, I agree that it is all there and needs to be put together. I like the idea of using the SGML/DTD/CSS as a sort of unseen format, between the markup and the output. Thus it is just the markup that needs to be bi-compatible with LaTeX, and not necessarily the XML.

I’m thinking that wiki and markdown are good, but very structure-forgetful. We need something that is structure-preserving, but not as inhuman/ugly as a full-blown XML.

I never got around to discussing diagrams, but you are completely right that SVG is where it’s at. Of course, MathJax needs to go inside SVG and convert dollar signs (or maybe it already does). I believe PGF/TikZ does support SVG output, but I’m not sure how to actually do it! I wonder if there is any in-browser demonstration of this anywhere.

3. Posted December 8, 2011 at 8:20 pm

[Note: I did not see Francois' comment before writing this one, so please excuse some redundancies.]

I think abandoning LaTeX to describe document structure and using a more modern, more flexible, and more web-centric data format instead is exactly the right way to go. I also agree that Markdown does not offer enough structure here (even though alternatives like Textile are a bit more powerful in that regard). There are a couple of points I would like to add, though.

1) I think it is imperative that the published document should conform to a widely adopted standard. (The document that the author writes is a different matter though!) The new XML tags you suggest do not lead to a valid HTML document (even though CSS can cope with them). This is no fundamental problem, however: Your new tags only have to be qualified with a new namespace (and, ideally, associated Doctype and XML Schema definitions).
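A minimal Python sketch of what namespacing buys, using the standard library's ElementTree. The namespace URIs are placeholders under example.org, not a real schema. Elements are matched by namespace URI rather than by prefix, so another author's `theorem` element in a different namespace is never confused with ours.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (example.org is reserved for examples).
MATH_NS = "http://example.org/ns/math"
OTHER_NS = "http://example.org/ns/other"


def theorems(document: str) -> list:
    """Return the text of every theorem element in MATH_NS."""
    root = ET.fromstring(document)
    # The prefix "m" is only a local alias for MATH_NS: the query matches
    # the namespace URI, whatever prefix the document itself uses.
    return [t.text for t in root.iterfind(".//m:theorem", {"m": MATH_NS})]


doc = f"""<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="{MATH_NS}" xmlns:o="{OTHER_NS}">
<body>
<m:theorem>There exists a prime number.</m:theorem>
<o:theorem>A theorem from someone else's package.</o:theorem>
<p>An ordinary XHTML paragraph.</p>
</body>
</html>"""
print(theorems(doc))  # ['There exists a prime number.']
```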

2) XSLT is the standard transformation language defined by the World Wide Web Consortium (W3C) to handle the kind of transformations you have in mind, e.g., creating properly formatted Theorem environments from theorem tags, including cross-references and numbering! Modern browsers support XSLT in the same way they support CSS. This means that you can have the browser transform your HTML + custom extensions document into a valid HTML-only document on the fly.

XSLT plays the same role for XML that LaTeX packages play for LaTeX. The difference is mainly that XSLT is much more sophisticated. For example, under certain conditions, it is possible to ensure in advance that the transformation process will not “throw errors” and that the result will conform to the desired document type definition (i.e., that the result will be valid HTML). Also, when using XML namespaces properly, you can guarantee that two different XSLT stylesheets will not interfere with each other.

The drawback of the XSLT transformation language is that it is extremely verbose. Just like MathML, it was simply not meant to be written by hand. You need a development environment targeted specifically at XSLT – which is why this technology never reached the widespread adoption that CSS enjoys.

In summary, I think that while XSLT is the “right” solution for this type of problem, it is not practical for everyday use, in that users cannot “quickly” write a custom environment. Developing a large repository of standard XSLT stylesheets for authors to use (similarly to standard LaTeX packages) might be an option, and there are a number of projects that take this route. The most prominent one is DocBook, which does not target mathematics, however. (I will see if I can dig up links to the more math-centric projects later today.)

3) I think customizability for the author is crucial, so I do not think that, ultimately, XSLT is the way to go. The way out is to use a custom file format for authoring and then transform the result into a standard format for publication (manually by the author, automatically on the server, or on the client via JavaScript). There is a large variety of software that attempts to do this through various combinations of the following two approaches:

* Use a compiler to transform plain text input.
* Use a rich editor to ease the editing of complex documents.

In the following, I want to list just a few of the projects that are out there. All of them are very valuable, but I do not think that there is a perfect solution to this problem yet.

i) Skribe is one of the math-y projects. Here, the document is represented as a variant of a Scheme S-expression (a much more lightweight syntax than XML), which is then transformed to HTML by a solid standard library that the author can easily extend with Scheme functions. This is a great solution if you want to commit yourself to a custom document format that you edit in a plain text editor. Downside: not actively maintained.

ii) TeXmacs is a great editor that is both structural and WYSIWYG at the same time. (You have to try it to believe it.) It exports to HTML, LaTeX, a custom XML format, and a Scheme S-expression. A plugin exporting directly to WordPress could certainly be written. This is a great solution if you are ready to commit yourself to a custom document format and want to use a rich editor instead of editing plain text. Downside: the community is relatively small.

iii) I myself am working on an editor called Qute that is going to see a massive overhaul in the next few months. The idea here is to allow the user to switch between plain text editing and a higher-level presentation at will. In the next version I plan to integrate OMeta, allowing the user to use whatever document format he or she prefers and, moreover, to customize that document format in any way. Philosophically, I think this is the way to go: empower authors to customize the language they use for writing by using modern text translation tools. Downside: Qute is certainly not ready for primetime yet; I am looking for collaborators, though.

So much for the math-centric projects. But this list would not be complete without mentioning how the Web 2.0 people generate HTML.

iv) In modern web development, HTML is (of course) not written by hand, and it is not generated by XSLT. However, templating tools such as PHP or Django are not the latest fashion either. Instead, domain-specific languages that are subsets of the host language (on either the server or the client side) are used to generate HTML from application-specific data. Examples include the way HTML is a first-class citizen of the Scala programming language, or the way Hiccup generates HTML directly from Clojure code. There is also a large variety of client-side JavaScript libraries for templating. Some of them are tied to compilers that translate functional programming languages such as Clojure, OCaml, or Haskell (which lend themselves to building domain-specific languages, e.g., for writing mathematical texts) into JavaScript.
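As a rough illustration of that style, here is a Python analog of Hiccup's idea (not Hiccup itself, and not any particular library): HTML is written as nested lists of the form `[tag, optional attribute dict, *children]`, and a small helper such as the hypothetical `theorem` function below acts as a domain-specific construct for mathematical text.

```python
from html import escape


def render(node) -> str:
    """Render a string, or a list [tag, optional attrs dict, *children]."""
    if isinstance(node, str):
        # Escape &, < and > in text; $...$ math passes through for MathJax.
        return escape(node, quote=False)
    tag, *rest = node
    attrs = rest.pop(0) if rest and isinstance(rest[0], dict) else {}
    attr_text = "".join(f' {k}="{escape(str(v))}"' for k, v in attrs.items())
    inner = "".join(render(child) for child in rest)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


def theorem(*body):
    """Hypothetical DSL helper: authors call theorem(...) instead of writing HTML."""
    return ["div", {"class": "theorem"}, ["b", "Theorem."], " ", *body]


print(render(theorem("There exists a prime number.")))
```

Because the document is ordinary data in the host language, an author's "macro" is just a function, which is the kind of customizability point 3) asks for.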

There is a lot of software I have not mentioned yet. But I guess I will stop here, because this comment is far too long already.

Bottom line: I think the idea of using a format made up of Markdown/Textile for formatting, TeX/MathJax for math, and custom XML for environments is very viable, and using the right combination of CSS + XSLT stylesheets, WordPress plugins, and JavaScript, you can certainly produce standards-compliant webpages that have all the features of LaTeX documents (and more). In this comment, I mainly wanted to point out all kinds of software that try to solve similar problems. Nonetheless, I think the problem of writing math on the web is far from solved, and I am looking forward to seeing what comes out of your experiments!

4. Posted December 9, 2011 at 12:47 pm

Arg. I wrote a long comment and lost it by crashing my laptop… Ok, quick one then.

First, check out http://jsxgraph.uni-bayreuth.de/wp/ to see what’s possible right now (and say “wow!”; it even works in EPUBs on the iPad).

Second, I think “presentation” is a thing of the past. That is, for authors; where it’s needed, it should be left to the professionals, i.e., typesetters, web designers, ebook specialists, etc. But authors should stop thinking about presentation.

What do I mean? I think the new question is: how can my two proverbial readers read what I write any way they want? Tiny smartphone screen, huge display, ebook, print, whatever. In other words, I want to author my content regardless, trusting my mature, competent readers to know how they want to present my content.

Finally, PGF/TikZ etc. I’m a burned child: I loved TikZ until I realized that my awesomesauce typesetting couldn’t make up for poor mathematical writing.

It seems to me that, historically, TeX produced many incredible tools only because there were no tools outside of TeX. For example, function plotting. A lot of (outspoken) people seem to think that because you can do everything in TeX you should do everything in TeX.

I think there are very few graphical things that should be done in TeX (commutative diagrams perhaps). As the internet saying goes: do what you do best and link to the rest. Graphics should be produced by R, Octave, or manually with Inkscape, and then included. Why? Because if you use the right tools, you can produce Felix’s poster.

And now I’ll hit “post” before my battery dies again…

5. Samuel Coskey
Posted December 26, 2011 at 8:17 pm

Hey Felix!

Sorry I never responded to this post. It got busy at the end of the semester and I’m just now catching up on all the details from this discussion.

Let me start with your bottom line. Thank you for adding XSLT to the discussion; I completely agree that some combination of CSS, XSLT, and JavaScript will solve this problem. Although you rightly point out that many people are working on similar problems, I really don’t see anybody working on exactly this kind of project. The closest is indeed DocBook, so I will have to see what I can learn from it.

What I mainly take issue with is your point (3), customizability for authors. Of course this should be possible, but in my opinion only by extending a base language in simple and expected ways. The simple reason for this is collaboration: two authors should be able to quickly pass source code back and forth. This is usually possible in LaTeX, as long as our personal definitions don’t clash (or if we use namespaces, as you suggest).

As such, I agree with your partial-conclusion that XSLT+CSS+js is basically what I need to use now. I really like the idea of creating a library of mathematical stylesheets, so that users can include them without reading or understanding them!

I guess the biggest problem is allowing users to extend the standard stylesheets simply and easily, without learning anything advanced. I suppose JavaScript could convert some of the markup into additional XSLT directives. For now, I’m working only on the very, very basics.