Author Archives: Peter Krautzberger

Bonus round: Why I care about native MathML

[This is week 7 of the challenge but really a post to make up for dropping the ball on week 5.]

Last week I wrote about why I care about MathML in general. Given that I work for a project that serves as a MathML polyfill, it’s worthwhile to point out why native implementations matter; they matter an entire alot of mattering.

A while back, Alex Miłowski asked me for some quotes about how native MathML implementations are important so luckily I can copy myself here.

It’s important.

Some people say, “few people on the web need MathML support.” This is true. Just like saying “few people need children’s clothing”.

Why is MathML important? Education, education, and education. Mathematics is a core skill and a vast amount of educational time and effort is spent on teaching children and adults to understand and apply math & science. Very soon, HTML will be the dominating delivery method for educational content across the world. This means mathematics must be HTML, viz. MathML.

HTML rendering should be native

Where should HTML rendering be implemented? In the browser!

MathML has been HTML from its inception and after a (forced) XML-detour, MathML is back where it belongs: a part of HTML5. MathML layout is core HTML functionality, widely used in everything from web communities to professional publishers to educational startups. HTML and thus MathML rendering belongs in the browser.

Performance

While browser vendors show great interest in enabling polyfills to behave like native implementations, polyfills implementing layout standards (MathML, Flexbox etc), in the end, will not achieve native performance. The reason is simple: layout polyfills simply enter too late in the game — after the browser layout is done, at a point where the user expects content, not additional rendering delays. Moore’s Law helps a little but, ultimately, performance issues will prevent math and science from fulfilling their potential on the web.

Robustness

Even the most advanced polyfilling technology will remain a JavaScript solution. This increases the risk of problematic interactions with regular scripts for design, user interaction, and styling. Native support will always be more robust for web developers and consumers.

Ubiquity

Even the most ideal polyfill will require a conscious choice of the web developer to load it. This poses a grave restriction for end users and the emergence of new platforms for math and science on the web. From webmail to web-based authoring to social networks, all of these could turn out to be highly productive platforms, but it’s unlikely their developers will consider adding a polyfill for a perceived niche. With native MathML rendering, rendering MathML would be universal.

The Future

The web has revolutionized how we communicate. Not by magic but because thought leaders continually push the envelope, building new tools and platforms that transform how we work, speak, and think. These innovations feed back into standards development, enabling everyone to benefit and restarting the process, pushing us further.

MathML 3 captures traditional mathematical typography. Thanks to polyfills, we get a glimpse of how MathML might develop, how it can revolutionize the communication and dissemination of scientific knowledge. Yet without native implementations of MathML 3, we will never see MathML 4, 5 or 10, and the opportunities this will open up.

It took 50 years from Gutenberg’s printing press to the first typeset mathematics book. We’re 25 years into the web. Do we wait another 25 years or can browser vendors finally invest 1-2 developer years to get us there?


Update.

First, I changed the embedded video; it was previously this one.

Second, over on Google+, Harald Hanche-Olsen asked about the claim that MathML is a huge success. Here’s what I responded with.

Re success of MathML. Today, almost all equational content is stored as MathML. This is because almost all scientific (including mathematical) publishers have switched to XML workflows for production and archival where MathML fits in very naturally; similarly most technical writing (e.g., aerospace) is done in XML workflows.

For authoring, it’s a bit more complicated. It is similar to, e.g., vector graphics where applications such as Adobe Illustrator have their own formats but when you save vector graphics for re-use you’ll most likely export to SVG.

As I mentioned, there’s definitely the need for a professional-grade, open source pure MathML editor (ideally HTML5). The only one I know of is MathFlow. But if you have ever used MathJax then you have authored MathML — it’s how MathJax works: convert any input to MathML and then leverage our MathML rendering engine.

Similarly, lots of other tools are able to output MathML — besides converters from TeX (such as LaTeXML or tex4ht), Microsoft Word Equation editor can export to MathML, as does Open Office Math editor, MathType, MathMagic, the Windows Math Input Panel (handwriting recognition), MyScript (ditto), Maple, Mathematica and virtually any other tool you might have authored serious equational content in. (Oh well, I should’ve simply linked to http://www.w3.org/Math/wiki/Tools#Authoring_tools which I recently set up.)

Of course, Word is the big reason why most scientific and educational content ends up providing MathML. I don’t claim (or believe) that people are aware of most of this, which was one of the reasons I wrote about it.

Why I care about MathML

[This is week 6~7? Mpf, I missed one (and a half?), bummer. I’ll try to make up for it.]

When I started this writing challenge, I had listed a couple of potential blog post titles. One of them was “Why you should care about MathML”. I realized later that I really didn’t want to pretend I could even try to tell my two readers what they should or should not care about. Instead, I want to jot down (remember: 30mins time limit) a few reasons why I started to care about MathML, alot.

I care about this Alot

© Allie Brosh

Unsurprisingly, it was in many ways a story of my education. Here are two quotes from yours truly.

I think MathML is so far the best solution to present mathematical content on the web
actually me, Dec. 2009

Actually, more stuff wrong on my post; also, referencing Terry Tao’s blog, weird.

But mathml sucks […]
also actually me, Feb. 2011

(In my defence, I probably meant authoring tools and browser support.)

So as you can see, I flip-flopped a bit there (and, in a fundamentally different way, I still do). So here are five short reasons why I care about MathML.

a stable exchange format

When I started using MathJax on a personal blog (thanks to the above quote I realize I started blogging 5 years ago this month (local copy), although I think I started to blog a year earlier on scivee.tv, though this seems lost), I was first annoyed and then very happy to not use macros. Obviously, you can use macros with MathJax but I started to avoid personalized macros at all costs. Ultimately, they prevented me from writing mathematics elsewhere and they limited re-use of my writing by other people (well, ok, that’s more hope than reality I suppose).

MathML does not suffer any of these complications (well, technically Content MathML could if anyone used it). Instead, MathML provides a truly stable format for storing equational content while still allowing for re-use. Granted, it’s not exactly easy to write by hand but neither are SVG or HTML/CSS (certainly not as soon as you want to express something more complex). Still, I’d encourage anyone to spend some time with it (e.g., try copy-editing a random piece of MathML and compare that to copy-editing some macro-filled LaTeX horror). In any case, creating MathML is straightforward, especially for those knowing LaTeX syntax (even if we could use a good open-source MathML editor). Ultimately, MathML is more readable in isolation thanks to its nature of being actually a mark-up language and not a programming language.

a focus beyond research

What struck me early on was how successful MathML was outside of research. Research mathematicians (and scientists) tend to think their habits are vital for the longevity of mathematical writing. However, technical writing (such as industrial (think aerospace) documentation), engineering, and most importantly school-level mathematics are arguably more important — and have benefited enormously from a mathematical markup that is easily handled by researchers and non-researchers alike. MathML has brought high quality rendering together with easy authoring to an incredibly wide and diverse community; a huge accomplishment.

accessibility, for real

What I also learned early on (in crass contrast to my 2009 self above) was that MathML has turned out to be critical for having truly accessible mathematics.

Of course, TV Raman’s AsTeR voiced TeX/LaTeX long before MathPlayer, ChromeVox or VoiceOver voiced MathML. But besides the refinements (which later tools could so easily provide), the notion of accessibility stretches far beyond voicing and visually impaired users. Features like synchronized highlighting would be much harder in TeX (just think about identifying subexpressions in a complex TeX macro, let alone in poorly authored TeX) but they are critical for helping people with learning or physical disabilities. Even more advanced features like summarization and semantic analysis are much more straightforward in a markup language like MathML than in TeX. And so is search, whose importance can hardly be overstated in times of ever increasing publication pressure; without search, mathematical knowledge won’t be accessible to us in the long run.
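
To make the point about identifying subexpressions a little more concrete, here is a minimal sketch in Python. It assumes a small, hand-written piece of Presentation MathML for (x^2 + 1)/2 and uses only the standard library's xml.etree.ElementTree. The helper name `subexpressions` is made up for illustration and is not part of any MathML tool.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hand-written Presentation MathML for (x^2 + 1) / 2 (an illustrative assumption).
SOURCE = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mfrac>
    <mrow>
      <msup><mi>x</mi><mn>2</mn></msup>
      <mo>+</mo>
      <mn>1</mn>
    </mrow>
    <mn>2</mn>
  </mfrac>
</math>
"""


def subexpressions(root, tag):
    """Return every element with the given MathML tag name, in document order."""
    return list(root.iter(f"{{{MATHML_NS}}}{tag}"))


root = ET.fromstring(SOURCE)

# Every superscript is its own element, so finding them needs no macro expansion.
superscripts = subexpressions(root, "msup")

# The text of every number token, in document order.
numbers = [mn.text for mn in subexpressions(root, "mn")]
```

A highlighter or a search index could walk the same tree. The equivalent task on macro-heavy TeX would first require expanding the macros.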

the DOM (etc)

The main reason why MathML is irreplaceable on the web is its compatibility with the DOM. This allows web developers to apply the full breadth of their tools to make mathematical content truly native instead of copying print-based layout. We cannot re-invent everything as Knuth did because web “typography” is far from finished and communicating on the web will probably change drastically every couple of years for the foreseeable future (just like communicating using the printing press did in another age). Having a naturally fitting technology allows mathematics to continually evolve its expression alongside other forms of expression on the web — an incredible benefit (and challenge!).
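
As a rough illustration of what DOM compatibility buys, the sketch below uses Python's xml.dom.minidom as a stand-in for the browser DOM (an assumption made purely so the example runs outside a browser). It colors one identifier via MathML's mathcolor attribute. The function name `highlight_identifiers` is hypothetical.

```python
from xml.dom import minidom

# A tiny hand-written MathML fragment for a + b (an illustrative assumption).
SOURCE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
    "</math>"
)


def highlight_identifiers(document, name, color):
    """Set mathcolor on every <mi> whose text equals `name`; return the count."""
    changed = 0
    for mi in document.getElementsByTagName("mi"):
        if mi.firstChild is not None and mi.firstChild.data == name:
            mi.setAttribute("mathcolor", color)
            changed += 1
    return changed


doc = minidom.parseString(SOURCE)
count = highlight_identifiers(doc, "a", "red")

# Serialize the modified tree back to markup.
markup = doc.documentElement.toxml()
```

The calls used here (getElementsByTagName, setAttribute) have direct counterparts in the browser, which is the point: the same generic tooling that manipulates any other HTML element also works on the pieces of a formula.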

an open future to revolutionize how we “speak” mathematics

This leads me straight to the last and probably main reason why I care for MathML. What the web has already done for regular language (all over the world), it can do for the language of mathematics: transform the way we communicate; expand, enhance, deepen, and lighten the way we express mathematical thought. You don’t have to be Bret Victor to believe that in 30 years we will have developed new forms of expressions that truly leverage web technology and eliminate baroque limitations of black-and-white, print layout. We should strive to do so much better and I believe MathML is an important step in this direction.

#dotAstro FTW

[This is week 4 of the challenge. woohoo.]

Today I only have ~15 min. This week, I happen to be in Chicago for dotAstronomy 6. This might be odd since I’m not an astronomer (nowhere near in fact). It is actually an immense privilege, though, since I’m part of a small group of invited interdisciplinary participants (also including biologists, climate scientists and library scientists). So my perspective is that of an outsider and I hate to admit it: it’s what I suspected all along.

That is, ever since running into the dotAstronomy website a few years ago, I have been a little envious. I kept thinking “This sounds incredibly fantastic. How could we do something like this for mathematics?” Until today I could at least pretend that it couldn’t actually be as great as it appears. Because nothing is, right?

My two readers won’t be surprised to hear it: I was wrong. dotAstro is every bit as exciting, enlightening, creative, and savvy as I had hoped. A fantastic group of scientists from all walks of scientific life, including “recovering” researchers who have been led to non-standard careers while retaining a deep, nay fierce enthusiasm for their field as well as for the untapped potential offered to scientific communities by the web. This first day has been a perfect mix, starting with excellent talks, switching to amazing lightning talks, followed by exhausting-because-engaging unconference sessions, and finally some great conversation at the pub (including perfectly greasy US bar food).

Luckily, I don’t have to bore you with my notes but can simply point you to the live-blogging of the first day by @vrooje. In case my notes go up in flames, I could probably reconstruct half of it from the Twitter hashtags of the unconference sessions I attended, i.e.,

Now I’m exhausted but excited for tomorrow’s hack day.

On reading and writing and silence

[week 4 of the challenge. It’s time for a quick post to catch up after last week’s delay.]

As you know, this blogging challenge of mine is based on the observation that I would like to write more. And then Jeff Atwood reminds me in this interesting piece that

we badly need to incentivize listening

which makes me wonder if my natural tendency to let things brew for ages might not be a good thing. This blogging challenge will invariably show if I’m actually able to write in decent quality under tighter constraints. (Right now, I’m not so sure.) So perhaps I will have to realize that silence is golden.

On a related note, in recent months, I was forced to think about my comment “policy”. This hadn’t really come up before since I get very few comments and even fewer from strangers. But I think I should point out that nobody leaving a comment should expect said comment to be posted. Similarly, nobody should expect a comment that has been posted to stay up (especially if it gets posted automatically after I’ve allowed a comment in the past). Finally, nobody should expect me to reply to a comment even if I’ve replied to other comments and even if that happened in the same thread.

This policy has very little to do with trolling, actually, but more with off-topic comments and comments on ancient posts documenting how things have changed (I’m so surprised! not). It’s also related to a different point: I’m probably switching off automated comments at some point next year (ooooooh, something will change, hint hint).

The number of worthwhile comments I get is roughly 1 per month (vs 5-10K of spam). So instead of a comment system, I’ll figure out some way you can quickly send me a comment and then I will add it manually. This move is not just laziness about dealing with spam (it will be slightly more work, I suspect) but also reflects the fact that I consider your comments to be additions to the content, not separate from it. This does not mean that a comment needs to be serious, of course; silly comments are just as (more?) (more!) relevant to me, so I hope people will keep ’em coming.

LaTeX Something Something Darkside

[This is week 3 of the challenge. Ok, I’m stretching “every week” a bit here. I blame somebody’s first cold or alternatively Turkeys. Also, I cheated; this took longer than 30mins.]

Darth Vader/Stewie: Oh, come on, Luke, come join the Dark Side! It’s really cool!
Luke/Chris: Well I don’t know. Who’s on it?
Darth Vader/Stewie: Well um… there’s me, the Emperor, this guy Scott. You’ll like him, he’s awesome…


Where my previous post was more about TeX-like syntax, this is about TeX/LaTeX proper. If you’re a TeX/LaTeX enthusiast, don’t go all crazy on me (I mean, have you seen my thesis?). This is about me feeling a growing awkwardness towards TeX/LaTeX. And this has little to do with TeX/LaTeX itself.

If all you have is a hammer, everything looks like a nail

TeX/LaTeX is a tool. It is a tool designed by Knuth to solve a problem in print layout. The trouble is: print is becoming less and less relevant and I think this holds for most TeX users (when was the last time you went to a library to look at the printed copy of a current journal issue?). What is not obsolete is PDF and TeX is, of course, very good when it comes to generating PDF.

However, this “Portable Document Format” is really quite useless in the one place where people consume more and more information: the web. (I admit I’m of the conviction that the web won’t go away; crazy talk, I know.) And for the web, TeX/LaTeX is the wrong tool. Yes, there are about a gazillion projects out there that try to bridge that gap, try to create HTML out of LaTeX. But if you try them out you’ll soon notice that you’ll have to restrict yourself quite a bit to make conversion work.

Turn this around and you’ll realize that the community as a whole has a serious problem: almost nobody writes TeX/LaTeX that way, which means almost all TeX/LaTeX will never convert to web formats well. To put it differently, there’s a reason for a large market of blackbox vendors that specialize in TeX to XML/HTML conversion for professional publishers (and this often involves re-keying).

This is, of course, in no way a fault of TeX/LaTeX itself which was designed for print, in 1978. But it is a problem we are facing today.

Everything is nothing

Now TeX is Turing complete and this means we can do everything with TeX (even toast). So a universal output for the web is theoretically possible. However, everything is nothing if we can’t make it practical. Perhaps one day, we’ll be lucky to find another Leslie Lamport who will give us “HTMLTeX”, i.e., a set of macros that work and rapidly become the de-facto standard for authors. I doubt it. (And not just because I know mathematicians who don’t upload to the arXiv because their ancient TeX template won’t compile there.)

I doubt it because there’s no problem to solve here. Where Knuth (and Lamport) solved imminent problems, there is no problem when it comes to authoring for the web — a gazillion tools do it, on every level of professionalism. TeX is neither needed for this nor does it help.

Waste of resources

“The best minds of my generation are thinking about how to write TeX packages.”
— not Jeff Hammerbacher.

Another part of my awkwardness towards TeX/LaTeX these days lies in the resources the community invests in it. It feels like every day, my filter bubble gives me a new post about somebody teaching their students LaTeX. These make me wonder. How many students will need LaTeX after leaving academia? How many would benefit from learning how to author for the web?

And then there’s actual development. How many packages on CTAN are younger than 1/2/5 years? How many of those imitate the web by using computational software in the background or proprietary features such as JS-in-PDF (and who on earth writes a package like that)?

To me, this seems like an unfortunate waste of resources because we need people to move the web forward. If we remain stuck in PDF-first LaTeX-land, we miss a chance to create a web where math & science are first-class citizens, not just by name but by technology and adoption from its community.

If only a part of the TeX/LaTeX community spent some effort on web technologies like IPython Notebook, BioJS (or even MathJax), it would make a huge impact.

Professional?

This brings me to my last awkward feeling about LaTeX for today which comes on strongly whenever somebody points out that LaTeX output is typographically superior.

I understand why somebody would say it but once again LaTeX is merely a tool. The reality of publishing is that almost all LaTeX documents are poorly authored, leading to poor typesetting. In addition, actual typographers will easily point out that good typography is not limited to Knuth’s preferences enshrined in TeX.

So while I can understand why somebody would claim that their documents are well typeset, this is not very relevant. As long as we cannot enforce good practices (let alone best ones), the body of TeX/LaTeX documents will remain a barely usable mess (for anything but PDF generation).

On the other hand, publishers demonstrate every day that you can create beautiful print rendering out of XML workflows, no matter if you give them TeX or MS Word documents. Even MS Word has made huge progress in terms of rendering quality and nowadays ships with a very neat math input language, very decent handwriting recognition and other useful tools.

The web is typographically different. On the one hand, many of its standards (let alone browser implementations) are not on the level of established print practices. On the other hand, its typographic needs are very different from print for many reasons (reflow, reading on screens etc). And even though some of print’s advantages will eventually be integrated, I suspect we will develop a different form of communication for STEM content on the web than we have in print because we have a much more powerful platform.

Ultimately, PDFs have stopped looking professional to me. Instead, Felix’s recent slides, Mike Bostock’s “Visualizing Algorithms”, and Bret Victor’s Tangle are examples where you’ll see my face light up, thinking about how we can build authoring tools to turn these experiments into tools for the average user.