# Publishers should invest in browser development (a comment at the scholarly kitchen)

On a slightly different note. Despite many investments in typesetting technologies in (and of) the past, publishers are investing very little in the primary typesetting technology of the future: HTML rendering engines.

A good example (though I’m biased) is MathML, the W3C standard for mathematical markup. Despite being used in XML publishing workflows for over a decade, and becoming part of HTML5, no browser vendor has ever spent any money on MathML development. Accordingly, browser typesetting “quality” is highly unreliable (unless you use MathJax — which is where I get my bias from).

Trident (Internet Explorer) has no native support (but the excellent MathPlayer plugin), Gecko (Mozilla/Firefox) has good support thanks to volunteer work and WebKit (Chrome, Safari, and now Opera) has partial support — again solely due to volunteer work. (Unfortunately, only Safari is actually using that code; Google recently yanked it out of Chrome after one release.)

This isn’t surprising from a business perspective — for the longest time, there was simply no MathML content on the web. But of course, this was a chicken-and-egg problem: no browser support => no content => no browser support => … And it ignores the impact MathML support would have on the entire educational and scientific sector where it would enable interactivity, accessibility, re-usability, and searchability of mathematical and scientific content. (Including ebooks — MathML is part of the epub3 standard.)

Now you might say MathML is just math, a niche at best. But very likely its success will determine if other scientific markup languages will become native to the web — languages like CellML, ChemML, and data visualization languages. These will probably see even less interest from browser vendors but will have enormous relevance to the scientific community.

Right now, scientific publishers (in my experience) have neither expertise nor interest in browser engine development. Unfortunately, they also don’t put pressure on browser vendors to improve typesetting (whether scientific or otherwise). That’s very short-sighted, I think. Given that Gecko and WebKit are open source, a joint effort of publishers could very well fix things, and show the community that publishers have their eyes on the future rather than the past.

# How to include MathJax in an epub3 file to work with iBooks (and possibly others)

At the Joint Mathematics Meetings session “Present and Future of Mathematics on the Web”, Lila Roberts presented an excellent demo of the good stuff you can do with iBooks Author. The demo included MathJax and jsxgraph, and combined both with iBooks Author’s easy, pretty layout tools. Of course, iBooks Author has drawbacks:

• it uses a proprietary format
• its output is restricted to iPads (not just iOS)
• you’re not allowed to sell an iBooks Author file except through iTunes
• it is not transparent about the fact that its formula editor produces SVGs out of TeX but pastes MathML directly into a page, leading to inconsistent renderings of equivalent mathematics
• MathML support of iBooks on iOS5 devices is severely broken (and will likely never be fixed) thanks to a mobile Safari bug that screws up the use of STIX fonts.

Anyway, I mentioned in the session that you can actually include MathJax in epub3 files directly to get much of the same. Well, you have to do the pretty layouts yourself and you’ll depend on a javascript-enabled epub3 reading software (like iBooks) but at least you’re using an open standard and retain your rights.

## Let’s get started!

If you’re lazy, grab the file at the end of the post and hack from there. But I’ll walk you through it.

• If you want to learn something, grab a copy of MathJax
• slim it down as described here
• I went all the way and restricted output to SVG — to minimize things and to make it work. HTML output should work on iOS5, but last I checked Apple changed something on iOS6 that I couldn’t track down for lack of devices.

Alright, that’s the basics. You now have a copy of MathJax that works on any reasonably recent webkit browser, including most Android and iOS versions.

You have all inputs (LaTeX, asciimath, MathML) available but only SVG output (well, and native MathML but if that worked we wouldn’t be here…).

## What’s next?

Create your document. That’s actually hard if you don’t have a workflow already and don’t want to pay for InDesign, BlueGriffon etc.

Personally, I will always try pandoc first. It’s the most versatile tool there is and John MacFarlane is just fantastic. Its TeX implementation is enough: if you are writing TeX with HTML/epub output in mind, I’m sure you won’t run into trouble.

If you can, consider going through the Haskell-cabal-pain of installing the current development version (see the instructions at the pandoc github wiki). That will get you the new epub3 writer and things should be easy.

Of course, you can hack the example file below and just use a current version of pandoc or whatever you like to generate some xHTML5 (yes, xhtml, not html if you want your file to validate). You’ll have to modify the manifest etc by hand.

Anyway, let’s daringly assume you have an epub3 with your xhtml+mathml content.

• add the slimmed-down version of MathJax to your epub file using your favorite tool for adding content to a zip file. (Don’t unzip/rezip unless you know what epub needs when zipping…)
• Assuming you’re using the copy as in the attachment, add the following to your manifest (modify paths and ids if needed)
• To each xhtml file that contains MathML, add
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  jax: ["input/TeX","input/MathML","output/SVG"],
  extensions: ["tex2jax.js","mml2jax.js","MathEvents.js"],
  TeX: { extensions: ["noErrors.js","noUndefined.js","autoload-all.js"] },
  MathMenu: { showRenderer: false },
  menuSettings: { zoom: "Click" },
  messageStyle: "none"
});
</script>
<script type="text/javascript" src="../mathjax/MathJax.js"></script>
• I have not activated automatic linebreaking because there’s currently a bug in MathJax on iOS6. If MathJax detects the need to break the line, you’ll get Math Processing errors instead.
• For each xhtml file with the above, we’ll have to modify the properties attribute in the manifest to include both mathml and scripted, e.g., in the sample file you’ll see
<item id="c3" media-type="application/xhtml+xml" href="xhtml/ch1.html" properties="mathml scripted"/>
• And then you can include wonderful MathML, and webkit deficiencies or the horrible iOS5 Safari+STIX bug will be meaningless to your epub file: you can actually publish a mathematical epub file to be read on iBooks.
$\hat{x}+\widehat{xy}+\widehat{xyz}.$

This text is available as an epub3 file which includes MathJax and should run on iOS devices.
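
The manifest change described above (giving each MathML-carrying xhtml file the properties `mathml scripted`) can also be scripted. What follows is only a minimal sketch in Python using the standard library; the function name `add_properties` and the trimmed-down sample OPF are made up for illustration, and a real OPF file has more in it (metadata, spine, further namespaces such as Dublin Core that you would want to register as well to keep their prefixes).

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def add_properties(opf_xml, hrefs, wanted=("mathml", "scripted")):
    """Return OPF text in which the manifest items whose href is in
    `hrefs` carry the `wanted` properties (existing ones are kept)."""
    # Keep the OPF namespace as the default namespace when serializing.
    ET.register_namespace("", OPF_NS)
    root = ET.fromstring(opf_xml)
    for item in root.iter("{%s}item" % OPF_NS):
        if item.get("href") in hrefs:
            props = item.get("properties", "").split()
            for prop in wanted:
                if prop not in props:
                    props.append(prop)
            item.set("properties", " ".join(props))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed manifest for demonstration only.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c3" media-type="application/xhtml+xml" href="xhtml/ch1.html"/>
    <item id="css" media-type="text/css" href="style.css"/>
  </manifest>
</package>"""

updated = add_properties(sample_opf, {"xhtml/ch1.html"})
print(updated)
```

Only the items you name are touched, so stylesheets and images keep their entries unchanged.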

# What’s the best TeX-to-HTML or TeX-to-ePUB converter?

I don’t have that much experience with this, but it might be better than nothing.

I think the two main contenders for TeX-to-html are TeX4ht (which most LaTeX distributions ship) and LaTeXML.

TeX4ht is really a dvi-to-html converter so it behaves accordingly. In my limited experience, it is easier to get results.

LaTeXML seems more powerful, but I could never get it to produce results from “arbitrary” TeX (again, not a lot of time spent on this). On the other hand, LaTeXML is used systematically to convert the arXiv with reasonable success rates.

With respect to epub3 (ignoring html-to-epub3), I’m only aware of pandoc (disclaimer: my personal favorite).

The current development branch has an epub3 writer with MathML support. This works reliably in a handful of tests. Pandoc does not have complete TeX support but John MacFarlane is just a fantastic guy who built a strong community around pandoc, something the two others seem to lack.

Addendum: TeX.SE has lots of expertise on tex4ht and latexml, of course. See this example

# epub, mathjax and the iPad — another attempt

It’s a funny thing. I don’t even own an iPad. But a lot of people are interested in getting an epub file with mathjax working on the iPad.

Why is that? Well, as far as I could find out the iPad remains the only “hardware” that does not block javascript within an epub file (epub uses html for its content but javascript is designated “should not” in the epub2 standard). Of course it’s really the software, iBooks, but mentioning the iPad will be much better SEO.

Incidentally, the only other software I know that is not blocking javascript is the fantastic Calibre. Calibre’s reader seems to not care at all about enforcing the epub standard; it just renders everything it finds (but I’ll get to that later).

## So what happened?

A while ago, after an email exchange which is now mostly available online, I finally created an epub with a complete mathjax installation. Unfortunately, it was a fluke. The file was not reliably rendered on the iPad, most likely because of its size (MathJax has 30,000 files for ~20MB unzipped). So Davide Cervone suggested cutting down on unnecessary files which iBooks should not need.

This led to a result that rendered reliably — unfortunately it rendered in a most irritating fashion: half a line below the intended one, writing happily across any other text on the next line, trailing out of the margin etc. That’s far from perfect, obviously.

In the meantime, Davide was able to use my epub file to run some tests, and yesterday told us that things are looking much better now that he can work on the issues.

Of course, iOS5 was released last week. It’s not clear to me if iBooks already supports epub3, but I know that Safari now supports (some) MathML so there’s a chance that iBooks would (since it uses the webkit variant of Safari to render html). So when I had a quick chance last Friday to get my hands on a friend’s freshly updated iPad, I cooked up a quick test file and it rendered; it wasn’t perfect but not totally bad either. With my luck, of course, this will also be a fluke and I won’t know before I get my hands on that iPad again…

In the meantime, and for posterity, here’s how I create epub files (for the pros: get ready to laugh at a dilettante).

## The tools

That’s it. (Well, unless you don’t know what those are and how to use them — I won’t cover how to install and run these).

All but ecub are open source, and ecub is at least free for personal use. Of course, everything runs on Linux, MacOS and Windows (I mostly use linux and sometimes a Mac; I can’t make guarantees for Windows).

## Creating a minimal epub file with pandoc

I love pandoc (ecub was a great help, too, more about that later) so I’ll focus on it.

As you may know, here at Booles’ Ring I write using markdown and MathJax. I use pandoc whenever I want to convert this kind of content into something else (like LaTeX). But pandoc (as its name suggests) can handle much more.

So hit it! Take your favorite test html file (I use this post).

pandoc test.html -o test.epub

That should give you a working epub file — it ain’t fancy, but it’ll do for testing. Be warned that pandoc does not check if your (x)html actually validates. Since the iPad is picky about having valid epub files you should double check (I totally failed the first time and it took me ages to remember this…).

Fortunately, you installed calibre which includes a binary of epub-fix from the epub-tools by the fabulous people over at threepress.

So you find the epub-fix binary and run

epub-fix --epubcheck test.epub

If epub-fix finds errors, fix them: go into the epub file (which is just a zip file) and fix the (most likely html) file that throws an error; in the post I use, the html should complain about a part of the vimeo embedding.

When epub-fix is happy, send the file over to the iPad for a test spin (I use Dropbox for ease of sync). If even a simple test file does not work, throw your epub into threepress’s online validator just to be sure.
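
If you want a quick look before running the real validators, a rough first check is whether every content file inside the epub is at least well-formed XML. The sketch below uses only the Python standard library; the function name `find_malformed` and the tiny in-memory archive are invented for the demonstration, and well-formedness is much weaker than what epubcheck verifies, so treat this as a first filter and not as a replacement.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_malformed(epub_file):
    """Return (file name, parser message) pairs for content files in the
    epub (a zip archive) that are not well-formed XML."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if name.endswith((".xhtml", ".html", ".opf", ".ncx")):
                try:
                    ET.fromstring(archive.read(name))
                except ET.ParseError as err:
                    problems.append((name, str(err)))
    return problems


# Demonstration on a throwaway in-memory archive (not a valid epub).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr(
        "good.xhtml",
        "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>ok</p></body></html>",
    )
    # The <p> element is never closed, so this file is not well-formed.
    archive.writestr("bad.xhtml", "<html><body><p>unclosed</body></html>")
buf.seek(0)

demo_problems = find_malformed(buf)
for name, message in demo_problems:
    print(name, "->", message)
```

For a real file you would pass the path instead, e.g. `find_malformed("test.epub")`.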

Oh, one more thing: remember to always delete your file from iBooks before you load its updated version. In my experience, iBooks does not update the file when something with the same metadata is already in the iBooks library (or maybe just sometimes, I don’t know, just watch out for that).

## Slimming down mathjax

Well, right now we have a nice epub. But if you view it anywhere it will have your typical LaTeX commands all over the place — we need to add mathjax!

Davide Cervone gave me some advice to reduce a mathjax installation to a mere 1.3MB.

• remove the MathJax/fonts/HTML-CSS/TeX/eot, svg, and png directories
• remove the MathJax/unpacked, test, and docs directories
• If you are only using TeX input (not MathML), then use the TeX-AMS_HTML-full configuration file.
• In that case, remove the MathJax/jax/input/MathML, MathJax/jax/output/NativeMML directories, the MathJax/extensions/mml2jax.js and MathJax/extensions/jsMath2jax.js .
• remove the “FontWarnings” and “v1.0-warnings” extensions, as well as all the configuration files you are not using.
• remove the MathJax/jax/output/HTML-CSS/fonts/STIX directory
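
The removals listed above could be scripted roughly as follows. This is a sketch under assumptions: the relative paths are simply read off the list (the “FontWarnings” and “v1.0-warnings” extensions and the unused configuration files are left out because the list does not give their exact file names), the function name `slim_mathjax` is made up, and the demo runs on a fake folder tree, not on a real MathJax download. Work on a copy.

```python
import shutil
import tempfile
from pathlib import Path

# Directories from the list above, relative to the MathJax folder.
REMOVE_DIRS = [
    "fonts/HTML-CSS/TeX/eot",
    "fonts/HTML-CSS/TeX/svg",
    "fonts/HTML-CSS/TeX/png",
    "unpacked",
    "test",
    "docs",
    "jax/output/HTML-CSS/fonts/STIX",
]
# Only safe to remove when you use TeX input exclusively (no MathML).
TEX_ONLY_DIRS = ["jax/input/MathML", "jax/output/NativeMML"]
TEX_ONLY_FILES = ["extensions/mml2jax.js", "extensions/jsMath2jax.js"]


def slim_mathjax(root, tex_only=False):
    """Delete the listed directories/files below `root`; return what was removed."""
    root = Path(root)
    removed = []
    for rel in REMOVE_DIRS + (TEX_ONLY_DIRS if tex_only else []):
        target = root / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
    if tex_only:
        for rel in TEX_ONLY_FILES:
            target = root / rel
            if target.is_file():
                target.unlink()
                removed.append(rel)
    return removed


# Demonstration on a fake directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "MathJax"
    (fake / "docs").mkdir(parents=True)
    (fake / "extensions").mkdir()
    (fake / "extensions" / "mml2jax.js").write_text("// placeholder")
    (fake / "MathJax.js").write_text("// placeholder")
    demo_removed = slim_mathjax(fake, tex_only=True)
    demo_left = sorted(p.relative_to(fake).as_posix() for p in fake.rglob("*"))

print(demo_removed)
print(demo_left)
```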

Now that your MathJax installation is small and tidy, just copy the remaining files into a suitable folder (how about “mathjax”?) inside the epub — an epub file is simply a zip file after all.
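
Copying the files in can be done without unzipping and rezipping by appending to the existing archive. Here is a minimal sketch with Python’s `zipfile` module; the function name `add_folder_to_epub` and the demo files are hypothetical, and the target folder name “mathjax” is just the one suggested above. Appending leaves the entries that are already there untouched, which matters because the `mimetype` entry has to stay first and uncompressed.

```python
import tempfile
import zipfile
from pathlib import Path


def add_folder_to_epub(epub_file, folder, arc_prefix="mathjax"):
    """Append every file below `folder` to an existing epub under `arc_prefix/`."""
    folder = Path(folder)
    added = []
    # Mode "a" appends new entries and does not rewrite existing ones.
    with zipfile.ZipFile(epub_file, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                arcname = f"{arc_prefix}/{path.relative_to(folder).as_posix()}"
                archive.write(path, arcname)
                added.append(arcname)
    return added


# Demonstration with a stand-in epub and a fake, two-file MathJax folder.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    epub = tmp / "test.epub"
    with zipfile.ZipFile(epub, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
    mathjax_dir = tmp / "MathJax"
    (mathjax_dir / "extensions").mkdir(parents=True)
    (mathjax_dir / "MathJax.js").write_text("// placeholder")
    (mathjax_dir / "extensions" / "tex2jax.js").write_text("// placeholder")

    demo_added = add_folder_to_epub(epub, mathjax_dir)
    with zipfile.ZipFile(epub) as archive:
        demo_names = archive.namelist()
        demo_first = archive.infolist()[0]

print(demo_names)
```

The new files still have to be listed in the manifest afterwards.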

While you’re at it, you should add a suitable MathJax configuration to the html files in your epub file. If you’re using my post from above, you should add

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  tex2jax: {
    inlineMath: [ ['$','$'], ["\\(","\\)"] ],
    displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
    processEscapes: true
  }
});
</script>
<script type="text/javascript" src="mathjax/MathJax.js?config=TeX-AMS_HTML-full"></script>

If you don’t use dollar signs for inline math, just take the last line.

After this copying, we’ll have to repair our epub file. An important fact about epub: all files must be listed in the manifest (OPF) file. Since we don’t want to do that manually, we use epub-fix again.

epub-fix --unmanifested --epubcheck test.epub

The “unmanifested” option (you guessed it) will ensure that all files will be added to the manifest. Beware: don’t try this on a full MathJax! Epub-fix will slow down after the first 1,000 files…
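
To get a feeling for how much work the “unmanifested” option has to do, you can list the files in the archive that the manifest does not mention. The following is only a sketch and not how epub-fix works internally: the function name `unmanifested` and the in-memory demo archive are made up, the OPF file name has to be supplied by hand, and hrefs are compared literally (no URL decoding). It only reports; it does not change the epub.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def unmanifested(epub_file, opf_name):
    """Return the archive entries that are not listed in the OPF manifest."""
    with zipfile.ZipFile(epub_file) as archive:
        names = archive.namelist()
        root = ET.fromstring(archive.read(opf_name))
    # Manifest hrefs are relative to the folder that holds the OPF file.
    base = posixpath.dirname(opf_name)
    listed = {
        posixpath.normpath(posixpath.join(base, item.get("href")))
        for item in root.iter("{%s}item" % OPF_NS)
    }
    skip = {"mimetype", opf_name}
    return sorted(
        name for name in names
        if name not in listed
        and name not in skip
        and not name.startswith("META-INF/")
        and not name.endswith("/")  # directory entries
    )


# Demonstration on a throwaway in-memory archive (not a valid epub).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr(
        "content.opf",
        '<package xmlns="%s"><manifest>'
        '<item id="c1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
        "</manifest></package>" % OPF_NS,
    )
    archive.writestr("ch1.xhtml", "<html/>")
    archive.writestr("mathjax/MathJax.js", "// placeholder")
buf.seek(0)

demo_missing = unmanifested(buf, "content.opf")
print(demo_missing)
```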

Now transfer your file to the iPad and, lo and behold, some mathjax will render! Of course, you’ll find that this is not working properly: the rendering is broken right now. (As mentioned earlier, Davide is working on it.)

## iOS5 to the rescue?

Now this post gets flaky. As I wrote earlier, I have only had one test run with an iOS5 iPad, so this might not work. But the process is worthwhile documenting.

As I said above, the thing about iOS5 is that Safari and hence iBooks finally has some MathML support.

Since pandoc is incredibly versatile you won’t be surprised that it can produce MathML and that it is aware of MathJax. So all we have to do is modify our earlier command.

pandoc test.html --mathml -o test.epub

This way, the html now has mathml instead of the LaTeX commands. Just shoot this over to your iPad and see how it renders. What I remember from my quick test with my post mentioned earlier was that some characters would render twice (which I had seen with that unreliable full install of MathJax I mentioned earlier). Also, MathJax’s support for commands like \color obviously won’t work without adding MathJax again.

Alternatively, you could try using MathJax’s mathml-rendering and see what happens (I hope to test that next week).

## But what if I want to have it all?

As I wrote, I also created an epub file that had a full mathjax install inside of it. This is a terrible idea because a) it rendered only sometimes on the iPad b) every other ebook viewer rejected it or crashed.

But if you cannot resist (or want to modify my approach), here’s a hurried how-to. Since epub-fix will come to a grinding halt adding 30,000 files to a manifest, use ecub instead.

Start ecub and use the new-project wizard, it’s pretty self-explanatory. Two points might be worth pointing out:

• At “Choose import method” you’ll want “from an existing html file”.
• At “Convert text files” check only “add any HTML file found” and “Also find files in folders under your project folder” (this step will take a short while).

After you’re back at the main window, you’ll still need to “compile” your epub file. This will take a long time. So long, in fact, you’ll think ecub is hanging. To convince yourself that it isn’t, go to the project folder you designated in the wizard and watch the 30,000 files be copied into the folder and then watch content.opf grow in size (end result is ~3.5MB).

## Where do we stand?

So for now, we have two broken ways to display mathematical content in an epub file on the iPad: use slimmed down MathJax or use MathML directly. Neither works perfectly but the key point is: they work in principle. Now we can look into the specifics to make things work better. Davide is looking into the mathjax side of things and with webkit (hence Safari, hence iBooks) there’s reason to hope that mathml support will improve, too.

Of course, what I really want is an Android reader with javascript or mathml support…

And that’s it for today. Any questions?

Here are two files at your disposal.

# Tools for your online collaboration

So the winter school in Hejnice is long past, and despite my intentions I did not find the time to blog. This is primarily a sign of the quality of the winter school, both scientifically and socially. I do admit I spent the lunch breaks walking in the beautiful surrounding mountains instead of blogging…

Anyway, on the last evening of the winter school a couple of people gathered together to exchange tools for collaborating via the intertubes. I volunteered — also with the upcoming third Young Set Theorists meeting in mind — to make the discussion available online. Of course, the title refers to this wonderful paper by Goldstern and Judah which taught me the little bit of iterated forcing that I know.

For now I will restrict myself to freemium services. Of course, this is an open list — drop me a comment to add to this list (hm, a google wave would be better, right?).

## Phones

A much better tool than a phone is? A videophone! (especially for handwaving arguments). Namely, skype comes to mind, but there are alternatives like tokbox or google talk which are web based. With possibly lower video quality they offer other useful things like actual video conferences (whereas skype restricts you afaik to 1-1 video calls) and invitation by link. There are also numerous true VoIP/SIP clients like Ekiga. But they may have the need for some firewall configuring. For more general information, check out wikipedia.

## Whiteboards

But what good is a (video)phone if you cannot write on a blackboard together? In any serious mathematical discussion, notation will become an issue sooner or later. A simple, but bandwidth friendly and flash based whiteboard is scriblink — just go to the site and give your partner the invitation link. An alternative is dabbleboard which offers some shape recognition and also allows multiple pages in the free version and — most importantly — PDFs as background images. However, it is a little heavy on the bandwidth, especially latency which often annoys my voip connection.

Of course, if you want to use an online whiteboard efficiently you need some kind of tablet to write with. I personally have been very happy with a graphics tablet, a Wacom Bamboo to be exact. You can get tablets for 40€ and lower in Germany, but prices will differ regionally. Of course, I also use my Gigabyte M1028T tablet pc — although its tablet functionality is basic (no pressure sensitivity, only moving by clicking) making writing with it less suitable for real note taking — see the PDF section below.

## Instant Chat, Online Docs and Google Wave

Personally, I have not used instant messaging for mathematics so far — video phones seem better. However, Pidgin has a LaTeX plugin to display basic TeX code. This is of course a useful feature. I’ll come back to the general problem of displaying mathematics on the web later.

I feel I must also mention Google Wave and its competitors. These are powerful tools mixing mail, chat, wikis and collaborative document editing. I have not tried any of these yet but if there’s someone to collaborate with it’s worth a try.

## PDFs I: what you can do with them

PDF is the somewhat dominant standard for (compiled) TeX documents (sorry, dvi and ps fans). Besides the next section, there is another aspect which makes PDFs worthwhile: PDF annotation. If you are like me and like to take your notes with you (for all those typos and indices that drive you mad in some papers), there is nothing better than annotating a PDF directly, especially if you invested in a (graphics) tablet.

My favourite is the open source Xournal with excellent tablet support on both linux and windows. Alternatives are Jarnal (which also works as real time whiteboard) and (for Mac users) Skim.

Although it does not quite fit in here (or anywhere): if you feel that PDFs are inadequate to present mathematics, why don’t you take a look at prezi? It offers a different angle on presentations altogether. I sometimes dream of having a prezi like ability to zoom into papers or rather proofs giving me details where I want them and letting me quickly browse through the main ideas dynamically whenever I choose to…

## PDFs II: Personal online libraries

It is convenient to store papers and other materials online. If you cannot set up a decent sftp or a version control system on your university’s server, you might want to try dropbox or teamdrive. If you frequently use public computers you might want to use something more web based like google documents or the very pretty issuu that I use from time to time on this blog.

## Community Sites

Of course, all science is community driven but I think (pure) mathematics could profit more from an online community than any other science or (liberal) art. The biggest player is certainly facebook — which already has a group for, of course, the winterschool itself. Facebook attracts academia (as opposed to myspace), hence it is the more obvious place to connect — this does not mean that you shouldn’t worry about its privacy settings or rather the partial lack thereof.

On the other hand, there are a couple of science focused community sites, among them researchgate, which offers science specific tools like (p)reprint lists, online references, database searches etc. This might be better for purely professional intent but I have no experience using it.

A young and incredibly successful new site is mathoverflow — a mathematical version of the great stackoverflow. You can ask and answer questions of all sorts in a very efficient manner — just don’t get lost in all the fun.

## Databases

Of course the mother of all things is the arXiv — do I need to explain it? And then there are Google’s products scholar and book search. A somewhat different database is gigapedia where you can easily search for books and find free ones. In all things beware of legal issues though.

## LaTeX or displaying mathematics on the web

Of course mathematicians are used to LaTeX. On the web the best way for displaying mathematics is (from a web standards point of view) mathml. The problem is that mathml is a) too difficult to write as code directly, b) difficult to view since not all browsers view them correctly and from a visually impaired point of view it seems to be a disaster, too (see the discussion on Terry Tao’s blog) and c) it is difficult to convert back to LaTeX.

There are numerous workarounds. On the one hand you can (as I do) use tex4ht to convert LaTeX to mathml. Of course, as my blog shows this is a rather tedious thing if you do not have (or want to have) control over the webserver. Alternatives are jsMath which might be superseded by mathjax. If you have a wordpress blog you can (even on your free account on wordpress.com) use this plugin — which converts basic LaTeX commands into (rather ugly) PNGs.

The winner for best practices with mathml, I think, is the n-Category Cafe. Besides being a very active group blog they have developed impressive technologies such as mathml inclusion, the LaTeX dialect itex, the itex capable instiki with itex2mml to convert tex to mathml on the fly and all of this available in the comments, too.

## Blogs, blogs, blogs

Almost last but in no way least, there are blogs.  This would be worth an independent post and there are plenty of examples for this, but here we go.

They come in all colours; for an impressive list go here. Also, go to any of those blogs and check their blogroll to find many more mathematics blogs. If you don’t understand what blogs are good for you might read John Baez’s article. To name a few contenders for ‘most influential mathematical blogs’: What’s new with Terence Tao, the most active single user blog I know, Timothy Gowers’s Weblog and Gil Kalai’s Combinatorics and more.

Of course, they are the ones that got me started with reading math blogs, but it’s the small blogs that got me hooked. The diversity is a challenge (I don’t understand half of what I read) but blogs form the best mathematics newspaper out there.

## Polymath

At the moment the most hardcore project when it comes to online collaboration is clearly Polymath. With one paper on the arxiv, two projects finished and three projects going, it is the perfect showcase. Driven by the “big three” (Tao, Gowers, Kalai), one may argue that their power makes sure that it works (and is protected from theft). Polymath is an exemplary web project. It follows Jeff Jarvis’s rule and shows the synergetic behaviour of web projects, using multiple technologies at once: there’s the blog for the main discussion, but also the authors’ individual blogs, used partly to organize. Then there’s the wiki for fixing proper definitions and notational issues, and finally they frequently use mathoverflow to recruit new people by, e.g., singling out distinct partial or derivative questions.

But I believe it shows a glimpse of the future of mathematics. On the one hand, many problems have become too complex to be tackled by a single person or research group. On the other hand, although the technology might change considerably in the future, the idea of having researchers on all levels collaborate, with every contribution being valued, could be a prototype that values many soft skills, be it good writing, accessible presentation, social skills for bringing conversations to converge productively, taking a bird’s-eye view of the process to assist or acquiring empirical experimentation and implementation. It is also a very flexible approach where people can help as much or as little as they find the time for while (with proper support like Gowers’s current EDP posts) still being able to follow the flow and ideally being able to change their level of involvement as they please.

That’s all for now. Let me know what I forgot.

2010-02-15

## Unicode characters

There was also a question regarding unicode characters and the like (instead of mathml). I just found this chart via mathoverflow — maybe it helps.

2010-02-17

Feeds in either Really Simple Syndication (RSS) or Atom form are worth mentioning on their own. As a tool for 1-to-infinity communication, they are an important technology for collaboration. You’ll find feeds for all kinds of news sites and blogs, but also for each section of the arxiv. To read feeds you can use lots of different programs and web based services.

## Video sites

Videos of research level mathematics are pretty rare. There is the archive of the MSRI and singular popular mathematics gems like Gowers’s talk on multiplication. Also, you should check out MIT’s impressive youtube channel.

To put up a video you don’t need much these days, so it’s strange that there’s not more around — especially since (pure) mathematics seems easier to share than, say, complicated science experiments. There are too many free video sites out there. Next to the already mentioned youtube I would point out the science video site SciVee (with its strong, yet somewhat expensive premium service) and Vimeo with its focus on original content.

## Reference management

Thanks to David for reminding me that I forgot one aspect of pdf management — reference management (see the list on wikipedia). Now there are many programs out there to get your citations, i.e., your BibTeX files organized. But there are also programs that connect the citations with the pdf, offer online database searches, tags, pdf annotation and social networking ideas.

A big list can (once again) be found on wikipedia. To present a few: I personally use referencer, but David also mentioned Mendeley in his comment, which has an impressive list of features including online access and social network aspects, and I’ll probably try it out. To give credit where it is due, a few of these programs name Papers as inspiration, which unfortunately is Mac only. With a different flavour, there are the web-only tools Zotero, a powerful Firefox addon, and I, Librarian, a groupware tool.

# Testing MathML

One of the tools I want to use in this blog will be MathML. I think MathML is so far the best solution to present mathematical content on the web even though the discussion on Terence Tao’s blog shows that MathML has its own deficits, especially when it comes to accessibility.

Nevertheless, tex4ht allows me to wait for a good standard to develop while working with one “generator”, namely LaTeX, to produce multiple outputs.

I chose blogger especially because I wanted to use MathML, e.g.

$x=\frac{-b \pm \sqrt{b^{2}-4ac}}{2a}$

Of course, blogger does not make it easy, but thanks to David Carlisle a good friend of mine was able to hack enough for me to work on for now. Unfortunately, I will now have to add to the header that you really need firefox with javascript, but is that too much to ask these days?

By the way, this tag will hopefully lead to more technological experiments in the future.