Apr 21, 2020, 7:20:51 PM

to MathJax Users

Do they compete in the same space? Or do you use KaTeX in different scenarios than MathJax? And what makes KaTeX faster?

Apr 28, 2020, 11:33:24 AM

to mathja...@googlegroups.com

Do they compete in the same space? Or do you use KaTeX in different scenarios than MathJax?

Partially. KaTeX is purely about TeX/LaTeX input and HTML output, whereas MathJax also processes MathML and AsciiMath (two other math formats), and produces output not just in HTML but also SVG and MathML. MathJax is also deeply involved with making mathematics on the web accessible to users with assistive technology needs (e.g., generating speech text for the math, or braille output for it, and providing interactive exploration of the math, etc.). While KaTeX does include hidden MathML output for screen readers, assistive support is not central to it. So MathJax has broader goals.

And what makes KaTeX faster?

Comparing performance of packages that work in different ways is always tricky. In this case, the speed comparisons between MathJax and KaTeX can be somewhat deceptive. For example, on the KaTeX home page there is an animation showing KaTeX and MathJax side-by-side on a rather large page from the Mathematics StackExchange site. There are several caveats they don't discuss concerning the comparison, however. Because MathJax has a number of possible input and output formats and a variety of possible font choices, MathJax loads some of its code dynamically. That means it has to wait on network access to obtain those pieces. That delay is part of what is shown on the MathJax side of the comparison; because of KaTeX's more limited mission, it doesn't have to load pieces dynamically, and so doesn't have that network latency to contend with. Although the code for KaTeX also has to be loaded and involves some network delay, none of the timing comparisons I've seen take this into account -- they start the clock after KaTeX has been loaded, and after MathJax.js (which is relatively small) has been loaded, but include the rest of MathJax's downloading time. While it is true that this is time that the user has to wait for the output, it is a bit of an apples-to-oranges comparison, especially if the original downloads are not included.

Another consideration is that MathJax updates the page in sections, so that when you have a large page with lots of math, the initial mathematics is typeset and displayed quickly so that you can be reading the page while the rest is processed. Because repainting the display is one of the slowest things the browser does, this means that MathJax causes more page refreshing (in order to get you results quickly), and so may take longer to get the entire page processed because it has waited for periodic page updates. (The page author can control this by setting various MathJax parameters, but the defaults are set to give quick initial typesetting at the expense of longer finish times.) The animation on the KaTeX page shows a portion of the page that is 2/3 or so of the way through the long page, and MathJax has stopped to allow page updates several times along the way, as you can see by the jumps that are taking place. Had they shown the top of the page, MathJax would have shown its equations first (though taken longer to finish the page). This was a conscious trade-off that MathJax made, and represents a different approach to how the page should be updated. The position of the page in the animation gives a somewhat deceptive view of that.
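
The trade-off described above can be illustrated with a toy cost model. This is only a sketch: the function name, the fixed chunk size, and all the numbers are assumptions made for illustration, not measurements of MathJax or its actual chunking parameters.

```javascript
// Toy cost model (all numbers hypothetical): each equation costs `typesetMs`
// to lay out, and every repaint of the page costs `repaintMs`.
function simulate(equations, typesetMs, repaintMs, chunkSize) {
  const chunks = Math.ceil(equations / chunkSize);
  const firstChunk = Math.min(chunkSize, equations);
  return {
    // Time until the first equations become visible to the reader.
    firstVisibleMs: firstChunk * typesetMs + repaintMs,
    // Time until the whole page is finished.
    totalMs: equations * typesetMs + chunks * repaintMs,
  };
}

const chunked = simulate(1000, 1, 20, 50);   // repaint after every 50 equations
const onePass = simulate(1000, 1, 20, 1000); // repaint once, at the very end

console.log(chunked); // { firstVisibleMs: 70, totalMs: 1400 }
console.log(onePass); // { firstVisibleMs: 1020, totalMs: 1020 }
```

In this model the chunked run shows its first math much sooner but finishes later, which is the behavior a viewer positioned far down the page would experience as "slower".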

Also to be considered is that the animation is now quite old. MathJax has improved its speed considerably since that animation was made. There are several different output formats for MathJax, and that animation uses the slowest and oldest one (the HTML-CSS output). That output was developed in the old days when we had to accommodate browsers as old as IE5, and that meant it did a lot of work you don't have to do now. For example, it would have to measure the sizes of some subexpressions because the browsers were not consistent enough in those days to get the same layout, and that measurement involved reflowing the page, which is expensive. In order to be able to do those measurements, the MathJax web fonts had to be in place (otherwise the measurements would be for elements with the wrong fonts), so MathJax had to wait for those to load before it could proceed (more waiting for network transfers). Note that in the animation KaTeX doesn't wait for the fonts; you will see the horizontal lines show up and then the font appears later.

Even in those days, I think MathJax's SVG output was available, which is roughly 5 times faster than the HTML-CSS output. The CommonHTML output format was not, but it is the replacement for MathJax's older HTML-CSS output, and is comparable in speed to the SVG output even in version 2 of MathJax. It does not do any measuring and doesn't have to wait for the fonts, so it would make a more apples-to-apples comparison to KaTeX's approach.

Finally, one should note that at the time the animation was created, KaTeX didn't process all the TeX commands that MathJax does. At that time it did not do arrays or alignments or horizontal stretchy characters, or equation numbers, and so on. Although they picked a page where KaTeX could process most of the math, not everything even on that page was something KaTeX could display (e.g., the equation numbers and the alignments). So KaTeX got to skip some of the math layout that MathJax was actually performing, and that helped its timing as well. (They have since added most of those features, so would be able to process that page better, but perhaps more slowly, now.)

Those are some environmental issues that contribute to the differences that you see in the animation speeds. There are also some programmatic differences. As I mentioned above, MathJax has several input formats and several output formats. MathJax uses MathML as its internal format, so the input formats are all translated to MathML, and then the output renderers typeset that MathML format. That allows any input format to work with any output format because we go through a common intermediate format. But this does mean that there are several conversion steps, and it does mean that the complexities of MathML must be taken into account (for example, the way that stretchy delimiters around arrays work in MathML is rather different from how they do in TeX, and accommodating that takes some extra work).
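
A minimal sketch of the "common intermediate format" idea follows. Everything in it is a hypothetical stand-in (the `inputs`, `outputs`, and `convert` names, the tiny fraction-only parsers, and the simplified tree); it is not MathJax's code or API, and the real internal format is full MathML rather than this toy node.

```javascript
// Each input processor translates its own syntax into one shared tree shape.
const inputs = {
  tex(src) {
    const m = /^\\frac\{(\w+)\}\{(\w+)\}$/.exec(src);
    if (!m) throw new Error("unsupported TeX in this sketch: " + src);
    return { kind: "frac", num: m[1], den: m[2] };
  },
  asciimath(src) {
    const m = /^(\w+)\/(\w+)$/.exec(src);
    if (!m) throw new Error("unsupported AsciiMath in this sketch: " + src);
    return { kind: "frac", num: m[1], den: m[2] };
  },
};

// Each output processor only ever sees the shared tree, never the source syntax.
const outputs = {
  mathml: (node) => `<mfrac><mi>${node.num}</mi><mi>${node.den}</mi></mfrac>`,
  text: (node) => `(${node.num})/(${node.den})`,
};

// Any input can be paired with any output, at the cost of an extra conversion.
function convert(from, to, src) {
  return outputs[to](inputs[from](src));
}

console.log(convert("tex", "mathml", "\\frac{a}{b}")); // <mfrac><mi>a</mi><mi>b</mi></mfrac>
console.log(convert("asciimath", "text", "a/b"));      // (a)/(b)
```

With N inputs and M outputs this design needs N + M processors rather than N × M direct converters, which is the flexibility being paid for with the extra step.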

KaTeX, on the other hand, implements the TeX pipeline described in Appendix G of the TeXbook. This is a simpler process that does not involve converting among different formats (it involves converting to TeX's internal math lists, which are designed to work nicely with TeX, and match its output approach much more readily than MathML does). Indeed, the TeX pipeline was what I implemented in MathJax's predecessor, jsMath, back in 2004. It is true that that can be done faster than what MathJax does (and jsMath was faster than MathJax); but MathJax handles a wider range of formats, both for input and output, and that extra flexibility does come at a cost. Being able to handle MathML input was a key requirement of MathJax's sponsors, many of whom are commercial publishers with workflows that are based around MathML (even when their initial submissions are in TeX).

Another difference is that MathJax kept track of lots of bounding box information that KaTeX doesn't (and it turns out that this was a significant part of the processing times for MathJax). I haven't looked recently, but KaTeX initially only tracked an expression's height and depth (not width), whereas MathJax tracks height, depth, width, and extensions above, below, to the left, and to the right of the main bounding box (e.g., if you used \llap{} so that something extended to the left of the bounding box, MathJax tracked that). This was to handle SVG output as well as MathJax's zooming feature properly.
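
To make the \llap{} case concrete, here is a small sketch of a box record that tracks left and right overhang in addition to height, depth, and width. The field names and helper functions are assumptions chosen for illustration (and the above/below extensions are omitted for brevity); this is not the data structure either library actually uses.

```javascript
// Hypothetical box record:
//   h, d - height above and depth below the baseline
//   w    - advance width (how far the next item is moved to the right)
//   l, r - leftmost and rightmost ink positions relative to the box origin
function box(h, d, w) {
  return { h, d, w, l: 0, r: w };
}

// \llap{...}: the content is drawn to the left of the origin and takes no width.
function llap(b) {
  return { h: b.h, d: b.d, w: 0, l: b.l - b.w, r: b.r - b.w };
}

// Place b immediately after a on the same baseline.
function hconcat(a, b) {
  return {
    h: Math.max(a.h, b.h),
    d: Math.max(a.d, b.d),
    w: a.w + b.w,
    l: Math.min(a.l, a.w + b.l),
    r: Math.max(a.r, a.w + b.r),
  };
}

const row = hconcat(llap(box(1, 0, 2)), box(1.5, 0.5, 3));
console.log(row); // { h: 1.5, d: 0.5, w: 3, l: -2, r: 3 }
```

The advance width of the row is 3, but the ink starts at -2. A renderer that only knew the width would clip or misplace the overhanging part when sizing an SVG viewport or a zoom box, which is why the extra bookkeeping exists, and every combination step has to maintain it.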

Finally, there have been many advances in browsers since MathJax was initially developed. The advances in optimizing JavaScript performance in the browsers have been substantial, but the techniques needed to take advantage of them have changed over time as well. Some of the design decisions in MathJax made it hard for the browsers to optimize its code. In particular, MathJax would add and remove properties from objects at will, and that turns out to be very bad for optimization. Because KaTeX came several years after MathJax, its developers were working with a more modern language, and so could take advantage of the changes that had occurred in that time. They also didn't have to accommodate older browsers like IE6.
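
The property add/remove pattern can be sketched as follows. Modern JavaScript engines optimize property access best when objects of the same kind keep the same set of properties in the same order; the two functions below compute the same result, but only the second keeps a stable object layout. The names are hypothetical and this is not MathJax code, and no timing figures are claimed here.

```javascript
// Style 1: properties are added and deleted on the fly, so the layout of
// `box` keeps changing while it is in use.
function totalWidthDynamic(items) {
  let total = 0;
  for (const item of items) {
    const box = {};
    box.w = item.w;
    if (item.scale !== undefined) box.scale = item.scale; // only sometimes present
    total += box.w * (box.scale === undefined ? 1 : box.scale);
    delete box.scale; // layout changes again
  }
  return total;
}

// Style 2: every object is created with the same properties in the same order
// and never gains or loses any afterwards.
class Box {
  constructor(w, scale = 1) {
    this.w = w;
    this.scale = scale;
  }
}

function totalWidthStable(items) {
  let total = 0;
  for (const item of items) {
    const box = new Box(item.w, item.scale);
    total += box.w * box.scale;
  }
  return total;
}

const items = [{ w: 2 }, { w: 3, scale: 2 }];
console.log(totalWidthDynamic(items), totalWidthStable(items)); // 8 8
```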

With the MathJax rewrite for v3, we updated MathJax's infrastructure to bring the codebase into alignment with modern language features and programming paradigms. This means that MathJax gets a speed boost over v2 (which was already 5 times faster than the MathJax version to which KaTeX compared itself). We have also made it possible to do single-file downloads rather than v2's dynamic downloads (though both are possible), and don't do the "chunked" output any longer. All of these improve the speed with which MathJax's final output is produced.

I have not done any formal testing against KaTeX, but I suspect that the results with MathJax 3 are now comparable to those of KaTeX, even with the conversion to MathML and all the rest. The only page I know of that does a direct comparison with v3 is this one, which has some problems. If you use their comparison, MathJax v3 is faster than KaTeX, but they have misconfigured MathJax so that in-line math isn't processed. But even if you double the MathJax time to accommodate that, it is still comparable to KaTeX according to that test. I can't vouch for how the timing is done.

So that is a long-winded explanation to say that the claim that KaTeX is the fastest math processing available may no longer be true.

Davide

Apr 29, 2020, 1:47:40 PM

to MathJax Users

I noticed in WordPress that KaTeX cannot render equations in RTL languages correctly, but MathJax works fine. So for anyone who is involved with RTL languages, or maybe bidirectional ones, MathJax is the only option.

Apr 29, 2020, 10:16:45 PM

to MathJax Users

Davide

I tried to make that comparison page as fair as possible.

For inline math, I followed the directions in the documentation for the use of $ signs as delimiters, i.e. this block before the call to MathJax:

    window.MathJax = {
      tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]}
    };

However, it didn't process the inline math successfully as you noticed (and I missed).

I changed my delimiters to \( \) and inline math is fine now. It appears to have made MathJax 3 render faster (but not sure why).

As for the timing output, as it says under the table:

"Fonts loaded" is determined using document.fonts.onloadingdone. "Page complete" is the window.onload event.

The signals for "Process MathJax" (both versions) are as per the help you have given me in the past on this issue.

Regards

Murray

May 1, 2020, 4:06:48 PM

to mathja...@googlegroups.com

I tried to make that comparison page as fair as possible.

Yes, I know. I wasn't complaining about your tests being unfair (I was complaining about KaTeX's tests being unfair). Sorry if I sounded critical.

For inline math, I followed the directions in the documentation for the use of $ signs as delimiters, i.e. this block before the call to MathJax:

    window.MathJax = {
      tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]}
    };

However, it didn't process the inline math successfully as you noticed (and I missed).

This is because you replace window.MathJax in the MathJaxDone promise, so it wipes out the tex setting that you had. You should integrate the two window.MathJax assignments into a single one in the promise.
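
The effect can be reproduced without a browser. In this sketch `win` is a plain object standing in for `window`, and the contents of the second assignment are an assumption (the original page's code is not shown in this thread); the point is only that a second whole-object assignment discards the first.

```javascript
const win = {}; // stand-in for the browser's window object

// First assignment: the delimiter configuration.
win.MathJax = { tex: { inlineMath: [['$', '$'], ['\\(', '\\)']] } };

// Second assignment (hypothetical contents): replaces the whole object,
// so the tex block above is gone.
win.MathJax = { startup: { typeset: true } };
const texLost = win.MathJax.tex === undefined;

// Merged form: one assignment carrying both blocks.
win.MathJax = {
  tex: { inlineMath: [['$', '$'], ['\\(', '\\)']] },
  startup: { typeset: true },
};
const texKept = win.MathJax.tex.inlineMath.length === 2;

console.log(texLost, texKept); // true true
```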

I changed my delimiters to \( \) and inline math is fine now. It appears to have made MathJax 3 render faster (but not sure why).

This is because your timing isn't really capturing the work MathJax is doing. Your setting of timeStart is unrelated to when MathJax actually does its processing, so it seems to me that you are not getting the right values for the MathJax processing time. Similarly, your setting of the end time is not really the time that MathJax ends its work. You are using the animation frame to do the timing, but note that because JavaScript is single-threaded, there is no guarantee when your animation frame code will run. It certainly won't run while MathJax is processing, because MathJax v3 won't give up the CPU during that process (MathJax v2 did), and anything else that has been queued up (by promises or setTimeout()) may cause your animation frame code to be delayed. That may give bad ending times for the various processes you are tracking. Plus your animation frame code (if it does run) will count against the actions being timed.
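
The single-threading point can be seen in a few lines. This sketch uses setTimeout() rather than an animation frame so that it also runs outside a browser, and `busyWork` is a made-up stand-in for a long synchronous typesetting pass; the same ordering applies to any queued callback.

```javascript
const events = [];

// Scheduled with zero delay, but it still cannot run until the currently
// executing script gives control back to the event loop.
setTimeout(() => events.push("timer callback"), 0);

// Stand-in for a long synchronous job that never yields.
function busyWork(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) sum += Math.sqrt(i);
  return sum;
}

events.push("work started");
busyWork(5e6);
events.push("work finished");

// Snapshot taken before the script yields: the callback has not run yet.
const snapshot = events.slice();
console.log(snapshot.join(" -> ")); // work started -> work finished
```

Any timestamp taken inside such a callback therefore records when the callback finally got to run, not when the synchronous work ended.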

I would also point out that because you are setting the start time in a script that is well into the page, and in particular, a script that follows the synchronous loading of jQuery, where the browser will have to wait for jQuery to load and run before going on to your script, your start time is not an accurate start time for the page, and so your other values are inaccurate (if they are meant to represent the time from the page load).

So I'd recommend doing the following:

First, record the start time at the very beginning of the page. Something like the following just after the page <title>:

    <script>
    var startTime = new Date();
    function ElapsedTime() {return new Date() - startTime}
    </script>

Then for MathJax, use:

    <script>
    MathJaxDone = new Promise((resolve, reject) => {
      window.MathJax = {
        tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]},
        startup: {
          pageReady() {
            return MathJax.startup.defaultPageReady().then(resolve);
          }
        }
      };
    });
    </script>

Finally, for the timing code, replace everything you currently have in the script just after loading jQuery with:

    <script>
    window.addEventListener("DOMContentLoaded", function(event) {
      $("#ttd").text(ElapsedTime());
    });
    window.onload = function() {
      $("#ttc").text(ElapsedTime());
    };
    document.fonts.onloadingdone = function (fontFaceSetEvent) {
      $("#ttf").text(ElapsedTime());
    };
    MathJaxDone.then(() => {
      $("#ttp").text(ElapsedTime());
    }).catch((error) => {
      console.error('MathJax error: ' + error.message);
    });
    document.querySelectorAll('div.math').forEach(function (item) {
      var texTxt = item.innerHTML;
      var para = document.createElement("p");
      var node = document.createTextNode("The LaTeX code:");
      para.appendChild(node);
      var pre = document.createElement("pre");
      var node = document.createTextNode(texTxt);
      pre.appendChild(node);
      item.parentNode.insertBefore(pre, item.nextSibling);
      item.parentNode.insertBefore(para, item.nextSibling);
      item.innerHTML = "\\["+texTxt+"\\]";
    });
    document.querySelectorAll('span.math').forEach(function (item) {
      var texTxt = item.innerHTML;
      var para = document.createElement("p");
      var node = document.createTextNode("The LaTeX:");
      para.appendChild(node);
      var pre = document.createElement("pre");
      var node = document.createTextNode(texTxt);
      pre.appendChild(node);
      item.parentNode.insertBefore(pre, item.nextSibling);
      item.parentNode.insertBefore(para, item.nextSibling);
      item.innerHTML = "\\("+item.innerHTML+"\\)";
    });
    </script>

These will print times relative to the start of the page being loaded.

Of course, you should do something similar for KaTeX. Note, however, that you are doing the KaTeX processing within the body of the page (so before the DOMContentLoaded signal), while MathJax doesn't even begin processing the page until DOMContentLoaded. (This allows you to not have to process each expression yourself by hand, as you are doing.) I think KaTeX has a tool to locate and process the math similar to how MathJax does, so using that might be a more consistent comparison.
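
For reference, KaTeX's auto-render extension provides renderMathInElement() for this purpose. The following is a sketch under assumptions: it presumes katex and its auto-render script are already loaded on the page, that the math in the page is wrapped in \( \) and \[ \] delimiters, and the option values shown are illustrative choices rather than settings taken from the comparison page.

```javascript
// Options for KaTeX's auto-render extension.
const autoRenderOptions = {
  delimiters: [
    { left: "\\[", right: "\\]", display: true },  // display math
    { left: "\\(", right: "\\)", display: false }, // in-line math
  ],
  throwOnError: false, // render errors in place instead of throwing
};

// Browser only: wait for DOMContentLoaded, as MathJax does, and then let
// KaTeX find and typeset the math itself.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    renderMathInElement(document.body, autoRenderOptions);
  });
}
```

Run this way, both libraries start from the same signal and both pay the cost of searching the page for delimiters.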

It is possible to run MathJax in a similar way to how you are handling KaTeX. To do this, you can load MathJax synchronously and use the configuration

    <script>
    window.MathJax = {
      tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]},
      startup: {typeset: false}
    };
    </script>

and rather than use MathJaxDone in the timing section, remove that and add

    MathJax.startup.defaultReady();
    MathJax.typeset();
    $("#ttp").text(ElapsedTime());

at the end of the timing section, after the math has been inserted into the page. (Alternatively, you could move the KaTeX processing into a DOMContentLoaded handler to be comparable to MathJax.)

When I do this (make MathJax process within the page as KaTeX does), the time at which MathJax ends (using local copies of all the scripts and fonts, and removing all the google analytics stuff), is about 2.2 times where KaTeX ends.

Of course, both numbers include all the page processing time, and in some sense don't really capture the time actually used by MathJax and KaTeX. If you set the start time to right before MathJax and KaTeX start their processing and the end times to right after, then you get the "real" time that the two take to process the math on the page. In that case, the MathJax time is about 2.7 times as long. That is down considerably from the factor for v2 (which I haven't computed), but I would say at least 50% better, based on other tests I've done.
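
Bracketing only the typesetting call might look like the following sketch. The `timeSync` helper is a hypothetical name, and the workload here is a dummy loop so that the snippet runs anywhere; on the test page the callback would be something like `() => MathJax.typeset()` for MathJax v3 or `() => renderMathInElement(document.body)` for KaTeX's auto-render extension.

```javascript
// Use the high-resolution clock where available, otherwise fall back to Date.
const now = typeof performance !== "undefined"
  ? () => performance.now()
  : () => Date.now();

// Runs `work` synchronously and returns the elapsed time in milliseconds.
function timeSync(work) {
  const start = now();
  work();
  return now() - start;
}

// Dummy workload standing in for the synchronous typesetting call.
const elapsed = timeSync(() => {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
});

console.log("elapsed ms:", elapsed);
```

Because the start and end are read in the same synchronous stretch as the work, nothing else queued in the page can slip in between them and distort the figure.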

So while KaTeX is still faster than MathJax v3, MathJax is not far behind, especially considering its different mission.

Davide
