[Previously posted at
http://sourceforge.net/projects/mathjax/forums/forum/948700/topic/3826683]
jfine
2010-08-26 10:11:54 PDT
MathJax has, so to speak, introduced a new output format for TeX
source, namely the aptly named HTML-CSS. I can see a lot of benefits
in there being something like a dvi to HTML-CSS translator.
What I have in mind is something on the server that translates dvi to
low-level commands that in the browser are translated into HTML-CSS.
This would I think allow MathJax to handle much more complicated
mathematics. It might also improve performance.
Anyone else interested in this?
----------
caseystark
2010-08-26 10:57:46 PDT
I don't know the details of the DVI format, but my guess is that it's
like PDF and you can't extract the math semantics. My understanding is
that a LaTeX to DVI compiler would just typeset the characters and we
can't extract the TeX math source. Davide will have to confirm because
I could be wrong.
If anything this would decrease performance because of the additional
work converting DVI to MathJax internal jax format (a JavaScript
representation of MathML objects). The conversion from a text format
like TeX to jax is fast but my guess is that DVI to jax would be more
expensive. The conversion from jax to HTML-CSS is always the most
costly, so if that's the same, it can't be faster unfortunately. Not
to mention that DVI files are probably much larger than the equivalent
TeX source, which means more time over the wire.
Still, it could be useful to help the TeX stingy migrate to something
much nicer.
----------
paultopping
2010-08-26 11:12:16 PDT
While this particular proposal for server-side optimization might not
be quite right, it does not invalidate the more general scheme of
improving performance by doing some processing ahead of time. If some
work can be done on the server, the results of that processing can be
cached and, therefore, shared by all and the burden on the client can
be reduced.
It may even be possible to push processing even farther back to the
content preparation phase. For example, if the authors can target a
specific browser (or browser set) they may be able to reduce the
processing. Or perhaps one could statically analyze the characters
actually used in a set of pages and build a font set containing only
those characters. There are probably other opportunities for early
binding and the performance optimizations that result from them.
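[Editor's sketch] The static analysis mentioned above could start with something as simple as collecting the distinct characters that appear in the text of a set of pages. This is only a rough illustration: `TextCollector` and `characters_used` are hypothetical names, and it assumes the characters of interest already appear as text in the HTML source, which would not be true of mathematics that MathJax typesets at run time.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the distinct non-whitespace characters found in text nodes."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &alpha; arrives as 'α'
        self.chars = set()

    def handle_data(self, data):
        self.chars.update(ch for ch in data if not ch.isspace())


def characters_used(pages):
    """Return the set of characters used across an iterable of HTML strings."""
    used = set()
    for html in pages:
        collector = TextCollector()
        collector.feed(html)
        collector.close()  # flush any text still buffered by the parser
        used |= collector.chars
    return used
```

A font subset built from this set would still need the pieces used for stretchy delimiters and the like, so the result is a lower bound on what must be shipped, not a complete list. Text inside script and style elements is also counted here, which a real tool would want to skip.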
While this kind of thing might be interesting, I hope that it will be
rendered moot by faster connections, JavaScript engines, computers,
etc. All of the stuff I mention above would seriously impact the
complexity of MathJax and its testing. My fingers are crossed.
----------
jfine
2010-08-26 12:37:16 PDT
@casey: You're right that dvi doesn't contain semantics, and also
that dvi to jax would be slow. But I'm looking at factoring jax ->
HTML-CSS into jax -> pseudo-dvi and then pseudo-dvi -> HTML-CSS. If
we did that, and sent pseudo-dvi from the server to the client then
performance would be better, because you'd miss the typesetting stage.
@paul: One of my interests is pages that have only a little bit of
mathematics, and no need to do much with it.
----------
jfine
2010-08-26 12:40:36 PDT
My main reason for making this suggestion is that it would allow
MathJax to handle a wider range of mathematical constructs. It would
allow a low-level confidence that so long as you could typeset it
(using the normal TeX fonts) you could display it using MathJax. This
would apply even to legacy items. Performance is a secondary issue,
and I wouldn't mind if we dropped it from this thread.
----------
robertminer
2010-08-27 09:12:25 PDT
Hi Jonathan,
Your idea of going from DVI to HTML-CSS is an interesting one. As I
know you are aware, the two great schools of TeX conversion, reparse
source vs. interpret DVI, have been battling it out for as long as
I've been tracking the debate, without any clear winner, at least in
my mind. Obviously we went with reparsing source for the LaTeX math
support in MathJax, because that fit our target use cases better. But
as you point out, the big win with the interpret DVI school is that it
deals with whatever TeX itself can handle, regardless of how screwy.
I presume you are thinking of this as something you could do with your
MathTran server?
I think your chief challenge would be that the internal
representation of an expression in MathJax is essentially MathML,
while DVI is more loosely structured, and more positional in nature.
Thus, to reuse the MathJax HTML-CSS formatting engine, which
essentially goes from the internal 'element jax' parse tree (again,
essentially isomorphic to a MathML parse tree) to HTML-CSS, you would
have to devise an input processor that would go from your low-level
DVI-like language into that element jax data-structure. I haven't
exhaustively studied the problem myself, but I've watched other people
struggle with it, and my sense is that it is a hard problem. For
example, tex4ht, which also uses the DVI interpretation strategy and
is arguably best of that breed, relies on maintaining a bunch of
semi-semantic hints about the original source in DVI specials in order to
produce that MathML structure.
Of course, one could imagine writing a formatting engine that takes
something basically isomorphic to TeX's math lists (or whatever it is
in the DVI that would be the best starting point) and produces
HTML-CSS code. But one would almost be starting from scratch. That might
be worth doing, but I think its main application would be converting
entire TeX documents to HTML, which is really a different task than
MathJax undertakes. The MathJax framework of pluggable input, element
and output jax would support it, but that and the font manager would
really be all you could reuse, so it would be a major development
project.
--Robert
----------
dpvc
2010-08-29 14:50:19 PDT
As always, Robert's response is well thought out and very much to the
point. I just wanted to add a couple of technical points. I suspect
Robert is right that the DVI to MathML-based element jax conversion
would be difficult. He hints that the alternative is to develop your own
element jax format that is based on something more conducive to the
DVI format; this is perfectly legal to do, and MathJax would support
that. You would then need to write an output jax that would display
that element jax format (since the current HTML-CSS output jax won't
help with that), and an input jax to get whatever textual
representation of the DVI file is inserted into the HTML document
converted to the element jax internal format. As Robert points out,
that is not a small task.
I have a feeling that you could not even take much from the font
manager, since the font layout for the MathJax fonts is based on
Unicode, not the TeX fonts themselves. For example, where TeX uses
two or three characters to produce long arrows or arrows with hooks,
MathJax has a single character (that was constructed from the original
pieces, but is now a single character in the font), and the individual
pieces are not available in the fonts. So it may be hard to map the
character references in the DVI file onto the existing MathJax
fonts.
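[Editor's sketch] The mapping problem described above can be illustrated with a small lookup that tries to recognise a run of TeX glyphs as one Unicode character before falling back to single-glyph mapping. Everything here is an assumption made for illustration: the tables `SINGLES` and `COMPOSITES`, the function `map_glyphs`, and the (font, code) pairs, which are given as recalled from plain TeX's Computer Modern fonts and should be checked against the real font tables.

```python
# Hypothetical (font name, character code) -> Unicode table for single glyphs.
SINGLES = {
    ("cmsy10", 0x00): "\u2212",  # minus sign
    ("cmsy10", 0x21): "\u2192",  # rightwards arrow
}

# Hypothetical glyph sequences that correspond to one Unicode character.
# Kept ordered longest first so that longer matches win.
COMPOSITES = sorted(
    [
        ((("cmsy10", 0x00), ("cmsy10", 0x21)), "\u27f6"),  # long rightwards arrow
        ((("cmmi10", 0x2C), ("cmsy10", 0x21)), "\u21aa"),  # arrow with hook
    ],
    key=lambda item: len(item[0]),
    reverse=True,
)


def map_glyphs(glyphs):
    """Map a list of (font, code) pairs to Unicode characters.

    A glyph with no mapping, such as a hook piece on its own, yields None.
    """
    result = []
    i = 0
    while i < len(glyphs):
        for seq, char in COMPOSITES:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                result.append(char)
                i += len(seq)
                break
        else:
            result.append(SINGLES.get(glyphs[i]))
            i += 1
    return result
```

Matching on adjacency alone is a simplification. In a real dvi file the pieces of a composite arrow are separated by small backward movements, so a converter would have to look at positions as well as glyph order, and a minus sign that merely happens to precede an arrow would be misread by this sketch.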
One of the losses in going this route is that you no longer have the
"Show Source" menu item, or the accessibility that comes from having
the internal MathML representation.
In any case, it does sound like a big project. Of course, if someone
has funds to support it, I know how to make MathJax do it. :-)
Davide
----------
jfine
2010-09-02 09:36:08 PDT
I still think we are at cross-purposes. All I want to do is display
dvi nicely in a web browser, using HTML-CSS for positioning the
characters.
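[Editor's sketch] One way to picture "HTML-CSS for positioning the characters" is a function that turns a list of already-positioned characters into absolutely positioned spans. This is an illustration only, not how MathJax's HTML-CSS output works: the `(x, y, char)` command format and the function `render_spans` are hypothetical, and font selection, rules, and baseline alignment are all ignored.

```python
from html import escape


def render_spans(commands, unit="em"):
    """Render (x, y, char) commands as absolutely positioned HTML spans.

    x and y are offsets from the top-left corner of the containing box,
    expressed in the given CSS unit.
    """
    spans = []
    for x, y, char in commands:
        spans.append(
            f'<span style="position:absolute;left:{x:.3f}{unit};'
            f'top:{y:.3f}{unit}">{escape(char)}</span>'
        )
    return (
        '<span style="position:relative;display:inline-block">'
        + "".join(spans)
        + "</span>"
    )
```

For example, `render_spans([(0, 0, "x"), (0.5, -0.4, "2")])` places a raised "2" to the right of an "x". Note that CSS `top` positions the top edge of each box, not the glyph's baseline, so a real converter would have to correct for font ascent, and the outer box would need an explicit width and height since absolutely positioned children do not give it any.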
----------
dpvc
2010-09-02 11:48:23 PDT
Jonathan:
I think we understood that that was what you wanted. We are merely
pointing out that the current output jax used in MathJax is not going
to be of much help to you in that. You suggested that you wanted a
path from dvi to MathJax's HTML-CSS output jax, but that is probably
not going to be the way to go, as the HTML-CSS output jax processes a
format that you will not be likely to produce easily from dvi files.
You will probably need to do your own HTML-CSS-style output instead,
which will likely be a substantial project. The MathJax framework
would support that, but I suspect that there is little that can be
reused from MathJax's current input and output processors. I also
wanted to point out a potential problem with trying to use the MathJax
web fonts for this. What you suggest probably is possible, but my
impression is that you will be, as Robert says, starting pretty much
from scratch.
Davide
----------
jfine
2010-09-03 01:29:06 PDT
I think dvi to HTML-CSS is worthwhile, and I was hoping that the
MathJax codebase would provide more than you say it's able to provide.