Typeset makes HTML entities disappear?


Lele LL

Mar 5, 2012, 10:48:31 PM
to MathJax Users
Hi all,
we are integrating mathdox formula editor (http://bit.ly/AwaHXi)
within tinymce wysiwyg editor, and we use mathjax in order to render
mathml produced by mathdox. This works, and we can render mathml
produced by mathdox inside a tinymce editor window.

We noticed that the Typeset call makes some HTML entities disappear
from the original mathml:

source mathml before typeset:
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://
www.w3.org/1998/Math/MathML"> <mrow><mn>2</mn><mo>&#8290;</mo><mi>x</
mi><mo>&#8290;</mo><mn>&#960;</mn><mo>&#8290;</mo><mn>&#8519;</
mn><mo>&#8290;</mo><mn>&#8520;</mn><mo>&#8290;</mo><mo>&#8734;</mo></
mrow> </math>

source mathml after Typeset:
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://
www.w3.org/1998/Math/MathML"> <mrow><mn>2</mn><mo>⁢</mo><mi>x</
mi><mo>⁢</mo><mn>π</mn><mo>⁢</mo><mn>ⅇ</mn><mo>⁢</mo><mn>ⅈ</mn><mo>⁢</
mo><mo>∞</mo></mrow> </math></p>

The resulting MathML is rendered correctly inside tinymce immediately
after the typeset, but it's unusable for further renderings (the missing
html entities are replaced with "?" if it is typeset again)

We obviously need to preserve the original mathml. Is there a way to
force mathjax to leave the original mathml untouched?

Davide P. Cervone

Mar 6, 2012, 9:59:41 AM
to mathja...@googlegroups.com
I'm not sure I understand the situation completely. When you say "the
source mathml after Typeset", where are you obtaining the source?
When MathJax typesets a portion of the page, it actually removes the
original MathML and replaces it with whatever output is requested
(e.g., the HTML-CSS renderer will insert various SPAN elements). It's
true that the MathML renderer will insert new MathML, but I doubt that
is what you are looking at. It is also true that MathJax's internal
format is MathML, and that you can obtain that, but I'm certain that
isn't what you are looking at. The original form IS stored by MathJax
as part of its ElementJax structure, and it may be that you are
looking at that. This form is the result of innerHTML on an element
containing just the <math> element, and since it is the browser that
converts entities to their respective characters, that form is like
the second one you give below. As far as I know, there is no way to
determine whether a character was originally encoded as an entity
(like your original) or as a direct unicode character (like your
second version), as the entities are already removed by innerHTML.
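Since the original spelling of each entity cannot be recovered, one possible workaround is to re-encode every non-ASCII character as a numeric reference before the markup is sent to a backend that only accepts numeric entities. The following is a minimal sketch of that idea; `toNumericEntities` is a hypothetical helper, not part of MathJax or TinyMCE, and it always emits decimal references regardless of how the character was originally written.

```javascript
// Hypothetical helper (not a MathJax or TinyMCE API): re-encode every
// non-ASCII character in a markup string as a decimal numeric reference.
function toNumericEntities(markup) {
  // The "u" flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane become a single reference.
  return markup.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// Example: U+2147 (the "ⅇ" from the thread) becomes "&#8519;".
console.log(toNumericEntities("<mn>\u2147</mn>")); // <mn>&#8519;</mn>
```

ASCII markup such as tags and attributes passes through unchanged, so the function can be applied to a whole serialized <math> element.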

I suspect that you are not actually looking at the results of
MathJax's Typeset() call (which you don't want to edit), but actually
the original HTML that you created (the correct thing to edit), and
that it is not MathJax that is changing the entities but the browser
itself. MathJax doesn't leave the original MathML in the page, it
removes it, so unless you are asking for MathJax's original version, I
don't see that you could be looking at something that MathJax has
altered.

If you could be more specific about the details of what you are
looking at and how it is obtained, that might help us analyze the
situation.

Davide

Lele LL

Mar 6, 2012, 1:36:33 PM
to MathJax Users
Dear Davide,
thank you for helping us out.

Sorry for the brief explanation I provided in my first post: I tried
not to be too verbose in describing our issue. Let me try to explain
in detail what we need to do.

Our formula editor runs in a popup window and produces presentation
mathml. It appends formulas to the end of the content of a tinymce
editor window.

The following is a sample output from our formula editor, produced by
entering the formula "e+x*y" (where e is Euler's number).

<?xml version="1.0"?>
<mrow><mn>&#x2147;</mn><mo> + </mo><mi>x</mi><mo>&#x2062;</mo><mi>y</
mi></mrow>
</math>

when the user submits the formula, we inject this mathml (as variable
"data") inside our tinyMCE editor window:

parentWindow.tinyMCE.activeEditor.setContent(currentData + "<p>" +
data + "</p><p> </p>", {format : 'numeric'});

(here parentWindow is the window containing tinymce,
tinyMCE.activeEditor is the editor window itself, currentData is the
data contained in the editor to which we append our formula. <p></p>
are added in order to create an empty line below the formula)

Inspecting the tinymce editor content we have now:

parentWindow.tinyMCE.activeEditor.getContent({format : 'numeric'});
"<p> <math xmlns="http://www.w3.org/1998/Math/MathML"
xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mrow><mn>&#8519;</
mn><mo> + </mo><mi>x</mi><mo>&#8290;</mo><mi>y</mi></mrow> </math></p>
<p></p>"

Ok, it's good. Now we typeset the window in order to render the
formula:

parentWindow.document.getElementById('textslide_1_content_ifr').contentWindow.MathJax.Hub.Typeset()

and this is our editor's content (I'm reporting the math source only,
omitting mathjax output):

<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://
www.w3.org/1998/Math/MathML"> <mrow><mn>ⅇ</mn><mo> + </mo><mi>x</
mi><mo>⁢</mo><mi>y</mi></mrow> </math>

After the typeset, the html entities are no longer expressed in
numerical form.

TinyMCE will submit _all_ the editor content, so we want to remove
mathjax output and leave only the mathml source. This would not be a
problem, but our backend does not accept "rendered" html entities and
will replace them with a "?": it works perfectly instead with numeric
encoded html entities (i.e. saving the content before the typeset()).

We tried both to call MathJax's Remove() and SourceElement() methods
to get back to the "original" version before storing the math in the
DB. The only way I can go back to the "original" encoding is to right
click on the rendered formula and choose "show math as mathml code".
In this way html entities are rendered in numerical form and that is
exactly what we would need. Unfortunately, I can't find a method able
to do this conversion in the API...

Probably I'm missing something... Thanks in advance for your help!

Ciao,
LL

Davide P. Cervone

Mar 6, 2012, 5:58:50 PM
to mathja...@googlegroups.com
> Now we typeset the window in order to render the formula:
>
> parentWindow
> .document
> .getElementById
> ('textslide_1_content_ifr').contentWindow.MathJax.Hub.Typeset()

You should not really be calling Typeset directly, but should be using
the MathJax.Hub command queue (see

http://www.mathjax.org/docs/2.0/queues.html#the-mathjax-processing-queue

for details), and whatever code depends on the results of the
typesetting should also be queued. Note that Typeset() may return
before the math is fully rendered, so you must use the queue in order
to guarantee that MathJax is finished when you use its results.
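As a rough illustration of the queued form, the sketch below pushes a Typeset call and a follow-up callback onto the MathJax.Hub queue, so the callback only runs once typesetting has finished. The wrapper name `typesetThen` and its parameters are made up for this example; only `MathJax.Hub.Queue` and the `["Typeset", MathJax.Hub, element]` callback form come from the MathJax 2.0 documentation linked above.

```javascript
// Hypothetical wrapper: queue a Typeset, then run onDone after it completes.
// mathJax is the MathJax object of the window that holds the math
// (for an editor iframe, that is iframe.contentWindow.MathJax).
function typesetThen(mathJax, element, onDone) {
  return mathJax.Hub.Queue(
    ["Typeset", mathJax.Hub, element], // run MathJax.Hub.Typeset(element)
    onDone                             // runs only after typesetting is done
  );
}

// Possible usage with the iframe id quoted earlier in the thread:
//   var frame = parentWindow.document.getElementById('textslide_1_content_ifr');
//   typesetThen(frame.contentWindow.MathJax, frame.contentWindow.document.body,
//               function () { /* read or save the editor content here */ });
```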

> and this is our editor's content (I'm reporting the math source only,
> omitting mathjax output):
>
> <math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://
> www.w3.org/1998/Math/MathML"> <mrow><mn>ⅇ</mn><mo> + </mo><mi>x</
> mi><mo>⁢</mo><mi>y</mi></mrow> </math>
>
> html entities are now (after the typeset) no more expressed in
> numerical form.

Again, I'm a little confused by what you mean "the math source only".
Are you talking about the contents of the <script type="math/mml"> in
which MathJax has stored the original MathML? In that case, your
results make sense, because (as I pointed out in my original
response), the mml2jax preprocessor uses innerHTML to get the contents
of the original math element, and the entities have already been
replaced by the browser at that point. So the contents of the
<script> will be the actual unicode characters rather than entity
references.

If you want the scripts to contain your original entities, then you
should insert your math via

parentWindow.tinyMCE.activeEditor.setContent(currentData + '<script type="math/mml">' + data + "</script><p> </p>", {format : 'numeric'});

rather than using a paragraph. That will put the code directly into
the script as you originally had it, and so it will not be subject to
the interaction with the browser that the mml2jax preprocessor does.
You will then not need to run the preprocessor (but that would require
not using a combined configuration file; not sure whether you are or
not; in any case, it won't hurt to run the preprocessor, it just isn't
necessary).

> We tried both to call MathJax's Remove() and SourceElement() methods
> to get back to the "original" version before storing the math in the
> DB. The only way I can go back to the "original" encoding is to right
> click on the rendered formula and choose "show math as mathml code".
> In this way html entities as rendered in numerical form and that is
> exactly what we would need. Unfortunately, I can't find a method able
> to do this conversion in the API...

The output for the show source menu item is generated directly from
the MathJax internal representation. That is obtained from calling a
jax's toMathML() method, but you have to make sure the toMathML.js
extension is loaded first, and you have to use queues for this, since
it is asynchronous, as I recall. It is better to use the <script>
approach above, provided the setContent() allows it. There are some
issues in IE with including scripts when setting innerHTML. I'm not
sure how tinyMCE handles that, so it might not work as hoped.
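For reference, the toMathML route might look roughly like the sketch below. It assumes MathJax v2, where the method is reached through each element jax's `root` property (`jax.root.toMathML("")`) and the extension lives at `[MathJax]/extensions/toMathML.js`; the function name `collectMathML` and its callback are invented for this example. It also does not handle the case where toMathML needs to be retried because MathJax is still loading a component.

```javascript
// Hypothetical helper: load the toMathML extension through the queue, then
// hand the serialized MathML of every typeset expression to onResult.
function collectMathML(mathJax, onResult) {
  mathJax.Hub.Queue(
    // Make sure the extension is loaded before anything calls toMathML().
    ["Require", mathJax.Ajax, "[MathJax]/extensions/toMathML.js"],
    function () {
      var sources = mathJax.Hub.getAllJax().map(function (jax) {
        // Assumption: v2 exposes toMathML on the internal root node.
        return jax.root.toMathML("");
      });
      onResult(sources);
    }
  );
}
```

Because everything goes through the queue, this can be queued right after a Typeset so that it only runs once the math has been processed.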

Anyway, those are my ideas on your situation.

Davide

Thomas Leathrum

Mar 6, 2012, 9:40:26 PM
to mathja...@googlegroups.com
Davide is, of course, correct about how and when to apply the Typeset() method, but I think there is another issue here.  TinyMCE is a bit peculiar about how it handles HTML entities: it can process the entities into Unicode characters when it saves the file on the server.  This behavior can be modified with configuration options in the tinyMCE.init() method, in particular the "entity_encoding" option (choices are "named", "numeric", or "raw"; setting it to "numeric" keeps such characters as numeric entities), but this makes the change global and may affect behavior you expect in other places.  Another approach I have seen is to effectively double-escape the entities, so, for example, your "&#8290;" should be "&amp;#8290;" when used inside a MathML block.  This will cause TinyMCE to process the "&amp;" into an ampersand, leaving "&#8290;" in the MathML, which is what you want.  This may seem like a peculiar thing to have to do, but it is because of the peculiar behavior of TinyMCE with regard to entities.
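The double-escaping idea can be sketched as a small string transformation applied to the MathML before it is handed to the editor. `doubleEscapeEntities` is a hypothetical name used only for this illustration, and the pattern below is an assumption about what counts as an entity reference (decimal, hexadecimal, or named).

```javascript
// Hypothetical helper: turn "&#8290;" into "&amp;#8290;" so that one round
// of entity decoding leaves the original reference in place.
function doubleEscapeEntities(mathml) {
  // Matches decimal (&#8290;), hexadecimal (&#x2062;) and named (&pi;)
  // references; a bare "&" that is not part of a reference is left alone.
  var entity = /&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/g;
  return mathml.replace(entity, "&amp;$1;");
}

console.log(doubleEscapeEntities("<mo>&#8290;</mo>")); // <mo>&amp;#8290;</mo>
```

Note that the function is not idempotent: running it twice escapes the "&amp;" it produced, so it should be applied exactly once per decoding step.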

Lele LL

Mar 7, 2012, 8:13:01 PM
to MathJax Users
Thank you!
Including my mathml inside a <script> tag did the trick.

> Are you talking about the contents of the <script type="math/mml"> in
> which MathJax has stored the original MathML? In that case, your
> results make sense, because (as I pointed out in my original
> response), the mml2jax preprocessor uses innerHTML to get the contents
> of the original math element, and the entities have already been
> replaced by the browser at that point

Yes, exactly. I was trying to retrieve the mathml from there. And it
happens exactly what you described: entities are replaced by the
browser if I put the mathml outside of scripts tags.

> It is better to use the <script> approach above, provided the setContent() allows it.

TinyMCE's setContent() allows it, with some limitations. One is that
it will wrap what's inside your script tag in a CDATA block. This can
be solved by removing the CDATA wrapper via regexp, using tinymce's
built-in postprocessor (triggered when injecting the mathml code). The
other limitation is that tinyMCE will change your script type from
"math/mml" to "mce-math/mml" in the resulting html.
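A regexp-based removal of the CDATA wrapper could look something like the sketch below. The exact text TinyMCE inserts is an assumption here (the pattern accepts the wrapper with or without leading "//" comment markers), and `stripCdataWrapper` is a hypothetical name; hooking it into the editor's postprocessing step is left out.

```javascript
// Hypothetical helper: remove a CDATA wrapper from the body of a script tag.
// Accepts both "<![CDATA[ ... ]]>" and the commented "// <![CDATA[ ... // ]]>"
// form; a body without a wrapper is returned unchanged.
function stripCdataWrapper(scriptBody) {
  return scriptBody
    .replace(/^\s*(?:\/\/\s*)?<!\[CDATA\[\s*/, "")  // opening marker
    .replace(/\s*(?:\/\/\s*)?\]\]>\s*$/, "");       // closing marker
}
```

Only the markers at the very start and end of the string are touched, so entity references inside the MathML are preserved as written.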

Let me show you. This is how I put code in tinyMCE editor window:
parentWindow.tinyMCE.activeEditor.setContent(currentData + "<script type=\"math/mml\">" + data + "<\/script>", {format : 'numeric'});

this will produce the following in my actual editor window:
<script type="mce-math/mml">
<mrow><mn>&#x03C0;</mn><mo>&#x2062;</mo><mn>&#x2147;</
mn><mo>&#x2062;</mo><mn>&#x2148;</mn><mo>&#x2062;</mo><mo>&#x221E;</
mo></mrow>
</math>
</script>

Typeset() works for me only when the script type is math/mml. I think
this is a tinymce related issue, so I've opened a thread about it on
tinymce forum (you will find the post here
http://www.tinymce.com/forum/viewtopic.php?pid=99526#p99526 and a
fiddle with my tinymce code here http://fiddle.tinymce.com/9dbaab)

Anyway, is there a way to tell mathjax to typeset if the script tag
has the type attribute "mce-math/mml"?

Thank you again for your help!
LL

Davide P. Cervone

Mar 8, 2012, 3:53:42 PM
to mathja...@googlegroups.com
>> It is better to use the <script> approach above, provided the
>> setContent() allows it.
>
> TinyMCE's setContent() allows it, with some limitations. One is that
> it will wrap what's inside your script tag in a CDATA block. This can
> solved removing via regexp the cdata wrapper using tinymce builtin
> postprocessor (triggered when injecting the mathml code).

Note that MathJax will handle the CDATA block within the script on its
own, but I guess you will want to remove it in the end when you put
the contents back into the page anyway.

> The other limitation is that tinyMCE will change your script
> type="math/mml" in
> "mce-math/mml" in the resulting html.

> ...


> Anyway, is there a way to tell mathjax to typeset if the script tag
> has the type attribute "mce-math/mml"?

I'm wondering if you can use the postprocessor to change the type
attribute of the script back to "math/mml" when you remove the CDATA
block? That should allow MathJax to process it.

If not, then something like

<script type="text/x-mathjax-config">
MathJax.Hub.Register.StartupHook("End Jax",function () {
  // math/mml is handled by the MathML input jax, so register the extra type there
  var MML = MathJax.InputJax.MathML;
  if (MML) {MML.Register("mce-math/mml")}
});
</script>

should bind mce-math/mml to the MathML input jax (in addition to math/mml).

Davide

Asit

Apr 10, 2013, 8:46:24 AM
to mathja...@googlegroups.com
Hi Lele

Can I see the editor you developed? I am searching for a WYSIWYG HTML equation editor to integrate into my project. Please reply.