Updated breakdown proposal for the STIX fonts


Frédéric WANG

Mar 29, 2013, 7:25:48 AM
to mathj...@googlegroups.com
So I've analyzed the output I gave in a previous message and can give a
new proposal that includes the whole set of glyphs in the STIX family.
For the non-unicode characters, we can extract these portions from each
font:

****
STIX_NonUnicode (for STIX-Regular, STIX-Bold, STIX-BoldItalic,
STIXMath-Regular)

Private Use Area (U+E000..U+F8FF)
Other Non Unicode Glyphs (excluding components that are in STIX_Size5 below)

Not sure if we will need all these glyphs. The "Non Unicode Glyphs" can
be moved to the Supplementary Private Use Area-A (U+F0000..U+FFFFF) if
we want them to be accessible.

STIX_Size0, STIX_Size1, STIX_Size2, STIX_Size3, STIX_Size4, STIX_Size5
(for STIXMath-Regular only)

These fonts will contain the glyphs from STIXMath that are referenced in
the Open Type Math Table. The glyphs will be located at the same
position as the stretchy operator for size variants (STIX_Size0 -
STIX_Size4) and in the Private Use Area (U+E000..U+F8FF) for
horizontal/vertical components (STIX_Size5).
****

Next, note that the Unicode blocks covered by STIX-Regular are a superset of
those covered by the other fonts. Below is the new classification for Unicode
characters (for STIX-Regular, STIX-Bold, STIX-BoldItalic, STIXMath-Regular
when the blocks are nonempty). I'm wondering whether the glyphs from STIXMath
are just duplicates of those from STIX-Regular or provide a different math
style that we would like to use.

****
STIX_Main

Basic Latin
Greek and Coptic
Specials: U+FFFD (REPLACEMENT CHARACTER)
Should the glyphs necessary for the AMS extensions be included in the
STIX_Main?

STIX_Bold, STIX_Italic, STIX_BoldItalic, STIX_Script, STIX_Fraktur,
STIX_DoubleStruck, STIX_SansSerif-Regular, STIX_Monospace

These fonts will contain the corresponding subset of the Mathematical
Alphanumeric Symbols. Glyphs from STIX_Bold, STIX_Italic and
STIX_BoldItalic are probably duplicates of those in STIXMain-Bold,
STIXMain-Italic, etc., which is why I didn't mention them in my original
proposal. But we probably want to keep...

STIX_Latin

Latin-1 Supplement
Latin Extended-A
Latin Extended-B
Latin Extended Additional
Latin Extended-D
Alphabetic Presentation Forms

STIX_Alphabets

Cyrillic
Hiragana
Letterlike Symbols

STIX_Marks

Spacing Modifier Letters
Combining Diacritical Marks
Combining Diacritical Marks for Symbols
General Punctuation
CJK Symbols and Punctuation

STIX_Arrows

Arrows
Supplemental Arrows-A
Supplemental Arrows-B

STIX_Operators

Mathematical Operators
Supplemental Mathematical Operators

STIX_Symbols

Miscellaneous Technical
Miscellaneous Mathematical Symbols-A
Miscellaneous Mathematical Symbols-B

STIX_Shapes

Geometric Shapes
Miscellaneous Symbols
Miscellaneous Symbols and Arrows
Block Elements
Box Drawing
Control Pictures

STIX_Misc

Superscripts and Subscripts
Enclosed Alphanumerics
Currency Symbols
Phonetic Extensions
Phonetic Extensions Supplement
IPA Extensions
Dingbats
Number Forms
****
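
A rough sketch of how part of the classification above could be written as data, assuming a simple lookup from code point to proposed font name. `BLOCKS` and `fontForCodePoint` are hypothetical names, not taken from the scripts mentioned in this thread, and only a few blocks are listed.

```javascript
// Hypothetical table: a few Unicode blocks mapped to the proposed subset fonts.
// The block ranges are the standard Unicode ones; the table is deliberately incomplete.
const BLOCKS = [
  {name: "Basic Latin",                         low: 0x0000, high: 0x007F, font: "STIX_Main"},
  {name: "Greek and Coptic",                    low: 0x0370, high: 0x03FF, font: "STIX_Main"},
  {name: "Arrows",                              low: 0x2190, high: 0x21FF, font: "STIX_Arrows"},
  {name: "Supplemental Arrows-A",               low: 0x27F0, high: 0x27FF, font: "STIX_Arrows"},
  {name: "Supplemental Arrows-B",               low: 0x2900, high: 0x297F, font: "STIX_Arrows"},
  {name: "Mathematical Operators",              low: 0x2200, high: 0x22FF, font: "STIX_Operators"},
  {name: "Supplemental Mathematical Operators", low: 0x2A00, high: 0x2AFF, font: "STIX_Operators"}
];

// Return the proposed font for a code point, or null if its block is not listed here.
function fontForCodePoint(cp) {
  for (const block of BLOCKS) {
    if (cp >= block.low && cp <= block.high) return block.font;
  }
  return null;
}
```

For example, `fontForCodePoint(0x2192)` falls in the Arrows block and returns `"STIX_Arrows"`.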

--
Frédéric Wang
maths-informatique-jeux.com/blog/frederic

Frédéric WANG

Mar 29, 2013, 11:29:28 AM
to mathj...@googlegroups.com
On 29/03/2013 12:25, Frédéric WANG wrote:
> So I've analyzed the output I gave in a previous message and can give
> a new proposal that includes the whole set of glyphs in the STIX family.
I've updated the scripts to follow the new proposal. I attach the latest
fontdata with otf sizes, unicode blocks and stretchy operator constructions.

> The "Non Unicode Glyphs" can be moved to the Supplementary Private Use
> Area-A (U+F0000..U+FFFFF) if we want them to be accessible.
> I'm wondering if the glyphs from STIXMath are just duplicate of those
> from STIX-Regular or provide a different math style that we would like
> to use
For the moment I'm not quite sure about how we want to handle these two
points, so I haven't done anything.
STIX-Web.zip

Frédéric WANG

Apr 5, 2013, 6:19:46 AM
to mathj...@googlegroups.com
Davide, would it be possible to get more information about the
fontdata.js content?

In particular the VARIANT, RANGES and DELIMITERS.HW members?

Frédéric WANG

Apr 8, 2013, 10:20:10 AM
to mathj...@googlegroups.com
On 05/04/2013 12:19, Frédéric WANG wrote:
> Davide, would it be possible to get more information about the
> fontdata.js content?
>
> In particular the VARIANT, RANGES and DELIMITERS.HW members?
>
For the HW list, would it make sense to use

height or width of variant character / height or width of normal character

for the numeric parameter?

Davide P. Cervone

Apr 8, 2013, 1:38:16 PM
to mathj...@googlegroups.com
> would it be possible to get more information about the fontdata.js content?
>
> In particular the VARIANT, RANGES and DELIMITERS.HW members?

VARIANT holds the information for handling the various mathvariant possibilities. So mathvariant="bold" corresponds to VARIANT["bold"]. The contents of the structure include:

fonts: The fonts to use for the variant (they are checked in order, and the first one containing the character is the one used),

remap: a structure that specifies characters to remap within this variant (this is done before looking through the fonts). The remapping can map into another variant as in 0x2216:[0x2216,"-TeX-variant"]

bold: indicates the font should have font-weight:bold

italics: indicates that the font should have font-style:italic

There are also offset and variant values that correspond to the RANGES array.

The RANGES array is for remapping groups of characters at once. This was originally intended for mapping variants to the Mathematical Alphabet ranges for the STIX fonts, but is also used to remap Greek and some other characters. Each range has an identifier (given by its "offset" property), and is only used in a VARIANT that has an offset with that letter. E.g., in the STIX data, there is a range

{name: "Alpha", low: 0x41, high: 0x5A, offset: "A"}

and this means that VARIANTs with an offsetA property will have the letters between 0x41 and 0x5A (capital letters) remapped. Because there is a VARIANT defined as

"double-struck": {offsetA: 0x1D538, offsetN: 0x1D7D8,
remap: {0x1D53A: 0x2102, 0x1D53F: 0x210D, 0x1D545: 0x2115, 0x1D547: 0x2119,
0x1D548: 0x211A, 0x1D549: 0x211D, 0x1D551: 0x2124}},

with offsetA:0x1D538, this means that capital letters in this variant will be mapped to the double-struck letter in the STIX fonts beginning at 0x1D538. So <mi mathvariant="double-struck">A</mi> will end up using the character in the STIX fonts at U+1D538. The offsetN takes care of the double-struck numbers.

The remap property says that a few characters are exceptions (there are gaps in the Plane1 characters since some double-struck characters already appear in the letter-like symbols). These are remapped before the RANGES are applied.
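
The offset and remap steps just described can be sketched as follows. This is a minimal illustration, not MathJax's actual lookup code: `remapChar` is a hypothetical helper, and since the remap keys quoted above are Plane 1 positions, the sketch consults `remap` both on the incoming code point and on the shifted one. That ordering is an assumption of the sketch.

```javascript
// Data copied from the message above; remapChar itself is hypothetical.
const RANGES = [
  {name: "Alpha", low: 0x41, high: 0x5A, offset: "A"}
];
const doubleStruck = {
  offsetA: 0x1D538,
  remap: {0x1D53A: 0x2102, 0x1D53F: 0x210D, 0x1D545: 0x2115, 0x1D547: 0x2119,
          0x1D548: 0x211A, 0x1D549: 0x211D, 0x1D551: 0x2124}
};

function remapChar(variant, n) {
  // Exceptions on the incoming code point (assumed to be checked first).
  if (variant.remap && variant.remap[n] !== undefined) return variant.remap[n];
  for (const range of RANGES) {
    const offset = variant["offset" + range.offset];
    if (offset !== undefined && n >= range.low && n <= range.high) {
      const mapped = offset + (n - range.low);
      // Exceptions on the shifted position cover the gaps in Plane 1.
      if (variant.remap && variant.remap[mapped] !== undefined) return variant.remap[mapped];
      return mapped;
    }
  }
  return n; // no range applies: leave the character alone
}
```

With this data, "A" (0x41) goes to U+1D538, while "C" (0x43) would land on the gap at U+1D53A and is redirected to U+2102 in the Letterlike Symbols block.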

The RANGES can also have a remapping, in which case the remapping value is an offset within the range. This is used with the Greek letters, for example, to map the sans-serif variant to the PUA glyphs in the STIX font for these characters (the remapping handles a few variant symbols that are in different locations in the PUA than they are in the Greek and Coptic block).

The RANGES data can also include an "add" property, which is an additional offset for this range. This allows two ranges to use the same offsetX in the VARIANT list. For example, the upper and lower case letters in the Math Alphabets need this because the upper and lower case letters in the ASCII range have several characters between them, but in the Math Alphabet blocks, there are none. So two separate ranges are used, with a common offsetA value, but "add" tells where to start the lower case letters.
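
A small sketch of how "add" lets two ranges share one offsetA. The lower-case entry with `add: 26` and the bold variant's `offsetA: 0x1D400` are assumptions chosen to match the layout of the Mathematical Alphanumeric Symbols block (bold capitals start at U+1D400, bold small letters at U+1D41A); `applyRanges` is a hypothetical helper.

```javascript
// Two ranges sharing the same offset letter; the second one is an assumed entry.
const LETTER_RANGES = [
  {name: "Alpha", low: 0x41, high: 0x5A, offset: "A"},
  {name: "alpha", low: 0x61, high: 0x7A, offset: "A", add: 26}
];
const boldVariant = {offsetA: 0x1D400}; // assumed value for illustration

function applyRanges(variant, n) {
  for (const range of LETTER_RANGES) {
    const offset = variant["offset" + range.offset];
    if (offset !== undefined && n >= range.low && n <= range.high) {
      // "add" shifts the start of this range relative to the shared offset.
      return offset + (range.add || 0) + (n - range.low);
    }
  }
  return n;
}
```

Here "A" maps to U+1D400 and "a" to U+1D41A, even though six ASCII characters sit between "Z" and "a".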

A VARIANT value can also include "variantX" (where X is the letter used to identify the RANGES entry), which not only remaps the character positions, but also switches to another VARIANT. So, for example, the lower-case Greek letters in the MathJax normal variant are mapped to the italic variant, since there are no upright lower-case Greek letters in the font set.


As for DELIMITERS.HW, the DELIMITERS object lists the characters that can stretch either vertically or horizontally (MathJax doesn't support stretching in both directions for the same character). The "dir" property tells which direction the character stretches in, and the HW array is a list of characters that are stretched versions of the character. These are in increasing size, and the entries in the array are themselves arrays giving the height or width (depending on whether we are stretching vertically or horizontally) of the character, plus the font where it is found. So

0x0028: // (
{
dir: V, HW: [[1,MAIN],[1.2,SIZE1],[1.8,SIZE2],[2.4,SIZE3],[3.0,SIZE4]],
stretch: {top: [0x239B,SIZE4], ext: [0x239C,SIZE4], bot: [0x239D,SIZE4]}
},

says that U+0028 (left parenthesis) can stretch vertically, and there are five single-character sizes available. These are taken from the MAIN, SIZE1, SIZE2, SIZE3, and SIZE4 fonts, and the heights are 1em, 1.2em, 1.8em, 2.4em, and 3.0em respectively. If a parenthesis is needed in a larger size, it is made from the characters specified in the "stretch" property. Each of the top, extender, and bottom pieces is given as a pair (the character number and font).
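
As a simplified sketch of how such an entry might be consumed: walk the HW list in increasing size and take the first variant that is tall enough, falling back to the "stretch" pieces otherwise. `chooseDelimiter` is a hypothetical helper, the font names are plain strings here, and the real delimiter code may apply tolerances that this ignores.

```javascript
const V = "V";
const MAIN = "MAIN", SIZE1 = "SIZE1", SIZE2 = "SIZE2", SIZE3 = "SIZE3", SIZE4 = "SIZE4";

// Entry for U+0028, copied from the message above.
const leftParen = {
  dir: V, HW: [[1,MAIN],[1.2,SIZE1],[1.8,SIZE2],[2.4,SIZE3],[3.0,SIZE4]],
  stretch: {top: [0x239B,SIZE4], ext: [0x239C,SIZE4], bot: [0x239D,SIZE4]}
};

// Pick the first single-character variant at least `target` ems tall (assumed rule).
function chooseDelimiter(delim, target) {
  for (const [size, font] of delim.HW) {
    if (size >= target) return {kind: "variant", size: size, font: font};
  }
  if (delim.stretch) return {kind: "stretch", parts: delim.stretch};
  // No stretch data: fall back to the largest variant available.
  const [size, font] = delim.HW[delim.HW.length - 1];
  return {kind: "variant", size: size, font: font};
}
```

A 1.5em request would use the SIZE2 glyph, while a 4em request exceeds every variant and would be assembled from the top, extender, and bottom pieces.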

The HW pairs can also include additional information. The data can actually be [size,font,scale,codepoint], where "size" is the size in em's of the character, "font" is the font from which to take it, "scale" is a decimal number used to scale the character (1.5 is 50% larger, .75 is 25% smaller, 0 is no change, and the default is 0), and "codepoint" is the unicode position of the character to use (default is the position of the delimiter being stretched). So

0x23DC: // top paren
{
dir: H, HW: [[.778,AMS,0,0x2322],[1,MAIN,0,0x2322]],
stretch: {left:[0xE150,SIZE4], rep:[0xE154,SIZE4], right:[0xE151,SIZE4]}
},

says that there are two sizes of U+23DC, and that the first is in the AMS font at U+2322 and is .778em wide, while the second is in the MAIN font at U+2322 and is 1em wide.
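
The defaults for the optional fields can be made explicit with a small normalizing helper. `expandHWEntry` is a hypothetical name, and representing "no change" as a factor of 1 is a choice made for this sketch only.

```javascript
// Expand [size, font, scale, codepoint] into an object with the defaults filled in.
function expandHWEntry(delimiterCodePoint, entry) {
  const [size, font, scale, codepoint] = entry;
  return {
    size: size,
    font: font,
    // A scale of 0 (or a missing scale) means "no change", expressed here as factor 1.
    scale: scale || 1,
    // The code point defaults to the delimiter being stretched.
    codepoint: codepoint === undefined ? delimiterCodePoint : codepoint
  };
}
```

For the U+23DC example, `expandHWEntry(0x23DC, [.778, "AMS", 0, 0x2322])` yields the glyph at U+2322 in the AMS font with no scaling, whereas a plain `[1.2, "SIZE1"]` entry for U+0028 keeps code point 0x28.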

The stretchy character data can also include additional data, used for fine-tuning the positioning and sizing of the characters. The data is of the form [codepoint,font,dx,dy,scale,dh,dd] where "codepoint" is the unicode position of the character to use, "font" is the font to take it from, "dx" and "dy" are horizontal and vertical offsets to apply to the character, "scale" is a scaling factor used to adjust the character size, and "dh" and "dd" are adjustments to make to the character's height and depth (to make it have more height or depth than its glyph actually has). The latter is used to make things like arrow extenders have the same vertical size as the arrowheads, for example.

I think that covers pretty much all of it. Let me know if something more needs explanation.

Davide

Davide P. Cervone

Apr 8, 2013, 1:50:06 PM
to mathj...@googlegroups.com
> For the HW list, would it make sense to use
>
> height or width of variant character / height or width of normal character
>
> for the numeric parameter?

I don't think so. If you go to a ratio, these will tend to be decimal numbers, and you will need to store more characters in the file to handle the decimal. It is better to have things like .778, 1, 1.2, 1.5 rather than 1, 1.285, 1.542, 1.928 (since that saves 5 characters). Also, the delimiter code has a width (or height) that it is looking for and can simply compare to this list, rather than having to normalize it to be a comparison to the natural size. Finally, some characters have a smaller version than the natural size, and so the first entry in the list may not be the normal one, so there would need to be more data in order to tell which is the one to normalize against.

I'm not sure what you are trying to accomplish by the change. Can you say what is motivating your suggestion?

Davide

Frédéric WANG

Apr 8, 2013, 2:04:08 PM
to mathj...@googlegroups.com
On 08/04/2013 19:50, Davide P. Cervone wrote:
>> I'm not sure what you are trying to accomplish by the change. Can you say what is motivating your suggestion?
I was just trying to figure out what the parameter is and how I should
construct the data (preferably automatically).

Davide P. Cervone

Apr 8, 2013, 5:33:37 PM
to mathj...@googlegroups.com
OK, I think I misread the question.

The number that you need to use in the HW list can be obtained from the data file where the character actually resides. For example, for the example of the open parenthesis:

0x0028: // (
{
dir: V, HW: [[1,MAIN],[1.2,SIZE1],[1.8,SIZE2],[2.4,SIZE3],[3.0,SIZE4]],
stretch: {top: [0x239B,SIZE4], ext: [0x239C,SIZE4], bot: [0x239D,SIZE4]}
},

in order to get the [1,MAIN], you would need to look in the data for MathJax_Main (regular) and find the entry for U+0028:

0x28: [750,250,389,94,333], // LEFT PARENTHESIS

The data here is [h,d,w,l,r] where "h" is the height, "d" the depth, "w" the width, "l" is the left bearing and "r" is the right bearing (if I recall correctly). But these are in thousandths of an em (1 em = 1000).

So for a vertical stretchy character, you want the sum of h+d for the size in [size,font]. In our case, 750+250 = 1000 = 1 em, so we get [1,MAIN].

For the SIZE1 version, the data are

0x28: [850,349,458,152,422], // LEFT PARENTHESIS

so we get 850+349 = 1199 which is effectively 1200, or 1.2 em, so [1.2,SIZE1].

For horizontal stretchy characters, use the w value (divided by 1000) for the em size.
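
The rule above can be sketched as a small function. `hwSize` is a hypothetical helper; the number of decimals to keep is an assumption (one decimal reproduces the 1 and 1.2 values of this example).

```javascript
// metrics is [h, d, w, l, r] in thousandths of an em, as in the font data files.
// dir is "V" (use h + d) or "H" (use w); digits is the number of decimals kept.
function hwSize(metrics, dir, digits) {
  const [h, d, w] = metrics;
  const units = dir === "V" ? h + d : w;
  const factor = Math.pow(10, digits);
  return Math.round(units / 1000 * factor) / factor;
}

const mainParen  = [750, 250, 389, 94, 333];   // U+0028 in MathJax_Main
const size1Paren = [850, 349, 458, 152, 422];  // U+0028 in SIZE1
```

With these inputs, `hwSize(mainParen, "V", 1)` gives 1 and `hwSize(size1Paren, "V", 1)` gives 1.2, matching the `[1,MAIN]` and `[1.2,SIZE1]` entries.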

Davide




Frédéric WANG

Apr 9, 2013, 4:23:16 AM
to mathj...@googlegroups.com
On 08/04/2013 23:33, Davide P. Cervone wrote:
> OK, I think I misread the question.
>
> The number that you need to use in the HW list can be obtained from the data file where the character actually resides. For example, for the example of the open parenthesis:
>
> 0x0028: // (
> {
> dir: V, HW: [[1,MAIN],[1.2,SIZE1],[1.8,SIZE2],[2.4,SIZE3],[3.0,SIZE4]],
> stretch: {top: [0x239B,SIZE4], ext: [0x239C,SIZE4], bot: [0x239D,SIZE4]}
> },
>
> in order to get the [1,MAIN], you would need to look in the data for MathJax_Main (regular) and find the entry for U+0028:
>
> 0x28: [750,250,389,94,333], // LEFT PARENTHESIS
>
> The data here is [h,d,w,l,r] where "h" is the height, "d" the depth, "w" the width, "l" is the left bearing and "r" is the right bearing (if I recall correctly). But these are in thousandths of an em (1 em = 1000).
>
> So for a vertical stretchy character, you want the sum of h+d for the size in [size,font]. In our case, 750+250 = 1000 = 1 em, so we get [1,MAIN].
>
> For the SIZE1 version, the data are
>
> 0x28: [850,349,458,152,422], // LEFT PARENTHESIS
>
> so we get 850+349 = 1199 which is effectively 1200, or 1.2 em, so [1.2,SIZE1].
>
> For horizontal stretchy characters, use the w value (divided by 1000) for the em size.
>
> Davide
OK, thanks. I've modified the script to generate these values using the
glyph's boundingBox and the font's em, as provided by the FontForge Python
interface (http://fontforge.org/python.html); these seem to be the same
metrics you obtain when you generate the AFM file. Now the fontdata file
matches more or less the old set of fonts, with some differences:

1) In the old data, it seems that the HW table always contains a glyph
from the GENERAL font. The constructions described in the OpenType Math
table sometimes contain a MATHSIZE0 size variant but not always
(sometimes a glyph has no size variants at all). Should I modify the
script to always have a MATHSIZE0 variant, even when it is not specified
in the OpenType Math table?

2) The new font data does not have the scale factor hack used to provide
more size variants.

3) The new font data seems to specify fewer constructions of stretchy
operators.

It seems that if we still want 2) and 3) for the STIX Web fonts, we need
to add more data by hand to complete the OpenType Math table. However, I
suspect that if the STIX designers didn't put these constructions in the
OpenType Math table then the stretchy operators are not intended to be
built that way.

Frédéric WANG

Apr 9, 2013, 5:36:19 AM
to mathj...@googlegroups.com
On 08/04/2013 19:38, Davide P. Cervone wrote:
> I think that covers pretty much all of it. Let me know if something
> more needs explanation. Davide

In the STIX-Word version, there is one STIX font for each style + a
STIXMath font. The way I split the STIX fonts is

1) I keep the version for each style (Regular, Italic, Bold and BoldItalic)
2) I split according to some Unicode subsets. In particular, I use
"mathvariant" subsets for the Mathematical Alphanumeric Symbols block.

That was not really clear yet in the proposal, but currently I do

1) STIX_*-Regular, STIX_*-Italic, STIX_*-Bold, STIX_*-BoldItalic
2) STIX_Normal-* (Bold, Italic, BoldItalic), STIX_Script-* (Script,
BoldScript), STIX_Fraktur-* (Fraktur, BoldFraktur), STIX_DoubleStruck-*
(DoubleStruck), STIX_SansSerif-* (SansSerif, SansSerifBold,
SansSerifItalic, SansSerifBoldItalic), STIX_Monospace-* (Monospace)

In particular, you can combine the two classifications and
STIX_Script-Italic contains the script characters from the Mathematical
Alphanumeric Symbols and from the original STIX-Italic font.

In theory, "mathvariant" is just supposed to remap some characters to
the corresponding Mathematical Alphanumeric Symbols character. However,
browsers currently just use font-style/font-weight to emulate some
mathvariant values and IIUC in our implementation we use alternative
fonts+font-style/font-weight or unicode remapping. The MathML spec
mentions the possibility to use font-style/font-weight IIRC.

I guess what I should do here is to follow the intended use of
"mathvariant" i.e. just do a remapping. Hence VARIANTS should contain

"normal": {fonts: [STIX_*-*]},
"bold": {fonts: [STIX_Normal-*]},
"italic": {fonts: [STIX_Normal-*]},
"fraktur": {fonts: [STIX_Fraktur-*]},
"bold-fraktur": {fonts: [STIX_Fraktur-*]},
...
"monospace": {fonts: [STIX_Monospace-*]},

and RANGES should do the mathvariant remapping. That way people could
use both mathvariant="..." and style="...", for example <mtext
mathvariant="script" style="font-style: italic"> to pick characters from
STIX_Script-Italic (actually I think the MathML spec says that
font-style: italic should be ignored, but we don't do that at the moment).

In the VARIANTS table, should I only specify the STIX_*-Regular style or
also the STIX_*-Italic, STIX_*-Bold, STIX_*-BoldItalic styles? (I guess
it is safe to specify everything, but the browser should be able to find
the right one from the family?)

Also, how should I complete the VARIANTS tables for the "-STIX-variant",
"-tex-caligraphic", "-tex-oldstyle", "-tex-mathit", "-largeOp" and
"-smallOp" keys?

Davide P. Cervone

Apr 9, 2013, 9:01:31 AM
to mathj...@googlegroups.com
>> Now the fontdata file matches more or less the old set of fonts, with some differences:
>
> 1) In the old data, it seems that the HW table always contains a glyph from the GENERAL font. The contructions described in the OpenType Math table sometimes contains a MATHSIZE0 size variant but not always (sometimes a glyph has no size variants at all). Should I modify the script to always have a MATHSIZE0 variant, even when it is not specified in the OpenType Math table?

I don't know what the OpenType Math Table looks like, but the HW array should include the "standard" size of the character (I'm assuming that's the MATHSIZE0 character that you mention) unless there is none in the font (there were a couple of cases with the TeX fonts where that was the case). So HW should not include just the larger variants, but also the original unscaled character as well.

> 2) The new font data does not have the scale factor hack used to provide more size variants.

This is used with a number of vertically stretched glyphs in order to make sure there are sizes that correspond to the \big, \Big, \bigg, and \Bigg macros in TeX. Without this, these macros might not produce different results. So it is important that this be maintained for things like the parentheses, brackets, braces, and other characters that might be used with the \big macros. And the scaling factors should be chosen to get the heights that correspond to the ones needed for those sizes.

> 3) The new font data seem to specify less constructions of stretchy operators.
>
> It seems that if we still want 2) and 3) for the STIX Web fonts, we need to add more data by hand to complete the OpenType Math table.
> However, I suspect that if the STIX designers didn't put these constructions in the OpenType Math table then the stretchy operators are not intended to be built that way.

I put together as many stretchy glyphs as I could reasonably do using the available pieces, so some may not have been intended. I used the MathML operator table to see what glyphs were stretchy by default, and if I could do it, I made data to stretch them. How many are not in the OpenType Math table, and how bad are the results using the old font data file? (I know not all of them were great.)

I think it would be best to cover as many of the ones that are labeled stretchy in the operator table as we can (whether they are in the OpenType math table or not). For instance, I think I remember your saying that the equal sign is not stretchy in the OpenType math table, but it is important to the AMScd extension that this be stretchy (and it is easy to do it).

Davide

Frédéric WANG

Apr 9, 2013, 9:27:34 AM
to mathj...@googlegroups.com
On 09/04/2013 15:01, Davide P. Cervone wrote:
> I don't know what the OpenType Math Table looks like, but the HW array
> should include the "standard" size of the character (I'm assuming
> that's the MATHSIZE0 character that you mention) unless there is none
> in the font (there were a couple of cases with the TeX fonts where
> that was the case). So HW should not include just the larger variants,
> but also the original unscaled character as well.
I don't think the unscaled character is in the OpenType Math Table so
I'll just modify the script to always add it to the HW array. MATHSIZE0
is the normal character from the STIXMath font, but I'm not sure why it
is sometimes included in the table (I added this 0 index when I realized
that). I suspect it also is the same character as the one from the
STIX-Regular font, but I have to check that.

> This is used with a number of vertically stretched glyphs in order to
> make sure there are sizes that correspond to the \big, \Big, \bigg,
> and \Bigg macros in TeX. Without this, these macros might not produce
> different results. So it is important that this be maintained for
> things like the parentheses, brackets, braces, and other characters
> that might be used with the \big macros. And the scaling factors
> should be chosen to get the heights that correspond to the ones needed
> for those sizes.
Is there a simple rule to automatically add them? It seems that the
added size variant is often SIZE1 scaled by a factor of 1.1.

> I put together as many stretchy glyphs as I could reasonably do using
> the available pieces, so some may not have been intended. I used the
> MathML operator table to see what glyphs were stretchy by default, and
> if I could do it, I made data to stretch them. How many are not in the
> OpenType Math table, and how bad are the results using the old font
> data file? (I know not all of them were great.) I think it would be
> best to cover as many of the ones that are labeled stretchy in the
> operator table as we can (whether they are in the OpenType math table
> or not). For instance, I think I remember your saying that the equal
> sign is not stretchy in the OpenType math table, but it is important
> to the AMScd extension that this be stretchy (and it is easy to do it).

OK, it should be doable to add some data to teach more constructions to
the font generator.

Davide P. Cervone

Apr 9, 2013, 9:50:07 AM
to mathj...@googlegroups.com
>> This is used with a number of vertically stretched glyphs in order to make sure there are sizes that correspond to the \big, \Big, \bigg, and \Bigg macros in TeX. Without this, these macros might not produce different results. So it is important that this be maintained for things like the parentheses, brackets, braces, and other characters that might be used with the \big macros. And the scaling factors should be chosen to get the heights that correspond to the ones needed for those sizes.
> Is there a simple rule to automatically add them? It seems that the added size variant is often SIZE1 scaled by a factor of 1.1.

It looks like there are three scaled versions: SIZE1 by 1.1, SIZE2 by 1.11, and SIZE3 by 1.005. It looks like all the standard TeX delimiters get these sizes. (You can see what they are in the old data file.) I didn't see any other scalings (but didn't look really carefully).
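
One way a generator script could add these scaled entries automatically is sketched below. `addScaledVariants` is a hypothetical helper, the three factors are the ones listed above, and the sample sizes are the STIX left-parenthesis values from the old data file that are quoted later in this thread. Rounding to three decimals is an assumption.

```javascript
// Scale factors per size font, as listed above.
const SCALED = {SIZE1: 1.1, SIZE2: 1.11, SIZE3: 1.005};

// After each unscaled [size, font] entry, insert a scaled copy if a factor exists.
function addScaledVariants(hw) {
  const out = [];
  for (const entry of hw) {
    out.push(entry);
    const [size, font] = entry;
    const factor = SCALED[font];
    if (factor !== undefined) {
      out.push([Math.round(size * factor * 1000) / 1000, font, factor]);
    }
  }
  return out;
}

// Unscaled STIX sizes for the left parenthesis (fonts as plain strings here).
const unscaled = [[0.844, "GENERAL"], [1.230, "SIZE1"], [1.845, "SIZE2"],
                  [2.460, "SIZE3"], [3.075, "SIZE4"]];
const withScaled = addScaledVariants(unscaled);
```

For this input the inserted entries come out as 1.353, 2.048 and 2.472, which agrees with the scaled entries in the old STIX data for the left parenthesis.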

Davide

Frédéric WANG

Apr 9, 2013, 9:54:29 AM
to mathj...@googlegroups.com
On 09/04/2013 11:36, Frédéric WANG wrote:
> On 08/04/2013 19:38, Davide P. Cervone wrote:
>> I think that covers pretty much all of it. Let me know if something
>> more needs explanation. Davide
>
So I've done the changes for VARIANT:

https://github.com/fred-wang/MathJax/blob/stix-web-fonts/unpacked/jax/output/HTML-CSS/fonts/STIX-Web/fontdata.js

I'm not sure yet if the remapping to PUA for Greek letters is necessary
or if the arrays REMAP, REMAPACCENT, REMAPACCENTUNDER should be
completed. I haven't changed TeX_factor, baselineskip, lineH or lineD.

Regarding the RULECHAR character, is it assumed to come from a specific
font? The one currently used is not in STIX_Main at the moment.

Frédéric WANG

Apr 11, 2013, 6:29:50 AM
to mathj...@googlegroups.com
On 09/04/2013 15:50, Davide P. Cervone wrote:
> It looks like there are three scaled versions: SIZE1 by 1.1, SIZE2 by
> 1.11, and SIZE3 by 1.005. It looks like all the standard TeX
> delimiters get these sizes. (You can see what they are in the old data
> file.) I didn't see any other scalings (but didn't look really
> carefully). Davide
But how did you get these values for the STIX fonts?

In the array for the left parenthesis, I read:

TeX:
TeX_factor: 1
[[1,MAIN],[1.2,SIZE1],[1.8,SIZE2],[2.4,SIZE3],[3.0,SIZE4]]

STIX:
TeX_factor: 1.125
[[.844,GENERAL],[1.230,SIZE1],[1.353,SIZE1,1.1],[1.845,SIZE2],
[2.048,SIZE2,1.11],[2.460,SIZE3],[2.472,SIZE3,1.005],[3.075,SIZE4]]

I see that the growth of the non-scaled character is proportional:

TeX: 1.2 1.8 2.4 3.0
STIX: 1.230 1.845 2.460 3.075
(1 1.5 2 2.5)

In TeX input jax, I see

big: ['MakeBig',MML.TEXCLASS.ORD,0.85],
Big: ['MakeBig',MML.TEXCLASS.ORD,1.15],
bigg: ['MakeBig',MML.TEXCLASS.ORD,1.45],
Bigg: ['MakeBig',MML.TEXCLASS.ORD,1.75],
TEXDEF.p_height: 1.2 / .85

and MakeBig computes the em values by size * TEXDEF.p_height:

1.2
1.623(529411764706)
2.0470(58823529412)
2.470(588235294118)

But I don't see how these are related to the values in the TeX/STIX arrays
(except that 1.2 is SIZE1 in TeX, and 2.0470 and 2.470 are close to the
sizes of two scaled STIX characters). TeX_factor is used in the HTML output
jax, but I don't understand how it relates to the other parameters either.
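
The arithmetic above can be reproduced with a short sketch. It also looks up each height in the STIX list under the assumption that the first entry at least as tall as the request is chosen. That selection rule, and the omission of TeX_factor, are assumptions of this sketch and not something confirmed in the thread; `bigHeight` and `firstAtLeast` are hypothetical helpers.

```javascript
// Values quoted above from the TeX input jax.
const P_HEIGHT = 1.2 / 0.85;
const BIG_SIZES = {big: 0.85, Big: 1.15, bigg: 1.45, Bigg: 1.75};

// STIX left-parenthesis HW list quoted above (fonts as plain strings).
const STIX_PAREN_HW = [
  [0.844, "GENERAL"], [1.230, "SIZE1"], [1.353, "SIZE1", 1.1], [1.845, "SIZE2"],
  [2.048, "SIZE2", 1.11], [2.460, "SIZE3"], [2.472, "SIZE3", 1.005], [3.075, "SIZE4"]
];

// Height in ems requested by \big, \Big, \bigg or \Bigg.
function bigHeight(name) {
  return BIG_SIZES[name] * P_HEIGHT;
}

// Assumed selection rule: first entry whose size is at least the requested height.
function firstAtLeast(hw, height) {
  return hw.find(([size]) => size >= height) || null;
}
```

Under that assumption, \big and \Big would pick the unscaled SIZE1 and SIZE2 glyphs, while \bigg and \Bigg would pick the two scaled entries (2.048 and 2.472), which sit just above the requested heights. That may be why those scale factors were chosen.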

Frédéric WANG

Apr 11, 2013, 8:18:34 AM
to mathj...@googlegroups.com
On 09/04/2013 15:01, Davide P. Cervone wrote:
> I put together as many stretchy glyphs as I could reasonably do using
> the available pieces, so some may not have been intended. I used the
> MathML operator table to see what glyphs were stretchy by default, and
> if I could do it, I made data to stretch them. How many are not in the
> OpenType Math table, and how bad are the results using the old font
> data file? (I know not all of them were great.) I think it would be
> best to cover as many of the ones that are labeled stretchy in the
> operator table as we can (whether they are in the OpenType math table
> or not). For instance, I think I remember your saying that the equal
> sign is not stretchy in the OpenType math table, but it is important
> to the AMScd extension that this be stretchy (and it is easy to do it).
The characters that are in our font data but not the OpenType table:

-0x2015
-0x2017
-0x219E
-0x219F
-0x21A0
-0x21A1
-0x21A5
-0x21A7
-0x21A8
-0x21A9
-0x21AA
-0x21B0
-0x21B1
-0x21B2
-0x21B3
-0x21B4
-0x21B5
-0x21C1
-0x21CB
-0x21CC
-0x21E0
-0x21E1
-0x21E2
-0x21E3
-0x21E4
-0x21E5
-0x21FD
-0x21FE
-0x21FF
-0x2212
-0x2215
-0x2223
-0x2225
-0x2329
-0x232A
-0x23AA
-0x23AF
-0x23B0
-0x23B1
-0x23D0
-0x2500
-0x2758
-0x27F5
-0x27F6
-0x27F7
-0x27F8
-0x27F9
-0x27FA
-0x27FB
-0x27FC
-0x27FD
-0x27FE
-0x2906
-0x2907
-0x2912
-0x2913
-0x294E
-0x294F
-0x2950
-0x2951
-0x2952
-0x2953
-0x2954
-0x2955
-0x2956
-0x2957
-0x2958
-0x2959
-0x295A
-0x295B
-0x295C
-0x295D
-0x295E
-0x295F
-0x2960
-0x2961
-0x2980
-0x2997
-0x2998
-0x2C7
-0x2C9
-0x2CD
-0x2D
-0x3008
-0x3009
-0x3D
-0x5E
-0x5F
-0x7E
-0xAF
-0xFE37
-0xFE38

Among the above characters, these are not stretchy/largeop/fence in the
MathML3 operator dictionary (they might have been in MathML2):

-0x2015
-0x2017
-0x2212
-0x2223
-0x2225
-0x2329
-0x232A
-0x23AA
-0x23AF
-0x23B0
-0x23B1
-0x23D0
-0x2500
-0x2758
-0x2906
-0x2907
-0x2C7
-0x2D
-0x2F
-0x3008
-0x3009
-0x303
-0x30C
-0x332
-0x3D
-0x5C
-0xFE37
-0xFE38

Those that are in the Open Type Math table but not in our font data:

+0x20D0
+0x20D1
+0x20D6
+0x20D7
+0x20E1
+0x20EC
+0x20ED
+0x20EE
+0x20EF
+0x2140
+0x220F
+0x2210
+0x2211
+0x221B
+0x221C
+0x222B
+0x222C
+0x222D
+0x222E
+0x222F
+0x2230
+0x2231
+0x2232
+0x2233
+0x22C0
+0x22C1
+0x22C2
+0x22C3
+0x2772
+0x2773
+0x27F0
+0x27F1
+0x2983
+0x2984
+0x2985
+0x2986
+0x29F8
+0x29F9
+0x2A00
+0x2A01
+0x2A02
+0x2A03
+0x2A04
+0x2A05
+0x2A06
+0x2A07
+0x2A08
+0x2A09
+0x2A0A
+0x2A0B
+0x2A0C
+0x2A0D
+0x2A0E
+0x2A0F
+0x2A10
+0x2A11
+0x2A12
+0x2A13
+0x2A14
+0x2A15
+0x2A16
+0x2A17
+0x2A18
+0x2A19
+0x2A1A
+0x2A1B
+0x2A1C
+0x2AFC
+0x2AFF
+0x2B45
+0x2B46
+0x305
+0x330
+0x338

Those that are in the OpenType table but not the operator dictionary:

-0x20D0
-0x20D1
-0x20D6
-0x20D7
-0x20E1
-0x20EC
-0x20ED
-0x20EE
-0x20EF
-0x2140
-0x221B
-0x221C
-0x29F8
-0x29F9
-0x2F
-0x303
-0x305
-0x30C
-0x330
-0x332
-0x338
-0x5C
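
Lists like the ones above can be produced by comparing the keys of the two tables. The sketch below uses a made-up three-element sample for each side (with U+0028 assumed to be present in both); `compareTables` and `formatCp` are hypothetical helpers, not the script actually used.

```javascript
// Format a code point in the "-0x2212" / "+0x2211" style used in the lists above.
function formatCp(sign, cp) {
  return sign + "0x" + cp.toString(16).toUpperCase();
}

// Report code points present in only one of the two tables.
function compareTables(fontData, openType) {
  const inFontData = new Set(fontData);
  const inOpenType = new Set(openType);
  return {
    onlyInFontData: fontData.filter(cp => !inOpenType.has(cp)).map(cp => formatCp("-", cp)),
    onlyInOpenType: openType.filter(cp => !inFontData.has(cp)).map(cp => formatCp("+", cp))
  };
}

// Illustrative sample only, not the full tables.
const sampleDiff = compareTables([0x28, 0x3D, 0x2212], [0x28, 0x2211, 0x222B]);
```

The same function could be reused to compare either table against the list of stretchy entries in the operator dictionary.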

Davide P. Cervone

Apr 11, 2013, 1:55:21 PM
to mathj...@googlegroups.com
> I guess what I should do here is to follow the intended use of "mathvariant" i.e. just do a remapping. Hence VARIANTS should contain
>
> "normal": {fonts: [STIX_*-*]
> "bold": {fonts: [STIX_Normal-*]},
> "italic": {fonts: [STIX_Normal-*]},
> "fraktur": {fonts: [STIX_Fraktur-*]},
> "bold-fraktur": {fonts: [STIX_Fraktur-*]},
> ...
> "monospace": {fonts: [STIX_Monospace-*]},
>
> and RANGES should do the mathvariant remapping. That way people could use both mathvariant="..." or style="...", for example <mtext mathvariant="script" style="font-style: italic"> to pick characters from STIX_Script-Italic (actually I think the MathML spec says that font-style: italic should be ignored, but we don't do that at the moment).

I thought we had fixed that, but I see that we haven't. I think we should probably change that to follow the specification.

> In the VARIANTS table, should I only specify the STIX_*-Regular style or also the STIX_*-Italic, STIX_*-Bold, STIX_*-BoldItalic styles?

You will want to specify the Italic forms for the italic variants, the bold for the bold variants, and so on. The list of fonts for these variants should also include the normal fonts, in general, since if a character isn't in the italic form, we should at least show the non-italic form.

> (I guess it is safe to specify everything, but the browser should be able to find the right one from the family?)

The font list is not actually seen by the browser. It is used internally by MathJax to look through the data for each font to locate the characters, and only that one font is put into the font-family. (And the italic and bold properties are used to add font-style and font-weight values, if needed.)
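
A minimal sketch of that lookup, assuming each font's data is a plain map from code point to metrics. The names `FONTS`, `VARIANTS` and `lookup` are hypothetical, and the two tiny fonts hold made-up sample entries; this is an illustration, not MathJax's actual code.

```javascript
// Hypothetical, heavily reduced font data: code point -> placeholder metrics.
const FONTS = {
  "STIX_Main-Italic": { 0x78: "italic x" },
  "STIX_Main-Regular": { 0x78: "regular x", 0x2b: "regular +" },
};

// Each variant lists its fonts in the order in which they should be searched.
const VARIANTS = {
  normal: { fonts: ["STIX_Main-Regular"] },
  italic: { fonts: ["STIX_Main-Italic", "STIX_Main-Regular"], italic: true },
};

// Return the first font in the variant's list that contains the character.
function lookup(variantName, codePoint) {
  const variant = VARIANTS[variantName];
  for (const name of variant.fonts) {
    if (codePoint in FONTS[name]) {
      return { font: name, italic: Boolean(variant.italic) };
    }
  }
  return null; // not available in any listed font
}
```

Here `lookup("italic", 0x78)` finds the italic font first, while `lookup("italic", 0x2b)` falls through to the regular font because the italic sample font has no plus sign.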

> Also, how should I complete the VARIANTS tables for the "-STIX-variant", "-tex-caligraphic", "-tex-oldstyle", "-tex-mathit", "-largeOp" and "-smallOp" keys?

The -STIX-variant is there to remap certain characters that have more than one form (like the prime symbol that is already in the superscript position in the STIXGeneral font, but needs to be scaled and positioned in the STIXVariants font). There are several characters that use this, listed in the remap data. Some get mapped to the PUA characters in STIXNonUnicode. I would expect modeling it after the existing one should be sufficient, but you may need to change the remappings to be to the places those original characters are in your new web fonts.

The -tex-caligraphic variant is used for \cal in TeX input, and maps to the calligraphic characters in the PUA. This is only used by the TeX input jax, and since TeX only provides upper case calligraphic characters, this has noLowerCase set to true. You just have to arrange for this to produce the calligraphic characters using the proper offsetA and include the default normal fonts so that everything else goes to them.

The -tex-oldstyle variant is used to get the "old-style" numbers. These are the numbers that extend below the baseline for some of them (3, 4, 5, 7, and 9; use $0123456789 \oldstyle 0123456789$ in TeX to see the difference). These used to be in the PUA, but not in consecutive locations, so there is a complicated remapping for this one. Again, you may need to adjust that to the locations where these characters are in the new fonts, and you should include defaults to the normal fonts.

The -tex-mathit variant should be the same as italic but with noIC set to true. (This is italic correction.) The TeX fonts have separate italic fonts for mathematics and text, as the spacing and kerning are different in the two situations. The STIX fonts don't have this distinction, so they are basically the same. We turn off the algorithmic italic correction, which emulates the difference between the two. Use $different\text{ and }\mathit{different}$ to see the difference between the two.

The -largeop variant is used to get the larger operator size for operators like \sum and \int in display style. So this should start with the MATHSIZE1 font. I don't know if you have put the integrals into that font, but if not, the integral font should come next. Then the normal fonts.

Finally, the "-smallop" variant is used to get the normal size for the operators. It can be left blank as it currently is, as that means default to the normal variant.

Hope that helps.

Davide

Davide P. Cervone

Apr 11, 2013, 2:22:56 PM4/11/13
to mathj...@googlegroups.com
I've taken a quick look at the new font data file, and did see some things that may need fixing.

* I'm not sure the bold and italic forms should be listed in the normal font list, as normal should not be producing bold or italic forms, except for some very special cases (some characters in the letter-like symbols, and the math alphabets). Since most characters in those fonts are already in the regular versions, it probably doesn't hurt; the time it does make a difference, however, is when a character is used that doesn't appear in the STIX fonts. That would cause the data for every one of those fonts to be loaded before MathJax could tell that the character was not available, and that might be a bit time-consuming.

* The bold variant should start with NORMALBOLD not NORMALREGULAR otherwise MathJax will find the regular form first and not show the bold one. I guess NORMALBOLDITALIC would be next, then anywhere else that bold character might come from, then the regular versions. The fonts need to be in the order in which the characters should be taken. You will also need to add the bold:true property.

* Similarly for the italic, bold-italic, etc. variants, and the other variants. Remember that there should be fallback to the normal versions so that, e.g., <mtext mathvariant="script">x+y</mtext> would still produce the normal + sign. In the old STIX data and the TeX data, there were not many fonts that were the "normal" fonts, so this was not hard. But now with so many, it may be that we need to modify the code to look through the normal variant after exhausting the font list for the given variant. That way, you don't have to include redundant data.

* The old REMAP data mapped a few characters that weren't in the STIX font to comparable glyphs. The ones for 0x3008 and 0x3009 should be maintained, as this was where \langle and \rangle were in an earlier version of unicode, so some legacy MathML might refer to these. REMAPACCENT is also important to get the proper sized arrow in <mover>.

* I think the delimiter data needs to use font names other than MATHSIZEn, as not all the needed characters are in those sizes. For example, the arrow for U+2190 is in the ARROWSREGULAR not the MATHSIZE1 font, right? So the HW arrays need to refer to the fonts in which the characters are actually located.

* I don't think you want to include all the data about all the fonts in fontdata.js, only the most important fonts and the most important characters in those fonts, in order to allow it to download faster. The rest of the data will be downloaded as needed. The original fontdata.js included basically all the characters available in plain TeX, and let the others be loaded when needed. The main.js files included these "main" characters, and the other files handled the rest.

* There are some characters whose data seems not to be correct. E.g., in STIX_Alphabets-bold, 0x210C (BLACK-LETTER CAPITAL H) has font data [0,0,1000,0,0], which I don't think is correct. Many of the other black-letter characters also have this data, and this occurs in other fonts as well.
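
The fallback suggested in the third point (search the normal variant after exhausting the variant's own font list) could look roughly like the sketch below. All names (`FONTS`, `VARIANTS`, `findFont`) and the sample data are hypothetical; this only illustrates the idea and is not the existing MathJax code.

```javascript
// Hypothetical sample data: each font is reduced to the set of code points it holds.
const FONTS = {
  "Script-Regular": new Set([0x1d4cd]), // MATHEMATICAL SCRIPT SMALL X
  "Main-Regular": new Set([0x2b, 0x78]), // "+" and "x"
};

const VARIANTS = {
  normal: { fonts: ["Main-Regular"] },
  script: { fonts: ["Script-Regular"] }, // no redundant "normal" fonts listed here
};

// Search the variant's own fonts first, then the fonts of the normal variant.
// `searched` records every font whose data had to be examined.
function findFont(variantName, codePoint) {
  const own = VARIANTS[variantName].fonts;
  const fallback = variantName === "normal" ? [] : VARIANTS.normal.fonts;
  const searched = [];
  for (const name of [...own, ...fallback]) {
    searched.push(name);
    if (FONTS[name].has(codePoint)) {
      return { font: name, searched };
    }
  }
  return { font: null, searched };
}
```

The `searched` array also makes the cost from the first point visible: a character that is in none of the fonts forces the data of every listed font to be examined before the lookup can give up.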

Anyway, that's what I see at the moment.

Davide


On Apr 9, 2013, at 9:54 AM, Frédéric WANG wrote:

> On 09/04/2013 11:36, Frédéric WANG wrote:
>> On 08/04/2013 19:38, Davide P. Cervone wrote:
>>> I think that covers pretty much all of it. Let me know if something more needs explanation. Davide
>>
> So I've done the changes for VARIANT:
>
> https://github.com/fred-wang/MathJax/blob/stix-web-fonts/unpacked/jax/output/HTML-CSS/fonts/STIX-Web/fontdata.js
>
> I'm not sure yet if the remapping to PUA for Greek letters is necessary or if the arrays REMAP, REMAPACCENT, REMAPACCENTUNDER should be completed. I haven't changed TeX_factor, baselineskip, lineH or lineD.
>
> Regarding the RULECHAR character, is it assumed to come from a specific font? The one currently used is not in STIX_Main at the moment.
>
> --

Davide P. Cervone

Apr 12, 2013, 9:07:13 AM4/12/13
to mathj...@googlegroups.com
>> It looks like there are three scaled versions: SIZE1 by 1.1, SIZE2 by 1.11, and SIZE3 by 1.005. It looks like all the standard TeX delimiters get these sizes. (You can see what they are in the old data file.) I didn't see any other scalings (but didn't look really carefully). Davide
>
> But how did you get these values for the STIX fonts?

Well, that was done so long ago (4 years or more at this point) that I no longer remember exactly how that was done. I suspect it may just have been empirical experimentation, to be honest about it.

I have looked into the code, and it appears that the only one that actually gets used is the SIZE1 * 1.1 version, and this one IS important, so it needs to be kept. The other extra ones don't get selected by the \big macros. I suspect that either the TeX_factor or the \big macros were changed somewhere along the way and that the other sizes used to be used but no longer are. So I think you can drop the others and just use this one.

> In TeX input jax, I see
>
> big: ['MakeBig',MML.TEXCLASS.ORD,0.85],
> Big: ['MakeBig',MML.TEXCLASS.ORD,1.15],
> bigg: ['MakeBig',MML.TEXCLASS.ORD,1.45],
> Bigg: ['MakeBig',MML.TEXCLASS.ORD,1.75],
> TEXDEF.p_height: 1.2 / .85
>
> and MakeBig computes the em values by size * TEXDEF.p_height:
>
> 1.2
> 1.623(529411764706)
> 2.0470(58823529412)
> 2.470(588235294118)

Right. When the delimiter is created, its em-size is multiplied by the TeX_factor, so for STIX, these values are

1.200 => 1.350 (\big)
1.623 => 1.826 (\Big)
2.047 => 2.303 (\bigg)
2.470 => 2.779 (\Bigg)

since the available (native) sizes are 1.23, 1.845, 2.46, 3.075

this would mean that the macros produce the following:

\big => 1.845
\Big => 1.845
\bigg => 2.46
\Bigg => 3.075

That means \big and \Big would be the same. This is why we need the scaled SIZE1 version.

The scaling factor comes from (desired-size/SIZE1-size) = 1.35 / 1.23 = 1.09756, or about 1.1.

We could make sizes that more closely match the other desired heights, but it is not necessary. I tried it out, and the difference is not very apparent, so it is probably not worth it. So I recommend just adding the one [1.35,SIZE1,1.1] size for the vertical delimiters that currently have them.
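
The arithmetic above can be sketched in a few lines. The selection rule used here (take the first available size that is at least as large as the requested one) is an assumption that reproduces the numbers listed above; the constant and function names are hypothetical, not MathJax's actual code.

```javascript
const P_HEIGHT = 1.2 / 0.85; // TEXDEF.p_height from the TeX input jax
const TEX_FACTOR = 1.125;    // TeX_factor used for the STIX fonts
const BIG = { big: 0.85, Big: 1.15, bigg: 1.45, Bigg: 1.75 };

// Requested em-size of a \big-style delimiter, rounded to three decimals.
function desiredSize(name) {
  return Math.round(BIG[name] * P_HEIGHT * TEX_FACTOR * 1000) / 1000;
}

// Assumed rule: the first available size that is at least the requested size,
// or the largest one if none is big enough.
function pickSize(name, sizes) {
  const want = desiredSize(name);
  const found = sizes.find((s) => s >= want);
  return found === undefined ? sizes[sizes.length - 1] : found;
}

const NATIVE = [1.23, 1.845, 2.46, 3.075];            // native sizes only
const WITH_SCALED = [1.23, 1.35, 1.845, 2.46, 3.075]; // plus SIZE1 scaled by 1.1
```

With only the native sizes, `pickSize("big", NATIVE)` and `pickSize("Big", NATIVE)` both give 1.845; with the extra 1.35 entry, `pickSize("big", WITH_SCALED)` gives 1.35 and the two macros differ again.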

Davide

Davide P. Cervone

Apr 12, 2013, 9:54:01 AM4/12/13
to mathj...@googlegroups.com
Thanks for doing the analysis. Here are some comments about the differences

> The characters that are in our font data but not the OpenType table:

Most of these are because they were stretchy in the operator table (see next section), or because they were just horizontal or vertical characters that could easily be stretched by repeating themselves.
Most of these are here because they need to be stretchy for various TeX constructs (like overbraces and underbraces), or because they are horizontal or vertical lines that can be stretched easily by repeated copies of themselves, or because there were multiple sizes of them in the STIX fonts (like with the angle brackets). It was surprising to me to see how many different vertical or horizontal lines there were and how many of them actually appeared in MathML in the wild.

> -0x2015
> -0x2017
> -0x2212

These are easily stretched by repeating copies of themselves. I've seen the minus sign used as an overline in <mover>.

> -0x2223
> -0x2225

These are used as delimiters in TeX, and so need to be stretchy.

> -0x2329
> -0x232A
> -0x3008

> -0x3009


These are the angle brackets, which are stretchable in TeX.

> -0x23AA
> -0x23AF
> -0x23D0
> -0x2500
> -0x2758


Horizontal and vertical lines are easy to stretch. These aren't critical, though.

> -0x23B0
> -0x23B1

These are stretchy delimiters in TeX, and the data to stretch them are not hard to produce.

> -0x2906
> -0x2907

These are arrows, and it is good to have as many of them be stretchy as we can.

> -0x2C7

There were multiple sizes of this in the STIX fonts.

> -0x2D

I've seen this used as an overline in <mover>

> -0x2F
> -0x303
> -0x30C
> -0x332
> -0x5C

There are multiple sizes of these in the fonts.

> -0x3D

The equal needs to be stretchy for AMScd and mhchem.

> -0xFE37
> -0xFE38

These are the over- and under-braces, which need to be stretchy for \overbrace and \underbrace.

> Those that are in the Open Type Math table but not in our font data:

Most of these are large operators (which are handled through the -largeop and -smallop variants in TeX). MathJax doesn't currently handle stretchy versions of these, but it would be good to add them for things like integrals, etc. I think that if they were added to the DELIMITERS list, they could be used in MathML input without any changes to MathJax, so it would be good to include them.

Some of these are combining arrows or other arrows, which I didn't bother with. There are also a couple of vertical delimiters, which I thought I had done, but must have missed.
I think it is fine to make these stretchy, since the fonts know how to do that.

One thing to consider is which delimiters should be in the main font data file and which in the fontdata-extra file.

> -0x20D0
> -0x20D1
> -0x20D6
> -0x20D7
> -0x20E1
> -0x20EC
> -0x20ED
> -0x20EE
> -0x20EF
> -0x2140
> -0x221B
> -0x221C
> -0x29F8
> -0x29F9
> -0x2F
> -0x303
> -0x305
> -0x30C
> -0x330
> -0x332
> -0x338
> -0x5C

Thanks again for your work on this.

Davide

Frédéric WANG

Apr 12, 2013, 10:05:26 AM4/12/13
to mathj...@googlegroups.com
Thanks for your feedback. I plan to continue development with the
generator next week and your comments will be useful. I already started
to rewrite the font splitter to make it more general and it can work
with Asana-Math, Gyre-Pagella, Gyre-Termes, Neo-Euler, STIX,
Latin-Modern (all the free fonts with the OpenType Math table I'm aware
of at the moment). However, I still need to improve it to generate
fontdata.js automatically (probably with custom input for each font's
particularities: to describe additional stretchy ops, how to split the
JS files with glyph metrics, etc.).

For completeness, I attach an HTML page that extracts the STIX operators
with the "scale" hack.
scale.html.zip

Davide P. Cervone

Apr 12, 2013, 3:17:21 PM4/12/13
to mathj...@googlegroups.com
Looks good. Thanks.

Davide

Frédéric WANG

Apr 15, 2013, 3:25:02 AM4/15/13
to mathj...@googlegroups.com
On 11/04/2013 20:22, Davide P. Cervone wrote:
> * I'm not sure the bold and italic forms should be listed in the normal font list, as normal should not be producing bold or italic forms, except for some very special cases (some characters in the letter-like symbols, and the math alphabets). Since most characters in those fonts are already in the regular versions, it probably doesn't hurt; the time it does make a difference, however, is when a character is used that doesn't appear in the STIX fonts. That would cause the data for every one of those fonts to be loaded before MathJax could tell that the character was not available, and that might be a bit time-consuming.
That's what I was wondering too. That's why I asked whether I should
only list *-Regular form. I'll change that.
> * The bold variant should start with NORMALBOLD not NORMALREGULAR otherwise MathJax will find the regular form first and not show the bold one. I guess NORMALBOLDITALIC would be next, then anywhere else that bold character might come from, then the regular versions. The fonts need to be in the order in which the characters should be taken. You will also need to add the bold:true property.
>
> * Similarly for the italic, bold-italic, etc. variants, and the other variants. Remember that there should be fallback to the normal versions so that, e.g., <mtext mathvariant="script">x+y</mtext> would still produce the normal + sign. In the old STIX data and the TeX data, there were not many fonts that were the "normal" fonts, so this was not hard. But now with so many, it may be that we need to modify the code to look through the normal variant after exhausting the font list for the given variant. That way, you don't have to include redundant data.
Here I use what I said in a previous mail about the interpretation of
mathvariant. So "bold", "italic" and "bold-italic" are just used to
remap some characters to the corresponding characters from the
NORMALREGULAR font in the Mathematical Alphanumeric Symbols, not to
provide specific style. That's why I added the offset but not
bold/italic attributes, contrary to the old font data. So my idea was that

<mtext mathvariant="bold">x</mtext> pick 0x1D431 from Normal-Regular
<mtext mathvariant="bold" style="font-weight: bold">x</mtext> pick
0x1D431 from Normal-Bold
<mtext mathvariant="bold" style="font-style: italic">x</mtext> pick
0x1D431 from Normal-Italic
<mtext mathvariant="bold" style="font-style: italic; font-weight:
bold">x</mtext> pick 0x1D431 from Normal-BoldItalic

(note that I put the Mathematical Alphanumeric Symbols for bold, italic
and bold-italic into the same "NORMAL" family of fonts)
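
As an illustration of the remapping by offset, here is a small sketch for mathvariant="bold" only. The function name `remapBold` is hypothetical; the three start positions are the bold ranges of the Unicode Mathematical Alphanumeric Symbols block, so a lower-case x lands on U+1D431 (MATHEMATICAL BOLD SMALL X). Characters outside letters and digits are left unchanged, which is the part under discussion in this thread.

```javascript
// Start of the bold ranges in the Mathematical Alphanumeric Symbols block.
const BOLD_UPPER = 0x1d400; // MATHEMATICAL BOLD CAPITAL A
const BOLD_LOWER = 0x1d41a; // MATHEMATICAL BOLD SMALL A
const BOLD_DIGIT = 0x1d7ce; // MATHEMATICAL BOLD DIGIT ZERO

// Map an ASCII letter or digit to its bold mathematical counterpart;
// every other code point is returned unchanged.
function remapBold(codePoint) {
  if (codePoint >= 0x41 && codePoint <= 0x5a) {
    return BOLD_UPPER + (codePoint - 0x41);
  }
  if (codePoint >= 0x61 && codePoint <= 0x7a) {
    return BOLD_LOWER + (codePoint - 0x61);
  }
  if (codePoint >= 0x30 && codePoint <= 0x39) {
    return BOLD_DIGIT + (codePoint - 0x30);
  }
  return codePoint;
}
```

For example, `remapBold(0x78).toString(16)` gives `"1d431"`, while `remapBold(0x2b)` leaves the plus sign at U+002B.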

But I assume the "NORMALBOLD,NORMALBOLDITALIC,NORMALITALIC" should be
removed here too... (Perhaps these fonts actually contain the same
glyphs, whatever the font style)

> * The old REMAP data mapped a few characters that weren't in the STIX font to comparable glyphs. The ones for 0x3008 and 0x3009 should be maintained, as this was where \langle and \rangle were in an earlier version of unicode, so some legacy MathML might refer to these. REMAPACCENT is also important to get the proper sized arrow in <mover>.
I was not sure about these characters in the newer fonts, so I'll check
that.
> * I think the delimiter data needs to use font names other than MATHSIZEn, as not all the needed characters are in those sizes. For example, the arrow for U+2190 is in the ARROWSREGULAR not the MATHSIZE1 font, right? So the HW arrays need to refer to the fonts in which the characters are actually located.
In the new set of fonts, all the largeop/stretchy characters are in the
same STIXMath font that contains the Open Type Math table (that's one of
the advantages of the new set). Then I split STIXMath into MATHSIZE0,
MATHSIZE1... so I don't need to pick characters from other fonts. That
might be necessary for constructions that are not in the Open Type Math
table, but at the moment there are not any in the fontdata.

> * I don't think you want to include all the data about all the fonts in fontdata.js, only the most important fonts and the most important characters in those fonts, in order to allow it to download faster. The rest of the data will be downloaded as needed. The original fontdata.js included basically all the characters available in plain TeX, and let the others be loaded when needed. The main.js files included these "main" characters, and the other files handled the rest.
Yes, that's what I mentioned during the last hangout. For the moment I
just put everything in main, but that needs to be changed.

> * There are some characters whose data seems not to be correct. E.g., in STIX_Alphabets-bold, 0x210C (BLACK-LETTER CAPITAL H) has font data [0,0,1000,0,0], which I don't think is correct. Many of the other black-letter characters also have this data, and this occurs in other fonts as well.
I think what's happening in the Python script is that the glyph is cut
from the original font and pasted into STIX_Fraktur-Bold, and when the
Unicode block is copied and pasted into STIX_Alphabets-bold, it remains
an "empty" or "blank" glyph (that's what I see in the font). Then the Perl
script that produces the metrics gives [0,0,1000,0,0]. I thought
fontforge did not keep this glyph (since the Python interface doc says
that cut glyphs are marked "do not output") but that does not seem to be
the case. So I'll need to change that.
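
Until the script is fixed, the affected entries could be listed with a check like the sketch below. It assumes the generated data is a plain object mapping code points to metric arrays; the function name `findBlankGlyphs` and the sample metrics for 0x41 are made up for illustration. Note that it flags every entry matching the pattern, so a glyph that is legitimately blank and 1000 units wide would be reported too.

```javascript
// The suspicious metrics reported for the left-over blank glyphs.
const BLANK = [0, 0, 1000, 0, 0];

// Return the code points (ascending) whose first five metrics match BLANK.
function findBlankGlyphs(fontData) {
  return Object.keys(fontData)
    .map(Number)
    .filter((cp) => {
      const m = fontData[cp];
      return (
        Array.isArray(m) &&
        m.length >= BLANK.length &&
        BLANK.every((v, i) => m[i] === v)
      );
    })
    .sort((a, b) => a - b);
}

// Made-up sample: one blank entry and one entry with ordinary-looking metrics.
const sample = {
  0x210c: [0, 0, 1000, 0, 0],
  0x41: [662, 0, 722, 15, 706],
};
```

Calling `findBlankGlyphs(sample)` returns an array containing only 0x210C.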

Frédéric WANG

Apr 15, 2013, 5:18:33 AM4/15/13
to mathj...@googlegroups.com
On 12/04/2013 15:07, Davide P. Cervone wrote:
> Right. When the delimiter is created, its em-size is multiplied by the
> TeX_factor
So another question is how did you get this TeX_factor? When I open the
STIX and TeX fonts in fontforge I see that both have an em size of 1000

http://fontforge.org/fontinfo.html#PS-General

so I wonder if this is not a ratio obtained by comparing the bounding
boxes of a given character on each font? Or another fontforge parameter?

Davide P. Cervone

Apr 15, 2013, 8:38:46 AM4/15/13
to mathj...@googlegroups.com
>> * I'm not sure the bold and italic forms should be listed in the normal font list, as normal should not be producing bold or italic forms, except for some very special cases (some characters in the letter-like symbols, and the math alphabets). Since most characters in those fonts are already in the regular versions, it probably doesn't hurt; the time it does make a difference, however, is when a character is used that doesn't appear in the STIX fonts. That would cause the data for every one of those fonts to be loaded before MathJax could tell that the character was not available, and that might be a bit time-consuming.
>
> That's what I was wondering too. That's why I asked whether I should only list *-Regular form. I'll change that.

Sorry, I must have misunderstood the question.


>> * Similarly for the italic, bold-italic, etc. variants, and the other variants. Remember that there should be fallback to the normal versions so that, e.g., <mtext mathvariant="script">x+y</mtext> would still produce the normal + sign. In the old STIX data and the TeX data, there were not many fonts that were the "normal" fonts, so this was not hard. But now with so many, it may be that we need to modify the code to look through the normal variant after exhausting the font list for the given variant. That way, you don't have to include redundant data.
>
> Here I use what I said in a previous mail about the interpretation of mathvariant. So "bold", "italic" and "bold-italic" are just used to remap some characters to the corresponding characters from the NORMALREGULAR font in the Mathematical Alphanumeric Symbols, not to provide specific style.

I don't think that is all the variants are supposed to do (and that's not how they are currently used in MathJax). I don't see this in the specification at all. My reading of section 3.2.2 suggests that <mo mathvariant="bold">+</mo> should get you a bold plus if there is one available. Note that it says

In principle, any mathvariant value may be used with any character data to define a specific symbolic token ... Renderers should support those combinations of character data and mathvariant values ... that they can visually distinguish using available font characters.

so I don't see a restriction to just the mathematical alphabets.

> That's why I added the offset but not bold/italic attributes, contrary to the old font data. So my idea was that
>
> <mtext mathvariant="bold">x</mtext> pick 0x1D431 from Normal-Regular
> <mtext mathvariant="bold" style="font-weight: bold">x</mtext> pick 0x1D431 from Normal-Bold
> <mtext mathvariant="bold" style="font-style: italic">x</mtext> pick 0x1D431 from Normal-Italic
> <mtext mathvariant="bold" style="font-style: italic; font-weight: bold">x</mtext> pick 0x1D431 from Normal-BoldItalic

But I thought you said that the mathvariant should cause the style values to be ignored.  I can't find anything to that effect in the specification at the moment, though.  I do see where mathvariant supersedes the deprecated fontstyle, fontweight, and other css-like attributes, however.  MathJax DOES handle that properly, and I guess that is what I was thinking about when I said that I thought we had taken care of that.

So I guess I agree that the four examples you give above should do as you say. On the other hand, the glyph for 0x1D431 from Normal-Regular and from Normal-Bold are the same, so it seems wasteful to have to download both if you end up loading both fonts. Moreover, it is also the same as the character at U+0078 in Normal-Bold. So the fonts currently hold THREE copies of the same character, all of which may be downloaded, which seems a waste. Since we want these fonts to be small for downloading, I'm not sure that's the best choice.

In addition, those four examples are not ALL that mathvariant="bold" should do. Note that

<mtext mathvariant="bold">+</mtext>

should pick the plus sign from Normal-Bold, not Normal-Regular.  That should not require style="font-weight:bold".  Similarly for the italic variants.  

> But I assume the "NORMALBOLD,NORMALBOLDITALIC,NORMALITALIC" should be removed here too... (Perhaps these fonts actually contain the same glyphs, whatever the font style)

Do you mean in the math alphabets?  It is my understanding that the Regular form includes all the math alphabets, but that the Bold form includes only the bold alphabets, and so on; but yes, they are the same glyphs repeated.

>> * I think the delimiter data needs to use font names other than MATHSIZEn, as not all the needed characters are in those sizes. For example, the arrow for U+2190 is in the ARROWSREGULAR not the MATHSIZE1 font, right? So the HW arrays need to refer to the fonts in which the characters are actually located.
>
> In the new set of fonts, all the largeop/stretchy characters are in the same STIXMath font that contains the Open Type Math table (that's one of the advantages of the new set). Then I split STIXMath into MATHSIZE0, MATHSIZE1... so I don't need to pick characters from other fonts. That might be necessary for constructions that are not in the Open Type Math table, but at the moment there are not any in the font data.

I guess I'm confused.  The data that I see for the MATHSIZEn fonts in the entries for things like

HTMLCSS.FONTDATA.FONTS['STIX_Math_Size0'] = { ... }

don't include the character for U+2190, yet this character is what is needed to make the stretchy version of that arrow.  The MATHSIZEn fonts seem only to include the glyphs that have larger versions, not any of the ones that have only a single size but that are used for stretchy characters.  I do see some references to PUA positions in MATHSIZE5, though there is no data for those characters in the FONTS object for that font (so MathJax will probably throw errors trying to work with them).

Do I understand correctly that MATHSIZE0 contains repeated versions of things like the parentheses that also appear in the STIX_Main font?  Again, I'm wondering about the cost of downloading the same glyph in multiple fonts.

>> * I don't think you want to include all the data about all the fonts in fontdata.js, only the most important fonts and the most important characters in those fonts, in order to allow it to download faster. The rest of the data will be downloaded as needed. The original fontdata.js included basically all the characters available in plain TeX, and let the others be loaded when needed. The main.js files included these "main" characters, and the other files handled the rest.
>
> Yes, that's what I mentioned during the last hangout. For the moment I just put everything in main, but that needs to be changed.

OK, no problem.

>> * There are some characters whose data seems not to be correct. E.g., in STIX_Alphabets-bold, 0x210C (BLACK-LETTER CAPITAL H) has font data [0,0,1000,0,0], which I don't think is correct. Many of the other black-letter characters also have this data, and this occurs in other fonts as well.
>
> I think what's happening in the Python script is that the glyph is cut from the original font and pasted into STIX_Fraktur-Bold, and when the Unicode block is copied and pasted into STIX_Alphabets-bold, it remains an "empty" or "blank" glyph (that's what I see in the font). Then the Perl script that produces the metrics gives [0,0,1000,0,0]. I thought fontforge did not keep this glyph (since the Python interface doc says that cut glyphs are marked "do not output") but that does not seem to be the case. So I'll need to change that.

OK, that makes sense.

Davide

Davide P. Cervone

Apr 15, 2013, 8:53:52 AM4/15/13
to mathj...@googlegroups.com
> On 12/04/2013 15:07, Davide P. Cervone wrote:
>> Right. When the delimiter is created, its em-size is multiplied by the TeX_factor
>
> So another question is how did you get this TeX_factor? When I open the STIX and TeX fonts in fontforge I see that both have an em size of 1000
>
> http://fontforge.org/fontinfo.html#PS-General
>
> so I wonder if this is not a ratio obtained by comparing the bounding boxes of a given character on each font? Or another fontforge parameter?

I made it up from empirical testing. I no longer remember exactly where it came from, but note that the width of the M in STIX_Main is 889, and 889 * 1.125 = 1000.125, so it looks like the factor does make the M into something that is one em wide. My recollection is that everything seemed a bit smaller in the STIX font, and this is the factor that corrected for that. The real issue driving the factor may have been placement of super and subscripts. But it was 4 or more years ago, and I no longer recall.

Davide

Frédéric WANG

Apr 15, 2013, 10:37:25 AM4/15/13
to mathj...@googlegroups.com
On 15/04/2013 14:53, Davide P. Cervone wrote:
> I made it up from empirical testing. I no longer remember exactly
> where it came from, but note that the width of the M in STIX_Main is
> 889, and 889 * 1.125 = 1000.125 so it looks like the factor does make
> the M into something that is one em wide. My recollection is that
> everything seemed a bit smaller in the STIX font, and this is that
> factor that corrected for that. The real issue driving the factor may
> have been placement of super and subscripts. But it was 4 or more
> years ago, and I no longer recall. Davide
OK, I'll take that definition. Hopefully, this will work with the other
Open Type fonts too.

Frédéric WANG

Apr 15, 2013, 11:36:14 AM4/15/13
to mathj...@googlegroups.com
So just to clarify about mathvariant:

1) AFAIK, MathJax and browsers have always handled the
mathvariant=bold/italic/bold-italic as equivalent to the corresponding
CSS style. They also don't make mathvariant supersede the CSS style.
IIRC, we discussed the fact that mathvariant/style should supersede the
obsolete attributes fontstyle, fontweight etc (perhaps that's what we
fixed) but we decided to keep the behavior about mathvariant not
superseding the CSS style (since Firefox did that too). That's why I
mentioned the possibility to support the combination style+mathvariant
since MathJax allows that.

2) However, what I've understood from the previous MathML discussions is
that mathvariant is just a way to provide semantics via a specific
rendering and to protect it from style change. Some quotations from the
spec:

"Each token has a specific meaning within a given mathematical
expression and, therefore, needs to be visually distinguished and
protected from inadvertent document-wide style changes which might
change its meaning."
"Each token is identified by the combination of the mathvariant
attribute value and the character data in the token element. ",
"Certain combinations of character data and mathvariant values are
equivalent to assigned Unicode code points that encode mathematical
alphanumeric symbols."
"Note that the appearance of a mathematical alphanumeric symbol
character should not be altered by surrounding mathvariant or other
style declarations. "

So I understand that <mtext mathvariant="bold" style="font-style:
italic">a</mtext> should render bold and not italic since it is
equivalent to <mtext mathvariant="bold">a</mtext>. Also <mi>h</mi>, <mi
mathvariant="italic">h</mi>, <mi mathvariant="italic">&#x221e;</mi> and
<mi>&#x221e;</mi> should all render the same. It seems that in previous
versions of STIX fonts the italic &#x221e; was different from the
regular &#x221e;, as we discussed in
https://github.com/mathjax/MathJax/issues/74

3) As I see it, the only char/mathvariant combinations that are
explicitly required are

"Renderers should support those combinations of character data and
mathvariant values that correspond to Unicode characters, and that they
can visually distinguish using available font characters."

but otherwise it seems that it gives complete freedom and in particular

"Renderers may ignore or support those combinations of character data
and mathvariant values that do not correspond to an assigned Unicode
code point, and authors should recognize that support for mathematical
symbols that do not correspond to assigned Unicode code points may vary
widely from one renderer to another."
"In principle, any mathvariant value may be used with any character data
to define a specific symbolic token. In practice, only certain
combinations of character data and mathvariant values will be visually
distinguished by a given renderer."
"When MathML rendering takes place in an environment where CSS is
available, the mathematics style attributes can be viewed as predefined
selectors for CSS style rules."

So <mo mathvariant="bold">+</mo> would mean giving "bold" semantics to
a plus operator, and renderers will try to achieve that rendering with
the fonts available. But this is not guaranteed to work in general, since
only the characters that map to mathematical alphanumeric characters
are specified. So if someone just wants to change the style, <mo
style="font-weight: bold;">+</mo> is more appropriate.

That said, I agree that for the moment we can continue with CSS-based
behavior, since we do that for other fonts and browsers do that too. I
also agree that we can avoid duplicate glyphs and move some of them to
ASCII positions (see my previous message).

Frédéric WANG

Apr 15, 2013, 11:53:28 AM4/15/13
to mathj...@googlegroups.com
> I guess I'm confused.  The data that I see for the MATHSIZEn fonts in the entries for things like
>
> HTMLCSS.FONTDATA.FONTS['STIX_Math_Size0'] = { ... }
>
> don't include the character for U+2190, yet this character is what is needed to make the stretchy version of that arrow.  The MATHSIZEn fonts seem only to include the glyphs that have larger versions, not any of the ones that have only a single size but that are used for stretchy characters.  I do see some references to PUA positions in MATHSIZE5, though there is no data for those characters in the FONTS object for that font (so MathJax will probably throw errors trying to work with them).

So it's probably a bug in the script if the metrics from the PUA are ignored. Currently, the Python script does not check whether the pieces used to build stretchy operators are already available as size variants, as I assumed this didn't happen. Determining which operators are supported will require two passes, but that's doable.
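
As a rough illustration of the two-pass idea, here is a minimal Python sketch. It is not the actual generator code: the function name, the input dictionaries and the glyph names are hypothetical, and it assumes at most five size variants per operator. It follows the proposed layout, with size variants at the operator's own code point in STIX_Size0..STIX_Size4 and the remaining pieces in the PUA of STIX_Size5.

```python
def place_glyphs(size_variants, assemblies, pua_start=0xE000):
    """Assign each glyph to a (font name, code point) pair in two passes.

    size_variants: {codepoint: [glyph_name, ...]}, smallest size first
    assemblies:    {codepoint: [glyph_name, ...]}, pieces of the
                   stretchy construction for that operator
    Both structures are assumptions standing in for the data read from
    the Open Type Math table.
    """
    placement = {}

    # Pass 1: every size variant goes to the operator's own code point
    # in the font matching its size index.
    for codepoint, glyphs in sorted(size_variants.items()):
        for size, glyph in enumerate(glyphs):
            placement.setdefault(glyph, ("STIX_Size%d" % size, codepoint))

    # Pass 2: a piece gets a PUA slot in STIX_Size5 only if pass 1 did
    # not already place it as a size variant.
    next_pua = pua_start
    for codepoint, pieces in sorted(assemblies.items()):
        for glyph in pieces:
            if glyph not in placement:
                placement[glyph] = ("STIX_Size5", next_pua)
                next_pua += 1

    return placement
```

For example, with size_variants = {0x7C: ["bar", "bar.s1"]} and assemblies = {0x7C: ["bar"], 0x2190: ["arrowleft", "arrow.ext"]}, the "bar" piece is reused from STIX_Size0, while the two arrow pieces get consecutive PUA positions in STIX_Size5.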


> Do I understand correctly that MATHSIZE0 contains repeated versions of things like the parentheses that also appear in the STIX_Main font?  Again, I'm wondering about the cost of downloading the same glyph in multiple fonts.

At the beginning, MATHSIZE0 only contained the non-PUA glyphs that are explicitly mentioned in the Open Type Math table, but since the HW tables always need a size 0, I modified it to always use a non-PUA glyph from the original STIXMath. I should probably pick the one from the original STIXMain instead. However, I'm not yet sure whether the characters that are explicitly mentioned in the Open Type Math table are really duplicated between STIXMain and STIXMath.
-- 
Frédéric Wang
maths-informatique-jeux.com/blog/frederic

Frédéric WANG

Apr 17, 2013, 12:58:26 PM4/17/13
to mathj...@googlegroups.com
So I've continued to improve the font generator. I've addressed much of
Davide's feedback, but I'm not done yet. As previously announced, I've
added a Python configuration file for each font to help the font
generator. Here is the one for the STIX fonts:

https://github.com/fred-wang/MathJax-dev/blob/open-type-fonts/fonts/OpenTypeMath/STIX-Web/config.py

Many of the values (including the glyph metrics and the scale factors
for the \big, \bigg, etc. variants) are now computed automatically.

I attach the fontdata.js currently generated (not complete yet).

Frédéric WANG

Apr 18, 2013, 7:08:21 AM4/18/13
to mathj...@googlegroups.com
OK, forget the previous fontdata.js. I realized this morning that the
glyphs were no longer copied to the right positions. I fixed that and
also added the possibility of using glyphs from other fonts. This lets
us keep some stretchy constructions that relied on the GeneralBold and
NonUnicode fonts. However, the left/right dashed arrows used the
STIXVariants font, and I can't find those glyphs in the STIX-Word
packaging. Hence I made them use the NonUnicode font instead (this was
already the case for the up/down dashed arrows). Other than that, all
the stretchy constructions should now be preserved.

Davide P. Cervone

Apr 18, 2013, 7:20:18 AM4/18/13
to mathj...@googlegroups.com
That sounds good. Looking forward to having a working STIX web font!

Davide