What are your thoughts on defining the following shorthands into unicode-math?:
\dd -> \mathup{d}
\ee -> \mathup{e}
\ii -> \mathup{i}
with options to use ⅆ, ⅇ, ⅈ/ⅉ instead.
I know this doesn't fit into the current ideas of what unicode-math does (in that it's a good analogy with presentation MathML rather than having anything to do with mathematical semantics), but I feel that they're so frequently used that it makes sense to "reserve" the control sequences for them.
Just a random thought while I write my unicode-math paper...
-- Will
I like that, although the names themselves may be debatable.
I discussed various options for markup in my EuroTeX paper:
https://www.tug.org/members/TUGboat/tb30-3/tb96vieth.pdf
Here's what I suggested on the subject:
Concerning suitable markup, there does not seem to be any agreement between
authors of macro packages how these entities should be represented. Obviously, it
would be inconvenient for authors to type \mathrm all over the place and this would
also violate the principle of logical markup. On the other hand, defining the shortest
possible macros (such as \d, \e, \i) conflicts with standard TeX macros for the under-dot
and the dotless-i in text mode. (Alternatively, one might prefer to use \dd, \ee, \ii
for these macros, which are still easy enough to type, but will not cause conflicts with
existing macros.)
In our case, we have indeed chosen the shortest possible macros, but using a slightly
more sophisticated setup to limit the redefinitions to math mode while retaining the
original definitions in text mode.
> Just a random thought while I write my unicode-math paper...
I'm looking forward to that paper after watching your talk at River-Valley.tv.
Regards, Ulrik
Ah yes, sorry not to have referenced it.
(I did read it at the time.)
> "In our case, we have indeed chosen the shortest possible macros, but using a slightly
> more sophisticated setup to limit the redefinitions to math mode while retaining the
> original definitions in text mode."
I think the definition here can be improved (only in that expl3's \mode_if_math:TF takes some additional precautions), but I like the idea:
\let\@@d\d % first save the original text-mode \d (the under-dot accent)
\DeclareRobustCommand{\d}{\relax\ifmmode\mathrm{d}\else\expandafter\@@d\fi}
Does anyone have any objections to adopting this idea in unicode-math? I think it's safe, because we're (safely) overloading already defined macros, and I like the idea of standardising the macros.
(Otherwise I'm also happy with \dd, \ee, \ii.)
>> Just a random thought while I write my unicode-math paper...
>
> I'm looking forward to that paper after watching your talk at River-Vallery.tv.
Lots more work to do, I'm afraid.
-- Will
> Perhaps such definitions should be {\mathup{x}}, that is with extra braces.
Can you remind me why? I thought that {} made math elements \mathord, which the alphabet letters are already (\mathalpha==\mathord). But I might be remembering incorrectly.
W
I think you're right. But my purpose is so that exponents would work, for example 3^\ii. But upon testing now, I think that's not actually necessary (which genuinely surprised me). I guess I'm worried about the "\mathup" being taken as an argument somewhere and the "{x}" being left behind, where the extra set of braces would prevent the separation.
Andrew
> (Otherwise I'm also happy with \dd, \ee, \ii.)
Chris rebuts!
Begin forwarded message:
> From: c.a.r...@open.ac.uk
> Date: 16 July 2010 3:57:13 AM ACST
> To: Will Robertson <wsp...@gmail.com>
> Subject: Could not sign into google so please pass on
>
> Urgent re \dd, \ee, \ii.
>
> Please do not abuse the namespace of short cs names any more. It is already badly used.
>
> Official names should be: meaningful, and long.
>
> Short names should be kept for user-shorthands.
>
> Only a short-term problem I guess since we shall be using Unicode-aware math editors next week:-) but that is perhaps a reason not to introduce new ad hoc csnames for alphabetics at all.
>
> BTW: I am reminded (but I have not yet checked with David and Barbara) Patrick Ion thinks that \ac may be canonical and hence in the W3C standard by now even though that slot is not the Unicode AC character. ... such is the way of history.
>
> chris
Concur w/ Chris Rowley. Also, the \dd macro is only half-useful if it
resolves to ⅆ or \mathup{d}; if I'm going to define the macro I'll add
some magic spacing code. (See my thesis
<http://github.com/jcsalomon/thesis/blob/master/jcs-thesis-math.sty>
for an example.)
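For illustration, one common spacing idiom for such a macro (though not
necessarily what the linked thesis does) is:

% The empty \mathop makes TeX insert operator spacing before the "d";
% the \! cancels the unwanted thin space after it.
\newcommand*{\dd}{\mathop{}\!\mathrm{d}}
% usage: $\int f(x) \dd x$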
—Joel Salomon
On 16/07/2010, at 12:13 PM, Will Robertson wrote:
>> (Otherwise I'm also happy with \dd, \ee, \ii.)
>
>
> Chris rebuts!
What's new ? :-)
>
> Begin forwarded message:
>
>> From: c.a.r...@open.ac.uk
>> Date: 16 July 2010 3:57:13 AM ACST
>> To: Will Robertson <wsp...@gmail.com>
>> Subject: Could not sign into google so please pass on
>>
>> Urgent re \dd, \ee, \ii.
>>
>> Please do not abuse the namespace of short cs names any more. It is already badly used.
>>
>> Official names should be: meaningful, and long.
>>
>> Short names should be kept for user-shorthands.
I certainly agree with this.
In particular, I already frequently use \dd as a macro with argument
for the differential at the end of an integral, using \ddd for a
representation of the character in question here.
That is, typically: \dd{#1} ---> \,\ddd #1
(or the spacing is handled by \mathrel or somesuch).
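Spelled out as a minimal sketch (using \mathup as elsewhere in this
thread):

\newcommand*{\ddd}{\mathup{d}} % the bare upright differential character
\newcommand*{\dd}[1]{\,\ddd #1}% \dd{x} -> thin space, then "d x"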
\ee is in danger of clashing with a typical abbreviation
for \end{enumerate}
--- which I personally detest, but that is beside the point;
\ii occurs on p.388 of Grätzer: More Math into LaTeX, 4th edition.
(and in earlier editions too)
So this really rules out trying to make these the *standard* names.
>>
>> Only a short-term problem I guess since we shall be using Unicode-aware math editors next week:-) but that is perhaps a reason not to introduce new ad hoc csnames for alphabetics at all.
It should be something like:
\Differential@d \Exponential@e \Imaginary@i
(with correct \catcode for @)
along with
\makeatletter
\let\Differentiald\Differential@d
\let\Exponentiale\Exponential@e
\let\Imaginaryi\Imaginary@i
\makeatother
If you want to have some optional shorthands, that a user can
*choose* to load also, then I'd accept
e.g.
\makeatletter
\let\ddd\Differential@d
\let\eee\Exponential@e
\let\iii\Imaginary@i
\makeatother
One should also check what names are already in use by Mathematica,
Maple, Axiom, etc. with their "Export to LaTeX" modules.
>>
>> BTW: I am reminded (but I have not yet checked with David and Barbara) Patrick Ion thinks that \ac may be canonical and hence in the W3C standard by now even though that slot is not the Unicode AC character. ... such is the way of history.
Aaargh!
So it *was* originally intended for the AC current (i.e., sine wave)
symbol, as I surmised in an earlier message ?
>>
>> chris
Cheers,
Ross
------------------------------------------------------------------------
Ross Moore ross....@mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
> If you want to have some optional shorthands, that a user can
> *choose* to load also, then I'd accept
> e.g.
> \makeatletter
> \let\ddd\Differential@d
> \let\eee\Exponential@e
> \let\iii\Imaginary@i
> \makeatother
I think I'm going to put this idea on hold for now. I was right when I first said that it's outside the scope of what unicode-math should do, and I don't want to blur any boundaries.
The question is, where should we start collecting these sorts of things? The cool and sTeX packages both go a long way towards defining a more "semantic" approach to mathematics input; perhaps we can leverage their work rather than starting from scratch.
>>> BTW: I am reminded (but I have not yet checked with David and Barbara) Patrick Ion thinks that \ac may be canonical and hence in the W3C standard by now even though that slot is not the Unicode AC character. ... such is the way of history.
>
> Aaargh!
>
> So it *was* originally intended for the AC current (i.e., sine wave)
> symbol, as I surmised in an earlier message ?
But the macro name itself isn't sacrosanct, which is my only concern.
Will
It is (if I get your meaning right) 'sacrosanct' (==canonical?) if the standard TeX world wants to conform with the w3c entity names and this one is now in the w3c standard as the name of this slot.
chris
>
>>> But the macro name itself isn't sacrosanct, which is my only concern.
>
> It is (if I get your meaning right) 'sacrosanct' (==canonical?) if the standard TeX world wants to conform with the w3c entity names and this one is now in the w3c standard as the name of this slot.
I see your point, but I'm afraid it's not that simple. When the W3C adopted non-TeX names it essentially became impossible for the TeX world to remain compatible. E.g., from my extended abstract at the conference:
> The two `set minus' characters in Table~\ref{slashes} inherit their names from Plain \TeX\ and the \textsf{amssymb} package, respectively. \acro{U+2216} is \cs{smallsetminus} and \acro{U+29F5} is \cs{setminus}. However, MathML does it differently: \acro{U+2216} is referred to by \verb|setminus| \emph{as well as} \verb|smallsetminus| (among other synonyms); \acro{U+29F5} is as-yet unnamed \cite{carlisle2008-w3c}. This might make it difficult to move between MathML and \LaTeX. Luckily these sorts of conflicts are few.
I will consider adding something like \mathml{...} as a way to get symbols with their W3C names (you could even imagine a bit of catcode jiggery-pokery to get a literal "&infin;" syntax working) but otherwise there is no mapping between TeX names and W3C names.
I just took another look at the W3C names, just to confirm my suspicions, and yes -- the W3C names in general have not tried to be compatible with TeX at all. E.g.,
U+221E, INFINITY: infin
vs. \infty for TeX.
I don't see this as a big problem, though, because MathML and TeX have different purposes; only TeX is designed for *writing* mathematics, so it doesn't really matter underneath how MathML is representing its symbols.
-- Will
But I think that Barbara should be involved in fixing any new names, and David should be kept in the loop, somewhat officially.
chris
On 18/07/2010, at 8:09 PM, c.a.r...@open.ac.uk wrote:
> Sure there is no need to have the same names but there is also little point in having different ones for such obscure and unused characters. Historical differences/mistakes are no justification for needless further confusion.
Given these facts and different aims, it looks like we should follow a strategy such as the following.
1. All characters have a csname that is protected, e.g. with @ in the name, and not easily typed. This is used exclusively by internal macros and processing.
2. All characters have a 2nd long meaningful csname that is simply \let to the name in 1.
3. For compatibility with non-TeX naming schemes, there are other subsets of csnames that can be used, according to package-loading options.
Implementing these subsets is done simply by
\let\newname\latex@name
where the latter is the name in 1. not the name in 2.
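A minimal sketch of how 1., 2. and 3. fit together (the definitions
themselves are illustrative):

\makeatletter
\def\Differential@d{\mathup{d}}   % 1. protected internal name
\let\Differentiald\Differential@d % 2. long, meaningful public name
\let\ddd\Differential@d           % 3. optional shorthand subset, \let to 1.
\makeatother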
When there are known problems, there can be a robust definition in which the expansion includes writing a warning into the .log file, or performs some other useful action before placing the appropriate character from 1.
It must be this way, since the same macro name might occur in both 2 and 3, but referring to different characters.
Documentation needs to be provided for each defined subset in 3. to explain any known name clashes with existing packages and with the standard names of 2.
E.g. if the [w3c] option is specified, then \ac would be a documented clash with the acronym package. Its first use could print a message into the .log file, as well as producing the ∾ character. Also, there can be a test \AtBeginDocument that checks whether {acronym} has been loaded and, if so, explains the nature of the problem.
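A sketch of such a check (the warning text is illustrative; this is
package code, where @ is a letter):

\AtBeginDocument{%
  \@ifpackageloaded{acronym}{%
    \PackageWarning{unicode-math}{%
      Option [w3c] defines \string\ac, which clashes with the
      acronym package}%
  }{}%
}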
Note that this setup also allows for support of extra entities that do not have a single Unicode slot; e.g. characters constructed from 2 or more code-points, such as constructions using combining characters.
It also allows users to define their own shorthands, which may clash with existing names. But this clash cannot affect the internal names, so there is a method to recover effectively without requiring substantial edits throughout an author's manuscript.
One of the subsets in 3. can be a collection of recommended shorthands, such as \ddd, \iii, \eee and others that do not clash with existing packages. Having such a set will effectively define a new standard. But this is better than encouraging open slather, as Grätzer's book does.
>
> But I think that Barbara should be involved in fixing any new names, and David should be kept in the loop, somewhat officially.
Sounds fine to me.
>
> chris
Hope this helps,
Ross
> 3. For compatibility with non-TeX naming schemes, there are other subsets of csnames that can be used, according to package-loading options.
> Implementing these subsets is done simply by
> \let\newname\latex@name
> where the latter is the name in 1. not the name in 2.
I'm not expecting any problems with this, but does anyone have any objections to having a few thousand extra control sequences lying around unused?
In a sense, this is unnecessary since the unicode glyph is the underlying representation for all the symbols. E.g., \infty simply expands to ^^^^^0221E.
Actually that's a bit of a lie. Some unicode characters are made math-active (e.g., big operators) which expand to control sequences (plus other "stuff" such as \limits) that have been \mathchardef'ed to the appropriate symbols.
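An illustrative sketch of that mechanism, with made-up names (this is
not unicode-math's actual code):

% \Umathchardef is the Unicode-engine analogue of \mathchardef;
% class 1 = large operator, family 0.
\Umathchardef\SumOp= 1 0 `∑
% mathcode "8000 makes a character behave as if active in math mode.
\mathcode`∑="8000
{\catcode`∑=\active \gdef∑{\SumOp\limits}}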
> Note that this setup also allows for support of extra entities that do not have a single Unicode slot; e.g. characters constructed from 2 or more code-points, such as constructions using combining characters.
This is something I've thought about before, since where possible it would be good to replace \not\equals (etc.) by their pre-composed glyph equivalent.
>> But I think that Barbara should be involved in fixing any new names, and David should be kept in the loop, somewhat officially.
>
> Sounds fine to me.
I'll get in contact with David.
-- Will
A bit off topic: STIX has many precomposed negated symbols that have no
Unicode code points, and I was thinking about implementing them as
ligatures. Do you think that is a good idea? Should we also extend it to
negated symbols that do have code points? And which sequence should be
used: <symbol><negation slash> (i.e., treating the slash as a combining
mark), the reverse, or both? Sorry for the many questions, but I'd like
to get the opinions of people who know math better than me.
Regards,
Khaled
--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
I just checked the Unicode math annex [1] and it seems to suggest the
symbol followed by a combining negation slash (or vertical line).
[1] http://unicode.org/reports/tr25/tr25-8.html#_Toc217
> A bit off topic: STIX has many precomposed negated symbols that have no
> Unicode code points, and I was thinking about implementing them as
> ligatures. Do you think that is a good idea? Should we also extend it to
> negated symbols that do have code points? And which sequence should be
> used: <symbol><negation slash> (i.e., treating the slash as a combining
> mark), the reverse, or both?
Can LuaTeX math deal with ligatures? I didn't think that that machinery really worked in math mode in general. (But as always I could be mistaken.)
If so, I think yes, combining accents/ligatures for negated symbols would be a good idea -- and then I don't have to write macros to do a similar job :) I am aware of some complications in choosing between a diagonal or vertical slash, but I forget the specifics right now. (Is there some Unicode character designed to choose which output form is desired, or something awkward like that?)
-- Will
I think so, since "base" mode does support TeX-style ligatures; I have to
test it, though.
>
> If so, I think yes, combining accents/ligatures for negated symbols would be a good idea -- and then I don't have to write macros to do a similar job :) I am aware of some complications between choosing a diagonal or vertical slash, but I forget the specifics right now. (Is there some unicode glyph designed to choose between which output form is desired, or something awkward like that?)
There is a variation selector, but only for symbols with a partial
vertical stroke. If we went with the ligature approach, it should be a
matter of using a different overlay symbol. Check:
http://unicode.org/reports/tr25/tr25-8.html#_Toc217
Hi all,
A bit late but only just got back from the US.
Overall: I support Chris in rebutting the names.
> On 15/07/2010, at 10:43 PM, Andrew Moschou <and...@gmail.com> wrote:
>
> I think you're right. But my purpose is so that exponents would work, for
> example 3^\ii. But upon testing now, I think that's not actually necessary
> (which genuinely surprised me). I guess I'm worried about the "\mathup"
> being taken as an argument somewhere and the "{x}" being left behind, where
> the extra set of braces would prevent the separation.
>
> This is a genuine concern.
> TeX itself gets it right with super/sub-scripts, because it is looking for
> an Atom to be placed,
Or { (either explicit or implicit).
> rather than a single Token. However macros are written
> in terms of tokens, not atoms. So it is quite easy to get the separation
> that Andrew fears. This is especially so when you are trying to emulate what
> TeX does within another scripting language, for a program that is not built
> upon TeX itself, but needs to parse (La)TeX source.
> This is becoming increasingly common these days.
> Even with TeX-based programs, you might want to do some preprocessing in
> macros before doing the actual super/sub-script. So the issue will indeed be
> encountered.
> Is this enough reason to use extra braces?
The safest for users has always been to use braces around the
arguments of sub/superscripts and I think if you look at Lamport this
is shown in each and every example.
It is also easiest for users to think of subscripts and superscripts
acting as if they take exactly one argument as that is the usual case.
Always using braces in the input is much easier than explaining why
a_\mathbf{d} produces a boldface subscript d and a_\cong returns an
error!
In breqn ^ and _ are in fact turned into macros taking one argument
which helps the lazy typist (amongst other things) and I do not wish
to support syntax such as a_\mathbf{d}.
As for Ross' question: Adding the braces in the definition would work
for this kind of thing because they are supposed to be Ord but the
same technique cannot be employed for things that are not Ord as the
spacing would be wrong. For this reason I think it is better to leave
off the braces and let the user type it themselves as instructed.
Alternatively let the user use something like breqn that'll do the
same/right thing whether or not one types a_{\ii} or a_\ii.
Best,
--
Morten
> but I don't understand what this is referring to?
>
>>> BTW: I am reminded (but I have not yet checked with David and Barbara) Patrick Ion thinks that \ac may be canonical and hence in the W3C standard by now even though that slot is not the Unicode AC character. ... such is the way of history.
You might have found this out by now anyway, but the context for this comment comes from the thread here:
<http://groups.google.com/group/unimath/browse_frm/thread/634e75d28658ee4b>
Skip to the second half of the messages, beginning with Ross's reply.
For unicode-math, I've changed \ac to \invlazys and I think Barbara was going to follow suit (otherwise I'll adapt to whatever TeX name she chooses).
-- Will
> One problem is that over time three things have become partially combined
>
> "lasy S" (which is supposed to be an s on its side.
> "inverted tilde" (which ought to be flatter)
> "sin wave" (which ought to look like a sin wave)
>
> It's impossible to be self consistent and compatible with legacy
> software so you just have to pick something and go with it:-) Since
> the entities spec referenced above is now a W3C recommendation (and
> all the single-character entity names are now used from html5 as well
> as mathml) I'd really rather not change those
Understood, and I agree. Thanks for the explanation; I see I should have got in contact with you far earlier about all of this :)
Cheers,
-- Will
On 20/07/2010, at 6:43 PM, David Carlisle wrote:
>> You might have found this out by now anyway, but the context for this comment comes from the thread here:
>>
>> <http://groups.google.com/group/unimath/browse_frm/thread/634e75d28658ee4b>
>>
>> Skip to the second half of the messages, beginning with Ross's reply.
>>
>
> ah thanks. We lost lots of sleep over ac, acE, race and friends;
> basically the old ISO entity names were incoherently described and
> (perhaps consequently) inconsistently mapped to different Unicode
> characters, so it was well nigh impossible to get a sensible mapping.
> (By the way, I see there were references to MathML 1 tables in that
> thread; please don't use them (except to check history). Use
>
>
> http://www.w3.org/2003/entities/2007doc/Overview.html the table
Great. This has both an alphabetical listing and one by Unicode slot.
Very, very useful.
>
> http://www.w3.org/2003/entities/2007doc/Overview.html#chars_math-multiple-tables
>
> which lists the race variants may be particularly relevant.
Isolating these is a great idea too
--- also, the glyphs that use the Variant selector.
>
> One problem is that over time three things have become partially combined
>
> "lasy S" (which is supposed to be an s on its side.
> "inverted tilde" (which ought to be flatter)
> "sin wave" (which ought to look like a sin wave)
O.K.
>
> It's impossible to be self consistent and compatible with legacy
> software so you just have to pick something and go with it:-) Since
> the entities spec referenced above is now a W3C recommendation (and
> all the single-character entity names are now used from html5 as well
> as mathml) I'd really rather not change those, however it isn't
> essential that TeX uses the same names.
As default, no.
But as an option, this would be a desirable thing to have.
It would make converting between TeX and other formats a lot easier.
Most of the entity names can be used for TeX control-sequences
without problem. Many already are, of course.
However, I noticed a few that could be quite problematic:
∥ U+2225 , PARALLEL TO,
\par paragraphing primitive/macro
⊂ U+2282 , SUBSET OF,
\sub old(?) use as synonym for _ (subscript selector)
e.g. with keyboards that don't have the ^ or _ character
⊃ U+2283 , SUPERSET OF
\sup old(?) use as synonym for ^ (superscript selector)
similar usage to \sub
© U+00A9 , COPYRIGHT SIGN,
\copy primitive for copying TeX box contents
∂ U+2202 , PARTIAL DIFFERENTIAL
\part is a sectioning command in some document classes
⁢ U+2062 , INVISIBLE TIMES
\it font switch to italics
≻ U+227B , SUCCEEDS
\sc font switch to small-caps
∧ U+2227 , LOGICAL AND
\and used a lot with multiple authors in bibliographies
∨ U+2228 , LOGICAL OR
\or is a separator primitive, used with \ifcase
° U+00B0 , DEGREE SIGN
\deg is the mathematical function name 'deg'
⪚ U+2A9A , DOUBLE-LINE EQUAL TO OR GREATER-THAN,
\eg commonly used for e.g. ???
see The LaTeX Companion, 2nd ed. p.80
concerning uses of {xspace} package, by DC !
☆ U+2606 , WHITE STAR
\star isn't an open or white star
∾ U+223E , INVERTED LAZY S
\ac used by the {acronym} package
There may be others that clash with other packages,
and there may be others that I didn't notice.
e.g.
+ U+002B , PLUS SIGN
\plus from the {euro} package takes 2 arguments
similarly with \minus
○ U+25CB , WHITE CIRCLE
\cir is used as a macro by Xy-pic
Then there are the ISOGRK bold symbols, such as:
b.alpha, b.beta, etc.
Does &b.alpha; actually work in web browsers?
You cannot just drop the '.' in a TeX macro; else
you'll get clashes, such as \beta (from b.eta).
Maybe \bfalpha etc. is OK for these.
Fractions:
½ --- does this work in browsers?
while \frac12 works, \tfrac12 is better in displays.
superscripts:
¹ ² ³
\sup1 etc. work as is; but \sup itself may need to change.
This brings up the question of what are the names for super-
and subscripted letters, in the ranges: U+2079 --> U+2094 .
The tables do not cover these, yet people *will* try to use them
in mathematics, and in bookmark strings.
>
> David
> I don't really think it makes sense in a TeX context to blow one csname
> per letter variant for mathvariants. \bm{\alpha} would work fine for
> example. Presumably in a more naturally Unicode flavoured TeX with
> more math families to hand you wouldn't need to go to quite the
> contortions bm.sty goes to to switch math families.
Except that maths alphabets are no longer in ascii locations, so using the variable math class and switching \fam's no longer really works. Perhaps I'm doing it in the worst possible way, at the moment, but unicode-math's technique is to not only blow a csname per letter variant, but also to define \mathbf in terms of a *whole bunch* of local \mathcode re-definitions.
A few other alternatives but nothing very satisfying. Any suggestions?
-- Will
Ross,
in addition to the ones you list, (var)epsilon and (var)phi have the
meaning switched from the usual TeX meanings, as highlighted here:
http://www.w3.org/2003/entities/2007doc/Overview.html#oddities
> Then there are the ISOGRK bold symbols, such as: b.alpha, b.beta, etc. Does &b.alpha; actually work in web browsers?
No, neither MathML nor HTML preloads isogrk4 by default.
> You cannot just drop the '.' in a TeX macro; else you'll get clashes, such as \beta (from b.eta). Maybe \bfalpha etc. is OK for these.
I don't really think it makes sense in a TeX context to blow one csname
per letter variant for mathvariants. \bm{\alpha} would work fine for
example.
Presumably in a more naturally Unicode flavoured TeX with
more math families to hand you wouldn't need to go to quite the
contortions bm.sty goes to to switch math families.
For TeX (and MathML) most of the subscripted characters and pre-made
fractions are of no real interest.
> This brings up the question of what are the names for super- and subscripted letters, in the ranges U+2079 --> U+2094. The tables do not cover these, yet people *will* try to use them in mathematics, and in bookmark strings.
We should discourage their use and so not name them. (Packages can
internally use those characters for \frac{1}{2} while making up
bookmarks and other internal strings restricted to character data with
no markup.)
The XML entity names should be seen as of mainly legacy interest.
MathML3 strongly suggests they not be used. We've removed all use of
named entity references in examples in the spec for example. It works
a lot better in an XML context if the names are restricted to the
editing environment and the actual file just has character data or
numeric references. If systems do that they are portable at the
Unicode level without needing to agree on names for things.
David
Have to admit looking in detail at the unicode based TeX variants is
still on my to-do list, but I'd assumed (hoped? guessed?) that if you
were not restricted to ascii input you might get a bold alpha just by
entering that character. It's still good to have an ascii based input
syntax and there I can see you may want to use csnames internally to
do the mapping, but personally as a user interface I think it's more
natural to think of the mathvariants as commands rather than as a
bunch of symbols with unique names.
I'd rather do \mathbb{A} than \mathbbA even if \mathbb internally is
just \csname @mathbb#1\endcsname.
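A minimal sketch of that dispatch (the per-letter definitions are
illustrative, not unicode-math's actual implementation):

\makeatletter
\def\@mathbbA{𝔸}% U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A
\def\@mathbbB{𝔹}% U+1D539, and so on through the alphabet
\def\mathbb#1{\csname @mathbb#1\endcsname}
\makeatother
% usage: $\mathbb{A}$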
> Perhaps I'm doing it in the worst possible way, at the moment, but unicode-math's technique is to not only blow a csname per letter variant, but also to define \mathbf in terms of a *whole bunch* of local \mathcode re-definitions.
Yes, as I note above, if that's what it takes that would be my preference for a user interface: define \mathbf.
> A few other alternatives but nothing very satisfying. Any suggestions?
I'd need to look at what XeTeX really does with mathcodes...
David
> Have to admit looking in detail at the unicode based TeX variants is
> still on my to-do list, but I'd assumed (hoped? guessed?) that if you
> were not restricted to ascii input you might get a bold alpha just by
> entering that character.
Oh sure, that works too :)
> I'd rather do \mathbb{A} than \mathbbA even if \mathbb internally is
> just \csname @mathbb#1\endcsname.
I don't have a strong preference. Right now unicode-math allows you to write <literal bb A glyph> or \mathbb{A} or \mbbA, so take your pick whichever you prefer.
I did consider the reductionist implementation
\mathbb{#1} -> \csname @mathbb#1\endcsname
but didn't want to have to deal with edge cases with people writing weird things inside the argument.
>> A few other alternatives but nothing very satisfying. Any suggestions?
>
> I'd need to look what xetex really does with mathcodes...
Don't on my account; I was just checking I hadn't missed something obvious in my implementation. LuaTeX makes things easier here because font glyphs can be remapped dynamically as they're (about to be) typeset, so in time we can adopt that strategy.
-- Will
On 21/07/2010, at 9:30 PM, David Carlisle wrote:
>> For TeX (and MathML) most of the subscripted characters and pre-made
>> fractions are of no real interest.
>>
>> To us oldies, yes. But what about the next generation?
>
> I think even for the next generation of markup users (be that TeX or
> XML) they are of essentially no value. Certainly at the rendering
> level I wouldn't want the fractions used, I'd want all fractions to be
> rendered the same not those that happen to have unicode code points
> being rendered differently.
That's true, when there are lots of fractions.
There will be some uses where the pre-made ones are enough.
> On the input side it may be worth making
> U+00bd into (the equivalent of) an active character expanding to
> \frac{1}{2} but it's not that important. I can't think of any cases
> (in math mode) where I'd actually want to use the font's U+00bd
How about a textbook or workbook teaching simple
arithmetic to primary children?
Or just laying out a poster or lecture slide which happens
to have some simple fractions, using a fancy font which
need not support all of mathematics.
The possibilities are endless.
(Though I suppose many of these don't really require math-mode.)
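A minimal sketch of David's active-character idea (purely illustrative;
it assumes a Unicode engine, amsmath for \tfrac, and a \textonehalf
text fallback, e.g. from textcomp):

\catcode`½=\active
\def½{\ifmmode\tfrac{1}{2}\else\textonehalf\fi}
% "½" now typesets as \tfrac{1}{2} in math mode and as the ordinary
% text glyph elsewhere.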
> However, humans like seeing names (it's a lot easier to remember
> \rightarrow or &rightarrow; than to remember the code for an arrow),
> so it makes sense to try to use similar names where possible.
Certainly agree here.
It's dealing with "where possible" that creates difficulties.
>
>> People need to be able to read the characters that they see onscreen.
>> Thus there have to be words to use. OK, this is best if it is the name of
>> the concept that the symbol represents, rather than the name of the symbol.
>> But what if you are encountering it for the first time? As a non-expert
>> trying to learn a new field? Then it is the name of the symbol or entity
>> that you would read. And this is surely the default for a screen reader that
>> doesn't have any extra heuristics to help.
>> So there will be situations in which these names *must* be used.
>
> But it may make more sense (and you will get _far_ bigger coverage of
> symbol names) if you ignore the particular input convention the author
> used (&#x2192; or &rightarrow; or \rightarrow or utf8 data or ...) and
> just screen read, as you would font render, using the Unicode code
> point after all entity expansion is done, and read
> [RIGHTWARDS ARROW]; that way you can screen read any Unicode
> character, whether or not it has an entity name,
Sure. That is how I build my TeX+MathML combiner for tagged PDF.
Whatever the input form, it maps to a single numeric value,
which is used to look up hash tables to get the necessary data
for /ActualText and (ultimately) /Alt key-value pairs.
>>
>> Now when I was reading the spec for Content MathML I noticed that all the
>> *strict* examples did not use entities, but pure words in ASCII letters. The
>> equivalent non-strict examples did use entities.
>
> No, the MathML3 spec doesn't use entities in any examples.
Sorry, my mistake.
Here I mean empty content tags, such as <plus/> and <times/> etc.
>> My question about this is, could these have been replaced by single
>> characters?
>> In either or both strict and non-strict forms?
>
> You can use any Unicode characters in a csymbol; in the strict form it
> is supposed to be restricted to an OpenMath name (which is the same as
> an XML element name)
OK. So what used to be:
<apply><plus/>
<ci>x</ci>
<ci>y</ci>
</apply>
is now:
<apply><csymbol cd="arith1">plus</csymbol>
<ci>x</ci>
<ci>y</ci>
</apply>
or with a character:
<apply><csymbol cd="arith1">+</csymbol>
<ci>x</ci>
<ci>y</ci>
</apply>
It is this latter that suits what I need for tagged PDF.
>> The reason is to do with exporting valid XML from tagged PDF.
>> There the content is defined to be what is represented with font characters.
>> The content of tags is the Unicode code points of the font characters. This
>> is what is exported, unless you jump through hoops to map these to something
>> else. But such mappings also affect the result of Copy/Paste, where you
>> really do want the Unicode points.
>> Hence my desire for the characters to be valid, as in Presentational MathML.
>>
>
> I'm not sure I follow you here: any element that allows characters
> allows the characters to be entered as character data or entity
> references; the XML processors that follow the parse simply can't see
> the difference.
I'm thinking about what goes into the PDF content stream,
and how Acrobat exports this along with other structure information
into an XML file.
The '+' character (above example) is already there, as is 'x' and 'y'.
I don't want to tag the '+' with /ActualText(plus) , since then
a Copy/Paste would produce: x plus y instead of x + y .
With a bit more testing, things seem to be a bit more complicated.
XeTeX doesn't support OpenType ligatures in math mode at all, or so it
seems to me.
LuaTeX supports ligatures in math (with "base" OT processing), I got a
\not\equiv ligature to work after changing \not from \mathaccent to
\mathord in unicode-math-table.tex.
When \not is a \mathord, I get the following list:
...\hbox(6.62+1.56)x9.62772, shifted 167.68614, direction TLT
....\EU2/XITSMath(0)/m/n/10 ̸
....\glue(\thickmuskip) 2.77771 plus 2.77771
....\EU2/XITSMath(0)/m/n/10 ≡
LuaTeX will happily ignore the \glue and apply the ligature, but with
\not set to \mathaccent I get:
...\hbox(8.45999+0.0)x6.85, shifted 169.075, direction TLT
....\vbox(8.45999+0.0)x6.85, direction TLT
.....\hbox(6.62+1.56)x0.0, shifted 3.425, direction TLT
......\EU2/XITSMath(0)/m/n/10 ̸
.....\kern-4.5
.....\hbox(4.78+0.0)x6.85, direction TLT
......\EU2/XITSMath(0)/m/n/10 ≡
The \kern and the second \hbox will break the ligature, and if we are
going to use the negation symbols as math accents, ligatures are not
going to work. Ligatures need to be applied at a much earlier stage, maybe
before mlist-to-hlist processing (through a callback) or maybe
earlier, but I'm not sure if it is worth the trouble; maybe fixing
negation symbol positioning is all we really need.
We can also ignore the unencoded precomposed negated symbols, and only
map the encoded ones at input level.
> Ligatures need to be applied at a much earlier stage, maybe
> before mlist-to-hlist processing (through a callback) or maybe
> earlier, but I'm not sure if it is worth the trouble; maybe fixing
> negation symbol positioning is all we really need.
We also have the possibility of mapping things in the input stage; something like
\not\equals :-> \nequals
if \nequals exists, but use the combining char if not.
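A rough sketch of such a mapping, using e-TeX's \ifcsname (the \n...
naming convention is from the example above; everything else is
illustrative):

\makeatletter
\def\not#1{%
  % for \not\foo, try a precomposed \nfoo first
  \ifcsname n\expandafter\@gobble\string#1\endcsname
    \csname n\expandafter\@gobble\string#1\endcsname
  \else
    #1\char"0338 % else fall back to U+0338 COMBINING LONG SOLIDUS OVERLAY
  \fi}
\makeatother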
> We can also ignore the unencoded precomposed negated symbols, and only
> map the encoded ones at input level.
Right.
Ideally, unicode-math still needs a way to handle PUA maths symbols, so that additional symbols in STIX, for example, can be given names. I imagine something similar for precomposed symbols here.
Thanks for looking into it. I might not be able to look into this immediately, but addressing these problems is definitely on my todo list.
Cheers,
-- Will
On 26/07/2010, at 7:42 AM, Will Robertson wrote:
> On 26/07/2010, at 3:19 AM, Khaled Hosny wrote:
>
>> Ligatures need to be applied at a much earlier stage, maybe
>> before mlist-to-hlist processing (through a callback) or maybe
>> earlier, but I'm not sure if it is worth the trouble; maybe fixing
>> negation symbol positioning is all we really need.
>
> We also have the possibility of mapping things in the input stage; something like
>
> \not\equals :-> \nequals
>
> if \nequals exists, but use the combining char if not.
In line with comments in another thread, I'd say that
it should be something like:
on input
\nequals :-> \ltxm@nequals
\not\equals :-> \ltxm@nequals
on output
\ltxm@nequals
maps to whatever the font supports;
be it a single character, combining pair,
or an overlay using an \hbox construction.
The mappings should be kept completely separate, so that
it can be arranged that whatever the input form, you'll
always get the same output.
Then at some point in the future, I'll be able to add
extra tagging to PDF output, by working entirely with
the internal forms. ( ltxm ::: LaTeX Math )
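Concretely, the output stage might look like this (the names follow the
ltxm convention above; the definitions are illustrative):

\makeatletter
% output stage: the internal name maps to whatever the font supports,
% here the precomposed U+2260 NOT EQUAL TO
\def\ltxm@nequals{≠}
% input stage: the public shorthand is just \let to the internal name
\let\nequals\ltxm@nequals
\makeatother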
>
>> We can also ignore the unencoded precomposed negated symbols, and only
>> map the encoded ones at input level.
>
> Right.
> Ideally, unicode-math still needs a way to handle PUA maths symbols, so that additional symbols in STIX, for example, can be given names. I imagine something similar for precomposed symbols here.
If there are common families of these:
e.g. U+E00C --> U+E085 STIX variant negated forms
then there should be an option to recognise the characters on input,
mapping them to internal names.
The output from those internal names could be either:
i. standard code-point --- different glyph with same meaning
ii. standard code-point + variation selector
iii. a PUA code-point, if the font supports it.
iv. completely ignored -- hopefully not!
This choice should be made separately from the input form,
by analysing what the output font supports, for the internal form,
using choices supplied as options by the document author.
This kind of mapping may have to depend upon whether you are in
math-mode or text-mode, since the same slots are used by some text fonts:
e.g.
Latin Modern
Minion Pro
Linux Libertine
some TeX Gyre fonts (Antykwa Torunska)
Adobe Ming
Adobe Pi
Titus Cyberbit
(and presumably others that I don't have installed)
I think we are going to need some PUA area support like this,
else it is going to cause big trouble in the future.
IMHO, filtering them out at the input level is not the answer.
They should be converted to internal robust macros, and retained
until the last possible opportunity. If no suitable meaning (and
corresponding handling method) can be determined, then discard them
at the end, while also writing a log message.
>
> Thanks for looking into it. I might not be able to look into this immediately, but addressing these problems is definitely on my todo list.
>
> Cheers,
> -- Will
Hope this helps,
Ross