
Internationalisation


Ian Fantom

Oct 28, 2024, 11:28:53 AM
to latexus...@googlegroups.com
I seem to have got into a rabbit hole in internationalisation. I must be
missing something obvious. Here is a summary:

With the pdflatex compiler I get a list of error messages of the sort
'Package inputenc Error: Unicode char 😀 (U+1F600) not set up
for use with LaTeX'. The special symbols are just left out of the pdf text.

I initially tried lualatex, but Peter suggests that that would be
overkill (my words). He uses Xelatex, so I tried that. That gives an
error just for the above example, but not for other special characters
I've used, such as ί (in an Irish name) and μ ≤ (as in maths). The
difference is that Xelatex doesn't give the errors, but just misses out
the symbols in the pdf text.

I did, however, find that the Esperanto special characters, which I
shall need, appear in both compilers: Ĉĉ Ĝĝ Ŝŝ Ĵ̂ Ĥĥ Ŭŭ . So I'm left
wondering what the advantage is of Xelatex.

I also discovered Babel
(https://mirror.ox.ac.uk/sites/ctan.org/macros/latex/required/babel/base/babel.pdf).
I tried \usepackage[french]{babel} - because it was their example. This
didn't seem to make any difference either.

The other reason for this email is to see whether Google Groups is still
blocking me. I don't know why I was being blocked.

Thanks,

Ian

Gildas Cotomale

Oct 28, 2024, 12:16:25 PM
to LaTeX Users Group
Hello Ian,

Let's take the example of the web: there is HTML, with rules defined by the W3C, which development teams try to implement so that their product produces the expected output. So you have one engine used only by Firefox on one side, another one used by Microsoft Edge and Chrome on the other side, and many other lesser-known engines.
It is similar in the TeX world: there are various engines that compile the LaTeX format: pdfLaTeX, LuaLaTeX, XeLaTeX, etc. Your document needs some small adaptations depending on the engine you plan to use.

pdfLaTeX is compatible with the legacy tools, which were designed for use with ASCII and 8-bit encodings. As it is encoding-agnostic, you have to tell it early on what your file encoding is, using the inputenc package. With that information (given as the package option), the engine knows how to interpret the bytes it reads.

With the newer engines LuaLaTeX and XeLaTeX, inputenc should not be used, and your file must use a Unicode encoding (more precisely, UTF-8).

But that's only part of the problem: in order to show the corresponding glyphs (or let's call them characters), they must be available in the fonts used…
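
A minimal sketch of the per-engine adaptation described above, assuming the iftex package is installed (it ships with current TeX distributions). The utf8 and T1 options are illustrative choices, not something stated in this thread:

```latex
\documentclass{article}
\usepackage{iftex}            % provides \ifPDFTeX, \ifXeTeX, \ifLuaTeX
\ifPDFTeX
  \usepackage[utf8]{inputenc} % pdfLaTeX: declare the file encoding
  \usepackage[T1]{fontenc}    % 8-bit font encoding with accented Latin letters
\else
  \usepackage{fontspec}       % XeLaTeX/LuaLaTeX: UTF-8 input, system fonts
\fi
\begin{document}
Caf\'e, café, í.
\end{document}
```

The same file can then be compiled with any of the three engines; whether a given character actually prints still depends on the font, as noted above.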

Peter Flynn

Oct 28, 2024, 6:56:52 PM
to latexus...@googlegroups.com
On 28/10/2024 10:03, Ian Fantom wrote:
> I seem to have got into a rabbit hole in internationalisation. I must be
> missing something obvious.

I may not have explained this very well.
>
> With the pdflatex compiler I get a list of error messages of the sort
> 'Package inputenc Error: Unicode char 😀 (U+1F600) not set up
> for use with LaTeX'. The special symbols are just left out of the pdf text.

Correct. pdflatex only supports a very limited set of multibyte
characters, although it is possible to provide ones you want as manual
exceptions. That allows inputenc to get pdflatex to read the character.
You then have to provide a font containing the glyph.

> I initially tried lualatex, but Peter suggests that that would be
> overkill (my words).

Just my feeling. Others are ardent advocates for it over xelatex. YMMV.

> He uses Xelatex, so I tried that. That gives an
> error just for the above example,

There should be no error. I get no error processing this file with xelatex:

\documentclass{article}
\begin{document}
The characters are 😀 and ί or í and μ and ≤ and Ĉĉ Ĝĝ Ŝŝ Ĵ̂ Ĥĥ Ŭŭ
\end{document}

But that MUST be a UTF-8 file. Can you check that? (ie the Linux/UNIX
"file" command)

> but not for other special characters
> I've used, such as ί (in an Irish name)

That's U+03AF which is a Greek Small Letter Iota with Tonos.
Did you mean í which is U+00ED, a Latin Small Letter I with Acute? also
known as i-acute (i-fhada) which is indeed Irish.

> and μ ≤ (as in maths).

You should get the Irish í but none of the others, as Gildas has pointed
out, because you need a font that contains them. The í is built-in.

> The difference is that Xelatex doesn't give the errors, but just
> misses out the symbols in the pdf text.
Sorry, I may have misunderstood here. You just said above that it gives
an error.

> I did, however, find that the Esperanto special characters, which I
> shall need, appear in both compilers: Ĉĉ Ĝĝ Ŝŝ Ĵ̂ Ĥĥ Ŭŭ .

Those all work for the same reason as í.

> So I'm left wondering what the advantage is of Xelatex.

It correctly reads all those characters and does not give an error.
pdflatex will just gag.

Now you need to \usepackage{fontspec} and then use a font containing
the glyphs you need. I don't know that you can do that with just one
font. Verdana is supposed to be fairly complete but mine is missing the
smiley face and the j-circumflex. Noto Sans is similarly deficient.
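
As a minimal sketch of that step (XeLaTeX or LuaLaTeX only): the font name below is an assumption and must match a font actually installed on your system.

```latex
\documentclass{article}
\usepackage{fontspec}     % font selection for XeLaTeX and LuaLaTeX
\setmainfont{Noto Sans}   % assumed to be installed; substitute any font you have
\begin{document}
The characters are í and ί and μ.
\end{document}
```

Any character the chosen font lacks will simply be left out of the PDF, which is the behaviour described earlier in this thread.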

> I tried \usepackage[french]{babel} - because it was their example. This didn't seem to make any difference either.

No, babel is for a completely different purpose (semantic
multinationalisation). It is being superseded by polyglossia.

> The other reason for this email is to see whether Google Groups is still
> blocking me. I don't know why I was being blocked.

I found that and fixed it.

Peter

Ian Fantom

Oct 29, 2024, 7:49:33 AM
to latexus...@googlegroups.com
Thanks, Peter. I think the problem in answering my initial query was
that for me there seemed to be disparate problems that seemed
unconnected, and I couldn't formulate the question well. I sympathise!

It seems the thing I was missing was that I need to provide a font
containing each glyph.

I now have the problem of knowing which fonts contain the characters I
need. I've just done a quick search, and am still a bit confused. Is
there an easy answer to this? I may need more than one font to get all
that I need.

I checked my tex file with the Unix 'file' command, as you suggested,
and it is indeed a UTF-8 file.

Thanks for explaining the confusion over the Irish í. I'll look at that.

What I meant by 'error' was 'not converting correctly to pdf' rather
than an error message. I was switching between xelatex and pdflatex to
check for error messages. Only one of these messages was fatal, as it
turned out, and that was  U+200F (RIGHT-TO-LEFT MARK)! So having removed
that (which somehow got into the text spuriously) it was an advantage to
see the error messages for the rest.

One thing that attracted me to Babel was the ability to switch languages
at any point, and also the need for different rules for line breaks, and
sort order for the index. Writing sometimes in Esperanto for world
readerships I perhaps need to look into that eventually. Or Polyglossia.
But for the moment possibly not, if it has nothing to do with converting
UTF-8 to glyphs.

And thanks for fixing the Google Groups problem.

Best wishes,

Ian

Peter Flynn

Oct 29, 2024, 6:57:01 PM
to latexus...@googlegroups.com
On 29/10/2024 11:49, Ian Fantom wrote:
> It seems the thing I was missing was that I need to provide a font
> containing each glyph.

Yep.

> I now have the problem of knowing which fonts contain the characters I
> need. I've just done a quick search, and am still a bit confused. Is
> there an easy answer to this? I may need more than one font to get all
> that I need.

Almost certainly. With the fontspec package loaded, but no font
specified, you of course get Computer Modern as the default. In this
case, the 😀, ί, μ, and ≤ are all missing, but the letters with a
circumflex or a breve are all present.

If you \setmainfont{Noto Serif} (Noto ["No TOfu"] prides itself on NOT
having missing characters showing as empty rectangles), the emoji is
still missing and the j-circumflex is messed up, but the iota and
less-than-or-equals are now present. There *is*, however, a separate Noto
Emoji font which has all the current emoji.
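
A sketch of combining the two fonts with fontspec, assuming both are installed under the names "Noto Serif" and "Noto Emoji". The command name \emojifont is invented here for illustration:

```latex
\documentclass{article}
\usepackage{fontspec}                 % XeLaTeX or LuaLaTeX
\setmainfont{Noto Serif}              % main text font (assumed installed)
\newfontfamily\emojifont{Noto Emoji}  % \emojifont is a name chosen for this example
\begin{document}
Hello {\emojifont 😀} and ί.
\end{document}
```

The braces keep the font switch local, so the text after the emoji returns to the main font.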

TBF you can always get any Latin letter accented by using the TeX
accents, so you can get the missing j-circumflex with \^\j (circumflex
over dotless-j). See
https://latex.silmaril.ie/formattinginformation/accents.html#accentcodes
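
For instance, a minimal sketch of the accent commands mentioned above (no extra packages assumed):

```latex
\documentclass{article}
\begin{document}
\^\j{} and \^{\j}  % circumflex over dotless j, two equivalent spellings
\^J                % capital J with circumflex
\end{document}
```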

With \setmainfont{Verdana} (also supposed to have all characters) the
same problem is present.

Less-than-or-equals is in any case a math symbol. LaTeX produces that
with \(\leq\) and the same applies to most common math symbols.
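
A short sketch of the math-mode route, using the μ and ≤ from the original question (the sentence itself is made up for illustration):

```latex
\documentclass{article}
\begin{document}
If \(\mu \leq 3\) then the bound holds.
\end{document}
```

Here the symbols come from the math fonts, so they do not depend on the text font containing μ or ≤.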

There is no font with every glyph in Unicode, for technical reasons (see
https://stackoverflow.com/questions/34732718/why-isnt-there-a-font-that-contains-all-unicode-glyphs#34734338
and
https://tex.stackexchange.com/questions/117442/latex-font-with-all-utf8-symbols-defined)
so arbitrary characters cannot be used without foreknowledge of a font
containing them.

The LaTeX Font Catalogue at https://tug.org/FontCatalogue/ has many
fonts but mostly text fonts, not symbols. Fonts (for the moment) are
inherently monochromatic: you can typeset in colour, but the glyphs
cannot contain colours (work is ongoing in that field). If you want
emoji in colour you have to use an image.

Do read the documentation for the fontspec package on how to specify the
default font and how to load extra individual fonts.

> I checked my tex file with the Unix 'file' command, as you suggested,
> and it is indeed a UTF-8 file.

Right, good.

> What I meant by 'error' was 'not converting correctly to pdf' rather
> than an error message.

OK.

> One thing that attracted me to Babel was the ability to switch languages
> at any point, and also the need for different rules for line breaks, and
> sort order for the index. Writing sometimes in Esperanto for world
> readerships I perhaps need to look into that eventually. Or Polyglossia.
> But for the moment possibly not, if it has nothing to do with converting
> UTF-8 to glyphs.

Nothing at all. Both packages offer similar facilities, but my
impression is that Polyglossia is more up to date. YMMV

Peter

Ian Fantom

Oct 29, 2024, 7:14:53 PM
to latexus...@googlegroups.com
Many thanks! It'll take a while to work through that, so I'm thanking
you now! I should now be able to get there :-)

Best wishes,

Ian

Ian Fantom

Mar 19, 2025, 6:33:21 AM
to latexus...@googlegroups.com
I'm just coming back to internationalisation, with a test document in
Esperanto. The issues are (using TeXstudio and XeLaTeX):

1. babel: Do I have to install something first, because I can't get it
to run.

2. polyglossia: works perfectly, but is limited. For Esperanto there
are no line breaks. Any ideas on how to set up the language database
required? I haven't found the files yet that define the language
characteristics. Also, I'm wondering how sorting will work. I notice
that in Ubuntu command line the 'sort' command will sort in the correct
order, ie: c,ĉ,g,ĝ,h,ĥ,j,ĵ,u,ŭ. Would the same be necessarily true for
LaTeX sorting as for instance in an index?

3. Am I best using XeLaTeX, or should I switch to LuaLaTeX or some other engine?

Once I'm on the right track I'll probably be able to sort out many of
the details myself.

Regards,

Ian

Peter Flynn

Mar 19, 2025, 8:39:18 AM
to latexus...@googlegroups.com
On 19/03/2025 10:33, Ian Fantom wrote:
> I'm just coming back to internationalisation, with a test document in
> Esperanto. The issues are (using TexStudio and XeLaTex):

Bearing in mind I have never used Esperanto, but there does seem to be
some support at https://ctan.org/topic/esperanto

> 1. babel: Do I have to install something first, because I can't get it
> to run.

babel is a standard part of all TeX systems, so it's all right there
apart from some very specialist language packages (of which
https://ctan.org/pkg/babel-esperanto may be one!).

When you say "can't get it to run" what is the exact error message you
get that tells you this?

> 2. polyglossia: works perfectly, but is limited.

I suspect it just hasn't been developed as much as other languages.

> For Esperanto there are no line breaks.

I'm not clear what this means. Do you mean it does not hyphenate? Or
does not justify? Or it tries to create paragraphs as single
limitless-length lines with no breaks at all?

This is something that would need to be taken up with the developers, or
asked on tex.stackexchange.com or comp.text.tex

> Any ideas on how to set up the language database required?

For polyglossia generally? (that should be there) or for esperanto
specifically? (the polyglossia documentation should describe this).

> I haven't found the files yet that define the language
> characteristics.

Mine are in /usr/share/texlive/texmf-dist/tex/latex/polyglossia/ eg
gloss-esperanto.ldf

> Also, I'm wondering how sorting will work. I notice that in Ubuntu
> command line the 'sort' command will sort in the correct order, ie:
> c,ĉ,g,ĝ,h,ĥ,j,ĵ,u,ŭ. Would the same be necessarily true for LaTeX
> sorting as for instance in an index?
Sorting for indexes and glossaries for LaTeX is provided by makeindex.
If Ubuntu is using Unicode defaults, then I would hope that makeindex
would do the same. If not, there is another sorter called xindy but I
have not used it. Different languages do have different requirements,
and maybe no-one has yet written those for Esperanto.
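
One workaround that does not depend on locale support is makeindex's key@entry syntax, which sorts on an ASCII key but prints the real word. A rough sketch, assuming the Esperanto x-convention (cx for ĉ) as the sort key; the example words are chosen here for illustration and edge cases in the ordering are not handled:

```latex
\documentclass{article}
\usepackage{fontspec}   % XeLaTeX or LuaLaTeX
\usepackage{makeidx}
\makeindex
\begin{document}
Some text.\index{cervo}\index{cxevalo@ĉevalo}\index{cigno}
\printindex
\end{document}
```

Run the engine, then makeindex on the .idx file, then the engine again. The entry for ĉevalo is sorted by its key cxevalo, which is intended to place it after the plain c words.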

> 3. Am I best using XeLaTex, or should I switch to LuaLaTeX or some
> other engine?
Until last week my answer would have been XeLaTeX but after the fairly
intensive discussions on the tex-live mailing list, I am now convinced
to switch to LuaLaTeX.

The main reason is that XeLaTeX is no longer being developed.
Both handle TTF/OTF fonts, both handle Unicode multibyte characters.

My concerns were that

(a) LuaLaTeX uses its own font-finder (luaotfload) whereas XeLaTeX uses
the standard (Unix) fontconfig; and

(b) LuaLaTeX is slower than XeLaTeX.

But luaotfload seems to work fine, and the speed difference should not
be noticeable on most users' typically small files like white-papers,
articles or essays; although it will be noticeable on big documents like
theses, books, whole journals, etc.

LuaLaTeX of course has the Lua scripting language built in, but this is
a minority attraction, although important for those who need it. It also
creates a PDF direct instead of using an intermediate format.

Overall I think it's a price worth paying, so I have switched without
pain, and I'm now converting _Formatting Information_ to reference
LuaLaTeX instead of XeLaTeX, which means a LOT of testing, so it won't
happen until the summer.

FWIW pdflatex is no longer a candidate: it can't handle TTF/OTF fonts
nor Unicode multibyte characters, so if you're using it, switch now. Its
remaining feature is that it DOES cater for PDF accessibility
requirements, but so does LuaLaTeX (XeLaTeX does not, as far as I
understand the discussion).

> Once I'm on the right track I'll probably be able to sort out many of
> the details myself.

Good luck!

Peter

Ian Fantom

Mar 19, 2025, 10:48:53 AM
to latexus...@googlegroups.com

Thanks, Peter! That's useful.

For Polyglossia there is a warning on compilation: '

\documentclass[french]{article}
\usepackage[T1]{fontenc}  % <- With XeTeX or LuaTeX, delete this line
\usepackage{babel}

\begin{document}
Plus ça change, plus c'est la même chose!
\end{document}

Ian Fantom

Mar 19, 2025, 11:22:58 AM
to latexus...@googlegroups.com
Thanks, Peter! That's useful.

* For Babel:

\documentclass[esperanto]{book}

\usepackage[esperanto]{babel}
Error on compilation: Package babel Error: Unknown option 'esperanto'.

\usepackage[french]{babel}
Error on compilation: Package babel Error: Unknown option 'french'.

\usepackage[]{babel}
Error on compilation: You haven't specified a language.

* For Polyglossia, by "For Esperanto there are no line breaks." I meant
it doesn't hyphenate. There is a warning on compilation that no
hyphenation patterns were loaded for Esperanto. So I've just got to find
out how to create hyphenation patterns. (In the Bronze Age I wrote a
program to split Esperanto words into morphemes, when working on the DLT
machine translation research - I'd probably be better at it now!).

It seems I put the query forward at just the right moment as regards
LuaTex! I'll look at that later. As you say, it could mean a lot of
testing. Polyglossia is OK for me right at the moment, but I'll have to
look at sort order when I do something with an index. I had assumed that
Unicode would be sorted according to the Unicode number, as ASCII was in
the Bronze Age, but the 'sort' command gets it right in Unix, though c &
ĉ etc aren't in sequential order in Unicode, and so there must be some
unseen magic going on.

But Polyglossia, it seems, was written specifically for XeTex as an
alternative to Babel - though I don't know why. It seems I may need
Babel when I move to LuaTex. I didn't manage to contact the developers
of Babel for Esperanto, and I haven't yet traced the developers of
Polyglossia for Esperanto.

Regards,

Ian

On 19/03/2025 12:39, Peter Flynn wrote:

Ian Fantom

Mar 19, 2025, 11:24:45 AM
to latexus...@googlegroups.com

Sorry, I pressed the 'Send' button instead of 'Delete'!

Ian

RGarth Silvers

Mar 19, 2025, 11:42:48 AM
to latexus...@googlegroups.com
Hi,

I’m trying to use \label within either itemize or enumerate environments, in order to be able to reference an item for which I attach a specific label.

\begin{itemize}
\item[$C$ only]\label{cond1} yadda yadda yadda

\item\label{cond0[$D$ only]} yadda yadda yadda

\item\label{[$C$ and $D$]condint} yadda yadda yadda
\end{itemize}

We can see from \ref{cond0} that yadda but that from \ref{cond1} yadda yadda yadda and then from \ref{condint} something else.

***

The itemized list appears as I want it to, with $C$ only appearing in lieu of the bullet point (or 1 if I had used the enumerate environment) but a bullet point for the second and third items.

The subsequent sentence reads

We can see from that yadda but that from ?? yadda yadda yadda and then from ?? something else.

That is, the cond0 ref appears only as a blank space instead of $C$ only, and the other two refs are unfound and so appear as bold ??

When I do the above in Overleaf, \ref{cond1} returns 3 but \ref{cond0} and \ref{condint} return ??


Help much appreciated,
Randy

Peter Flynn

Mar 19, 2025, 12:38:31 PM
to latexus...@googlegroups.com
On 19/03/2025 14:48, Ian Fantom wrote:
> For Polyglossia there is a warning on compilation: '
>
> \documentclass[french]{article}
> \usepackage[T1]{fontenc} % <- With XeTeX or LuaTeX, delete this line
> \usepackage{babel}
>
> \begin{document}
> Plus ça change, plus c'est la même chose!
> \end{document}

I get no error running this. BUT...

a) if you want to use pdflatex, you need to add
\usepackage[utf8x]{inputenc}
(fontenc only deals with fonts: if you have non-ASCII input,
using inputenc will help it handle 2-byte characters).

b) if you want to use XeLaTeX or LuaLaTeX, you must (as you say)
comment out the fontenc line, but you should replace it with
\usepackage{fontspec}

No errors with all three processors. What is the warning you get?

> \usepackage[esperanto]{babel}
> Error on compilation: Package babel Error: Unknown option 'esperanto'.

That's clear enough. But on this machine (TL 2019) I have the following:

$ locate babel|grep esperanto
/usr/share/doc/texlive-doc/generic/babel-esperanto
/usr/share/doc/texlive-doc/generic/babel-esperanto/.uuid
/usr/share/doc/texlive-doc/generic/babel-esperanto/esperanto.pdf
/usr/share/texlive/texmf-dist/tex/generic/babel-esperanto
/usr/share/texlive/texmf-dist/tex/generic/babel/esperanto.sty
/usr/share/texlive/texmf-dist/tex/generic/babel/locale/eo/babel-esperanto.tex
/usr/share/texlive/texmf-dist/tex/generic/babel-esperanto/esperanto.ldf
/usr/share/texlive/texmf-dist/tex/latex/babelbib/esperanto.bdf

If you're missing these, then you need to find out why, and locate the
relevant package to install.

> \usepackage[french]{babel}
> Error on compilation: Package babel Error: Unknown option 'french'.

For babel French, the option name is apparently 'frenchb' (there IS an
option 'french' but it's obsolete). But it should still be there. If
you're missing it, perhaps your installation was an abbreviated version.

> For Polyglossia, by "For Esperanto there are no line breaks." I meant
> it doesn't hyphenate.
Then I'm afraid you might have to create the hyphenation patterns. The
details are in Appendix H of Knuth's TeXBook but there are some examples
at
https://tex.stackexchange.com/questions/262588/how-are-hyphenation-patterns-written
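
Short of writing full patterns, individual words can be given break points by hand. A minimal sketch using the standard \hyphenation and \- commands; the break points shown are illustrative, and whether the \hyphenation list takes effect when no patterns are loaded for the language may depend on your setup:

```latex
\documentclass{article}
% Exception list: permitted break points for whole words (illustrative)
\hyphenation{es-pe-ran-to in-ter-na-ci-a}
\begin{document}
% \- marks one permitted break in this single occurrence only
La inter\-nacia lingvo Esperanto.
\end{document}
```

This only scales to a handful of words; for a whole document, proper patterns are still the real answer.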

> sort order

There must be a way to specify locale (language and culture) when using
makeindex, as there must be with sort. I've never had to do it.

> But Polyglossia, it seems, was written specifically for XeTex as an
> alternative to Babel - though I don't know why.
Because babel only deals with ASCII (and maybe with 2-byte character
encoding). For Unicode compatibility you probably need polyglossia.

Authors' name and addresses should be in the package files, so check
babel.sty and polyglossia.sty

Peter

Peter Flynn

Mar 19, 2025, 12:45:21 PM
to latexus...@googlegroups.com
On 19/03/2025 15:42, RGarth Silvers wrote:
> I’m trying to use \label within either itemize or enumerate
> environments, in order to be able to reference an item for which I
> attach a specific label.

You can't have labels in an itemized list because there is nothing to
refer to. End of story¹

For enumerate, usurping the numeric label with a value in square
brackets makes it unreferenceable.
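
For comparison, a minimal sketch of the case that does work: enumerate with its default numbering, the \label placed after \item, and only the key inside the braces. The label names are taken from the question; the item text is made up:

```latex
\documentclass{article}
\begin{document}
\begin{enumerate}
\item\label{cond1} Only $C$ holds.
\item\label{cond0} Only $D$ holds.
\end{enumerate}
Compare item \ref{cond1} with item \ref{cond0}.
\end{document}
```

After two LaTeX runs the references resolve to the item numbers.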

Peter
----------
¹ Unless you are generating LaTeX from some other source like XML, in
which case you can precalculate the ordinal number and make it create a
reference like "the first item in the list on p.XX".

Peter Flynn

Mar 19, 2025, 7:16:26 PM
to latexus...@googlegroups.com
On 19/03/2025 15:42, RGarth Silvers wrote:
> I’m trying to use \label within either itemize or enumerate
> environments, in order to be able to reference an item for which I
> attach a specific label.
