pdfTeX and UCS

GL

Oct 30, 2012, 3:31:43 PM
Hello,

Does anybody know whether pdfTeX
will ever read UTF-8?

Thanks.

Philipp Stephani

Oct 30, 2012, 6:48:45 PM
Since pdfTeX development has essentially stopped, I wouldn't hold my
breath. Even if it did support UTF-8, what should it do with it? It
remains a Unicode-unaware engine, and for that a hack like inputenc
seems sufficient.

GL

Oct 30, 2012, 7:04:47 PM
On 30/10/2012 23:48, Philipp Stephani wrote:
What does "Unicode-unaware engine" mean?

Either the engine reads Unicode and chooses the glyphs accordingly
(for fonts defined with an extended set of characters), like
XeTeX or LuaTeX,

or it doesn't read Unicode, and inputenc simulates it by making
the lead bytes active ("C2 to "DF for two-byte UTF-8 sequences),
so that some of the multi-byte input characters are replaced by
control sequences (which is how pdfTeX works now).
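
For readers wondering where the range "C2 to "DF comes from, here is a minimal Python sketch (not TeX code, and not how inputenc itself is implemented) that encodes every code point from U+0080 to U+07FF and collects the first byte of each result. The function name utf8_lead_bytes_two_byte is made up for this illustration.

```python
def utf8_lead_bytes_two_byte():
    """Collect the first byte of every two-byte UTF-8 sequence (U+0080..U+07FF)."""
    leads = set()
    for codepoint in range(0x80, 0x800):
        encoded = chr(codepoint).encode("utf-8")
        # Every code point in this range needs exactly two bytes in UTF-8.
        assert len(encoded) == 2
        leads.add(encoded[0])
    return sorted(leads)


leads = utf8_lead_bytes_two_byte()
# Lowest and highest lead byte seen for two-byte sequences.
print(hex(leads[0]), hex(leads[-1]))

# Example: the two bytes an 8-bit engine actually sees for U+00E9.
print("\u00e9".encode("utf-8").hex())
```

An 8-bit engine sees each such character as two separate bytes, so a macro attached to the first byte has to pick up the second one before it can decide what to typeset.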

Then if pdfTeX is frozen, only LuaTeX seems to be the future
(I think this is still a beta version nowadays...).

Good night.

zappathustra

Oct 31, 2012, 3:16:41 AM
GL <goua...@gmail.com> wrote:
You can use LuaTeX as if it were pdfTeX and you'll have what you want.
You won't see the difference, barring a few missing primitives (and I
think Heiko has written something that mimics them in Lua).

Best,
Paul

Robin Fairbairns

Oct 31, 2012, 6:50:06 AM
GL <goua...@gmail.com> writes:

> On 30/10/2012 23:48, Philipp Stephani wrote:
>> GL <goua...@gmail.com> writes:
>>
>>> Hello,
>>>
>>> Does anybody know whether pdfTeX
>>> will ever read UTF-8?
>>
>> Since pdfTeX development has essentially stopped, I wouldn't hold my
>> breath. Even if it did support UTF-8, what should it do with it? It
>> remains a Unicode-unaware engine, and for that a hack like inputenc
>> seems sufficient.
>
> What does "Unicode-unaware engine" mean?

contrast to "unicode-aware"

> Either the engine reads Unicode and chooses the glyphs accordingly
> (for fonts defined with an extended set of characters), like
> XeTeX or LuaTeX,
>
> [or] it doesn't read Unicode, and inputenc simulates it by making
> the lead bytes active ("C2 to "DF for two-byte UTF-8 sequences),
> so that some of the multi-byte input characters are replaced by
> control sequences (which is how pdfTeX works now).

the latter.

> Then if pdfTeX is frozen, only LuaTeX seems to be the future
> (I think this is still a beta version nowadays...).

luatex (or, i guess, xetex) is (are) the future. fortunately, the
future isn't yet here. a "declared finished" luatex will happen when
the future comes, after all.

(oddly, i don't remember similar whinges in the 90s when pdftex was new
and tex was frozen. it took me some time to catch up with pdftex, back
then, just as it's taking me time to catch up with luatex.)

robin the mostly contented snail...
--
Robin Fairbairns, Cambridge
sorry about all this posting. i'll go back to sleep in a bit.

GL

Oct 31, 2012, 6:57:51 AM
Of course: pdftexcmds.sty does it !

> Best,
> Paul
>

The problem is that MiKTeX says today:

(Fatal format file error; I'm stymied)

for LuaTeX and LuaLaTeX (even after having rebuilt those formats
of course)... too bad ;-(

GL

Oct 31, 2012, 7:07:33 AM
On 31/10/2012 11:50, Robin Fairbairns wrote:
> GL <goua...@gmail.com> writes:
>
Do you also know "The Hare and the Tortoise"? (A well-known fable by
La Fontaine in Frankreich - well, in Frankpoor ;-) )

I'm not fond of XeTeX
- because it lacks the pdfTeX primitives (\pdfliteral, \pdfsetmatrix,
\pdfoutline, \pdfximage), even though XeTeX only produces PDF output
- because XeTeX does not provide colour transparency (afaik)

Have a nice day.

Enrico Gregorio

Oct 31, 2012, 8:31:36 AM
GL <goua...@gmail.com> wrote:
pdftex uses tfm files for its fonts; so basically only 256
characters are allowed in a font.

Ciao
Enrico

GL

Oct 31, 2012, 10:10:07 AM
On 31/10/2012 13:31, Enrico Gregorio wrote:
> GL <goua...@gmail.com> wrote:
>
> pdftex uses tfm files for its fonts; so basically only 256
> characters are allowed in a font.
>
> Ciao
> Enrico
>
Yes. I don't know the tfm format, but it's possible to create
tfm files from ttf files (and certainly from OpenType files). So
it might be possible to have tfm files with more than 256 "slots"
(but then they might not be called tfm...).

Nevertheless, how would pdfTeX cope with UTF-8 inputenc and a Unicode
font? Would we have to define:

\DeclareUnicodeCharacter{HHHH}{<load font <FoNt> use char slot nr <i>>}

?

Would this mean that to use a Unicode font with pdfTeX (and inputenc)
we have to cut the font into subfonts, each containing 256 glyphs?

Thanks in advance.


Philipp Stephani

Oct 31, 2012, 4:27:06 PM
GL <goua...@gmail.com> writes:

> I'm not fond of XeTeX
> - because it lacks the pdfTeX primitives (\pdfliteral, \pdfsetmatrix,
> \pdfoutline, \pdfximage), even though XeTeX only produces PDF output
> - because XeTeX does not provide colour transparency (afaik)

AFAIK LuaTeX should support all of these, since it was originally based
on pdfTeX. XeTeX is essentially frozen as well...
LuaTeX itself is not frozen. However, LuaLaTeX depends on the luaotfload
package for loading fonts (support for OTF is not built into the
\font primitive as it is in XeTeX), and it seems that luaotfload is
unfortunately also on somewhat shaky ground since the developer left the
project (after doing an immense amount of great work). So the only
non-frozen TeX project seems to be ConTeXt Mk. IV right now?!

Philipp Stephani

Oct 31, 2012, 4:35:25 PM
GL <goua...@gmail.com> writes:

> Nevertheless, how would pdfTeX cope with UTF-8 inputenc and a Unicode
> font? Would we have to define:
>
> \DeclareUnicodeCharacter{HHHH}{<load font <FoNt> use char slot nr <i>>}
>
> ?
>
> Would this mean that to use a Unicode font with pdfTeX (and inputenc)
> we have to cut the font into subfonts, each containing 256 glyphs?
>
> Thanks in advance.

That's possible, but it would remain a hack and would have some severe
consequences, e.g. the inability to hyphenate a word set in more than
one font.
As far as I know, the ucs package has been the most comprehensive
attempt to shoehorn Unicode onto pdfTeX, but eventually people decided
to move on to Unicode-aware engines with OTF support, which really solve
the problem and make lots of other problems disappear.
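
To make the subfont idea and its hyphenation consequence concrete, here is a small Python sketch (an illustration only, not TeX code). It assumes the simplest possible split, in which the high byte of the code point selects the subfont and the low byte selects the slot; real packages may cut fonts differently, and the names subfont_slot and subfonts_in_word are made up for this example.

```python
def subfont_slot(codepoint, slots_per_font=256):
    """Map a Unicode code point to (subfont index, slot within that subfont).

    Assumption: subfonts are cut in consecutive blocks of 256 code points.
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {codepoint!r}")
    return divmod(codepoint, slots_per_font)


def subfonts_in_word(word):
    """Return the sorted subfont indices that the characters of a word fall into."""
    return sorted({subfont_slot(ord(ch))[0] for ch in word})


for ch in "a\u00e9\u20ac":
    font, slot = subfont_slot(ord(ch))
    print(f"U+{ord(ch):04X} -> subfont {font:#04x}, slot {slot:#04x}")

# U+0153 (the oe ligature) falls into subfont 1, the remaining letters of
# this word into subfont 0, so the single word would need two fonts.
print(subfonts_in_word("\u0153uvre"))
```

Any word whose letters land in more than one subfont is exactly the case where TeX's hyphenation would give up under such a scheme.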

GL

Oct 31, 2012, 4:58:33 PM
I thought ConTeXt was a format, not an engine...
Is this really an engine ?

Enrico Gregorio

Oct 31, 2012, 4:59:43 PM
GL <goua...@gmail.com> wrote:
It wouldn't work without deep surgery in the TeX engine: TeX
doesn't hyphenate words with characters from more than one font.

The TFM format actually allows more than 256 characters but,
roughly speaking, the characters are divided into 256-slot "planes",
and characters sharing the same position in these planes must have
the same dimensions (height, depth, width and italic correction).

This extension of the TFM format (which is /not/ used by TeX or
pdfTeX) was designed for Chinese and Japanese; even if TeX
used it, it would impose too severe limitations on the font
structure.

Ciao
Enrico

GL

Oct 31, 2012, 7:01:50 PM
On 31/10/2012 21:59, Enrico Gregorio wrote:
> GL <goua...@gmail.com> wrote:
>
>> On 31/10/2012 13:31, Enrico Gregorio wrote:
Well. This means that pdfTeX will die, superseded by LuaTeX,
or maybe ConTeXt, as Philipp Stephani said, but I thought this
was a format and not an engine (I read on Wikipedia that ConTeXt
MkIV is based on LuaTeX...).

Using Unicode fonts, when available, seems very attractive to
typographers.


Good night.

Enrico Gregorio

Oct 31, 2012, 6:15:42 PM
GL <goua...@gmail.com> wrote:

> On 31/10/2012 21:59, Enrico Gregorio wrote:
> > GL <goua...@gmail.com> wrote:
> >
> >> On 31/10/2012 13:31, Enrico Gregorio wrote:
ConTeXt is a format; in its more recent incarnation it basically
uses LuaTeX, but it can also use XeTeX or even pdfTeX, with fewer
features.

Ciao
Enrico

Philipp Stephani

Nov 1, 2012, 4:07:03 PM
It (Mk. IV) is a format that depends on LuaTeX as its engine, like
LuaLaTeX. For a LuaTeX-based format it's very important to be tightly
integrated with the LuaTeX engine: while XeTeX automatically extends
the \font primitive to enable OpenType fonts, LuaTeX doesn't, and
so OpenType support has to come partially from the format. (Unicode
support still comes for free, but in almost all cases you also want
OpenType support.)

Guenter Milde

Nov 6, 2012, 10:51:02 AM
On 2012-10-31, GL wrote:
> On 31/10/2012 21:59, Enrico Gregorio wrote:
>> GL <goua...@gmail.com> wrote:
>>> On 31/10/2012 13:31, Enrico Gregorio wrote:
>>>> GL <goua...@gmail.com> wrote:

>>>> pdftex uses tfm files for its fonts; so basically only 256
>>>> characters are allowed in a font.
...
>>> Would this mean that to use a Unicode font with pdfTeX (and inputenc)
>>> we have to cut the font into subfonts, each containing 256 glyphs?

Yes.

...

> Well. This means that pdfTeX will die, superseded by LuaTeX,
...
> Using Unicode fonts, when available, seems very attractive to
> typographers.

Eventually, yes.
Just like the original TeX was superseded by eTeX and later pdfTeX.
It's a slow process, though.

Günter