I have a curious anomaly.
I am still using XeLaTeX, with polyglossia for Esperanto. I presume
this could apply to any language or UTF-8 file using non-ASCII
symbols.
I compiled a booklet successfully, with each chapter in a separate .tex file, including each one with, e.g.:
\input{uea_enketo_enkonduko.tex}
Everything worked fine. I did a preprint with Amazon KDP.
Then I added a chapter in exactly the same way as I had done with the others:
\input{antaux_kaj_post_la_enketo.tex} % 2025-05-11
The first few lines of the file are:
-------------
\chapter[Antaŭ kaj Post la Enketo]{Antaŭ kaj Post la Enketo}
Test: ĉĥĝĵŝŭ
Antaŭ kaj Post la Enketo
Ian Fantom
----------------
I rebuilt, and the PDF was produced just fine.
However, the display of the tex file in the editor had changed to:
-----------
\chapter[AntaÅ kaj Post la Enketo]{AntaÅ kaj Post la Enketo}
Test: Ä‰Ä¥Ä ÄµÅ Å
AntaÅ kaj Post la Enketo
Ian Fantom
--------------
I can't edit this with the editor using ĉĥĝĵŝŭ from the keyboard!
This applies only to this one file. If I copy a para from this .tex file and paste it onto the main file, or onto another file, then the special characters do show properly in the editor.
I've tried removing the '\chapter' line and get the same result. I've tried removing the '.tex' extension and get the same result. I'm at a loss to see any difference whatsoever between this file and the files for the other chapters. It seems some transliteration program has intervened to produce 8-bit sequences instead of the Unicode letters.
I'm baffled!
Regards,
Ian
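The garbled display quoted above can be reproduced with a minimal Python sketch. It assumes that the editor read the file's UTF-8 bytes as Windows-1252; that is only a guess based on the characters shown (for example 'Ä‰' in place of 'ĉ'), and nothing in this thread confirms which encoding the editor actually chose.

```python
# Assumption: the editor decoded the file's UTF-8 bytes as Windows-1252.
original = "Test: ĉĥĝĵŝŭ"
utf8_bytes = original.encode("utf-8")

# Each Esperanto letter is two bytes in UTF-8. Read as Windows-1252, each
# byte becomes a separate character. Bytes that have no Windows-1252
# meaning (such as 0x9D) are replaced by U+FFFD in this sketch.
garbled = utf8_bytes.decode("cp1252", errors="replace")
print(garbled)

# The bytes themselves are unchanged, so decoding them as UTF-8
# gives the original text back.
restored = utf8_bytes.decode("utf-8")
print(restored)
```

If this guess is right, the file on disk is still good UTF-8 and only the editor's interpretation of it is wrong, which would be consistent with the PDF building correctly.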
Many thanks, Peter. I wasn't expecting such a knowledgeable reply;
it was such an unexpected and weird problem! So it's interesting
that you've met something similar before.
I tried:
$ file antaux_kaj_post_la_enketo.tex
antaux_kaj_post_la_enketo.tex: LaTeX document, UTF-8 Unicode text,
with very long lines
If I copy a para from this .tex file and paste it onto the main
file, or onto another file, then the special characters do show
properly in the editor.
vi antaux_kaj_post_la_enketo.tex
Then use the right mouse button to copy a para, and paste it into the tex editor with the right button.
Ah, I've just seen an ambiguity: right button, Ctrl-V, or right button with no formatting. I'll try all of those. ... I got exactly the same with:
TEST: cp using Ctrl-V:
Test: ĉĥĝĵŝŭ
TEST: cp using right mouse click 'Paste':
Test: ĉĥĝĵŝŭ
TEST: cp using right mouse click 'Paste as Latex':
Test: ĉĥĝĵŝŭ
I typed the 'Test: ĉĥĝĵŝŭ' directly into the file using vi,
before copying.
I haven't used any other utilities on the file. I created it by
'Save as' in LibreOffice Writer, then I tried again by reading the
file in the text editor Mousepad. The result was the same. I tried
playing with the end-of-line marker on saving: LF | CR LF. Same
result.
I'm still baffled!
Best wishes,
Ian
I've just tried another test. I copied from the screen a section from the LibreOffice Writer directly to the tex file in the Tex editor. It showed correctly in the Tex editor, but the special characters appeared as question marks in the pdf file.
I've just been looking at the man page for 'file'. It seems there's no flag to say what the encoding is, but 'file' figures it out from the text, and concludes that my file is UTF-8. I wonder how the Tex editor does it.
My tex editor is:
TeXstudio 2.12.22 (Build: 2.12.22+debian-1build1) Using Qt Version 5.12.8, compiled with Qt 5.12.5 R
Perhaps the Tex editor guesses from the contents, too, and comes to a different conclusion for that particular file.
I'll look for any difference that might have thrown the Tex equivalent of 'file'.
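As a rough sketch of what a content-based check might look at, the Python function below reports whether some bytes start with a byte-order mark and whether they decode as strict UTF-8. The function name describe_encoding is made up for this illustration, and this is not how 'file' or TeXstudio actually decide; it only shows the kind of evidence that is available from the bytes alone.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Hypothetical helper: classify bytes by BOM and strict UTF-8 validity."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 with BOM"
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError as err:
        return f"not valid UTF-8 (first bad byte at offset {err.start})"
    if all(byte < 0x80 for byte in data):
        return "pure ASCII"
    return "valid UTF-8 with non-ASCII characters"


# In-memory examples; a real check would read the file in binary mode.
print(describe_encoding("Antaŭ kaj Post la Enketo".encode("utf-8")))
print(describe_encoding(b"Ian Fantom"))
print(describe_encoding(b"Anta\xfd"))
```

To run it on the chapter itself, the bytes could be read with open("antaux_kaj_post_la_enketo.tex", "rb").read() and passed to the function. A single stray byte that is not valid UTF-8 would be one possible difference between this file and the other chapters.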
I haven't yet ventured into trying out other Tex editors!
Another observation is that the Tex editor rewrites the file using '^M^M' for line endings rather than '^J' or '^M^J'. I'm not sure what the implication is, but it might mean it thinks it's a Microsoft format? The '^M^M' in the Unix file doesn't create a new line, and so the whole document appears as one paragraph with the occasional 'MM' shown in blue.
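For reference, '^M' is the carriage return character (CR, byte 0x0D), and Unix tools only treat line feed (LF, byte 0x0A) as a line break. A small Python sketch for counting which line endings a file really contains follows; the function name count_line_endings is invented for this example, and the sample bytes are made up rather than taken from the real file.

```python
def count_line_endings(data: bytes) -> dict:
    """Hypothetical helper: count CRLF pairs, lone CRs and lone LFs."""
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    lone_lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}


# Made-up sample: a doubled CR, a Unix LF and a Windows CRLF.
sample = b"first\r\rsecond\nthird\r\nfourth"
print(count_line_endings(sample))
```

Running it on the bytes of the file before and after saving from the editor would show whether the editor really replaced every LF with a pair of CRs, or whether only some line endings were changed.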
Regards,
Ian