BibTeX and UTF-8 and accented characters

Merciadri Luca

Dec 7, 2009, 6:29:00 AM

Hi,

I am using BibTeX on Debian Lenny, with TeX Live. According to
what I read on the Internet, BibTeX seems not to support the UTF-8
format, for many reasons. My first reaction was to run
$ iconv -f utf-8 -t iso-8859-1 mybibfile.bib
where `mybibfile.bib' is one of my bib files of one of my documents.
Okay, it works, but, when compiling the main .tex file, I receive
things like
`synth.tex:127:Package utf8x Error:
MalformedUTF-8sequence. ... F.~Bastin}, {\em {Examen d'entr\'ee à la'
where `F. Bastin' is actually an author and `Examen d'entrée' is the
name of a document. I am using
\usepackage[utf8x]{inputenc}
in my preamble. The problem is that inputenc parses *.bbl as utf-8
files, but as BibTeX does not support utf-8, these errors appear.
However, I am obliged to use
\usepackage[utf8x]{inputenc}
because of my document's encoding. I am thus unable to modify this. I
can modify the encoding of the *.bib files, but if I switch to
iso-8859-1, BibTeX is happy, but inputenc not. If I switch back to
utf-8, inputenc is happy, but BibTeX not.

I have introduced accents using LaTeX syntax, e.g.
`ée' is written `\'ee' in the .bib files. I had already tried typing
the accented characters directly from the keyboard (i.e. without any
LaTeX commands), but that does not work either.

I read that using LuaTeX or another thing would be useful, but I do
not want to switch to this. I desperately tried
\usepackage{amsrefs}
but it did not change anything, with(out)
\usepackage{frbib}.

All I want is my .bib file(s) to be parsed correctly, and to be
rendered normally in my output .pdf file. I am using the routine
latex (dvi) -> dvi2ps (ps) -> ps2pdf (pdf).

Using pdfLaTeX directly is evidently not possible with my
code.

I would appreciate any idea, folks.

--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/

Ulrike Fischer

Dec 7, 2009, 6:35:20 AM

Am Mon, 07 Dec 2009 12:29:00 +0100 schrieb Merciadri Luca:

> I am using BibTeX on my Debian Lenny, with a texlive. According to
> what I read on the Internet, BibTeX seems not to support the utf-8
> format, for many reasons. My first reaction was to
> $ iconv -f utf-8 -t iso-8859-1 mybibfile.bib
> where `mybibfile.bib' is one of my bib files of one of my documents.
> Okay, it works, but, when compiling the main .tex file, I receive
> things like
> `synth.tex:127:Package utf8x Error:
> MalformedUTF-8sequence. ... F.~Bastin},

You can switch the inputencoding before reading the bbl-file:

\begingroup
\inputencoding{latin1}
\bibliography{mybibfile}
\endgroup

Also, BibTeX does support UTF-8 to some extent (it doesn't really
care). The main problem is that it can put a newline between the
bytes of a multi-byte UTF-8 character.
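
A minimal sketch of where that group could sit in a complete document, assuming the .bib file has been converted to ISO-8859-1 (so the .bbl that BibTeX writes is Latin-1 as well) and is called mybibfile as in the question. The body text, the citation key bastin and the plain style are placeholders, not taken from the thread.

```latex
\documentclass{article}
\usepackage[utf8x]{inputenc}   % the .tex source itself is UTF-8

\begin{document}
Some UTF-8 text: entrée \cite{bastin}. % 'bastin' is a hypothetical key

% The group is meant to keep the encoding switch local: the .bbl is
% read as Latin-1, and the UTF-8 setting applies again after \endgroup.
\begingroup
\inputencoding{latin1}
\bibliographystyle{plain}      % placeholder style
\bibliography{mybibfile}
\endgroup
\end{document}
```

The usual latex, bibtex, latex, latex sequence stays the same; only the reading of the .bbl is affected.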

--
Ulrike Fischer

Merciadri Luca

Dec 7, 2009, 7:40:59 AM

Ulrike Fischer <ne...@nililand.de> writes:

Thanks for this, Ulrike. I was not aware of these facts. It works like a
charm, now.

--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/

Thomas Arildsen

Dec 7, 2009, 9:34:58 AM

I am using a different approach. I keep my .bib-file in UTF-8 as well as
my LaTeX document. I use:
\usepackage[utf8]{inputenc}
I use biblatex, which is capable of handling UTF-8:
\usepackage[bibencoding=inputenc]{biblatex}
Instead of BibTeX, I use Biber:
http://sourceforge.net/projects/biblatex-biber/
It is designed to handle Unicode and is specifically intended for biblatex.
There is no stable release yet, but it seems to work fine for my
purposes.
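
A minimal sketch of such a setup. It is written against the option and command names of current biblatex releases (backend=biber, \addbibresource), which may differ from the pre-release versions available at the time of this post; mybibfile.bib and the key bastin are placeholders.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}    % UTF-8 .tex source
% bibencoding=utf8 states explicitly that the .bib file is UTF-8 too
\usepackage[backend=biber,bibencoding=utf8]{biblatex}
\addbibresource{mybibfile.bib} % placeholder file name

\begin{document}
Some UTF-8 text \cite{bastin}. % 'bastin' is a hypothetical key
\printbibliography
\end{document}
```

The compile cycle then becomes latex, biber, latex, with biber taking the place of the bibtex run.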

Thomas Arildsen

--
All email to sender address is lost.
My real address is at es dot aau dot dk for user tha.

Peter Flynn

Dec 7, 2009, 6:11:00 PM

Merciadri Luca wrote:
> I am using BibTeX on my Debian Lenny, with a texlive. According to
> what I read on the Internet, BibTeX seems not to support the utf-8
> format, for many reasons. My first reaction was to
> $ iconv -f utf-8 -t iso-8859-1 mybibfile.bib
> where `mybibfile.bib' is one of my bib files of one of my documents.

I'm in a similar position under Ubuntu 9.10. But I go the other way,
iconv -t utf-8 mybibfile.bib
which seems to work for almost everything except an s-caron, which I
ended up editing in as \v{s}.

> Okay, it works, but, when compiling the main .tex file, I receive
> things like
> `synth.tex:127:Package utf8x Error:
> MalformedUTF-8sequence. ... F.~Bastin}, {\em {Examen d'entr\'ee à la'
> where `F. Bastin' is actually an author and `Examen d'entrée' is the
> name of a document. I am using
> \usepackage[utf8x]{inputenc}
> in my preamble. The problem is that inputenc parses *.bbl as utf-8
> files, but as BibTeX does not support utf-8, these errors appear.

On reflection, I suspect my file may contain only those characters which
are in 8859-1, and thus inoffensive as a proper subset of UTF-8.

> Or I am obliged to use
> \usepackage[utf8x]{inputenc}
> because of my document's encoding. I am thus unable to modify this. I
> can modify the encoding of the *.bib files, but if I switch to
> iso-8859-1, BibTeX is happy, but inputenc not. If I switch back to
> utf-8, inputenc is happy, but BibTeX not.

I believe biblatex solves this, but I haven't grokked it yet. And I
don't know if it works with JabRef and other BibTeX-oriented tools.

> I have introduced accents using LaTeX syntax, e.g.
> `ée' is written `\'ee' in the .bib files. I had already tried with the
> mere accents directly coming from the keyboard (i.e. without any LaTeX
> commands), but it does not work too.

\'{e} for some utterly obscure reason, seems to be the canonical format.

> All I want is my .bib file(s) to be parsed correctly, and to be
> rendered normally in my output .pdf file. I am using the routine
> latex (dvi) -> dvi2ps (ps) -> ps2pdf (pdf).

I think I may have to spend some time learning biblatex...

///Peter

Philipp Stephani

Dec 7, 2009, 6:27:46 PM

Peter Flynn <peter...@m.silmaril.ie> writes:

> Merciadri Luca wrote:
>> I am using BibTeX on my Debian Lenny, with a texlive. According to
>> what I read on the Internet, BibTeX seems not to support the utf-8
>> format, for many reasons. My first reaction was to
>> $ iconv -f utf-8 -t iso-8859-1 mybibfile.bib
>> where `mybibfile.bib' is one of my bib files of one of my documents.
>
> I'm in a similar position under Ubuntu 9.10. But I go the other way,
> iconv -t utf-8 mybibfile.bib
> which seems to work for almost everything except an s-caron, which I
> ended up editing in as \v{s}.

The combination of UTF-8 and bibtex sometimes sort-of works, but this
combination has many flaws, see sec. 2.4.3 of the biblatex manual.

>> Okay, it works, but, when compiling the main .tex file, I receive
>> things like
>> `synth.tex:127:Package utf8x Error:
>> MalformedUTF-8sequence. ... F.~Bastin}, {\em {Examen d'entr\'ee à la'
>> where `F. Bastin' is actually an author and `Examen d'entrée' is the
>> name of a document. I am using
>> \usepackage[utf8x]{inputenc}
>> in my preamble. The problem is that inputenc parses *.bbl as utf-8
>> files, but as BibTeX does not support utf-8, these errors appear.
>
> On reflection, I suspect my file may contain only those characters
> which are in 8859-1, and thus inoffensive as a proper subset of UTF-8.

UTF-8 is a character encoding, not a character set, and so it makes no
sense to speak of subsets of UTF-8. Perhaps you mean one of the
following:

- ISO-8859-1 is a subset of Unicode. This is certainly true, but every
other character set is also a subset of Unicode, and bibtex doesn't
know anything about these matters anyway. But ISO-8859-1 has the
unique property of sharing all code point–character mappings with
Unicode.

- Every well-formed ISO-8859-1 string is also a valid UTF-8 string,
which would be false.

I don't think that characters from the ISO-8859-1 character set work any
better in bibtex, and if so, then only by accident.
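
To make the second point concrete, here is a small table, typeset as a LaTeX tabular, of the bytes (in hexadecimal) that two characters occupy in each encoding. It is only an illustration of the encodings themselves, not of anything BibTeX does.

```latex
% Bytes (hexadecimal) of two characters in each encoding.
\begin{tabular}{lll}
Character & ISO-8859-1 & UTF-8 \\
\hline
e         & 65         & 65    \\
\'e       & E9         & C3 A9 \\
\end{tabular}
```

A pure-ASCII string is byte-for-byte the same in both encodings, but an accented letter is not. Read as UTF-8, the lone byte E9 (binary 1110 1001) announces a three-byte sequence, so the two bytes after it would have to be continuation bytes of the form 10xxxxxx. In Latin-1 text the next byte is normally an ordinary ASCII character such as 65, which is exactly the kind of input that a UTF-8 decoder rejects as malformed.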

>> Or I am obliged to use
>> \usepackage[utf8x]{inputenc}
>> because of my document's encoding. I am thus unable to modify this. I
>> can modify the encoding of the *.bib files, but if I switch to
>> iso-8859-1, BibTeX is happy, but inputenc not. If I switch back to
>> utf-8, inputenc is happy, but BibTeX not.
>
> I believe biblatex solves this, but I haven't grokked it yet. And I
> don't know if it works with JabRef and other BIBTeX-oriented tools.

Yes, biblatex supports mixed encodings. However, if you are using
biblatex, then you should use biber anyway, which is capable of UTF-8.

>> I have introduced accents using LaTeX syntax, e.g.
>> `ée' is written `\'ee' in the .bib files. I had already tried with the
>> mere accents directly coming from the keyboard (i.e. without any LaTeX
>> commands), but it does not work too.
>
> \'{e} for some utterly obscure reason, seems to be the canonical
> format.

No, {\'e}; this is also explained in sec. 2.4.3 of the biblatex manual.
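
As a sketch of what that form looks like in a .bib entry, here embedded with filecontents so that everything stays in one LaTeX file. The entry, its key and all field values are invented for illustration. With the braces placed directly around the accent command, BibTeX treats the group as a single "special character" when it sorts and changes case.

```latex
% Hypothetical entry, written to example.bib when the file is compiled
% (an existing example.bib is left untouched by default).
\begin{filecontents}{example.bib}
@book{dupont,
  author    = {Dupont, Ren{\'e}},
  title     = {Introduction {\`a} l'alg{\`e}bre},
  publisher = {Exemple},
  year      = {2009}
}
\end{filecontents}
\documentclass{article}
\begin{document}
\nocite{dupont}              % list the entry without citing it in the text
\bibliographystyle{plain}    % placeholder style
\bibliography{example}
\end{document}
```

Because the entry is pure ASCII, the resulting .bbl reads the same whichever input encoding the document declares.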

--
Change “LookInSig” to “tcalveu” to answer by mail.

Peter Flynn

Dec 7, 2009, 6:37:12 PM

Philipp Stephani wrote:
[...]

>> which are in 8859-1, and thus inoffensive as a proper subset of UTF-8.
>
> UTF-8 is a character encoding, not a character set, and so it makes no
> sense to speak of subsets of UTF-8. Perhaps you mean one of the

Quite so. I meant to type Unicode.

> Yes, biblatex supports mixed encodings. However, if you are using
> biblatex, then you should use biber anyway, which is capable of UTF-8.

I'm not really concerned about mixed encodings. If it will handle
everything in UTF-8 that's fine. But I would like to have tools like
JabRef...

>> \'{e} for some utterly obscure reason, seems to be the canonical
>> format.
>
> No, {\'e}; this is also explained in sec. 2.4.3 of the biblatex manual.

Sorry, my fingers are slipping tonight. That's what I meant.

///Peter

Merciadri Luca

Dec 8, 2009, 8:39:57 AM

Thanks for the answers. I am now okay with this.

--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/