
unicode H-underdot and biblatex


talazem

May 17, 2007, 6:43:51 PM
I have a bibtex entry, in which the editor's last name begins with an
H with a dot underneath (an H-underdot).

I am using XeLaTeX and Biblatex.

After running XeLaTeX once and then Bibtex (for Biblatex), on the
third run (XeLaTeX) I get the following message:

=========
Runaway argument?
...
! Paragraph ended before \name was complete.
=========

Now, this ONLY happens with the underdot under the H when it is the
first letter of the last name. I have hundreds of underdots throughout
the document, and in various parts of the bibtex file, with no
problems; only here. If I remove the underdot, it works fine.

Yes, I know I can use {\d H}, but I'm sure there must be some other
way around this problem, since this is the only Unicode issue I've run
into. That's also not the ideal solution, because in the resulting
PDF, the H-underdot is not considered part of the rest of the word; so
for example, if you do a search for the word "Ḥamza", it won't show:
the H-underdot is one word, and the "amza" is another.

Any ideas as to what is wrong, or how to fix it? Thanks.

Joseph Wright

May 19, 2007, 4:45:06 AM

Could we have a minimal example?

Joseph Wright

talazem

May 19, 2007, 6:26:24 AM
> Could we have a minimal example?

Sure. Here's a bib entry:

@Book{IbnTaymiyya1970,
author = {Ibn Taymiyyah, Aḥmad ibn ʿAbd al{-}Halīm},
title = {Naqḍ al{-}manṭiq},
shorttitle = {Naqḍ al-manṭiq},
editor = {Ḥamzah, Aḥmad},
publisher = {Maktabat a{l-}Sunnah},
address = {Cairo},
year = {1970},
sortname = {IbnTaymiyyaNaqdalmantiq},
keywords = { Logic, Medieval}}

As you can see, the editor's last name begins with an H-underdot.
Again, this isn't a LaTeX issue in the sense that underdots and all
the rest of the diacritics work fine in the main body of the text...and
even in the footnotes and bibliography for that matter. Just not this
one for some reason.

Hope this helps.

Simon Spiegel

May 19, 2007, 6:52:51 AM

Doesn't it work if you put it like this in the bib file?

Editor = {{\d H}amzah, A{\d h}mad},

simon

talazem

May 19, 2007, 8:53:39 AM
> Doesn't it work, if put it like this in the bib file?
>
> Editor = {{\d H}amzah, A{\d h}mad},
>
> simon

Thanks for the response. However, as I mentioned in the first post, I
am trying to get to the bottom of why the underdot does not work for
only the first letter of the last name. It works fine for the ḥ in
"Aḥmad", and everywhere else. And as I said in the first post, I am
aware of using the {\d H} option, and am temporarily using it, but:
1. since XeLaTeX accepts all diacritics, this particular problem is
abnormal, on an abstract level;
2. on a practical level, the resulting PDF reads the "Ḥ" resulting
from the manual transcription as one word, and the rest of "amza" as
another word, which is problematic when you want to search through the
PDF for the name, for example.

Any other explanations of why this is happening would be most welcome.
And if this is a bug of some sort, perhaps it can be sorted out.
Thanks.

Simon Spiegel

May 19, 2007, 10:13:45 AM
On 2007-05-19 14:53:39 +0200, talazem <tal...@gmx.net> said:

>
> Any other explanations of why this is happening would be most welcome.
> And if this is a bug of some sort, perhaps it can be sorted out.
> Thanks.

I think you will get more answers if you ask your question on the XeTeX
mailing list.

simon


Ulrike Fischer

May 19, 2007, 10:21:50 AM
Am 17 May 2007 15:43:51 -0700 schrieb talazem:

> I have a bibtex entry, in which the editor's last name begins with an
> H with a dot underneath (an H-underdot).
>
> I am using XeLaTeX, and Biblatex.
>
> After running XeLaTeX once, then Bibtex (Biblatex), then come to run
> the 3rd run (XeLaTeX), i get the following message:
>
> =========
> Runaway argument?
> ...
> !Paragraph ended before \name was complete.
> =========
>
> Now, this ONLY happens with the underdot under the H when it is the
> first letter of the last name. I have hundreds of underdots throughout
> the document, and in various parts of the bibtex file, with no
> problems; only here. If I remove the underdot, it works fine.

I don't have XeLaTeX and don't use Unicode. But I would suspect a
conflict with some command that tries to uppercase the first char of
the name. Does it work if you put braces around the name?

--
Ulrike Fischer

talazem

May 19, 2007, 2:59:05 PM
> Does it work if you put braces around the name?

Unfortunately not.

Ulrike Fischer

May 20, 2007, 5:46:38 AM
Am 19 May 2007 11:59:05 -0700 schrieb talazem:

>> Does it work if you put braces around the name?
>
> Unfortunately not.

And if you put something empty before the name, e.g. \relax,
\hbox{}, {}?
--
Ulrike Fischer

Joseph Wright

May 20, 2007, 8:00:44 AM

Just a thought, but what if you have a dotted capital H elsewhere? I
assume that this is "abnormal", but I wonder if it is the position
(first in the last name) or the character itself that is the
problem.

Joseph Wright

P.S. No XeTeX here at the moment, hope to sort it out later today.

Philipp Lehman

May 28, 2007, 12:14:50 PM
talazem wrote:

> I have a bibtex entry, in which the editor's last name begins with
> an H with a dot underneath (an H-underdot).
> I am using XeLaTeX, and Biblatex.

In contrast to what a lot of users think, it is not and has never been
possible to use UTF-8 in bib files. Neither traditional Bibtex nor
Bibtex8 support multibyte encodings such as UTF-8. If it seems to
work with some files, it only does so by chance.

> Any ideas as to what is wrong, or how to fix it? Thanks.

The only workaround is using Ascii notation: "{\d H}". I agree that
it's inconvenient but there is absolutely no other way.
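Because the Ascii notation is the only reliable route, the conversion can at least be automated rather than typed by hand. The following Python sketch is a hypothetical preprocessing helper (`underdots_to_ascii` is an illustrative name and is not part of Bibtex or Biblatex). It rewrites letters whose only diacritic is a combining dot below into the `{\d X}` form Simon suggested. It assumes the input is already decoded Unicode text. Other non-Ascii characters, such as the macron in "Halīm" or the ʿayn, are left untouched and would need their own mappings.

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"


def underdots_to_ascii(text: str) -> str:
    """Replace letters whose only diacritic is a dot below with {\\d X}.

    Hypothetical helper: e.g. Ḥ -> {\\d H}, ḍ -> {\\d d}. Every other
    character (including other non-Ascii ones) is passed through unchanged.
    """
    out = []
    for ch in text:
        # NFD splits a precomposed letter into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] == COMBINING_DOT_BELOW:
            out.append("{\\d " + decomposed[0] + "}")
        else:
            out.append(ch)
    return "".join(out)


print(underdots_to_ascii("Ḥamzah, Aḥmad"))  # {\d H}amzah, A{\d h}mad
```

The output matches the hand-written `{\d H}amzah, A{\d h}mad` form, so a helper like this could be run over a bib file before handing it to Bibtex.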

--
Sender address blackholed; do not reply to From: address.
You can still reach me by email at: plehman gmx net.

Simon Spiegel

May 28, 2007, 3:14:30 PM
On 2007-05-28 18:14:50 +0200, Philipp Lehman
<devnull....@spamgourmet.com> said:

> talazem wrote:
>
>> I have a bibtex entry, in which the editor's last name begins with
>> an H with a dot underneath (an H-underdot).
>> I am using XeLaTeX, and Biblatex.
>
> In contrast to what a lot of users think, it is not and has never been
> possible to use UTF-8 in bib files. Neither traditional Bibtex nor
> Bibtex8 support multibyte encodings such as UTF-8. If it seems to
> work with some files, it only does so by chance.
>
>> Any ideas as to what is wrong, or how to fix it? Thanks.
>
> The only workaround is using Ascii notation: "{\d H}". I agree that
> it's inconvenient but there is absolutely no other way.

I always thought that bibtex was kind of agnostic when it comes to file
encodings. I remember I did some testing with Cyrillic letters and
xelatex quite some time ago and this worked well. Seems I was just
lucky.

simon

talazem

May 30, 2007, 8:42:16 PM
On May 28, 5:14 pm, Philipp Lehman <devnull.1.leh...@spamgourmet.com>
wrote:

> In contrast to what a lot of users think, it is not and has never been
> possible to use UTF-8 in bib files. Neither traditional Bibtex nor
> Bibtex8 support multibyte encodings such as UTF-8. If it seems to
> work with some files, it only does so by chance.
>

> The only workaround is using Ascii notation: "{\d H}". I agree that
> it's inconvenient but there is absolutely no other way.

Fair enough, and that's what I've been doing practically. But I don't
understand why this only happens when the very first letter of the
surname has this underdot; I have lots of underdots and other macros
throughout the bibtex file, and they all work fine. It's not an issue
of chance, per se, since it is consistently JUST the underdot on the
first letter of the author's last name.

Thanks for everyone's help and advice all the same.

Dan

May 31, 2007, 10:01:00 AM

Most of the character data in a .bib file is passed through by
bibtex without much processing. The first letters of names,
however, are (almost) always processed further for sorting
purposes. Multibyte characters elsewhere are therefore passed
through as several separate characters, but never examined
closely.
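To illustrate why byte-level handling of the first letter matters for sorting, here is a Python sketch. It assumes a plain byte-wise comparison, which simplifies what a .bst sort key actually does. `strip_marks` is a hypothetical helper; bibtex provides nothing like it.

```python
import unicodedata

names = ["Zayd", "Ḥamzah", "Hamid"]

# Byte-wise comparison: UTF-8 encodes "Ḥ" as E1 B8 A4, and 0xE1 is larger
# than every Ascii byte, so "Ḥamzah" lands after "Zayd".
byte_order = sorted(names, key=lambda s: s.encode("utf-8"))
print(byte_order)  # ['Hamid', 'Zayd', 'Ḥamzah']


def strip_marks(s: str) -> str:
    """Drop combining diacritics after canonical decomposition (Ḥ -> H)."""
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )


# Sorting on the stripped form keeps "Ḥamzah" next to the other H names.
mark_order = sorted(names, key=strip_marks)
print(mark_order)  # ['Hamid', 'Ḥamzah', 'Zayd']
```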

The problem with {\d H} in searching the PDF is a font
encoding problem (I would guess). With the correct font
encoding (and correct LaTeX support for that encoding)
{\d H} should get translated to a particular character slot
in the proper encoding. This would make it searchable. I
don't know enough about encodings (or about xetex) to
know if there is support for this.


Dan

Philipp Lehman

May 31, 2007, 3:35:48 PM
talazem wrote:

> On May 28, 5:14 pm, Philipp Lehman

>> In contrast to what a lot of users think, it is not and has never
>> been possible to use UTF-8 in bib files. Neither traditional Bibtex
>> nor Bibtex8 support multibyte encodings such as UTF-8. If it seems
>> to work with some files, it only does so by chance.
>>
>> The only workaround is using Ascii notation: "{\d H}". I agree that
>> it's inconvenient but there is absolutely no other way.
>
> Fair enough, and that's what I've been doing practically. But I
> don't understand why this only happens when the very first letter of
> the surname has this underdot; i have lots of underdots and other
> macros throughout the bibtex file, and they all work fine. It's not
> an issue of chance, per se, since it is consistently JUST the
> underdot on the first letter of the author's last name.

It's not just pure chance but it's not specific to the first letter
either. Here's an example (save it in UTF-8 encoding):

---------- test.bib ----------
@book{test1,
author = {Muller},
title = {Title},
year = {1995},
}
@book{test2,
author = {Müller},
title = {Title},
year = {1996},
}
---------- test.bib ----------

---------- test.tex ----------
\documentclass{article}
\usepackage[utf8]{inputenc}
\begin{document}
\cite{test1}\par
\cite{test2}\par
\bibliography{\jobname}
\bibliographystyle{alpha}
\end{document}
---------- test.tex ----------

The alpha.bst uses labels like:

Jones 1995 -> Jon95
Müller 1995 -> Mül95

If you compile the above example, you get:

[Mul95]
[Mü96]

Note that the second label is missing an "l".

To understand what's going on here, you have to keep in mind that
bibtex was written with Ascii in mind so it's based on the assumption
that one printable character is represented by exactly one byte (or
octet). When the alpha.bst style tells it to "get the first three
characters" of a name, this means "get the first three bytes" for
Bibtex. This works fine with single-byte encodings like Ascii or even
Latin 1 (modulo the messed up sorting order...), but it fails with
multi-byte encodings.
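Here is a minimal Python sketch of the difference between "get the first three characters" and "get the first three bytes". It assumes the name is stored as UTF-8. `first_n_chars` and `first_n_bytes` are illustrative names, and the byte version only approximates what bibtex does when alpha.bst asks for a label prefix.

```python
def first_n_chars(name: str, n: int = 3) -> str:
    """Label prefix counted in characters (what the style intends)."""
    return name[:n]


def first_n_bytes(name: str, n: int = 3) -> str:
    """Label prefix counted in bytes, roughly how bibtex treats UTF-8 input.

    errors="replace" turns a dangling partial character into U+FFFD
    instead of raising, so a bad cut stays visible in the result.
    """
    return name.encode("utf-8")[:n].decode("utf-8", errors="replace")


for name in ["Jones", "Müller"]:
    print(name, first_n_chars(name), first_n_bytes(name))
# Jones Jon Jon
# Müller Mül Mü
```

For pure Ascii the two functions agree, which is why the problem only shows up once a multibyte character falls inside the prefix.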

For example, in Big5 (a Chinese encoding), one 'character' is
represented by two bytes. UTF-8 is even trickier because it is a
variable-length encoding. A single character may be represented by
one to four bytes. The first 128 characters use a single byte. In
this range, UTF-8 is identical to Ascii (which is why it seems to
work with Bibtex in some cases). Starting with the 129th character,
it uses more than one byte to represent a printable character. The
name "Muller", written as a UTF-8 byte sequence, is:

4D756C6C6572

The octets are written as hexadecimal numbers here; let's make that
a bit clearer:

[(4D)][(75)][(6C)][(6C)][(65)][(72)]

I'm using parentheses to mark the bytes and brackets to mark the
printable characters. As you can see, one character is one byte.
Let's look at Müller:

[(4D)][(C3)(BC)][(6C)][(6C)][(65)][(72)]

Note the second character which is represented by two bytes:

...(C3)(BC)...
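The bracket-and-parenthesis notation used here can be generated mechanically. Below is a small Python sketch (the function names are hypothetical) that prints each character's UTF-8 octets. It also includes the standard UTF-8 length rule behind the "one to four bytes" statement.

```python
def bracket_notation(text: str) -> str:
    """Show each character in [...] and each of its UTF-8 octets in (...)."""
    parts = []
    for ch in text:
        octets = "".join(f"({b:02X})" for b in ch.encode("utf-8"))
        parts.append(f"[{octets}]")
    return "".join(parts)


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (1 to 4)."""
    if code_point < 0x80:  # the first 128 characters: identical to Ascii
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


print(bracket_notation("Muller"))  # [(4D)][(75)][(6C)][(6C)][(65)][(72)]
print(bracket_notation("Müller"))  # [(4D)][(C3)(BC)][(6C)][(6C)][(65)][(72)]
print(utf8_length(ord("ü")), utf8_length(ord("Ḥ")))  # 2 3
```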

Now, when alpha.bst tells bibtex to get "three characters", it
actually gets:

4DC3BC

and that's "Mü" in UTF-8. This will not trigger an error because it's
still valid UTF-8.

Now let's look at your example. "Ḥamzah" (\d{H}amzah) in UTF-8 is:

[(E1)(B8)(A4)][(61)][(6D)][(7A)][(61)][(68)]

The H-underdot character is "E1B8A4" (three bytes). When bibtex
generates the initial, it takes only one byte:

E1

but that's not a valid UTF-8 byte sequence.
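The following Python sketch shows the failure and what a UTF-8-aware tool would have to do instead. Such a tool would extend the initial over the continuation bytes, which always have the bit pattern 10xxxxxx. `utf8_initial` is a hypothetical helper and assumes non-empty, valid UTF-8 input. Bibtex itself does nothing of the kind.

```python
name = "Ḥamzah"
raw = name.encode("utf-8")
print(raw[:3].hex().upper())  # E1B8A4

# Byte-oriented "initial": just the first byte.
first_byte = raw[:1]
try:
    first_byte.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8:", first_byte.hex().upper())  # not valid UTF-8: E1


def utf8_initial(data: bytes) -> bytes:
    """Return all bytes of the first character of a UTF-8 byte string.

    Continuation bytes match 10xxxxxx (byte & 0xC0 == 0x80), so keep
    consuming bytes until the next one starts a new character.
    """
    end = 1
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end += 1
    return data[:end]


print(utf8_initial(raw).decode("utf-8"))  # Ḥ
```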

Philipp Lehman

May 31, 2007, 3:47:11 PM
talazem wrote:

One more thing I just noticed:

> sortname = {IbnTaymiyyaNaqdalmantiq},

"sortname" is a name list, not a literal field. You should omit
non-Ascii characters, but apart from that, it's used just like
"author", so that should probably be:

sortname = {Ibn Taymiyyah, Ahmad ibn Abd al-Halim},
