
Hyphenation of words containing dashes


Alex M.

Dec 13, 2011, 6:18:18 AM
Hello,
just for interest: I would like to know why LaTeX does not hyphenate
words containing dashes at positions other than the dash. I can't see
any special reason for that...

Alex

Alex M.

Dec 13, 2011, 6:41:55 AM
And why is hyphenation allowed after a dash at the beginning of a word
(for example an "Ergänzungsstrich" in German)? Is there any case where
it makes sense for the dash to stand alone on the previous line?

Marc van Dongen

Dec 13, 2011, 8:59:52 AM
That's how LaTeX works; it's not supposed to hyphenate words at illegal
positions. It has a reasonable idea of how to hyphenate most words, but
by providing a word with hyphenation positions you effectively tell it:
look, these are the _only_ places where you may hyphenate this word.
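
A quick way to observe this is \showhyphens, which writes the break points TeX finds to the log file. The following is only a minimal sketch: the test words are illustrations, and the break points reported for the plain word depend on the hyphenation patterns loaded in your format.

```latex
\documentclass{article}
\begin{document}
% \showhyphens prints the hyphenation points TeX finds to the log/terminal.
\showhyphens{facilitated}         % ordinary word: break points come from the patterns
\showhyphens{zoning-facilitated}  % explicit hyphen: the parts are not examined by the patterns
\showhyphens{fa\-cilitated}       % \- given: this occurrence may break only at the marked place
\end{document}
```

Comparing the three log entries shows whether an explicit hyphen or a \- switches off pattern-based hyphenation for the rest of the word.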

Regards,


Marc van Dongen

Herbert Voss

Dec 13, 2011, 9:27:19 AM
You can use the German babel shorthands, which allow
hyphenation of words with a dash, written as dash"=word:



\documentclass{article}
\usepackage[ngerman,english]{babel}
\useshorthands{"}
\addto\extrasenglish{\languageshorthands{ngerman}}
\begin{document}

\frame{\minipage{1cm}\hrulefill\par
\hspace{0pt}zoning-facilitated

\hspace{0pt}zoning"=facilitated
\endminipage}\hspace{1cm}
%
\frame{\minipage{1cm}
\hspace{0pt}zoning-facilitated
\endminipage}

\end{document}

Herbert

Dominik Waßenhoven

Dec 13, 2011, 9:02:37 AM
In English, words containing dashes should not be hyphenated, AFAIK. I
am not sure why (but I am no native speaker). In German, however, it is
common, since German has long words with dashes that would look ugly if
not hyphenated (at least in justified texts). Babel with the (n)german
option offers special commands for such dashes. See e.g.
http://homepage.ruhr-uni-bochum.de/Georg.Verweyen/silbentrennung.html
for an overview of these.
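
For orientation, here is a minimal sketch of three of the hyphen-related shorthands that babel's ngerman option defines. The example words and the minipage width are assumptions picked only to force line breaks; the comments paraphrase the behaviour described in the babel documentation for German.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
\begin{minipage}{2.5cm}
% "= prints a hyphen, allows a break after it, and leaves the
% component words open to normal hyphenation.
Monte"=Carlo"=Simulation

% "~ prints a hyphen without a break point after it.
Warenein"= und "~ausgang

% "- works like \-, but the rest of the word can still be hyphenated.
Simu"-lation
\end{minipage}
\end{document}
```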

Regards,
Dominik.-

--
UK-TeX-FAQ: http://www.tex.ac.uk/cgi-bin/texfaq2html
minimal example: http://www.minimalbeispiel.de/mini-en.html
biblatex styles: http://biblatex.dominik-wassenhoven.de/?en

Lee Rudolph

Dec 13, 2011, 11:28:25 AM
Marc van Dongen <don...@cs.ucc.ie> writes:

>That's how LaTeX works; it's not supposed to hyphenate words at
>illegal
>positions. It has a reasonable idea of how to hyphenate most words

...for, as I learned in the last week or so, values
of "reasonable idea" and "most words" that result in
the hyphenations Got-tleib and Her-rmann (proper names
of course are bound to be a point of difficulty, but
I thought both of those were a bit bizarre as proposed
English hyphenations, though come to think of it the
presence of "rattled" in standard dictionaries *would*
lead to the first--but how the second slipped in, I have
no idea).

Lee Rudolph

Alex M.

Dec 13, 2011, 12:02:32 PM
Thank you for your answers.
I did not know that English words with dashes are not hyphenated. As I
write in German, I already use the "= dash instead of -.

Indeed, I also found a macro making - an active character to still
allow hyphenation:
http://stackoverflow.com/questions/2193307/how-to-get-latex-to-hyphenate-a-word-that-contains-a-dash
However, when there is a label/ref containing a dash, this runs into
problems...

According to the link given by Dominik, German words should not be
hyphenated near a dash. It would be ideal if TeX would preferentially
hyphenate at the dash instead of one single syllable next to it, e.g.,

... Monte-Carlo-
Simulation ...

instead of

... Monte-Carlo-Si-
mulation ...

Is there a way to enforce this?
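
One knob that bears on this, though it only expresses a preference and enforces nothing, is TeX's pair of line-breaking penalties: \hyphenpenalty is charged for a break at a pattern-found or \- hyphenation point, while \exhyphenpenalty is charged for a break directly after an explicit hyphen. A break at babel's "= shorthand happens at glue and is charged neither. The sketch below is not a tested solution; the values and the minipage width are assumptions chosen for illustration.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
\begin{minipage}{3cm}
% Assumed values, for illustration only.
\hyphenpenalty=2000   % make breaks inside a component word expensive
\exhyphenpenalty=0    % keep breaks after a plain explicit hyphen cheap
Eine Monte"=Carlo"=Simulation ist eine Monte"=Carlo"=Simulation.
\end{minipage}
\end{document}
```

Raising \hyphenpenalty discourages every pattern-based break in the paragraph, not only those next to a dash, so it is a blunt instrument.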

Alex

Peter Flynn

Dec 13, 2011, 5:12:40 PM
I think what the OP meant was in compound words like
"illegally-provided", where the presence of the binding
hyphen inhibits TeX from breaking in other perfectly
valid places (ille\-gally\-pro\-vi\-ded). It's a standing
problem with the algorithm used in TeX.
In the case of the suspended hyphen as in the second
hyphen in "Tiergesetzreform-Befürworter und -Gegner",
I don't have an example of the second word being
separately hyphenated. Can you provide a minimal
example in LaTeX?

///Peter

Peter Flynn

Dec 13, 2011, 5:14:17 PM
On 13/12/11 14:02, Dominik Waßenhoven wrote:
> .:|Alex M.|:. wrote:
>
>> just for interest: I would like to know why LaTeX does not hyphenate
>> words containing dashes on other positions than the dash. I can't see
>> any special reason for that...
>
> In English, words containing dashes should not be hyphenated, AFAIK.

If possible, yes. But a compound of two or more long words may need it,
and it is an error otherwise.

///Peter

Alex M.

Dec 14, 2011, 3:36:27 AM
On 13 Dez., 23:12, Peter Flynn <pe...@silmaril.ie> wrote:

> In the case of the suspended hyphen as in the second
> hyphen in "Tiergesetzreform-Befürworter und -Gegner",
> I don't have an example of the second word being
> separately hyphenated. Can you provide a minimal
> example in LaTeX?

Hello!
One minimal example would be:

\documentclass{article}
\usepackage{ngerman}
\begin{document}

\frame{\minipage{3.6cm}\hrulefill\par
Warenein- und -ausgang \\ % dash alone
Warenein"= und "=ausgang \\ % as expected
der Warenein"= und "=ausgang % dash again alone
\endminipage}

\end{document}


Alex

Ulrike Fischer

Dec 14, 2011, 4:53:49 AM
No. Imho this happens because it is simply technically a bit
difficult to suppress the line break. Perhaps this will change in
newer engines like luatex. You can surround the hyphen by \mbox, or
if you use T1-encoding you can use the other hyphen in such cases.
In d.c.t.t I just posted this example which redefines the "~ used in
ngerman:

\documentclass{article}
\usepackage[ansinew]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}

\shorthandon{"}
\makeatletter
\declare@shorthand{ngerman}{"~}{\char127}
\makeatother
\shorthandoff{"}

\begin{document}
Dies ist ein unsinniger Text, der länger als eine Zeile ist, weil er
eineiii -Worttrennung enhalten soll.

Dies ist ein unsinniger Text, der länger als eine Zeile ist, weil er
eineiii "~Worttrennung enhalten soll.


Dies ist ein unsinniger Text, der länger als eine Zeile ist, weil er
eineiii \mbox{-}Wort\-trennung enhalten soll.
\end{document}

--
Ulrike Fischer

Dominik Waßenhoven

Dec 14, 2011, 6:48:43 AM
Thanks for the clarification.

Dan Luecking

Dec 14, 2011, 2:28:46 PM
On Tue, 13 Dec 2011 16:28:25 +0000 (UTC), Lee Rudolph
<lrud...@panix.com> wrote:

>Marc van Dongen <don...@cs.ucc.ie> writes:
>
>>That's how LaTeX works; it's not supposed to hyphenate words at
>>illegal
>>positions. It has a reasonable idea of how to hyphenate most words
>
>...for, as I learned in the last week or so, values
>of "reasonable idea" and "most words" that result in
>the hyphenations Got-tleib and Her-rmann (proper names

You can't expect German names (or even English proper names)
to be correctly handled by an English hyphenation algorithm.

>of course are bound to be a point of difficulty, but
>I thought both of those were a bit bizarre as proposed
>English hyphenations, though come to think of it the
>presence of "rattled" in standard dictionaries *would*
>lead to the first--but how the second slipped in, I have
>no idea).

Because no English word exists that forbids this hyphenation, but
many, many exist that permit "rr" to be split.

The logic behind hyphenation patterns is to have short, simple
rules (e.g., hyphen.tex permits a hyphen inside almost any "rr")
and then to handle exceptions to each rule by looking at minimally
more context and forbidding a hyphen in that context (for example,
hyphen.tex forbids one within "rr" in the context "nerr", and
there are several more). This can in turn inhibit a small number
of allowed hyphenations, so those exceptions are handled
by looking at slightly more context again.

Clearly Herrmann is permitted because it matches the "rr" pattern
but not any of the inhibiting patterns. No such inhibiting pattern
exists for Herrmann, because no English word requires it.
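
For names the patterns get wrong, the usual remedy is an explicit exception list. A minimal sketch follows, assuming the intended spellings are "Herrmann" and "Gottlieb" and that the breaks shown are the ones wanted:

```latex
\documentclass{article}
% \hyphenation records exceptions for the language that is current when it
% is read (here the default language of the format). The listed break
% points override whatever the patterns would give for these words.
\hyphenation{Herr-mann Gott-lieb}
\begin{document}
% Check the result in the log file.
\showhyphens{Herrmann Gottlieb}
\end{document}
```

An exception entry lists all permitted break points of a word, so any position not marked with a hyphen in \hyphenation is forbidden for that word.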


Dan
To reply by email, change LookInSig to luecking

Peter Flynn

Dec 14, 2011, 4:44:29 PM
On 14/12/11 08:36, Alex M. wrote:
> On 13 Dez., 23:12, Peter Flynn<pe...@silmaril.ie> wrote:
>
>> In the case of the suspended hyphen as in the second
>> hyphen in "Tiergesetzreform-Befürworter und -Gegner",
>> I don't have an example of the second word being
>> separately hyphenated. Can you provide a minimal
>> example in LaTeX?
>
> Hello!
> One minimal example would be:
>
> \documentclass{article}
> \usepackage{ngerman}
> \begin{document}
>
> \frame{\minipage{3.6cm}\hrulefill\par
> Warenein- und -ausgang \\ % dash alone
> Warenein"= und "=ausgang \\ % as expected
> der Warenein"= und "=ausgang % dash again alone
> \endminipage}
>
> \end{document}

Thank you. This looks like a bug to me, but it should be reported by a
native German speaker.

///Peter

Robin Fairbairns

Dec 15, 2011, 6:25:00 AM
it's not clear to me that it's even soluble (other than by putting the
hyphen in a box with some part of the word ... i.e., manual
hyphenation).

may be possible in luatex, but until i'm up to speed with that, i can't
say for sure.
--
Robin Fairbairns, Cambridge
my address is @cl.cam.ac.uk, regardless of the header. sorry about that.

Peter Flynn

Dec 15, 2011, 5:12:05 PM
On 15/12/11 11:25, Robin Fairbairns wrote:
> Peter Flynn<pe...@silmaril.ie> writes:
>
>> On 14/12/11 08:36, Alex M. wrote:
>>> On 13 Dez., 23:12, Peter Flynn<pe...@silmaril.ie> wrote:
>>>
>>>> In the case of the suspended hyphen as in the second
>>>> hyphen in "Tiergesetzreform-Befürworter und -Gegner",
>>>> I don't have an example of the second word being
>>>> separately hyphenated. Can you provide a minimal
>>>> example in LaTeX?
>>>
>>> Hello!
>>> One minimal example would be:
>>>
>>> \documentclass{article}
>>> \usepackage{ngerman}
>>> \begin{document}
>>>
>>> \frame{\minipage{3.6cm}\hrulefill\par
>>> Warenein- und -ausgang \\ % dash alone
>>> Warenein"= und "=ausgang \\ % as expected
>>> der Warenein"= und "=ausgang % dash again alone
>>> \endminipage}
>>>
>>> \end{document}
>>
>> Thank you. This looks like a bug to me, but it should be reported by a
>> native German speaker.
>
> it's not clear to me that it's even soluble

I'd be surprised. The rule is very simple: never break after a hyphen
that is preceded by a space.

///Peter

Robin Fairbairns

Dec 15, 2011, 7:06:33 PM
Peter Flynn <pe...@silmaril.ie> writes:

> On 15/12/11 11:25, Robin Fairbairns wrote:
>> Peter Flynn<pe...@silmaril.ie> writes:
>>
>>> On 14/12/11 08:36, Alex M. wrote:
>>>> On 13 Dez., 23:12, Peter Flynn<pe...@silmaril.ie> wrote:
>>>>
>>>>> In the case of the suspended hyphen as in the second
>>>>> hyphen in "Tiergesetzreform-Befürworter und -Gegner",
>>>>> I don't have an example of the second word being
>>>>> separately hyphenated. Can you provide a minimal
>>>>> example in LaTeX?
>>>>
>>>> One minimal example would be:
>>>>
>>>> \documentclass{article}
>>>> \usepackage{ngerman}
>>>> \begin{document}
>>>>
>>>> \frame{\minipage{3.6cm}\hrulefill\par
>>>> Warenein- und -ausgang \\ % dash alone
>>>> Warenein"= und "=ausgang \\ % as expected
>>>> der Warenein"= und "=ausgang % dash again alone
>>>> \endminipage}
>>>>
>>>> \end{document}
>>>
>>> Thank you. This looks like a bug to me, but it should be reported by a
>>> native German speaker.
>>
>> it's not clear to me that it's even soluble
>
> I'd be surprised. The rule is very simple: never break after a hyphen
> that is preceded by a space.

and you achieve that how? i've toyed with all routes (that i can think
of) with standard tex, and i can't demonstrate that it's possible.
trivial modification of tex, but who's going to do that.

i wouldn't be surprised to learn that it can be made to work with
luatex, or some of those fancy facilities (that i don't yet understand
about) in xetex. but not tex.

Ulrike Fischer

Dec 16, 2011, 6:17:56 AM
Am Thu, 15 Dec 2011 22:12:05 +0000 schrieb Peter Flynn:


>> it's not clear to me that it's even soluble

> I'd be surprised. The rule is very simple: never break after a hyphen
> that is preceded by a space.

and preceded by an opening brace (-ausgang) or by a quote, or ...


--
Ulrike Fischer

Donald Arseneau

Dec 16, 2011, 4:42:51 PM
Both problems are "easily" soluble by using an alternate hyphen char,
as the T1 font encoding has provided since way back. There are support
files also, but they seem not to have had major uptake.

Three things are needed:

1) Set the hyphen character for the font to be the alternate hyphen.

2) Set the \lccode of the regular hyphen to itself (not 0)

3) Add some hyphenation patterns to the particular language definition.

Only item 3 sounds hard, but it was easy when I tried it for Portuguese
hyphenation (also making special duplicate hyphens). There is the file
hypht1.tex, meant to be added to any language's hyphenation patterns, but
I am shocked now to see that it is not included in MiKTeX or TeX Live.


... Oh wait ... it only works in this simplest way when breaks at explicit
hyphens produce two hyphen characters (as with Portuguese).
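
Items 1 and 2 of the list above can be sketched at document level as follows. This is an untested illustration, not a working solution: item 3 (extra patterns involving the hyphen) has to happen where the language's patterns are loaded and is not shown, and on e-TeX-based formats that save hyphenation codes together with the patterns, a document-level \lccode change may not affect hyphenation at all.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Item 1: let TeX insert T1 slot 127 (the alternate hyphen) at automatic
% break points of the current font, so that the ordinary "-" is no longer
% the font's hyphen character.
\hyphenchar\font=127
% Item 2: give the ordinary hyphen a nonzero \lccode so that it can count
% as part of a word when TeX looks for hyphenation candidates.
\lccode`\-=`\-
% Item 3 (patterns that mention the hyphen) is not shown here.
\showhyphens{zoning-facilitated}
\end{document}
```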


Donald Arseneau as...@triumf.ca

Peter Flynn

Dec 16, 2011, 4:43:01 PM
I rest my case :-)

///Peter

Dan

Dec 16, 2011, 4:52:33 PM
The better rule is: only allow the break within a "word"

TeX already has a rule for what is a "word" in its hyphenation
algorithm: Searching ahead from glue, skip whatsits, implicit
kerns and *characters with zero lccode*; then start a word with
the first character with nonzero lccode.

So, situations like
"-ausgang
<space>-ausgang
(-ausgang)
are all covered by a rule which is a minor addition to an already
implemented algorithm.


Dan

Aditya Mahajan

Dec 20, 2011, 6:16:51 PM
On Dec 15, 6:25 am, Robin Fairbairns <r...@cl.cam.ac.uk> wrote:
> it's not clear to me that it's even soluble (other than by putting the
> hyphen in a box with some part of the word ... i.e., manual
> hyphenation).
>
> may be possible in luatex, but until i'm up to speed with that, i can't
> say for sure.

For some time now, ConTeXt MkIV correctly hyphenates words with dashes
without any additional setup. For example


\mainlanguage[de]
\starttext
\showhyphens{Warenein-und-ausgang}
\stoptext

gives


languages > hyphenation > show: Waren[-||]ein-und-aus[-||]gang

where [-||] means possible hyphenation points.

Aditya

Guenter Milde

Jan 11, 2012, 4:02:05 PM
Does this also work for

\showhyphens{Warenein- und -ausgang}

(the short form for "Wareneingang und Warenausgang")?

Günter