Hyphenation in mixed math-text word

gedeoned...@gmail.com

Mar 13, 2022, 4:01:44 AM
to PreTeXt development
Hi devs,
It is well known that words like \(F\)-linear are not hyphenated by TeX, and this can lead to an overfull line in the output. When you are writing a (La)TeX document, the typical approach is to check the output and, when an overfull line arises, fix that single word, either by inserting hyphenation points in the word "linear" with \- or (better) by adding a zero-width space after the dash

\(F\)-\hspace{0pt}linear

or (even better) by using a zero-width space together with \nobreakdash- from amsmath

\(F\)\nobreakdash-\hspace{0pt}linear

(the last solution avoids hyphenation just after the dash).

Of course, PreTeXt authors shouldn't modify the LaTeX file once it is generated, but maybe the conversion PreTeXt->LaTeX->PDF could take this into account.

If the source contains something like

<m>math</m>-word

then the conversion to LaTeX could produce

\(math\)\nobreakdash-\hspace{0pt}word

Is this possible/reasonable/desirable? Maybe it has already been considered and discarded for good reasons (I searched the forums but didn't find anything about this).

Cheers,
Valerio
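
A minimal sketch of the proposed rewrite, written in Python as a plain string substitution over PreTeXt source. This is only an illustration under stated assumptions: PreTeXt's actual converters are XSL stylesheets that walk the XML tree, and the function name `pretext_math_to_latex` and its regular expression are hypothetical. Each `<m>` becomes `\(...\)`, and when the math is immediately followed by a hyphen and a word character, the hyphen becomes `\nobreakdash-\hspace{0pt}`.

```python
import re

# Hypothetical pattern: an <m> element, optionally followed by a hyphen
# that is itself followed by a word character (the "<m>F</m>-linear" case).
MATH_THEN_HYPHEN = re.compile(r"<m>(.*?)</m>(-(?=\w))?", re.DOTALL)


def pretext_math_to_latex(source: str) -> str:
    """Convert <m>...</m> to \\(...\\), protecting a math-hyphen-word compound."""

    def replace(match):
        latex = r"\(" + match.group(1) + r"\)"
        if match.group(2):
            # No break right after the dash, but the zero-width space lets TeX
            # hyphenate the following word normally (amsmath \nobreakdash).
            latex += r"\nobreakdash-\hspace{0pt}"
        return latex

    return MATH_THEN_HYPHEN.sub(replace, source)


print(pretext_math_to_latex("an <m>F</m>-linear map and <m>x</m>."))
# prints: an \(F\)\nobreakdash-\hspace{0pt}linear map and \(x\).
```

On the sample sentence, the `<m>F</m>-linear` compound gets the amsmath treatment, while the trailing `<m>x</m>.` is left as ordinary inline math.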

Sean Fitzpatrick

Mar 13, 2022, 11:08:52 AM
to PreTeXt development
I'll probably get in trouble with Rob and David for saying this, but it's fine to edit your LaTeX.
(Perhaps it's more correct to say authors shouldn't *have* to edit their LaTeX source.)

This sounds like something that could be a general improvement worth implementing.

But when I've got a few tweaks I want to implement in my LaTeX source, I generally ask myself, which is easier:
- implementing the tweak in my LaTeX style sheet (custom xsl)
- running a quick find/replace on the LaTeX source output

Often it's the latter.


David Farmer

Mar 13, 2022, 11:53:54 AM
to PreTeXt development

This is a known problem, and it is even worse in the HTML
because if the sentence ends

... therefore <m>x = 1</m>.

the period at the end of the sentence can end up on the next
line.

Alex Jordan is working on this, which means that he has code
which can detect when math is directly adjacent to non-math.
And he has a way to fix it in HTML.

So, it is just a matter of implementing a similar solution
in LaTeX. I suspect the analogue is wrapping the string in an
\hbox or something else that does not allow line breaks (but that
is just me speculating).

Note that it is always wrong to hand-edit the HTML or LaTeX.
But in rare cases you are stuck doing that because there are
things which are not yet properly implemented in PreTeXt -- so two
wrongs make a right.
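
A sketch of the hbox idea, again assuming a regex pass over PreTeXt source rather than the XSL tree walk PreTeXt actually uses (the name `wrap_math_runs` is hypothetical): each `<m>` is taken together with the non-space text touching it on either side, and if there is any such text, the whole run is glued into an `\hbox`, so neither a trailing period nor a leading word can be separated from the math.

```python
import re

# Hypothetical pattern: the non-space text glued to the left of an <m>
# element, the math itself, and the non-space text glued to its right.
# [^\s<>] stops each side at neighbouring tags.
MATH_RUN = re.compile(r"([^\s<>]*)<m>(.*?)</m>([^\s<>]*)", re.DOTALL)


def wrap_math_runs(paragraph: str) -> str:
    """Turn <m> into \\(...\\) and box any math that touches adjacent text."""

    def replace(match):
        before, math, after = match.groups()
        run = before + r"\(" + math + r"\)" + after
        if before or after:
            # \hbox forbids a line break anywhere inside the run.
            return r"\hbox{" + run + "}"
        return run  # math surrounded by spaces needs no protection

    return MATH_RUN.sub(replace, paragraph)


print(wrap_math_runs("therefore <m>x = 1</m>."))
# prints: therefore \hbox{\(x = 1\).}
```

One trade-off, raised later in the thread: an `\hbox` also blocks hyphenation inside the run, so a long run such as `\(F\)-linear` can still overflow the line.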



Rob Beezer

Mar 13, 2022, 12:04:24 PM
to prete...@googlegroups.com
This one has been delayed, for no particular reason.

https://github.com/PreTeXtBook/pretext/pull/1596

I haven't looked, but perhaps detection can be abstracted into -common if it isn't already, with particular remedies in -html and in -latex.

Michael Doob

Mar 13, 2022, 12:10:17 PM
to PreTeXt development
Sorry to keep repeating this: always remember that TeX is a macro language. If you don't, you will get
bitten eventually.

If you put anything in \hbox{...}, then once the macro completes, the contents are immutable. That includes
hyphenation. So \hbox{\(F\)-linear} will not split across lines in LaTeX output. For HTML you could
use <m>F\text{-linear}</m> in your ptx coding. I believe that MathJax will do the right thing.

David Farmer

Mar 13, 2022, 12:22:29 PM
to PreTeXt development

In the HTML we wrap in a span with a special class. That performs the
same function as the hbox -- thank you for confirming that hbox does
what we want.


Michael Doob

Mar 13, 2022, 1:08:09 PM
to PreTeXt development
Actually, I simplified (fibbed), since this was being applied to a short expression. There is, in fact,
an \unhbox (and \unvbox) command which could change things if necessary. The most obvious
example is the output routine that takes the last paragraph (vbox) on the page and opens it up
again with \unvbox to \vsplit the paragraph into the part at the bottom of the page and the part
that migrates to the next page. Even this is simplified, since floats change everything.

The current output routine for LaTeX is largely a mystery to me. I had a real-life example of a math paper
where adding some text reduced the number of pages by one. Nonetheless, I think that short
expressions of a few words are safe to put into an hbox and can be expected to remain locked for the duration.

Alex Jordan

Mar 13, 2022, 1:29:30 PM
to prete...@googlegroups.com
This PR
https://github.com/PreTeXtBook/pretext/pull/1596

abstractly breaks up text in such a way that you can do things to the
math-adjacent text (text adjacent to the math with no space
characters). PreTeXt source like

foo bar<m>x</m>baz

will give the opportunity to insert things like

foo [bar{x}baz]

Each of [, {, }, and ] is just a placeholder for whatever you can
imagine. In HTML, the brackets are a span preventing line breaking and
the braces are \( and \). Currently in LaTeX, the brackets are empty.

In LaTeX, the closing bracket could include "\nobreakdash". But that
doesn't quite fit what you describe, because using the example, what
you describe would change "baz" in some way. But there is an abstract
`$text-after` that is part of the processing. It could be examined for
its first character being a hyphen and then do something special.

All subject to exploration of possible undesired side effects, of course.
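
A sketch of the bracket scheme, assuming detection has already split each run into text-before, math and text-after (mirroring the `$text-after` mentioned above). The function `fill_brackets`, its `target` values and the CSS class `nowrap` are illustrative assumptions; the thread only says that HTML uses a span with a special class. In LaTeX the brackets stay empty, and text-after is checked for a leading hyphen, the special case suggested for Valerio's `\nobreakdash` request.

```python
def fill_brackets(before: str, math: str, after: str, target: str) -> str:
    """Render one math-adjacent run "before{math}after" for a given target.

    Hypothetical sketch of the [ { } ] placeholder idea: [ and ] wrap the
    whole run, { and } wrap the math.
    """
    if target == "html":
        # [ ] -> a span that must not be line-broken (class name assumed);
        # { } -> MathJax inline delimiters.
        return rf'<span class="nowrap">{before}\({math}\){after}</span>'
    if target == "latex":
        # [ ] stay empty; inspect the text after the math instead.
        if after.startswith("-") and len(after) > 1:
            # No break after the dash, normal hyphenation in the word.
            after = r"\nobreakdash-\hspace{0pt}" + after[1:]
        return rf"{before}\({math}\){after}"
    raise ValueError(f"unknown target: {target!r}")


print(fill_brackets("", "F", "-linear", "latex"))
# prints: \(F\)\nobreakdash-\hspace{0pt}linear
```

Keeping detection in one place and only swapping the fill per target matches the split between -common, -html and -latex suggested earlier in the thread.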

Rob Beezer

Mar 13, 2022, 6:06:27 PM
to prete...@googlegroups.com
Thanks, Alex. I've moved this one up in the queue (but it'll be a few days
still...)

gedeoned...@gmail.com

Mar 13, 2022, 7:28:41 PM
to PreTeXt development
Thank you, everybody, for all the answers (and yes, I sometimes edit the LaTeX output).

However, I think we have two different issues in the HTML and LaTeX output (I probably didn't describe
the LaTeX problem properly in my previous message).

In HTML you don't want \(F\)-linear to be broken like this

\(F\)
-linear

or similar problems: for example, if math is followed by a textual parenthesis like this

(we will prove later that \(1+1=2\))

then sometimes the closing parenthesis goes onto the next line.

In LaTeX, the word "linear" in \(F\)-linear is not hyphenated even if it occurs at the end of the line,
and this can result in an overfull line. So my goal is to hyphenate the word
"linear" like this when it is necessary

\(F\)-line-
ar

but to avoid a break like this

\(F\)-
linear

Putting everything inside an hbox keeps \(F\)-linear on the same line, and this could cause an overfull line.

However, this code (borrowed from the amsmath documentation)

\(F\)\nobreakdash-\hspace{0pt}linear

does exactly what I mean. I quote from the amsmath documentation:

"The last example shows how to prohibit a line break after the hyphen but allow
normal hyphenation in the following word. (It suffices to add a zero-width space
after the hyphen.)"

Bye,
Valerio

Rob Beezer

Jul 21, 2022, 7:29:45 AM
to prete...@googlegroups.com
Mostly for Alex. Davide Cervone was showing me an experiment he was running
with MathJax, related to the line-breaks that we have been trying to avoid on
either side of inline math in paragraphs. He was in some sort of test
environment he runs, but was using the vanilla Inspector.

If he unchecked

display: inline-block

on the top-most/inner-most set of rules, then trailing text was behaving, while
leading text was behaving in Firefox but not in Chrome (or the other way around?).
He said that rule was there so you could put borders around the math, etc.,
which sounded like things we are not doing. A replacement would be

display: inline !important

Maybe you can experiment? I know this is a bit incomplete (which element was he
playing with?) but perhaps we can get Davide to provide more detail.

Rob

Rob Beezer

Jul 21, 2022, 10:18:49 AM
to prete...@googlegroups.com
More from Davide. An upcoming MathJax update (late August?) will have improved
line-breaking. So this could become moot.

Rob

Rob Beezer

Jul 23, 2022, 2:47:24 PM
to prete...@googlegroups.com
And more still from Davide Cervone.

He's got a very promising (and simple) fix going right now, so we'll keep an eye
out for that in the next update.

Rob