Abbreviations

Rob Beezer

Mar 7, 2023, 10:59:27 PM
to prete...@googlegroups.com
We have thirteen empty elements, which generate abbreviations. They are "Latin"
abbreviations, e.g., AM, vs., i.e., etc. If used, they would promote consistency
and correctness (AM, not A.M., am, or a.m.).

Seven of these get a "special" period in LaTeX, which will not typeset as if it
were the end of a sentence.

The Chicago Manual of Style has about 35 pages on abbreviations (Chapter 15, 15e),
mostly lists of them (some are scientific units, which we have covered
pretty well). So I've certainly not been adding new ones, because our list would
never be complete.

I've stopped using these in my own writing, and have just been using a "nbsp"
element in places where I know it will make LaTeX do an acceptable job.

I suspect etc., i.e., and e.g. are the most commonly used.

Options:

1. Status quo. They are fine and not a problem. Keep supporting them.

2. Deprecate the thirteen empty elements, with the usual fallbacks,
precautions, and warnings. Make an empty element for an "abbreviation period"
which will do the right thing in LaTeX and other formats. (There is an internal
template right now which achieves this.) This is better than "nbsp", and easy enough to
search/replace once we learn to identify the different uses of a period.

3. Deprecate the thirteen empty elements, with the usual fallbacks,
precautions, and warnings. Advise "nbsp" for those who understand LaTeX and
want quality output.
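For options 2 and 3, one shape the "usual fallbacks" might take is a pre-processing pass that expands each deprecated empty element into plain text, so older sources keep building. This is a Python sketch, not the actual XSLT; the element names and expansions in `FALLBACK` are an assumed subset of the thirteen, and `expand_abbreviations` is a hypothetical helper. The fiddly part is XML mixed content: the expanded text has to be spliced into the parent's `text` or the previous sibling's `tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fallback table: an assumed subset of the thirteen empty
# elements and the plain text each one would expand to.
FALLBACK = {"etc": "etc.", "ie": "i.e.", "eg": "e.g.", "vs": "vs."}


def expand_abbreviations(root: ET.Element) -> ET.Element:
    """Replace each empty abbreviation element with its plain text,
    splicing that text into the surrounding mixed content."""
    # Snapshot the tree first, since children are removed as we go.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in FALLBACK or len(child) > 0:
                continue
            text = FALLBACK[child.tag] + (child.tail or "")
            siblings = list(parent)
            idx = siblings.index(child)
            if idx == 0:
                # No earlier sibling: append to the parent's leading text.
                parent.text = (parent.text or "") + text
            else:
                # Otherwise append to the tail of the previous sibling.
                prev = siblings[idx - 1]
                prev.tail = (prev.tail or "") + text
            parent.remove(child)
    return root


para = ET.fromstring("<p>Fruit, <ie/>, apples, pears, <etc/> are <em>good</em>.</p>")
print(ET.tostring(expand_abbreviations(para), encoding="unicode"))
# prints: <p>Fruit, i.e., apples, pears, etc. are <em>good</em>.</p>
```

Once expanded, the abbreviations are ordinary text, so any later LaTeX-side handling of periods applies to them exactly as it would to text an author typed.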

Comments?

Thanks,
Rob

David W. Farmer

Mar 7, 2023, 11:32:23 PM
to prete...@googlegroups.com

4. Recognize that this is yet another case where in the long run
people will write in a form of pre-PreTeXt which is transformed
into official PreTeXt.

So the underlying question is: what should be in that official
PreTeXt which is machine generated from what the author actually
writes? That answer may suggest which of the first three options
is best.

David
> --
> You received this message because you are subscribed to the Google Groups
> "PreTeXt development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pretext-dev...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pretext-dev/MTAwMDAxMS5iZWV6ZXI.1678247966%40quikprotect.
>
>

Rob Beezer

Mar 7, 2023, 11:49:54 PM
to prete...@googlegroups.com
5. The LaTeX conversion does seven interceptions of the form (spaces are significant):

" etc. " -> " etc.\@ "

And there is low overhead to adding more, but not the hundreds in CMOS.

I guess forms at the start of sentences would need a separate implementation. And no, XSLT 1.0 gives me no regex support.
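A rough Python model of those interceptions, to show their effect. The abbreviation list is an assumed sample (the message does not name all seven), `intercept` is a hypothetical name, and plain string replacement stands in for the XSLT, which likewise has no regex to lean on.

```python
# Assumed sample of the intercepted abbreviations; the real LaTeX
# conversion handles seven, which are not all named in this thread.
ABBREVIATIONS = ["etc.", "i.e.", "e.g.", "vs."]


def intercept(text: str) -> str:
    r"""Rewrite " abbr. " as " abbr.\@ " for each listed abbreviation.

    In LaTeX, \@ right after the period sets the space factor back to
    its normal value, so the following space is an ordinary interword
    space rather than a sentence-ending one. Spaces are significant: an
    abbreviation with no space before it (say, at the start of the text)
    or followed by a comma is left alone.
    """
    for abbr in ABBREVIATIONS:
        text = text.replace(" " + abbr + " ", " " + abbr + "\\@ ")
    return text


print(intercept("Brown vs. Board of Education of Topeka"))
# prints: Brown vs.\@ Board of Education of Topeka
```

One side effect: an "etc." that genuinely ends a sentence also loses its sentence space, which is the same ambiguity the rest of this thread is wrestling with.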

Rob

Rob Beezer

Mar 8, 2023, 12:19:12 AM
to prete...@googlegroups.com
6. Mr. Ms. Dr. Ph.D. ...

Not including D.W. Farmer. ;-)

Alex Jordan

Mar 8, 2023, 1:40:50 AM
to prete...@googlegroups.com
I'll confine my thoughts to English relevancy, but I note that the
logic and complications are not the same in all other languages.

Three kinds of spacing:
1. between sentences
2. between atoms of a sentence and line breaking is OK
3. between atoms of a sentence and line breaking is not OK

What signifies 1? It's not always a period. We have these:
. ? !
LaTeX recognizes all of them as inducing space flavor 1. (Even in
situations like this one!) I mention that just to observe that these three
characters need not immediately precede a space flavor 1 in all instances
where space flavor 1 is warranted.

Sometimes, one of those three characters is *not* sentence-ending and
should not make space flavor 1. This thread is about the period used
in an abbreviation, which is the most common situation where that
happens. But there are rarer situations, like when I ate an Oh
Henry! candy bar the other day, and less rare ones, like when Rob
said "Knock it off!" so I did.

So I wanted to ask about whatever the "official PreTeXt" ends up
being, with regard to the new element proposed in option 2. Would it
be:
* intentionally designed to support abbreviations that end with a period?
* or more abstract: usable for "?" and "!" as well, perhaps with an
option to declare flavor 2 (I can't think of an example) or flavor 3?
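To make the three flavors concrete, here is a small Python sketch that maps each one to a LaTeX spelling. The `Space` enum and `to_latex` are made up for illustration and are not a proposed PreTeXt element; the sketch models the "more abstract" reading of the question, where the author states the kind of space directly.

```python
from enum import Enum


class Space(Enum):
    """The three kinds of space between pieces of English text."""
    SENTENCE = 1     # between sentences
    BREAKABLE = 2    # within a sentence, a line break is OK
    UNBREAKABLE = 3  # within a sentence, a line break is not OK


# Assumed LaTeX spellings for each kind of space. A plain space leaves
# the choice to LaTeX, which widens the space after . ? ! (unless the
# mark follows an uppercase letter). "\ " (control space) is always an
# ordinary interword space, and "~" is an ordinary space with no break.
LATEX = {
    Space.SENTENCE: " ",
    Space.BREAKABLE: "\\ ",
    Space.UNBREAKABLE: "~",
}


def to_latex(pieces):
    """Join text pieces separated by explicit Space markers.

    Unmarked spaces inside a piece are passed through unchanged.
    """
    return "".join(LATEX[p] if isinstance(p, Space) else p for p in pieces)


print(to_latex(["I ate an Oh", Space.UNBREAKABLE, "Henry!",
                Space.BREAKABLE, "candy bar.", Space.SENTENCE, "Yum."]))
# prints: I ate an Oh~Henry!\ candy bar. Yum.
```

Under this reading, flavor 1 needs no markup in the common case, which is why the question of an element mostly concerns flavors 2 and 3.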

Alex Jordan

Mar 8, 2023, 1:43:26 AM
to prete...@googlegroups.com
Amusing that Gmail broke "Oh Henry!". I should have typed an nbsp.

Rob Beezer

Mar 8, 2023, 2:49:28 AM
to prete...@googlegroups.com
Dear Alex,

You are right, of course, about other languages and a couple of other punctuation marks.

Right now I'm really just concerned about periods in "common" abbreviations giving rise to "big" spaces in LaTeX output.

If we could define the end of a sentence, and perhaps mark it up as such in the pre-processor, then all the other instances could be in a different category. Or something close to that.
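If a pre-processor did mark sentences, the LaTeX side becomes mechanical: any . ? ! followed by a space *inside* a sentence cannot be sentence-ending. A Python sketch, assuming the sentences arrive already segmented as a list of strings (the hard part, which this does not attempt) and that quotes are plain ASCII characters; `sentences_to_latex` is a hypothetical name.

```python
import re

# Punctuation that LaTeX treats as sentence-ending, optionally followed
# by closing parentheses or quotes, then a space.
INTERIOR_END = re.compile(r"([.?!][)\"']*) ")


def sentences_to_latex(sentences):
    """Render pre-segmented sentences as one LaTeX paragraph.

    Inside a sentence, any . ? ! followed by a space cannot end the
    sentence, so \\@ is inserted to force an ordinary interword space.
    Sentences are then joined with plain spaces, leaving LaTeX's normal
    end-of-sentence spacing in place between them.
    """
    fixed = [INTERIOR_END.sub(r"\1\\@ ", s.strip()) for s in sentences]
    return " ".join(fixed)


print(sentences_to_latex([
    "Rob said \"Knock it off!\" so I did.",
    "Mr. Brown agreed.",
]))
# prints: Rob said "Knock it off!"\@ so I did. Mr.\@ Brown agreed.
```

No abbreviation list is needed at all here, which is the appeal of knowing where sentences end.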

David W. Farmer

Mar 8, 2023, 8:39:20 AM
to prete...@googlegroups.com

What I think we are trying to understand is exactly what official PreTeXt
*has* to include.

<sentence>Let's assume PreTeXt acquires a "sentence" element.</sentence>
<sentence>A sentence element is a better way to handle end-of-sentence punctuation,
right?</sentence><sentence>(Apologies if "element" is the wrong term:
you know what I mean.)</sentence>

Let's assume someone will build a system which allows authors to write
a document which is transformed into official PreTeXt markup. The system
causes the transformed source to be converted to the desired output
format (via the CLI or pretext.py or xsltproc -- it does not matter).
Also assume that the system can (must?) mark the "sentence"s in the
document.

The system does not show the "sentence" tags to the author: those are
in a hidden version that is created from the author's source and then
sent for processing.

Maybe the system also recognizes common abbreviations: it is plausible
that a system which knows where the sentences are can identify common
abbreviations, e.g., etc.

Given all that, if there remains some other abbreviation or circumstance
which needs special treatment, then that needs to be in official PreTeXt.
Otherwise, it just needs to be documented how to use "nbsp" or some
other primitive element to get the correct output.

Is there some difference between converting to LaTeX and to HTML such
that naive markup (meaning inserting "nbsp" instead of having a
special tag) is not sufficient in both cases?

This thought experiment can determine if certain tags are required.
Deprecating other tags on the basis of a mythical system is another
matter.

Sean Fitzpatrick

Mar 8, 2023, 9:43:19 AM
to PreTeXt development
If there were a plan to deprecate the abbreviation elements in favour of an nbsp element, I would hope that the documentation would also gain an author-education page for those of us who have never figured out the subtleties of using it (perhaps because we never had to?).

Jason Siefken

Mar 8, 2023, 10:45:03 AM
to prete...@googlegroups.com, David W. Farmer
Since sentences are the most common case, I would opt for having a <word> tag instead. The name `word` is loose: what it really means is "there are no sentence boundaries inside this tag". So you could write <word>e.g.</word>, or <word>Oh Henry! Chocolate Bar</word>. It should be possible to translate this directly into a LaTeX macro that turns off sentence spacing (though don't ask me how to write such a macro right now...).

Rob Beezer

Mar 8, 2023, 11:46:07 PM
to prete...@googlegroups.com
Thanks for all the discussion. I think a lot about locating and marking
sentences (as pre-processing), so Jason's "word" suggestion was new to me. I'll
need to digest that one.

Maybe a corollary to the flavor of David's comments: I think a lot about making
what an author actually types as natural as possible for the "mixed-content"
bits---the actual content of paragraphs, titles, captions, etc. It continues to
drive me nuts that TeX expects authors to routinely write "Mr.\ " or "Ms.~". We
should do better. <etc/> is not better, nor is "Mr.<nbsp/>". An author should
just write "Brown vs. Board of Education of Topeka" and be done with it.

This discussion is really just about working around a shortcoming of TeX. I
think I can catch the same exceptions we have now, without empty elements, for
LaTeX only, and if I'm right, then produce the exact same LaTeX file we get now.
All without the author's knowledge or involvement, with an opportunity for easy
expansion later.

Then the current documentation can become 100% author-education.

Rob