[Meta] TOC, en-US and en-GB (again?)

David

Nov 26, 2025, 8:25:14 AM
to Standard Ebooks
I know that the "Producing" guide specifies:

[quote]
Note that we *don’t* change the language for the metadata or boilerplate files, like `colophon.xhtml`, `imprint.xhtml`, `toc.xhtml`, or `titlepage.xhtml`. Those must always be in American spelling, so they’ll always have the en-US language tag.
[/quote]

I'm wondering about `toc.xhtml`: there are, of course, en-GB books with chapter titles in distinctive en-GB spellings, spelled as such in the ToC: e.g., example 1; example 2; example 3. I'm guessing there are more.

How to square this with the notion that these "must always be in American spelling"? 

(This could conceivably affect other meta-files, but I think Shakespeare's *Love's Labour's Lost* is the only *title* in the corpus with an en-GB spelling. I didn't look hard, though!)

I'm fairly confident this must have been discussed before but I can't find that discussion if it did take place, although I did find my related question about `content.opf` from back in April. (And/or Google [oh the irony] Groups' poor search tool is letting me down....)

David / Fife, UK

Alex Cabal

Nov 29, 2025, 3:07:45 PM
to standar...@googlegroups.com
I think this and your question re. "honorable" in a book title are related.

We could update build-toc to use content.opf's language when creating
the ToC. In most cases I think that would be fine since the only word in
the ToC that isn't from the book text is "Landmarks".
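
To make the idea concrete, here is a minimal sketch of reading the declared language out of content.opf so that a ToC generator could reuse it. It is not the actual build-toc code: the function name `opf_language`, the sample OPF, and the fallback to en-US when no language is declared are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:language> in an OPF package document.
DC_NS = "http://purl.org/dc/elements/1.1/"


def opf_language(opf_xml: str, default: str = "en-US") -> str:
    """Return the first non-empty <dc:language> value, or `default` if none is set.

    Hypothetical helper: the fallback to en-US is an assumption, not documented
    build-toc behavior.
    """
    root = ET.fromstring(opf_xml)
    for element in root.iter(f"{{{DC_NS}}}language"):
        if element.text and element.text.strip():
            return element.text.strip()
    return default


# Made-up, heavily trimmed package document for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="3.0">
  <metadata>
    <dc:language>en-GB</dc:language>
  </metadata>
</package>"""

if __name__ == "__main__":
    print(opf_language(SAMPLE_OPF))
```

A ToC generator could then write the returned value into the `xml:lang` attribute of the generated file instead of hard-coding en-US.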

But then we get to the question of the title. I would prefer titles to
be en-US because our website is in en-US, and I feel like the title is
metadata and not *really* part of the text per se. (Though it can appear
in the titlepage, half title, etc.)

Historically, books that were published in both spellings usually had
en-US spelling for the US edition (or often an entirely new title
altogether). However I don't have an example of this from our corpus off
the top of my head.

Of course, if a book was published in the US with an en-US title, then
its body text was also updated to be en-US - but that's not something
we're going to do for our ebooks.

So, I think the approach here would be to use en-US for titles, and
update build-toc to use the content.opf language in the ToC; and just
ignore the small minority of cases where the title spelling doesn't
match the ebook's declared spelling.

This solves the problem for the vast majority of books whose title isn't
affected by spelling. For a very small percentage of books, there will
be an inconsistency with the title spelling and its internal text
spelling, as well as the declared language of the ToC and how the title
appears in the ToC. However, this is no worse than the current state of
affairs, in which the ToC is often inconsistent anyway; very few people
will notice the rare case of a title spelling being inconsistent with
internal spelling; and we can add the alternate spelling in the metadata
for searching.

On 11/26/25 7:25 AM, David wrote:
> I know that the "Producing" guide specifies:
>
> [quote]
> Note that we *don’t* change the language for the metadata or boilerplate
> files, like `colophon.xhtml`, `imprint.xhtml`, `toc.xhtml`, or
> `titlepage.xhtml`. Those must always be in American spelling, so they’ll
> always have the en-US language tag.
> [/quote]
>
> I'm wondering about `toc.xhtml`: there are, of course, en-GB books with
> chapter titles in distinctive en-GB spellings, spelled as such in the
> ToC: e.g., example 1 <https://github.com/standardebooks/l-m-
> montgomery_anne-of-avonlea/blob/
> b30a616d3042689bdc20ece7e9e3af7da5942cd1/src/epub/toc.xhtml#L50>;
> example 2 <https://github.com/standardebooks/edwin-a-abbott_flatland/
> blob/9e7c4503b7374441285b94e88435493fb0ff9a34/src/epub/toc.xhtml#L53>;
> example 3 <https://github.com/standardebooks/anthony-hope_the-prisoner-
> of-zenda/blob/b9ef70f5419262f5fc53634a4abae69a529afae5/src/epub/
> toc.xhtml#L20>. I'm guessing there are more.
>
> How to square this with the notion that these "must always be in
> American spelling"?
>
> (This could conceivably affect other meta-files, but I think
> Shakespeare's *Love's Labour's Lost* <https://standardebooks.org/ebooks/
> william-shakespeare/loves-labours-lost> is the only *title* in the
> corpus with an en-GB spelling. I didn't look hard, though!)
>
> I'm fairly confident this must have been discussed before but I can't
> find that discussion if it did take place, although I did find my
> related question about `content.opf` from back in April. (And/or Google
> [oh the irony] Groups' poor search tool is letting me down....)
>
> David / Fife, UK
>
> --
> You received this message because you are subscribed to the Google
> Groups "Standard Ebooks" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to standardebook...@googlegroups.com
> <mailto:standardebook...@googlegroups.com>.
> To view this discussion visit https://groups.google.com/d/msgid/
> standardebooks/94be31b0-a67a-47a5-aac9-7707d584d3d4n%40googlegroups.com
> <https://groups.google.com/d/msgid/standardebooks/94be31b0-a67a-47a5-
> aac9-7707d584d3d4n%40googlegroups.com?utm_medium=email&utm_source=footer>.

David

Nov 30, 2025, 5:29:35 AM
to Standard Ebooks
Thanks for such a full and thoughtful response, Alex. The main take-away ("use en-US for titles, and update build-toc to use the content.opf language in the ToC") makes sense to me.

As it happens, I know some real-world cases where US publishers used en-GB for the title and en-US for the content but ... these are definitely edge cases! And, as you note, this affects only a small number of titles in the corpus in any case.

Alexander Keane

Nov 30, 2025, 3:17:52 PM
to standar...@googlegroups.com
The last five books I worked on didn't come straight from Project Gutenberg, so they may differ from those where body.xhtml is imported directly using the tool. But per the step-by-step guide, content.opf gets completed after the ToC is built, so there might not be a value in content.opf yet when build-toc is run.

Is toc.xhtml one of the files that lint checks against content.opf for a match?
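
For illustration only, a check of this kind could be sketched as below. This is not how lint is known to work: the function name `toc_language_mismatch` is hypothetical, and the sketch assumes that toc.xhtml declares its language as `xml:lang` on the root element and that content.opf declares it in `<dc:language>`. It stays silent when either value is missing, which covers the case where content.opf has not been filled in yet.

```python
import xml.etree.ElementTree as ET

# Clark-notation names as ElementTree exposes them after parsing.
DC_LANGUAGE = "{http://purl.org/dc/elements/1.1/}language"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def toc_language_mismatch(opf_xml: str, toc_xhtml: str):
    """Return (opf_lang, toc_lang) if both are declared and differ, else None.

    Hypothetical check, not part of any existing tool.
    """
    opf_element = ET.fromstring(opf_xml).find(f".//{DC_LANGUAGE}")
    opf_lang = ""
    if opf_element is not None and opf_element.text:
        opf_lang = opf_element.text.strip()

    # Assumption: the language tag lives on the root <html> element.
    toc_lang = ET.fromstring(toc_xhtml).get(XML_LANG, "").strip()

    if opf_lang and toc_lang and opf_lang != toc_lang:
        return (opf_lang, toc_lang)
    return None


# Made-up minimal inputs for illustration only.
OPF_GB = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata><dc:language>en-GB</dc:language></metadata>
</package>"""
OPF_EMPTY = '<package xmlns="http://www.idpf.org/2007/opf"><metadata/></package>'
TOC_US = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"><head/><body/></html>'
TOC_GB = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB"><head/><body/></html>'

if __name__ == "__main__":
    print(toc_language_mismatch(OPF_GB, TOC_US))
```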

Erin

Nov 30, 2025, 3:42:19 PM
to standar...@googlegroups.com