Toolset 2.8.0 released, manual 1.8.3 released


Alex Cabal

Jul 11, 2025, 10:40:27 AM
to Standard Ebooks
Version 2.8.0 of the toolset is now available. Upgrade with `pipx
upgrade standardebooks`.

Version 1.8.3 of the manual is also now available.

Toolset updates include many new lint checks and improvements.

There are two major changes to how ebooks are structured:

1. The SE identifier (the contents of the `<dc:identifier>` element in
`content.opf`) no longer has a leading `url:`. It is now just the plain
SE URL. Thanks to brendanny

2. The colophon now requires `<time>` semantics for years. `se
semanticate` will attempt to add them for you, but due to quirks in how
the HTML `<time>` element works (e.g., no BC years, special attribute
values for years < 1000, etc.) it might not always be possible to do
automatically. Thanks to Robin Whittleton
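The `<time>` quirks mentioned in item 2 can be sketched roughly as follows. This is only an illustration, not the toolset's implementation: the function name `year_to_time_markup` is hypothetical, and the exact markup `se semanticate` emits may differ. It relies on the HTML rule that a valid year string has at least four digits and is greater than zero.

```python
def year_to_time_markup(year: int) -> str:
    """Wrap a colophon year in an HTML <time> element (illustrative sketch only)."""
    if year < 1:
        # HTML year strings must be greater than zero, so BC years
        # cannot be expressed and need manual handling.
        raise ValueError(f"Year {year} cannot be represented in <time>")

    if year < 1000:
        # Fewer than four digits: the machine-readable value must be
        # zero-padded and supplied via the datetime attribute.
        return f'<time datetime="{year:04d}">{year}</time>'

    # Four or more digits: the element's text is already a valid year string.
    return f"<time>{year}</time>"
```

For example, `year_to_time_markup(1912)` gives `<time>1912</time>`, while `year_to_time_markup(800)` gives `<time datetime="0800">800</time>`.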

Full changelog:

# 2.8.0

## General

- Correctly format released/modified timestamps without microseconds

- Colophons now use `<time>` markup. Thanks to Robin Whittleton

- SE identifiers no longer include leading `url:`. Thanks to brendanny

## se build

- Cleans more characters disallowed in filenames, when generating a
filename from `<dc:identifier>`. Thanks to kewlar

## se build-ids

- Remove stray debugging `print()` statement

- Add completions for `--no-endnotes` argument

## se extract-ebook

- Update kindleunpack source. Thanks to Vince Rice

## se lint

- Improve s-074

- Improve m-079

- Improve t-019

- Add y-034, Possible typo, period embedded in word

- Improve m-079

- Add exception to t-007

- Add m-083, title-type element without similar siblings

- Improve s-076. Thanks to Vince Rice

- Improve t-077

- Add s-104, headings should be either title or ordinal, not both.
Thanks to Robin Whittleton

- Improve t-063

- Remove m-066, replace m-008 with general LoC URI check

- Restructure and merge various malformed URL checks

- Add m-066, subject identifiers must be IDs and not URLs

- Add m-084, URL metadata element that is not a complete URL

- Convert some regex-based tests to xpath

- Fix logic in m-064

- Improve m-056 and error case when long description is invalid HTML

- Output matches for s-105

- Add `completed` to m-081 check. Thanks to Robin Whittleton

- Add s-106, proper names in colophon must be wrapped in `<a>` or `<b>`

- Add s-107, incorrect string for anonymous contributor

## se modernize-spelling

- Fix broken regex

- Various additions

## se recompose-epub

- Add `--image-files` flag

## se semanticate

- Try to fix incorrectly-formatted attributes before parsing DOM

- Add some items and improve regexes

- Fix regex when replacing inches

- Semanticate colophon, and add `<time>` to colophon years

## se titlecase

- Lowercase `of` if preceded by an initialism

## se typogrify

- Add en dashes to ranges of roman numerals

- Don't typogrify `<dc:identifier>`. Thanks to brendanny

- Improve check for elided words

David

Jul 12, 2025, 11:53:43 AM
to Standard Ebooks
I've just updated before submitting Hardy's Short Fiction for review, and `lint` seems to be throwing this false positive:

[Attachment: Screenshot from 2025-07-12 16-50-41.png]
The value in `content.opf` IS the expected `https://github.com/standardebooks/thomas-hardy_short-fiction`, and was not flagged previously. I expect this is a quirk of 2.8.0 lint? I won't create an `se-lint-ignore.xml` file, but will flag this when submitting the review.

Vince

Jul 12, 2025, 12:28:51 PM
to Standard Ebooks
Yep, it's something with the self-generated URL; it has an extra “standard-ebooks” in it.

Vince

Jul 12, 2025, 12:46:45 PM
to Standard Ebooks
All right, this is because the `<dc:identifier>` still has the `url:` at the front of it, which is no longer used in 2.8. Because it’s there, the class determines it’s not an SE book, and changes the GitHub slug accordingly.

If you want, you can remove the `url:` from the identifier and things should be OK, but I’m sure Alex has his usual scripts to deal with that sort of thing in the transition, so you shouldn’t have to worry about it.
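For anyone who does want to make that change by hand, a minimal sketch of stripping the prefix from the text of `content.opf` might look like this. It is not one of the project's transition scripts; the function name `strip_url_prefix` is hypothetical, and the identifier URL in the usage note is only an illustrative guess at the format.

```python
import re

# Matches the opening <dc:identifier ...> tag immediately followed by "url:".
IDENTIFIER_PREFIX = re.compile(r"(<dc:identifier\b[^>]*>)url:")


def strip_url_prefix(opf_text: str) -> str:
    """Return the OPF text with the leading `url:` removed from <dc:identifier>."""
    # Keep the opening tag (group 1) and drop only the prefix.
    return IDENTIFIER_PREFIX.sub(r"\1", opf_text)
```

Given `<dc:identifier id="uid">url:https://standardebooks.org/ebooks/thomas-hardy/short-fiction</dc:identifier>`, this returns the same element with the value starting at `https://`. Text that has already been updated passes through unchanged.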

David

Jul 16, 2025, 9:26:24 AM
to Standard Ebooks
On using `create-draft` with the updated 2.8.0 toolset, I get this:

`content.opf`
<meta property="se:url.homepage" refines="#transcriber-2">https://pgdp.net</meta>

`colophon.xhtml`
...and <a href="https://www.pgdp.net/">Distributed Proofreaders</a><br/>

The two issues:
(1) content.opf doesn't get the trailing slash;
(2) colophon.xhtml has the additional "//www..."

We add the trailing slash, but is the short or the long (with `www`) URL preferred? Thanks!

Alex Cabal

Jul 16, 2025, 9:57:43 AM
to standar...@googlegroups.com
Thanks, this should be fixed in master.

The URL should have `www`, so https://www.pgdp.net/
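
Until the fix is released, the two mismatches David describes can be spotted with a small comparison like the one below. This is a rough sketch using only the standard library, not part of the toolset; the function name `describe_mismatch` is hypothetical.

```python
from urllib.parse import urlsplit


def describe_mismatch(metadata_url: str, colophon_url: str) -> list[str]:
    """List the ways a content.opf homepage URL differs from the colophon link."""
    meta = urlsplit(metadata_url)
    colophon = urlsplit(colophon_url)
    problems = []

    if meta.netloc != colophon.netloc:
        # Catches a missing or extra "www".
        problems.append(f"host differs: {meta.netloc} vs {colophon.netloc}")

    if meta.path != colophon.path:
        # Catches a missing trailing slash.
        problems.append(f"path differs: {meta.path!r} vs {colophon.path!r}")

    return problems
```

Calling `describe_mismatch("https://pgdp.net", "https://www.pgdp.net/")` reports both a host difference and a path difference; two identical URLs produce an empty list.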
