Toolset 4.0.0 released

Alex Cabal

Jun 30, 2026, 4:14:38 PM
to Standard Ebooks Mailing List
Upgrade using `pipx upgrade standardebooks`.

This version introduces major changes in how we model some metadata in
`content.opf`.

Previously, we used a lot of custom `se:` vocabulary to include various
additional metadata in `content.opf`.

This was because at the time SE started (10-15 years ago), schema.org, a
global registry of semantic keywords and relationships, was not as
developed and didn't include a lot of the vocabulary we were trying to
express in metadata.

Schema.org has advanced a lot since then, and now contains basically all
of the vocabulary we need to express the METADATA semantics we
previously had to express using the `se:` vocabulary. (Note that I'm
only talking about METADATA in `content.opf`, not semantics in ebook
text.) It has also been part of the epub spec since epub 3.1.

Additionally, epub 3.3 allows the use of `<link>` elements in the
metadata to better express direct linking relationships, instead of
`<meta>`.

In SE 4.0.0, we've completed an (almost) total conversion to schema.org
vocabulary for metadata, and epub 3.3-style `<link>`s where appropriate.

Many of these changes are 1:1 relations:

<meta property="se:url.encyclopedia.wikipedia" refines="...">URL</meta>
-> <link href="URL" refines="..." rel="schema:sameAs"/>

<meta property="se:url.homepage" refines="...">URL</meta>
-> <link href="URL" refines="..." rel="schema:url"/>

<meta property="se:name.person.pen-name" refines="...">SOME NAME</meta>
-> <meta property="schema:alternateName" refines="...">SOME NAME</meta>

<meta property="se:name.person.full-name" refines="...">SOME NAME</meta>
-> <meta property="schema:alternateName" refines="...">SOME NAME</meta>

<meta property="se:reading-ease.flesch">NUMBER</meta>
-> <meta property="schema:educationalLevel">NUMBER</meta>

<meta property="se:subject">SUBJECT</meta>
-> <meta property="schema:genre">SUBJECT</meta>

<meta property="se:word-count">NUMBER</meta>
-> <meta property="schema:wordCount">NUMBER</meta>

<meta property="se:is-a-collection">true</meta>
-> <meta property="schema:additionalType">http://schema.org/Collection</meta>
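As a rough illustration of the renames above that keep the `<meta>` element and only change its `property` value, here is a minimal Python sketch. The names `SIMPLE_RENAMES` and `rename_simple_properties` are hypothetical and not part of the toolset; the sketch deliberately leaves out the `<meta>` to `<link>` conversions and the `se:is-a-collection` change, since those alter more than the attribute.

```python
import re

# Old `se:` property -> new schema.org property, limited to the cases in the
# list above where only the attribute value changes.
SIMPLE_RENAMES = {
    "se:name.person.pen-name": "schema:alternateName",
    "se:name.person.full-name": "schema:alternateName",
    "se:reading-ease.flesch": "schema:educationalLevel",
    "se:subject": "schema:genre",
    "se:word-count": "schema:wordCount",
}


def rename_simple_properties(opf: str) -> str:
    """Rewrite property="se:..." attributes that appear in SIMPLE_RENAMES.

    Any other `se:` property is left exactly as it was.
    """
    def swap(match: re.Match) -> str:
        old = match.group(1)
        return f'property="{SIMPLE_RENAMES.get(old, old)}"'

    return re.sub(r'property="(se:[^"]+)"', swap, opf)
```

Calling `rename_simple_properties` on the text of a `content.opf` returns the rewritten text; writing it back to disk is left to the caller.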

We also change how we model NACOAF URLs:

<meta property="se:url.authority.nacoaf" refines="...">URL</meta>
-> <link href="URL" refines="..." rel="schema:sameAs"/>

Since we're changing the NACOAF property from a URI to a URL link, you
must now use the actual URLs of the LoC Name Authority pages. This should
be easier for you: you can just copy and paste from your address bar,
whereas before you had to create a URI out of the URL by changing it to
http and removing the file extension.
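For existing projects that still carry the old-style URI, the reverse step (switch to https and put the file extension back) can be sketched in Python as below. The function name `nacoaf_uri_to_url` and the identifier used in the usage line are hypothetical; the target form follows the `.html` URLs described above.

```python
import re

# Old-style URI: plain http, no file extension.
_OLD_URI = re.compile(r"^http://id\.loc\.gov/authorities/names/([^/.]+)$")


def nacoaf_uri_to_url(uri: str) -> str:
    """Turn an old NACOAF URI into the page URL; return other input unchanged."""
    match = _OLD_URI.match(uri)
    if match is None:
        # Already a URL, or not a names-authority URI at all.
        return uri
    return f"https://id.loc.gov/authorities/names/{match.group(1)}.html"
```

For example, `nacoaf_uri_to_url("http://id.loc.gov/authorities/names/n12345678")` gives `https://id.loc.gov/authorities/names/n12345678.html`, where `n12345678` is a made-up identifier.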

Some metadata requires some slightly more complex modeling, like the
GitHub URL:

<meta property="se:url.vcs.github">GITHUB_URL</meta>

becomes:

<meta property="schema:workExample" id="vcs-repository">schema:workExample</meta>
<meta property="rdf:type" refines="#vcs-repository">http://schema.org/SoftwareSourceCode</meta>
<link href="GITHUB_URL" refines="#vcs-repository" rel="schema:codeRepository"/>
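To make the chain explicit: the first element declares a work example with an ID, and the other two refine that ID to give it a type and a repository link. A minimal Python sketch that produces the three elements for a given URL might look like this; `vcs_repository_metadata` is a hypothetical helper, not something the toolset provides.

```python
from xml.sax.saxutils import quoteattr


def vcs_repository_metadata(github_url: str) -> list[str]:
    """Return the three metadata elements that replace `se:url.vcs.github`."""
    return [
        # 1. Declare the work example and give it an ID to hang details on.
        '<meta property="schema:workExample" id="vcs-repository">schema:workExample</meta>',
        # 2. Say what kind of thing that work example is.
        '<meta property="rdf:type" refines="#vcs-repository">http://schema.org/SoftwareSourceCode</meta>',
        # 3. Link it to the repository; quoteattr escapes and quotes the URL.
        f'<link href={quoteattr(github_url)} refines="#vcs-repository" rel="schema:codeRepository"/>',
    ]
```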

And in collections, the `se:title-in-collection`:

<meta property="se:title-in-collection" refines="...">TITLE</meta>

becomes:

<meta property="schema:hasPart" refines="#collection-N" id="collection-N-entry-1">schema:hasPart</meta>
<meta property="rdf:type" refines="#collection-N-entry-1">http://schema.org/CreativeWork</meta>
<meta property="schema:name" refines="#collection-N-entry-1">TITLE</meta>
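Reading the title back now means following two `refines` hops: from the collection to its `schema:hasPart` entry, then from that entry to its `schema:name`. The Python sketch below shows one way a consumer could do that with the standard library; `titles_in_collection` is a hypothetical name, and the sketch assumes the `<meta>` elements live in the usual OPF namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: metadata elements are in the standard OPF package namespace.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def titles_in_collection(metadata_xml: str, collection_id: str) -> list[str]:
    """Return the schema:name of each schema:hasPart entry of a collection."""
    root = ET.fromstring(metadata_xml)
    metas = root.findall(".//opf:meta", OPF_NS)

    # Hop 1: entries that refine the collection.
    entry_refs = {
        f"#{m.get('id')}"
        for m in metas
        if m.get("property") == "schema:hasPart"
        and m.get("refines") == f"#{collection_id}"
        and m.get("id")
    }

    # Hop 2: names that refine those entries, in document order.
    return [
        (m.text or "").strip()
        for m in metas
        if m.get("property") == "schema:name" and m.get("refines") in entry_refs
    ]
```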

Lastly, we're moving the production notes out of `content.opf`, because
much like the `se-lint-ignore.xml` file, they're production-related
details that don't really belong in the actual ebook distributable.
Therefore, instead of

<meta property="se:production-notes">NOTES</meta>

there is a new `production-notes.md` file in the project root that you
can use for that purpose. If you have no production notes, you must
delete that file as part of the production process. Lint will alert you
of this.
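If you want to check the state of the file yourself before running lint, a small Python sketch like the following would do. This is not how `se lint` implements its check; `production_notes_status` and its return values are hypothetical, and it assumes "no production notes" means the file is empty or whitespace-only.

```python
from pathlib import Path


def production_notes_status(project_root: Path) -> str:
    """Classify `production-notes.md` as 'absent', 'empty', or 'has-notes'."""
    notes = project_root / "production-notes.md"
    if not notes.exists():
        return "absent"
    # Assumption: whitespace-only content counts as having no notes.
    if notes.read_text(encoding="utf-8").strip() == "":
        return "empty"
    return "has-notes"
```

Under that assumption, an `"empty"` result means the file should be deleted before the ebook is finished.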

The only SE vocabulary item that remains, and is unchanged, is
`se:long-description`. Robin is doing research on some ways we can use
native semantic vocabulary to express the differences between the short
and long descriptions, while still maintaining ereader compatibility.

Finally, there is no need to update your current projects for this new
schema; I will make the changes on my end when reviewing the ebook. If
you want to make the changes early, this GNU sed script will do most of
it in one command:

sed -i -E "
s|<meta property=\"se:url.homepage\" refines=\"([^\"]+?)\">([^<]+?)</meta>|<link href=\"\2\" refines=\"\1\" rel=\"schema:url\"/>|g;
s|<meta property=\"se:subject\">|<meta property=\"schema:genre\">|g;
s|<meta property=\"se:is-a-collection\">true</meta>|<meta property=\"schema:additionalType\">http://schema.org/Collection</meta>|;
s|<meta property=\"se:word-count\">|<meta property=\"schema:wordCount\">|;
s|<meta property=\"se:reading-ease.flesch\">|<meta property=\"schema:educationalLevel\">|;
s|<meta property=\"se:url.encyclopedia.wikipedia\"( refines=\"([^\"]+?)\")?>(.+?)</meta>|<link href=\"\3\"\1 rel=\"schema:sameAs\"/>|g;
s~ property=\"se:(name.person.pen-name|name.person.full-name)\"~ property=\"schema:alternateName\"~g;
s|<meta property=\"se:url.authority.nacoaf\" refines=\"([^\"]+?)\">([^<]+?)</meta>|<link href=\"\2\" refines=\"\1\" rel=\"schema:sameAs\"/>|g;
s|<meta property=\"se:url.vcs.github\">([^<]+?)</meta>|<meta property=\"schema:workExample\" id=\"vcs-repository\">schema:workExample</meta>\n\t\t<meta property=\"rdf:type\" refines=\"#vcs-repository\">http://schema.org/SoftwareSourceCode</meta>\n\t\t<link href=\"\1\" refines=\"#vcs-repository\" rel=\"schema:codeRepository\"/>|g;
s|\"http://id.loc.gov/authorities/names/([^\"]+?)\"|\"https://id.loc.gov/authorities/names/\1.html\"|g;
s|prefix=\"se: https://standardebooks.org/vocab/1.0\"|prefix=\"se: https://standardebooks.org/vocab/1.0 rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#\"|;
s|<dc:publisher|<meta property=\"rdf:type\">http://schema.org/Book</meta>\n\t\t<dc:publisher|;
s|<meta property=\"se:title-in-collection\" refines=\"#collection-([0-9]+)\">([^<]+?)</meta>|<meta property=\"schema:hasPart\" refines=\"#collection-\1\" id=\"collection-\1-entry-1\">schema:hasPart</meta>\n\t\t<meta property=\"rdf:type\" refines=\"#collection-\1-entry-1\">http://schema.org/CreativeWork</meta>\n\t\t<meta property=\"schema:name\" refines=\"#collection-\1-entry-1\">\2</meta>|g;
" ./src/epub/content.opf

You still have to move production notes into their own file though.
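Since the script only does most of the work, it can help to list whatever `se:` properties are still left afterwards. Here is a minimal Python sketch for that, assuming properties always appear as `property="se:..."` attributes; `leftover_se_properties` and `ALLOWED` are hypothetical names.

```python
import re

# The one `se:` property that is expected to remain after the migration.
ALLOWED = {"se:long-description"}


def leftover_se_properties(opf: str) -> list[str]:
    """Return the sorted, de-duplicated `se:` properties still needing attention."""
    found = set(re.findall(r'property="(se:[^"]+)"', opf))
    return sorted(found - ALLOWED)
```

Passing it the text of `./src/epub/content.opf` should return an empty list once the production notes have been moved out and nothing else was missed.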

Full changelog:

# 4.0.0

## General

- Update metadata to new schema.org-based standards

## se build

- Add `schema:version` metadata to built EPUBs with the source commit
and build type

- Remove stray cache pruning notification

- Fix AZW3 JPEG compression behavior for reproducible builds

## se build-toc

- Update ToC templates for ARIA accessibility

## se create-draft

- Create an empty `production-notes.md` file when creating new drafts

- Use canonical HTTPS Library of Congress Name Authority URLs

## se lint

- Confirm that `production-notes.md` exists before testing it

- Update s-019 and x-018 to accept `aria-labelledby` as a valid use of ID
attributes

- Report MusicXML files with incorrect file extensions. Thanks to Robin
Whittleton

- Add a check for a missing mcp role (Music copyist) if music is found.
Thanks to Robin Whittleton

## se recompose-epub

- Use less memory by not canonicalizing XML unless needed

## se shift-endnotes

- Remove the `--amount` option, and make `--increment` and `--decrement`
take an integer argument