se split-file not working as expected

Scott Voyles

Jan 7, 2026, 2:58:25 PM
to Standard Ebooks
I'm working through the step-by-step guide but hit a snag with the `se split-file` utility: the split comments seem to be added correctly before each "h2" tag for each chapter, but "chapter-1.xhtml" has no contents and every subsequent chapter file contains the previous chapter's content (e.g. chapter 1's content lands in "chapter-2.xhtml", and so on).

Are the comments being added correctly?

Screenshot 2026-01-07 at 20.58.07.png

Alex Cabal

Jan 7, 2026, 3:00:23 PM
to standar...@googlegroups.com
`se split-file` doesn't understand frontmatter, only chapters. So, remove
the frontmatter first, so that chapter 1 is the first section in the larger
file.

Scott Voyles

Jan 7, 2026, 3:05:22 PM
to Standard Ebooks
Yep, it's literally the first element of the "body"

Screenshot 2026-01-07 at 21.04.42.png

Alex Cabal

Jan 7, 2026, 3:06:36 PM
to standar...@googlegroups.com
Yes, you're telling it that everything before the first comment is
chapter 1. So, remove everything before the first comment, and the first
comment itself.
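
A minimal Python sketch of the behavior described above. This is not the tool's actual source: the marker text `<!--se:split-->` and the function name `split_on_markers` are assumptions made for illustration. It only shows why a split comment placed as the very first thing in the file yields an empty first chunk and shifts every chapter by one.

```python
SPLIT_MARKER = "<!--se:split-->"  # assumed marker text, not confirmed in this thread


def split_on_markers(xhtml: str) -> list[str]:
    """Split text into chunks; each marker closes the chunk that precedes it."""
    chunks = []
    current = []
    for line in xhtml.splitlines():
        if SPLIT_MARKER in line:
            before, after = line.split(SPLIT_MARKER, 1)
            current.append(before)
            # Everything collected so far becomes one output file,
            # even if nothing has been collected yet.
            chunks.append("\n".join(current).strip())
            current = [after]
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        chunks.append(tail)
    return chunks


# Marker before the first chapter: chunk 1 is empty, chapter I lands in chunk 2.
with_leading = (
    "<!--se:split-->\n<h2>I</h2>\n<p>One.</p>\n"
    "<!--se:split-->\n<h2>II</h2>\n<p>Two.</p>"
)
# First marker (and anything before it) removed: chapter I lands in chunk 1.
without_leading = "<h2>I</h2>\n<p>One.</p>\n<!--se:split-->\n<h2>II</h2>\n<p>Two.</p>"

for label, text in (("with leading marker", with_leading),
                    ("without leading marker", without_leading)):
    print(label)
    for number, chunk in enumerate(split_on_markers(text), start=1):
        print(f"  chapter-{number}.xhtml: {chunk!r}")
```

With the leading marker, the sketch reports an empty "chapter-1.xhtml" and chapter I in "chapter-2.xhtml", which matches the symptom reported at the top of the thread.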

Scott Voyles

Jan 7, 2026, 3:07:01 PM
to Standard Ebooks
Or do you mean removing the doctype directives, and also html and body elements?

Scott Voyles

Jan 8, 2026, 2:02:17 AM
to Standard Ebooks
Another question, Alex - there were anchor tags preserved while formatting the headers; should we keep these? I didn't see any mention of them in the style guide yet.

<h2>
<a id="CHAPTER_XI"><span epub:type="z3998:roman">XI</span></a>
</h2>

Or should it look more like the example in the style guide:

      <hgroup>
        <h2 epub:type="ordinal z3998:roman">I</h2>
        <p>My Birth and Childhood.</p>
      </hgroup>

Robin Whittleton

Jan 8, 2026, 2:55:24 AM
to standar...@googlegroups.com
You can dump them. For this example, the final markup would be <h2 epub:type="ordinal z3998:roman">XI</h2>
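
For anyone wanting to automate this across many chapters, here is a rough Python sketch. It assumes every heading has exactly the shape quoted in the question (an `<a id="...">` wrapping a roman-numeral `<span>`); the name `clean_heading` is made up for illustration, and headings that differ would need their own pattern or a hand edit.

```python
import re

# Matches an <h2> whose only content is an <a id="..."> wrapping a
# roman-numeral <span>, as in the example quoted in the question.
PG_HEADING = re.compile(
    r'<h2>\s*<a id="[^"]*">\s*'
    r'<span epub:type="z3998:roman">([IVXLCDM]+)</span>'
    r'\s*</a>\s*</h2>'
)


def clean_heading(xhtml: str) -> str:
    """Drop the anchor and span, moving the semantics onto the <h2> itself."""
    return PG_HEADING.sub(r'<h2 epub:type="ordinal z3998:roman">\1</h2>', xhtml)


source = '''<h2>
<a id="CHAPTER_XI"><span epub:type="z3998:roman">XI</span></a>
</h2>'''

print(clean_heading(source))
```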

-Robin

Scott Voyles

Jan 8, 2026, 6:44:18 AM
to Standard Ebooks
Thanks, Robin. Another question regarding the links to page numbers from PG - I assume I can get rid of all of these as well? I find this information nice to have; do we generate it somehow later?

Here's some example markup to show you what I mean:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-US">
<head>
<title>I: My Birth and Childhood</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">

<hgroup>
<h2 epub:type="ordinal z3998:roman">I</h2>
<p epub:type="title">My Birth and Childhood</p>
</hgroup>
<p epub:type="bridgehead">Earliest Memories.⁠—Born in Maryland.⁠—My Father’s First Appearance.⁠—Attempted Outrage on My Mother.⁠—My Father’s Fight with an Overseer.⁠—One Hundred Stripes and His Ear Cut Off.⁠—Throws Away His Banjo and Becomes Morose.⁠—Sold South.</p>
<p>...</p>
<p>
<span class="pagenum"> <!-- this, for example -->
<a id="Page_8">[Pg 8]</a>
</span>
</p>
</section>
</body>
</html>

Scott Voyles

Jan 8, 2026, 6:58:54 AM
to Standard Ebooks
And some questions regarding organization:

1. Is it preferred to keep questions in the original book thread? As my questions pile up, it would be nice to keep the history there.
2. From now on I'd like to refer to the repo to save so much copying and pasting, and I'd like to set up a "standardebooks" repo with the individual projects as git submodules. I assume that doesn't matter to you guys?

Robin Whittleton

Jan 8, 2026, 6:59:09 AM
to standar...@googlegroups.com
Yep, all those can be dropped too.
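
One possible way to drop them in bulk, as a hedged Python sketch. It assumes the page markers always look like the `<span class="pagenum">` in the example markup; the function name and the inline sample text are invented for illustration. Note that the second step removes any paragraph left empty, including ones that were already empty beforehand.

```python
import re

# A page marker shaped like the one in the example markup.
PAGENUM_SPAN = re.compile(
    r'<span class="pagenum">\s*<a id="Page_\d+">\[Pg \d+\]</a>\s*</span>'
)
# A paragraph containing nothing but whitespace, plus its trailing newline.
EMPTY_PARAGRAPH = re.compile(r'<p>\s*</p>\n?')


def strip_page_numbers(xhtml: str) -> str:
    """Remove page-number spans, then any paragraphs left empty."""
    without_spans = PAGENUM_SPAN.sub("", xhtml)
    return EMPTY_PARAGRAPH.sub("", without_spans)


source = '''<p>...</p>
<p>
<span class="pagenum">
<a id="Page_8">[Pg 8]</a>
</span>
</p>
<p>Text before <span class="pagenum"><a id="Page_9">[Pg 9]</a></span>text after.</p>
'''

print(strip_page_numbers(source))
```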

The thing that’s missing from your example is a header element to wrap the heading and bridgehead together. Have a look at the examples in https://standardebooks.org/manual/1.8.5/7-high-level-structural-patterns#7.2.11
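
A rough Python sketch of that wrapping, assuming one chapter per file and a bridgehead paragraph that immediately follows the `<hgroup>`. The function name is hypothetical and the bridgehead text is shortened from the example; check the result against the linked manual section rather than treating this as the canonical markup.

```python
import re

# An <hgroup> immediately followed by a bridgehead paragraph.
HEADING_WITH_BRIDGEHEAD = re.compile(
    r'(<hgroup>.*?</hgroup>\s*<p epub:type="bridgehead">.*?</p>)',
    re.DOTALL,
)


def wrap_in_header(xhtml: str) -> str:
    """Wrap the heading group and its bridgehead in a single <header>."""
    if "<header>" in xhtml:
        return xhtml  # already wrapped; avoid nesting a second <header>
    return HEADING_WITH_BRIDGEHEAD.sub(r'<header>\n\1\n</header>', xhtml, count=1)


source = '''<hgroup>
<h2 epub:type="ordinal z3998:roman">I</h2>
<p epub:type="title">My Birth and Childhood</p>
</hgroup>
<p epub:type="bridgehead">Earliest Memories.</p>
<p>...</p>'''

print(wrap_in_header(source))
```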

-Robin

Alex Cabal

Jan 8, 2026, 1:31:51 PM
to standar...@googlegroups.com
Yes, questions about your book should go in your book's thread.

Please create one repo per ebook, don't complicate things with
submodules. Thanks!
