[First Project] The Little Nugget by P.G. Wodehouse


Andrew Rice

Jan 17, 2022, 2:37:59 PM
to Standard Ebooks
Hello,

New to Standard Ebooks.  Trying to get my feet wet on my first project.  Would like to do The Little Nugget by P.G. Wodehouse.  https://www.gutenberg.org/ebooks/6683

-Andrew

Andrew Rice

Jan 17, 2022, 10:02:15 PM
to Standard Ebooks

Alex Cabal

Jan 17, 2022, 10:07:10 PM
to standar...@googlegroups.com
Great, that one would be a good start.

This won't be the easiest first production but you can take a swing at it.

It's in British-style quotation so you'll have to run
`se british2american`. This script makes a lot of errors so you'll have to
pay careful attention to the direction of quotation marks when proofreading.

There are nested sections here too. See the manual for how to nest
sections; you will have to include stubs of the parent sections.

I also see a few letters--check the manual for styling.

Make sure to read the Standard Ebooks Manual of Style before starting,
as you won't know what to fix if you haven't read the standards. In
particular, please closely review the semantics, high level patterns,
and typography sections:

https://standardebooks.org/manual

https://standardebooks.org/manual/latest/4-semantics

https://standardebooks.org/manual/latest/7-high-level-structural-patterns

https://standardebooks.org/manual/latest/8-typography

The step by step guide will take you from start to finish:

https://standardebooks.org/contribute/producing-an-ebook-step-by-step

Please email often if you have any questions at all. Our standards are
well-established so there is probably already a standard for formatting
whatever problem you've encountered.

When you're ready, email back with a link to your GitHub repository so
that I can mark you as having started.

Have fun! :)
> --
> You received this message because you are subscribed to the Google
> Groups "Standard Ebooks" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to standardebook...@googlegroups.com
> <mailto:standardebook...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/standardebooks/d56c88e5-6202-41ca-8ca4-eaaa5d5ddbcfn%40googlegroups.com
> <https://groups.google.com/d/msgid/standardebooks/d56c88e5-6202-41ca-8ca4-eaaa5d5ddbcfn%40googlegroups.com?utm_medium=email&utm_source=footer>.

Andrew Rice

Jan 17, 2022, 10:53:46 PM
to Standard Ebooks
Right out of the gate I noticed every <p> tag had a unique incrementing identifier, e.g. <p id="id00073">. I can easily strip these using a regex search and replace. Should I be doing this?

Weijia Cheng

Jan 17, 2022, 10:56:53 PM
to Standard Ebooks
Yes, we don't keep those sorts of internal ids on tags from PG since our book will have an entirely different internal structure.
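
A minimal sketch of that cleanup in Python, assuming the ids all follow the `id00073` pattern quoted above (the letters `id` followed by digits). The name `strip_pg_ids` is hypothetical and not part of the SE toolset:

```python
import re

# Matches a PG-style attribute such as  id="id00073"  together with the
# whitespace in front of it. Assumption: these ids are always "id" + digits.
PG_ID = re.compile(r'\s+id="id\d+"')


def strip_pg_ids(xhtml: str) -> str:
    """Return xhtml with PG's auto-generated id attributes removed."""
    return PG_ID.sub("", xhtml)


sample = '<p id="id00073">Hello</p>\n<p class="x" id="id00074">World</p>'
print(strip_pg_ids(sample))
```

Because the pattern only matches values made of `id` plus digits, hand-written ids such as `id="chapter-1"` are left alone. It is still worth checking the diff before committing.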

Andrew Rice

Jan 24, 2022, 10:36:59 PM
to Standard Ebooks
Sorry it has taken me so long to get it set up, but my GitHub repository is here: https://github.com/AndrewLRice/The_Little_Nugget

Alex Cabal

Jan 24, 2022, 10:38:27 PM
to standar...@googlegroups.com

Andrew Rice

Jan 25, 2022, 11:55:52 AM
to Standard Ebooks
Okay, I have to cry for help here. From what I can tell, this book has two parts. The first part has only one unnumbered chapter, which has two sections. The second part has multiple numbered chapters, most of which have subsections. How do I handle this? Also, the second part has a name, but in the printed version of the book the part name appears on the first page of text and not on the part divider page.

-Andrew

B Keith

Jan 25, 2022, 12:38:53 PM
to Standard Ebooks
We just redid how we handle divisions (https://standardebooks.org/manual/1.6.3/single-page#7.1) but your case is a bit interesting. I am not sure how to handle the missing “chapter” in part one. I would be tempted to include it with no content.

 Anyway, Alex will chime in, but this should get you started and you can correct it later.


<section id="book-1" epub:type="division">
<h2>
<span epub:type="label">Part</span>
<span epub:type="ordinal z3998:roman">I</span>
</h2>
</section>

<section data-parent="book-1" id="chapter-1-1" epub:type="chapter">
</section>

<section data-parent="chapter-1-1" id="chapter-1-1-1" epub:type="part">
<h4>
<span epub:type="ordinal z3998:roman">I</span>
</h4>
<p>...</p>
<p>...</p>
</section>
<section data-parent="chapter-1-1" id="chapter-1-1-2" epub:type="part">
<h4>
<span epub:type="ordinal z3998:roman">II</span>
</h4>
<p>...</p>
<p>...</p>
</section>


<section id="book-2" epub:type="division">
<h2>
<span epub:type="label">Part</span>
<span epub:type="ordinal z3998:roman">II</span>
</h2>
</section>

<section data-parent="book-2" id="chapter-2-1" epub:type="chapter">
<h3>
<span epub:type="label">Chapter</span>
<span epub:type="ordinal z3998:roman">I</span>
</h3>
</section>

<section data-parent="chapter-2-1" id="chapter-2-1-1" epub:type="part">
<h4>
<span epub:type="ordinal z3998:roman">I</span>
</h4>
<p>...</p>
<p>...</p>
</section>
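
One way to catch a mismatched `data-parent` value in markup like this is to compare each one against the `id`s on the sections. This is a rough sketch in Python, assuming a regex scan is good enough for these fragments and that all the section files have been concatenated into one string. The helper `unresolved_parents` is hypothetical, not an SE tool:

```python
import re

# Opening <section ...> tags and the two attributes we care about.
SECTION_TAG = re.compile(r"<section\b[^>]*>")
ID_ATTR = re.compile(r'\sid="([^"]+)"')
PARENT_ATTR = re.compile(r'\sdata-parent="([^"]+)"')


def unresolved_parents(xhtml: str) -> list[str]:
    """List data-parent values that match no section id in xhtml."""
    tags = SECTION_TAG.findall(xhtml)
    ids = {m.group(1) for tag in tags for m in ID_ATTR.finditer(tag)}
    parents = [m.group(1) for tag in tags for m in PARENT_ATTR.finditer(tag)]
    return [p for p in parents if p not in ids]


sample = '''
<section id="part-1" epub:type="part"></section>
<section data-parent="part-1" id="chapter-1-1" epub:type="chapter"></section>
<section data-parent="part-2" id="chapter-2-1" epub:type="chapter"></section>
'''
print(unresolved_parents(sample))  # part-2 has no matching section in this sample
```

An empty list means every `data-parent` points at a section that exists in the text that was scanned.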



Alex Cabal

Jan 25, 2022, 1:04:32 PM
to standar...@googlegroups.com
The divisions in part 1 are indeed chapters. Then part 2 has chapters,
which have subchapters.

Wodehouse is tricky because editions of his work are all wildly
different. For example these scans don't even have part divisions, and
appear to number the chapters sequentially to 33:
https://books.google.com/books?id=zXc4AAAAYAAJ

This is how you would structure it according to PG's edition:

<section id="part-1" epub:type="part">
<h2>
<span epub:type="label">Part</span>
<span epub:type="ordinal z3998:roman">I</span>
</h2>
</section>

<section data-parent="part-1" id="chapter-1-1" epub:type="chapter">
<h3 epub:type="ordinal z3998:roman">I</h3>
...
</section>

...

<section data-parent="part-2" id="chapter-2-1" epub:type="chapter">
<h3 epub:type="ordinal z3998:roman">I</h3>
<section id="chapter-2-1-1" epub:type="z3998:subchapter">
<h4 epub:type="ordinal z3998:roman">I</h4>
...
</section>
</section>

Bruce, when you did the Wodehouse research did you look at editions of
the novels? Because it might be worth deciding which edition to look at
page scans for, since they clearly differ wildly in this case.


B Keith

Jan 25, 2022, 2:07:21 PM
to Standard Ebooks
I didn't do any novels… just short stories, and the variations between Strand, Saturday Evening Post, the book scans I could find, and PG could be wildly different. I generally gave the book form precedence, but not all of them were the same either. Fortunately, with the short stories the actual formatting doesn’t usually affect things greatly.

> Bruce, when you did the Wodehouse research did you look at editions of the novels? Because it might be worth deciding which edition to look at page scans for, since they clearly differ wildly in this case.


Andrew Rice

Jan 27, 2022, 6:03:29 PM
to standar...@googlegroups.com
Alex,

Since the parts in the first section are to be considered chapters, I assume I should break them up into separate files, as with the rest of the chapters?

-Andrew


--
Not all that's gold glitters, not all that wander are lost. -J.R.R. Tolkien

Alex Cabal

Jan 27, 2022, 6:43:15 PM
to standar...@googlegroups.com
Right. They're going to look identical in structure to the chapters in
part 2.


Andrew Rice

Apr 9, 2022, 9:52:59 PM
to Standard Ebooks
Hello,

Sorry for the long delay in all of this. As I mentioned to Alex, we recently had a baby and that has been taking up quite a bit of my time. Back on the subject of this book: it seems the page scans I am using and the PG version of the book do not agree on how the book is divided. The page scan version splits the book into individual chapters, while the PG version has two parts, each with a subset of chapters, and many of the chapters have their own subsections. If I am sticking to the chapters-with-subsections formatting, am I correct in my reading of the Manual of Style that these subsections are not broken off into their own XHTML files?

-Andrew

Alex Cabal

Apr 10, 2022, 9:51:10 AM
to standar...@googlegroups.com

B Keith

Apr 10, 2022, 11:13:02 AM
to Standard Ebooks
I’m not sure what the issue is.

The scans Andrew mentioned (https://babel.hathitrust.org/cgi/pt?id=umn.31951002001434n&view=1up&seq=11&skin=2021) have parts, the Penguin edition has parts (https://archive.org/details/littlenugget00wode/page/n7/mode/2up), and the GB version has parts. It's possible the Google Books version is slightly different, but I am geoblocked out of most of it.

In Part 1 the “intro” has subsections, and then in Part 2 the chapters start, and they have subsections as well.

Part I
    section 1
    section 2
Part II
    Chapter 1
        section 1
        section 2 etc.

So I think that is a “title page” for each part, one file for part I and twelve files (chapters) for part II.

B


B Keith

Apr 10, 2022, 11:15:00 AM
to Standard Ebooks
Oops, 18 chapters in part II :-)

Andrew Rice

Apr 10, 2022, 11:17:30 AM
to standar...@googlegroups.com
Hey, this helps a lot! I was using a different version of the scan; this one matches PG, so I will use it. Thank you.

-Andrew 
