Scrambles amongst the Alps in the years 1860-69 by Edward Whymper

Gijs van Tulder

Aug 12, 2024, 7:00:25 PM
to standar...@googlegroups.com
Hello,

Let me try to propose this again, now that the Tarzan book is done. :)

As I wrote earlier, I would like to produce a version of "Scrambles
amongst the Alps in the years 1860-69" by Edward Whymper. In this book
the British author/illustrator recounts the story of his ascent of the
Matterhorn and his other explorations of the Alps.

It's obviously more complicated than a Tarzan novel, so would this be a
suitable next project?


Since my previous email, I've done some more research.

Based on Alex's earlier response, I think the fifth edition (August
1900) would be the best version to use for this. The text is fairly
similar to the first edition, but it contains some corrections
(spelling, updated mountain heights) and other small revisions, plus a
new preface and appendix on later history of the Matterhorn. It seems to
be the most complete version by the original author.

There are some good fifth edition scans:
https://archive.org/details/scramblesamongst0000unse_l2h0
https://archive.org/details/scramblesamongst00whymuoft


The next edition was published in 1936, but this sixth edition was made
after the author's death, with revisions and introduction by someone
else, and is probably too new to be public domain anyway.

Sixth edition reprint from 1948:
https://archive.org/details/dli.ernet.13808


Transcriptions: I have not found a transcription of the fifth edition.
However, there are other usable sources:

First edition on Wikisource:
https://en.wikisource.org/wiki/Scrambles_amongst_the_Alps
based on https://archive.org/details/scramblesamongst00whym

An early edition on Project Gutenberg:
https://www.gutenberg.org/ebooks/41234
I'm not entirely sure which edition this is based on: it doesn't match
the scans of the first edition, and it even seems to miss entire parts.

Third edition on Project Gutenberg:
https://www.gutenberg.org/ebooks/38044
The third edition is a bit special: it has the title "The Ascent of the
Matterhorn" and it is an abridged version. The excluded material was
later restored in the fourth and fifth editions.

Although the fifth edition has no transcription, the Internet Archive has
a pretty usable OCR text for one of its two scanned versions:
https://archive.org/details/scramblesamongst0000unse_l2h0


I've been playing with these four sources to see if I could match the
transcription to the scans of the fifth edition:

1. Using the Wikisource transcription as the starting point, because
this matches the fifth edition fairly well.
2. Comparing against the early PG transcription and the PG third edition
to find errors in the Wikisource transcription.
3. Using the PG transcription of the third edition's appendix for the
section that was added since the first edition.
4. Comparing against the Internet Archive's OCR of the fifth edition.
The OCR isn't 100% correct, of course, but a diff against the first
edition's transcription does highlight the changes in the text.
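
The comparison in step 4 can be sketched roughly as follows. This is a minimal illustration using Python's standard difflib, not the actual workflow used for the book; the function name and both sample sentences are hypothetical.

```python
import difflib


def changed_words(transcription: str, ocr: str) -> list[tuple[str, str, str]]:
    """Return (operation, old words, new words) for every word-level difference."""
    old_words = transcription.split()
    new_words = ocr.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Hypothetical sample sentences, not taken from the book.
first_edition = "The peak is 12,000 feet high and was reached at noon."
fifth_edition_ocr = "The peak is 12,100 feet high and was reached at noon."

for change in changed_words(first_edition, fifth_edition_ocr):
    print(change)
```

Every reported difference is either a genuine revision between editions or an OCR error, so each one still has to be checked against the page scan.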

Following these steps, I now have a fairly accurate transcription of the
scans of the fifth edition. I could upload this somewhere else, but I
think it would make a good starting point for an SE version of the book,
if this would fit in the collection.


Looking forward to your comments!

Gijs

Alex Cabal

Aug 14, 2024, 1:35:17 PM
to standar...@googlegroups.com
OK. This will be a very, very difficult production, especially since
you're working on raw OCR. Proofing raw OCR is much more difficult than
it appears. But since you seem set on doing this, then go for it.

This book has lots of illustrations. Usually we would remove these.
However since it's nonfiction, and many of the illustrations are
functional and not decorative, I think we can simply keep all of them.

It looks like they're all woodcuts/etchings. I don't think all of them
are suitable for conversion to SVG, but many of them *are*: for example,
the figures on p. 92, 274, 276-277, etc.

Some simpler portraits, like p. 78, would also be suitable for SVG
conversion.

However, the more "picture-like" the illustration, the less suitable it
is for SVG, and we can simply include it as a color-adjusted PNG. For
example, the ones on p. 60 and 77.

We have a guide on SVGs that might be helpful:
https://standardebooks.org/contribute/how-tos/how-to-create-svgs-from-maps-with-several-colors

The table in the appendix will also be very complicated. This W3C
publication might be helpful to ensure you're getting the semantics
right: https://www.w3.org/WAI/tutorials/tables/

Please send a link to your repo once you get started.

Alex Cabal

Aug 14, 2024, 1:37:03 PM
to standar...@googlegroups.com
Vince and David, can I assign you both as respective manager and
reviewer again for this one?

Vince

Aug 14, 2024, 1:39:39 PM
to standar...@googlegroups.com
Will do.

David

Aug 14, 2024, 3:12:56 PM
to Standard Ebooks
That's me in for the review. :)

David / Fife, UK

Gijs van Tulder

Aug 14, 2024, 6:01:07 PM
to standar...@googlegroups.com
On 14-08-2024 19:35, Alex Cabal wrote:
> OK. [...] go for it.
> Please send a link to your repo once you get started.

Great! It's somewhat complex, but it's a nice book.

GitHub repository:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69

Target scans:
https://archive.org/details/scramblesamongst0000unse_l2h0


The repository currently contains a rough version of the fifth edition
text. I will have more questions and points for discussion later, but to
start:


* Separate commits for semantics?

While comparing the text against the fifth edition, I've also started
adding additional markup (<time>, epub:type, xml:lang etc.), because it
seemed easier to do that at the same time.

The text should match the scanned page -- I didn't add or remove any
italics or introduce other changes -- so it should be quite easy to
create a text+appearance-only version by removing the <span>s and extra
attributes. Is this useful to have as separate commits?
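
A rough sketch of how such a text+appearance-only version could be derived, assuming the extra markup is limited to <span> wrappers and a handful of attributes. The attribute list, function name, and sample paragraph are assumptions for illustration; a regex pass like this is only a starting point and the result would need checking by hand.

```python
import re

# Opening and closing <span> tags; the text they wrap is kept.
SPAN_TAG = re.compile(r"</?span\b[^>]*>")
# Assumed list of "extra" attributes; extend it as needed.
EXTRA_ATTRIBUTE = re.compile(r'\s+(?:epub:type|xml:lang|datetime)="[^"]*"')


def strip_semantics(xhtml: str) -> str:
    """Drop <span> wrappers and semantic attributes, keeping text and italics."""
    without_spans = SPAN_TAG.sub("", xhtml)
    return EXTRA_ATTRIBUTE.sub("", without_spans)


# Hypothetical sample paragraph, not taken from the repository.
sample = (
    '<p>We left the <i xml:lang="fr">cabane</i> and crossed the '
    '<span xml:lang="de">Firn</span>.</p>'
)
print(strip_semantics(sample))
```

Elements such as <time> would still need to be unwrapped separately, since this sketch only removes their attributes.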


* Spelling of place names

The book contains many place names, but not all of them follow current
spelling. For example, "Val Tournanche" is currently written as
"Valtournenche".

Proposed approach: keep the spelling from the book, but make it
consistent (e.g., "Val Tournanche" is most common, but there are a few
references to "Valtournanche" that I plan to change).
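
To decide which spelling is "most common", the variants can simply be counted. A minimal sketch, with a hypothetical function name and a made-up sample text:

```python
import re
from collections import Counter


def count_variants(text: str, variants: list[str]) -> Counter:
    """Count whole-word occurrences of each spelling variant in the text."""
    counts = Counter()
    for variant in variants:
        pattern = rf"\b{re.escape(variant)}\b"
        counts[variant] = len(re.findall(pattern, text))
    return counts


# Hypothetical sample text, not taken from the book.
sample_text = (
    "We left Val Tournanche at dawn. The Valtournanche guides followed. "
    "Val Tournanche was quiet."
)
counts = count_variants(sample_text, ["Val Tournanche", "Valtournanche"])
print(counts.most_common())
```

The word-boundary match means derived forms such as "Valtournanchians" are not counted as the bare place name.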

Perhaps some exceptions where the book is clearly wrong should be
changed. For example, the book consistently writes "Rhone" instead of
"Rhône", and I think it would be reasonable to change that.


* What to do with references to page numbers?

There are quite a few cross-references in the book. What to do with those?

Some are easily removed with a direct link to an image or chapter:
"See the engraving "Crags of the Matterhorn" facing p. 162"
-> Proposed solution: remove the page number and link the title directly
to the illustration.

Some are part of sentences:
"In the preliminary remarks at pp. 129-134"
-> Could change to a reference to a chapter, linked to the correct
paragraph.

Some are inside parentheses:
"the glacier of the same name (p. 226)"
-> Perhaps also change to a chapter reference?

Some refer to full-page images:
"in the illustrations facing pp. 106 and 114"
-> Change text to include illustration titles?


* Table of contents/chapter titles

The Table of Contents of the printed book contains a summary of each
chapter:

https://archive.org/details/scramblesamongst0000unse_l2h0/page/n22/mode/1up

These could be added as bridgeheads if that's interesting, but that
might be a bit much: there is already an epigraph to start each chapter.
I'm planning to leave the ToC descriptions out.

In some cases the chapter title in the table of contents doesn't exactly
match that on the chapter page. I'm planning to just follow the titles
from the chapter pages.

Vince

Aug 14, 2024, 6:13:34 PM
to Standard Ebooks
> * Separate commits for semantics?
>
> While comparing the text against the fifth edition, I've also started adding additional markup (<time>, epub:type, xml:lang etc.), because it seemed easier to do that at the same time.
>
> The text should match the scanned page -- I didn't add or remove any italics or introduce other changes -- so it should be quite easy to create a text+appearance-only version by removing the <span>s and extra attributes. Is this useful to have as separate commits?

As mentioned in the big yellow box at the top of the Step by Step guide, and a few other times throughout, please keep commits to a single unit of work. Changing the text and adding semantics are unrelated things, i.e. at least two units of work.

> * Spelling of place names
>
> The book contains many place names, but not all of them follow current spelling. For example, "Val Tournanche" is currently written as "Valtournenche".
>
> Proposed approach: keep the spelling from the book, but make it consistent (e.g., "Val Tournanche" is most common, but there are a few references to "Valtournanche" that I plan to change).
>
> Perhaps some exceptions where the book is clearly wrong should be changed. For example, the book consistently writes "Rhone" instead of "Rhône", and I think it would be reasonable to change that.

As with other modernized spelling, if the modernizations have the same pronunciation, then they’re generally OK. Consistency within the book is also good, unless the inconsistency is intentional (rare, but it happens). All should be [Editorial], of course.


> * What to do with references to page numbers?
>
> There are quite a few cross-references in the book. What to do with those?

They should be converted to standard links, typically with wording similar to “See here”, with references to the appropriate location; generally it’s a paragraph with an ID, but could be an illustration ID, etc. See SEMoS 5.1.2.
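
As an illustration only, the parenthesized page references could be rewritten mechanically once each page has been mapped to a target by hand. The lookup table, file name, element ID, and exact link wording below are all hypothetical; SEMoS 5.1.2 remains the authority on the final wording.

```python
import re

# Hypothetical lookup table: printed page number -> (file name, element ID).
PAGE_TARGETS = {
    226: ("chapter-11.xhtml", "chapter-11-p-7"),
}

PAGE_REFERENCE = re.compile(r"\(p\. (\d+)\)")


def link_page_references(text: str) -> str:
    """Replace "(p. N)" with a link when the page has a known target."""

    def replace(match: re.Match) -> str:
        page = int(match.group(1))
        if page not in PAGE_TARGETS:
            return match.group(0)  # leave unknown pages for manual review
        file_name, element_id = PAGE_TARGETS[page]
        return f'(see <a href="{file_name}#{element_id}">here</a>)'

    return PAGE_REFERENCE.sub(replace, text)


linked = link_page_references("the glacier of the same name (p. 226)")
print(linked)
```

References embedded in running sentences, such as "at pp. 129-134", would still need rewording by hand.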


> * Table of contents/chapter titles
>
> The Table of Contents of the printed book contains a summary of each chapter:
>
> https://archive.org/details/scramblesamongst0000unse_l2h0/page/n22/mode/1up
>
> These could be added as bridgeheads if that's interesting, but that might be a bit much: there is already an epigraph to start each chapter. I'm planning to leave the ToC descriptions out.
>
> In some cases the chapter title in the table of contents doesn't exactly match that on the chapter page. I'm planning to just follow the titles from the chapter pages.

Correct on both. We generally only include bridgeheads if they’re on the chapters themselves, and we would take the actual chapter titles as canon.

Gijs van Tulder

Aug 17, 2024, 4:50:26 PM
to standar...@googlegroups.com
Hi. I have two questions on words in foreign languages (in this book
often French, sometimes German): when should they be marked with an
xml:lang attribute?

The italics rule is fairly clear: if it appears in MW, it shouldn't be
in italics (with a few exceptions for technical terms etc.).


But when should non-italicized words be marked with xml:lang?

SEMoS 5.3.1 says:
"When words are required to be pronounced in a language other than
English, the xml:lang attribute is used to indicate the IETF language
tag in use."

but how far does this go? Should I liberally mark everything with a
French origin that sounds "French enough"? Or should the rule be the
same as for italics: if it's in MW, it's English enough?

I have this question in two forms: for names (persons/places) and for
normal words.



Q1. For names of persons and places, do I mark every French/German name
with the correct language, or does every name require a decision on how
likely it is to be mispronounced?

My suggestion would be to mark everything, because otherwise it quickly
becomes very subjective. E.g., "La Bérarde" doesn't appear in MW and
should be marked as French, but is it obvious how to pronounce "Rhône"
or "Nîmes" in English because they do appear in MW?

Subquestion: There are some rare special cases where French/German is
mixed with English: e.g., "Valtournanchians", which starts with French
"Valtournanche" but ends in English. How to mark those?



Q2. For normal words: when does a word or phrase require "to be
pronounced in a language other than English"?

See below for a list of words that looked French enough to mark them as
French in my first pass, but which do appear in the online MW dictionary.

* Some words seem quite common in English, such as "bureau" or "chalet"
(which has even lost its French â in MW), while others such as
"aiguille", "coup de théâtre", "en fête", "rognon", or "père" look quite
French to me.

* Some words have a different meaning if pronounced in French ("denier"
and "sol", both for coins, or "diligence" for the stagecoach). Some
words are in the dictionary, but with a different meaning ("cabane",
"voyageur").

What to do with all these French-sounding words? As above, I'd say it's
easier to use xml:lang for almost everything with a French
pronunciation, even if it appears in the dictionary.

Or is there some other rule-of-thumb for this?


Thanks!



French words that appear in MW:

abbé
aiguille
arête
beau ideal
bravos
bureau
cabane (in MW, but with a different meaning; here: cabin/hut)
cabaret
café noir
carte blanche
centime
chalet (châlet in French)
chamois
chasseur
cirque
clientele (clientèle in French)
contretemps
coquette
cordon
couloir
coup de théâtre
crevasse
curé
denier (the coin, so not the standard pronunciation)
diligence (the stagecoach)
éboulement
éclat
en fête
en masse
en route
esprit de corps
fils
massifs
messieurs (marked as xml:lang=fr in some SE books)
monsieur (marked as xml:lang=fr in some SE books)
moraine
névé
père (the combination "père et fils" is not in MW)
physique
plateau
reglement (règlement in French)
résumé
rognon
salle à manger
serac (sérac in French)
sou
sol (the coin)
table d'hôte
toises
voyageur (in MW, but with a different meaning)



Some cases in German:
alpenstock
bergschrund
schrund

Vince

Aug 17, 2024, 5:24:57 PM
to Standard Ebooks
If they’re in M-W, they generally don’t need a tag of any kind. Many of the words may have originated in other languages, e.g. père, but they’ve been used in English long enough that they’re in the dictionary and therefore “English” enough.

We use spans when the word isn’t in M-W, and it’s obvious it wouldn’t be pronounced correctly in English. Proper names don’t usually qualify, because we generally know how to pronounce those, even when they’re in another language.

In short, you definitely should not tag everything, and, given your examples, you shouldn’t tag much of anything.

(SEMoS rules should be followed, regardless of the corpus. E.g., “monsieur” is in M-W (and is an extremely common word), and therefore shouldn’t be tagged. The tagged instances in the corpus could stand to be corrected, but they are irrelevant here since SEMoS has a clear rule for it.)
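
The rule of thumb above can be sketched as a small decision function. This is only an illustration: the function name is hypothetical, the word set is a tiny sample drawn from the list earlier in the thread rather than a real dictionary lookup, and the "obviously mispronounced in English" judgment still has to be made by a person.

```python
def needs_lang_tag(phrase: str, in_merriam_webster: set[str], is_proper_name: bool) -> bool:
    """Return True when a phrase is a candidate for an xml:lang tag."""
    if is_proper_name:
        return False  # proper names don't usually qualify
    # Anything found in the dictionary counts as "English enough".
    return phrase.lower() not in in_merriam_webster


# Small sample of words reported in the thread as appearing in M-W.
sample_dictionary = {"bureau", "chalet", "monsieur", "père", "fils"}

print(needs_lang_tag("monsieur", sample_dictionary, is_proper_name=False))
print(needs_lang_tag("père et fils", sample_dictionary, is_proper_name=False))
print(needs_lang_tag("La Bérarde", sample_dictionary, is_proper_name=True))
```

A phrase that passes this check is only a candidate; it gets a tag when it is also clear that an English reading would mispronounce it.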


--
You received this message because you are subscribed to the Google Groups "Standard Ebooks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to standardebook...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/standardebooks/fbc69bc1-7a87-4225-8695-df4d59972d2d%40gmail.com.

Gijs van Tulder

Aug 23, 2024, 6:16:05 PM
to standar...@googlegroups.com
Hello,

This time I have some table questions. There are several tables in the
book, some more complex than others. I've converted them to XHTML and
would appreciate some feedback.

For each table, I've included a link to the code on GitHub and to an
online demo. See the details below. (Fairly long, sorry.)

About the code and demo:
* There may be more classes and IDs than are strictly necessary. This
can be cleaned up later, but I thought it would be easier to focus on
the structure of the tables first.
* The images and image placement also need more work. They're here as
placeholders.

Thanks!

Gijs



Chapter 3
=========
Page 55
-------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/55/mode/1up

This is a pseudo-table or a formatted paragraph. To make the words line
up nicely, I've created two versions.


Version 1
.........
Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/text/chapter-3.xhtml#L66-L116
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/css/local.css#L177-L229
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-11-v1/html/epub/text/chapter-3.xhtml#chapter-3-table-1

A fairly straightforward version that splits up each line in multiple
columns. No fancy CSS.

Downside: doesn't line up as nicely, and the semantics of the split
sentences are a bit awkward.

Version 2
.........
Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-alt/src/epub/text/chapter-3.xhtml#L66-L114
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-alt/src/epub/css/local.css#L177-L232
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-11-alt/html/epub/text/chapter-3.xhtml#chapter-3-table-1

A fancier alternative using <span class="ditto"> elements to hide
repeated text and show the ditto marks.

This uses more complex CSS, but I suspect it would degrade nicely: worst
case, you don't see the ditto marks, and either see the repeated text or
nothing at all.

Semantically, the table structure is a bit simpler. I prefer this
version, but ereader support might be difficult?



Page 65
-------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/65/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/text/chapter-3.xhtml#L150-L199
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/css/local.css#L231-L260
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-11-v1/html/epub/text/chapter-3.xhtml#chapter-3-table-2

Not too complicated. This could also be a single table with four
columns, but semantically I think two tables makes more sense.



Page 66
-------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/66/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/text/endnotes.xhtml#L168-L315
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/css/local.css#L385-L417
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-notes/html/epub/text/endnotes.xhtml#note-51

These are two separate tables, I would say (manpower and horse-power).
Inside each table, I used a <tbody> with a <th scope="rowgroup"> for
each category.
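
For reference, the row-group structure described here can be sketched as below. This is a generic illustration built with Python's standard ElementTree, not the markup in the repository: the function name, the colspan heading row, and the placeholder labels are all assumptions, and every group is assumed to have at least one row.

```python
import xml.etree.ElementTree as ET


def build_grouped_table(groups: dict[str, list[list[str]]]) -> str:
    """Build a table with one <tbody> per group, headed by a rowgroup <th>."""
    table = ET.Element("table")
    for heading, rows in groups.items():
        tbody = ET.SubElement(table, "tbody")
        heading_row = ET.SubElement(tbody, "tr")
        # The heading cell spans all columns and labels the whole row group.
        th = ET.SubElement(
            heading_row, "th", {"scope": "rowgroup", "colspan": str(len(rows[0]))}
        )
        th.text = heading
        for cells in rows:
            tr = ET.SubElement(tbody, "tr")
            for cell in cells:
                ET.SubElement(tr, "td").text = cell
    return ET.tostring(table, encoding="unicode")


# Placeholder labels only; these are not values from the book's tables.
markup = build_grouped_table(
    {
        "Category A": [["Item 1", "Value 1"], ["Item 2", "Value 2"]],
        "Category B": [["Item 3", "Value 3"]],
    }
)
print(markup)
```

Keeping each category in its own <tbody> lets assistive technology associate the heading with exactly the rows it governs.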



Page 67
-------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/67/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/text/endnotes.xhtml#L322-L451
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/css/local.css#L419-L445
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-notes/html/epub/text/endnotes.xhtml#note-52

Fairly straightforward, I think.

Difficult bit: the missing decimals on the first line. I've used
::before CSS to make the text line up, but maybe it's easier to just add
the zeros?



Page 68
-------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/68/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/text/endnotes.xhtml#L458-L627
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-notes/src/epub/css/local.css#L447-L492
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-notes/html/epub/text/endnotes.xhtml#note-53

A somewhat complicated table. How about this?

Not sure if the brackets are needed, and if so, how best to do that. An
SVG might work? It's difficult to line that up.

(One thing to fix: the ditto marks in the middle don't line up.)



Chapter 11
==========
Page 223
--------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/223/mode/1up

A table with a lot of ditto marks. Similar to the first table in Chapter
3, I've created two versions.

Version 1
.........
Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/text/chapter-11.xhtml#L32-L202
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-v1/src/epub/css/local.css#L262-L303
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-11-v1/html/epub/text/chapter-11.xhtml#noteref-194

Every component split up in a separate column. Simple, but with a
disadvantage: the names are split up.

Version 2
.........
Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-alt/src/epub/text/chapter-11.xhtml#L32-L184
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/demo-table-3-11-alt/src/epub/css/local.css#L265-L299
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-11-alt/html/epub/text/chapter-11.xhtml#noteref-194

Using <span>s to hide the repeated text and overlay a ditto mark. Keeps
the text together, but might be less well supported?
(It should degrade gracefully, I think: if it doesn't work, you just see
the repeated text.)




Appendix D
==========
Page 424
--------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/424/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/main/src/epub/text/appendix-d.xhtml
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/main/src/epub/css/local.css#L331-L377
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-notes-alt/html/epub/text/appendix-d.xhtml

Relatively simple, I think. I'm using a separate <tbody> with a <th
scope="rowgroup"> for each year.



Appendix E
==========
Page 425
--------
Page:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/424/mode/1up

Code:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/main/src/epub/text/appendix-e.xhtml
CSS:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/main/src/epub/css/local.css#L331-L377
Demo:
https://whymper-scrambles-demo.pages.dev/demo-table-3-notes-alt/html/epub/text/appendix-e.xhtml

Very similar to Appendix D. Quite a long table, but I don't think it's
necessary to repeat the table headers: the meaning of each column is
clear from the text.



Vince

Aug 23, 2024, 6:50:19 PM
to Standard Ebooks
In general, the same thing applies to tables as to the entire book as a whole—we don’t use classes where we can easily target with CSS selectors. In the case of tables, that is almost always easily accomplished by targeting a column, which is usually all of the same type of data, and if it isn’t, it’s a good time to examine the table. :) Thus, with rare exceptions, tables don’t need (and shouldn’t have) classes any more than the rest of the book.

If you have specific questions, please ask them, but it would take far too much time to pre-review multiple versions of every table in the book. The overall structure of an HTML table is well-documented in numerous places, and SEMoS 5.7 highlights the specific requirements that we have for them in our productions. (That section also contains references for the HTML standards and web accessibility standards for tables.)


Gijs van Tulder

Aug 23, 2024, 7:30:17 PM
to standar...@googlegroups.com
> If you have specific questions, please ask them, but it would take far
> too much time to pre-review multiple versions of every table in the
> book.

Sure, no problem, then I'll include the tables when I'm also done with
the rest of the book. Since Alex specifically highlighted the tables in
his initial email, I thought you might want an early look at those. But
that's not required. I had seen the documentation and have followed the
accessibility guidelines, so I'll pick the option I think works best and
then we'll see later if there need to be any changes.

Thanks!



On 24-08-2024 00:50, Vince wrote:
> In general, the same thing applies to tables as to the entire book as a
> whole—we don’t use classes where we can easily target with CSS
> selectors. In the case of tables, that is almost always easily
> accomplished by targeting a column, which is usually all of the same
> type of data, and if it isn’t, it’s a good time to examine the table. :)
> Thus, with rare exceptions, tables don’t need (and shouldn’t have)
> classes any more than the rest of the book.
>
> If you have specific questions, please ask them, but it would take far
> too much time to pre-review multiple versions of every table in the
> book. The overall structure of an HTML table is well-documented in
> numerous places, and SEMoS 5.7
> <https://standardebooks.org/manual/1.8.0/single-page#5.7> highlights the
> [...]

David

Aug 24, 2024, 5:54:26 AM
to Standard Ebooks
Just in case it helps, too, Gijs - a recent production had a lot of complex tables (which Vince also helped with).


You could also see how those tables were handled to supplement the SEMoS guidance.

For what it's worth!

David / Fife, UK

Gijs van Tulder

Sep 3, 2024, 6:17:51 PM
to standar...@googlegroups.com
Thanks David and Vince for the table suggestions!

I've been looking at potential covers and so far have found one option
that might work. A large part of the book is centered around the
Matterhorn, so it would be nice if the cover reflects this.

Could this painting work?

The Matterhorn by John Ferguson Weir, 1869
CC0 at the Yale University Art Gallery
https://artgallery.yale.edu/collections/objects/10370

The cover could look like this:


The other high-resolution public-domain Matterhorn picture I came across
so far is the Sunrise on the Matterhorn
<https://standardebooks.org/artworks/albert-bierstadt/sunrise-on-the-matterhorn>
by Albert Bierstadt, but that's already in use and is a bit dark anyway.
Bierstadt has some other Matterhorn-related paintings, but I haven't
been able to find clear public domain versions of those.

Gijs van Tulder

Sep 3, 2024, 6:21:05 PM
to standar...@googlegroups.com
Apologies, it looks like my email program didn't like the inline image.

Second try: see the attachment for an example cover.

The Matterhorn by John Ferguson Weir, 1869
CC0 at the Yale University Art Gallery
https://artgallery.yale.edu/collections/objects/10370





Attachment: cover-weir.jpg

Vince

Sep 3, 2024, 10:04:14 PM
to Standard Ebooks
Looks good to me. I’ll add it to the database.

Gijs van Tulder

Sep 23, 2024, 5:55:17 PM
to standar...@googlegroups.com
Hello,

I've finished proofreading "Scrambles Amongst the Alps". There are a few
loose ends, see below, but after that I think it is ready for review.

Repository:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69

Scans for the text and illustrations:
https://archive.org/details/scramblesamongst0000unse_l2h0

Scans for the maps at the end:
https://archive.org/details/scramblesamongst00whym
(this is a scan of the first edition)


Remarks/questions:

-

1. Fifth edition / Colophon.

Is there a standard formula to include the edition in the colophon? In
this case, the ebook is based on the fifth edition, which has some small
changes and adds a new preface and an extended appendix. If the colophon
only mentions the publication year of the first edition, this might give
the wrong impression.

I couldn't find a standard phrase for this in other ebooks, so for the
time being, I've included the following in the colophon:

> Scrambles Amongst the Alps in the Years 1860–69
> was first published in 1871 by
> Edward Whymper.
> This ebook is based on the fifth edition, published in 1900.

(Changes from the template: added "first" in the first sentence, and
the second sentence is entirely new.)

Does this belong somewhere else?

-

2. Source for scanned pages.

I have used the Internet Archive scans of the fifth edition as the
reference and as the source for the illustrations. However, the fold-out
maps at the end of the book are missing from these scans. I have
taken those from the IA first edition scans (they should be similar enough).

How best to present this in the colophon? I followed the instructions
for books with multiple sources, so the colophon now links to the
"Page scans" section on the SE website, but then almost everything comes
from the main source. Would it be better to link to that from the
colophon and have the maps source only in the content.opf?

-

3. Illustration on the half title page?

The book's title page includes a small illustration:
https://archive.org/details/scramblesamongst0000unse_l2h0/page/n12/mode/1up

The guidelines for the half title page (SEMoS 3.1.10) do not explicitly
exclude images, so I've put the image there for now to make it similar
to the printed book. The frontispiece (which already has a full-page
image) could be an alternative location.

-

4. Long parts in French?

Following SEMoS 8.2.9, I've used italics to mark short French phrases
and individual sentences within an English context. That is clear.

However, there are some longer pieces of text that are entirely in
French (a full section, a few notes that are entirely in French, and a
few long stand-alone quotes). I'm not sure whether these count as a
"phrase" and must be printed in italics, or if they can be shown as
normal text to improve readability. For now, I've marked those as
xml:lang="fr" without italics, but perhaps this should be changed.

-

5. Numbers.

The book is quite number-heavy. SEMoS 8.8.3 suggests that numbers in a
non-mathematical context should be spelled out, which I guess is a good
idea for fiction. For readability (and to prevent unhelpful phrases such
as "an elevation of thirteen-thousand feet, instead of twelve-thousand
eight-hundred and seventy-eight") I considered altitudes etc. to be
sufficiently mathematical and left these numbers as printed in the book.
The book generally has a good balance of spelled-out numbers in
sentences and digits when it wants to be more precise. I see that other
non-fiction ebooks in the collection use a similar approach.

-

6. xml:lang for patois.

The footnote on page 39 contains two paragraphs in the regional dialect.
It sounds a little bit like French, but it isn't. I've marked these with
xml:lang="x-patois" (SEMoS 8.2.9.7: non-English "alien" language)
instead of xml:lang="und" (SEMoS 8.2.9.8: unknown language).
https://archive.org/details/scramblesamongst0000unse_l2h0/page/39/mode/1up

-

7. se tools

On a technical note: the ebook works best with the most recent GitHub
version of the tools. The latest 2.7.1 release has some problems in
lint/clean/build that have since been fixed.

-

Thanks for the feedback!

Gijs

Vince

Sep 23, 2024, 6:28:23 PM9/23/24
to Standard Ebooks
  1. We don’t store the edition anywhere, and I assume we don’t want to, as this situation is quite common in the corpus; the colophon isn’t meant for information like that, and the scans indicate what edition was used. Alex can confirm/deny that.
  2. What you’ve done in the colophon is fine, that’s what the “multiple sources” is for. The colophon doesn't indicate what came from where, and doesn’t need to. If you don’t already, you should add a production note that indicates that, though.
  3. I would have said no a couple of weeks ago, but I believe Alex let something else be on the halftitle page recently, so I’ll leave this for him.
  4. Unless Alex wants to make an exception, the “long parts” should be italicized.
  5. SEMoS 8.8.3 only has them spelled out in certain circumstances, which, e.g., your altitude example does not fall into.
  6. X is specifically for an “alien” language, which patois is not, so it should be either “und” or “fr.” (Is it not French because it’s a dialect? I don’t know if that’s valid; English dialect is still English. But I don’t know it’s not valid, either. Edge cases are always fun.)

--
You received this message because you are subscribed to the Google Groups "Standard Ebooks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to standardebook...@googlegroups.com.

Alex Cabal

Sep 23, 2024, 10:30:24 PM9/23/24
to standar...@googlegroups.com
No image on the half title, you can just cut that image.


David

Sep 24, 2024, 12:17:25 PM9/24/24
to Standard Ebooks
Just a little addendum on #1 (as I brace myself for a review) regarding the edition information.

As Vince says, we definitely don't want this in the colophon file. Two possibilities come to my mind, though: (1) On occasion, edition information is included in the long description, where it is relevant that potential readers should know what text they're getting (here's an example from Henry James: https://standardebooks.org/ebooks/henry-james/the-ambassadors). (2) But if it's not so important that readers should know, then it's the sort of information that could go into the production notes line (https://standardebooks.org/manual/1.8.0/single-page#9.9.4) of content.opf.

But if I'm way off base on either of these suggestions, I'm sure we'll be informed!

David / Fife, UK


Gijs van Tulder

Sep 24, 2024, 6:04:23 PM9/24/24
to standar...@googlegroups.com
On 24-09-2024 18:17, David wrote:
> As Vince says, we definitely don't want this in the colophon file. Two
> possibilities come to my mind, though:

Thanks for the suggestions. I have removed the edition from the
colophon. It just feels a bit odd to only mention 1871 if the edition is
from a much later date. But there's a "Preface to the Fifth Edition", so
I guess it is clear enough.

The edition is mentioned in the production notes.

> (1) On occasion, edition information is included in the long description,

I already had two sentences there. Based on the discussion, it might be
okay to remove those, so I guess we'll see if they survive the review.

Gijs



Gijs van Tulder

Sep 24, 2024, 6:06:52 PM9/24/24
to standar...@googlegroups.com
On 24-09-2024 00:28, Vince wrote:
> ...

Thanks for the clarifications.

1. I removed the edition from the colophon.
2. This was indeed already mentioned in the production notes.
3. Figure removed.
4. Done. Italics added.
5. OK. No changes needed then, I think.
6. I changed the language to French. It's very weird French, but it's
probably closer to French than to an undefined language.

Latest version pushed to Github:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69

Gijs



David

Sep 25, 2024, 4:23:14 AM9/25/24
to Standard Ebooks
I'm not quite sure what stage things are at: with these commits done, is the production now ready for review, or is there still something left to be done?

While I'm at it, I see you've got two versions of each illustration (*.jp2 and *.png, for the most part) in the `/images` folder; it's not clear to me from SEMoS that there is a need for both?

And again on images, a number of the PNG and JPG files in `./src/epub/images/` look like they might be candidates for conversion to SVG — but here it would be good to get input from Alex or another editor with more experience of images than I have!

David / Fife, UK

Gijs van Tulder

Sep 25, 2024, 6:05:02 AM9/25/24
to standar...@googlegroups.com
> I'm not quite sure what stage things are at: with these commits done, is
> the production now ready for review? or is there still something(s) left
> to be done?

I suppose that's for Vince to decide, but I do not have other changes to
make at this point.

We can also discuss the images first, if you prefer.


> While I'm at it, I see you've got two versions of each illustration
> (*.jp2 and *.png, for the most part) in the `/images` folder; it's not
> clear to me from SEMoS
> <https://standardebooks.org/manual/1.8.0/single-page#10.5> that there is
> a need for both?

Thanks for having a first look, good observation. I don't know if both
need to be kept in the final repository. For my process so far, it was
useful to keep both:

* The .jp2 images are crops from the original IA scans. The "real"
originals, so to say.
* The .png images are high-resolution, grayscale, cleaned-up versions.
For example, I removed text surrounding the images, cleaned up
imperfections in the scan etc.

From the high-resolution .png images, I generated the cropped,
downsampled, compressed images in src/epub/images/. If that needs to be
redone (e.g., to reduce image size, convert to other formats, etc.) the
high-res .png would be the best source.

If preferred, I can remove the .pngs from the for-review repository and
keep them in my own copy somewhere.


> And again on images, a number of the PNG and JPG files in
> `./src/epub/images/` look like they might be candidates for conversion
> to SVG — but here it would be good to get input from Alex or another
> editor with more experience of images than I have!

That is a tricky question. I did quite extensive experiments with all
sorts of variations, and the current solution seemed to give the best
results in terms of quality and image size. But I'm open to suggestions.

1. The true line drawings (e.g., the diagrams and simple maps) I've all
converted to SVG, as per the SEMoS instructions.

2. The large maps at the end and one inside the book did not work well
as SVG: if the SVG converter doesn't crash, the file size becomes
enormous, the smaller letters stick together and become unreadable.
Grayscale JPG seemed to give the best balance of readability/file
size/quality.

3. For the other images, I've also tried SVG, but that didn't give good
results:

* Many images contain grayscale hatching, which doesn't look good as an
SVG. The shading either becomes fully black, fully white, or full of
artifacts. That takes away a lot of the charm. (Even in the diagrams I
currently have as SVG you can see artifacts in the simple hatching:
white spots, black spots instead of parallel lines.)

* SVG is not smaller. Converting all images to SVG added about 30% to
the size of individual images and to the size of the final epub.

* Black-on-transparent SVG (or PNG) does not work well in dark mode.
Either the images are inverted, which leads to negative images with
black snow and other weirdness, or they should receive a white
background. This background cannot be added reliably with CSS (for
example, the Apple Books app does not accept background colors, so the
black-on-transparent SVG images become invisible on a dark background).

So overall, other than for the true line drawings which I did convert to
SVG, I think SVG would 1. reduce image quality, 2. increase file size,
3. cause problems in dark mode; while for these images SVG doesn't
provide clear benefits over PNG.

If not SVG, then the choice is between PNG or JPG.

* I converted all rectangular images to grayscale JPG.

* For the non-rectangular images (e.g., the circle-shaped ones), having
a white rectangle background doesn't look very nice in dark mode. For
these images, I used PNG with a white-on-transparent background to match
the outline of the image. The file size isn't much different from a JPG,
but I think it looks a bit nicer.

In summary: I think the current PNG/SVG/JPG combination is the best
compromise, but I'd be happy to be convinced otherwise.





David

Sep 25, 2024, 7:20:07 AM9/25/24
to Standard Ebooks
Thanks, Gijs, that's helpful commentary on the images. Let's get some reaction/confirmation/whatever on the images to this point before I dig in with a review.

It's apparent to me that Alex was a "true prophet" when he said that this would be "a very, very difficult production"! This is an impressive amount of work, and it will be good to see it complete in due course.

D.

Alex Cabal

Sep 25, 2024, 1:02:18 PM9/25/24
to standar...@googlegroups.com
For SVGs:

- we aren't concerned about file size.

- dark mode is also not a concern because each ereader does its own
thing and we cannot cater to every possible case, as some are mutually
exclusive.

- only black-on-transparent images are inverted by our CSS, and you must
be really sure that it's really black-on-transparent. Shades of grey do
not invert well.

- SVGs are converted to PNG for the compatible epub anyway so dark mode
becomes less of a concern since most people are using the compatible epub.

So if any of those points would result in more SVGs, you should go for
it. We have even converted complex maps and dense woodcuts to SVG - see
Ulysses S. Grant memoirs, Alice's Adventures in Wonderland.

Gijs van Tulder

Sep 25, 2024, 2:51:17 PM9/25/24
to standar...@googlegroups.com
Thanks for the clarifications.

On 25-09-2024 19:02, Alex Cabal wrote:
> So if any of those points would result in more SVGs, you should go for
> it.

Clear. In my opinion, the images that are not already SVGs look worse if
converted to SVG, so I would suggest leaving them as-is.



> For SVGs:
>
> - we aren't concerned about file size.

Ah, I thought that might be one of the reasons to prefer SVG.

> - dark mode is also not a concern because each ereader does its own
> thing and we cannot cater to every possible case, as some are mutually
> exclusive.
>
> - SVGs are converted to PNG for the compatible epub anyway so dark mode
> becomes less of a concern since most people are using the compatible
> epub.

I understand not being concerned too much about every nitty-gritty
detail of every possible ereader, but are you aware that also with the
compatible epub, dark mode in the default epub reader on iPad makes all
non-inverted SVG-derived images invisible?

For example, in Alice's Adventures in Wonderland all illustrations
disappear in dark mode, because the generated SVGs are transparent and
Apple doesn't process the CSS background color.

(Feel free to ignore if this is a known limitation.)


> So if any of those points would result in more SVGs, you should go for
> it. We have even converted complex maps and dense woodcuts to SVG - see
> Ulysses S. Grant memoirs, Alice's Adventures in Wonderland.

They look nice, but they have really high-resolution source images.

Alex Cabal

Sep 25, 2024, 2:53:34 PM9/25/24
to standar...@googlegroups.com
On 9/25/24 1:51 PM, Gijs van Tulder wrote:
> I understand not being concerned too much about every nitty-gritty
> detail of every possible ereader, but are you aware that also with the
> compatible epub, dark mode in the default epub reader on iPad makes all
> non-inverted SVG-derived images invisible?
>
> For example, in Alice's Adventures in Wonderland all illustrations
> disappear in dark mode, because the generated SVGs are transparent and
> Apple doesn't process the CSS background color.

They didn't use to, so that's another point for not worrying about it.
We try to do our best and since different ereaders implement mutually
exclusive dark mode functionality we cannot cater to everything with
just one file.

David

Sep 25, 2024, 3:21:49 PM9/25/24
to Standard Ebooks
And I assume that with that exchange, this production is now ready for review.

Given what I have on, it will be the weekend before I can make a start on this. Hopefully I'll have it done on Saturday. Hope that timing's acceptable!

D.

David

Sep 28, 2024, 7:03:13 AM9/28/24
to Standard Ebooks
Okay, I've started the review, and there's a lint error to start with, and the message is: ".png file without transparency. Hint: If an image doesn’t have transparency, it should be saved as a .jpg." That applies to 77 PNG files in the `src/epub/images/` directory (see output with list of images attached).

I can proceed with other aspects of the review, but it would be good to know from Alex whether these 77 PNGs should, in fact, be converted to JPGs, as the lint message instructs. The alternative, I suppose, is to give them transparent backgrounds as PNGs, as it seems most (I think all?) are B&W/greyscale etchings in any case.

(Also, just to note, the repo is 1.1GB, and my wifi choked on it repeatedly when attempting to clone. I plugged into my router with a cable, and it came down fine then.)

D.
whymper-lint-out.md

Alex Cabal

Sep 28, 2024, 12:38:25 PM9/28/24
to standar...@googlegroups.com
Lint is almost always correct, and in this case it is indeed correct.

If you're considering adding transparency to a woodcut-style PNG, then
ask yourself, should this be an SVG instead? Because a woodcut-style PNG
with transparency will have to be inverted in dark mode, and if we're
going through all that effort then an SVG does the same thing but as a
vector.

Also, while I often say file size doesn't matter, 1GB is shocking. How
many 50MB images are there in this book? Are the big ones the SVGs? If
so then try running them through an SVG compressor, there are a ton
online. Sometimes they mess up the actual rendering of the SVGs so you
have to play around with them a little.

Vince Rice

Sep 28, 2024, 1:18:42 PM9/28/24
to standar...@googlegroups.com
Also, I think it got missed in the initial set of questions about the images, but we don’t need both the .jp2 and .jpg in the /images directory. We only need one original, and it should be the .jpg, I believe.


Gijs van Tulder

Sep 28, 2024, 4:01:43 PM9/28/24
to standar...@googlegroups.com
Hi David, Vince, Alex,

Thanks for looking into this. As you predicted, it's a complicated book.

A few remarks on the points raised by David and your subsequent comments.



Repository size:

Alex wrote:
> Also, while I often say file size doesn't matter, 1GB is shocking. How
> many 50MB images are there in this book? Are the big ones the SVGs?
> If so then try running them through an SVG compressor, there are a ton
> online. Sometimes they mess up the actual rendering of the SVGs so you
> have to play around with them a little.

Maybe it's good to clarify that 1GB is *not* the size of the finished
book. It's large, but not *that* large.

In a clean clone, the .git directory is 525MB (on my system at least).
The size of all files (images/ + src/) is 497MB. Of this, the src/
directory is 67MB. The final epub is around the same size. The processed
images have all been optimized and are all under 1.5MB (and usually
smaller).

du -hs * .* --total
431M images
67M src
525M .git
1022M total

The main weight is in the directory with source images, 431MB.

* 291MB are the JPEG2000 crops from the Internet Archive scans. Besides
cropping, I didn't touch (change/optimize) these in any way, so they are
fairly large.

* 127MB is taken by the intermediate PNGs that I mentioned in an earlier
email and which Vince suggested today can be removed.

Removing the intermediate-source PNGs reduces the numbers to:

305M images
67M src
400M .git
770M total

Would this be sufficient, or is this still too large? I think it would
be difficult to get this down further without changing the (quality of
the) source images, but maybe that's acceptable.
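
For anyone who wants to reproduce a breakdown like the one above without du, here is a minimal Python sketch. It is only an illustration: the function names are made up, and it sums apparent file sizes rather than disk blocks, so the totals will differ slightly from what du reports.

```python
import os


def tree_size(root):
    """Return the total size in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that link targets are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_report(repo):
    """Map each top-level entry of repo (including .git) to its size in bytes."""
    report = {}
    for entry in sorted(os.listdir(repo)):
        path = os.path.join(repo, entry)
        if os.path.islink(path):
            continue
        if os.path.isdir(path):
            report[entry] = tree_size(path)
        else:
            report[entry] = os.path.getsize(path)
    return report
```

Calling size_report(".") in a clean clone should give per-directory numbers comparable to the images / src / .git figures listed above.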




se lint / transparency:

On 28-09-2024 13:03, David wrote:
> Okay, I've started the review, and there's a lint error to start with,

There are two separate issues here, I think:
1. the lint errors,
2. the role of the transparency (and if this is a good idea or not).


1. There have been a few changes and bug fixes in the tools since the
last release, so it would be best if you could use the most recent
version from GitHub. (I briefly mentioned this all the way at the end of
my earlier list of points, where it's easy to miss. I probably should
have stated that more clearly.)

This transparency warning is one of those things: there *is*
transparency in those PNG files, but se lint 2.7.1 doesn't recognize
that correctly, because it is transparency of a type it doesn't know. In
the latest git version, there shouldn't be any lint errors (other than
the ones in se-lint-ignore).

You might also notice that se clean makes a breaking change to the CSS,
if you use the 2.7.1 release of the tools. This is also fixed in the
most recent unreleased version on GitHub.


2. It's probably good to address the reason for the transparency in
these images. See also my earlier email on the images, but as I wrote
there: non-rectangular images, such as the circular ones, currently have
a white-on-transparent background to give them a background of the right
shape (so that you don't get a round image in a white rectangle, for
instance). Size-wise, the optimized PNG is not much different from a
JPG, and my reasoning was that for round images a round white background
looked better than one with large white corners.

This is mostly a result of an attempt to make the images look nicer in
dark mode, for which I thought the round background looks better. If the
review consensus is that a white rectangular background is preferred, or
that dark mode doesn't matter at all, then non-transparent JPG would
give the same result.
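
As background on what "transparency of a type it doesn't know" can mean: a PNG can carry transparency either as a full alpha channel (color types 4 and 6) or as a tRNS chunk alongside grayscale, RGB, or palette data. The sketch below is only an illustration in Python of how one could tell the two apart. It is not how se lint works, and the function name is hypothetical. It reports whether transparency information is present, not whether any pixel is actually transparent.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_transparency_kind(data):
    """Return 'alpha-channel', 'tRNS-chunk', or 'none' for raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    kind = "none"
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            # IHDR body: width (4), height (4), bit depth (1), color type (1), ...
            color_type = body[9]
            if color_type in (4, 6):  # grayscale+alpha or RGBA
                return "alpha-channel"
        elif chunk_type == b"tRNS":
            kind = "tRNS-chunk"
        elif chunk_type in (b"IDAT", b"IEND"):
            # tRNS must appear before the image data, so stop here.
            break
        pos += 12 + length
    return kind
```

A checker that looks only for an alpha channel would report "no transparency" for a file that uses a tRNS chunk, which is one way a warning like the one above could appear on files that do have transparency.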


Thanks,

Gijs

David

Sep 29, 2024, 10:27:20 AM9/29/24
to Standard Ebooks
Thanks for those notes, Gijs, which will get picked up in due course. 

I've now completed my initial review, and posted to your repo (https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/issues). I have to confess, this production is far more complex than anything I've undertaken (this relates especially to local.css). There are a LOT of [Editorial] commits that I've flagged as probably NOT [Editorial], but this can often be a case of "editorial in the eye of the beholder". I wouldn't mind (would even be grateful) if one of the more senior editors could cast their eye over my review. Some of it will (I trust!) be uncontroversial.

I haven't done anything further about images as discussed earlier in this thread. Nor was I able to update my toolset to the bleeding edge. :) For me, "most recent version" pretty much means "most recent tagged/stable version". :/

As noted before, you've put a huge amount of work into this production already, so hopefully getting it over the line will not look so onerous.

David / Fife, UK

Gijs van Tulder

Sep 29, 2024, 1:04:07 PM9/29/24
to standar...@googlegroups.com
Hi David,

Thanks a lot for the review. I'll process your comments in the coming days.

With regard to the images: I'll wait for the final answer. I'm fine with
changing them to JPG if that's the preference, but it's slightly more
complex than "not transparent at all": it does change the way they look.

I have indeed tried to be cautious by marking the possibly-editorial
commits as editorial. Basically: anything that changed the appearance of
the text (spelling, but also adding/removing italics). I thought it
might be safer to have a few too many than to miss a few for the review.
I'll wait with renaming/rebasing the ones you listed until the doubtful
cases are confirmed.

There is quite a bit of CSS for the various tables, indeed.

Thanks.

Gijs



Alex Cabal

Sep 29, 2024, 11:24:24 PM9/29/24
to standar...@googlegroups.com
OK, that's a reasonable breakdown of file size. ~50MB is fine for an
ebook of this type.

Yes, you must do what lint says.

Gijs van Tulder

Oct 10, 2024, 8:54:45 PM10/10/24
to standar...@googlegroups.com
Hi David,

I've processed your remarks from the initial review. There's a new
version available on GitHub. Details here:

https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/issues/1

Be aware: it's still a large repository, and I rebased a substantial
part of it, so be careful if you do a git pull.

I look forward to your response when you have time. (No reason to hurry.)

Thanks!

Gijs

David

Oct 11, 2024, 7:51:39 AM10/11/24
to Standard Ebooks
It is good to be able to work at one's own pace! :) Thanks for the "heads-up", and I'll try to get to this soon.

D.

David

Nov 5, 2024, 9:52:17 AM11/5/24
to Standard Ebooks
Apologies for having taken so long over this one (complex project, in addition to a busy time domestically!).

It's time to hand over to you, Alex. There are aspects of this project that are well beyond my competence, but hopefully it's in decent shape as you pick it up.

D.


Alex Cabal

Nov 12, 2024, 1:39:06 PM11/12/24
to standar...@googlegroups.com
OK, I'm working through this as it's a very big project.

The first thing that comes to mind is that we can convert many more of
these JPGs into SVGs. There are a lot in there that would be suitable
for conversion, here are just a few for example in no particular order:

illustration-3
illustration-15
illustration-21
illustration-39
illustration-49

And so on. There are many more in between those as well; I just picked a
few random ones.

There is no hard rule on what is or is not suitable, it's a matter of
eye. Smaller images with little shading are the most suitable. If you do
a conversion and your image viewer struggles to open it then it's
probably not suitable.

Some that are definitely *not* suitable would be:

illustration-1
illustration-10
illustration-29
illustration-51

and so on.

So can you convert more of these to SVG?

The second thing is that the rest of these JPGs are already
postprocessed to be grayscale on a white background. So what we should
do is actually use a color-to-alpha function (GIMP has this function for
example) to change the white background to transparent, then save as a
PNG in grayscale. If you run the result through TinyPNG or a similar
optimizer, the file size is almost the same as JPG.

I've attached a converted illustration-1.png as an example - this PNG is
actually *smaller* than the JPG in the repo, but it has transparency, so
it won't appear as a white rectangle in ereaders whose backgrounds are
not pure white (which is most of them).

illustration-1.png

Alex Cabal

Nov 12, 2024, 1:41:12 PM11/12/24
to standar...@googlegroups.com
Work on the SVGs first and let me know when that's done, and then we can
address PNG conversion after we're certain that we've made SVGs out of
everything we can.

PNG conversion might even be scriptable with imagemagick if it supports
a color-to-alpha function.
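
For grayscale artwork on a pure white background, the color-to-alpha idea reduces to simple arithmetic. The following is a minimal pure-Python sketch of that special case only, not GIMP's or ImageMagick's actual implementation. The function names are made up for illustration, and a real script would read and write the image files with an imaging library.

```python
def white_to_alpha(gray):
    """Map a grayscale value (0 = black, 255 = white) to (value, alpha).

    Every pixel becomes pure black; its darkness is carried by the
    alpha channel, so pure white ends up fully transparent.
    """
    if not 0 <= gray <= 255:
        raise ValueError("gray must be in 0..255")
    return (0, 255 - gray)


def composite_over(value, alpha, background):
    """Alpha-blend one gray+alpha pixel over a solid gray background."""
    # Integer blend with rounding to the nearest value.
    return (value * alpha + background * (255 - alpha) + 127) // 255


def convert_rows(rows):
    """Convert a 2D list of grayscale values to (value, alpha) pairs."""
    return [[white_to_alpha(g) for g in row] for row in rows]
```

Compositing the converted pixels over white reproduces the original grayscale values, while over an off-white page the page color shows through instead of a white rectangle.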

Gijs van Tulder

Nov 16, 2024, 3:45:06 AM11/16/24
to standar...@googlegroups.com
Thanks for looking into this. I went through a number of iterations for
the images; the all-JPEG version was based on your previous instruction,
but I still have the other versions.

It's complicated: I don't think there is a perfect solution, so have a
look at the options below.

-

The problem I found with black-on-transparent-PNG (and SVG by extension)
is that it doesn't work reliably in dark mode:

* Sometimes you don't see anything, because the black foreground is
invisible against a black background (this happens, for example, in the
default epub reader on iOS/iPadOS, which doesn't accept CSS background
colors).

* Or you get large white rectangular backgrounds (a bit similar to what
you saw in your non-pure-white-background reader, but then
white-on-black). This is especially ugly for the circular images.

* Or you would have to invert the images, so that the black becomes
white, but that doesn't work for most drawings. (I now only do this for
the true diagrams.)

-

As a compromise, I previously made a version with white + black +
transparent PNGs. In this version, the image is shown against a white
background that matches the shape of the image (e.g., if the image is a
circle, it's shown on a white circle background). The rest of the
background is transparent, so this avoids the white rectangle effect.

I've pushed this version to GitHub. Maybe you can have a look to see
what you think of this. For example:

https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69/blob/main/src/epub/images/illustration-2.png

(GitHub doesn't show the transparency.)
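
One way such a shape-matched background could be produced is a flood fill from the image border: near-white pixels reachable from the edge become transparent, and everything else (including white areas enclosed by the drawing) stays opaque. This is only a sketch of that idea in plain Python, not necessarily how the images in the repository were made. The function names and the `threshold` value are assumptions for illustration.

```python
from collections import deque


def outer_background_mask(gray, threshold=250):
    """Return a 2D list of booleans: True where a pixel is near-white
    and connected to the image border through near-white pixels.

    `gray` is a 2D list of values 0..255. `threshold` is an assumed
    cut-off for "white enough" and would need tuning per scan.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    outside = [[False] * width for _ in range(height)]
    queue = deque()

    def visit(y, x):
        # Mark a pixel as outer background if it is in bounds,
        # not yet seen, and light enough.
        if (0 <= y < height and 0 <= x < width
                and not outside[y][x] and gray[y][x] >= threshold):
            outside[y][x] = True
            queue.append((y, x))

    # Seed the fill with every pixel on the border.
    for x in range(width):
        visit(0, x)
        visit(height - 1, x)
    for y in range(height):
        visit(y, 0)
        visit(y, width - 1)

    # Breadth-first fill over 4-connected neighbours.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            visit(y + dy, x + dx)
    return outside


def alpha_channel(gray, threshold=250):
    """Alpha values: 0 for outer background, 255 for everything else."""
    mask = outer_background_mask(gray, threshold)
    return [[0 if is_outside else 255 for is_outside in row] for row in mask]
```

With a circular drawing, the white inside the circle stays opaque while the corners of the bounding rectangle become transparent.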

-

Alternatively, the black-on-transparent PNG/SVG combination could work.
I tried those at some point (it is indeed an ImageMagick-based
conversion), so I still have them somewhere. It looks nice on a light
background, but in dark modes it looks awkward at best and doesn't work
at all in some cases.

-

So I think the choice is between:

1. PNG black-on-white-on-transparent: okay-ish in all cases
2. PNG/SVG black-on-transparent: nicest on light backgrounds,
ugly/unusable in dark mode

Let me know what you prefer.

-

> On 11/12/24 12:38 PM, Alex Cabal wrote:
>> transparency, so it won't appear as a white rectangle in ereaders
>> whose backgrounds are not pure white (which is most of them).

Aside: several epub reader apps I've tried will change the white
background to match the background color in those cases, so you don't
see the white rectangle. So even that is not standardized. :)

Alex Cabal

Nov 16, 2024, 2:57:33 PM11/16/24
to standar...@googlegroups.com
The solution for transparent PNGs that wouldn't invert well is
https://standardebooks.org/manual/1.8.1/single-page#7.8.4.1

I don't think I mentioned that earlier.

By default in our ebooks, nothing is inverted. If you add
`se:image.color-depth.black-on-transparent`, then the image will be inverted
in night mode...

But, for images that don't look good inverted, use both
`se:image.color-depth.black-on-transparent se:image.style.realistic` which
will 1) semantically indicate that the image is black-on-transparent but
also in a "realistic" style not suitable for inversion, and 2) in our
default CSS, it will not be inverted but instead will get a background
color in dark mode.

This will not really work in a web browser because the way ereaders and
browsers do dark mode is mutually incompatible. But you can try it on a
tablet ereader like iBooks to make sure it works.

Does that address your concerns?

Alex Cabal

Nov 16, 2024, 3:06:20 PM11/16/24
to standar...@googlegroups.com
Now that I'm reminded of that solution, that could mean that we might
just be able to convert every image to SVG. What do you think?

Gijs van Tulder

Nov 17, 2024, 3:11:42 PM11/17/24
to standar...@googlegroups.com
On 16-11-2024 20:57, Alex Cabal wrote:
> The solution for transparent PNGs that wouldn't invert well is
> https://standardebooks.org/manual/1.8.1/single-page#7.8.4.1

Yes, I came across se:image.color-depth.black-on-transparent and
se:image.style.realistic. I have tried them, and they don't solve the
problem entirely. It depends on what the ereader app does with the
white CSS background. The Apple Books app doesn't like it, some others do.

> But you can try it on a
> tablet ereader like iBooks to make sure it works.

Yes, I tried, and it doesn't work. The problem seems to be that the
default "Books" app from Apple ignores CSS background colors in dark
mode. I tried setting a red background color: this shows up in the
standard light mode, but reverts to a black background in dark mode.

Similar for the white background from se:image.style.realistic: it
doesn't show up, so the black-on-transparent images become invisible as
soon as you enable dark mode.

Someone mentions a similar problem here:
https://discussions.apple.com/thread/254834336

The Apple Books Asset Guide doesn't discuss background colors
specifically, but it does warn against images with black text on
transparent backgrounds, so I don't think it's a bug:
https://help.apple.com/itc/booksassetguide/en.lproj/static.html#itca71ad3c33

A Google search shows up other warnings about epub transparencies (some
apparently add a white background when converting the images with
transparency), but I haven't tested those.

> Does that address your concerns?

Not really, as you see. From the options I've tried so far, the only way
to reliably have a white background is to hard-code it in the image.

So I think the choice is really between something that has the highest
chance of working reasonably well everywhere (image with white
background) and something that looks a bit nicer in some cases but
doesn't work at all in some others (black on transparent background with
CSS to make it white). I don't know what's best.

Alex Cabal

Nov 17, 2024, 4:15:53 PM11/17/24
to standar...@googlegroups.com
You should still follow this pattern because during build, the
compatible epub has all SVGs converted to PNG anyway, and additional
compatibility CSS to make the background white and not `currentColor`.

This pattern is actually already in wide use in the corpus. See for
example Through the Looking Glass. You can compare the advanced (raw
source) epub vs the compatible epub to see the difference.

So since I was reminded of that (it's not every day we do SVGs) I think
we should continue to convert all the images to SVG and use the manual
prescriptions to style them.

Gijs van Tulder

Nov 17, 2024, 7:06:11 PM11/17/24
to standar...@googlegroups.com
On Sun, 17 Nov 2024 at 22:15, Alex Cabal <al...@standardebooks.org> wrote:
> You should still follow this pattern because during build, the
> compatible epub has all SVGs converted to PNG anyway, and additional
> compatibility CSS to make the background white and not `currentColor`.

Good. I have changed the images to either SVG or transparent PNG now, using the se:image.color-depth.black-on-transparent and se:image.style.realistic annotations.

There are a few large, rectangular full-page images that were already JPG and still are. They're not suitable for conversion to SVG, but I could convert those to transparent PNGs if preferred, to treat them the same as the other images.

From a previous email:
> There is no hard rule on what is or is not suitable, it's a matter of
> eye. Smaller images with little shading are the most suitable.

Good. I've now converted those images where the SVG tracing didn't introduce artifacts. The resolution of the source images is fairly low (much lower for instance than those for Through the Looking-Glass), so there are probably a few images that might look simple, but still have enough shading/detailed lines to mess up the SVG conversion. Where the SVG didn't look good, I left the images as (transparent) grayscale PNGs.

Latest version:
https://github.com/gvtulder/edward-whymper_scrambles-amongst-the-alps-in-the-years-1860-69



> This pattern is actually already in wide use in the corpus. See for
> example Through the Looking Glass. You can compare the advanced (raw
> source) epub vs the compatible epub to see the difference.

For what it's worth, the compatibility epub looks like this for me on iPad (both in Apple Books and in the Cantook app):
IMG_0581.png IMG_0582.png

Alex Cabal

Nov 19, 2024, 4:37:01 PM11/19/24
to standar...@googlegroups.com
OK. I've converted a lot more of these to SVG.

I'm looking through the CSS and I see you used counter functions for
appendix G. Those will most definitely not be supported by almost any
ereader, and we don't have a compatibility solution during build. Can
you remove the counter functions?


Gijs van Tulder

Nov 19, 2024, 6:37:10 PM11/19/24
to standar...@googlegroups.com
> I'm looking through the CSS and I see you used counter functions for
> appendix G. Those will most definitely not be supported by almost any
> ereader, and we don't have a compatibility solution during build. Can
> you remove the counter functions?

Done. I've replaced the complex list structure with paragraphs with
hard-coded numbers that match how the paragraphs are numbered in the book.

Alex Cabal

Nov 19, 2024, 10:12:15 PM11/19/24
to standar...@googlegroups.com
OK, I think we're all done! Congratulations on completing this very
advanced and difficult book, you have climbed the Matterhorn of ebooks.
Very good work all around. I've released it, but it will probably take
some hours to build. Thanks for your hard work!

Gijs van Tulder

Nov 20, 2024, 4:24:32 AM11/20/24
to standar...@googlegroups.com
Excellent, thanks. Thanks also to Vince and David for the reviewing!