Hyphenation, now in Standard Ebooks (even Kindle!)


Alex Cabal

Feb 18, 2016, 1:41:42 AM
to Standard Ebooks
A perennial problem in today's ereading systems is hyphenation of words between two lines.

Some ereaders do it fairly well; Nook, despite being a generally terrible ereader, has had automatic hyphenation for quite some time.  Ereading apps like FBReader also do hyphenation fairly well.

Kindle, the 800-pound gorilla of ereaders, didn't do it at all until very recently with their "enhanced typography" update.  But unfortunately for us ebook producers, Kindle's enhanced typography isn't automatic; that is, Kindle doesn't use a built-in hyphenation dictionary to hyphenate ebooks on the fly.  While this would be the obvious approach to things, resulting in all of Amazon's back catalog being instantly hyphen-compatible and making life easy for ebook developers, as we all know Amazon's more interested in milking the ebook cow than it is in putting any kind of effort into the reading or development experiences.

The good news is that it's possible to take advantage of Kindle's new hyphenation support as an independent ebook developer.  While Amazon, intent on keeping developers miserable, declined to explain how their new hyphenation engine works, it actually appears to just be support for the Unicode soft hyphen character (U+00AD).  If Kindle encounters a soft hyphen near a line break, it considers it a hyphenation opportunity.
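
To make the soft hyphen concrete, here is a small Python illustration. It only shows that U+00AD is an ordinary, normally invisible character sitting inside the text; the break points are picked by hand, and nothing here models how Kindle actually decides where to break a line.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN (U+00AD)

plain = "hyphenation"
# Break points chosen by hand, for illustration only.
hinted = SOFT_HYPHEN.join(["hy", "phen", "a", "tion"])

# The hinted word carries three extra characters that usually render as nothing.
print(len(plain), len(hinted))

# Stripping the hints gives back the original word.
print(hinted.replace(SOFT_HYPHEN, "") == plain)

# In UTF-8, each soft hyphen is stored as the two bytes C2 AD.
print(hinted.encode("utf-8"))
```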

That means all ebook developers have to do is manually insert soft hyphens into all of their ebooks, by hand, at every possible word break point.  How convenient.  Thanks, Amazon!

Lucky for us, there's a Python library for hyphenation, Hyphenator, that we can take advantage of.  I've put together a new script for the Standard Ebooks production tools, hyphenate <https://github.com/standardebooks/tools/blob/master/hyphenate>, that uses the Hyphenator library to place a soft hyphen at every syllable break for every word in an XHTML file.  Do this for every XHTML file in an epub, and convert it to azw3 with Calibre's ebook-convert, and all of a sudden you have a hyphenated Kindle ebook.  And it looks pretty good, too!
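
A minimal sketch of the idea, not the actual hyphenate script: walk the body of an XHTML file and put U+00AD at each break point inside every word. The `find_breaks` callable is a hypothetical stand-in for a real hyphenation dictionary, and the toy version below simply breaks after every third letter, which is not real syllabification. All function names here are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

SOFT_HYPHEN = "\u00ad"
XHTML_NS = "http://www.w3.org/1999/xhtml"
WORD = re.compile(r"[^\W\d_]+")  # runs of letters

# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def toy_breaks(word):
    """Hypothetical stand-in for a hyphenation dictionary: break every 3 letters,
    always leaving at least 3 letters after the last break."""
    return list(range(3, len(word) - 2, 3))


def hyphenate_word(word, find_breaks=toy_breaks):
    """Join the pieces of `word` with soft hyphens at the given break positions."""
    pieces, last = [], 0
    for pos in find_breaks(word):
        pieces.append(word[last:pos])
        last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


def hyphenate_text(text, find_breaks=toy_breaks):
    """Hyphenate every word in a plain-text string."""
    return WORD.sub(lambda m: hyphenate_word(m.group(0), find_breaks), text)


def hyphenate_xhtml(source, find_breaks=toy_breaks):
    """Return the XHTML document with soft hyphens added to text inside <body>."""
    root = ET.fromstring(source)
    body = root.find("{%s}body" % XHTML_NS)
    if body is None:
        body = root
    for el in body.iter():
        if el.text:
            el.text = hyphenate_text(el.text, find_breaks)
        # The body's own tail lies outside the body, so leave it alone.
        if el is not body and el.tail:
            el.tail = hyphenate_text(el.tail, find_breaks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Sample</title></head>'
        "<body><p>Hyphenation <em>matters</em> today.</p></body></html>"
    )
    # repr() makes the otherwise invisible soft hyphens show up as \xad.
    print(repr(hyphenate_xhtml(doc)))
```

Swapping `toy_breaks` for a function backed by a real hyphenation dictionary is the only change needed to get sensible break points; the tree walk stays the same.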

We've integrated hyphenate into our build process and left soft hyphens in our compatible epub2 files, so now every Standard Ebook, including Kindle files, should support hyphenation whenever the reading system does.  As part of this update, we've moved on from generating .mobi files to generating Amazon's newest .azw3 format.  Our entire catalog has been updated with these features.

It's just another little detail that we think true book lovers deserve in their ebooks.

Ian

Jul 30, 2020, 7:08:24 PM
to Standard Ebooks
While reading Middlemarch, I found that the frequency of the hyphenation made the reading experience much more difficult. Please see the attached example where 5 lines in a row are hyphenated.

[Attachment: IMG_3225.jpg]

There doesn't seem to be any way to disable this on the Kindle (for sideloaded ebooks at least), so I eventually just disabled auto-hyphenation in the azw3 using a calibre plugin. I just wanted to check: is this working as expected?

Kind regards,

Ian

Alex Cabal

Jul 30, 2020, 7:10:33 PM
to standar...@googlegroups.com
You can probably improve that by using left-aligned text instead of
justified. As usual, how the ereader decides to hyphenate is largely out
of our control. We add soft hyphens for Kindle, but those are only hints
and ultimately it's up to the ereader to apply a nice hyphenation algorithm.


Ian

Jul 30, 2020, 7:16:13 PM
to Standard Ebooks
As I suspected, thanks. The option to switch to left-aligned is greyed out for all books not purchased from Amazon.

Ian

Alex Cabal

Jul 30, 2020, 7:16:57 PM
to standar...@googlegroups.com
Time to find a better ereader :)

Vince

Jul 30, 2020, 7:29:05 PM
to Standard Ebooks
Did you load the azw3 file directly, or convert it to mobi first? I don’t have a Kindle, and I may be mis-remembering, but I want to say someone else ran into this in the past and it might be a file format thing, i.e. mobi doesn’t support it.

If you loaded the azw3 directly, then I’m mis-remembering. :)

Ian

Jul 30, 2020, 7:44:37 PM
to Standard Ebooks
This was azw3 transferred by USB, but I've also checked examples of mobi transferred by email, which also don't have the option to left align (but do not have this auto-hyphenation issue).

I'll stick with the calibre plugin to remove the hyphenation in future. Some people may prefer the hyphenation, but there were too many hyphens for me.
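
For anyone who would rather strip the hints without the plugin, here is a rough Python sketch of the same idea. It is an illustration under several assumptions, not what the calibre plugin does: it works on the epub (a zip archive) rather than the azw3, it only removes literal U+00AD characters from UTF-8 files ending in .xhtml or .html, and `strip_soft_hyphens_from_epub` is a made-up name.

```python
import io
import zipfile

SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens_from_epub(epub_bytes):
    """Return a copy of the epub with literal U+00AD removed from (X)HTML files."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename.lower().endswith((".xhtml", ".html")):
                # Assumption: content documents are UTF-8 encoded.
                text = data.decode("utf-8")
                data = text.replace(SOFT_HYPHEN, "").encode("utf-8")
            # Keep the original entry order and per-entry compression, so an
            # uncompressed "mimetype" entry at the front stays that way.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            zout.writestr(new_info, data)
    return out.getvalue()


if __name__ == "__main__":
    # Build a tiny epub-like archive in memory just to exercise the function.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip")
        z.writestr("text/chapter-1.xhtml", "<p>hy\u00adphen\u00ada\u00adtion</p>",
                   compress_type=zipfile.ZIP_DEFLATED)
    cleaned = strip_soft_hyphens_from_epub(buf.getvalue())
    with zipfile.ZipFile(io.BytesIO(cleaned)) as z:
        print(z.read("text/chapter-1.xhtml").decode("utf-8"))
```

The cleaned epub could then be converted as before, for example with Calibre's ebook-convert.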

Regarding a better e-reader, I've been keeping an eye out for a good deal on a Kobo Forma, but aside from these reasonably small issues, the Paperwhite has served me well.

Matt Chan

Jul 30, 2020, 8:02:14 PM
to standar...@googlegroups.com
There's an option in Calibre when converting from epub to mobi to change the justification! That's what I do anyway.
