text metadata

Francisco Treacy

Sep 12, 2010, 8:48:18 AM

to zhook...@googlegroups.com

Bringing our discussion from Twitter, our problem is how to deal with
text metadata such as footnotes and glossaries.

We are already displaying those as contextual help bubbles, with the
help of a microformat and some javascript.

In the case of footnotes, although not ideal, the content can be
inlined because each note appears only once. That does not work for
glossaries, though: a glossary works like a dictionary, where the
definition of a given term may need to be fetched many times.

Ideally we would want <article>s that hold the glossary or footnote
listings. Is there a way of making them non-linear? How do we
differentiate them from a normal chapter, to tell the reading system
that this is metadata that might not be displayed?

Generalizing that question, how can data common to the whole book
(like a header or footer) be included in zhook?

--
Francisco Treacy
http://widescript.com

Joseph Pearson

Sep 19, 2010, 8:47:56 PM

to zhook...@googlegroups.com

I think this is an interesting question about the division of
responsibilities between the format and the reading systems. The view
implied in the Zhook spec is that the reading system has greater
responsibilities, and the format has less. This is one of the
important ways that Zhook differs from EPUB.

So, a book designer should not be worrying so much "how will this data
display?" as considering "what mark-up best describes this data?" That
is, what is the most concise and meaningful way to mark up this
content?

The question of linearity highlights this. As you say, some content is
pan-textual, but isn't metadata. EPUB has a concept of spine elements
with an attribute of linear="no" for these — the cover, the table of
contents, list of illustrations, glossary, etc.

It's worth observing that EPUB reading system support for this
attribute is lukewarm at best.

What I'd suggest is that different reading systems may reasonably have
different approaches to linearity. Some basic reading systems will
just present everything linearly anyway. Others may have special
support for displaying a table of contents — therefore, if they see a
table of contents in the DOM, they'll remove it in order to display it
their own way. Same goes for covers and glossaries.

Rather than saying "this is non-linear", say "this is a glossary" and
let the reading system decide what to do with it.
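
To make that concrete, here is a minimal Python sketch of how two
reading systems could treat the same markup differently. Everything in
it is an assumption for illustration: the sample document, the "cover"
and "glossary" class names, and the `partition` helper are hypothetical
and are not part of the Zhook spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; the class names are illustrative only.
SAMPLE = """
<body>
  <article class="cover"><h1>Title</h1></article>
  <article><h1>Chapter 1</h1></article>
  <article><h1>Chapter 2</h1></article>
  <article class="glossary"><h1>Glossary</h1></article>
</body>
"""

def partition(xhtml, supported):
    """Split articles into the linear flow and specially handled kinds.

    `supported` is the set of class names this (hypothetical) reading
    system knows how to display in its own way.
    """
    root = ET.fromstring(xhtml)
    linear, special = [], {}
    for article in root.iter("article"):
        classes = (article.get("class") or "").split()
        # First class name the reading system recognises, if any.
        kind = next((c for c in classes if c in supported), None)
        if kind is None:
            linear.append(article)  # unknown content stays in the flow
        else:
            special.setdefault(kind, []).append(article)
    return linear, special

# A basic reading system supports nothing and shows everything linearly.
basic_linear, basic_special = partition(SAMPLE, set())

# A richer one pulls the glossary out to display it its own way.
rich_linear, rich_special = partition(SAMPLE, {"glossary"})
```

The markup is the same in both cases; only the set of supported kinds
changes. The sketch assumes articles are not nested.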

Now, how you say "this is a glossary" is a good question. I think you
find or invent a microformat for that. What are you doing at the
moment? If you like, put up a simple microformat proposal for
glossaries and we can evolve it on this list. Once there's consensus —
and given the present traffic on this list, that won't take long — we
can put it up as a Page here
(http://groups.google.com/group/zhook-spec/web) and continue to update
it.

I think these microformat specs are ancillary to the Zhook spec
itself, can be a lot more fluid and bend to the will of common
implementations. At the end of the day, if the book designer describes
the content of the book using fairly conventional and complete
mark-up, they can have confidence that good reading systems will
display it as well as they can. This approach to ebook design is a lot
more future-compatible than present approaches, I think.

I'm not clear on your generalisation of the question though. I think
that headers and footers (at least in the old printed-book sense of
running heads) are just metadata, which is different from pan-textual
content.

But as I think you're suggesting, HTML5 offers <header> and <footer>
elements. Given that different kinds of this content typically occupy
a particular position at the start or end of a linear representation
of a book (ie, a printed book), you could use these elements inside
<article> elements as part of the microformat. Eg:

<article>
<footer class="glossary">
<h1>Glossary</h1>
<dl>
...
</dl>
</footer>
</article>

That is, since a glossary typically appears at the end of a book, you
could wrap it in a footer element. Then again, that's a fairly
print-centric idea. The article element is useful for
componentisation, but seems to be doubling up with the footer element
here. Maybe <article class="glossary"> is more succinct and useful.
Maybe HTML5 offers a less ambiguous attribute than class?

What do you think?

- J

--
Joseph Pearson | software inventor | inventivelabs.com.au | +61394163198

Francisco Treacy

Sep 22, 2010, 9:46:30 AM

to zhook...@googlegroups.com

Joseph,

Thank you, your answer is spot-on.

Some thoughts and questions inline:

> Rather than saying "this is non-linear", say "this is a glossary" and
> let the reading system decide what to do with it.

Totally.

> I think these microformat specs are ancillary to the Zhook spec
> itself, can be a lot more fluid and bend to the will of common
> implementations. At the end of the day, if the book designer describes
> the content of the book using fairly conventional and complete
> mark-up, they can have confidence that good reading systems will
> display it as well as they can. This approach to ebook design is a lot
> more future-compatible than present approaches, I think.

I know what you mean and I agree with that approach, however, I don't
see where to draw the line between spec and ancillary microformats. It
is clear that the TOC can be derived from structure - and that is
fabulous. But, for example, why should the cover be part of the spec
and the glossary not? Will the microformats effectively be something
more than a mere 'recommendation'?

EPUB is a can of worms: our reading system is currently "guessing" a
crazy number of things. Try this, otherwise that, else something else.
I certainly don't want to be "guessing" how a certain publisher has
defined glossaries in its zhooks. So I am wondering how to keep a
standard as clean and concise as possible without leading to confusion
when it comes to authoring.

> I'm not clear on your generalisation of the question though. I think
> that headers and footers (at least in the old printed-book sense of
> running heads) are just metadata, which is different from pan-textual
> content.

Never mind for now. I actually had a false need, but I'll shout if it
eventually becomes relevant again.

> simple microformat proposal

Terms are defined in any number of definition lists that are direct
children of an article with class "glossary", i.e.
"//article[@class='glossary']/dl".

How does that sound?
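
For illustration, a minimal Python sketch of how a reading system might
apply that selector and build a term lookup from it. The sample markup
and the `glossary_terms` helper are hypothetical; only the selector
itself comes from the proposal above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the markup and terms are illustrative.
SAMPLE = """
<html>
  <body>
    <article>
      <h1>Chapter 1</h1>
      <dl><dt>not-a-term</dt><dd>Outside any glossary.</dd></dl>
    </article>
    <article class="glossary">
      <h1>Glossary</h1>
      <dl>
        <dt>RS</dt><dd>Reading System</dd>
        <dt>TOC</dt><dd>Table of Contents</dd>
      </dl>
    </article>
  </body>
</html>
"""

def glossary_terms(xhtml):
    """Return {term: definition} for every matching glossary list."""
    root = ET.fromstring(xhtml)
    terms = {}
    # ElementTree's limited XPath covers the proposed selector directly.
    for dl in root.findall(".//article[@class='glossary']/dl"):
        current = None
        for child in dl:
            if child.tag == "dt":
                current = "".join(child.itertext()).strip()
            elif child.tag == "dd" and current is not None:
                terms[current] = "".join(child.itertext()).strip()
    return terms

terms = glossary_terms(SAMPLE)
```

Because the result is a plain dictionary, a term can be looked up as
many times as it occurs in the text. Note that `@class='glossary'` is
an exact string match, so an article with class "glossary appendix"
would not be found by this selector.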

Francisco

Joseph Pearson

Sep 24, 2010, 4:15:05 AM

to zhook...@googlegroups.com

You're right, it's been niggling me too: the difference between the
core spec and ancillary conventions is a tricky one. On one hand, we
want to keep the door open to reading system innovation, and avoid the
spec cruft that comes with any attempt at a completist format. On the
other hand, like you say, there has to be some solid ground for book
designers to know what to do and reading system developers to know
what to expect.

Here's the bit of the spec I think we should review:

> The RS should document what HTML microformats it supports for interaction
> (such as footnotes, maps, table navigation, image zooming, video, etc).
> The RS may use a documented microformat for the Table of Contents, if it
> identifies it in the Index. Otherwise, it should derive the Table of
> Contents from HTML elements, using the practical algorithm for HTML5[1].

In its place, how about we provide a set of links to "convention
documents" that describe a particular type of information and how it
should be marked up so that it is recognised by conforming Reading
Systems?

Candidates for these conventions include:

* Cover
* Table of Contents
* Table of Figures/Illustrations/Tables/etc
* Glossary
* Endnotes/Footnotes
* Index (although I have reservations about this)
* Acknowledgements
* etc — it's open-ended

Maybe we can order these links from most-strongly to least-strongly
recommended. A final link in the list could track emerging and
experimental conventions. The stipulation would be that Reading
Systems must indicate which of the listed conventions they support.

It would be reasonable for any Reading System to support none of the
listed conventions. Similarly a conforming Zhook need not include any
of the listed conventions.

The cover question is an interesting one. Having a particular image
file at a particular location in the archive is very useful —
certainly way easier than with EPUB, as I expect you've noticed. But
PNG is not always the ideal format. And we've been toying with ideas
around animated covers using CSS3 or SVG, which don't at present fit
into the Zhook spec. So to kind of reverse what you're saying, I think
we should move the stringent cover requirements into a slightly more
flexible convention. That's possibly a discussion for a separate
thread.

In terms of the proposed glossary convention, I like the simplicity.
My concern is just that it doesn't leave a lot of opportunity for the
sorts of markup that designers like to use for presentational
purposes. As much as we like semantic purity, we have to bow to the
possibility that someone might want to put the <dl> in a surrounding
<div>, for instance. Maybe some additional classnames are required?
Maybe just "glossary:list" on the <dl>?
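
As a rough Python sketch of that looser variant: the list is found by
its own class name anywhere inside the glossary article, so wrapper
elements do not matter. The sample markup (including the "two-column"
and "fancy" classes) and the helper names are made up for illustration,
and "glossary:list" is only the tentative name floated above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <dl> sits inside a presentational <div>.
SAMPLE = """
<article class="glossary">
  <h1>Glossary</h1>
  <div class="two-column">
    <dl class="glossary:list fancy">
      <dt>RS</dt><dd>Reading System</dd>
    </dl>
  </div>
  <dl><dt>ignored</dt><dd>No glossary:list class here.</dd></dl>
</article>
"""

def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in (element.get("class") or "").split()

def glossary_lists(xhtml):
    """Find <dl class="glossary:list"> at any depth in a glossary article."""
    root = ET.fromstring(xhtml)
    lists = []
    for article in root.iter("article"):
        if has_class(article, "glossary"):
            lists.extend(
                dl for dl in article.iter("dl")
                if has_class(dl, "glossary:list")
            )
    return lists

lists = glossary_lists(SAMPLE)
```

Matching on class tokens rather than the whole attribute also lets
designers add their own presentational class names next to the
convention's one. The sketch assumes glossary articles are not nested
inside each other.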

- J

Francisco Treacy

Sep 26, 2010, 5:58:40 PM

to zhook...@googlegroups.com

On Fri, Sep 24, 2010 at 10:15 AM, Joseph Pearson
<jos...@inventivelabs.com.au> wrote:
> Here's the bit of the spec I think we should review:
>
>> The RS should document what HTML microformats it supports for interaction
>> (such as footnotes, maps, table navigation, image zooming, video, etc).
>> The RS may use a documented microformat for the Table of Contents, if it
>> identifies it in the Index. Otherwise, it should derive the Table of
>> Contents from HTML elements, using the practical algorithm for HTML5[1].
>
> In its place, how about we provide a set of links to "convention
> documents" that describe a particular type of information and how it
> should be marked up so that it is recognised by conforming Reading
> Systems?

Yea. I agree. So then the zhook spec will describe what is set in
stone. The rest will be pointers to optional features that may or may
not be supported by RS or e-books, in the form of "convention
documents" as you suggest.

That works so long as these are kept very simple yet flexible and,
most importantly, completely unambiguous. They should also follow
Don't Repeat Yourself.

> Candidates for these conventions include:
>
> * Cover
> * Table of Contents
> * Table of Figures/Illustrations/Tables/etc
> * Glossary
> * Endnotes/Footnotes
> * Index (although I have reservations about this)
> * Acknowledgements
> * etc — it's open-ended
>
> Maybe we can order these links from most-strongly to least-strongly
> recommended. A final link in the list could track emerging and
> experimental conventions. The stipulation would be that Reading
> Systems must indicate which of the listed conventions they support.

Sure. We could start with two or three as our needs arise. At the
moment we only have interest in Cover, Table of Contents, Table of
Figures and Glossary.

Should we keep these in gists or in some wiki-like groups pages? Let
me know how I can help.

> It would be reasonable for any Reading System to support none of the
> listed conventions. Similarly a conforming Zhook need not include any
> of the listed conventions.

Exactly.

> The cover question is an interesting one. Having a particular image
> file at a particular location in the archive is very useful —
> certainly way easier than with EPUB, as I expect you've noticed.

I did, I did!

> But PNG is not always the ideal format. And we've been toying with ideas
> around animated covers using CSS3 or SVG, which don't at present fit
> into the Zhook spec. So to kind of reverse what you're saying, I think
> we should move the stringent cover requirements into a slightly more
> flexible convention. That's possibly a discussion for a separate
> thread.

I don't know what to say; for the moment PNG works fine for us. But I
get your point. How about drafting this in the Cover Convention?

>
> In terms of the proposed glossary convention, I like the simplicity.
> My concern is just that it doesn't leave a lot of opportunity for the
> sorts of markup that designers like to use for presentational
> purposes. As much as we like semantic purity, we have to bow to the
> possibility that someone might want to put the <dl> in a surrounding
> <div>, for instance. Maybe some additional classnames are required?
> Maybe just "glossary:list" on the <dl>?

Perfect, let's go for that.

Francisco

Joseph Pearson

Sep 27, 2010, 1:35:52 AM

to zhook...@googlegroups.com

I was thinking Gists — easy to fork and version them while we're
discussing them. Virginia's preparing one for footnotes, I believe.
And I'll do one for covers when I have a moment, trying to keep the
simplicity of "cover.png" but allow some more advanced stuff. The
Glossary doc I'll leave in your hands. :)

Agree with DRY and unambiguous, would add "short as possible" and
"consistent with each other". After we have one or two, hopefully a
pattern will emerge that we can follow for the others.

Cheers,

- J

Francisco Treacy

Oct 8, 2010, 6:05:21 AM

to zhook...@googlegroups.com

Sorry, I got sidetracked.

Were you able to put something together? I'd love to see an example
for inspiration to draft the glossary recommendation - that can be
done real quick.

By the way, gists are fine, but they are ultimately a git repo. Why not
create a proper repository (under your name if you want, or maybe we
can create an organization 'zhook')?

Then people can start forking, e.g., http://github.com/zhook/zhook-spec,
while that one remains the blessed (reference) repo.

What do you think? How should we proceed?

Cheers,

Francisco


Joseph Pearson

Oct 9, 2010, 11:46:42 AM

to zhook...@googlegroups.com

That's a great idea — I'll do that as soon as I get some clear time.

Cheers,

- J

Francisco Treacy

Nov 17, 2010, 5:19:13 AM

to zhook...@googlegroups.com

On Sat, Oct 9, 2010 at 5:46 PM, Joseph Pearson
<jos...@inventivelabs.com.au> wrote:
> That's a great idea — I'll do that as soon as I get some clear time.

I've done that for us (created an Organization on Github):
https://github.com/zhook

I have no particular interest in being an owner. I will add you too.

The gists' history has been merged in, and I created a *very* quick
initial suggestion for glossaries at conventions/glossary.md.

If you have some time, let me know what you think, and whether you
have other conventions to contribute.

Francisco

PS: would it make sense to keep the discussion on this list, or move it
to the "issues" functionality on the GitHub project?
