Manifests or IIIF collections for Wikimedia categories ?

James Heald

Sep 23, 2020, 5:28:52 AM
to iiif-d...@googlegroups.com
In the IIIF and Wikimedia zoom call on Monday, Kat Thornton mentioned
how ScienceStories creates a manifest from a WikiCommons category, eg
http://www.sciencestories.io/Q34091?moment=2
from
https://commons.wikimedia.org/wiki/Category:Mae_Jemison

and I think somebody asked "but should that be a manifest or a collection?"

Asking just as a private individual (with no affiliation to the WM-SE
GLAM liaison team, or the WMF platform/API team): if MediaWiki were
to have a sidebar link on each article page or category page, to
download a IIIF object for the media associated with that page or
category, what would be the pros and cons of that object being a IIIF
collection versus a IIIF manifest?

Let's assume that code would exist to create a manifest for each
individual image, that would pull together information from its Commons
file-description page, from any associated statements in the new
Structured Data for Commons system, and any relevant information from
any Wikidata item for whatever underlying thing the image represents.
This manifest for the image would presumably be generated or
re-generated as needed, and would be potentially cached until any change
was registered in the underlying data. The manifest could be available
eg from a sidebar link on the file-description page (and would have a
predictable URL that could be linked from the Structured Data query system).
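
As a rough illustration of the per-image manifest described above, here is a minimal Python sketch of a IIIF Presentation 3.0 Manifest holding a single image. The URL layout (`base`, `/manifest`, `/canvas/1`) and the function name are assumptions made for the example, not anything Commons provides today; a real manifest would also carry metadata drawn from the file-description page and Structured Data.

```python
def image_manifest(base, file_name, image_url, width, height, label):
    """Build a minimal Presentation 3.0 Manifest with one Canvas and one image.

    `base` and the URL pattern below are hypothetical; only the JSON
    structure follows the Presentation 3.0 specification.
    """
    manifest_id = f"{base}/{file_name}/manifest"
    canvas_id = f"{base}/{file_name}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    # The image itself; format is assumed to be JPEG here.
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    # The annotation paints the image onto the canvas.
                    "target": canvas_id,
                }],
            }],
        }],
    }
```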

Additionally, for a particular work such as a book or a manuscript, one
might want to create a multi-image manifest choosing a single 'best'
image for each page, that could be linked to separately from the
Wikidata item for the work. This might not correspond 1:1 with the
contents of the category, because categories may also contain eg
duplicate images, crops and other derivatives, or any manner of random
other stuff.


So: when it comes to a category like
https://commons.wikimedia.org/wiki/Category:Mae_Jemison
would the general preference be for a single manifest, or for a IIIF
collection linking to the individual manifests for each image?

As I understand it, because the category does not (necessarily)
represent a single work, that might tip the scale towards going for a
IIIF collection.

But on the other hand, from a practical point of view, does one want
one's application to have to load in the collection, and then have to
request a huge further number of separate different mini-files, one for
each manifest for each individual image in the collection?

Or, perhaps, there is a way to bundle a IIIF collection together with
the manifests in it (or, perhaps, the first 'n' of them), so that only
one request is needed?


As a second thing, most typical Commons categories might contain perhaps
between 5 and 300 images. But there are some maintenance and catch-all
categories which can get really rather big - eg the catch-all category
to gather together all images from the DPLA,
Category:Media contributed by the Digital Public Library of America
currently contains 734,000 files.

The MediaWiki APIs protect themselves from being crushed by categories
like this by only giving information on 500 items at a time. (It may be
possible to up this to 5000 or 20000, but certainly not 734,000).
Information on the next 500 then needs a continuation request.
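
To make the continuation mechanism concrete, here is a minimal Python sketch of paging through a category's files with the MediaWiki Action API (`list=categorymembers`, following the `continue` block returned with each batch). The `fetch` argument is an assumption of the sketch: any caller-supplied function that sends the parameters to the API endpoint and returns the decoded JSON.

```python
API_URL = "https://commons.wikimedia.org/w/api.php"  # endpoint a real `fetch` would call


def category_member_pages(fetch, category, limit=500, max_pages=None):
    """Yield successive batches of file members of `category`.

    `fetch(params)` is a hypothetical callable supplied by the caller; it
    should perform the HTTP request and return the parsed JSON response.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",
        "cmlimit": str(limit),
        "format": "json",
    }
    served = 0
    while max_pages is None or served < max_pages:
        data = fetch(params)
        yield data["query"]["categorymembers"]
        served += 1
        if "continue" not in data:
            break  # no further batches
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Passing `max_pages` lets a caller stop after the first few batches rather than walking all 734,000 files.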

Has IIIF developed preferred ways to deal with VeryBig(tm) flat
collections, eg by paging through parts of them, so that an application
doesn't try to acquire the whole thing unless it really really really
needs to?

As I said, I'm not affiliated with the official WM teams working on
this, this was just something that somebody referred to in passing in
the call, that I was curious to know more about -- and I hope this was
an appropriate forum to bring such a question to.

Thanks very much,

James Heald.

Andrew Hankinson

Sep 23, 2020, 6:37:55 AM
to iiif-d...@googlegroups.com
Hi James,

My initial impression from your description is that it should be a Collection. To speak to your questions:

> But on the other hand, from a practical point of view, does one want one's application to have to load in the collection, and then have to request a huge further number of separate different mini-files, one for each manifest for each individual image in the collection?

I think the idea would be that your users would be presented with a list of potential things in the collection, and it's only when they choose the thing that the request is sent off to fetch the manifest. You can place a label and thumbnail for each manifest in your Collection so that users can have identifying information shown to allow them to make that selection. So unless I'm missing something, you would not necessarily need to automatically request any further files after loading the collection, unless you wanted to display something that wasn't available in the Collection.

> Or, perhaps, there is a way to bundle a IIIF collection together with the manifests in it (or, perhaps, the first 'n' of them), so that only one request is needed?

The spec forbids this: "...Manifests must not be embedded within Collections.": https://iiif.io/api/presentation/3.0/#51-collection
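
A minimal Python sketch of what that looks like in practice: a Presentation 3.0 Collection whose `items` are references to Manifests, each carrying only an `id`, `type`, `label` and `thumbnail`, with no canvases embedded. The shape of the `members` input (keys `manifest`, `label`, `thumb`) is an assumption made for the example.

```python
def collection_for(collection_id, label, members):
    """Build a Collection that references (but does not embed) Manifests.

    `members` is a list of dicts with hypothetical keys:
    'manifest' (manifest URL), 'label' (display text), 'thumb' (thumbnail URL).
    """
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": collection_id,
        "type": "Collection",
        "label": {"en": [label]},
        "items": [
            {
                "id": m["manifest"],
                "type": "Manifest",
                "label": {"en": [m["label"]]},
                # Enough for a client to show a pick-list without
                # fetching the manifest itself.
                "thumbnail": [
                    {"id": m["thumb"], "type": "Image", "format": "image/jpeg"}
                ],
            }
            for m in members
        ],
    }
```

A client can render the list from this one response and fetch a manifest only when the user selects it.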

> Has IIIF developed preferred ways to deal with VeryBig(tm) flat collections, eg by paging through parts of them, so that an application doesn't try to acquire the whole thing unless it really really really needs to?

For "Very Big" collections, the current recommendation is that you chunk them up into reasonable-sized sub-collections. If these are arbitrarily divided (i.e., not thematically grouped) you will have to get inventive with your ID URI for each sub-collection, e.g., "http://example.com/collections/dpla/1-500/", "/collections/dpla/501-1000/", etc. Depending on how your back-end works with sorting these may change in composition from request to request...
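
The arbitrary chunking can be sketched in a few lines of Python. The function name is hypothetical, and the ID pattern simply follows the "1-500", "501-1000" example above.

```python
def sub_collection_ids(base, total, size=500):
    """Return one ID URI per fixed-size chunk of a flat collection.

    IDs follow the '<base>/<first>-<last>/' pattern; the last chunk is
    clipped to `total`.
    """
    return [
        f"{base}/{start}-{min(start + size - 1, total)}/"
        for start in range(1, total + 1, size)
    ]
```

For example, `sub_collection_ids("http://example.com/collections/dpla", 1200)` gives three IDs ending in `1-500/`, `501-1000/` and `1001-1200/`. As noted, the contents behind each ID are only stable if the back-end sort order is.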

Collection pagination was removed in v3. See the discussion here for more information about dealing with pagination and very large collections: https://github.com/IIIF/api/issues/1343

-Andrew
> --
> -- You received this message because you are subscribed to the IIIF-Discuss Google group. To post to this group, send email to iiif-d...@googlegroups.com. To unsubscribe from this group, send email to iiif-discuss...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/iiif-discuss?hl=en
> --- You received this message because you are subscribed to the Google Groups "IIIF Discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to iiif-discuss...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/iiif-discuss/f97bf0c0-aae1-0b05-3a99-a143959ec633%40gmail.com.

Robert Sanderson

Sep 23, 2020, 8:32:32 AM
to iiif-d...@googlegroups.com
Dear all,

I agree with Andrew -- generally speaking categories should be Collections, as they have the same purpose of providing navigation and grouping together similar items.
The Mae Jemison category seems like a special case of a category/collection with only a single Manifest (Mae Jemison, with 22 canvases) in it. A more typical collection might be https://commons.wikimedia.org/wiki/Category:Astronaut_Hall_of_Fame_inductees  with a Manifest for each astronaut.

Otherwise, as Andrew says, you can nest collections to provide either arbitrary or thematic segmentation, but cannot embed entire Manifests within a Collection.

Rob




--
Rob Sanderson
Director for Cultural Heritage Metadata
Yale University

Stefano Cossu

Sep 23, 2020, 2:12:41 PM
to iiif-d...@googlegroups.com, Robert Sanderson
I also lean toward representing that kind of category as a Collection,
but I also think that this should not be a one-size-fits-all preference.
It seems to me that Manifests and Collections are functionally quite
interchangeable when it comes to representing hierarchies, so an
institution can decide what a Collection and what a Manifest represent
to most conveniently fit its own content model.

The visual material and metadata about Mae Jemison would make sense as a
Manifest because Jemison is a "real world" entity rather than a
cataloging or organizational structure. The key here is tying that
manifest to the person's record (and reflecting metadata accordingly).
You may have multiple curated collections and categories containing this
manifest as Rob mentions; and you can have some images, e.g.
https://commons.wikimedia.org/wiki/File:STS-47_crew_in_SLJ_make_notes_during_shift_changeover.jpg
appear both in Jemison's manifest and in another one about a specific
mission, vessel, person, team, etc. In this case, which is becoming very
frequent at the Getty, we are considering processing images separately
as independent and dereferenceable canvases rather than embedding them
in manifests, in order to save storage and processing, especially if we
plan to add considerable metadata to the canvases.

As for representing large collections, it seems that the Commons UI
already does a good job at separating them by initials and subcategory,
e.g. https://commons.wikimedia.org/wiki/Category:United_States_Army ;
could those become IIIF Collections?

I should add that so far I haven't seen a major IIIF viewer that handles
Collections flawlessly, or at all, so that might be a limitation for
some (I haven't tested Mirador 3 yet and I'd love to be corrected on this).

Stefano

--
Stefano Cossu
Software Architect
J. Paul Getty Trust

James Heald

Sep 24, 2020, 2:53:50 PM
to iiif-d...@googlegroups.com
Thank you so much Andrew, Rob, and Stefano. These were really really
useful thoughts and pointers.

I've a couple of things I'd like to pick up, but first a couple of
really basic questions, so I can get a sense of the feel of ways people
are using IIIF collections:

* What are the options currently for viewers/browsers that let one
explore collections? Mirador doesn't seem to acknowledge them at all.
Universal Viewer seems to jump to a particular manifest, with no
apparent sense of the rest of the collection, or apparent option to go
up a level. Have I just not found the functionality, or are there other
viewer/browsers that show more of what is possible?

* Secondly, do people have some examples of different IIIF collection
hierarchies, that show different approaches to how people may be
choosing to structure their IIIF offering for explorability, and/or
different approaches to dealing with "very big" subsets within that
overall content?

Some of the collection URLs linked from
https://github.com/IIIF/api/issues/1343 ("Deprecate the paging model")
still work I think, but I would be interested to know if there are other
ones that people would recommend taking a look at.

Thanks again,

James.

Andrew Hankinson

Sep 24, 2020, 4:08:42 PM
to iiif-d...@googlegroups.com
Hi James,

The absolute best collections browser (IMO) is the Jalava viewer from the University of Durham:

https://iiif.durham.ac.uk/jalava/universe.html#https%3A%2F%2Fiiif.durham.ac.uk%2Fiiif-universe.json

Browsing through the collections there would give you a feel for how individual institutions are structuring their browseable collections. This uses a listing from an "IIIF Universe" repo which is itself a IIIF Collection that simply points to a top-level collection for each institution.

See:

https://github.com/ryanfb/iiif-universe/blob/gh-pages/iiif-universe.json
https://github.com/durham-university/iiif-universe/blob/gh-pages/iiif-universe.json

-Andrew

HIGGINS, RICHARD I.

Sep 24, 2020, 4:15:22 PM
to iiif-d...@googlegroups.com
Hello:
We have a collection viewer which helps show some of the strengths and weaknesses of this approach to IIIF. It is still written around the Presentation 2 API; you can see it working from our top-level collection at https://iiif.durham.ac.uk/jalava/ or using a collection of institutional collections at https://iiif.durham.ac.uk/jalava/universe.html . It simply takes the content of the collection manifests and displays either text, links down the hierarchy or images as you expand it.

Strengths are that it is a very simple design as you need very little to traverse the collection tree and display text, images or links deeper down into the tree. As we publish new collections or manifests they just appear within the hierarchy.

Weaknesses are that often within published collection trees there are inconsistencies or errors. Unless you have very robust error handling (which relies to some extent upon being able to predict the errors) you will quite often run into an error which will affect the performance of the viewer. Some published IIIF collections just do not seem to work. Browsing a large quantity of IIIF content always relies upon pulling in large numbers of thumbnails: response from each image server will vary so you don't always see results as quickly as you would expect. In general the information that you want is a layer deeper into the collection tree than your browser is viewing, which means that what can be displayed will often only be understood by someone who already knows the collections. This makes it very useful for me, but probably confusing to you. We can currently only include a manifest in a single collection, which limits how much guidance can be given, but if you look at the Wellcome, for example, in the universe tree you can browse by sub-collections of topic, collection, genre or author.

It would be possible to send dynamic collections created on the fly to this viewer, which might make it useful as the end point of a search system. Paging is built into the viewer rather than the IIIF: it breaks large results up into pages of 200. You can see this in the Museum objects Egyptological section (there are some images here that do not zoom yet - these are thumbnail placeholders as we are currently loading the full versions).
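
Client-side paging of this kind can be sketched very simply in Python. This is not Jalava's actual code, just an illustration of slicing an already-loaded `items` array into pages of 200; the function names are hypothetical.

```python
import math


def page_count(items, size=200):
    """Number of pages needed to show `items` at `size` per page."""
    return math.ceil(len(items) / size)


def page_of(items, page, size=200):
    """Return the zero-indexed `page` of `items`; empty if out of range."""
    start = page * size
    return items[start:start + size]
```

Note that this only limits how much is rendered at once: the whole collection document still has to be downloaded and parsed first.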

I'm happy to try to answer more specific questions, feel free to ask ...
Best regards
Richard
--
# Richard Higgins
# E-Mail: r.i.h...@durham.ac.uk



Stefano Cossu

Sep 24, 2020, 4:58:08 PM
to iiif-d...@googlegroups.com, HIGGINS, RICHARD I.
Richard,
Congratulations on creating this browser and thanks for sharing the
information. It's a great way to view the IIIF universe.

I support the idea of building pagination in the client rather than in
the manifest because the pagination parameters are technology and
capacity dependent (i.e. the same manifest that takes a minute to load
could possibly take a few seconds in 2025-2030 technology, so parameters
such as page size for the same manifest will likely change over time).

I see that the browser struggles with large collections (hitting the
Bodleian Libraries crashes my tab, and opening any of the Wellcome
"browse by" facets takes up to a minute). Maybe loading and parsing a
huge JSON file is the bottleneck even if you paginate the rendering?
Whatever the reason, I think that scaling is still an issue that the
IIIF presentation specs can and should help mitigate in the future,
and the ability to retrieve arbitrarily-sized pages of a whole manifest
seems valuable to me.

Too bad this thread came up right after the call for proposal for the
Fall meeting expired, but I hope we have a chance to discuss this further.

Stefano

James Heald

Sep 26, 2020, 9:31:07 PM
to iiif-d...@googlegroups.com
Thank you so much Richard (and Andrew). That has been incredibly
useful. Jalava makes a fantastic way to explore the different
collections, and the portal is a wonderful gateway into the IIIF universe.

It has really clarified to me that I think that there would be great
advantage in making available a manifest of all the images in a
Wikimedia category (or, at least, the first 500 say of them), in
addition to an IIIF collection for the category; so that the collection
would contain the manifest for the images, and also collections for each
of the sub-categories.

I think that that combination of both a collection and a manifest would
be the way to give an experience, when explored in an application like
Jalava, that would relate quite closely to what the category page looks
like, and which would cleanly separate navigation of the category tree
from navigation of the images within the category -- the latter of which
so many IIIF tools excel at, if the images are in a manifest.

Both the collection URL and the manifest URL should be readily
predictable from the category name; but if we add a link (primarily for
humans) to the sidebar of the Commons category page, IMO out of the two
it would probably be most useful if that link went to the manifest,
rather than the collection, as the manifest is what would be valuable
for the largest range of tools; and humans would be able to use the
existing wiki page to navigate to other categories. Included as part of
the collection URL and manifest URL should probably be something to
indicate a language localisation for the collection or manifest, with
each different language getting a different collection + manifest tree;
with the manifest link displayed on the Commons page being adapted
according to the Commons' user's currently preferred language.

I would anticipate that both the manifest and the collection for each
category would be 100% machine-generated, being constructed as
additional outputs from information that Commons would already be
storing independently.

I would hope to see Commons also able to store 'hand-curated' manifests;
with the IIIF collection for the category also including any curated
manifests that had been placed into that category. However there are
some (slightly political) issues wiki-side that may need some work
before that can become a reality. (See (*) at the end below).


Turning to the question of larger categories, in another reply Stefano
Cossu wrote:
> I support the idea of building pagination in the client rather than in the manifest

I would agree up to a point; but realistically there do need to be
limits set from the server side as to the maximum items in a manifest or
a collection that it will agree to deliver. It is not realistic to
expect the server to generate details for 734,000 files every time
anyone (or any spider) casually clicks on the manifest link for
"Category:Media contributed by the Digital Public Library of America".
Realistically some kind of limitation and paging/continuation mechanism
(or hard break-down into pages) is required.

Also, it might be nice for some future iteration of the IIIF
standard to make it possible to request a part-manifest for "the next
500 files in sequence with sort-keys starting from the letter M", akin
to how Commons serves
https://commons.wikimedia.org/w/index.php?title=Category:United_States_Army&from=M
(or some arbitrary number as requested by the client, from some
arbitrary offset, subject to some maximum set by the server). But there
does seem to me to be a complication that would have to be addressed,
namely how any such new IIIF design would deal with ranges. In the
present IIIF design, to specify that a range of images starting with 'M'
was available would seem to require sending information up-front with
URLs for all the canvases that would be part of that range. Also, in
IIIF one can have multiple groups of ranges over the whole set of
images, eg one specifying an alphabetical ordering and another a
chronological ordering; so it may not even be clear what the first 'n'
images actually are.

So, yes, while the idea of a soft paging under the control of the client
is attractive, it seems to me that realistically -- at least with the
IIIF standard as it is at the moment -- that large IIIF manifests from
Commons categories would need hard paging into pre-defined blocks.

The best design for this might be something that might need a few
iterations to evolve. I think the scheme would need to adapt, depending
on just /how/ big the category was. For a category of a few thousand
images, it might be enough just to have several manifests. (Though
perhaps one should consider whether these should be grouped together as
a distinct sub-collection of the whole category). But as the number of
images grows beyond that, I think one would need an increasingly deep
hierarchy of collections to contain the manifests, if no manifest should
contain more than 500 items, and no collection perhaps more than 50. Any
detailed design is probably best left to be worked out in due course.
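
To get a feel for how deep such a hierarchy would need to be, here is a small Python sketch under the limits suggested above (at most 500 items per manifest and 50 members per collection). Those limits are illustrative figures from this message, not anything in the IIIF specification, and the function name is hypothetical.

```python
import math


def hierarchy_levels(n_images, per_manifest=500, per_collection=50):
    """Return node counts per level, from manifests up to a single root.

    The first entry is the number of manifests; each later entry is the
    number of collections needed to hold the level below it.
    """
    levels = [math.ceil(n_images / per_manifest)]
    # Keep adding intermediate collection levels until one root can hold the rest.
    while levels[-1] > per_collection:
        levels.append(math.ceil(levels[-1] / per_collection))
    levels.append(1)  # the root collection for the category
    return levels
```

For the 734,000-file DPLA category this gives `[1468, 30, 1]`: 1,468 manifests, grouped into 30 intermediate collections, under one root. A category of 3,000 images gives `[6, 1]`, i.e. six manifests directly under the category's collection.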


One thing that I do wonder about is that the IIIF presentation API
standard 3.0 at paragraph 5.1 specifies that collections should form "a
tree-structured hierarchy", although later this is qualified by the
provision that "Manifests or Collections MAY be referenced from more
than one Collection".

Commons categories very definitely do /not/ form a simple tree. A
typical category will usually have several parent categories, and
'diamond' patterns of inheritance are common, often extended to whole
'ladder' patterns of inheritance -- it is very common that a category
will have two parent categories, each of which descends from a common
grandparent; in addition to other parent categories as well.

So I hope that would be acceptable. It looks like the standard allows a
collection to specify an array of values for "partOf", though I am not
sure how well clients deal with this? Does, for example, Jalava show
upward links, if they are not specified in its URL? Can it handle
multiple such upward links?
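
For reference, a minimal Python sketch of what multiple upward links might look like. In Presentation 3.0 the value of `partOf` is an array of objects, each with an `id` and `type`; the helper name and the example URLs are hypothetical.

```python
def add_parents(collection, parent_ids):
    """Return a copy of `collection` whose partOf lists every parent Collection."""
    result = dict(collection)
    result["partOf"] = [{"id": pid, "type": "Collection"} for pid in parent_ids]
    return result
```

So a category with two parent categories would simply carry two entries in `partOf`; whether a given viewer does anything with them is a separate question.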

One other thing with the structure of Commons that might cause issues is
the potential for category loops -- eg (imaginary example)
[[Category:Egg]] listed as relating to chickens under
[[Category:Chicken]], with [[Category:Chicken]] listed as relating to
eggs under [[Category:Egg]]. Such arrangements are certainly deprecated
in the help-pages for Commons categorisation, and I believe efforts are
made to try to track them down and resolve them. But I don't know how
successful those efforts are, particularly as category chains get deeper
and can go off in unexpected directions. I see that there are currently
28 categories that are members of themselves:
https://commons.wikimedia.org/wiki/Commons:Database_reports/Self-categorized_categories
(which can occur sometimes after a sub-category has been merged into a
parent category), so I would imagine that there may be quite a few
examples of longer category-loops as well; which would get translated
into loops in the collection structure.

Probably as much as can be done is to note that a possibility of such
loops in the collection hierarchy will definitely exist; is probably
unpreventable; and, beyond that, leave it as 'scraper beware'.
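
On the 'scraper beware' side, a client can protect itself with a visited set. Here is a minimal Python sketch, assuming a caller-supplied `load` function (hypothetical) that fetches and parses the Collection at a given ID.

```python
def walk_collections(root_id, load):
    """Yield each reachable Collection exactly once.

    Safe against loops (Egg -> Chicken -> Egg) and against diamonds,
    where the same sub-collection is reachable via several parents.
    `load(collection_id)` is a hypothetical callable returning parsed JSON.
    """
    seen = set()
    stack = [root_id]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # already visited via another path
        seen.add(cid)
        collection = load(cid)
        yield collection
        for item in collection.get("items", []):
            if item.get("type") == "Collection":
                stack.append(item["id"])
```

With the Egg/Chicken example above, the walk visits each of the two collections once and then stops.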


(*) I mentioned above that, while the default manifests for categories
would likely be 100% machine-generated, there might also be a
possibility for additional 'hand-curated' manifests as well, to get
saved on Commons and included in Commons IIIF collections; but there
were some issues that might get in the way.

(So external support or enthusiasm might be helpful, from GLAMs and
others, as to why 'hand-curated' manifests on Commons could be useful,
to provide impetus to put in the effort to get past the issues.

Such use-cases might include:
* Situations where additional structural information may be available
beyond what would necessarily be included in the default category
manifest, eg perhaps range information corresponding to grouping of
images, eg chaptering of book-scan images, or alternate orderings.
* Or, one might want a 'hand-curated' manifest to specify a particular
subset of images in the category -- eg perhaps a 'best' set of images
from a work, if perhaps the Commons category contained additional derived
images, or multiple near-duplicate files for each page or picture.
* Or, perhaps, the institution providing the images might have its own
IIIF manifest for them containing additional information and annotations,
that would be a valuable addition to include with them.
* Or, it might want to provide a manifest connecting to the original
uploaded version of its images, before any 'improvement' (sic) of them
by Commons users.

-- further usefulness stories very gratefully sought)


So, what are the issues I was referring to, that manifest hosting would
need to overcome?

Firstly (and the easier of the two), it would need to be decided what such
manifests could contain.

Somebody in one of the Zoom chats wondered whether the Image API,
presumably referencing such a manifest, could be used to work around
Wikipedia's prohibitions (or more accurately: deep deep limitations) on
content licensed for "non-commercial use only". I can categorically
tell you: that will not be allowed to happen. Any such development
would be hunted down by the user communities themselves (or their
tribunes at least), armed with pitchforks and fire. And if by any
chance it got past them, then it would be stamped out without mercy by
the WM organisation itself. Wikimedia understands its underlying role
as being to increase the availability of content, that people can then
use in any way for any purpose -- of which the encyclopedia is only a
part. The prohibition on the inclusion of NC content is part and parcel
of this strategy, seen as a fundamental lever to get institutions to ask
whether their content really needs to be limited in this way, and to
motivate Wikimedians to try to encourage institutions to see what they
can release without this restriction. There is therefore absolutely no
way that either the community or the organisation will accept or
tolerate anything that could work as an end-run around this very
deliberate, very calculated policy.

I would actually expect restrictions on any WM-hosted manifests to go
further, and probably to exclude any references to any images or other
material that was not itself WM-hosted. And yes, I appreciate that that
is 180 degrees at variance with one of the key ethoses of IIIF, namely
for IIIF to make possible a mosaic of content, each element of which
will stay on the site of its originating institution and be served from
there as a service, removing the need for any copying or duplication or
aggregation into central silos. I am aware of that, and of the value of
avoiding duplication so that an image is always attached to the most
complete, most up to date metadata. I am also aware that the IIIF
universe is already so big that there is no way that WM could directly
host even a fraction of it. Nevertheless, I think the restriction on
WM-hosted manifests is entirely possible, and quite likely, for a few
reasons. Firstly, it removes the possibility of WM-hosted manifests
including by reference NC material (other than through 'see also'
links), since no content hosted on Commons is NC. Secondly, it
encourages local copies of images and other content to be made on
Commons. This has always been the WM Commons way. For one thing it
should mean that the material will continue to be available and readily
accessible for as long as WM continues to exist. And by making the
material easy to copy, that should help assure its availability even
after WM ceases to exist, or in environments where WM is hard to
access: security through multiplicity -- whereas content on remote sites
can disappear, or the site policy without warning become less
permissive, or metered, or limited, or tied to advertising, or otherwise
more restrictive. Additionally, when the content is on Commons it is
immediately usable throughout Wikimedia, whereas content on remote sites
is not. And it is accessible for the Commons community to refine and
edit, add to its metadata, refine its categorisation, 'restore', etc, etc.

So for all of these reasons, I would quite expect WM-hosted manifests to
be restricted to WM-hosted content. And probably particularly so
initially, when this is a new form of object for WM to host.
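
To make the restriction concrete, here is a minimal sketch of how a
manifest could be checked for content that is not WM-hosted. It assumes
an IIIF Presentation 3 manifest, and the list of allowed hosts, the
function names and the sample URLs are all illustrative assumptions,
not anything WM has specified.

```python
from urllib.parse import urlparse

# Assumption: the hosts that count as "WM-hosted" for this sketch.
ALLOWED_HOSTS = {"upload.wikimedia.org", "commons.wikimedia.org"}


def content_urls(manifest):
    """Yield the id of every annotation body in a Presentation 3 manifest.

    Walks Manifest -> Canvas -> AnnotationPage -> Annotation -> body.
    """
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                body = annotation.get("body", {})
                if "id" in body:
                    yield body["id"]


def non_wm_urls(manifest):
    """Return the content URLs whose host is not in ALLOWED_HOSTS."""
    return [
        url for url in content_urls(manifest)
        if urlparse(url).hostname not in ALLOWED_HOSTS
    ]


def _canvas(n, image_url):
    """Build a one-image canvas; all ids here are made-up placeholders."""
    canvas_id = f"https://example.org/manifest/canvas/{n}"
    return {
        "id": canvas_id,
        "type": "Canvas",
        "items": [{
            "id": f"{canvas_id}/page",
            "type": "AnnotationPage",
            "items": [{
                "id": f"{canvas_id}/annotation",
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": image_url, "type": "Image"},
                "target": canvas_id,
            }],
        }],
    }


# Hypothetical manifest mixing one WM-hosted and one remote image.
sample = {
    "id": "https://example.org/manifest",
    "type": "Manifest",
    "items": [
        _canvas(1, "https://upload.wikimedia.org/placeholder/Page1.jpg"),
        _canvas(2, "https://example.org/iiif/page2.jpg"),
    ],
}

print(non_wm_urls(sample))
```

A manifest would pass such a check only if the returned list were empty;
'see also' links are deliberately not inspected here.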


The second issue for manifest hosting to overcome is more 'political'.
As a glob of JSON, an IIIF manifest would likely sit in the 'Data:'
namespace on Commons, with some suitable suffix ('.maniiif'?).
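
For illustration, the kind of JSON that such a page might hold could
look roughly like the output of the sketch below: a single-image IIIF
Presentation 3 manifest. The function name, the manifest and image URLs,
and the image dimensions are placeholders I have made up; only the
overall Manifest / Canvas / AnnotationPage / Annotation shape comes from
the IIIF Presentation API.

```python
import json


def minimal_manifest(manifest_id, label, image_url, width, height):
    """Return a single-canvas Presentation 3 manifest as a dict."""
    canvas_id = f"{manifest_id}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    # The image that is "painted" onto the canvas.
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    "target": canvas_id,
                }],
            }],
        }],
    }


# Hypothetical values throughout; none of these URLs is a real resource.
manifest = minimal_manifest(
    manifest_id="https://example.org/Data/Example.maniiif",
    label="Example manuscript, page 1",
    image_url="https://upload.wikimedia.org/placeholder/Example.jpg",
    width=2000,
    height=3000,
)
print(json.dumps(manifest, indent=2))
```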

But the 'Data:' namespace, which is currently home to two kinds of JSON
objects -- map files, specifying shapefile outlines for real-world
boundaries, and tab files, specifying tabular data stored as a glob of
JSON -- is deeply, deeply un-loved by the powers that be. Indeed, as
one of the PTB explained to me quite jauntily, the staffer working on
the MediaWiki code who built the Data: namespace was told in no
uncertain terms *not* to create a facility for tabular data. He built
it anyway. So they fired him. And then about half a year later the
whole team he worked for (the maps team) was shut down and redeployed.

As a result the Data: namespace now drifts on, unloved and unregarded,
apparently without any sponsoring team in the organisation to take
ownership of it, or seek to develop it.

That's unfortunate, because the objects in the Data: namespace currently
lack some critical things. They don't get a wiki file-description page,
so you can't explain what's in the object in wikitext. They don't get a
Structured Data for Commons slot, so you can't explain what's in them in
SDC statements. And, since they don't have file-description pages,
there's currently no way to add Commons category information to them, so
they don't turn up in Commons categories. Between these three
limitations, that makes the objects pretty much un-discoverable, unless
you already know exactly where they are.

If the manually-curated manifests are going to live in the Commons
'Data:' namespace, then at the very least it would be necessary to be
able to add Category indications to them, so that they would appear in
the relevant Commons categories, and corresponding Commons IIIF
collections. To make this possible probably means attaching Commons
file-description pages to them, which would be a good thing in itself.
SDC statements would be nice too, if the manifests are to be
discoverable through SPARQL queries. All of which means finding a WM
development team prepared to take ownership of the Commons 'Data:'
namespace, and commit the required resources for quite an amount of
development.
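
As a rough sketch of the category-to-collection step, assuming the
category membership of each manifest could be read somehow: the function
below wraps a list of manifest references in an IIIF Presentation 3
Collection. The function name and all URLs are hypothetical, and how the
member list would actually be obtained from MediaWiki is left open.

```python
import json


def collection_for_category(collection_id, category, manifests):
    """Build a Presentation 3 Collection that references manifests.

    `manifests` is a list of (manifest_url, label) pairs, assumed to be
    the curated manifests that have been placed in the category.
    """
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": collection_id,
        "type": "Collection",
        "label": {"en": [category]},
        "items": [
            {"id": url, "type": "Manifest", "label": {"en": [label]}}
            for url, label in manifests
        ],
    }


# Hypothetical member list for one category; the URLs are placeholders.
members = [
    ("https://example.org/Data/First.maniiif", "First curated manifest"),
    ("https://example.org/Data/Second.maniiif", "Second curated manifest"),
]
collection = collection_for_category(
    "https://example.org/iiif/collection/Category:Mae_Jemison",
    "Category:Mae Jemison",
    members,
)
print(json.dumps(collection, indent=2))
```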

WM's "Core Platform Team" might just be the ones to do it. (And in the
process rescue the usability of the whole 'Data:' namespace). But the
initial ambitions for their 'IIIF API' initiative seem limited to
wrapping the thinnest of IIIF-compliant APIs around the most basic of
what MediaWiki can already do.

WM-Sweden has been charged with reaching out to GLAMs to see what
use-cases can be developed for doing any more than that. WMSE's
organising ticket for that project is
https://phabricator.wikimedia.org/T261621
Additionally, there's a wiki talk page for the Core Platform Team's IIIF
initiative at
https://www.mediawiki.org/wiki/Talk:Core_Platform_Team/Initiatives/IIIF_API

So: if you think there is a case to be made for curated IIIF manifests,
that case needs to be made.

If you think there is a case to be made for curated IIIF manifests
(subject to the limitations above) to be hostable at Wikimedia Commons;
or you know a GLAM (perhaps a small one, perhaps a large one) that might
have some use for such a capability, eg alongside images hosted for them
on Wikimedia Commons and made available through the IIIF image API, then
please do say so and comment, either on this thread, or on that wiki
talk page, or by getting in touch with somebody from WMSE who is
subscribed to that Phabricator ticket.

And of course I'd also really appreciate any thoughts or comments on the
first part of this post too, as to what might be the best structure of
IIIF manifests and collections to correspond to WikiCommons categories,
and whether the thoughts outlined above seem sane.

Thanks to everyone very much again,

Best wishes,

James.