How do clients treat content that is embedded but also dereferenceable?

James Heald

29 Sep 2020, 09:36:52
to iiif-d...@googlegroups.com
Question: if content (eg specifically a canvas) is embedded in a
manifest, but also has a URI that is dereferenceable, what do clients
usually do, currently?

- Do they typically just take what is embedded in the manifest?
- Or always get whatever is at the URIs, and prefer that?
- Or get 'last updated' dates and go with whichever is most recent?
- Or try to mix-and-match the two, and patch in any content that's in
one but not the other?
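
For discussion, the four options above can be written down as a short Python sketch. It only illustrates the alternatives; it does not describe what any existing client actually does. The function name `resolve_canvas` and the `modified` timestamp key are hypothetical: as far as I know the Presentation API defines no "last updated" property, so a real client would need another source for that date, such as an HTTP `Last-Modified` header.

```python
from datetime import datetime
from typing import Optional


def parse_modified(resource: dict) -> Optional[datetime]:
    """Read a hypothetical ISO 8601 'modified' timestamp, if present."""
    value = resource.get("modified")
    return datetime.fromisoformat(value) if value else None


def resolve_canvas(embedded: dict, fetched: dict, strategy: str) -> dict:
    """Pick or combine the embedded copy and the dereferenced copy of a canvas."""
    if strategy == "embedded":
        # Option 1: trust whatever is in the manifest.
        return embedded
    if strategy == "dereferenced":
        # Option 2: always prefer what the canvas URI returns.
        return fetched
    if strategy == "most_recent":
        # Option 3: compare dates; fall back to the embedded copy if unknown.
        e, f = parse_modified(embedded), parse_modified(fetched)
        if e is None or f is None:
            return embedded
        return fetched if f > e else embedded
    if strategy == "merge":
        # Option 4: shallow merge; embedded values win, fetched fills the gaps.
        return {**fetched, **embedded}
    raise ValueError(f"unknown strategy: {strategy}")
```

Note that the "merge" branch is the only one that has to decide which side wins on a conflict; here the embedded copy is arbitrarily given priority.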


Background:

In a recent thread, Stefano Cossu wrote:

> "[Some images appear in more than one manifest]. In this case, which is becoming very frequent at the Getty, we are considering processing images separately as independent and dereferenceable canvases rather than embedding them in manifests, in order to save storage and processing, especially if we plan to add considerable metadata to the canvases."

( https://groups.google.com/g/iiif-discuss/c/bCTNRGdXGFI/m/UWP_FlpjAgAJ )

That might fairly closely match the situation for Wikimedia Commons.
The same annotated canvas might reasonably appear in a number of manifests:
* a manifest for the image itself, as a link from the file's Commons
description page
* a manifest for each Wikipedia article the image appeared in, each
presenting all the images in that article, linked from the sidebar of
the article
* a manifest for each Commons category, presenting the images in that
category (or, at least, the first 500 of them), linked from the sidebar
of the category page

Additionally, further cases for making the canvas individually
dereferenceable might include
i) if a URI looks like it *might* be dereferenceable, it probably
*should* dereference to something.
ii) a dereferenceable URI means that 3rd parties can reference it in
their manifests, and always get the most up-to-date metadata
iii) arguments made in this 2016 thread ("Re: The Case for
Dereferencable Canvas URIs",
https://groups.google.com/g/iiif-discuss/c/HZtInSSs_8k), which I didn't
quite follow.


But, on the other hand, I suspect there may also be some arguments for
embedding, eg
i) potential delivery and caching efficiencies, if everything is in
the one file.
ii) perhaps a bit more user-convenience/portability/transparency, if
everything is bundled all together? (eg for making new manifests).


So that was making me wonder: what happens if a set of canvases are
embedded, but also exist as dereferenceable entities in their own right?

Do clients identify that they can get enough of what they need from what
has been embedded? Or do they hit the URIs anyway? Or something
in between?


Thanks,

James.

Stefano Cossu

29 Sep 2020, 13:31:49
to iiif-d...@googlegroups.com, James Heald


On 9/29/20 6:36 AM, James Heald wrote:
> Question: if content (eg specifically a canvas) is embedded in a
> manifest, but also has a URI that is dereferenceable, what do clients
> usually do, currently?
>
> - Do they typically just take what is embedded in the manifest?
> - Or always get whatever is at the URIs, and prefer that?
> - Or get 'last updated' dates and go with whichever is most recent?
> - Or try to mix-and-match the two, and patch in any content that's in
> one but not the other?

That is an interesting question that I haven't found a clear answer for
in the specs yet, so I will defer to the Editors.

What I see about the "embedded" and "referenced" terms is:

> embedded: When a resource (A) is embedded within an embedding resource (B), the complete JSON representation of resource A is present within the JSON representation of resource B, and dereferencing the URI of resource A will not result in additional information. Example: Canvas A is embedded in Manifest B.
> referenced: When a resource (A) is referenced from a referencing resource (B), an incomplete JSON representation of resource A is present within the JSON representation of resource B, and dereferencing the URI of resource A will result in additional information. Example: Manifest A is referenced from Collection B. [1]

[1] https://iiif.io/api/presentation/3.0/#12-terminology

It is not clear to me how the spec deems a resource "complete" or
"incomplete". Would that be the lack of some key element?
What about elements that may be empty, so that what appears to be an
incomplete resource is actually the whole thing? I would obviously want
to avoid looking up every single canvas, range, etc. in a manifest that
may or may not have more content in a separate location.

An unambiguous approach could be using non-HTTP identifiers for embedded
resources, so that if a resource has an HTTP URI it would be looked up
separately and any embedded content ignored. But that violates the rule
that Canvases, Ranges, etc. must have HTTP URIs.
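
Absent a rule in the spec, a client could fall back on a heuristic. The Python sketch below treats a Presentation 3.0 canvas as a bare reference when it carries nothing beyond a few identifying keys. The key set in `REFERENCE_KEYS` and both function names are my own assumptions, not anything the spec mandates.

```python
from typing import List

# Keys a bare reference is ASSUMED to carry (a guess, not a spec rule).
REFERENCE_KEYS = {"id", "type", "label", "thumbnail"}


def looks_like_reference(resource: dict) -> bool:
    """True if the resource has no keys beyond the identifying/summary ones."""
    return set(resource) <= REFERENCE_KEYS


def canvases_to_fetch(manifest: dict) -> List[str]:
    """Return the ids of canvases in a v3 manifest that look like references."""
    return [
        canvas["id"]
        for canvas in manifest.get("items", [])
        if canvas.get("type") == "Canvas" and looks_like_reference(canvas)
    ]
```

This does not remove the ambiguity described above: a canvas that genuinely has no content would still be looked up, and an embedded canvas that is missing one property would not.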

>
>
> Background:
>
> In a recent thread, Stefano Cossu wrote:
>
>> "[Some images appear in more than one manifest].   In this case, which
>> is becoming very frequent at the Getty, we are considering processing
>> images separately as independent and dereferenceable canvases rather
>> than embedding them in manifests, in order to save storage and
>> processing, especially if we plan to add considerable metadata to the
>> canvases."
>
> ( https://groups.google.com/g/iiif-discuss/c/bCTNRGdXGFI/m/UWP_FlpjAgAJ )
>
> That might fairly closely match the situation for Wikimedia Commons.
> The same annotated canvas might reasonably appear in a number of manifests:
> * a manifest for the image itself, as a link from the file's Commons
> description page
> * a manifest for each Wikipedia article the image appeared in, each
> presenting all the images in that article, linked from the sidebar of
> the article
> * a manifest for each Commons category, presenting the images in that
> category (or, at least, the first 500 of them), linked from the sidebar
> of the category page

You could also make that a collection and aggregate all the article
manifests, using the `thumbnail` from each manifest for an index view?
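
A rough sketch of that idea in Python, using Presentation 3.0 property names: each manifest is reduced to a reference (id, type, label, thumbnail) and the references are gathered into a Collection. The function names and the example.org URLs in the usage below are placeholders, not part of any existing API.

```python
from typing import List

CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def manifest_reference(manifest: dict) -> dict:
    """Reduce a full manifest to the summary a Collection needs."""
    ref = {"id": manifest["id"], "type": "Manifest", "label": manifest["label"]}
    if "thumbnail" in manifest:
        # Carried over so that a client can build an index view without
        # fetching every manifest.
        ref["thumbnail"] = manifest["thumbnail"]
    return ref


def build_collection(collection_id: str, label: str, manifests: List[dict]) -> dict:
    """Aggregate manifest references into a Presentation 3.0 Collection."""
    return {
        "@context": CONTEXT,
        "id": collection_id,
        "type": "Collection",
        "label": {"en": [label]},
        "items": [manifest_reference(m) for m in manifests],
    }
```

For example, `build_collection("https://example.org/iiif/collection/articles", "Articles", manifests)` would return a dictionary ready to be serialised with `json.dumps`.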

>
> Additionally, further cases for making the canvas individually
> de-referenceable might include
> i)   if a URI looks like it *might* be dereferenceable, it probably
> *should* dereference to something.
> ii)  a dereferenceable URI means that 3rd parties can reference it in
> their manifests, and always get the most up-to-date metadata
> iii) arguments made in this 2016 thread ("Re: The Case for
> Dereferencable Canvas URIs",
> https://groups.google.com/g/iiif-discuss/c/HZtInSSs_8k), which I didn't
> quite follow.
>
>
> But, on the other hand, I suspect there may also be some arguments for
> embedding, eg
> i)   potential delivery and cacheing efficiencies, if everything is in
> the one file.
> ii)  perhaps a bit more user-convenience/portability/transparency, if
> everything is bundled all together? (eg for making new manifests).

Our (Getty) typical canvas looks like this one at the moment:


    {
      "@id": "https://media.getty.edu/iiif/manifest/17ead4ae-ff05-419a-a73a-efc318422f7f",
      "@type": "sc:Canvas",
      "height": 6199,
      "images": [
        {
          "@id": "https://media.getty.edu/iiif/manifest/annotation/anno-5e8592c9-f97b-4f63-b67c-1326f0ac259a.json",
          "@type": "oa:Annotation",
          "motivation": "sc:painting",
          "on": "https://media.getty.edu/iiif/manifest/17ead4ae-ff05-419a-a73a-efc318422f7f",
          "resource": {
            "@id": "https://media.getty.edu/iiif/image/5e8592c9-f97b-4f63-b67c-1326f0ac259a/full/full/0/default.jpg",
            "@type": "dctypes:Image",
            "format": "image/jpeg",
            "height": 6199,
            "service": {
              "@context": "http://iiif.io/api/image/2/context.json",
              "@id": "https://media.getty.edu/iiif/image/5e8592c9-f97b-4f63-b67c-1326f0ac259a",
              "profile": "http://iiif.io/api/image/2/level2.json"
            },
            "width": 8251
          }
        }
      ],
      "label": "Recto",
      "width": 8251
    }

(from
https://media.getty.edu/iiif/manifest/24846abe-8402-4061-b48f-74ebcb501da7)

I don't think that is enough information to justify separating it out
and requiring an extra call to retrieve it, even if some canvases are
referred to in multiple manifests. However, if you start considering
canvases as first-class citizens that represent a specific entity in
your content repository (e.g. a specific view of an object) and you
start attaching descriptive metadata to it, it may soon become
convenient to have that resource live on its own, both because of the
increased volume of the data and because it would be easier to update
one small canvas when you update the represented resource metadata,
rather than multiple, potentially very large manifests.

Note that by "descriptive metadata" I am referring to information
strictly related to the image, not the subject, e.g. "Official portrait
of STS-47 Mission Specialist Mae Jemison; created by J. Doe on ...;
copyright/CC, etc.". Metadata for the subject would go in a manifest
that represents Jemison.

>
>
> So that was making me wonder: what happens if a set of canvases are
> embedded, but also exist as dereferenceable entities in their own right?
>
> Do clients identify that they can get enough of what they need from what
> has been embedded?  Or do they hit the URIs anyway?  Or something
> in between?
>
>
> Thanks,
>
>    James.
>
> --
> -- You received this message because you are subscribed to the
> IIIF-Discuss Google group. To post to this group, send email to
> iiif-d...@googlegroups.com. To unsubscribe from this group, send
> email to iiif-discuss...@googlegroups.com. For more options,
> visit this group at https://groups.google.com/d/forum/iiif-discuss?hl=en
> ---
> You received this message because you are subscribed to the Google
> Groups "IIIF Discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to iiif-discuss...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/iiif-discuss/fb278c85-4d02-845a-fd6d-a13cdc07434e%40gmail.com.
>
>        CAUTION: This email originated from outside of the Getty. Do not
> click links or open attachments unless you verify the sender and know
> the content is safe.
>

--
Stefano Cossu
Software Architect
J. Paul Getty Trust

James Heald

1 Oct 2020, 10:19:34
to iiif-d...@googlegroups.com
Thanks, Stefano. The example of the Getty is very useful.


As for placing IIIF manifests for wiki articles into IIIF collections,
you're right, I had overlooked this.

It's a good idea -- though for Wikimedia there may be a wrinkle, that
I'll come to below.

To think about its relevance for Wikimedia, it is probably useful to
take Commons and the Wikipedias separately.


For Commons, it's a good idea but its relevance is limited, because
there are rather few articles ("galleries") - I think the figures
currently are about 70,000 galleries on Commons, compared to about 9
million categories.

So for Commons (which is where high-resolution images contributed by
partner institutions would normally be hosted), to make the images
discoverable at scale by IIIF tools, I do see the main way forward as
via a manifest of all the images in the Commons category.

If one also had a IIIF collection for the category, that could contain
that main manifest, any other manifests associated with the category;
plus collections representing sub-categories of that Commons category,
to capture the hierarchical structure of the Commons category tree.
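
To make the shape of such a collection concrete, here is a minimal Python sketch using Presentation 3.0 property names. The `Category` class, the `category_collection` function and the example.org URL pattern are all hypothetical; nothing like this exists in MediaWiki today. Sub-categories are included as references only, so a client would walk the tree one level at a time.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import quote

BASE = "https://example.org/iiif"  # hypothetical base URL
CONTEXT = "http://iiif.io/api/presentation/3/context.json"


@dataclass
class Category:
    name: str
    subcategories: List["Category"] = field(default_factory=list)


def _slug(name: str) -> str:
    """Wiki-style title: spaces become underscores, the rest is URL-quoted."""
    return quote(name.replace(" ", "_"))


def category_collection(category: Category) -> dict:
    """Collection for one category: its main manifest plus sub-collections."""
    items = [{
        "id": f"{BASE}/category/{_slug(category.name)}/manifest",
        "type": "Manifest",
        "label": {"en": [f"Images in {category.name}"]},
    }]
    for sub in category.subcategories:
        # Referenced, not embedded: only id, type and label are given.
        items.append({
            "id": f"{BASE}/category/{_slug(sub.name)}/collection",
            "type": "Collection",
            "label": {"en": [sub.name]},
        })
    return {
        "@context": CONTEXT,
        "id": f"{BASE}/category/{_slug(category.name)}/collection",
        "type": "Collection",
        "label": {"en": [category.name]},
        "items": items,
    }
```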

So, yes, that would be a very natural place also for manifests for any
Commons galleries associated with the category, as you suggest; even if
it would only be the occasional category that would have such a gallery
(and most galleries have not been maintained, so might not contain the
best pictures, and sometimes/often not even the best version of a
particular picture).


Compared to Commons, the situation on Wikipedias is reversed, in that
articles rather than categories are the objects that are most important,
most visible, and most curated.

However there is an additional issue with the Wikipedias (most of them
anyway), namely fair-use images (eg record covers, company logos, etc,
and certain other non-free copyright images, all typically limited to
only about 300 x 300 pixels).

Legally the making available of the fair-use images is only justified by
their contextual relevance in the articles which contain them. So I am
a little wary about any readiness with which they would be made
available, if the proximity to that context became more remote.

I do think that an "IIIF manifest" link in the sidebar of every
Wikipedia article would be the best way to advertise any IIIF image API
capability on the platform, if it becomes available; and indeed to
highlight IIIF and its possibilities as a whole to users. So it would
be nice to see such links appearing early.

But on the other hand, to put off till later any issues involving
fair-use content, perhaps the initial IIIF collection hierarchy should
just focus on Commons, at least to start with.

One way round might be for manifests for Wikipedia articles to only
include the content in them drawn from Commons. (Which might be an
initial development stepping-stone anyway).
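
As a sketch of that filtering step, assuming Presentation 3.0 manifests and assuming that Commons-hosted files can be recognised from the path of the image URL. That URL test is an assumption made for illustration only; a real implementation would more likely consult the wiki's own file metadata. The function names are hypothetical too.

```python
import copy
from typing import List
from urllib.parse import urlparse

# ASSUMED marker for Commons-hosted files in the image URL path.
COMMONS_PATH_PREFIX = "/wikipedia/commons/"


def is_commons_image(url: str) -> bool:
    return urlparse(url).path.startswith(COMMONS_PATH_PREFIX)


def painted_image_urls(canvas: dict) -> List[str]:
    """Collect the body ids of the annotations on a v3 canvas."""
    urls = []
    for page in canvas.get("items", []):
        for annotation in page.get("items", []):
            body = annotation.get("body", {})
            if isinstance(body, dict) and "id" in body:
                urls.append(body["id"])
    return urls


def commons_only(manifest: dict) -> dict:
    """Return a copy of the manifest keeping only Commons-backed canvases."""
    result = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    kept = []
    for canvas in result.get("items", []):
        urls = painted_image_urls(canvas)
        if urls and all(is_commons_image(u) for u in urls):
            kept.append(canvas)
    result["items"] = kept
    return result
```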

But the right way forward after that might take some thinking about.
Would it be confusing if manifests for an article did not present all of
the content from that article, but only the free content? Would it
instead be more useful and intuitive if the manifests instead contained
all the content (including the non-free content), but then were not
gathered into collections? If the largest images were discoverable
through IIIF collections for Commons categories, would it matter if
there were not collections for the Wikipedia article manifests? Or
perhaps I am over-thinking this, and such a degree of caution about
non-free images is just unneeded pernicketiness? I am not sure.

But whatever eventual way forward, I do think an IIIF collection
hierarchy for Commons would be a good start.

And adding the capability to MediaWiki for that use should also
make it available to third parties running their own
wikis on MediaWiki, to use as they want. So with luck
third parties too would be making IIIF manifests and IIIF collections
available, derived from their wikis.

So yes, the capability for manifests derived from wiki articles to then
appear in collections relating to wiki categories is definitely worth
remembering, and I hope would indeed be useful.

Thanks again,

James.

Stefano Cossu

1 Oct 2020, 18:53:27
to iiif-d...@googlegroups.com, James Heald
> Legally the making available of the fair-use images is only justified by
> their contextual relevance in the articles which contain them. So I am
> a little wary about any readiness with which they would be made
> available, if the proximity to that context became more remote.

I'm no expert on fair-use policies, but I think I understand your issue.
If you only publish presentation API links in your web pages, rather
than image API links, you would be sure to always advertise the context.
Of course someone can hotlink your image URL without the context, but
that (I guess) won't be your problem: you are making a best effort to
present the image context and copyright information.

Note that a presentation URI can even be a Canvas targeting a single
image with minimal descriptive and legal information.
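
As an illustration, such a standalone Canvas could be assembled along these lines in Python, following Presentation 3.0. The function name, its parameters and the `/page/1` and `/annotation/1` id suffixes are placeholders of my own; the `rights` value is expected to be a URI such as a Creative Commons or RightsStatements.org one.

```python
def single_image_canvas(canvas_id: str, image_url: str, width: int, height: int,
                        label: str, rights: str, attribution: str,
                        image_format: str = "image/jpeg") -> dict:
    """Build a dereferenceable v3 Canvas that paints one image and states its terms."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": canvas_id,
        "type": "Canvas",
        "label": {"en": [label]},
        "width": width,
        "height": height,
        # Legal context travels with the canvas itself.
        "rights": rights,
        "requiredStatement": {
            "label": {"en": ["Attribution"]},
            "value": {"en": [attribution]},
        },
        "items": [{
            "id": f"{canvas_id}/page/1",
            "type": "AnnotationPage",
            "items": [{
                "id": f"{canvas_id}/annotation/1",
                "type": "Annotation",
                "motivation": "painting",
                "body": {
                    "id": image_url,
                    "type": "Image",
                    "format": image_format,
                    "width": width,
                    "height": height,
                },
                "target": canvas_id,
            }],
        }],
    }
```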

Stefano