Searching/filtering on a specific dataset?

Matthias Vandermaesen

Aug 19, 2015, 2:12:47 AM

to Europeana API forum

Hi,

I wonder if it is possible to filter objects on a specific dataset/provider?

The dataset/[datasetId].json call returns a JSON response with the edmDatasetName property. I tried using that property in a search.json query, but that returns an error:

Example:

- http://europeana.eu/api/v2/dataset/04101.json?wskey=api2demo (List a single dataset imported by the Flemish Art Collection)

- http://europeana.eu/api/v2/search.json?wskey=api2demo&query=edmDatasetName:04101_M_BE_FlemishArtCollection_ese (Search based on a filter on the edmDatasetName property)

The latter query currently fails / returns an error.

Are there alternative solutions on how to do this?

Thanks!

With regards,

Matthias Vandermaesen

----

Digital dataconservator

Flemish Art Collection

Abrahamstraat, 8,

9000 Ghent, Belgium

http://www.flemishartcollection.be

Remy Gardien

Aug 19, 2015, 3:10:49 AM

to Europeana API forum

Hi Matthias,

Yes, that is possible.

However, the query filter name for the API currently still uses the (deprecated) europeana_collectionName property.

So the search you are attempting can be done using this field name:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=europeana_collectionName:04101_M_BE_FlemishArtCollection_ese

To search for a provider, you can use either the provider or the data_provider field as a query facet, e.g.:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&qf=PROVIDER:%22Vlaamse+Kunstcollectie%22

In general, though, it is better to use custom facets to return a list of providers and/or datasets, e.g.:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&profile=facets&facet=provider_aggregation_edm_provider or

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&profile=facets&facet=europeana_collectionName
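The example URLs above can also be assembled programmatically. This is a minimal Python sketch, not an official client: the helper name `build_search_url` is hypothetical, and only the parameters that appear in this thread (wskey, query, qf, profile, facet) are used. Note that `urlencode` percent-encodes characters such as `:` and `*`, which decodes to the same query on the server side.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
BASE = "http://europeana.eu/api/v2/search.json"


def build_search_url(query, wskey="api2demo", qf=None, profile=None, facet=None):
    """Assemble a search.json URL; urlencode takes care of the escaping."""
    params = [("wskey", wskey), ("query", query)]
    if qf:
        params.append(("qf", qf))
    if profile:
        params.append(("profile", profile))
    if facet:
        params.append(("facet", facet))
    return BASE + "?" + urlencode(params)


# Filter on a dataset through the (deprecated) europeana_collectionName field.
dataset_url = build_search_url(
    "europeana_collectionName:04101_M_BE_FlemishArtCollection_ese")

# Filter on a provider through a query facet (qf).
provider_url = build_search_url("*:*", qf='PROVIDER:"Vlaamse Kunstcollectie"')

# Ask for the list of datasets as a custom facet.
dataset_facet_url = build_search_url(
    "*:*", profile="facets", facet="europeana_collectionName")
```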

I hope this helps!

Best regards,

Remy Gardien

Matthias Vandermaesen

Aug 19, 2015, 4:32:12 AM

to Europeana API forum

Hi Remy,

Yes, this is very helpful. Thank you!

My use case consists of looking through a list of providers, selecting one and then querying specific datasets and/or objects that are tied to that particular provider.

Now, the API documentation doesn't say as much, but your answer indicates that there are two possible roads you can take to implement this.

1/ Using facets:

Just start looking through all the available objects using this query:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&profile=facets

The 'facets' profile will also return the "PROVIDER" facet, which holds a list of available providers. Using the value in the 'label' property of an item, you can drill down the results to a particular provider like this:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&qf=PROVIDER:%22Vlaamse+Kunstcollectie%22&profile=facets
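The drill-down described above could look roughly like this in Python. It is a sketch under assumptions: the response shape (a "facets" list whose entries carry "name" and "fields", each field having a "label") is assumed from the description in this post, the sample data and counts are made up, and the helper names `facet_labels` and `provider_filter` are hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical, trimmed-down response. The structure is an assumption and
# the counts are placeholders, not real API output.
sample_response = {
    "facets": [
        {
            "name": "PROVIDER",
            "fields": [
                {"label": "Vlaamse Kunstcollectie", "count": 1},
                {"label": "Example Provider", "count": 1},
            ],
        },
    ],
}


def facet_labels(response, facet_name):
    """Return the labels listed under the facet called facet_name."""
    for facet in response.get("facets", []):
        if facet.get("name") == facet_name:
            return [field["label"] for field in facet.get("fields", [])]
    return []


def provider_filter(label):
    """Build the URL-encoded qf value for an exact match on a provider label."""
    # Keep ':' unescaped so the result matches the URLs used in this thread.
    return quote_plus('PROVIDER:"{}"'.format(label), safe=":")


labels = facet_labels(sample_response, "PROVIDER")
qf_value = provider_filter(labels[0])
```

The resulting `qf_value` is what goes after `qf=` in the drill-down URL.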

2/ Using providers.json and/or datasets.json

First you enumerate the available providers:

http://europeana.eu/api/v2/providers.json?wskey=api2demo

Then you select a particular provider and retrieve its related dataset(s):

http://europeana.eu/api/v2/provider/041/datasets.json?wskey=api2demo

Finally, you can query for the specific objects in the dataset(s):

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=europeana_collectionName:04101_M_BE_FlemishArtCollection_ese&rows=100
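The three calls of this second approach can be sketched as a small set of URL builders. This Python sketch only reproduces the URLs quoted above; the helper names are hypothetical, and parsing the providers.json and datasets.json responses is left out because their structure is not shown in this thread.

```python
from urllib.parse import urlencode

# API root as quoted in this thread.
API_ROOT = "http://europeana.eu/api/v2"


def api_url(path, wskey="api2demo", **params):
    """Build an API URL; ':' is left unescaped to match the thread's URLs."""
    query = {"wskey": wskey}
    query.update(params)
    return "{}/{}?{}".format(API_ROOT, path, urlencode(query, safe=":"))


def providers_url():
    """Step 1: enumerate the available providers."""
    return api_url("providers.json")


def provider_datasets_url(provider_id):
    """Step 2: list the datasets that belong to one provider."""
    return api_url("provider/{}/datasets.json".format(provider_id))


def dataset_objects_url(dataset_name, rows=100):
    """Step 3: search for the objects in one dataset."""
    return api_url(
        "search.json",
        query="europeana_collectionName:" + dataset_name,
        rows=rows,
    )


step1 = providers_url()
step2 = provider_datasets_url("041")
step3 = dataset_objects_url("04101_M_BE_FlemishArtCollection_ese")
```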

Remarks regarding both use cases:

If your use case consists of implementing a faceted search user interface, then the first approach is a good way of querying the API. You get all the data you need (objects, facets, etc.) and you can easily drill down. If, however, you are not implementing a fully fledged faceted search UI but are instead building a UI that targets specific datasets directly, then this is a less desirable approach. Why?

- In the first call, you want to enumerate the list of available providers. You're using search.json but you are only using the data in the "facets" property. The default data in the "items" property is likely not going to be used in such an application and could be considered as "waste" in this context.

- The API reuses the "label" value as an argument to the PROVIDER qf in the second call. However, the value is just that: a name, not a unique identifier. Are there guarantees that this call would yield an exhaustive list of relevant results? For one, I've noticed that my organisation is denoted by various names in the API:

=> http://europeana.eu/api/v2/provider/041.json?wskey=api2demo :: name: "Vlaamse Kunstcollectie vzw"

=> http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&qf=PROVIDER:%22Vlaamse+Kunstcollectie%22&profile=facets :: provider => "Vlaamse Kunstcollectie"

... which makes all the difference if you would inadvertently try to use the value from provider/<objectId>.json in the search.json call.

The second approach would be preferable here:

- Use a specific resource call to retrieve specific resource information (provider > datasets)

- Use the unique ID of the dataset or provider to filter the data unambiguously using the europeana_collectionName filter.

Anyhow, thanks for your response!

With regards,

Matthias Vandermaesen

Remy Gardien

Aug 19, 2015, 9:06:30 AM

to Europeana API forum

Hi Matthias,

We realise that there are improvements to make in the way the Europeana API deals with providers & datasets, in particular when it comes to identification (eg persistent identifiers rather than names). In this context your feedback here is very welcome.

With regard to your comparison of use cases 1 & 2: you can also instruct the API to return only the facets and no items, by setting the row count to zero:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&start=1&rows=0&profile=facets

To make it even cleaner, you can limit the list of returned facets to only the DATA_PROVIDER facet:

http://europeana.eu/api/v2/search.json?wskey=api2demo&query=*:*&start=1&rows=0&profile=facets&facet=DATA_PROVIDER
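A facet-only request like the one above could be issued from Python along these lines. This is a sketch, not a definitive client: the helper names `facet_only_params` and `fetch_facet` are hypothetical, the parameters are exactly those in the URL above, and `fetch_facet` needs network access and a valid API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted in this thread.
SEARCH_URL = "http://europeana.eu/api/v2/search.json"


def facet_only_params(facet, wskey="api2demo"):
    """Parameters for a search that returns a single facet and no items."""
    return {
        "wskey": wskey,
        "query": "*:*",
        "start": 1,
        "rows": 0,  # zero rows: skip the items, keep the facets
        "profile": "facets",
        "facet": facet,
    }


def fetch_facet(facet, wskey="api2demo"):
    """Perform the request and return the decoded JSON response."""
    url = SEARCH_URL + "?" + urlencode(facet_only_params(facet, wskey))
    with urlopen(url) as resp:
        return json.load(resp)


params = facet_only_params("DATA_PROVIDER")
```

Calling `fetch_facet("DATA_PROVIDER")` would then return the parsed response, with the facet values available for building a provider list.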

This query gives you a response similar to the list of providers in the Providers API. For your use case I would recommend this way of querying the API, because the values for providers are then always consistent.

Regards,

Remy Gardien

Matthias Vandermaesen

Aug 19, 2015, 10:32:35 AM

to Europeana API forum

Hi Remy,

Thanks for the clarifications!

A few follow up questions:

1/ What do you mean by "the values for providers are also always consistent"? Does this mean that the response of the providers.json call is less consistent - and thus less reliable - than what your example via search.json returns?

2/ So, what is the purpose of the providers.json & dataset.json calls? What would be a valid use case for implementing these instead of the search.json approach?

3/ I've noticed that "Vlaamse Kunstcollectie" is not an existing value in the DATA_PROVIDER facet list, but it does appear in the PROVIDER facet list. What's the difference between those two properties?

Regards,

Matthias Vandermaesen

Remy Gardien

Aug 19, 2015, 10:37:58 AM

to Europeana API forum

Hi Matthias,

1) The Providers & Datasets APIs are partly sourced from a CRM system, and some of that data may be updated manually, whereas the Search API data comes straight from our search index.

2) The Providers & Datasets APIs are useful if you are looking for more detailed information on providers and datasets, such as country, origin or domain, or if you're looking for the status of a dataset (this is also where we maintain old/deleted datasets).

3) There is no difference, but this is one of the consequences of the answer in 1), and thus it is recommended to use the Search API. This is actually an error, as both names should be equal; I'll file that as a report.