Article about Query Translation in Europeana


Péter Király

Jan 29, 2015, 5:35:13 AM
to europe...@googlegroups.com, EUROPEA...@list.ecompass.nl
Dear All,

I would like to call your attention to a fresh article in The Code4Lib Journal:

Query Translation in Europeana
http://journal.code4lib.org/articles/10285

It discusses the technical background of the query translation feature
of the Europeana portal and API.
If you have questions or criticism, I am happy to answer.

Regards,
Péter

--
Péter Király
software developer

Göttingen Society for Scientific Data Processing - http://gwdg.de
eXtensible Catalog - http://eXtensibleCatalog.org

Vladimir Alexiev

Oct 1, 2015, 12:33:03 PM
to Europeana API forum, EUROPEA...@list.ecompass.nl
Hi Peter!
I have a critique not about the implementation, but about the wisdom of query expansion for Europeana.
Language filtering is unreliable because:
  • The language field indicates the language of the CHO (cultural heritage object), not of the metadata
  • Per-field language tags are most often missing or unreliable
I think that query expansion exacerbates the multilingual ambiguity problem...

Péter Király

Oct 13, 2015, 4:36:37 PM
to europe...@googlegroups.com
Well, it depends: sometimes it works, sometimes it does not. Unfortunately,
almost all fields in Europeana contain uncleaned values, even those
which ideally should come from a dictionary (such as provider and data
provider).

Regarding beer: the word "beer" has different meanings in different
languages and contexts, and for me the first several hits (with query
translation off) are about "Beer" (a Dutch person), "Beers" (a Dutch
location), "beer" (bear in Dutch), etc. So all of them are relevant if
we don't provide information about our language preference. The
default Europeana query doesn't know or handle the language of the
query, so it doesn't know that we are searching for the English word "beer".
I should mention that query translation does not know it either, and
doesn't care whether it finds "beer" in an English record or
in a Dutch record.
I found that if I turn on query translation, the hits are more
relevant to the concept of the alcoholic drink, maybe because the
other language versions of the word don't appear in other records (so
there is no person name with the Hungarian or German version of beer).

Maybe a next step would be something like this (if Europeana
had perfect records):

1) detect the language of the user-entered query
2) run the query with the appropriate language filter added
3) provide feedback about the result of the language detection ("Do
you want to search beer only in English/Dutch/French... records?")
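
A minimal sketch of the three steps above, in Python. It is an illustration under stated assumptions, not how Europeana works: the word lists in TOY_LEXICON and all function names are hypothetical, and a real system would use a proper language-identification component. The `qf=LANGUAGE:` filter form is taken from a portal URL quoted later in this thread.

```python
from urllib.parse import urlencode

# Toy word lists, for illustration only (hypothetical data).
TOY_LEXICON = {
    "en": {"beer", "car", "the"},
    "nl": {"beer", "auto", "het"},
    "pl": {"piwo", "samochód"},
}

SEARCH_URL = "http://www.europeana.eu/portal/search.html"


def detect_languages(query):
    """Step 1: return every language whose toy lexicon contains all query words."""
    words = query.lower().split()
    return sorted(
        lang
        for lang, lexicon in TOY_LEXICON.items()
        if words and all(word in lexicon for word in words)
    )


def build_search_url(query, language=None):
    """Step 2: build a search URL, optionally restricted by a LANGUAGE filter."""
    params = [("query", query)]
    if language:
        params.append(("qf", f"LANGUAGE:{language}"))
    return f"{SEARCH_URL}?{urlencode(params)}"


def feedback_prompt(query, candidates):
    """Step 3: report the detection result and let the user choose."""
    if not candidates:
        return f'Could not detect the language of "{query}"; searching all records.'
    options = "/".join(candidates)
    return f'Do you want to search "{query}" only in {options} records?'
```

With this toy data, detect_languages("beer") yields both "en" and "nl", which is exactly the ambiguity discussed above, so the prompt in step 3 is what lets the user settle it.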

Cheers,
Péter



--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

James Morley

Oct 14, 2015, 7:58:16 AM
to europe...@googlegroups.com
I think the example of beer is quite an extreme one and more of a general issue with languages as a whole rather than the API method.

In fact looking at Vladimir's original point, one of my Dutch colleagues could equally write and complain that they didn't get as many bears (to use the English name) as they would have expected. Using the language tag does help, and with http://www.europeana.eu/portal/search.html?query=beer&qf=LANGUAGE%3Anl you do get a lot of 'bears' and no 'beers' in the drink sense. The 'noise' is simply people called ter Beer etc. Of course I totally agree with Vladimir that the metadata is often lacking, but that in itself doesn't mean the translation API is not useful.

I was advising an API user yesterday about using translated terms. The issue was that students in Poland, for example, on a page about transport would not get many local results if the API call was set to use the search term 'car'. But the idea was, just as Péter suggests, that the user's location could be detected and, in the case of Polish visitors, the term 'Samochód' added to the query. I need to check this, but I believe that you could also apply query term weighting to bias results towards one language, which I'm sure would help further.
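
A rough sketch of that idea in Python: the original term is OR-ed with its translations, and the visitor's language gets a higher weight. The `term^N` boost is standard Lucene/Solr query syntax; whether the Europeana API honours it is an assumption that would need checking. The function name and the translation table are hypothetical.

```python
def expand_query(term, translations, preferred_lang=None, boost=2.0):
    """OR the original term with its translations, boosting one language.

    translations maps a language code to the translated term.
    """
    def quote(text):
        # Multi-word terms must be quoted to stay together as a phrase.
        return f'"{text}"' if " " in text else text

    clauses = [quote(term)]
    for lang, word in translations.items():
        clause = quote(word)
        if lang == preferred_lang:
            clause += f"^{boost:g}"  # Lucene-style boost (assumed to be supported)
        clauses.append(clause)
    return " OR ".join(clauses)


# Example: a Polish visitor on a page about transport.
query = expand_query("car", {"pl": "samochód", "de": "Auto"}, preferred_lang="pl")
```

The resulting string would be passed as the query parameter of the search call; without a preferred language the function simply returns the plain OR expansion.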

Best, James



Vladimir Alexiev

Oct 20, 2015, 8:46:32 AM
to Europeana API forum
To clarify my intentions, I wanted to search for Beer the beverage. 
Limiting to dc:language="en" loses matches (since many legitimate Beer records are not marked with language, or not accurately marked).
It also doesn't guarantee relevance, since that field describes the language of the CHO not of the metadata.

> (if Europeana would have perfect records)

If the Moon was made of green cheese, cheese and wine lovers would be in heaven.

I have not heard of a record being rejected because it doesn't have accurate language tags, so Europeana is very far from having perfect language tagging.
However, in some cases it's possible to recognize the language of a record, and to recognize whether that's a mention of a named entity (eg "de Beer" is a person not beer).
Furthermore, having an extensive gazetteer (eg like the EFD one extracted from Wikipedia) will allow you to find records mentioning different kinds and varieties of beers, beer mugs, etc etc that don't mention the word "beer".
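
A toy illustration of both points, in Python: a gazetteer maps surface forms to a concept, and a crude check skips tokens that follow a Dutch surname particle. The entries, the particle list, and the function name are all hypothetical; a real gazetteer and named-entity recognizer would be far more extensive and reliable.

```python
import re

# Hypothetical toy gazetteer: surface form -> concept label.
GAZETTEER = {
    "beer": "beer",
    "lager": "beer",
    "pilsner": "beer",
    "beer mug": "beer mug",
}

# Particles that often introduce a Dutch surname ("de Beer", "ter Beer").
NAME_PARTICLES = {"de", "ter", "van"}


def find_concepts(text):
    """Return the concepts mentioned in text, skipping likely person names."""
    tokens = re.findall(r"\w+", text.lower())
    concepts = set()
    for i, token in enumerate(tokens):
        if i > 0 and tokens[i - 1] in NAME_PARTICLES:
            continue  # crude heuristic: probably a surname, not the concept
        # Check the single token and the two-token phrase starting here.
        for span in (token, " ".join(tokens[i:i + 2])):
            if span in GAZETTEER:
                concepts.add(GAZETTEER[span])
    return concepts
```

Here find_concepts("A glass of pilsner") finds the beer concept although the word "beer" never occurs, while "Kamerlid De Beer (VVD)" yields nothing.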

Semantic enrichment!

Péter Király

Oct 20, 2015, 9:19:53 AM
to europe...@googlegroups.com
Hi Vladimir,

You remind me of one thing regarding semantic enrichment. The
Europeana Data Model's contextual entities belong to the record, not to
the individual field instance. So, for example, if "de Beer" is
recognized as a name and the EDM record contains an Agent
entity, even then there is no link between the dc:title or
dc:contributor (wherever de Beer appears) and the Agent. So the index
would contain a recognized de Beer (in the Agent) and a normal,
unrecognized one in the original place, which could thus be mixed up with
other "beer"-s.

The solution would be for the enrichment process not to
forget the source of the entities, so that the entities and their source
are interlinked somehow.
For example:

<dc:title>Tweede Kamer hoorzitting over medianota van <edm:agent
id="1">Van Doorn</edm:agent> ; v.l.n.r. kamerleden <edm:agent
id="2">Wolf</edm:agent> (CPN), <edm:agent id="3">Keja</edm:agent>,
<edm:agent id="4">De Beer</edm:agent> (beiden VVD), <edm:agent
id="5">Roethof</edm:agent> (PvdA)</dc:title>

This way we could know that this De Beer is a person, who... instead
of a meaningless string.
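
A small Python sketch of how an indexer could consume such markup, separating the plain title text from the linked agents. The inline edm:agent element with an id attribute is only the hypothetical markup proposed above, not part of the actual EDM; the namespace declarations are added here so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
}

# The example title from above, with namespace declarations added.
TITLE_XML = (
    '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:edm="http://www.europeana.eu/schemas/edm/">'
    "Tweede Kamer hoorzitting over medianota van "
    '<edm:agent id="1">Van Doorn</edm:agent> ; v.l.n.r. kamerleden '
    '<edm:agent id="2">Wolf</edm:agent> (CPN), '
    '<edm:agent id="3">Keja</edm:agent>, '
    '<edm:agent id="4">De Beer</edm:agent> (beiden VVD), '
    '<edm:agent id="5">Roethof</edm:agent> (PvdA)</dc:title>'
)


def split_title(xml_text):
    """Return (plain title text, {agent id: agent name}) for one title field."""
    title = ET.fromstring(xml_text)
    agents = {a.get("id"): a.text for a in title.findall("edm:agent", NS)}
    plain_text = "".join(title.itertext())  # text with the markup removed
    return plain_text, agents


plain_text, agents = split_title(TITLE_XML)
```

The plain text can still be indexed as before, while the agents dictionary tells the indexer which substrings are names, so "De Beer" need not be matched against the drink.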

Cheers,
Péter