Wikidata Reconciliation: new features

Antonin Delpeuch (lists)

Jun 27, 2017, 4:34:26 PM
to OpenRefine
Hi all,

To celebrate the new version of OpenRefine, I have added a new feature to
the reconciliation interface. (There is no need to upgrade anything
on your side to use it, as it is entirely server-side.)

Until now, matching was only supported for string-based fields (such as
monolingual texts or identifiers). I have now added custom matching
strategies for most value types in Wikibase. For instance, matching on
geographical coordinates is now possible (use the "lat,long" format in
OpenRefine).
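To make the idea of a non-string matching strategy concrete, here is a minimal Python sketch of how a "lat,long" cell could be compared with an item's coordinates. The haversine distance is standard, but the linear fall-off and the 50 km cutoff (`max_km`) are hypothetical choices for illustration, not the scoring the service actually uses.

```python
import math


def parse_lat_long(text):
    """Parse a "lat,long" cell into a (lat, long) pair of floats."""
    lat_str, long_str = text.split(",")
    lat, lon = float(lat_str), float(long_str)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range: " + text)
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, long) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def coordinate_score(cell, item_coords, max_km=50.0):
    """Hypothetical score in [0, 100] that drops linearly to 0 at max_km."""
    distance = haversine_km(parse_lat_long(cell), item_coords)
    return max(0.0, 100.0 * (1.0 - distance / max_km))
```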

In addition, you can now extract sub-fields from these complex values. For
instance, if for some reason your records of people only give their
month and day of birth (but not their year of birth), you can match on that.
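A minimal Python sketch of matching on such a sub-field, assuming the cell holds a year-less "MM-DD" string and the item's date is a Wikidata-style time string such as "+1879-03-14T00:00:00Z". The function names and the cell format are hypothetical; the actual sub-field syntax is the one given in the documentation. Note that dates stored with year-only precision carry "00" as month and day, so they never match here.

```python
import re

# Wikidata-style time strings look like "+1879-03-14T00:00:00Z".
WIKIDATA_TIME = re.compile(r"^[+-]?\d+-(\d{2})-(\d{2})T")


def month_day_of_item(time_value):
    """Extract the (month, day) sub-field from a Wikidata-style time string."""
    m = WIKIDATA_TIME.match(time_value)
    if m is None:
        raise ValueError("unexpected time format: " + time_value)
    return int(m.group(1)), int(m.group(2))


def month_day_of_cell(cell):
    """Parse a hypothetical year-less cell such as "03-14" or "3-14"."""
    month, day = (int(part) for part in cell.strip().split("-"))
    return month, day


def month_day_matches(cell, time_value):
    """True when the cell's month and day equal the item's, ignoring the year."""
    return month_day_of_cell(cell) == month_day_of_item(time_value)
```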

This is all described in the documentation:

https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation#comparing-values

A few notes of caution:

- you should not rely too much on the particular behaviour of the
scoring as it might evolve in the future. Of course I try to keep the
scoring sensible for every use case, but things can break. Please let me
know if that happens!

- all this is just a reranking *after* the search for Qids: we do not
query Wikidata based on these values. So, for instance, if you have very
noisy names but accurate geographical coordinates, this reconciliation
interface will not work.
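A toy model of this second caveat, in Python: scores are only computed for candidates that the name search has already returned, so an item missing from that list cannot be promoted, however well its coordinates match. The plain average used here is a hypothetical stand-in for the service's real scoring.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    qid: str
    name_score: float  # 0-100, from the text search on the name
    property_scores: List[float] = field(default_factory=list)  # 0-100 each


def combined_score(candidate):
    """Hypothetical combination: plain average of name and property scores."""
    scores = [candidate.name_score] + candidate.property_scores
    return sum(scores) / len(scores)


def rerank(candidates):
    """Reorder the candidates returned by the search; never adds new ones."""
    return sorted(candidates, key=combined_score, reverse=True)
```

An accurate coordinate match can lift a candidate with a weaker name match to the top, but only if that candidate was in the search results in the first place.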

Also: contributors welcome!

- if you want to contribute subfields, it is quite straightforward: just
add a class here:

https://github.com/wetneb/openrefine-wikidata/blob/master/wdreconcile/subfields.py

- if you want to improve the scoring methods for particular values, it
is here:

https://github.com/wetneb/openrefine-wikidata/blob/master/wdreconcile/wikidatavalue.py

Cheers,

Antonin


Ettore Rizza

Jun 28, 2017, 5:12:42 AM
to OpenRefine, li...@antonin.delpeuch.eu
Thank you, Antonin, I'll test that!

Andrea Zanni

Jul 25, 2017, 12:00:54 PM
to openr...@googlegroups.com, li...@antonin.delpeuch.eu
Hi Antonin, thanks.

A (related?) question:
I usually need to reconcile a list of authors with VIAF and Wikidata.
What I do:
* reconcile authors with VIAF, via the "conciliator" extension
* reconcile with Wikidata (having the VIAF ids in a column helps a lot)

For some reason, with this dataset I already have a list of authors which has been previously reconciled with Wikidata, so I have a column with author names and another one with Qids.
I no longer have the reconciliation match data in OpenRefine, just these 2 columns.

Is there a way to also get the VIAF reconciliation? This is *backwards* from what I usually do, and I'm not sure if there's a way to use the Qid to easily and quickly reconcile with VIAF.

Thanks!

Aubrey

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Antonin Delpeuch (lists)

Jul 25, 2017, 12:24:14 PM
to openr...@googlegroups.com
Hi Aubrey,

If I understand your question correctly, you have Qids in one column,
and you want to obtain the VIAF ids associated with these Qids in
another column?

If so, that is exactly the feature I have been working on lately. It
lets you fetch properties (in your case, VIAF (P214)) from any
reconciled column (currently only for Wikidata). See the screencast here:
https://github.com/OpenRefine/OpenRefine/pull/1210
(Note that if you only have a column of "bare" Qids, you will need to
"reconcile" it to Wikidata first: it is of course a straightforward
bureaucratic step as Qids are directly recognized as items.)
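Under the hood, fetching VIAF (P214) for an item amounts to reading its P214 statements. The following Python sketch does this directly against the public Wikidata API (`wbgetentities`); it illustrates the idea and is not how the OpenRefine feature is implemented. The User-Agent string is a placeholder, and `wbgetentities` accepts at most 50 ids per request for regular users, so a long column would need to be batched.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entities_url(qids):
    """Build a wbgetentities URL asking only for the statements of the items."""
    params = {"action": "wbgetentities", "ids": "|".join(qids),
              "props": "claims", "format": "json"}
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def viaf_ids(entity):
    """Return the VIAF ids (P214 values) found in one entity's claims."""
    ids = []
    for statement in entity.get("claims", {}).get("P214", []):
        snak = statement["mainsnak"]
        # "no value" / "unknown value" statements carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"])
    return ids


def fetch_viaf_ids(qids):
    """Fetch up to 50 items in one request and map each Qid to its VIAF ids."""
    request = urllib.request.Request(
        entities_url(qids),
        headers={"User-Agent": "viaf-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {qid: viaf_ids(entity) for qid, entity in data["entities"].items()}
```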

This feature is available in the development version of OpenRefine. If
for some reason you need to stick to a stable version of OpenRefine, you
can also use the workaround described here:
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation#fetching-values-from-the-reconciled-database
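The workaround boils down to building one URL per Qid. The example URL quoted later in this thread uses the `item`, `prop` and `label` parameters, so a small Python helper could generate them as below. The endpoint is copied from that 2017 example and may no longer be served at this address; in OpenRefine itself you would build the same string with GREL and use "Add column by fetching URLs".

```python
import urllib.parse

# Endpoint and parameter names copied from the 2017 example URL in this thread.
FETCH_VALUES = "https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values"


def fetch_values_url(qid, prop, label=True):
    """Build a fetch_values URL for one item and one property."""
    params = {"item": qid, "prop": prop}
    if label:
        # The example passes label=true; no other value is assumed here.
        params["label"] = "true"
    return FETCH_VALUES + "?" + urllib.parse.urlencode(params)


# One URL per Qid in the column, e.g. to fetch VIAF ids (P214).
urls = [fetch_values_url(qid, "P214", label=False) for qid in ["Q42", "Q1"]]
```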

I hope it helps!

Antonin

Ettore Rizza

Jul 25, 2017, 12:30:46 PM
to OpenRefine, li...@antonin.delpeuch.eu
Hi Antonin,

this reminds me that there is a problem with the example you provide for the workaround. When you click on the URL, you get an internal server error: https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item=Q3068626&prop=P463&label=true

Andrea Zanni

Jul 25, 2017, 12:57:47 PM
to openr...@googlegroups.com
Thanks!
I'll try asap.

> (Note that if you only have a column of "bare" Qids, you will need to
> "reconcile" it to Wikidata first: it is of course a straightforward
> bureaucratic step as Qids are directly recognized as items.)

Mmmh, so I need to run a reconciliation task on this column, and it will automatically understand that those Qids are actually items? I can try: I was just worried because I have ~65k rows, so I try to avoid reconciling such big columns.

Antonin Delpeuch (lists)

Jul 25, 2017, 1:24:24 PM
to openr...@googlegroups.com
On 25/07/2017 17:57, Andrea Zanni wrote:
> Mmmh, so I need to run a reconciliation task with this column, and it
> will automatically understand those Q are actually items? I can try: I
> was just worried because I have ~65k rows so I try not to reconcile
> these big columns.

Yes, it recognizes Qids: it will directly return the item as the only
reconciliation candidate, auto-matched with a perfect score.
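Schematically, the behaviour described above could look like the Python sketch below: a cell that is exactly a Qid yields a single candidate with a perfect score and `match` set to true, with no name search at all. The `id`/`score`/`match` keys follow the reconciliation API's candidate format; the regular expression (an uppercase "Q" followed by a positive integer) and the fallback are assumptions.

```python
import re

# Assumption: a bare Qid is an uppercase "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9][0-9]*")


def reconcile_cell(cell):
    """Toy model: a cell holding a bare Qid short-circuits the name search."""
    text = cell.strip()
    if QID.fullmatch(text):
        # Single candidate, perfect score, automatically matched.
        return [{"id": text, "score": 100, "match": True}]
    # Hypothetical: anything else would go through the normal search.
    return None
```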

Antonin

Antonin Delpeuch (lists)

Jul 25, 2017, 1:29:48 PM
to Ettore Rizza, OpenRefine
Oops, sorry about that! I just fixed this bug. Thanks a lot for
reporting it.

Antonin

Luiza Wainer

Jan 22, 2018, 2:16:51 PM
to OpenRefine
Hi all,

I have a situation similar to Andrea's "backwards" VIAF reconciliation. I have a column of VIAF IDs and I need to obtain all the prefLabels associated with them. I'm not the sharpest when it comes to GREL, so I was wondering if someone could explain the best route to do so (hopefully with detailed examples or recipes).

Thank you so much,
 - Luiza

Ettore Rizza

Jan 22, 2018, 4:37:19 PM
to OpenRefine
Hi Luiza,

You need to use the regular VIAF API (what is discussed here only works with Wikidata). In OpenRefine, this is a four-step operation.

1. Create a VIAF API URL from the column that contains your IDs (let's call it "viaf_id"). To do so, add a new column based on it using this GREL formula:

"http://viaf.org/viaf/" + value + "/viaf.json"


2. Fetch the JSON file behind this URL: Edit column -> Add column by fetching URLs. Change the throttle delay from its default of 5000 milliseconds to 100 milliseconds, then click "OK".

3. Extract the information you are interested in from this JSON. For the preferred labels, this GREL formula seems to do the job:

forEach(value.parseJson()["ns1:mainHeadings"]["ns1:data"], e, e["ns1:text"]).join(':::')

4. If you want, split the results with Edit cells -> Split multi-valued cells, using the separator :::
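For readers who prefer a script to GREL, the same four steps can be sketched in Python. The URL pattern and the `ns1:` keys are copied from the formulas above; the VIAF JSON format may have changed since, and wrapping a lone heading object in a list is a defensive assumption, not documented behaviour. `delay_seconds` plays the role of the throttle delay from step 2.

```python
import json
import time
import urllib.request


def viaf_json_url(viaf_id):
    """Step 1: build the VIAF JSON URL, same pattern as the GREL formula."""
    return "http://viaf.org/viaf/" + str(viaf_id).strip() + "/viaf.json"


def preferred_labels(json_text):
    """Step 3: pull the preferred labels out of one VIAF JSON record."""
    record = json.loads(json_text)
    data = record["ns1:mainHeadings"]["ns1:data"]
    if isinstance(data, dict):
        # Assumption: a record with a single heading may hold an object, not a list.
        data = [data]
    return [entry["ns1:text"] for entry in data]


def fetch_preferred_labels(viaf_ids, delay_seconds=0.1):
    """Step 2 (then 3): fetch each record, pausing between requests."""
    results = {}
    for viaf_id in viaf_ids:
        with urllib.request.urlopen(viaf_json_url(viaf_id)) as response:
            results[viaf_id] = preferred_labels(response.read().decode("utf-8"))
        time.sleep(delay_seconds)
    return results
```

Returning a list per ID makes step 4 unnecessary; `":::".join(labels)` reproduces the joined cell value if you want to paste it back into OpenRefine.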

Here is a screencast that summarizes the operations; it should make things clearer.

[Screencast not included]

Note: the JSON records returned by VIAF are huge. Even with only two IDs, you can see in the screencast that my OpenRefine was struggling. You should check the API documentation to see whether it is possible to limit the amount of information returned.


Hope this helps,

Ettore

Owen Stephens

Jan 23, 2018, 7:29:05 AM
to OpenRefine
Luiza,

Are you wanting to get the label from VIAF or from Wikidata?

Thanks

Owen

Andrea Zanni

Feb 18, 2018, 12:16:15 PM
to openr...@googlegroups.com
Hi everyone,
I never came back here to tell you that I managed to get the correct results.
"Add columns from reconciled values" is a *great* feature, and I now use it a lot.
It's very simple and one of the best things about OpenRefine.
I think we're past the tipping point in terms of usefulness for this great software, and
for what it's worth, our team (Lara and I) is spreading the word whenever we have the chance.
So thanks to everyone in the dev community.

The only thing the feature is missing at the moment is the possibility to import other data
*in another language*. Should we open a GitHub issue somewhere for this, or is it already planned?

Thanks again.

Aubrey

Antonin Delpeuch (lists)

Feb 18, 2018, 2:27:39 PM
to openr...@googlegroups.com
Hi Aubrey,

It's very nice to hear, thanks! :)

If I understand your request correctly, you want to fetch labels of
reconciled cells in a different language than the one used for
reconciliation?

The closest issue that we have is
https://github.com/wetneb/openrefine-wikidata/issues/17 . Feel free to
chime in there!
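Until that issue is addressed, one possible workaround is to fetch labels in the desired language straight from the Wikidata API (`wbgetentities` with the `languages` parameter). A minimal Python sketch, assuming you already have the Qids; the User-Agent string is a placeholder, and items without a label in that language come back as `None`.

```python
import json
import urllib.parse
import urllib.request


def labels_url(qids, language):
    """wbgetentities URL requesting only the labels in one language."""
    params = {"action": "wbgetentities", "ids": "|".join(qids),
              "props": "labels", "languages": language, "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)


def label_in(entity, language):
    """Return the entity's label in `language`, or None if it has none."""
    label = entity.get("labels", {}).get(language)
    return label["value"] if label else None


def fetch_labels(qids, language):
    """Map each Qid (up to 50 per request) to its label in `language`."""
    request = urllib.request.Request(
        labels_url(qids, language),
        headers={"User-Agent": "label-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {qid: label_in(entity, language) for qid, entity in data["entities"].items()}
```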

Antonin