Using birth and death dates for reconciling persons with Wikidata


christine eslao

May 15, 2018, 3:34:21 PM
to OpenRefine
Can anyone offer advice on how to use birth and death dates to improve reconciliation with persons in Wikidata? So far, nothing I've tried seems to have led to more accurate matches. The data I'm working with involves the names of 18th-century authors, so filtering out modern persons would help us avoid candidates who merely share a name. (Every obscure preacher seems to have the name of a modern footballer or cricketer.) Typically, the data has years but not precise dates. They're plain text, and I've used the toDate function to convert them.
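Because these values are usually bare years stored as text, an alternative to converting them with toDate is to pull out the year itself. The sketch below does this outside OpenRefine in Python; `extract_year` and its four-digit pattern are hypothetical choices for illustration, not part of OpenRefine or the Wikidata service.

```python
import re
from typing import Optional

# Hypothetical pattern: a standalone run of exactly four digits,
# e.g. "1672" in "b. 1672" or in "1672-1719".
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def extract_year(text: str) -> Optional[int]:
    """Return the first four-digit year found in `text`, or None if there is none."""
    match = YEAR_PATTERN.search(text)
    return int(match.group(1)) if match else None


print(extract_year("1672"))       # 1672
print(extract_year("b. 1672"))    # 1672
print(extract_year("1672-1719"))  # 1672 (the first year wins)
print(extract_year("unknown"))    # None
```

Taking only the first match is a simplification; a range like "1672-1719" would need the second match if it were a death year you were after.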

I'm reconciling against the type Humans (Q5) and trying P569 (date of birth) and/or P570 (date of death).

A colleague and I tried testing birth dates in different formats within query URLs (outside OpenRefine), but the judgements don't seem to change:

With ISO 8601 date:
https://tools.wmflabs.org/openrefine-wikidata/en/api?queries=%7B%22theQuery%22%3A+%7B%22query%22%3A+%22Addison%2C+Joseph%22%2C+%22type%22%3A+%22Q5%22%2C+%22properties%22%3A+%5B%7B%22pid%22%3A+%22P569%22%2C+%22v%22%3A+%221672-01-01T00%3A00%3A00Z%22%7D%5D%7D%7D

With four-digit year:
https://tools.wmflabs.org/openrefine-wikidata/en/api?queries=%7B%22theQuery%22%3A+%7B%22query%22%3A+%22Addison%2C+Joseph%22%2C+%22type%22%3A+%22Q5%22%2C+%22properties%22%3A+%5B%7B%22pid%22%3A+%22P569%22%2C+%22v%22%3A+%221672%22%7D%5D%7D%7D

With four-digit year (no quotes):
https://tools.wmflabs.org/openrefine-wikidata/en/api?queries=%7B%22theQuery%22%3A+%7B%22query%22%3A+%22Addison%2C+Joseph%22%2C+%22type%22%3A+%22Q5%22%2C+%22properties%22%3A+%5B%7B%22pid%22%3A+%22P569%22%2C+%22v%22%3A+1672%7D%5D%7D%7D
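The three URLs above are percent-encoded JSON, which makes them hard to compare by eye. A minimal Python sketch using only the standard library decodes the `queries` parameter of the four-digit-year URL; it shows that the variants differ only in the `v` value, while the `pid` stays plain "P569" in all three.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The "four-digit year" URL from above, split into string literals for readability.
url = ("https://tools.wmflabs.org/openrefine-wikidata/en/api?queries="
       "%7B%22theQuery%22%3A+%7B%22query%22%3A+%22Addison%2C+Joseph%22%2C+"
       "%22type%22%3A+%22Q5%22%2C+%22properties%22%3A+%5B%7B%22pid%22%3A+"
       "%22P569%22%2C+%22v%22%3A+%221672%22%7D%5D%7D%7D")

# parse_qs percent-decodes the value and turns "+" back into spaces.
queries = json.loads(parse_qs(urlsplit(url).query)["queries"][0])
print(json.dumps(queries, indent=2))
```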

What we're hoping for, in these particular examples, is to automatch with https://www.wikidata.org/wiki/Q206384 (born 1672) and not, say, https://www.wikidata.org/wiki/Q1334842 (born 1879); these candidates seem to come with the same scores.

Let me know if we're going about this all wrong. Somehow limiting the results to people who were alive during the 18th century would also help us to meaningfully narrow the options. Thanks!

Antonin Delpeuch (lists)

May 16, 2018, 2:20:38 AM
to openr...@googlegroups.com
Hi Christine,

There is a paragraph about that in the docs:
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation#comparing-values

In short, instead of using "P569", just use "P569@year" to indicate that
you only want to match on the years of these dates.

https://tools.wmflabs.org/openrefine-wikidata/en/api?queries={"theQuery":{"query":"Addison,%20Joseph","type":"Q5","properties":[{"pid":"P569@year","v":"1672"}]}}

Let me know if it helps - and feel free to improve the docs if you did
not find them clear in the first place!

Cheers,
Antonin
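The unencoded URL in Antonin's reply can also be generated programmatically, which avoids hand-encoding mistakes when testing many names. This is a sketch under assumptions: `build_query_url` is a hypothetical helper, and the endpoint is the one used in this 2018 thread, which may no longer be online. No request is sent here.

```python
import json
from urllib.parse import urlencode

# Endpoint as used in this thread (2018); it may since have moved or been retired.
ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/api"


def build_query_url(name: str, type_id: str, properties: list) -> str:
    """Hypothetical helper: encode one reconciliation query as the `queries` parameter."""
    queries = {"theQuery": {"query": name, "type": type_id, "properties": properties}}
    # urlencode percent-encodes the JSON text (spaces become "+", "@" becomes "%40").
    return ENDPOINT + "?" + urlencode({"queries": json.dumps(queries)})


# "P569@year" asks the service to compare only the year of the date of birth.
url = build_query_url("Addison, Joseph", "Q5", [{"pid": "P569@year", "v": "1672"}])
print(url)
```

The `@` in "P569@year" ends up encoded as `%40`; the server decodes it back, just as it does for the hand-built URLs earlier in the thread.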

christine eslao

May 16, 2018, 8:59:41 AM
to OpenRefine
Antonin,

The docs are wonderfully clear -- we just overlooked this part, somehow. Thanks so much for answering my question anyway! Looking forward to trying this out.

Best,
Christine

Jeremy Guillette

May 16, 2018, 10:01:40 AM
to OpenRefine
Antonin,

I've been working with Christine on the same project, and now that you bring it up, I had looked at and tried to use the @year modifier for the field, but I was entering "SPARQL: P569@year" into the property match field, which didn't return any matches for any of the rows I was reconciling. Omitting that "SPARQL: " from the property match field yielded much better results.

The screenshot on the wiki page shows a query similar to the one that didn't work for me, and it seems to be hosted on wmflabs. Is there an issue with hosting a new image elsewhere? If not, I can replace the screenshot with one that omits the "SPARQL: " prefix.

Thanks!
Jeremy

christine eslao

May 16, 2018, 10:11:58 AM
to OpenRefine
Here's a follow-up question, hopefully less redundant with the documentation:

Sometimes the birth or death dates in our data are off by one or two years. In these cases, the dates still seem to have some effect, but we're wondering whether we can adjust how the scores are weighted to influence matching. Any thoughts on this?
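The thread does not settle whether the service's date weighting can be tuned. One workaround, sketched here as an assumption rather than a feature of the Wikidata service, is to re-score candidates yourself once you know their birth years. `year_score`, `tolerance` and `falloff` are hypothetical names; the rule gives full credit within a tolerance and then decays linearly.

```python
def year_score(expected: int, actual: int, tolerance: int = 2, falloff: int = 10) -> float:
    """Hypothetical scoring rule: 1.0 when the years are within `tolerance`,
    then a linear drop to 0.0 over the next `falloff` years."""
    gap = abs(expected - actual)
    if gap <= tolerance:
        return 1.0
    return max(0.0, 1.0 - (gap - tolerance) / falloff)


# A recorded 1673 vs. Joseph Addison (born 1672) and vs. the 1879 namesake.
print(year_score(1673, 1672))  # 1.0
print(year_score(1673, 1879))  # 0.0
```

A rule like this keeps an off-by-one year from being penalized while still pushing a candidate born two centuries later to the bottom.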

Here’s a query for Joseph Addison with the correct year of birth:

https://tools.wmflabs.org/openrefine-wikidata/en/api?queries={%22theQuery%22:{%22query%22:%22Addison,%20Joseph%22,%22type%22:%22Q5%22,%22properties%22:[{%22pid%22:%22P569@year%22,%22v%22:%221672%22}]}}

And here’s one for him with a year off from his actual date of birth (1673 instead of 1672):

https://tools.wmflabs.org/openrefine-wikidata/en/api?queries={%22theQuery%22:{%22query%22:%22Addison,%20Joseph%22,%22type%22:%22Q5%22,%22properties%22:[{%22pid%22:%22P569@year%22,%22v%22:%221673%22}]}}

The JSON response has a “P569@year” key: for the correct year it has a “weighted” value of 40 and a “score” value of 100, and for the year that is off by one it has a “weighted” value of 20 and a “score” value of 50. Also, the query with the correct year produced a match, but the off-by-one query did not.
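To compare these values across many queries, you could pull the per-property entry out of each candidate. The sketch assumes a response shape inferred only from the description above (candidates under `["theQuery"]["result"]`, each with a "P569@year" entry holding "score" and "weighted"), and assumes the reported numbers belong to Q206384. The two responses are trimmed illustrations built from those numbers, not captured output.

```python
from typing import Iterator, Tuple


def date_feature(response: dict, pid: str = "P569@year",
                 query_key: str = "theQuery") -> Iterator[Tuple]:
    """Yield (candidate id, match flag, property score, weighted score) per candidate."""
    for candidate in response[query_key]["result"]:
        feature = candidate.get(pid, {})
        yield (candidate["id"], candidate.get("match"),
               feature.get("score"), feature.get("weighted"))


# Illustrative, trimmed responses using the numbers reported above.
correct_year = {"theQuery": {"result": [
    {"id": "Q206384", "match": True, "P569@year": {"score": 100, "weighted": 40}}]}}
off_by_one = {"theQuery": {"result": [
    {"id": "Q206384", "match": False, "P569@year": {"score": 50, "weighted": 20}}]}}

for label, response in [("1672", correct_year), ("1673", off_by_one)]:
    for qid, match, score, weighted in date_feature(response):
        # weighted / score is 0.4 in both cases, which hints at (but does not
        # prove) a fixed weight for the date property.
        print(label, qid, match, score, weighted, weighted / score)
```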

Thad Guidry

May 16, 2018, 10:26:10 AM
to openr...@googlegroups.com
Jeremy,

You can host an image elsewhere and then just link to it from our wiki (or update an existing image link with a better picture).
Here's the official Github instructions on how to do so on any Github Wiki: https://help.github.com/articles/adding-images-to-wikis/

-Thad

Jeremy Guillette

May 16, 2018, 1:40:31 PM
to OpenRefine
Thanks for confirming that, Thad! I've updated the page with a new screenshot.

-Jeremy

Daria

Nov 5, 2021, 2:50:24 PM
to OpenRefine
Hi, thanks for this discussion. I have a follow-up question. Is it possible to use the century of birth instead of the birth dates for reconciling persons?
With other reconciliation services this is possible (e.g. with GND I can use a regex like "19*"), but I can't find a way to do that with the Wikidata service. Or am I overlooking something?
Thanks a lot for your help.
Daria

Antonin Delpeuch (lists)

Nov 5, 2021, 3:02:33 PM
to openr...@googlegroups.com

Hi Daria,

That is indeed not possible in the Wikidata service as far as I am aware.

Best,

Antonin
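Since the service does not support this, one fallback is to filter candidates by century after reconciliation, assuming you have already obtained each candidate's birth year by some other means (that step is outside this sketch). `century_of` and `born_in_century` are hypothetical helpers; the two birth years are the ones given at the start of this thread.

```python
def century_of(year: int) -> int:
    """Century of a CE year under strict numbering: 1672 -> 17, 1700 -> 17, 1701 -> 18."""
    return (year - 1) // 100 + 1


def born_in_century(candidates: dict, century: int) -> list:
    """Keep the candidate ids whose birth year falls in the given century."""
    return [qid for qid, year in candidates.items() if century_of(year) == century]


# Birth years from the first message: Q206384 (1672) and Q1334842 (1879).
candidates = {"Q206384": 1672, "Q1334842": 1879}
print(born_in_century(candidates, 17))  # ['Q206384']
```

A prefix pattern like "19*" selects 1900 to 1999 (`year // 100 == 19`), which differs from the strict 20th century (1901 to 2000) only at the boundary years.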
