New URL for OpenRefine reconciliation service

Antonin Delpeuch (lists)

Jun 14, 2020, 6:22:23 PM
to OpenRefine, Discussion list for the Wikidata project.
Hi,

The upcoming domain name migration on Wikimedia Toolforge means
that OpenRefine users need to update their Wikidata reconciliation
service to the new endpoint:

https://wdreconcile.toolforge.org/en/api

or the same URL with "en" replaced by any other Wikimedia language code.
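
[Editor's sketch] The URL pattern above can be expressed as a small helper. This is only an illustration, not part of OpenRefine or the reconciliation service: the function name `endpoint_for` and the loose language-code check are assumptions made for the example.

```python
import re

# Base URL announced in this message.
TOOLFORGE_BASE = "https://wdreconcile.toolforge.org"

# Assumption: a loose sanity check for codes such as "en", "fr" or "zh-hans".
# It does not verify that the code is an actual Wikimedia language code.
_LANG_RE = re.compile(r"^[a-z][a-z0-9-]*$")


def endpoint_for(lang: str = "en", base: str = TOOLFORGE_BASE) -> str:
    """Return the reconciliation endpoint URL for a given language code."""
    if not _LANG_RE.match(lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return f"{base.rstrip('/')}/{lang}/api"


if __name__ == "__main__":
    print(endpoint_for("fr"))
```

The `base` parameter is there so the same helper can point at another host serving the same URL layout.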

The new home page of the service is at:

https://wdreconcile.toolforge.org/

This new endpoint will be available by default in the upcoming release
of OpenRefine (3.4).

For details about why an automatic migration via redirects is sadly not
possible, see this Phabricator ticket:

https://phabricator.wikimedia.org/T254172

Cheers,

Antonin


Antonin Delpeuch (lists)

Jul 8, 2020, 8:10:15 AM
to OpenRefine, Discussion list for the Wikidata project.
Hi,

This change is now live! If you cannot reconcile to Wikidata anymore,
delete the Wikidata reconciliation service and add it again with the new
URL:

https://wdreconcile.toolforge.org/en/api

or the same URL with "en" replaced by any other Wikimedia language code.

Cheers,

Antonin

Hay (Husky)

Jul 8, 2020, 9:40:18 AM
to Discussion list for the Wikidata project, OpenRefine
Cheers, this seems to work again for me!

-- Hay
> _______________________________________________
> Wikidata mailing list
> Wiki...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

Reuben Honigwachs

Jul 13, 2020, 2:29:55 PM
to Discussion list for the Wikidata project, OpenRefine
In OpenRefine 3.3, adding the new service with the above URL works fine, but when I try to remove the old service with the "x" button, OpenRefine tries to start reconciling instead. It then fails and hangs. Is it the same for all of you? Thanks.

Yves P.

Jul 13, 2020, 2:32:34 PM
to openr...@googlegroups.com
> In OpenRefine 3.3 adding the new service with above URL is fine, but when trying to remove the old service with "x" OpenRefine tries to start reconciling instead.
I saw this with version 3.4-beta2.

> Fails consequently, and hangs.
I retried and, although I don't know exactly what I did differently, it ended up working.

__
Yves

Thad Guidry

Jul 13, 2020, 2:35:52 PM
to openr...@googlegroups.com
Would both of you mind trying out the latest development SNAPSHOT release? (Back up your workspace as a precaution.)

Click on the latest blue link within

Then let us know,



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/B227C97D-EA5F-48CF-8463-C4FA6B95B1BF%40gmail.com.

Yves P.

Jul 14, 2020, 1:43:15 PM
to openr...@googlegroups.com
> Would both of you mind trying out the latest development SNAPSHOT release? (backup your workspace as a precaution)
I downloaded version 3.4-beta-327-g0645c2a and am using it.

> Then let us know,

I am trying to reconcile movie theater names from Wikipedia.
I have only one service in the left panel.
When I delete it, the reconciliation starts anyway.
I pressed Escape to stop it and tried another reconciliation: this time the left panel was empty.

I got this message on the console:
19:28:17.968 [                  command] Failed to guess cell types for load
{"q1":{"query":"VARIETES","limit":3},"q2":{"query":"ARLEQUIN","limit":3},"q3":{"query":"AMPHI","limit":3},"q4":{"query":"LA  GRENETTE","limit":3},"q5":{"query":"L'Etoile","limit":3},"q6":{"query":"CINEMA DE DIVONNE","limit":3},"q7":{"query":"VOLTAIRE","limit":3},"q8":{"query":"CINEMA LE PATIO","limit":3},"q9":{"query":"NOVELTY","limit":3},"q0":{"query":"CINE FESTIVAL","limit":3}} (4860ms)
java.io.IOException: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>

at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:197)
at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:110)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:131)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19:28:17.977 [                  command] Exception caught (9ms)
java.io.IOException: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>

at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:197)
at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:110)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:131)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I retried: I added the standard service with the URL https://wdreconcile.toolforge.org/en/api

The reconciliation started automatically.

Reconcile cells in column NOM ETABLISSEMENT to type null
 0% complete 
It seems to be frozen (nothing on the shell console).

I will let it run and tell you what happens…

__
Yves

Yves P.

Jul 14, 2020, 2:29:50 PM
to openr...@googlegroups.com
> I let it running and tell you what happened…
So far, OpenRefine has reconciled only 26% of the 305 rows.

Shell console:

19:39:05.866 [                   refine] GET /command/core/get-csrf-token (239303ms)
19:39:05.880 [                   refine] POST /command/core/reconcile (14ms)
19:58:16.814 [    refine-standard-recon] Failed  - code: 502 message: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
 (1150934ms)
20:18:18.292 [    refine-standard-recon] Failed  - code: 502 message: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
 (1201478ms)
20:28:17.650 [    refine-standard-recon] Failed  - code: 502 message: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
 (599358ms)
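
[Editor's sketch] The 502 responses above come from the gateway in front of the service, so a standalone script that queries the endpoint directly might retry them with increasing delays. The sketch below is only an illustration under assumptions: `call_with_retry` and its `send` callable are hypothetical names, and it says nothing about how OpenRefine itself handles these failures.

```python
import time
from typing import Callable, Tuple

# Gateway-type errors that are often transient and worth retrying.
RETRYABLE = (502, 503, 504)


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (HTTP status, response body).
    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # All attempts failed: return the last response so the caller can inspect it.
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.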


__
Yves

Antonin Delpeuch (lists)

Jul 14, 2020, 2:46:24 PM
to openr...@googlegroups.com
Hi Yves,

It seems that you have been hit by an issue that appears to be a
byproduct of the migration to a new domain name (a migration I would
have liked to avoid, but sadly it is outside my control):
https://phabricator.wikimedia.org/T257405

The solution to this might be for us to migrate away from this
Wikimedia-managed infrastructure. I find it frustrating to be regularly struck by
deployment issues outside my control which put strain on our release
schedule (and eat up a lot of my time).

As a temporary solution, all I can recommend is that you run the service
locally (for instance using Docker):
https://github.com/wetneb/openrefine-wikibase#running-with-docker

Antonin

Yves P.

Jul 14, 2020, 2:48:47 PM
to openr...@googlegroups.com
> As a temporary solution, all I can recommend is that you run the service
> locally (for instance using Docker):
> https://github.com/wetneb/openrefine-wikibase#running-with-docker
Nice. I hope I will be able to run it locally…

Merci,

__
Yves

Tom Morris

Jul 14, 2020, 3:19:10 PM
to openr...@googlegroups.com
On Tue, Jul 14, 2020 at 2:46 PM Antonin Delpeuch (lists) <li...@antonin.delpeuch.eu> wrote:

> The solution to this might be for us to migrate outside this Wikimedia
> managed infrastructure. I find it frustrating to be regularly struck by
> deployment issues outside my control which put strain on our release
> schedule (and eat up a lot of my time).

I think a better solution would be to hand the reconciliation service over to the Wikidata team to support.

It's much more closely aligned with Wikidata than OpenRefine and totally dependent on their underlying services for its quality of service.

Tom

Thad Guidry

Jul 14, 2020, 4:39:14 PM
to openr...@googlegroups.com
Tom - Yes, all of us (Martin, Antonin, myself) agreed at an earlier advisory committee meeting that we would like Wikidata to begin to absorb this service and maintain its QoS.

Antonin - I know you have some plan in mind, but what would be the next steps to get Wikidata to begin to absorb this service? I don't mind driving this effort with Lydia or others to relieve you, but I'd probably need your input on a kickoff call.




Antonin Delpeuch (lists)

Jul 14, 2020, 7:17:40 PM
to openr...@googlegroups.com
On 14/07/2020 21:18, Tom Morris wrote:
>
> I think a better solution would be to migrate the reconciliation service
> to the Wikidata team to support.
>
> It's much more closely aligned with Wikidata than OpenRefine and totally
> dependent on their underlying services for its quality of service.

I have tried to suggest that:
https://phabricator.wikimedia.org/T244847

If it is indeed taken over by WMDE, WMSE or any other organization, it
will likely take quite some time before something operational is in
place (at the current pace, a few years perhaps).

In the meantime I think it would really be worth having something more
functional than what we have now. This service is really crucial for a
lot of people (and for OpenRefine as a project). It is really not
expensive to run, so I am really tempted to just host an instance on a
dedicated server of mine.

The service has been generally unreliable for quite some time and it is
not really putting pressure on anyone to offer an alternative, so I
would not bet that by leaving the service in the current limbo we are
going to encourage other stakeholders to step up.

The general context is: WMDE/WMF are already struggling to maintain the
Wikidata Query Service, other orgs like WMSE are probably not extremely
keen to get new responsibilities in the current financial situation, and
so on.

Antonin

Yves P.

Jul 15, 2020, 5:41:55 AM
to openr...@googlegroups.com

> As a temporary solution, all I can recommend is that you run the service
> locally (for instance using Docker):
> https://github.com/wetneb/openrefine-wikibase#running-with-docker

I ran the local instance this morning with the default settings.

It works reliably, thank you very much 😃

Did it take about 6 seconds per query? (I reconciled 280 items):
redis_1      | 1:M 15 Jul 2020 08:33:12.095 * 100 changes in 300 seconds. Saving...
redis_1      | 1:M 15 Jul 2020 08:38:13.015 * 100 changes in 300 seconds. Saving...
redis_1      | 1:M 15 Jul 2020 08:43:14.075 * 100 changes in 300 seconds. Saving...
redis_1      | 1:M 15 Jul 2020 08:48:15.037 * 100 changes in 300 seconds. Saving...
redis_1      | 1:M 15 Jul 2020 08:53:16.010 * 100 changes in 300 seconds. Saving...
redis_1      | 1:M 15 Jul 2020 08:58:17.067 * 100 changes in 300 seconds. Saving...

__
Yves

Thad Guidry

Jul 15, 2020, 8:41:05 AM
to openr...@googlegroups.com
Great Yves!  Thanks for letting us know!




Antonin Delpeuch (lists)

Jul 20, 2020, 9:55:04 AM
to openr...@googlegroups.com
Hi all,

Unfortunately this migration has made the Wikidata reconciliation
service very unreliable. This comes from the hosting provider (WMF
Toolforge), who initiated the migration ([1], [2]).

Because I have no way to mitigate the issue in Toolforge myself, I have
set up an instance on a server of mine:
https://wikidata.reconci.link/

You can add it in OpenRefine as usual with
https://wikidata.reconci.link/en/api
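
[Editor's sketch] For anyone scripting against this endpoint outside OpenRefine, the batch format visible in the console log earlier in this thread (keys q0, q1, ... each with "query" and "limit") can be prepared as below. This is a minimal sketch, assuming the service accepts the batch as a form-encoded `queries` field in a POST request; the helper names `build_batch` and `build_request` are made up for the example, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://wikidata.reconci.link/en/api"


def build_batch(names: Iterable[str], limit: int = 3) -> Dict[str, dict]:
    """Build a batch shaped like the one in the console log: q0, q1, ..."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def build_request(endpoint: str, names: Iterable[str], limit: int = 3) -> Request:
    """Prepare (but do not send) a POST request carrying the batch.

    Assumption: the batch goes in a form field named "queries".
    """
    payload = urlencode({"queries": json.dumps(build_batch(names, limit))})
    return Request(endpoint, data=payload.encode("utf-8"), method="POST")


if __name__ == "__main__":
    req = build_request(ENDPOINT, ["VARIETES", "ARLEQUIN"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```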

This instance should be faster and more reliable than the previous ones.
Please report any issues you discover, though.

This is a transitional measure: I will keep hosting this instance as
long as necessary, but I would prefer it to be run by an organization,
rather than on my own server.

Apologies for the disruption this has caused to the community, and thank
you to the Toolforge team for hosting the service until now!

Antonin

[1]: https://phabricator.wikimedia.org/T257405
[2]: https://github.com/wetneb/openrefine-wikibase/issues/83