Problem with incomplete schema when one particular property is added

Dr Thneed

Feb 2, 2021, 2:58:14 AM
to OpenRefine
Hi. I am having trouble uploading publications to Wikidata because of a problem with the page(s) property.

I have a perfectly valid schema, but as soon as I add a statement for "page(s)" (property P304 in Wikidata) the schema becomes invalid. It doesn't seem to be a data problem: even if I enter a page number by hand (rather than taking it from a spreadsheet column), I am told the schema is incomplete, and no preview is available.

I'm using OpenRefine 3.4.1 on a Mac. Any ideas?
Thanks,
Tamsin

Owen Stephens

Feb 2, 2021, 10:07:53 AM
to OpenRefine
Can you share the schema or a screenshot of the schema configuration?

Dr Thneed

Feb 2, 2021, 3:11:41 PM
to openr...@googlegroups.com
Hi Owen,

I've attached some screenshots to demonstrate the problem, but I've tried this in multiple projects and the page(s) field breaks every schema I've tried it on. The issue was brought to my attention by a librarian friend who is having the same problem (different machine, different data).
Screenshots:
1. Functioning schema (the 16 issues are statements created without references)
2. Same schema and data, with the page(s) property and a column of page numbers added
3. Preview for 2 (i.e. not a functional schema)
4. The column of page numbers deleted and replaced with a typed number
5. Preview for 4 (i.e. still not a functional schema)

Thanks,
Tamsin

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/1f2a1eb5-43dd-4e13-abcd-f40d63219861n%40googlegroups.com.


--
Tamsin Braisher

Dunedin
New Zealand
Screen Shot 2021-02-03 at 8.52.20 AM.png
Screen Shot 2021-02-03 at 8.53.08 AM.png
Screen Shot 2021-02-03 at 8.53.12 AM.png
Screen Shot 2021-02-03 at 8.53.42 AM.png
Screen Shot 2021-02-03 at 8.53.45 AM.png

Owen Stephens

Feb 2, 2021, 4:33:20 PM
to OpenRefine
Thanks Tamsin

The Wikidata page(s) property (P304) is meant to be validated by a regular expression, but there is currently a problem with this, and I'm not sure whether the solution needs to be at the Wikidata end or the OpenRefine end.
An issue with using P304 in OpenRefine was reported in October https://github.com/OpenRefine/OpenRefine/issues/3274#issuecomment-710500625 and there was some discussion on the P304 Wikidata talk page in November https://www.wikidata.org/wiki/Property_talk:P304#Possibly_broken_regex

That discussion resulted in some changes to the regular expression in Wikidata, but OpenRefine is still not able to use the regular expression successfully, which results in the error you are seeing. I think the issue isn't really an incomplete schema, but a failure to validate the content of the schema.

I'm not sure whether the problem ultimately lies with the modified regular expression in Wikidata, or whether that is now correct and the problem lies with OpenRefine. I might need to call on someone with more expertise; I'm hoping Antonin, who wrote much of the Wikidata integration, might see this and comment. In the meantime, the only thing I can think to suggest is to keep the schema as you currently have it, but use "Export as Quickstatements" instead to generate a QuickStatements file, which can be used to update Wikidata via the QuickStatements tool. This is a bit of a workaround, but it looks to me like it would work.

Best wishes
Owen

For anyone looking to diagnose the problem, the error I'm seeing in the OpenRefine log is:

21:12:46.756 [                  command] Exception caught (381ms)
java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2
(?'r'(?'p'(?'d'\d+)(?'w'[A-Za-z]+)?|\g'w'(?:\g'n'\g'w'?)?)(?:[-–]\g'p')?)(?:,\s?\g'r')*
  ^
at java.base/java.util.regex.Pattern.error(Pattern.java:2015)
at java.base/java.util.regex.Pattern.group0(Pattern.java:3034)
at java.base/java.util.regex.Pattern.sequence(Pattern.java:2111)
at java.base/java.util.regex.Pattern.expr(Pattern.java:2056)
at java.base/java.util.regex.Pattern.compile(Pattern.java:1778)
at java.base/java.util.regex.Pattern.<init>(Pattern.java:1427)
at java.base/java.util.regex.Pattern.compile(Pattern.java:1068)
at org.openrefine.wikidata.qa.scrutinizers.FormatScrutinizer.getPattern(FormatScrutinizer.java:68)
at org.openrefine.wikidata.qa.scrutinizers.FormatScrutinizer.scrutinize(FormatScrutinizer.java:80)
at org.openrefine.wikidata.qa.scrutinizers.SnakScrutinizer.scrutinize(SnakScrutinizer.java:57)
at org.openrefine.wikidata.qa.scrutinizers.StatementScrutinizer.scrutinize(StatementScrutinizer.java:36)
at org.openrefine.wikidata.qa.EditInspector.inspect(EditInspector.java:107)
at org.openrefine.wikidata.commands.PreviewWikibaseSchemaCommand.doPost(PreviewWikibaseSchemaCommand.java:92)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
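For anyone who wants to reproduce this outside OpenRefine: the trace shows `FormatScrutinizer.getPattern` passing the constraint to `Pattern.compile`, so compiling the logged pattern directly in Java should give the same error. Java only accepts named groups written as `(?<name>...)`; the `(?'name'...)` form is Perl/PCRE/.NET syntax, which is why compilation stops at index 2. (The logged pattern also calls a group `n` via `\g'n'` that it never defines.) A minimal sketch; the class name is ours, and the en dash in the pattern is written as `\u2013` to keep the source ASCII.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class P304PatternDemo {
    // Format constraint copied from the OpenRefine log (PCRE-style syntax).
    static final String P304_FROM_LOG =
        "(?'r'(?'p'(?'d'\\d+)(?'w'[A-Za-z]+)?|\\g'w'(?:\\g'n'\\g'w'?)?)(?:[-\u2013]\\g'p')?)(?:,\\s?\\g'r')*";

    // Returns the index Java reports for the syntax error, or -1 if the pattern compiles.
    static int syntaxErrorIndex(String regex) {
        try {
            Pattern.compile(regex);
            return -1;
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription() + " at index " + e.getIndex());
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // (?'r'...) is not Java syntax: fails at index 2, as in the log.
        syntaxErrorIndex(P304_FROM_LOG);
        // The same kind of groups written the Java way, (?<name>...), compile fine.
        syntaxErrorIndex("(?<d>\\d+)(?<w>[A-Za-z]+)?");
    }
}
```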

Dr Thneed

Feb 2, 2021, 5:06:21 PM
to openr...@googlegroups.com
Thanks Owen, it's good to understand some of the background to the problem. I guess I'll wait to see whether the problem is at the OpenRefine end before I post about it at Wikidata; it seems the regex hasn't been changed since it was thought to be fixed back in November. Unfortunately you can't export statements via QuickStatements if OpenRefine thinks the schema isn't valid, so the workaround isn't possible (the message "Your schema is incomplete so it cannot be saved yet." appears when you try to export).
Cheers,
Tamsin


Owen Stephens

Feb 2, 2021, 6:10:45 PM
to OpenRefine
Hi Tamsin - no problem!
For me the export to QuickStatements worked even though the schema wasn't validated. It's possible that the very skeletal example I set up is simpler than yours, but the export didn't seem to mind that the schema wasn't validated.

Best wishes

Owen

Antonin Delpeuch (lists)

Feb 3, 2021, 2:34:17 AM
to openr...@googlegroups.com
Hi Tamsin and Owen,

Yes, this is a known problem:
https://github.com/OpenRefine/OpenRefine/issues/3274
We should definitely fix this on our end. In the meantime, a temporary
solution would be to fix the regex on Wikidata's side.

Best,
Antonin

Dr Thneed

Feb 3, 2021, 3:58:32 PM
to openr...@googlegroups.com
Hi Antonin,

Thank you for responding. Just so I'm clear - the regex is still a problem, AND there is a fix needed in OpenRefine? The regex on Wikidata was changed in response to the issue being raised in November, and doesn't appear to have been changed since. Did the changes that were made then not fix the problem on the Wikidata side?
Cheers,
Tamsin

Antonin Delpeuch (lists)

Feb 3, 2021, 4:08:56 PM
to openr...@googlegroups.com
Hi Tamsin,

OpenRefine needs to be fixed so it does not become unusable when someone
inputs a broken regular expression in Wikidata.

Before this happens, OpenRefine users can circumvent the fact that
OpenRefine does not handle broken regular expressions well by making
sure that Wikidata does not contain broken regular expressions.

If you click on the current regular expression at
https://www.wikidata.org/wiki/Property:P304#P2302
you will be taken to regex101.com where you can try the regular
expression interactively and check its validity.

At the moment it seems that the expression is not valid, in any of the
flavors supported by regex101.com. Therefore it is to be expected that
OpenRefine still chokes on it.
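Antonin's point that OpenRefine should not become unusable because of one broken constraint can be illustrated with a small Java sketch. This is not OpenRefine's code or its eventual fix: `SafeFormatCheck`, `tryCompile` and `violatesFormat` are hypothetical names, the page pattern is a simplified stand-in rather than the real P304 constraint, and it assumes the whole value must match (as with `Matcher.matches()`). The idea is to catch `PatternSyntaxException`, remember the failure, and skip only that format check instead of aborting the preview.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not OpenRefine's FormatScrutinizer: a constraint regex
// that Java cannot compile disables only that one format check.
public class SafeFormatCheck {
    // Cache keyed by regex text; Optional.empty() records "failed to compile"
    // so a broken constraint is not recompiled (and re-reported) for every value.
    private static final Map<String, Optional<Pattern>> CACHE = new ConcurrentHashMap<>();

    static Optional<Pattern> tryCompile(String regex) {
        return CACHE.computeIfAbsent(regex, r -> {
            try {
                return Optional.of(Pattern.compile(r));
            } catch (PatternSyntaxException e) {
                System.err.println("Skipping format check, regex rejected: " + e.getDescription());
                return Optional.empty();
            }
        });
    }

    // True only when the constraint compiled and the whole value fails to match it.
    static boolean violatesFormat(String regex, String value) {
        return tryCompile(regex)
                .map(p -> !p.matcher(value).matches())
                .orElse(false);
    }

    public static void main(String[] args) {
        // Illustrative page pattern, NOT the real P304 constraint.
        String simplePages = "\\d+(?:[-\u2013]\\d+)?";
        System.out.println(violatesFormat(simplePages, "12-15"));  // false
        System.out.println(violatesFormat(simplePages, "twelve")); // true
        System.out.println(violatesFormat("(?'r'\\d+)", "12"));     // false: check skipped
    }
}
```

Caching the failed compile as an empty `Optional` means a broken constraint is reported once rather than once per cell; a fuller version would presumably also surface it as a warning in the schema issues rather than only logging it.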

Let me know if anything is unclear.

Antonin

Dr Thneed

Feb 3, 2021, 4:28:55 PM
to openr...@googlegroups.com
Thank you Antonin, I understand much better now. I'll call for help with fixing the expression on Wikidata.
Thanks,
Tamsin

Owen Stephens

Feb 3, 2021, 6:31:48 PM
to OpenRefine
Thanks both. I had a look at that regular expression on P304 and I think a typo had just crept in at some point. I've corrected it, so the regular expression for P304 now validates on regex101, and hopefully this means you can use the P304 property successfully in OpenRefine schemas again.

Owen

Dr Thneed

Feb 5, 2021, 3:57:42 PM
to openr...@googlegroups.com
Thanks Owen. P304 still breaks the schema for me BUT importantly the workaround of exporting edits to QuickStatements now works, so we are able to add pages to bibliographic items again. Thank you!
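The regex101 check and OpenRefine can disagree because OpenRefine compiles constraints with `java.util.regex`, which lacks some PCRE features. Assuming the corrected P304 expression still uses the `(?'name'...)` groups and `\g'name'` subroutine calls seen in the log (the corrected text is not quoted in this thread), Java would still reject it, which would fit P304 still breaking the schema. A minimal Java check of those constructs; the class name is ours:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class JavaRegexSupport {
    // True if java.util.regex can compile the pattern.
    static boolean javaAccepts(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Named group in (?'name'...) form: rejected by Java.
        System.out.println(javaAccepts("(?'d'\\d+)"));          // false
        // Java's own named-group form: accepted.
        System.out.println(javaAccepts("(?<d>\\d+)"));          // true
        // Subroutine call \g'd' (reuse the group's pattern): no Java equivalent,
        // so it is rejected even when the group itself is written Java-style.
        System.out.println(javaAccepts("(?<d>\\d+)-\\g'd'"));   // false
        // \k<d> compiles, but it is a backreference: it repeats the matched text,
        // so it accepts "12-12" and rejects "12-15". Not a substitute for \g'd'.
        Pattern backref = Pattern.compile("(?<d>\\d+)-\\k<d>");
        System.out.println(backref.matcher("12-15").matches()); // false
        // Java-compatible alternative: write the sub-pattern out again.
        System.out.println(Pattern.compile("\\d+-\\d+").matcher("12-15").matches()); // true
    }
}
```

Because Java has no subroutine calls, a Java-compatible version of such a constraint has to spell out each reused group in full wherever it is called.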
