"Add column by fetching URLs based on a column" stalls - any way to know what the problem is?


Aaron Tay

Oct 17, 2017, 6:10:04 AM
to OpenRefine
Sorry for the basic question.


For example, I have a bunch of DOIs from Scopus, and I ran them against APIs such as oaDOI or Crossref Event Data. It generally works fine.

But occasionally I run into a problem where OpenRefine gets stuck at, say, 20%. No matter how long I wait, it stays stuck, and I have no choice but to cancel, which loses all the work done so far.

Is there any way to know what is causing the error? When I rerun the same query, it gets stuck at the same percentage every time.

Is it because the API has stopped responding? Or is the API returning an error for one of the values? Is there any way to tell?

Aaron

Ettore Rizza

Oct 17, 2017, 8:04:20 AM
to OpenRefine
Hi Aaron,

I suspect it's related to this issue: https://github.com/OpenRefine/OpenRefine/issues/1219

But it's hard to say without seeing the OpenRefine console. Can you send a sample of a hundred DOIs so that I can run a test?

Cheers

Thad Guidry

Oct 17, 2017, 8:36:13 AM
to openr...@googlegroups.com

You can also try this: when configuring your fetch, choose the "store error" radio button so that OpenRefine stores the error in the cell instead of leaving it blank.
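To make the difference between the two "On error" choices concrete, here is a minimal Python sketch of the idea. It is not OpenRefine's code: `fetch_column`, `http_get`, and the mode names `"store-error"` and `"set-to-blank"` are hypothetical names chosen for illustration. The point is only that storing the error keeps the failure visible per row, so you can see which value the API rejected.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the body as text (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_column(
    urls: Iterable[str],
    fetch: Callable[[str], str] = http_get,
    on_error: str = "store-error",
) -> List[Optional[str]]:
    """Fetch every URL and build the new column, one cell per URL.

    on_error="store-error"  -> the cell holds the error message
    on_error="set-to-blank" -> the cell is left empty (None)
    """
    cells: List[Optional[str]] = []
    for url in urls:
        try:
            cells.append(fetch(url))
        except Exception as exc:  # any failure for this row only
            if on_error == "store-error":
                cells.append(f"ERROR: {exc}")
            else:
                cells.append(None)
    return cells
```

Calling `fetch_column(urls)` on a small sample outside OpenRefine is one way to check whether a particular value, rather than the API as a whole, is causing the failure.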


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron Tay

Oct 19, 2017, 2:16:15 AM
to OpenRefine
Choosing "store error" instead of "set to blank" seems to have done the trick! Thanks.

After it's done, I see 2 rows with errors (text in red). It seems that 2 of the values aren't DOIs.

I'm now trying the same fetch on those 2 rows using the default, "set to blank" on error. It's stalling...

Ettore Rizza

Oct 19, 2017, 4:20:25 AM
to OpenRefine
Hi again Aaron,

Based on a test with nearly a thousand DOIs, everything looks fine.

I got no errors, and the operation completed in a few minutes. I'm on Windows 10 with OpenRefine 2.7. What is your configuration?

Aaron Tay

Oct 20, 2017, 5:40:32 AM
to OpenRefine
Odd. Thanks for trying.

Anyway, I've found that the error for that batch comes from these 2 DOIs:

10.1086/68008

10.13306/j.1672-3813.2015.03.001

If I set "On error" to "store error", it does not stall, but these 2 DOIs get an error message in red.

I then ran the fetch on just these 2 DOIs, this time with "On error" set to "set to blank", and it gets stuck there.

I'm on Windows 10, Version 2.7 [TRUNK]

Thad Guidry

Oct 20, 2017, 12:06:55 PM
to openr...@googlegroups.com

Aaron,

Give me the full URL for one of those 2 failing fetches.
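For anyone who wants to reproduce a failing fetch by hand, the full URL can be rebuilt from the DOI. The sketch below assumes the URL pattern visible in the console trace further down this thread (`https://api.oadoi.org/<doi>?email=<address>`); the helper name `oadoi_url` and the address `you@example.com` are placeholders, not part of any real API client.

```python
from urllib.parse import quote, urlencode

# Base URL as it appears in the console trace in this thread (assumption:
# the same pattern was used for every row).
OADOI_BASE = "https://api.oadoi.org/"


def oadoi_url(doi: str, email: str) -> str:
    """Build the request URL for one DOI.

    The slash inside the DOI is kept as-is; other unsafe characters are
    percent-encoded. The email is sent as a query parameter.
    """
    path = quote(doi.strip(), safe="/")
    query = urlencode({"email": email}, safe="@")
    return f"{OADOI_BASE}{path}?{query}"


if __name__ == "__main__":
    # The two DOIs reported as failing earlier in the thread.
    for doi in ["10.1086/68008", "10.13306/j.1672-3813.2015.03.001"]:
        print(oadoi_url(doi, "you@example.com"))
```

Pasting the printed URLs into a browser shows directly what the API returns for each value.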

Ettore Rizza

Oct 20, 2017, 2:10:59 PM
to OpenRefine
These two DOIs seem invalid.

Here is the console error:

Exception in thread "Thread-6" com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key https://api.oadoi.org/10.13306/j.1672-3813.2015.03.001?email=ettor...@gmail.com.
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2407)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2375)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
        at com.google.refine.operations.column.ColumnAdditionByFetchingURLsOperation$ColumnAdditionByFetchingURLsProcess.cachedFetch(ColumnAdditionByFetchingURLsOperation.java:306)
        at com.google.refine.operations.column.ColumnAdditionByFetchingURLsOperation$ColumnAdditionByFetchingURLsProcess.run(ColumnAdditionByFetchingURLsOperation.java:267)
        at java.lang.Thread.run(Unknown Source)

This is certainly related to this issue, which will be corrected in the next version of OpenRefine (2.7.1).

Thad Guidry

Oct 20, 2017, 2:49:19 PM
to openr...@googlegroups.com
Ah, thanks Ettore.

Yeah, if that's the CacheLoader issue, then agreed, his problem will be fixed once we release 2.7.1.

-Thad

Antonin Delpeuch (lists)

Oct 20, 2017, 6:45:06 PM
to openr...@googlegroups.com
Aaron: in 2.7, disabling the cache should solve your issue (untick the
"cache responses" box).

The bug will indeed be solved in the new release.

Sorry about that!

Antonin
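
For readers wondering why the fetch stalls instead of failing, the following Python sketch imitates what the console trace above suggests: a loading cache that refuses a null result, running inside a worker thread. It is an analogy under stated assumptions, not OpenRefine's Java code. All names (`LoadingCache`, `InvalidCacheLoad`, `run_fetch`, `simulate`, `blank_on_error_fetch`) are hypothetical, and the assumption that a failed fetch yields a null value when errors are set to blank is inferred from the thread rather than confirmed from the source.

```python
import threading
from typing import Callable, Dict, List, Optional


class InvalidCacheLoad(Exception):
    """Raised when the loader hands the cache a null value."""


class LoadingCache:
    """Tiny loading cache that, like the one in the trace, rejects None."""

    def __init__(self, loader: Callable[[str], Optional[str]]) -> None:
        self._loader = loader
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._store:
            value = self._loader(key)
            if value is None:
                raise InvalidCacheLoad(f"loader returned None for key {key}")
            self._store[key] = value
        return self._store[key]


def blank_on_error_fetch(url: str) -> Optional[str]:
    """Stand-in for a fetch with errors set to blank: failures give None."""
    return None if "invalid" in url else "{}"


def run_fetch(urls: List[str], fetch, use_cache: bool, progress: dict) -> None:
    """Worker body: fetch each URL and update the shared progress record."""
    cache = LoadingCache(fetch)
    for url in urls:
        _ = cache.get(url) if use_cache else fetch(url)
        progress["done"] += 1
    progress["finished"] = True  # never reached if the worker dies early


def simulate(urls: List[str], fetch, use_cache: bool) -> dict:
    """Run the worker in its own thread and report how far it got."""
    progress = {"done": 0, "total": len(urls), "finished": False}
    worker = threading.Thread(
        target=run_fetch, args=(urls, fetch, use_cache, progress)
    )
    worker.start()
    worker.join()  # returns even when the worker died from an exception
    return progress
```

With `urls = ["doi/ok-1", "doi/invalid", "doi/ok-2"]`, `simulate(urls, blank_on_error_fetch, use_cache=True)` stops after the first URL and never sets `finished`, which is how a progress bar ends up frozen at a fixed percentage. With `use_cache=False`, the analogue of unticking "cache responses", all three URLs are processed and the bad one simply produces a blank.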