DVWebloader double-encoded persistentId


Deirdre Kirmis

Jan 15, 2026, 5:38:43 PM (2 days ago) Jan 15
to dataverse...@googlegroups.com

I think this is a question for Jim Myers. We are having an issue with DVWebloader: when we invoke it, it sits on the “Getting Dataset information” screen and never uploads the folder/files.

 

In the Payara log I see errors like “Error parsing identifier: 10.48349%2FASU%2FWW9UMD: ':<authority>/<identifier>' not found in string]]” and “Servlet threw exception java.lang.IllegalArgumentException: Failed to parse identifier: doi%3A10.48349%2FASU%2FWW9UMD”.

 

A little research seems to indicate that DVWebloader may be double-encoding the DOI when it builds/redirects to the dataset URL. But it could also be something in my config, as I recently realized that I’m using AJP proxying and not HTTP. Or I missed an upgrade or announcement somewhere. Either is possible.
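For anyone comparing against their own logs, here is a minimal Python sketch of what double encoding would look like for the identifier in the error above. It assumes the server decodes the query parameter exactly once, which is typical for servlet containers but not confirmed for this setup; the helper name decode_like_server is hypothetical.

```python
from urllib.parse import quote, unquote

# Persistent identifier as Dataverse expects to parse it.
pid = "doi:10.48349/ASU/WW9UMD"

# Encoding once is what a URL query value should normally carry.
once = quote(pid, safe="")
# Encoding the already-encoded value again turns each "%" into "%25".
twice = quote(once, safe="")


def decode_like_server(value: str) -> str:
    """Decode a query value once (assumed server behaviour, for illustration)."""
    return unquote(value)


print("encoded once: ", once)
print("encoded twice:", twice)
print("server sees (once): ", decode_like_server(once))
print("server sees (twice):", decode_like_server(twice))
```

If the value was encoded twice, a single decode leaves the percent-escapes in place, which is the form shown in the Payara message.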

 

Also, just a question sort of on topic. I have been under the impression that DVWebloader DOES use direct upload features (i.e., presigned URLs and chunking), but now I’m not sure. It requires that direct upload be enabled, and this doc seems to indicate that it does use the API and direct upload. Thanks for any info.

 

Thanks,

Deirdre

James Myers

Jan 15, 2026, 5:59:43 PM (2 days ago) Jan 15
to dataverse...@googlegroups.com

Deirdre,

Since DVWebloader works elsewhere, I’d think the app/JavaScript itself should be OK. The best way to debug would be to look in the browser dev console (the Console tab for messages from the script, the Network tab to see the HTTP calls being made). You should be able to see what Dataverse launches it with and what it sends back, which might help you see where double encoding is happening, or which calls fail, or if there’s some other issue.

 

In general, I think AJP, HTTP, and HTTPS forwarding can all work.

 

W.r.t. DVWebloader – yes it does do direct-upload (only) with presigning and multipart uploads. The later versions also send one update to Dataverse at the end (versus one per file) to be more efficient. The latest versions have more error reporting to the user when things go wrong and some nice select/unselect options etc.

 

Hope that helps,

-- Jim


Deirdre Kirmis

Jan 15, 2026, 9:08:26 PM (2 days ago) Jan 15
to dataverse...@googlegroups.com
Ah yeah, so sorry about that. The double encoding thing was a different issue. Didn’t mean to blame DVWebloader. Apparently our Cloudflare rules are blocking it.

Thanks for the info about direct upload.



Deirdre Kirmis

Jan 16, 2026, 1:40:11 AM (yesterday) Jan 16
to dataverse...@googlegroups.com

Actually, what I’ve finally found is that DVWebloader isn’t working because I’m using the GitHub-hosted version. I was getting CORS errors because I didn’t have gdcc.github.io as an allowed origin. I don’t know why it worked for so long; it just stopped working. I’m guessing I really need to install DVWebloader locally? Do I just clone the repo and run the installer, and then change the tool URL in the db? Or, as you mention, fork and just point the tool URL to that copy?

 

Thanks for any guidance.

 

Deirdre Kirmis

ASU Library

Arizona State University 

James Myers

Jan 16, 2026, 8:10:31 AM (yesterday) Jan 16
to dataverse...@googlegroups.com

Glad to hear you found/fixed the double encoding, etc. FWIW: CORS support is updated/fixed in v6.9 (see #11744 and related PR) – perhaps you’ve run into one of the things it addresses.

 

In any case, to run a local copy, cloning or just copying everything from the src dir is fine. DVWebloader does rely on the relative paths staying the same (so that js, css, and images are subdirs of where the html page sits). The one additional thing you could try would be to run the localinstall.sh script. That retrieves local copies of the other scripts being used (jquery, crypto) and alters the html page to run those instead, so you have no dependencies on remote sites. Then, as you say, just update the :WebloaderUrl setting (via API or in the db).
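As a rough illustration of the "via API" route, the Python sketch below builds the PUT request against the Dataverse admin settings endpoint. The base URL and the local page URL are placeholders, not values from this thread, and the admin API is often restricted to localhost or protected in other ways, so treat this as a sketch under those assumptions.

```python
import urllib.request


def build_setting_request(base_url: str, name: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a PUT to /api/admin/settings/<name>."""
    url = f"{base_url.rstrip('/')}/api/admin/settings/{name}"
    # The new setting value is sent as the raw request body.
    return urllib.request.Request(url, data=value.encode("utf-8"), method="PUT")


# Placeholder values: adjust to where your Dataverse and your local copy live.
req = build_setting_request(
    "http://localhost:8080",
    ":WebloaderUrl",
    "https://dataverse.example.edu/dvwebloader/dvwebloader.html",
)
print(req.get_method(), req.full_url)

# Uncomment to actually apply the change on a server you administer:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```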

Deirdre Kirmis

Jan 16, 2026, 9:19:31 AM (yesterday) Jan 16
to dataverse...@googlegroups.com

Ah yes, I think this is what I am seeing, or something similar, except I am seeing “CORS Missing Allowed Origin” or “Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://<our dv url>/api/files/fixityAlgorithm.” Would this be the same issue? I have the required CORS rules on the bucket (set in the AWS console), and I have the :WebloaderUrl set to the GitHub DVWebloader URL. I spent much time last night messing with the headers trying to make it work, but nothing seemed to fix it. I made the Access-Control-Allow-Origin (ACAO) header point to the GitHub URL directly and then it worked, but the previewers stopped working (I’m pointing to external there, too).
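For reference, a small Python sketch of the check a browser applies to a simple (non-credentialed) cross-origin read: the Access-Control-Allow-Origin value must be either * or exactly the origin (scheme, host, port) of the page making the call. The function names and the tools.example.org URL are illustrative only, and this is not a diagnosis of what happened with the previewers here.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL, dropping path and query."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def acao_allows(acao_header, page_url: str) -> bool:
    """Simplified check: a missing header blocks, '*' or an exact origin match allows."""
    if acao_header is None:
        return False
    value = acao_header.strip()
    return value == "*" or value == origin_of(page_url)


webloader_page = "https://gdcc.github.io/dvwebloader/src/dvwebloader.html"
other_tool_page = "https://tools.example.org/previewers/TextPreview.html"  # hypothetical

print(acao_allows(None, webloader_page))                      # header missing
print(acao_allows("https://gdcc.github.io", webloader_page))  # exact origin
print(acao_allows("https://gdcc.github.io", other_tool_page)) # different origin
print(acao_allows("*", other_tool_page))                      # wildcard
```

A single fixed origin in the header can only satisfy tools served from that one origin, which is one possible reason a tool hosted elsewhere would stop working.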

 

Thanks much,

Deirdre

Deirdre Kirmis

Jan 16, 2026, 4:28:45 PM (22 hours ago) Jan 16
to dataverse...@googlegroups.com

Sorry, one more question, as I apparently am not quite understanding. When I upgraded to v6.7 I followed the instructions, set dataverse.cors.origin, and deleted :AllowCors, but I didn’t set the other CORS JVM settings; I think I assumed the bucket CORS policy would be used. The guide’s CORS instructions (Big Data) discuss setting the CORS rules in the bucket policy, which we’ve had for a while now, but they also talk about setting dataverse.cors.origin to match that (which I’ve done). With v6.9, do we need to set the other CORS JVM options, or will it use the rules specified on the bucket policy in AWS (i.e., do we need both)?

 

And with all of the external tools, is there a suggested list of origins to use for dataverse.cors.origin?

Thanks!

Deirdre Kirmis

Jan 16, 2026, 4:51:00 PM (22 hours ago) Jan 16
to dataverse...@googlegroups.com

Ah, I see there are defaults, so assuming those override the bucket policy.

James Myers

Jan 16, 2026, 5:23:34 PM (21 hours ago) Jan 16
to dataverse...@googlegroups.com

It’s late on Friday, so I may be off, but the Dataverse and bucket settings are independent (though you need similar settings for both for everything to work). The Dataverse setting affects what a script from some other site (e.g., gdcc.github.io) can read from your Dataverse API, while the bucket settings are about what it can read directly from the S3 bucket for direct upload/downloads.

 

For 6.9, the main difference is “CORS is no longer enabled by default.”, so you have to set the origin to * or some list of sites for CORS to be on at all. (If I understand the PR in 6.9, #11745, using multiple origins prior to 6.9 was not working correctly without other changes in Apache/nginx, so * was the only out-of-the-box solution to make everything work.) The guides section on CORS settings lists the defaults for the other settings, which I think are broad enough to allow previewers/DVWebloader to work.
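If you go with a list of sites rather than *, one way to work out which origins to include is to reduce each external tool URL you have registered to its origin. The Python sketch below does that; the URLs are illustrative examples rather than a recommended list, and the exact format dataverse.cors.origin expects for multiple values should be taken from the guides.

```python
from urllib.parse import urlsplit


def distinct_origins(tool_urls):
    """Collect the unique scheme://host[:port] origins behind a list of tool URLs."""
    origins = set()
    for url in tool_urls:
        parts = urlsplit(url)
        origins.add(f"{parts.scheme}://{parts.netloc}")
    return sorted(origins)


# Illustrative tool URLs; substitute the ones configured in your installation.
tool_urls = [
    "https://gdcc.github.io/dvwebloader/src/dvwebloader.html",
    "https://gdcc.github.io/dataverse-previewers/previewers/TextPreview.html",
    "https://tools.example.org/custom/viewer.html",
]

for origin in distinct_origins(tool_urls):
    print(origin)
```

Tools served from the same host share one origin, so the list is usually shorter than the list of tools.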

Deirdre Kirmis

Jan 16, 2026, 5:28:17 PM (21 hours ago) Jan 16
to dataverse...@googlegroups.com

Ah! That clears up a lot! Thanks so much for the response and clarification. Sometimes I need various methods of explanation! =)

Happy weekend!
