Uploading files having UTF-8 names


Péter Király

Oct 29, 2024, 5:13:16 AM
to dataverse...@googlegroups.com
Dear all,

I have a question. I have files with Greek (UTF-8) characters in their
names. When I use the API to upload them, the stored file name comes out
garbled and no longer resembles the original name.

Here is the API call I use:
curl -H X-Dataverse-key:$API_KEY \
-X POST \
-F "file=@$ΜΕΡΟΣ.txt" \
-F 'jsonData={...}' \
"$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER"

The result is:
{
"status":"OK",
"data":{
"files":[
{
"label":"1ο Î^|Î^uΡÎ^=Σ.txt",
...
"dataFile":{
"filename":"1ο Î^|Î^uΡÎ^=Σ.txt",
...
}
}
]
}
}

I tried to make it explicit that our data are in UTF-8, but it failed:

curl -H X-Dataverse-key:$API_KEY \
-H "Content-Type: multipart/form-data; charset=UTF-8" \
-X POST \
-F "file=@$ΜΕΡΟΣ.txt" \
-F 'jsonData={...}' \
"$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER"
{
"status":"ERROR",
"code":400,
"message":"Bad Request. The API request cannot be completed with the parameters supplied. Please check your code for typos, or consult our API guide at http://guides.dataverse.org.",
"requestUrl":"https://..../api/v1/datasets/:persistentId/add?persistentId=doi:10....",
"requestMethod":"POST"
}

Do you have any suggestions to solve the problem?

Best,
Péter Király

--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

Weihong Xu

Oct 29, 2024, 8:15:00 AM
to dataverse...@googlegroups.com
I think you need to specify UTF-8 encoding:
/usr/local/payara6/bin/asadmin create-jvm-options "-Dfile.encoding=UTF8" 

--
Henry Xu
SDET Team Lead of RingCentral

Péter Király

Oct 30, 2024, 5:24:03 PM
to dataverse...@googlegroups.com
Dear Henry,

thanks a lot for your hint. I added the JVM option and restarted
Payara. Unfortunately it did not solve the problem: the stored file
name is still not what I expect.

Note: if I use the web user interface to upload the same file via the
browser, it works.

Does anybody have any other suggestions?

Best,
Péter

Joshua Arulsamy

Oct 30, 2024, 6:19:29 PM
to dataverse...@googlegroups.com
Hi Péter,

I'm not sure if it will work, as I don't have an easy way to test, but you could
try setting the filename explicitly in your curl request. Since the web UI
upload works, it seems likely that curl is damaging the filename before it gets
to Dataverse.

You could try something like this to manually set the filename:
curl -H X-Dataverse-key:$API_KEY \
     -H "Content-Type: multipart/form-data; charset=UTF-8" \
     -X POST \
     -F "file=@$ΜΕΡΟΣ.txt;filename=<FilenameYouWantOnDataverse>" \
     -F 'jsonData={...}' \
     "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER"

If that doesn't work, it may be worth trying a different tool, such as Python
with requests, for the upload. As far as I can tell, the curl man page doesn't
really mention Unicode filenames in particular, just file contents.
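
For the Python route, here is a minimal sketch. It uses only the standard
library rather than requests, so that the bytes of the filename in the
multipart header are explicit. The helper names build_multipart and
build_upload_request are hypothetical, and the server URL, API key and DOI in
the usage comment are placeholders. It writes the filename as raw UTF-8,
which is what browsers send for form uploads; whether Dataverse decodes that
correctly on the API endpoint is untested here.

```python
import json
import uuid
import urllib.request
from urllib.parse import quote


def build_multipart(filename, content, json_data):
    """Return (body, content_type) for a multipart/form-data upload.

    The filename goes into the Content-Disposition header as raw UTF-8
    bytes (assumption: the server decodes the header as UTF-8).
    """
    boundary = uuid.uuid4().hex
    # Escape characters that would break the quoted header value.
    safe_name = (
        filename.replace('"', "%22").replace("\r", "%0D").replace("\n", "%0A")
    )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    json_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="jsonData"\r\n\r\n'
        f"{json.dumps(json_data)}"
        f"\r\n--{boundary}--\r\n"
    ).encode("utf-8")
    body = file_header + content + json_part
    return body, f"multipart/form-data; boundary={boundary}"


def build_upload_request(server_url, api_key, persistent_id,
                         filename, content, json_data):
    """Build (but do not send) the POST request for the add-file endpoint."""
    body, content_type = build_multipart(filename, content, json_data)
    url = (
        f"{server_url}/api/datasets/:persistentId/add"
        f"?persistentId={quote(persistent_id, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Dataverse-key": api_key, "Content-Type": content_type},
    )


# Usage (placeholders, not real values):
# with open("ΜΕΡΟΣ.txt", "rb") as fh:
#     req = build_upload_request("https://dataverse.example.org", "API_KEY",
#                                "doi:10.1234/ABC", "ΜΕΡΟΣ.txt", fh.read(), {})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

If the label in the response is still garbled with this request, the problem
is more likely on the server side than in the client.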

Best,

Josh
