Direct upload on a Ceph object store


paul...@dans.knaw.nl

Oct 15, 2024, 8:41:11 AM
to Dataverse Users Community
Direct upload to a Ceph object store fails for me because of CORS problems.
The guides explain what to do (https://guides.dataverse.org/en/latest/developers/big-data-support.html#allow-cors-for-s3-buckets).

Using s3cmd I have set the CORS rules to this (XML instead of JSON):

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <ID>AllowDataverse</ID>
    <AllowedOrigin>http://test.lifesciences.datastations.nl</AllowedOrigin>
    <AllowedOrigin>https://test.lifesciences.datastations.nl</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <ExposeHeader>ETag</ExposeHeader>
    <ExposeHeader>Accept-Ranges</ExposeHeader>
    <ExposeHeader>Content-Encoding</ExposeHeader>
    <ExposeHeader>Content-Range</ExposeHeader>
</CORSRule>
</CORSConfiguration>

However if I try to upload, the following is shown in the console:
"Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at  ... etc."

Everything else seems to work, including direct download.
Also tried <AllowedOrigin>*</AllowedOrigin> but that made no difference.

If anyone has direct upload working with Ceph, maybe you can spot what I am doing wrong here.

Cheers,
Paul

Joshua Arulsamy

Oct 15, 2024, 9:22:38 AM
to dataverse...@googlegroups.com
Hi  Paul,

I have a dataverse instance backed by Ceph (via RGW) for direct upload just as you are describing.
I didn't have any issue getting CORS to work. My CORS policy is almost the same as yours:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>https://dataverse.arcc.uwyo.edu</AllowedOrigin>

    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>

    <AllowedHeader>*</AllowedHeader>
    <ExposeHeader>ETag</ExposeHeader>
    <ExposeHeader>Accept-Ranges</ExposeHeader>
    <ExposeHeader>Content-Encoding</ExposeHeader>
    <ExposeHeader>Content-Range</ExposeHeader>
</CORSRule>
</CORSConfiguration>

The only differences I see are that you have an ID field that I don't have, and I have a MaxAgeSeconds that you don't have. Perhaps that is the issue? 
I know Ceph is somewhat picky about the XML, and if it is invalid it simply ignores it. I am running an older version of Ceph though (v16.2.10-266 Pacific), so things may have changed in later versions.
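
One way to rule out malformed XML before uploading the policy is to parse it locally. The following is a minimal Python sketch, assuming the policy uses the S3 namespace shown above; `summarize_cors` is a hypothetical helper name, not part of s3cmd or Ceph. It only checks that the document is well-formed and lists what each rule contains; passing it does not guarantee that RGW accepts the policy.

```python
import xml.etree.ElementTree as ET

# Namespace used by the CORS documents quoted in this thread.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def _texts(rule, name):
    """Return the stripped text of every child element called `name`."""
    return [el.text.strip() for el in rule.findall(S3_NS + name) if el.text]


def summarize_cors(xml_text):
    """Parse an S3-style CORS document and return one dict per CORSRule.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    and ValueError if the root element is not a namespaced CORSConfiguration.
    """
    root = ET.fromstring(xml_text)
    if root.tag != S3_NS + "CORSConfiguration":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [
        {
            "origins": _texts(rule, "AllowedOrigin"),
            "methods": _texts(rule, "AllowedMethod"),
            "headers": _texts(rule, "AllowedHeader"),
            "expose": _texts(rule, "ExposeHeader"),
            "max_age": _texts(rule, "MaxAgeSeconds"),
        }
        for rule in root.findall(S3_NS + "CORSRule")
    ]
```

The same function can be run on the single-line `CORS:` value printed by `s3cmd info` to confirm which origins and methods the bucket actually has stored.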

I set my policy with this s3cmd command (it sounds like you probably did the same):

s3cmd --host <redacted> --access_key=<redacted> --secret_key=<redacted> setcors dv-cors.xml s3://dataverse-bigdata

If that doesn't solve it, maybe you are targeting the wrong bucket in your S3 config on dataverse? I made that error the first time I was setting this up.

Best,

Josh



--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/481caa9a-2093-4077-96b4-35e9f3f1c840n%40googlegroups.com.

Paul Boon

Oct 15, 2024, 9:57:22 AM
to dataverse...@googlegroups.com
Hi Josh,

Thanks for the suggestion, I did try to make my CORS similar to the one you have:
  • remove that ID
  • add the MaxAgeSeconds
  • remove the http url (leave only that https url).
So I now have set that to my bucket as shown below:
s3cmd info s3://store
s3://store/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    none
   CORS:      <CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><CORSRule><AllowedMethod>GET</AllowedMethod><AllowedMethod>PUT</AllowedMethod><AllowedOrigin>http://test.lifesciences.datastations.nl</AllowedOrigin><AllowedOrigin>https://test.lifesciences.datastations.nl</AllowedOrigin><AllowedHeader>*</AllowedHeader><MaxAgeSeconds>3000</MaxAgeSeconds><ExposeHeader>ETag</ExposeHeader><ExposeHeader>Accept-Ranges</ExposeHeader><ExposeHeader>Content-Encoding</ExposeHeader><ExposeHeader>Content-Range</ExposeHeader></CORSRule></CORSConfiguration>
   ACL:       knaw_datastation_test_lhms: FULL_CONTROL

Unfortunately, it does not help.

Paul




James Myers

Oct 15, 2024, 10:10:03 AM
to dataverse...@googlegroups.com

Paul,

In your error “Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at  ...” – what’s the rest of the message? I think there’s supposed to be a Reason in it somewhere.

 

Other ideas – I had trouble with previewers when trying to restrict the origins and wrote some ideas in https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 where I found I had to add a Content-Security-Policy tag.

 

Some S3 servers don’t support the x-amz-tagging header we use by default. Have you tried dataverse.files.<id>.disable-tagging=true ? Do you have the other settings recommended for Surf stores - dataverse.files.<id>.payload-signing=true, dataverse.files.<id>.chunked-encoding=false and dataverse.files.<id>.path-style-request=true ?  I’m not sure any of these could cause a CORS error (maybe the tagging?), but thought I’d mention them just in case.

 

-- Jim

Paul Boon

Oct 15, 2024, 10:28:19 AM
to dataverse...@googlegroups.com
Hi Jim,

I get

I also tried disabling 'tagging', and I have the JVM options set as suggested. Note that I am able to upload and download files non-direct, and also to download direct. Only CORS with direct upload is getting in the way.

Paul


James Myers

Oct 15, 2024, 10:37:48 AM
to dataverse...@googlegroups.com

Can you see in the browser console if the Origin Header being sent with your request is null? That’s the issue I saw when I was looking at making previewers work with allowed origins.
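
To check this outside the browser, one could replay the preflight by hand and look at what the store answers. The following is a rough Python sketch using only the standard library. The function names are made up for illustration, and the default `x-amz-tagging` request header is an assumption based on the tagging header mentioned earlier in this thread; a real browser may ask for a different set of headers.

```python
import urllib.error
import urllib.request


def build_preflight(url, origin, method="PUT", request_headers="x-amz-tagging"):
    """Build an OPTIONS request resembling a browser's CORS preflight."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": method,
            "Access-Control-Request-Headers": request_headers,
        },
    )


def preflight_allows(response_headers, origin, method="PUT"):
    """Return True if the response headers permit `method` from `origin`."""
    headers = {k.lower(): v for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin")
    allow_methods = [
        m.strip().upper()
        for m in headers.get("access-control-allow-methods", "").split(",")
        if m.strip()
    ]
    return allow_origin in ("*", origin) and method.upper() in allow_methods


def probe(url, origin, method="PUT"):
    """Send the preflight and return (status code, response headers)."""
    request = build_preflight(url, origin, method)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # A rejected preflight usually arrives as a 4xx; keep its headers too.
        return err.code, dict(err.headers.items())
```

Calling `probe` with the object URL the browser tried to reach and the Dataverse site as `origin`, then passing the returned headers to `preflight_allows`, should show whether the bucket's CORS rule is being applied at all. If no `Access-Control-Allow-Origin` header comes back, the rule is probably not matching this bucket or origin.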

Paul Boon

Oct 15, 2024, 10:50:27 AM
to dataverse...@googlegroups.com
Yes, it is doing an OPTIONS request as part of a preflight. See the screen dump below:




