question about documentation for storing files in a specific store file


Jamie Jamison

June 1, 2020, 19:41:19
to Dataverse Users Community
I'm trying to follow the documentation to put files in a specific s3 bucket.  (http://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#id7)

So far all I'm getting is a 400 Bad Request error: "browser sent a request that this server could not understand".

As an example, to list the available storageDrivers:

curl -H "X-Dataverse-key:---api-key---" http://test.dataverse.ucla.edu/api/admin/storageDrivers

I get the same result whether I use the $API_TOKEN and $SERVER variables or type in the API key and URL. I haven't found anything helpful in the httpd log files.

Thank you,
Jamie Jamison



James Myers

June 1, 2020, 20:25:00
to dataverse...@googlegroups.com

Jamie, looks like a typo in the docs. That call is coded as:

curl http://localhost:8080/api/admin/dataverse/storageDrivers

 

-- Jim
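
For anyone scripting this check, the corrected call can be sketched in Python using only the standard library. The function names and the optional `api_token` parameter are illustrative assumptions; the path `/api/admin/dataverse/storageDrivers` and the `X-Dataverse-key` header name come from the messages above. The shape of the JSON response is not shown in this thread, so the sketch only decodes it and makes no assumption about its fields.

```python
import json
import urllib.request
from typing import Optional


def storage_drivers_url(server: str) -> str:
    """Build the URL for listing storage drivers (path as given by Jim)."""
    return server.rstrip("/") + "/api/admin/dataverse/storageDrivers"


def build_request(server: str, api_token: Optional[str] = None) -> urllib.request.Request:
    """Prepare the GET request; sending a token header is optional here (assumption)."""
    request = urllib.request.Request(storage_drivers_url(server))
    if api_token:
        request.add_header("X-Dataverse-key", api_token)
    return request


def list_storage_drivers(server: str, api_token: Optional[str] = None):
    """Send the request and decode the JSON body without assuming its structure."""
    with urllib.request.urlopen(build_request(server, api_token)) as response:
        return json.load(response)
```

Calling `list_storage_drivers("http://localhost:8080")` on the server itself should mirror the curl command above.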

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/19fb34a3-4d9e-44a0-96b3-83ffc69fd64d%40googlegroups.com.

Jamie Jamison

June 1, 2020, 23:09:33
to Dataverse Users Community
OK, now it works. Do you want me to open an issue for the documentation typo?



James Myers

June 1, 2020, 23:13:57
to dataverse...@googlegroups.com

That would be great.

Thanks,

-- Jim

 


Jamie Jamison

June 2, 2020, 00:02:28
to Dataverse Users Community
Follow-up: since I can have multiple S3 buckets, do they ALL have to have their own storageDriverLabel? Is there a way to have a default S3 bucket for the site?


James Myers

June 2, 2020, 01:11:08
to dataverse...@googlegroups.com

The labels are meant to be human-readable: they show up for admins when you edit the General Settings of a Dataverse. Each storage driver should have a unique id and a unique label so that the system (id) and humans (label) can tell them apart.

 

Each driver is associated with a bucket (or file path for file storage drivers, etc.) and you direct Dataset content to a bucket or file path by assigning the corresponding storageDriver to its parent Dataverse.

 

You assign a default using the -Ddataverse.files.storage-driver-id=<id> JVM option. Installations with a single storage driver set it as the default and never have to assign a storage driver per Dataverse. With multiple drivers, you either use the API or have an admin assign a driver by editing a Dataverse’s General Settings.

 

(FWIW: The original idea was that each storage driver would point to a different bucket (just talking about S3 drivers here), but, with the ability to set direct S3 upload/download, it could make sense, as Harvard is considering, to have two drivers point to the same bucket while having different options. This would allow you to have default uploads go to the bucket normally while setting up some Dataverses to use a different driver set to do direct uploads to the same bucket (presumably with a larger file size limit as well.))
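
As a rough illustration of the rules Jim describes (unique ids, unique labels, and a default that names an existing driver), here is a minimal Python sketch. `StorageDriver` and `validate_drivers` are hypothetical names made up for this example and are not part of Dataverse. Bucket names are deliberately not checked for uniqueness, since two drivers may point at the same bucket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageDriver:
    driver_id: str  # used by the system, e.g. inside JVM option names
    label: str      # shown to admins in a Dataverse's General Settings
    bucket: str     # S3 bucket the driver writes to (may be shared)


def validate_drivers(drivers, default_id):
    """Return a list of problems; an empty list means the setup looks consistent."""
    problems = []
    ids = [d.driver_id for d in drivers]
    labels = [d.label for d in drivers]
    for value in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate id: {value}")
    for value in sorted({name for name in labels if labels.count(name) > 1}):
        problems.append(f"duplicate label: {value}")
    if default_id not in ids:
        problems.append(f"default id not defined: {default_id}")
    return problems


# Two drivers sharing one bucket, as in the direct-upload scenario above.
main = StorageDriver("s3", "s3-main", "bucket-a")
direct = StorageDriver("s3direct", "s3-direct", "bucket-a")
print(validate_drivers([main, direct], "s3"))
```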


Philip Durbin

June 2, 2020, 10:06:08
to dataverse...@googlegroups.com
Don't mind me. I'm just jumping in here for a moment to say that I just fixed the typo in https://github.com/IQSS/dataverse/pull/6954

Thanks Jamie for reporting the problem and Jim for providing the solution.




Jamie Jamison

June 2, 2020, 20:41:31
to Dataverse Users Community
I must still be configuring this incorrectly. When I list the storageDrivers I only get the second one. The first should be the default S3 bucket and the second only for the LARIAC GIS data.

The code I've been trying to use:
<jvm-options>-Ddataverse.files.storage-driver-id=s3-test-dataverse</jvm-options>
<jvm-options>-Ddataverse.files.s3-url-expiration-minutes=120</jvm-options>
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3-test-dataverse</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-oregon</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3-test-lariac</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-lariac</jvm-options>

Jamie


James Myers

June 3, 2020, 11:17:07
to dataverse...@googlegroups.com

Jamie,

The change to multi-store means that each driver-specific entry in the jvm options has to include the driver's <id>, not just ‘s3’. So your options should be something like this (pick your own ids):

        <jvm-options>-Ddataverse.files.storage-driver-id=s3-test-dataverse</jvm-options>

This line:

        <jvm-options>-Ddataverse.files.s3-url-expiration-minutes=120</jvm-options>

becomes one line per storage driver instead:

        <jvm-options>-Ddataverse.files.s3lariac.url-expiration-minutes=120</jvm-options>
        <jvm-options>-Ddataverse.files.s3.url-expiration-minutes=120</jvm-options>

The options for the first storage driver keep the id ‘s3’:

        <jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
        <jvm-options>-Ddataverse.files.s3.label=s3-test-dataverse</jvm-options>
        <jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-oregon</jvm-options>

The options for the second storage driver need entries with that driver's id in them (I picked ‘s3lariac’), replacing your second pair of s3.label and s3.bucket-name lines:

        <jvm-options>-Ddataverse.files.s3lariac.type=s3</jvm-options>
        <jvm-options>-Ddataverse.files.s3lariac.label=s3-test-lariac</jvm-options>
        <jvm-options>-Ddataverse.files.s3lariac.bucket-name=dataverse-test-lariac</jvm-options>

 

The upgrade instructions for single-store sites probably add to the confusion, since they recommend using ‘s3’ as the id of that store, so it looks like the old s3 type in the options.

(The reason for this is that older versions of Dataverse stored the store type (s3) in the storageidentifier entries in the database for files, e.g. s3://mybucket:123456645454, whereas the new versions use the storage driver id, e.g. s3lariac://dataverse-test-lariac:1254543534545. When upgrading, if you give the driver ‘s3’ as an identifier, the existing storageidentifier entries still work without changes. If you pick a different id, you’d also have to change the existing storageidentifier entries to match: s3://mybucket:34243234234 -> mynews3id://mybucket:34243234234. Unfortunately that makes it less clear that the jvm-options and the database entries now refer to the id of individual storage drivers rather than to the type of driver used. The analogous explanation applies to file:// type / driver id changes.)

 

Hope that gets you going,
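
To see why only one driver showed up with the original options, it can help to look for option names that are set more than once. The following is a minimal Python sketch, not a Dataverse tool: the regular expression and function names are assumptions made for this example, and the sample text is taken from the options posted earlier in the thread. Seeing only the second label in the listing is consistent with the later value replacing the earlier one.

```python
import re
from collections import defaultdict

# Matches lines of the form <jvm-options>-Dkey=value</jvm-options>.
OPTION_PATTERN = re.compile(r"<jvm-options>-D([^=<]+)=([^<]*)</jvm-options>")


def parse_jvm_options(text):
    """Return (key, value) pairs in the order they appear in the text."""
    return [(key.strip(), value.strip()) for key, value in OPTION_PATTERN.findall(text)]


def find_duplicate_keys(pairs):
    """Map each key that is set more than once to the list of its values."""
    values_by_key = defaultdict(list)
    for key, value in pairs:
        values_by_key[key].append(value)
    return {key: values for key, values in values_by_key.items() if len(values) > 1}


ORIGINAL_OPTIONS = """
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3-test-dataverse</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-oregon</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3-test-lariac</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-lariac</jvm-options>
"""

duplicates = find_duplicate_keys(parse_jvm_options(ORIGINAL_OPTIONS))
for key, values in duplicates.items():
    print(key, "is set", len(values), "times:", values)
```

Giving each driver its own id in the option name, as described above, makes every key unique.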


(Message deleted)

James Myers

June 3, 2020, 13:33:45
to dataverse...@googlegroups.com

Jamie,

 

I’m sure an example would help, so great idea to post.

 

The default is set with

<jvm-options>-Ddataverse.files.storage-driver-id=s3-test-dataverse-main</jvm-options>

The API call and the ability for an admin to change the Storage in Edit/General Settings of a Dataverse set the store for all child Dataverse/datasets of the one you update. Setting your lariac storagedriver for the GIS Dataverse should be all that you need to do once you have the default set. You should be able to verify that by looking at Edit/General Settings as an admin – it will show you the current choice.

(And just to be a little confusing, your main Dataverse should show ‘s3-test-dataverse-main (Default)’ as the chosen option with ‘s3-test-dataverse-main’ as another option in the list. This means that your main/root Dataverse is using your main storage driver because it is the default rather than because it was set explicitly for the root Dataverse. There is no difference in where files go between these two choices, but with the former (with (Default) showing), if you change the above jvm option the choice will change, whereas with the latter the Dataverse will continue to use the same driver because you’ve explicitly set it as the choice for that Dataverse.)

-- Jim
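
The inheritance behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch of the described behaviour, not Dataverse's implementation: the function name, the dictionaries, and the Dataverse aliases (`root`, `gis`, `lariac`, `social`) are all made up for the example.

```python
def effective_driver(dataverse, parent_of, explicit, default_id):
    """Resolve the driver id for a Dataverse.

    parent_of maps each Dataverse alias to its parent alias (root maps to None).
    explicit maps aliases to driver ids that an admin set directly.
    The nearest explicit setting on the way up to the root wins;
    if none is found, the site-wide default applies.
    """
    current = dataverse
    while current is not None:
        if current in explicit:
            return explicit[current]
        current = parent_of.get(current)
    return default_id


parent_of = {"root": None, "gis": "root", "lariac": "gis", "social": "root"}
explicit = {"gis": "s3-test-lariac"}

# A child of the GIS Dataverse inherits its driver; everything else gets the default.
print(effective_driver("lariac", parent_of, explicit, "s3-test-dataverse-main"))
print(effective_driver("social", parent_of, explicit, "s3-test-dataverse-main"))
```

In this model, changing `default_id` changes the result for `social` only while nothing is set explicitly on it or its ancestors, which matches the distinction between the "(Default)" entry and the explicitly chosen one.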

 

 

From: dataverse...@googlegroups.com [mailto:dataverse...@googlegroups.com] On Behalf Of Jamie Jamison
Sent: Wednesday, June 03, 2020 1:21 PM
To: Dataverse Users Community
Subject: Re: [Dataverse-Users] question about documentation for storing files in a specific store file

 

Thank you for the explanation. That clarifies things.  When I'm finished I'll post the code since an example might be helpful.  

Last question. I am setting the lariac S3 bucket for that specific GIS collection. Otherwise everything else will use the main storage. I want to be sure that the default is our "main" storage, which is now called s3-test-dataverse-main. Do I need to use the command that configures files for a specific store (http://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#id7)? I just want to be sure there is a default storage.

 

Thank you once again,

 

Jamie

Jamie Jamison

June 3, 2020, 14:14:14
to Dataverse Users Community
This code works as far as setting the default (spaces added for readability):
<jvm-options>-Ddataverse.files.storage-driver-id=s3-dataverse-main</jvm-options>  <-- the default storage

<jvm-options>-Ddataverse.files.s3-dataverse-main-url-expiration-minutes=120</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-main.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-main.label=s3-dataverse-main</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-main.bucket-name=dataverse-test-oregon</jvm-options>

<jvm-options>-Ddataverse.files.s3-test-lariac-url-expiration-minutes=120</jvm-options> <-- the new storage for the GIS data
<jvm-options>-Ddataverse.files.s3-test-lariac.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-test-lariac.label=s3-test-lariac</jvm-options>
<jvm-options>-Ddataverse.files.s3-test-lariac.bucket-name=dataverse-test-lariac</jvm-options>


But because I changed the original default from "s3" to "s3-dataverse-main", I now also have "null (Inherited from enclosing Dataverse)" for the sub-Dataverses. For people who are updating an older site with the original "s3" name, is it better to keep that name? I'm not sure how to clean up the null storage.


James Myers

June 3, 2020, 14:33:44
to dataverse...@googlegroups.com

Jamie,

 

First, typos:

<jvm-options>-Ddataverse.files.s3-dataverse-main-url-expiration-minutes=120</jvm-options>

And

<jvm-options>-Ddataverse.files.s3-test-lariac-url-expiration-minutes=120</jvm-options>

 

Should be

<jvm-options>-Ddataverse.files.s3-dataverse-main.url-expiration-minutes=120</jvm-options>

And

<jvm-options>-Ddataverse.files.s3-test-lariac.url-expiration-minutes=120</jvm-options>

 

-- with a period after the id. As is, you’ll get the default (60 minutes) for both.
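
A small check along these lines can catch a missing period before the server is restarted. This is a hedged Python sketch: `classify_option` and its return strings are invented for this example, and the "possible missing period" result is only a heuristic, since a hyphen after a known id could also be the start of a different, longer id.

```python
def classify_option(key, driver_ids):
    """Classify a JVM option name relative to a set of known driver ids."""
    prefix = "dataverse.files."
    if not key.startswith(prefix):
        return "not a file-storage option"
    rest = key[len(prefix):]
    if rest == "storage-driver-id":
        return "global default"
    # Check longer ids first so that a short id is not matched inside a longer one.
    for driver_id in sorted(driver_ids, key=len, reverse=True):
        if rest.startswith(driver_id + "."):
            return f"setting for driver '{driver_id}'"
        if rest.startswith(driver_id + "-"):
            return f"possible missing period after '{driver_id}'"
    return "unknown driver id"


ids = ["s3-dataverse-main", "s3-test-lariac"]
for option in (
    "dataverse.files.s3-dataverse-main-url-expiration-minutes",
    "dataverse.files.s3-test-lariac.url-expiration-minutes",
    "dataverse.files.s3.bucket-name",
):
    print(option, "->", classify_option(option, ids))
```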

 

 

W.r.t. using the id ‘s3’ – YES – existing installations SHOULD DEFINITELY use ‘s3’ as the id. Not due to the problem you’re having but to be able to access existing files without changing the db. Using any other id will require a db change for all existing files in the db. However – the label is up to the installation and doesn’t have to be ‘s3’ – that’s just convenient if you only have one store.

 

For your issue: it looks like you have set the storage driver to use for some parent Dataverse (the root?). When you changed the jvm-options, this didn’t change, and so it now points to a non-existent driver that therefore has a ‘null’ driver label. You should be able to resolve this by API or UI (Edit/General Information) referencing whichever parent has the storage driver explicitly set (probably your root/main Dataverse). The easiest thing would be to DELETE any storage driver for the Dataverse in question via the API, or to select the ‘s3-dataverse-main (Default)’ option in the UI, which will also delete the explicitly set (and missing) driver. Either of those should work; if not, submit an issue (there could be a bug in trying to remove when the driver doesn’t exist, but I don’t think that should matter).

 

You should also be aware that if you have existing files from when you had a driver with id ‘s3’, you won’t be able to access those files until you make changes in the database (replacing ‘s3://’ with ‘s3-dataverse-main://’ in the storageidentifier). Right now, moving content between storagedrivers or changing the storage driver id (versus just changing the human-readable label) is a manual process involving a db update.
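
The string change involved is simple, as this minimal Python sketch shows. It only illustrates the prefix rewrite on a single value; the function name is hypothetical, and the actual migration would be a database update on your installation, which is not shown here and should be tested against a backup first.

```python
def rewrite_storage_identifier(identifier, old_id, new_id):
    """Swap the driver id prefix of a storageidentifier, leaving the rest intact."""
    old_prefix = old_id + "://"
    if not identifier.startswith(old_prefix):
        return identifier  # belongs to another driver; leave untouched
    return new_id + "://" + identifier[len(old_prefix):]


print(rewrite_storage_identifier("s3://mybucket:34243234234", "s3", "s3-dataverse-main"))
```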

 

A different option: if the reason you are using s3-dataverse-main is human readability, you can set the jvm-options to still use s3 as the id and just update the label with -Ddataverse.files.s3.label=s3-dataverse-main. This would also avoid the problem in the previous paragraph, as the id isn’t changing from what would have been used for earlier files.


Jamie Jamison

June 3, 2020, 15:41:51
to Dataverse Users Community
Corrected the typos. I'm going to use the option you mentioned last: keeping 's3' as the id but setting the label for human readability.

My corrected code example:
<jvm-options>-Ddataverse.files.storage-driver-id=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.url-expiration-minutes=120</jvm-options>                
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3-dataverse-main</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-test-oregon</jvm-options>
<jvm-options>-Ddataverse.files.s3-test-lariac.url-expiration-minutes=120</jvm-options>      
<jvm-options>-Ddataverse.files.s3-test-lariac.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-test-lariac.label=s3-test-lariac</jvm-options>
<jvm-options>-Ddataverse.files.s3-test-lariac.bucket-name=dataverse-test-lariac</jvm-options>

Thank you again for the assistance and explanations,

Jamie
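
One way to avoid id typos in a multi-store configuration is to generate the option lines from a single description of the drivers. The sketch below is a hypothetical helper, not something shipped with Dataverse; the driver ids, labels, and bucket names are the ones used in this thread. Each option name is built from the driver id, so a setting cannot end up under the wrong id.

```python
def render_jvm_options(default_id, drivers):
    """Render <jvm-options> lines from a dict of {driver_id: {setting: value}}."""
    if default_id not in drivers:
        raise ValueError(f"default driver id {default_id!r} is not defined")
    lines = [f"<jvm-options>-Ddataverse.files.storage-driver-id={default_id}</jvm-options>"]
    for driver_id, settings in drivers.items():
        for setting, value in settings.items():
            lines.append(
                f"<jvm-options>-Ddataverse.files.{driver_id}.{setting}={value}</jvm-options>"
            )
    return lines


drivers = {
    "s3": {
        "type": "s3",
        "label": "s3-dataverse-main",
        "bucket-name": "dataverse-test-oregon",
        "url-expiration-minutes": "120",
    },
    "s3-test-lariac": {
        "type": "s3",
        "label": "s3-test-lariac",
        "bucket-name": "dataverse-test-lariac",
        "url-expiration-minutes": "120",
    },
}

lines = render_jvm_options("s3", drivers)
print("\n".join(lines))
```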
