Experience data with growth in userbase and storage needed for new dataverse


Asger Hansen

Dec 14, 2018, 6:10:14 AM
to Dataverse Users Community

Hi.

We are setting up Dataverse in a university environment. In the business plan for the project, we are trying to figure out what kind of growth in the userbase to expect, so that we can calculate the required storage, support, etc.

Are these data available somewhere? Are any of you willing to share your growth in users versus storage and datasets? It would also be very helpful to relate storage requirements to the branch of science a researcher comes from. My clear impression is that health and science produce more data, measured in storage, than, for example, law.

I have looked up the Harvard Dataverse in re3data; it lists dataverses, datasets, and files. But I would love to know the userbase and storage.

Thanks in advance.

Best
Asger Juel Hansen

 

  

Eugene Barsky

Dec 15, 2018, 10:42:19 AM
to Dataverse Users Community
Asger:

We did a survey a couple of years ago to answer these questions. The University of British Columbia (UBC) is a very large public university (65K students) in Vancouver, Canada, with three affiliated hospitals...

Barsky, E. (2017, July 31). Three UBC Research Data Management (RDM) Surveys: Science and Engineering, Humanities and Social Sciences, and Health Sciences: Summary Report. doi:http://dx.doi.org/10.14288/1.0348719

Page 3 has a figure for Data Storage Use for an Average Project for various disciplines. Take a look...

Eugene

Juan Corrales

Dec 17, 2018, 11:46:08 AM
to Dataverse Users Community
Hi Asger,

  I think that the e-cienciaDatos data may not be very useful for you, because we do not have many datasets (287 in two years) and we have some particular cases.
 
  The amount of data per subject is:

- Computer and Information Science. 2 datasets: 106 GB
- Engineering. 6 datasets: 7 GB
- Earth and Environmental Sciences. 7 datasets: 1.5 MB
- Arts and Humanities. 95 datasets: 283 GB
- Social Sciences. 173 datasets: 470 MB
- Medicine, Health and Life Sciences. 1 dataset: 100 KB
- Agricultural Sciences. 1 dataset: 300 KB
- Law. 1 dataset: 150 KB

  The disk space used for the Arts and Humanities subject is due to a project with 93 datasets with audio and video.
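
For planning purposes, the figures listed above can be turned into an average storage footprint per dataset for each subject. The following is only a minimal sketch: the function names are hypothetical, and it assumes decimal units (1 GB = 1000 MB), which the post does not specify.

```python
# Figures copied from the list above: (subject, datasets, size, unit).
FIGURES = [
    ("Computer and Information Science", 2, 106, "GB"),
    ("Engineering", 6, 7, "GB"),
    ("Earth and Environmental Sciences", 7, 1.5, "MB"),
    ("Arts and Humanities", 95, 283, "GB"),
    ("Social Sciences", 173, 470, "MB"),
    ("Medicine, Health and Life Sciences", 1, 100, "KB"),
    ("Agricultural Sciences", 1, 300, "KB"),
    ("Law", 1, 150, "KB"),
]

# Assumption: decimal units; the original post does not say which convention applies.
BYTES_PER_UNIT = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def average_bytes_per_dataset(figures):
    """Return a dict mapping each subject to its mean bytes per dataset."""
    return {
        subject: size * BYTES_PER_UNIT[unit] / datasets
        for subject, datasets, size, unit in figures
    }


def total_bytes(figures):
    """Return the total storage across all subjects, in bytes."""
    return sum(size * BYTES_PER_UNIT[unit] for _, _, size, unit in figures)


if __name__ == "__main__":
    averages = average_bytes_per_dataset(FIGURES)
    # Print subjects from the largest to the smallest average footprint.
    for subject, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{subject}: {avg / 10**6:,.2f} MB per dataset")
    print(f"Total: {total_bytes(FIGURES) / 10**9:.2f} GB")
```

With so few datasets in most subjects, these averages are dominated by individual projects (such as the audio and video project in Arts and Humanities), so they are better read as illustrations than as predictions.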

  Best,

Juan

Asger Hansen

Dec 19, 2018, 1:55:05 AM
to Dataverse Users Community
Dear Eugene and Juan.

Thank you for sharing your insights into Dataverse growth. It will serve as an excellent base for our calculations.

Best,
Asger

Manuela Ferreira

Apr 15, 2021, 9:41:05 AM
to Dataverse Users Community
Hi!

We are estimating the storage resources needed for our Dataverse server in the coming years. 

I would like to ask the managers of other Dataverse servers to share the information below about their storage:
- Number of datasets 
- Storage size used 
- URL of Dataverse server

This information will be of great value to us.

Thank you in advance 

Manuela Klanovicz Ferreira 
Systems Analyst 
Federal University of Rio Grande do Sul

James Myers

Apr 15, 2021, 10:13:20 AM
to dataverse...@googlegroups.com

You might find https://dataverse.org/metrics useful. It is generated by the Dataverse-metrics app and the Metrics API it uses (there are links to those at the bottom of the metrics page). That page and the current API don’t give you total file size, but they retrieve counts of Dataverse collections, datasets, and datafiles per Dataverse instance, and the metrics page aggregates those counts across the list of Dataverse instances shown at the bottom of the page.
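
As a rough illustration of how those counts could be pulled from a single installation, here is a minimal sketch using only the Python standard library. It assumes the installation exposes the Metrics API under /api/info/metrics/ and returns JSON of the form {"status": "OK", "data": {"count": N}}; the helper names are hypothetical, and individual installations may have some endpoints disabled.

```python
import json
from urllib.request import urlopen

# Metric types queried here; "dataverses" counts Dataverse collections.
METRIC_TYPES = ("dataverses", "datasets", "files")


def metrics_url(base_url, metric_type):
    """Build the Metrics API URL for one metric type on one installation."""
    return f"{base_url.rstrip('/')}/api/info/metrics/{metric_type}"


def parse_count(payload):
    """Extract the count from a response body.

    Assumption: the body looks like {"status": "OK", "data": {"count": N}}.
    """
    body = json.loads(payload)
    if body.get("status") != "OK":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    return int(body["data"]["count"])


def fetch_counts(base_url, timeout=30):
    """Return a dict of metric type to count for one installation (needs network)."""
    counts = {}
    for metric_type in METRIC_TYPES:
        with urlopen(metrics_url(base_url, metric_type), timeout=timeout) as response:
            counts[metric_type] = parse_count(response.read().decode("utf-8"))
    return counts
```

For example, fetch_counts("https://demo.dataverse.org") would query the demo installation, assuming it is reachable and has these endpoints enabled. As noted, none of these counts include total file size.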

 

The other thing I’d mention is that I think everyone would agree that stats like file size, files per dataset, the distribution of mimetypes, etc. are all highly correlated with the disciplines your Dataverse instance will support, and can be influenced by local policies such as the configured file size limit.

 

-- Jim


Manuela Ferreira

Apr 15, 2021, 2:35:13 PM
to Dataverse Users Community
Hi, Jim!

Thank you for pointing me to the page https://dataverse.org/metrics. Its purpose is very useful:
"Metrics are aggregated from multiple Dataverse installations running different versions (4.9 and newer), with different caching schedules, and with some metrics endpoints enabled and others disabled. Minor discrepancies in these metrics can be expected."

Unfortunately, when I load this page I cannot see the metrics. Apparently there is a request to the URL https://dataverse.org/api/session/token that returns a Forbidden response. Does anyone else see this behavior? I attached a screenshot.

Thanks again
Manuela   
Captura de tela de 2021-04-15 15-26-46.png

James Myers

Apr 15, 2021, 2:51:12 PM
to dataverse...@googlegroups.com

Sorry, there was some work on that page to get updated stats, and a typo broke the display. It should be back now, hopefully updating in the background.
