Amazon S3 integration

Ryan Steans

Sep 5, 2017, 7:18:17 PM
to Dataverse Users Community
Hi all,

I know I haven't been attending the calls, and I can't attend this week either, so I'm throwing this out there.

I'm curious to get a picture of the planned architecture for Dataverse with S3.

Is the idea to host the Dataverse software on an EC2 instance, with data and thumbnails stored in S3 and pulled into the display on EC2? If not, what is the plan?

As a follow-up question: will the move to S3 eliminate the file size limitation issue?

Thanks

Ryan

Philip Durbin

Sep 5, 2017, 7:28:57 PM
to dataverse...@googlegroups.com
Yep, that's my understanding. Data files, thumbnails, and other ancillary files, such as XML representations of datasets in DDI format, would be stored on S3 rather than on a file system.

What file size limitation issue?

We haven't cut a release with S3 support yet but you can read the documentation we've written so far at https://github.com/IQSS/dataverse/blob/2404fea775d912f3cc239f9a12d643b2ff9e041d/doc/sphinx-guides/source/installation/config.rst#file-storage-local-filesystem-vs-swift-vs-s3

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/20adad33-6c84-4896-9bc1-2fb0400c2e62%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ryan Steans

Sep 6, 2017, 1:03:47 PM
to Dataverse Users Community, philip...@harvard.edu
Hi Phil,

That's good news on the architecture front. I'll take a look at the GitHub link in more depth.
We've had a 2 GB limitation on individual file size thanks to the 32-bit issue. That's the biggest size we've been able to upload via the interface to date. I'm hoping we can circumvent that via an API to S3. But then I'm just blue-skying at the moment.
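
For readers wondering where a 2 GB ceiling could come from, here is a minimal sketch. It assumes the limit is caused by a byte count being held in a signed 32-bit integer somewhere in the upload path, which this thread does not confirm. The function names are hypothetical and are not part of Dataverse.

```python
# Assumption: a file's byte count is stored in a signed 32-bit integer.
# This illustrates the arithmetic only; it is not Dataverse code.

INT32_MAX = 2**31 - 1  # largest signed 32-bit value: 2,147,483,647
GIB = 2**30            # one gibibyte in bytes


def fits_in_int32(size_bytes: int) -> bool:
    """Return True if a byte count is representable as a signed 32-bit int."""
    return 0 <= size_bytes <= INT32_MAX


def wrap_to_int32(size_bytes: int) -> int:
    """Return the value a signed 32-bit field would hold after overflow."""
    wrapped = size_bytes & 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set are negative in two's complement.
    return wrapped - 2**32 if wrapped >= 2**31 else wrapped


if __name__ == "__main__":
    print(fits_in_int32(2 * GIB - 1))  # True: one byte under 2 GiB still fits
    print(fits_in_int32(2 * GIB))      # False: exactly 2 GiB does not
    print(wrap_to_int32(2 * GIB))      # -2147483648: the size turns negative
```

If that assumption holds, any file of 2 GiB or more would produce a negative or truncated size, which would be consistent with uploads failing right around the 2 GB mark.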



Philip Durbin

Sep 6, 2017, 1:10:21 PM
to dataverse...@googlegroups.com
Huh. Interesting. Please feel free to open an issue about the 32-bit thing. Either this is the first I've heard of it or my memory is failing me again. :)

Ryan Steans

Sep 6, 2017, 1:48:47 PM
to Dataverse Users Community, philip...@harvard.edu
I know we reported it when we were bringing up Dataverse, since we discovered it during load testing. I'll definitely submit a formal ticket.


Philip Durbin

Sep 6, 2017, 7:18:28 PM
to dataverse...@googlegroups.com

On Wed, Sep 6, 2017 at 1:48 PM, Ryan Steans <rst...@gmail.com> wrote:
I know we reported it when we were bringing up Dataverse, since we discovered it during load testing. I'll definitely submit a formal ticket.
