Export FastQ files from BaseSpace to S3


Niranjan

Aug 13, 2018, 12:03:16 PM
to basespace-developers
Hi,

We want to export FastQ files from our BaseSpace account to our AWS S3 account. We are currently using BaseMount on EC2 to download files from BaseSpace and upload them to S3 but the transfer speed is slow (instance network bandwidth doesn't seem to be the limiting factor). Since BaseSpace files are located on S3, is it possible to do a server-side transfer of files from BaseSpace S3 account to our S3 account? The transfer speed will be much higher in this case. Please advise.

Thanks,
niranjan

Al Maynard

Aug 14, 2018, 11:51:45 AM
to basespace-developers
We don't allow direct AWS access other than a pre-signed URL to the file contents (see https://developer.basespace.illumina.com/docs/content/documentation/rest-api/v1-api-reference).
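As a rough sketch, requesting a file's content endpoint without following redirects should surface the pre-signed URL in the Location header. The base path, endpoint shape, and x-access-token header below are assumptions based on the v1 docs linked above, not a verified recipe:

```python
import urllib.request

API = "https://api.basespace.illumina.com/v1pre3"  # v1 base path (assumption)

def content_endpoint(file_id: int) -> str:
    """Endpoint that 302-redirects to the file's pre-signed S3 URL."""
    return f"{API}/files/{file_id}/content"

class _Redirected(Exception):
    """Carries the redirect target instead of following it."""
    def __init__(self, url):
        self.url = url

class _CatchRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        raise _Redirected(newurl)

def presigned_url(file_id: int, token: str) -> str:
    """Return the time-limited S3 URL BaseSpace redirects to for a file."""
    req = urllib.request.Request(
        content_endpoint(file_id),
        headers={"x-access-token": token},  # auth header per v1 docs (assumption)
    )
    opener = urllib.request.build_opener(_CatchRedirect())
    try:
        opener.open(req)
    except _Redirected as r:
        return r.url
    raise RuntimeError("expected a redirect to a pre-signed URL")
```

You can then hand that URL to any HTTP client running on the EC2 instance.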

You could also create a BaseSpace app for this, as apps have high bandwidth to S3.

Arnold

Aug 15, 2018, 3:04:34 AM
to basespace-developers
Hi Niranjan,
I have a private app that does this. You can email me at al...@illumina.com for access.
Best,
Arnold


Alex Mijalis

Mar 12, 2021, 10:36:59 AM
to basespace-developers
Hi Al, just bumping this because it's been a few years. Do you have updated guidance on performing 'quick' transfers from BaseSpace to S3 buckets? If we use the pre-signed URLs you mentioned, would those have high bandwidth to S3?

Thanks!
Alex

Al Maynard

Mar 15, 2021, 9:40:09 PM
to basespace-developers
Bandwidth from S3 to EC2 instances can be very fast. This article is CLI-centric, but the general principles apply when working with URLs directly:
https://aws.amazon.com/premiumsupport/knowledge-center/s3-transfer-data-bucket-instance/

In particular, it mentions ranged GETs, which can be used to effectively saturate the instance's bandwidth.
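A minimal sketch of that approach, splitting one pre-signed URL into concurrent ranged GETs (the chunk count is arbitrary for illustration, not a tuned recommendation):

```python
import concurrent.futures
import urllib.request

def byte_ranges(size: int, parts: int):
    """Split [0, size) into `parts` contiguous (start, end) byte ranges."""
    step = -(-size // parts)  # ceiling division
    return [(s, min(s + step, size) - 1) for s in range(0, size, step)]

def fetch_range(url: str, start: int, end: int) -> bytes:
    """Download one slice of the object via an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def parallel_download(url: str, size: int, parts: int = 8) -> bytes:
    """Issue `parts` ranged GETs concurrently and reassemble them in order."""
    with concurrent.futures.ThreadPoolExecutor(parts) as pool:
        chunks = pool.map(lambda r: fetch_range(url, *r), byte_ranges(size, parts))
    return b"".join(chunks)
```

The total object size can come from a HEAD request's Content-Length before splitting; keep in mind pre-signed URLs expire, so long downloads may need the URL refreshed.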

Homer

May 27, 2021, 6:48:50 PM
to basespace-developers
I run bcl2fastq on basespace/Runs/MY_DATA_FOLDER/Files to pull my files and convert them to FastQ (that is awfully slow). I can see that there is already a "Files.metadata" folder with all subdirectories and S3 URLs (but there is no parent URL to recursively pull the entire 'Files' folder from S3).
Is there any way to simplify (and speed up) this process?
Also, is there any multi-threading functionality in bcl2fastq?

Homer

May 28, 2021, 2:47:54 PM
to basespace-developers
I've tested the '-p', '-r', and '-w' options and they didn't make any difference in transfer rate (our sample bcl2fastq run takes 12 hours). The test platform was a heavy i3en.12xlarge with a mounted NVMe drive.
The only way I can think of is to do an S3-to-S3 transfer and run bcl2fastq locally (but I don't know how, as I mentioned above).

Also, is there any way to know that a sample is ready to pick up (e.g. a file in the Runs or Projects folder)? The presence of a folder in Runs cannot be used as a trigger since the folder might still be in progress.
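Something like this polling sketch is what I have in mind; the /runs/{id} endpoint, x-access-token header, and Status values here are guesses based on the v1 API docs, not verified:

```python
import json
import time
import urllib.request

API = "https://api.basespace.illumina.com/v1pre3"  # v1 base path (assumption)

def is_terminal(status: str) -> bool:
    """True once a run has finished, successfully or not (values assumed)."""
    return status in ("Complete", "Failed", "Stopped")

def run_status(run_id: int, token: str) -> str:
    """Fetch the run resource and pull out its Status field."""
    req = urllib.request.Request(
        f"{API}/runs/{run_id}",
        headers={"x-access-token": token},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["Response"]["Status"]

def wait_until_done(run_id: int, token: str, poll_seconds: int = 300) -> str:
    """Poll until the run reaches a terminal status, then return it."""
    while True:
        status = run_status(run_id, token)
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```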
