Smart-Seq2 workflow localizes reference per sample - huge cost impact

Mark Godek

Apr 7, 2026, 3:00:56 PM
to Cumulus Support
Hello,

One of the researchers in my lab ran the Cumulus SmartSeq2 workflow on 576 samples this week on Terra.
It triggered a spending anomaly on our Google Cloud Platform billing.

On Terra, the workflow reports a cost of $100, but the billing report shows a $720 data transfer charge and puts the actual cost of the run at $925. Our BigQuery billing export attributes this charge to a multi-region transfer of 38.66 terabytes of data from the gs://regev-lab bucket.
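The figures above can be cross-checked with a little arithmetic. This is a minimal sketch that assumes the $925 total already includes the $720 transfer charge and that "38.66 terabytes" means decimal terabytes (10^12 bytes). Under those assumptions the bill is about 9.25 times the Terra estimate, the transfer is roughly 78% of it, and the implied rate is just under $0.019 per GB. The function name is illustrative only.

```python
# Figures reported in the post; the Terra estimate and the billing export are taken as given.
TERRA_ESTIMATE_USD = 100.0
BILLED_TOTAL_USD = 925.0     # assumed to include the transfer charge
TRANSFER_CHARGE_USD = 720.0
TRANSFER_TB = 38.66          # assumed decimal terabytes (10**12 bytes)


def cost_breakdown(estimate: float, billed: float, transfer_usd: float, transfer_tb: float) -> dict:
    """Return simple ratios that describe the billing anomaly."""
    transfer_gb = transfer_tb * 1000  # decimal TB -> GB
    return {
        "billed_vs_estimate": billed / estimate,
        "transfer_share": transfer_usd / billed,
        "usd_per_gb": transfer_usd / transfer_gb,
    }


if __name__ == "__main__":
    result = cost_breakdown(TERRA_ESTIMATE_USD, BILLED_TOTAL_USD, TRANSFER_CHARGE_USD, TRANSFER_TB)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```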

After reading through the workflow's WDL, I found that smartseq2_per_plate pulls the reference genome specified in the workflow input JSON separately for each sample.
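If essentially all of that transfer came from localizing the reference (an assumption, since the billing export only reports the total), the per-sample volume follows directly. The sketch below divides the reported total by the 576 samples, which gives roughly 67 GB per sample, and shows that transfer grows linearly with sample count when each shard pulls its own copy. The function names are illustrative only.

```python
SAMPLES = 576
TRANSFER_TB = 38.66  # from the billing export; assumed decimal TB


def per_sample_gb(total_tb: float, n_samples: int) -> float:
    """Average cross-region data pulled by each sample shard, in GB."""
    return total_tb * 1000 / n_samples


def per_sample_localization_tb(reference_gb: float, n_samples: int) -> float:
    """Total transfer in TB when every shard localizes its own copy of the reference."""
    return reference_gb * n_samples / 1000


if __name__ == "__main__":
    avg = per_sample_gb(TRANSFER_TB, SAMPLES)
    print(f"~{avg:.1f} GB per sample")
    # Doubling the number of samples doubles the transfer volume.
    print(f"{per_sample_localization_tb(avg, 2 * SAMPLES):.2f} TB for {2 * SAMPLES} samples")
```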

This was fine years ago, when Broad/Terra policy was to use multi-regional buckets, but the default is now us-central1. gs://regev-lab, the source of the reference genome, is still a multi-region bucket, so workflows incur the "Network Data Transfer GCP Multi-region within Northern America" charge. As a result, the run cost roughly 9 times what we expected; with the references in the same region, there would have been little or no transfer charge.
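Whether a read is charged depends on where the bucket lives relative to the VMs. The sketch below encodes the simplified rule described in this thread: reads from a bucket in the same single region as the compute are expected to carry little or no transfer charge, while reads from a multi-region bucket such as gs://regev-lab are billed. It treats only "US", "EU" and "ASIA" as multi-region codes, ignores dual-regions, and assumes gs://regev-lab reports its location as "US". A bucket's real location string can be read with the google-cloud-storage client (`storage.Client().get_bucket(name).location`); the check here is kept offline.

```python
# Cloud Storage multi-region location codes (dual-regions are not handled here).
MULTI_REGIONS = {"US", "EU", "ASIA"}


def classify_read(bucket_location: str, compute_region: str) -> str:
    """Classify a bucket read relative to the region the VMs run in.

    Simplified rule from this thread: only same-region reads are expected
    to avoid a transfer charge.
    """
    bucket = bucket_location.upper()
    region = compute_region.upper()
    if bucket == region:
        return "same-region"   # little or no transfer charge expected
    if bucket in MULTI_REGIONS:
        return "multi-region"  # billed as multi-region data transfer
    return "cross-region"      # billed as inter-region data transfer


if __name__ == "__main__":
    # gs://regev-lab is ASSUMED to report "US"; gs://cumulus-ref is in us-central1 per this thread.
    for name, location in [("regev-lab", "US"), ("cumulus-ref", "US-CENTRAL1")]:
        print(name, classify_read(location, "us-central1"))
```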

I understand that some users may operate out of other regions, which would call for the multi-region bucket. However, if most users are on Terra in us-central1, it might be simplest to create a bucket with the references there.

Alternatively, since the reference genome is specified as a string at the workflow level, might it be possible to copy the reference across regions once, into a temporary location such as the workspace bucket, and have the workflow localize it to the per-sample shards from there?
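The staging idea can be compared with the current behavior using a simple cost model. In this sketch, which is an illustration rather than anything in the Cumulus WDL, per-sample localization reads the reference across regions once per shard, whereas staging copies it across regions once into a same-region bucket (for example the workspace bucket) and the shards then read that copy without a transfer charge. The reference size and per-GB rate in the usage lines are assumptions derived from the figures in the post, and the storage cost of the temporary copy is ignored.

```python
def cross_region_gb(reference_gb: float, n_samples: int, stage_first: bool) -> float:
    """GB read across regions under each localization strategy."""
    if stage_first:
        # One cross-region copy into a same-region bucket; shards then read it locally.
        return reference_gb
    # Every shard pulls its own copy from the remote bucket.
    return reference_gb * n_samples


def transfer_cost_usd(gb: float, usd_per_gb: float) -> float:
    """Transfer cost at a flat per-GB rate."""
    return gb * usd_per_gb


if __name__ == "__main__":
    REFERENCE_GB = 67.0   # assumed: 38.66 TB / 576 samples from the post
    USD_PER_GB = 0.0186   # assumed: $720 / 38,660 GB from the billing export
    SAMPLES = 576
    for staged in (False, True):
        gb = cross_region_gb(REFERENCE_GB, SAMPLES, staged)
        label = "staged once" if staged else "per sample"
        print(f"{label}: {gb:,.0f} GB, ~${transfer_cost_usd(gb, USD_PER_GB):,.2f}")
```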

Thanks,
Mark Godek
Data Manager, Bioinformatics Specialist
Shalek Lab

Yiming Yang

Apr 9, 2026, 1:52:32 PM
to Mark Godek, Cumulus Support
Hello Mark,

Thank you for your continued interest in the Cumulus SmartSeq2 workflow.

We actually do have an up-to-date bucket, gs://cumulus-ref, in the us-central1 region that hosts resources such as genome references for the community. For example, our Cellranger and Spaceranger workflows now point to this bucket by default. We had not done this for the SmartSeq2 workflow yet because we were not sure whether anyone still used it.

Now that we know the SmartSeq2 workflow is still useful to the community, we are happy to transfer its resources to gs://cumulus-ref in us-central1 as well. We will let you know once it's done.


Sincerely,
Yiming


Mark Godek

Apr 14, 2026, 4:28:21 PM
to Cumulus Support
Thanks for the quick response.

Yes, I spoke to my lab manager about the SmartSeq2 workflow after the billing anomaly.
She said they don't use it often anymore, but because it's easy and reliable, they sometimes use it as a sanity check.

Thanks again for looking into this!

Yiming Yang

Apr 14, 2026, 4:30:58 PM
to Cumulus Support
Hi Mark,

Thanks for the info. I just wanted to let you know that we've released Cumulus v4.0.4, in which the SmartSeq2 workflow uses resources from the gs://cumulus-ref bucket in the us-central1 region by default.
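Presumably a run only benefits from the new default if its reference is resolved from gs://cumulus-ref; an input JSON that still names an explicit gs://regev-lab path would keep pulling data across regions. A minimal pre-flight check is sketched below. It assumes a flat Cromwell-style input JSON of "workflow.input": value pairs; the key names, the workspace bucket, and the helper names are hypothetical, and non-gs:// strings (such as reference keywords) are skipped because the check cannot tell where they resolve.

```python
import json
from typing import Iterator, List, Tuple

ALLOWED_PREFIXES = ("gs://cumulus-ref/",)


def _strings(value) -> Iterator[str]:
    """Yield every string inside a JSON value (plain strings and lists of strings)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def find_remote_paths(inputs: dict, allowed: Tuple[str, ...] = ALLOWED_PREFIXES) -> List[Tuple[str, str]]:
    """Return (key, path) pairs whose gs:// path lies outside the allowed buckets."""
    flagged = []
    for key, value in inputs.items():
        for s in _strings(value):
            if s.startswith("gs://") and not s.startswith(allowed):
                flagged.append((key, s))
    return flagged


if __name__ == "__main__":
    # Hypothetical inputs: key names and paths are placeholders, not real Cumulus inputs.
    inputs = json.loads("""
    {
      "smartseq2.reference": "gs://regev-lab/placeholder/reference.tar.gz",
      "smartseq2.output_directory": "gs://my-workspace-bucket/results"
    }
    """)
    allowed = ALLOWED_PREFIXES + ("gs://my-workspace-bucket/",)
    for key, path in find_remote_paths(inputs, allowed):
        print(f"WARNING: {key} reads from {path}")
```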


Sincerely,
Yiming
