Globus source for https://mast.stsci.edu and other STScI data sets

Alan Sill

Sep 30, 2025, 5:44:44 PM
to Discuss, Alan Sill
We have several account holders who are downloading large data sets from https://mast.stsci.edu through our login nodes despite explicit rules against this. Is there a Globus Replica Service source for these data sets, or other advice that we can give them to get them to stop doing this?

Thanks,

Alan Sill, Ph.D
Managing Director, High Performance Computing Center
Texas Tech University
Drane 159, MS 4-1167
http://www.hpcc.ttu.edu 
e-mail: Alan...@ttu.edu
ph. 806-834-5940

Backeberg, David

Oct 1, 2025, 9:53:53 AM
to Alan Sill, Discuss
I’m not sure what policy is in place… for them to meet their needs in a better way.

We try to do carrot and stick. As in: give them a good place to do the things they need to do, and discourage users from doing the wrong thing in the wrong place.

So if several users need the same data set, we try to create a dedicated place,
where users can have read access to that data set, and nobody needs to keep downloading it.

Sometimes we call that a /data partition, depending on the system.

We provide a “transfer” zone, and we encourage people to move large data sets there. There is no ability to compute on a transfer node, just move files, and you don’t need to count those cycles against your compute time allocations.

We also block harm on our login nodes with Linux cgroups. Describing that is a bit out of scope, but basically we use kernel settings to enforce fairness on the login node. If we see abuse that harms usability for everybody else, we will kill their activity and email them to stop doing that.
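
For anyone who wants a concrete starting point: on systemd-based login nodes with cgroup v2, one common way to get this kind of per-user fairness is a drop-in for the per-user slices. The sketch below only renders the drop-in text. The limit values and the function name are hypothetical, and this is not a description of the setup mentioned above, just one plausible way to do it.

```python
def render_user_slice_dropin(cpu_quota_percent=200, memory_max="8G", tasks_max=512):
    """Render a systemd drop-in that caps every per-user slice.

    The default values are placeholders (assumptions), not recommendations.
    CPUQuota=200% corresponds to roughly two cores' worth of CPU time.
    """
    lines = [
        "# Hypothetical limits; tune these for your own login nodes.",
        "[Slice]",
        f"CPUQuota={cpu_quota_percent}%",
        f"MemoryMax={memory_max}",
        f"TasksMax={tasks_max}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the drop-in so it can be reviewed before it is installed.
    print(render_user_slice_dropin())
```

The output would typically be saved as something like /etc/systemd/system/user-.slice.d/50-limits.conf, followed by a `systemctl daemon-reload`. Test it on a non-production node first, since limits that are too tight will also hurt legitimate interactive work.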

I don’t know whether that data is available via Globus.
--
David Backeberg <david.b...@yale.edu>
(203) 432-9226  Yale Center for Research Computing



Karl Kornel

Oct 1, 2025, 6:07:17 PM
to Alan Sill, Discuss
Hi Alan,

Some sites have started introducing "services" or "dtn" partitions/queues in their job schedulers, which give folks access to long-running, low-core-count, low-memory jobs that can be used for data-transfer purposes.  For one cluster I help to run, we have an “Interactive Desktop” in Open OnDemand that folks can use for data-transfer purposes.  By providing a desktop-type session that they can leave running (and come back to later), they can do their transfers knowing that they won’t be interrupted.

So, that might be the easiest thing: detect that folks are doing these kinds of transfers, and send them to documentation explaining how they should be doing these kinds of transfers in your environment.
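
As a rough illustration of the detection step, here is a minimal sketch that flags long-running download tools from a `ps` snapshot on a login node. The list of command names, the ten-minute threshold, and the function names are all assumptions for illustration. It also will not catch downloads driven from inside a script (for example a Python client), which would show up under the interpreter's name.

```python
import subprocess

# Hypothetical set of command names that usually indicate a bulk transfer.
TRANSFER_COMMANDS = {"wget", "curl", "rsync", "scp", "sftp", "aria2c"}


def find_long_transfers(ps_output, min_seconds=600):
    """Return (user, command, elapsed_seconds) for matching processes.

    Expects lines of the form "<user> <elapsed seconds> <command name>",
    as produced by snapshot() below.
    """
    flagged = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip blank or unexpected lines
        user, elapsed, command = parts[0], int(parts[1]), parts[2].strip()
        if command in TRANSFER_COMMANDS and elapsed >= min_seconds:
            flagged.append((user, command, elapsed))
    return flagged


def snapshot():
    """Collect user, elapsed seconds, and command name for all processes (Linux procps ps)."""
    result = subprocess.run(
        ["ps", "-eo", "user:32,etimes,comm", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for user, command, elapsed in find_long_transfers(snapshot()):
        print(f"{user} has been running {command} for {elapsed} seconds")
```

The flagged list could then feed whatever notification you already use, pointing the user at your data-transfer documentation.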

Why do I make that suggestion, instead of immediately saying “Oh yeah, you should push for Globus”?  Well, when I went to https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html, the “Data Retrieval Notes” section had mentions of ‘no more unencrypted FTP’ and ‘authentication required’.  Following the links makes me think that doing a download from MAST is not as simple as browsing some files and clicking some links.

So if you are interested in advocating for Globus on MAST, I suggest having a MAST user show you how they do a download.  Maybe have them walk you through how you could do a download yourself.  During that time, think about how what they do could potentially map to Globus.  Then, reach out to the MAST folks and explain your problem (folks downloading through login nodes), why standard solutions might not work, if applicable (maybe bandwidth limits on non-DTNs), and why you’d prefer Globus.  As part of that, show that you understand that fitting Globus into their setup might not be easy, and offer what help you can.

At least, that’s my suggestion!

 

~ Karl

Lev Gorenstein

Oct 1, 2025, 7:28:03 PM
to Karl Kornel, Alan Sill, Discuss
I will add that from the Globus side we are happy to be part of this discussion with MAST and help.

I think that (with MAST being a powerful data portal with multiple download methods and an infrastructure to generate links to collections of files) the effort of adding yet another one should not be overwhelmingly large... but of course only the MAST team can estimate the magnitude of such an effort and decide how to prioritize it. Hence the importance of the community and practitioners voicing the need for such a feature.


Lev
--
Lev Gorenstein
Solutions Architect
Globus.org // University of Chicago
e: l...@globus.org