GCS for internal data movement


Ken Carlile

Aug 2, 2022, 1:51:35 PM
to Discuss
Hi all, 

We are currently in the process of getting a Globus subscription, and the matter of how to connect the storage and what we can do with it has come up. Currently, I have a test set up in my DMZ with all the various ports and whatnot translated to a single endpoint with 4 DTNs. Those machines are mounting only a single storage system, a relatively small NAS I have in the DMZ with them. 

This leads to a couple of potential issues: 
1. Someone (either the users or me) has to transfer data to be shared to that storage system in the DMZ. Given the sizes of data sets involved (10s of terabytes likely), this is going to be a long process. There's also only one of me, so I don't want to be doing this for everyone and their brother, even if I can do it way faster than they can through their workstations. 
2. The size of the storage in the DMZ is a couple of orders of magnitude smaller than our internal storage systems. 
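
To put rough numbers on point 1, here is a minimal back-of-the-envelope sketch. The link speeds and the `efficiency` factor are illustrative assumptions, not measurements from this site, and `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Estimate hours needed to move size_tb terabytes over a link.

    size_tb    -- data set size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  -- nominal link rate in gigabits per second
    efficiency -- assumed fraction of the nominal rate actually achieved
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical cases: a workstation on 1 Gb/s vs. a DTN on 10 Gb/s.
    for label, gbps in [("1 Gb/s workstation", 1), ("10 Gb/s DTN", 10)]:
        print(f"10 TB over {label}: {transfer_hours(10, gbps):.1f} h")
```

At a full 1 Gb/s, 10 TB takes about 22 hours, and real-world efficiency below 100% only makes that longer, which is why staging every data set by hand does not scale.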

We are also interested in using Globus for internal and internal->DMZ data transfers; it's much more palatable to have the users push their data out to the DMZ if they can just use the Globus web app to do it and have it go asynchronously. I imagine some users will also want to share internal stuff externally, for whatever that's worth. Not my preference, but all the ways I can think of to make this work would allow that. 

In addition to the internal->DMZ and internal->external transfers, it would be really useful to be able to do big asynchronous transfers between internal storage systems. Currently I do these using Starfish (but that's me again), or users can use the HPC cluster, but that doesn't have access to our archive storage system. 

What we've been tossing around idea-wise is one of 3 things (and please let me know if there's another way):
1. Mount the internal storage systems on the DTNs in the DMZ through the firewall--this is likely to be slow and cumbersome, and confusing for our network administrators (don't ask...)
2. Multi-home the DTNs so that they have a DMZ interface and an internal interface that can access the internal storage systems. This requires hardening the DTNs and fiddling with routing tables, but that's doable. It's probably the fastest option, both to use and to set up. 
3. Set up internal DTNs that are assigned external addresses and ports through the firewall. 

Any ideas on how to work through this? My wishlist would be to have a fully internal GCS that I didn't need to have externally resolvable, but I don't think that is an option from what I've read. 

Thanks, 
Ken

Ian Foster

Aug 2, 2022, 2:02:46 PM
to Ken Carlile, Discuss

Probably not directly relevant, but Francesco de Carlo at the Advanced Photon Source (APS) has built a nice system for his beamline that automatically transfers data generated by experiments from the beamline storage system to an external storage system, setting permissions as required, all via Globus. However, I don’t think that APS requires “a fully internal GCS that isn’t externally resolvable.”


Vas Vasiliadis

Aug 2, 2022, 5:56:33 PM
to Ken Carlile, Discuss
Hi Ken,

We see a few of these cases (where there’s a need to separate internal and external access to storage). You're correct that having a DTN inside the firewall without a publicly routable IP address for the control channel is not an option. The most common approach is your option #2; then you can have separate storage gateways configured to use different data interfaces (happy to provide more details on how to do that). It might be best to jump on a call and talk through this in more detail. Feel free to email me directly, if you like.

Thanks,
Vas

Eli Dart

Aug 3, 2022, 12:35:39 PM
to Vas Vasiliadis, Ken Carlile, Discuss
Hi Ken,

Every circumstance is unique to some degree (hence the admonition Know Your Network, which it's clear that you do), but many other sites with DTN clusters go with your option 2, mounting the main cluster/HPC storage system on the DTNs via an internal-facing interface and placing an external-facing interface in a Science DMZ. It does require some hardening for the DTNs, but that's good practice for any external-facing host anyway.

This config does often involve maintaining a slightly-customized host routing table, but again that depends on your network architecture (which it's clear you're already thinking about). 
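
As a rough illustration of what a customized routing table on a dual-homed DTN is doing, here is a minimal sketch of longest-prefix-match route selection. The prefixes and the interface names `dmz0` and `int0` are hypothetical placeholders, not anything from this thread, and a real host would have this configured in the OS routing table rather than in Python.

```python
import ipaddress

# Hypothetical routing table for a dual-homed DTN (all values are assumptions).
ROUTES = [
    ("0.0.0.0/0", "dmz0"),      # default route: external traffic leaves via the DMZ interface
    ("10.20.0.0/16", "int0"),   # internal storage network: reached via the internal interface
]


def pick_interface(dest, routes=ROUTES):
    """Return the interface whose route is the most specific match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, iface))
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins, as in a normal IP routing lookup.
    return max(matches)[1]


if __name__ == "__main__":
    print(pick_interface("10.20.5.9"))     # internal storage address
    print(pick_interface("198.51.100.7"))  # external peer (documentation range)
```

The point of the more specific internal route is that storage traffic stays on the internal interface while everything else follows the default route through the Science DMZ.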

Thanks,

Eli


--

Eli Dart, Network Engineer                          NOC: (510) 486-7600
ESnet Science Engagement Group                           (800) 333-7638
Lawrence Berkeley National Laboratory 

Ken Carlile

Aug 3, 2022, 4:28:09 PM
to Discuss, da...@es.net, Ken Carlile, Discuss, v...@uchicago.edu

Thanks all, I think I have a pretty good idea of the direction to go now, although I look forward to talking to you on Friday, Vas. 

--Ken