Endpoints within a VM


Kevin Krieger

Apr 16, 2020, 1:22:07 PM
to User Discuss, Ashton Reimer
Hi,

I'm trying to use globusconnectpersonal (3.0.4) on a fresh install of Fedora 31 Workstation (x86_64) in VirtualBox 6.1.4.
On the fresh install, I installed tcllib, tcl, and tk via yum. The VM network is set to the default NAT.

I then downloaded globusconnectpersonal, ran the globusconnect script, entered the new security code I generated on the Globus website, and connected.
I can see the endpoint and I am able to transfer files TO the endpoint from another working endpoint, but I am unable to transfer files OUT of the endpoint.

Here is the error message I get when trying to transfer files FROM the VM endpoint (destination endpoint UUID removed). The activity list says "Connection Broken":
Error (transfer)
Endpoint: endpoint name (uuid)
Server: Globus Connect
File: /~/Downloads/globusconnectpersonal-latest.tgz
Command: STOR ~/Downloads/globusconnectpersonal-latest.tgz
Message: Fatal FTP response
---
Details: 500-Command failed. : an end-of-file was reached\r\n500-globus_xio: An end of file occurred\r\n500 End.\r\n

Are there any special requirements for using a VM that I should be aware of?

Thank you,
Kevin Krieger

Stephen Rosen

Apr 17, 2020, 11:40:25 AM
to User Discuss, Ashton Reimer
Hi Kevin,

There shouldn't be any special requirements, but we often see issues running endpoints in VMs or containers because the network setup can become more complex. The VM needs to be able to make and receive connections on all of the ports used by Globus Connect Personal.
Usually we recommend using some form of Bridged Networking, which makes the VM its own entity on the network, sharing the same physical adapter as the host. This is the simplest setup and most likely to work without issues.

I'm going to provide much more detail on the alternatives and how the connections flow in a postscript. It's long, and you might not need all of the detail if bridged networking works for you.


The error you encountered, "Fatal FTP response", is relatively generic, and usually the details field contains the useful information.
In your case, the end-of-file (EOF) isn't very informative, but we often see generic errors like this or "callback failed" as a sign of network issues.
Often, that means it's a recoverable fault, and the transfer proceeds and eventually succeeds. It may impact throughput, but generally is not cause for alarm.
However, if it is happening so frequently that performance is degraded, or the Transfer outright fails, we can assume that there's a deeper issue. The potential causes are varied -- we've seen bad switches, problematic firewalls, etc.
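
For anyone unsure how to read the details field: it is a multi-line FTP reply in which the CRLF line endings are shown as literal \r\n text. Here is a minimal Python sketch for splitting it into readable lines. The function name is made up for illustration and is not part of any Globus tool, and it assumes the details string looks like the one quoted earlier in this thread.

```python
def parse_ftp_details(details: str) -> tuple[int, list[str]]:
    """Split a multi-line FTP reply into (reply code, message lines).

    Hypothetical helper, not part of any Globus tool. It assumes line
    endings may appear as the literal characters backslash-r backslash-n,
    as in the error report quoted in this thread.
    """
    text = details.replace("\\r\\n", "\n")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply")
    code = int(lines[0][:3])
    # Continuation lines look like "500-text"; the final line is "500 text".
    # Either way, the message starts after the fourth character.
    messages = [line[4:].strip() for line in lines]
    return code, messages


details = (
    "500-Command failed. : an end-of-file was reached\\r\\n"
    "500-globus_xio: An end of file occurred\\r\\n"
    "500 End.\\r\\n"
)
code, messages = parse_ftp_details(details)
print(code)
for message in messages:
    print(" ", message)
```

The middle line is usually the one worth reading; here it names globus_xio and the end-of-file condition.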

An EOF strongly suggests to me that a connection was closed by one end of the connection or the other -- perhaps unexpectedly, hence the error, but in a graceful way.
However, I think it's more likely that the details field in this case is a red herring, and that the EOF event is not related to the error.
That explanation also holds if the EOF came from a filesystem read (vs a socket), as it would be another normal and unrelated condition.
It's hard to be certain without more information.


We can start digging into greater detail if necessary, and for that I'd recommend opening a support ticket with sup...@globus.org because we'll likely need many details, and information about the endpoints you're using.


There are a few basic diagnostics you can perform on your own. Firstly, we will want to make sure it's your end of the connection that is the source of trouble.
We often ask people to try data transfers to and from the Globus Tutorial Endpoints. These are known-functioning servers and let us test only your endpoint, not both the source and destination at once.
You should do this with both of the endpoints you're using, to determine which (if either) fails.

Sometimes, we find that a newly configured endpoint is fine, and the existing, other end of the connection is experiencing an outage or limping hardware. Explicitly testing these cases helps catch that.


If you give us a task ID for a task which exhibits the failure, we'll be able to examine more details about the failure. For future reference, I would be perfectly comfortable with you sharing a task ID on the listhost (the ID itself contains no information, and tasks are not visible to other users), but we of course understand if you'd prefer to keep that information private.


I hope all of that helps. More detail about networks and connections to follow.
Best,
-Stephen


First up: Bridged Networking vs NATed.

If you can't or don't want to share your host's adapter with the VM (bridging), your host will usually behave as a NAT. It sounds like this is what you're using.
Depending on what functionality you want to use, and how the software NAT itself is configured, this may or may not pose an issue.
Transfers between a Globus Connect Personal endpoint in a NATed VM and an external Globus Connect Server endpoint will usually work, but transfers with another Globus Connect Personal endpoint (a feature which requires a subscription) may experience issues.
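
If you're not sure which mode the VM is actually in, one rough check from inside the guest is to look at its own address. The Python sketch below assumes VirtualBox's default NAT range of 10.0.2.0/24 (guests typically get 10.0.2.15); that range is configurable, so treat a match as a hint rather than proof. The function names are made up for illustration.

```python
import ipaddress
import socket

# Assumption: VirtualBox's default NAT mode hands guests an address in
# 10.0.2.0/24. The range can be changed in the VM settings.
VIRTUALBOX_DEFAULT_NAT = ipaddress.ip_network("10.0.2.0/24")


def looks_like_virtualbox_nat(address: str) -> bool:
    """Return True if the address falls in VirtualBox's default NAT range."""
    return ipaddress.ip_address(address) in VIRTUALBOX_DEFAULT_NAT


def primary_ipv4_address() -> str:
    """Return the local address the OS would use for outbound traffic.

    Connecting a UDP socket sends no packets; it only selects a route.
    192.0.2.1 is a reserved documentation address (TEST-NET-1).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]


if __name__ == "__main__":
    try:
        address = primary_ipv4_address()
    except OSError as exc:
        print(f"could not determine a local address: {exc}")
    else:
        if looks_like_virtualbox_nat(address):
            print(f"{address}: probably behind the VirtualBox NAT")
        else:
            print(f"{address}: not in the default VirtualBox NAT range")
```

With bridged networking, the guest should instead show an address from the same network as the host.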


I also think that bridged networks are more reliable for another reason: it removes a "device" from the network path.
The simulated network and NAT used by many virtualization tools are complex.
Even though it's reasonable to assume that these tools are pretty well tested by the vendors who make them, a bridged network is just a whole lot simpler to implement correctly.
If there's a subtle bug in the VirtualBox NAT, GridFTP's emphasis on saturating the network as much as possible (after all, we want transfers to be fast!) might expose it in a way that other protocols usually don't.
If we can avoid that virtual NAT device, we remove a potential source of errors.


Second: Here's why transfers with your VM and a Globus Connect Server endpoint will usually work.
Transfers between Globus Connect Personal and Globus Connect Server endpoints always connect in the same direction: the Personal endpoint should initiate the connection to the Server endpoint, regardless of which way the data is flowing.
The Globus Connect Personal endpoint also needs to connect outbound to Globus' centralized services, to coordinate the data transfer.

These connections are captured in the firewall requirements for Globus Connect Personal ( https://docs.globus.org/how-to/configure-firewall-gcp/ ).
But there is a tricky case. Personal-to-Personal transfers require that the two ends do "NAT hole punching" (a.k.a. "NAT traversal"), a technique to listen for connections from behind a NAT, and communicate directly. If you want to use this feature from your VM, I *strongly* recommend using bridged networking. Otherwise, we can be looking at two layers of NAT (one on the host, virtualized, and one provided by, e.g., a home router), and the hole-punching protocol is much more likely to fail.

If you were doing a Personal-to-Personal transfer and had a message like "ICE negotiation failed", that would almost certainly be the culprit.
ICE is the name of one of the NAT traversal techniques we employ.

As long as you stay clear of that messy case, no NAT traversal is necessary, since it's "just" an outbound connection as far as the various layers of routing are concerned.
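
Since the Personal-to-Server case comes down to outbound connections, a quick sanity check from inside the VM is to try opening those connections yourself. This is a minimal Python sketch; the function name is made up, and the host and port in the example are placeholders, not Globus values. Take the real hosts and ports from the firewall page linked above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder target for illustration only. Replace it with the hosts
    # and ports listed in the Globus Connect Personal firewall requirements.
    targets = [("example.org", 443)]
    for host, port in targets:
        status = "ok" if can_connect(host, port) else "FAILED"
        print(f"{host}:{port} {status}")
```

A successful connect only shows that the port is reachable through the NAT. It says nothing about whether a long-running, high-throughput data connection will stay up.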


The one last thing I'll mention, for anyone who read this far, is UDT.
Globus transfers with UDT are possible (if both endpoints support it), but the protocol changes some characteristics of error handling.
Specifically, a UDT ECONNLOST will present via Globus as an EOF because it's the nearest equivalent when reading from a UDT socket.
If you're using UDT, I would retry the Transfer with UDT disabled to see whether the error persists and, if so, whether it is the same error.

I mention this only because the details field only shows an EOF event, which is relatively unusual, and a UDT connection drop is the only way I can see a broken connection causing it.
An EOF in a socket read means that the socket was closed in an orderly manner (on TCP, that means you got a FIN).
As I said above, it does not seem very likely that an orderly connection closure by one of the two endpoints would cause an error.
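
To make the "orderly close" point concrete, here is a small Python sketch over plain TCP on the loopback interface. It has nothing to do with Globus or UDT specifically; it only shows that a reader sees EOF as an empty read, with no exception, after the peer closes its sending side. The function names are made up for illustration.

```python
import socket


def read_until_eof(sock: socket.socket) -> bytes:
    """Read from sock until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":  # empty read: the peer closed in an orderly way
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    """Connect two loopback TCP sockets, send data, then close one side."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        server_side, _ = listener.accept()
    with client, server_side:
        server_side.sendall(b"partial file contents")
        server_side.shutdown(socket.SHUT_WR)  # on TCP, this sends the FIN
        return read_until_eof(client)


if __name__ == "__main__":
    print(demo())
```

An abrupt reset would instead surface as an error on the read (ConnectionResetError in Python), which is why a bare EOF looks more like a normal shutdown than a broken connection.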