Hello!
I have a user (I'll call them luke) who connects to my Globus Connect Server v5 mapped collection through the Globus Web App and sometimes gets a timeout error when listing the contents of their user directory:
====
Directory Listing Timed Out
The server response may be slow or listing a very large number of files is taking too long.
Command Failed: Error (list)
Endpoint: imcacat#dtn (5f4f99e5-ac35-11ea-8f11-0a21f750d19b)
Server: 164.54.200.26:443
Command: MLSC /confidential-data/luke/
Message: The operation timed out
---
Details: Timeout waiting for response
====
I don't know how the Globus Web App is implemented, but is it possible to conclude anything about where the timeout is occurring from this error message alone? For example, if the MLSC command is sent by the Globus Web App, then it seems nothing can be concluded: the timeout could be occurring in the server's attempt to list the directory or in the Globus Web App's attempt to request the listing. However, if MLSC is never sent by the Globus Web App and is instead a command that only the server issues internally, with the Globus Web App simply reporting the server's error to the user, then the timeout must be occurring on the server and has nothing to do with the Globus Web App or the network path between it and the server.
The timeout seems to hit the user roughly once a week and then resolves on its own with no intervention on my end: the user tries again maybe an hour later, and the listing works. That observed frequency may be misleading, though. If the user only lists their directory about once a week, whatever is causing the problem could be happening at a different frequency without anyone noticing.
Does anyone have suggestions for how to debug this? If the issue is on my end, with my Globus Connect Server v5 or the underlying storage, how could I confirm that? Similarly, if the issue is on the user's end, how could I confirm that?
I searched this group and found
which suggests the most common reasons for the directory-listing timeout are:
1. The directory being listed has so many objects that the timeout window is exceeded before the listing can complete.
2. "The GCS Manager service on the endpoint is blocked by firewall policy."
Re #1, I checked, and the directory being listed only has 16 entries (15 directories and 1 file). I can't imagine that being too many objects.
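Even with only 16 entries, the listing could still be slow if the underlying storage occasionally stalls. One way I could check that is to time the same listing locally on the server at regular intervals (from cron, say) and look for outliers. Below is a minimal sketch of that idea. The function name `time_listing` is made up for illustration, and it assumes the collection path from the error message corresponds to a local path on the server, which may not hold if the collection has a different base path.

```python
import os
import time


def time_listing(path):
    """List `path` once locally and return (entry_count, elapsed_seconds).

    Hypothetical helper, not part of Globus. It stats every entry because
    a stalled filesystem often hangs on stat rather than on readdir.
    """
    start = time.monotonic()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.monotonic() - start


# Example use on the server (path is an assumption, see above):
#   count, elapsed = time_listing("/confidential-data/luke/")
#   print(time.strftime("%Y-%m-%dT%H:%M:%S"), count, f"{elapsed:.3f}s")
```

If the local listing is always fast, including around the times the user sees the timeout, that would point away from the storage.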
Re #2, since it happens only sometimes, this seems less likely. Of course, a firewall policy can change dynamically, so I guess it could still be the cause of the problem.
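To get a rough picture of whether port 443 on the server is intermittently unreachable, I could run a periodic TCP connection probe from an outside host and log the results. The sketch below uses only the Python standard library; `probe` and `log_probes` are names I made up, and the host and port are taken from the error message. It only shows reachability from the host running the probe, so it cannot prove what the firewall does with connections coming from the Globus service itself.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # includes timeouts and refused connections
        return False, time.monotonic() - start, str(exc)


def log_probes(host, port, count, interval):
    """Run `count` probes, `interval` seconds apart, printing one line each."""
    for i in range(count):
        ok, elapsed, err = probe(host, port)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} {host}:{port} ok={ok} elapsed={elapsed:.3f}s {err}")
        if i < count - 1:
            time.sleep(interval)


# Example: one probe every 5 minutes for a day (values are arbitrary):
#   log_probes("164.54.200.26", 443, count=288, interval=300)
```

Lining up any failed probes with the times the user reports the timeout would show whether the two are related.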
Lastly, while I don't think it would be wise to permanently increase the timeout window for a directory listing, it might help characterize the problem, so I'm wondering, is it possible to change it in Globus Web App or Globus Connect Server, and if so, how?
Thank you!
Lewis