Directory size and number of files in directory

Jakub Toman

Jun 9, 2020, 4:19:22 PM
to User Discuss
Hey everybody,

Is there a way to see the total size of a directory (folder) in the Globus browser-based GUI? What about the total number of files? Or, when selecting only certain files to transfer, how many files have been selected?

The file count is useful for estimating the total size when all the files are the same size. So far, I've only been able to get it by counting the files manually. For numbered files, this becomes harder when there's a gap in the numbering, because one has to scroll through the whole list.

The total size of a directory is important when the destination has limited free space, of course.

It also mattered in the past when a transfer failed. In that case, the failure was caused by one (just one!) file with a disallowed character in its name. Because there seems to be no confirmation that all the other files transferred successfully once a transfer fails, one has to resort to comparing what is visible in the Globus web app with the local files using the local OS (which, thankfully, shows file counts easily). This can take hours of work. Checking that the total size and the number of files match would be enough to make me comfortable that everything worked out.

Am I completely missing something, or just using this service in an unintended way?

Thank you for any feedback!

-Jakub

Gigi Kennedy

Jun 10, 2020, 11:13:10 AM
to user-d...@globus.org
Hi Jakub,

You are correct that Globus does not show the total size of a directory (folder), the total number of files, or the number of files selected for transfer.

Regarding directory size, if you have shell access, `du` is probably a good option. `du` is a standard Linux command and is not related to the Globus CLI. To run it against the remote endpoint's files, you would need shell access to the server hosting that endpoint, which you would have to request from that server's admins.
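As a minimal sketch, assuming a POSIX shell on the machine that actually holds the files, the two numbers Jakub asked about can be obtained with `du` and `find`. The function names `dir_usage` and `dir_file_count` are hypothetical helpers for illustration. Note that `du -sh` reports space used on disk, which can differ from the sum of the file sizes (GNU `du` has `--apparent-size` for the latter), and that counting lines from `find` will miscount file names that contain newlines.

```shell
# Hypothetical helper: total disk usage of a directory, human-readable.
# -s prints only the summary line, -h uses K/M/G units.
dir_usage() {
    du -sh "$1"
}

# Hypothetical helper: number of regular files under a directory, recursively.
# Directories themselves are not counted; tr strips the padding some wc versions add.
dir_file_count() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example usage (the path is a placeholder):
# dir_usage /path/to/data
# dir_file_count /path/to/data
```

Running the same two commands on the source and the destination gives a quick check that a transfer copied everything.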



Also, you can get a list of the files that were transferred successfully in a task.

To get that list, you could use the `task_successful_transfers` call in the Globus SDK (or `endpoint_manager_task_successful_transfers`, if you are a manager of the endpoint):

https://globus-sdk-python.readthedocs.io/en/stable/clients/transfer/#globus_sdk.TransferClient.endpoint_manager_task_successful_transfers


The Globus CLI uses this for `globus task show -t`, if you would rather get the information there, though it may not give you everything you're looking for.

https://docs.globus.org/cli/reference/task_show/



If you're new to the CLI, you can check out our documentation here:

https://docs.globus.org/cli/


Hope this helps.
Best regards,
  Gigi

Stephen Rosen

Jun 10, 2020, 12:29:16 PM
to User Discuss
Hi all,

A few of us on the team discussed this, and we agree that Gigi's reply covers what we have available today and what's supported.
But we also agreed that with the Globus CLI, it is possible to do a bit more.

We can build up to a good pipeline like so:

1. List data on an endpoint. I'll use Tutorial Endpoint 1 for this; substitute your own endpoint, and add a path if you want a non-default directory.

globus ls -Fjson ddb59aef-6d04-11e5-ba46-22000b92c6ec


I want the JSON format because it makes it easier to formulate a query over that data.

2. Filter the results to files only, no directories.

The Transfer API supports more advanced filtering, but the CLI's `--filter` option only filters by name, so we'll use jmespath to do the filtering client-side.

globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`]'

3. Collect file sizes and convert the jmespath output to "unix" format

globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix

Now we have tab-delimited file sizes.

4. Sum the values

There are a bunch of ways in the shell, but, personally, I find many of them *awk*ward.
So I'll use tr and awk here:

globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix | tr '\t' '\n' | awk '{sum+=$1}END{print sum}'
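If you run step 4 often, the `tr | awk` part can be wrapped in a small function. This is a sketch; `sum_sizes` is a hypothetical name. It uses `printf "%.0f"` instead of `print` so that large totals come out as plain digits (some awk implementations may switch to exponent notation for big numbers) and so that a listing with no files prints `0` rather than an empty line.

```shell
# Hypothetical helper: read the tab-delimited sizes printed by
#   globus ls ENDPOINT --jmespath 'DATA[?type==`file`].size' --format unix
# on stdin and print their sum in bytes.
sum_sizes() {
    # tr puts one size per line; awk adds them up and prints the total as an integer.
    tr '\t' '\n' | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage:
# globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix | sum_sizes
```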


Brief aside: I'm a huge fan of awk. If you're working with CLI output and want to compute a value, the combination of `--jmespath ... --format unix` and some awk can handle an impressive number of scenarios.
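As one more example of that combination, the same size list also answers the file-count question from the original post. This sketch (the function name `count_and_largest` is hypothetical) counts the files and reports the largest one in a single awk pass.

```shell
# Hypothetical helper: from tab-delimited sizes on stdin, print the number of
# files and the size of the largest file in bytes.
count_and_largest() {
    tr '\t' '\n' | awk '
        NF { n++; if ($1 > max) max = $1 }   # blank lines are not files
        END { printf "files=%d largest=%.0f\n", n, max }'
}

# Usage:
# globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix | count_and_largest
```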


5. (optional) Add `--recursive` to get a recursive listing

Two quick notes on this:
- Because `-r/--recursive` produces data in the same format as a non-recursive listing, the whole above pipeline works on it
- `--recursive` is implemented client-side. On directory trees with thousands or millions of directories and files, it can be very slow or even run out of memory.
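Putting steps 3 to 5 together, a recursive file count and total size for one endpoint path might look like the sketch below. `remote_dir_stats` is a hypothetical name, and `ENDPOINT_ID:/path/to/dir` in the usage line is a placeholder. The same client-side caveat applies: on very large trees this can be slow or run out of memory.

```shell
# Hypothetical wrapper: recursive file count and total size (bytes) for an
# endpoint, using the pipeline from steps 3-4 plus -r/--recursive.
# $1 is an endpoint ID, optionally followed by :/path.
remote_dir_stats() {
    globus ls -r "$1" --jmespath 'DATA[?type==`file`].size' --format unix \
        | tr '\t' '\n' \
        | awk 'NF { n++; sum += $1 } END { printf "files=%d bytes=%.0f\n", n, sum }'
}

# Usage (placeholder endpoint and path):
# remote_dir_stats 'ENDPOINT_ID:/path/to/dir'
```

Comparing this output with the local `find`/`du` numbers is one way to check that a transfer was complete.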



The problem you'll find in general is that recursively processing a large directory tree is slow.
Even `du` (which is local to the filesystem where your files exist) will take a long time to run on very large or slow filesystems.

The only reasonable option for the Globus web app would be to show a total for the files in your current directory.
We actually have this on our ToDo list, as a status bar for the file manager pages, though I can't speak to when we'll get to that work.


Anyway, I hope the above CLI pipeline helps.

Best,
-Stephen

Jakub Toman

Jun 10, 2020, 7:45:06 PM
to User Discuss
Gigi,

Thank you for the helpful reply! Good to know that a list of transferred files can be obtained. In the future, I'll look into using the CLI.

-Jakub

Jakub Toman

Jun 10, 2020, 7:57:36 PM
to User Discuss
Stephen,

I appreciate you fleshing out a pipeline right here for me and for others who might have this question. And good to hear that a total for files in the directory may be a future feature on the web app.

Thank you!

-Jakub