Hi all,
A few of us on the team discussed this, and we agree that Gigi's reply covers what we have available today and what's supported.
But we also agreed that with the Globus CLI, it is possible to do a bit more.
We can build up to a good pipeline like so:
1. List data on an endpoint. I'll use Tutorial Endpoint 1 for this; substitute your own endpoint ID, and add a path if you want a non-default directory.
globus ls -Fjson ddb59aef-6d04-11e5-ba46-22000b92c6ec
I want the JSON format because it makes it easier to formulate a query over that data.
2. Filter the results to files only, no directories.
The Transfer API supports more advanced filtering, but the CLI's `--filter` option only does filtering by name.
So we'll use jmespath to do the filtering client-side.
globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`]'
3. Collect file sizes and convert the jmespath output to "unix" format.
globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix
Now we have tab-delimited file sizes.
4. Sum the values.
There are a bunch of ways to do this in the shell but, personally, I find many of them *awk*ward.
So I'll use tr and awk here:
globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec --jmespath 'DATA[?type==`file`].size' --format unix | tr '\t' '\n' | awk '{sum+=$1}END{print sum}'
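If you end up reusing the summing step, it can be wrapped in a small shell function. This is only a sketch: `sum_sizes` is a name I made up, not part of the Globus CLI. It swaps `print sum` for `printf "%.0f\n"` on the assumption that some awk implementations may print large totals in scientific notation, and so that an empty listing prints 0 rather than a blank line.

```shell
# Hypothetical helper (not part of the Globus CLI): sum the tab- or
# newline-separated numbers read from stdin and print the total.
sum_sizes() {
  # Put each value on its own line, then add them up.
  # printf "%.0f" prints the total as a plain integer, and an
  # uninitialized sum (no input at all) is printed as 0.
  tr '\t' '\n' | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Intended usage (needs the Globus CLI and a logged-in session):
#   globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec \
#     --jmespath 'DATA[?type==`file`].size' --format unix | sum_sizes
```

The function only reads stdin, so you can try it on hand-written input with `printf` before pointing it at a real endpoint.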
Brief aside: I'm a huge fan of awk. If you're working with CLI output and want to compute a value, the combination of `--jmespath ... --format unix` and some awk can handle an impressive number of scenarios.
5. (optional) Add `--recursive` to get a recursive listing
Two quick notes on this:
- Because `-r/--recursive` produces data in the same format as a non-recursive listing, the whole pipeline above works on it unchanged.
- `--recursive` is implemented client-side. On directory structures with many (thousands or millions of) directories and files, it can be very slow or even run out of memory.
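To make it easy to switch between the recursive and non-recursive versions, the pipeline above could be wrapped roughly like this. Treat it as a sketch under assumptions: `globus_total_size` and its argument order are hypothetical names of mine, and it simply passes any extra flags straight through to `globus ls`.

```shell
# Hypothetical wrapper around the pipeline above; the function name and
# argument order are illustrative and not part of the Globus CLI.
globus_total_size() {
  # $1: endpoint ID, optionally followed by ":/some/path"
  # remaining arguments: extra flags for `globus ls`, e.g. --recursive
  endpoint="$1"
  shift
  globus ls "$endpoint" "$@" \
      --jmespath 'DATA[?type==`file`].size' --format unix \
    | tr '\t' '\n' \
    | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Intended usage (needs the Globus CLI and a logged-in session):
#   globus_total_size ddb59aef-6d04-11e5-ba46-22000b92c6ec
#   globus_total_size ddb59aef-6d04-11e5-ba46-22000b92c6ec --recursive
```

The same caveat applies to the wrapped version: with `--recursive`, the listing is still walked client-side, so it will be slow on large trees.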
The problem you'll find in general is that recursively processing a large directory tree is slow.
Even `du` (which is local to the filesystem where your files exist) will take a long time to run on very large or slow filesystems.
The only reasonable option for the Globus web app would be to show a total for the files in your current directory.
We actually have this on our ToDo list, as a status bar for the file manager pages, though I can't speak to when we'll get to that work.
Anyway, I hope the above CLI pipeline helps.
Best,
-Stephen