globus ls --filter with --recursive returns empty list, but egrep or grep will work


W Hills

Feb 9, 2022, 7:26:11 PM
to Discuss
Hello,

I have never been able to get the --filter option to work with -r or --recursive, and have always resorted to piping the output to grep (or egrep) instead. That workaround is now failing: I have so many files that the grep/egrep approach times out. The filter works on an individual directory, but it returns an empty list when I start from a parent directory and use -r (or --recursive) to list matching files across multiple subdirectories.

Can you please help me understand what I am doing wrong with the --filter and --recursive options?

I pasted a screenshot of an example that searches for and lists .txt files in subdirectories. (Note: the screenshot also shows that I tried increasing --recursive-depth-limit from its default of 3 to integers much higher than the true depth needed to locate the files.)

globus_ls_filter_question.png


Thank you for your help!

Best,
Beckett

Stephen Rosen

Feb 10, 2022, 10:36:00 AM
to W Hills, Discuss
Hi Beckett,

You haven't misunderstood what these options do. You're running up against a known limitation in the way that `--recursive` and `--filter` behave when combined.

The filtering done by `--filter` is applied at each stage of the listing, and directories are *not* exempt from it. That means that if "/foo/" is a directory containing "bar/" (a directory) and "baz.txt" (a file), then

  $ globus ls --filter '*.txt' --recursive "$endpoint:/foo/"

will do a filtered listing, finding only "baz.txt" and filtering out "bar/". When the results of that listing are checked for recursion (i.e. "are there directories in this ls result to traverse?"), "bar/" has already been filtered out.
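
To make that interaction concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not the CLI's actual implementation: the in-memory `TREE`, the extra file "deep.txt", and the helper names `filtered_ls` and `recursive_ls` are all hypothetical, and `fnmatch` stands in for the real filter matching.

```python
from fnmatch import fnmatch

# Hypothetical in-memory tree standing in for an endpoint.
# A dict value is a directory; None marks a file.
TREE = {"foo": {"bar": {"deep.txt": None}, "baz.txt": None}}


def filtered_ls(directory, pattern):
    """Return (name, is_dir) pairs whose name matches pattern.

    The pattern is applied to files and directories alike.
    """
    return [
        (name, child is not None)
        for name, child in sorted(directory.items())
        if fnmatch(name, pattern)
    ]


def recursive_ls(directory, pattern, prefix=""):
    """Recurse only into directories that survived the filter."""
    found = []
    for name, is_dir in filtered_ls(directory, pattern):
        if is_dir:
            found.append(prefix + name + "/")
            found.extend(recursive_ls(directory[name], pattern, prefix + name + "/"))
        else:
            found.append(prefix + name)
    return found


# Starting at /foo/: "bar" does not match '*.txt', so it is never traversed.
result = recursive_ls(TREE["foo"], "*.txt")
print(result)  # ['baz.txt']

# Starting one level higher: "foo" itself is filtered out, so nothing is found.
print(recursive_ls(TREE, "*.txt"))  # []
```

The second call mirrors the empty result from the original question: when the starting directory holds only subdirectories, none of them match the file pattern, so the recursion stops immediately regardless of the depth limit.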

We're aware that this is surprising, and not what most users expect when combining these two options. The behavior you were probably expecting, that `--filter` only gets applied to files, is something we've discussed in the past and is in our feature backlog.
I'm raising this with our team to see if we can prioritize support for that behavior.


In the meantime, the best two approaches I can recommend are:

1. Do a recursive ls and filter the results with `grep`, `awk`, or other unix tools

As you mentioned, this is easy and lightweight.

2. Write a script with the globus-sdk

If you are comfortable writing python, we have an Example of Recursive LS in the docs which would be a good starting point. You could extend this to add filtering either client-side (similar to `grep`) or by making two calls for each directory -- one filtered to directories and one filtered to files matching the desired pattern.
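
A minimal sketch of the client-side variant might look like the following. It is not the docs example itself: `recursive_filtered_ls`, `list_dir`, `max_depth`, and the `FAKE` listing are hypothetical names introduced for illustration. The listing function is passed in so the walk can be tried without network access; the commented-out wiring assumes `globus_sdk.TransferClient.operation_ls`, whose entries carry "name" and "type" keys.

```python
from fnmatch import fnmatch


def recursive_filtered_ls(list_dir, path, pattern, max_depth=3):
    """Walk every directory, but report only files whose name matches pattern.

    list_dir(path) is assumed to return an iterable of dicts with "name"
    and "type" ("dir" or "file") keys.
    """
    matches = []
    stack = [(path.rstrip("/") + "/", 0)]
    while stack:
        current, depth = stack.pop()
        for entry in list_dir(current):
            full = current + entry["name"]
            if entry["type"] == "dir":
                # Directories are always traversed, never pattern-filtered.
                if depth < max_depth:
                    stack.append((full + "/", depth + 1))
            elif fnmatch(entry["name"], pattern):
                matches.append(full)
    return sorted(matches)


# Assumed wiring to the real service (not exercised here):
#   import globus_sdk
#   tc = globus_sdk.TransferClient(authorizer=my_authorizer)
#   list_dir = lambda p: tc.operation_ls(endpoint_id, path=p)

# Hypothetical stand-in listing so the sketch runs offline.
FAKE = {
    "/foo/": [
        {"name": "bar", "type": "dir"},
        {"name": "baz.txt", "type": "file"},
    ],
    "/foo/bar/": [
        {"name": "deep.txt", "type": "file"},
        {"name": "notes.md", "type": "file"},
    ],
}

found = recursive_filtered_ls(FAKE.__getitem__, "/foo/", "*.txt")
print(found)  # ['/foo/bar/deep.txt', '/foo/baz.txt']
```

This version makes one unfiltered listing call per directory and discards non-matching files locally. The two-call variant would keep the same walk but replace `list_dir` with a pair of server-side filtered listings per directory, which may transfer less data when directories hold many non-matching files.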


There are issues with both of these approaches, and cases in which either one might run into performance issues. However, for most simple cases in which the `ls` calls return quickly, either one should work adequately.

Best,
-Stephen