svn list support for listing directory entries only


Dan Ellis

May 3, 2014, 9:13:13 PM
to us...@subversion.apache.org
Hi,

The svn command-line list command currently accepts --depth arguments of files, infinity, and immediates (and empty, but that is really a no-op). I need to list only the directory entries in a repository, but I'm not sure there is any good way to accomplish that as-is. I've searched the lists and can only find posts about how folks have grep'ed the output or hacked list.c to return only directory entries (which is really just an inelegant grep anyway, since the server will still be sending the data).

Is there a way, or would the developers consider adding a feature, to fetch only directory entries from a repo? Perhaps a --depth dirs option (the opposite of --depth files). If there is no off-the-shelf way and the developers are against implementing this (or don't see enough need), would there be any advice on how to accomplish it with a patch? Is there an inherent limitation in the server protocol that makes a request like this simply unworkable?

Thanks in advance,
Dan

Ryan Schmidt

May 3, 2014, 10:16:39 PM
to Dan Ellis, Subversion Users

On May 3, 2014, at 20:13, Dan Ellis wrote:

> The svn command-line list command currently accepts --depth arguments of files, infinity, and immediates (and empty, but that is really a no-op). I need to list only the directory entries in a repository, but I'm not sure there is any good way to accomplish that as-is. I've searched the lists and can only find posts about how folks have grep'ed the output or hacked list.c to return only directory entries (which is really just an inelegant grep anyway, since the server will still be sending the data).
>
> Is there a way, or would the developers consider adding a feature, to fetch only directory entries from a repo? Perhaps a --depth dirs option (the opposite of --depth files). If there is no off-the-shelf way and the developers are against implementing this (or don't see enough need), would there be any advice on how to accomplish it with a patch? Is there an inherent limitation in the server protocol that makes a request like this simply unworkable?

Directories are printed with a trailing slash, so if you just want directories, you could grep for that:

$ svn ls http://svn.apache.org/repos/asf/subversion/trunk/
.ycm_extra_conf.py
BUGS
CHANGES
COMMITTERS
INSTALL
LICENSE
Makefile.in
NOTICE
README
aclocal.m4
autogen.sh
build/
build.conf
configure.ac
contrib/
doc/
gen-make.py
get-deps.sh
notes/
subversion/
tools/
win-tests.py
$ svn ls http://svn.apache.org/repos/asf/subversion/trunk/ | grep /$
build/
contrib/
doc/
notes/
subversion/
tools/
$

Alternatively, you could pass the --xml argument to svn and then parse the XML it gives you.
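
As a rough sketch of the --xml route: each entry in the XML output carries a kind attribute, so picking out directories is a short filter. The sample document below is hand-written and trimmed to the entry and name elements; in practice the text would come from running svn ls --xml. The function name directory_names is made up for this illustration. Note that this still filters on the client side, so it does not reduce what the server sends.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of `svn ls --xml` output, trimmed to the
# elements used here (real output also carries size and commit details).
SAMPLE_XML = """<lists>
<list path="http://svn.apache.org/repos/asf/subversion/trunk">
<entry kind="file"><name>BUGS</name></entry>
<entry kind="dir"><name>build</name></entry>
<entry kind="file"><name>build.conf</name></entry>
<entry kind="dir"><name>contrib</name></entry>
</list>
</lists>
"""


def directory_names(xml_text):
    """Return the names of all entries whose kind attribute is "dir"."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("name")
        for entry in root.iter("entry")
        if entry.get("kind") == "dir"
    ]


print(directory_names(SAMPLE_XML))
```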

You already mentioned the grep solution in your message, so I’m guessing that’s not satisfactory for some reason. If that’s true, then maybe you could explain in more detail what you need exactly, if it’s not the above.


Dan Ellis

May 3, 2014, 10:25:23 PM
to Ryan Schmidt, Subversion Users
On Sat, May 3, 2014 at 7:16 PM, Ryan Schmidt <subvers...@ryandesign.com> wrote:

> On May 3, 2014, at 20:13, Dan Ellis wrote:
>
>> The svn command-line list command currently accepts --depth arguments of files, infinity, and immediates (and empty, but that is really a no-op). I need to list only the directory entries in a repository, but I'm not sure there is any good way to accomplish that as-is. I've searched the lists and can only find posts about how folks have grep'ed the output or hacked list.c to return only directory entries (which is really just an inelegant grep anyway, since the server will still be sending the data).
>>
>> Is there a way, or would the developers consider adding a feature, to fetch only directory entries from a repo? Perhaps a --depth dirs option (the opposite of --depth files). If there is no off-the-shelf way and the developers are against implementing this (or don't see enough need), would there be any advice on how to accomplish it with a patch? Is there an inherent limitation in the server protocol that makes a request like this simply unworkable?
>
> Directories are printed with a trailing slash, so if you just want directories, you could grep for that:
>
> You already mentioned the grep solution in your message, so I’m guessing that’s not satisfactory for some reason. If that’s true, then maybe you could explain in more detail what you need exactly, if it’s not the above.



It's really a performance concern. We need to do this fairly regularly on a large repository (over a WAN, I might add), and asking the server for all files and directories when we only need a directory listing is a huge time sink (with a 100:1 file-to-directory ratio, the listing would take roughly 100 times longer). Grep and the like only filter the output on the client side (which is easily parsable; we do use XML for parsing) and don't relieve the performance burden.

Thanks!
Dan

Ryan Schmidt

May 3, 2014, 11:32:25 PM
to Dan Ellis, Subversion Users

On May 3, 2014, at 21:25, Dan Ellis wrote:

> It's really a performance concern. We need to do this fairly regularly on a large repository (over a WAN, I might add), and asking the server for all files and directories when we only need a directory listing is a huge time sink (with a 100:1 file-to-directory ratio, the listing would take roughly 100 times longer). Grep and the like only filter the output on the client side (which is easily parsable; we do use XML for parsing) and don't relieve the performance burden.

Ok. You’re right, that capability is not built in. You could write a small program to do it (using the Subversion language bindings for the language of your choice).

Ryan Schmidt

May 3, 2014, 11:35:39 PM
to Dan Ellis, Subversion Users
Actually that’s speculation. I don’t know the Subversion libraries, and it’s entirely possible they don’t provide a way to do this; they may just give you the entire directory listing, leaving you to discard the non-directories. That in turn may be because the server has no API for returning anything other than a full directory listing; I’m not sure.

Branko Čibej

May 4, 2014, 7:27:08 AM
to us...@subversion.apache.org
The "depth" parameter is used in many places, not just in "svn list"; whatever enhancement you come up with must at least fit the other uses. Depth was invented to describe sparse working copies, and was only later adapted to other commands. For sparse working copies, "depth=dirs" probably doesn't make much sense.

Once you've defined what "depth=dirs" means for all the commands that support depth, you're about 10% done ... you'd have to review the implementation for every use of the depth parameter and either add "depth=dirs" semantics, or make sure a reasonable error message is returned if that value is not supported by a particular command:

brane@zulu:~/subversion/trunk$ grep -ir depth subversion | fgrep -v subversion/tests | wc -l
    6217

Good luck with that ... :)

-- Brane

P.S.: That number is a bit inflated, because it counts appearances in documentation etc., but it's still a substantial code change, and not trivial in several cases.


--
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com

Dan Ellis

May 5, 2014, 1:56:18 PM
to Subversion Users


> The "depth" parameter is used in many places, not just in "svn list"; whatever enhancement you come up with must at least fit the other uses. Depth was invented to describe sparse working copies, and was only later adapted to other commands. For sparse working copies, "depth=dirs" probably doesn't make much sense.

> Once you've defined what "depth=dirs" means for all the commands that support depth, you're about 10% done ... you'd have to review the implementation for every use of the depth parameter and either add "depth=dirs" semantics, or make sure a reasonable error message is returned if that value is not supported by a particular command: 

----
Yeah, I figured this would get shot down pretty quickly since I mentioned a modification to a key parameter. I completely understand and agree. I've been digging around the API and I'm not sure I even see a call or parameter to fetch only a directory listing from the server. Does anyone with more knowledge of the API know whether it supports this? If so, any pointers (no pun intended) would be welcome. I'd be glad to call an API directly if that would work.

I'd also assume I won't get much traction with a new parameter unique to the list command (should it even be possible).

Thanks,
Dan


Bert Huijben

May 5, 2014, 3:34:23 PM
to Dan Ellis, Subversion Users

The ‘list’ command is really only implemented at a high level, by retrieving the entries of one directory at a time and then filtering the results.

There is nothing you can do to really optimize this for directories without changing the ra layer and the wire protocols for svn:// and http://.
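
To illustrate the pattern described above (this is a sketch of the idea, not Subversion's actual C implementation): a directories-only walk built on a per-directory listing still receives every file entry and then throws it away. Here list_dir is a hypothetical callable standing in for whatever fetches one directory's entries, and FAKE_REPO is a made-up in-memory tree used in place of a real server.

```python
def walk_directories(list_dir, root=""):
    """Yield every directory path below root, one listing call per directory.

    list_dir(path) is a hypothetical callable that returns (name, kind)
    pairs for the immediate children of path, where kind is "dir" or "file".
    """
    pending = [root]
    while pending:
        current = pending.pop()
        for name, kind in list_dir(current):
            if kind != "dir":
                continue  # the file entry was still fetched, then discarded
            child = f"{current}{name}/"
            yield child
            pending.append(child)


# Made-up repository layout, keyed by directory path.
FAKE_REPO = {
    "": [("README", "file"), ("src", "dir"), ("docs", "dir")],
    "src/": [("main.c", "file"), ("lib", "dir")],
    "src/lib/": [("util.c", "file"), ("util.h", "file")],
    "docs/": [("index.txt", "file")],
}

calls = []  # records each simulated round trip


def fake_list_dir(path):
    calls.append(path)
    return FAKE_REPO[path]


print(sorted(walk_directories(fake_list_dir)))
print(len(calls), "listing calls")
```

In this toy tree, finding three directories takes four listing calls and transfers five file entries that are never used, which is the overhead a server-side filter would avoid.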

 

I think it would make at least some users happy to add a ‘streamy’ list operation on the ra layer, as it would optimize all the ‘svn ls’ cases, but nobody has spent the time to fully implement this yet.

(I actually started an implementation of this some time ago, but never finished it, as the compatibility work to support older servers was harder than I anticipated. But a lot of the missing groundwork that made it so difficult back then is now implemented.)

I’m not sure which Subversion client you use for your ‘svn ls’, but I recently compared a 1.6 and a 1.8 client over http:// and the difference was *huge*, and 1.9 should be slightly better still. Perhaps just using a newer client might solve your performance problem.

Note that ‘svn ls’ was never an operation that Subversion was tuned for. Subversion works best on and between entire trees, while ‘ls’ is mostly a diagnostic tool. Some API users actually use the ‘svn status’ backend to obtain a full tree of a repository in a single request, which is faster than ‘ls’.

Bert
