Symlinks

Phil Norman

Aug 25, 2020, 5:44:58 PM
to evergre...@googlegroups.com
Hi.

Evergreen scans the entire workspace when it is created. On a very large source tree, this can be slow, and so I'd like to set up some scheme whereby I can map in parts of the tree using 'ln -s'.

Unfortunately, Evergreen doesn't allow symlinks to be followed by default, when building its 'open quickly' list. Of course, there are cases where this is the behaviour you want.

I was thinking of adding an option to the "new workspace" dialog, to enable symlink scanning. Does that sound sensible, or is there another approach that would be better?

Note: I did try using 'mount --bind' instead, but that seems not to like fuse filesystems, as far as I can tell.

Cheers,
Phil

Martin Dorey

Aug 26, 2020, 1:18:22 AM
to evergre...@googlegroups.com
Wasn’t this sorted a couple of years ago, with https://github.com/software-jessies-org/jessies/commit/b96d9b13365dd9235c70a1c8fe85d0d990fd9e5c? There’s not much in the check-in comment, but I was quite complimentary in the email thread that ended, as far as I can see, with “I don't think I've ever successfully used Open Import”.

I have to admit, though, that I haven’t started Evergreen in some months, certainly since the Before Times, back when I had more screen real estate to waste on ETitleBar or whatever it’s called.

Phil Norman

Aug 26, 2020, 1:37:01 AM
to evergre...@googlegroups.com
Nah, the problem's in the FileIgnorer: its instance variable (includeAllSymbolicLinks) is false by default, and only ever gets set to true from the ExternalTools.scanToolsDirectory function.


Martin Dorey

Aug 26, 2020, 2:07:07 AM
to evergre...@googlegroups.com
Perhaps you’re still focused on this:

> Unfortunately, Evergreen doesn't allow symlinks to be followed by default

I was instead talking about this:

> Evergreen scans the entire workspace when it is created. On a very large source tree, this can be slow

But perhaps “created” isn’t being used to mean “opened” as I’d assumed: perhaps you really are creating new workspaces every day rather than every couple of months.

Phil Norman

Aug 26, 2020, 2:59:05 AM
to evergre...@googlegroups.com
Ah,

Yes, I'm very much focused on the symlinks bit, as this is a _very_ large source tree, and just scanning it would take a very, very long time. How long, I'm not sure, but I'd guess O(hours). The fact that it's accessed over the network via some fuse filesystem doesn't help. A full scan simply isn't feasible here.

Cheers,
Phil

Phil Norman

Aug 26, 2020, 3:00:05 AM
to evergre...@googlegroups.com
Maybe we can also fix the issue where creating a new file causes a full rescan - it should be feasible just to insert the new filename into the list.
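
A minimal sketch of that idea, assuming the workspace file list is held as a sorted list of path strings (an assumption; Evergreen's actual data structure may differ). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: keeps a sorted file list current without a rescan.
final class SortedFileList {
    private final List<String> paths = new ArrayList<>();

    // Inserts the path at its sorted position; returns false if it was already listed.
    boolean add(String path) {
        int index = Collections.binarySearch(paths, path);
        if (index >= 0) {
            return false; // Already present, nothing to do.
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent.
        paths.add(-index - 1, path);
        return true;
    }

    // Read-only view of the list, in sorted order.
    List<String> asList() {
        return Collections.unmodifiableList(paths);
    }
}
```

The lookup is a binary search, so only the insertion itself shifts elements; nothing touches the filesystem.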

Phil Norman

Aug 26, 2020, 4:05:00 AM
to evergre...@googlegroups.com
Hmm. Actually it's a bit more complex than that. Just saying "yes" to symlinks isn't enough: making this work involves adding a 'readlink' POSIX JNI wrapper, and then writing code to recurse into symlinks and find the underlying dir (or file).

It does make me wonder: why do we have our own special posix layer, rather than just using the Java built-in file scanning? Presumably there's a good reason, but I'd like to know what it is.
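
For comparison, the standard NIO API (java.nio.file, Java 7 and later) already covers what a 'readlink' wrapper would provide. The following is a sketch only, not Evergreen's code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper showing the NIO counterparts of readlink.
final class SymlinkResolver {
    // Returns the link's stored target, one level only (like readlink(2)).
    static Path immediateTarget(Path link) throws IOException {
        return Files.readSymbolicLink(link);
    }

    // Returns the real path with every symlink along the way resolved.
    // Throws an IOException if the path (or a link target) does not exist.
    static Path finalTarget(Path path) throws IOException {
        return path.toRealPath();
    }
}
```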

Cheers,
Phil

Martin Dorey

Aug 26, 2020, 10:30:41 AM
to evergre...@googlegroups.com
> why do we have our own special posix layer, rather than just using the Java built-in file scanning?

... supports my prejudice that it dates from when Java’s built-in facility was in a version - Java 7 - that was too shiny for us to have everywhere.

Phil Norman

Aug 26, 2020, 1:35:24 PM
to evergre...@googlegroups.com
Sounds kind of reasonable. I've just done some profiling to see what state things are in these days. I'm using Java 11 (openjdk 11.0.8 2020-07-14) on Linux, with an ext4 filesystem. I ran things a few times, so everything's cached anyway.

The profiling is done using the Stopwatch class. I basically have a 10000-iteration loop that counts the result of whether something's a symlink or not, which is eventually printed out (to avoid anything being optimised out).

The directories, files and symlinks are all in the same directory, and their filenames are exactly the same length as each other (21 chars), just in case that could have a bearing (e.g. copying across the JNI boundary).
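
A rough sketch of the NIO half of such a measurement. This is not the code that produced the numbers in this thread: it uses System.nanoTime() in place of the Stopwatch class, the class and method names are hypothetical, and the Posix/JNI side is left out because it needs the project's native library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical micro-benchmark for the NIO symlink check.
final class SymlinkCheckBenchmark {
    // Calls Files.isSymbolicLink repeatedly and counts how often it returned true.
    static int countSymlinkHits(Path path, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (Files.isSymbolicLink(path)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        int hits = countSymlinkHits(path, 10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the count stops the optimiser from discarding the loop.
        System.out.println(hits + " symlink hits in " + elapsedMs + "ms");
    }
}
```

Running the loop several times in one JVM, as described in the follow-up measurement, is what separates cold timings from warmed-up ones.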

I first ran that on all combinations of dir, symlink-to-dir, file, symlink-to-file. Got some fun results:

        | file  | sym-file | dir   | sym-dir
NIO     | 12ms  | 13ms     | 16ms  | 40ms
Posix   | 31ms  | 19ms     | 19ms  | 21ms

Then, just to check, I ran that loop in another 100-iteration loop, to check how variable the timing is. I then get this:

        | file                   | sym-file               | dir                    | sym-dir
NIO     | 9ms..18ms (mean 10ms)  | 9ms..16ms (mean 9ms)   | 9ms..17ms (mean 9ms)   | 9ms..43ms (mean 10ms)
Posix   | 17ms..21ms (mean 17ms) | 17ms..25ms (mean 17ms) | 17ms..20ms (mean 17ms) | 16ms..29ms (mean 17ms)

So I guess after enough calls, the NIO one gets heavily hotspotted, which can't be done so easily for the posix version.

So the results: NIO is always faster than the Posix JNI thing for plain files and directories. NIO starts off significantly slower for symlinks to directories (this is consistently the case across multiple single-shot runs), but once the optimiser has had enough fun with it, that goes away.

Eventually, NIO becomes roughly twice as fast as Posix/JNI. IMO, switching to NIO would be worthwhile. It'd be less code on our side, and the initial hit of 'is symlink' on a symlink-to-directory being a bit slower is probably worth taking.

Sound reasonable?

Cheers,
Phil





Martin Dorey

Aug 26, 2020, 1:53:00 PM
to evergre...@googlegroups.com
... confirms we’re dependent on at least Java 8, which I’m still mainly using, these days, so I can’t see there being any objection.

I think the thing about avoiding symlinks was to avoid presenting files within the work area twice (and maybe messing up by replacing the symlink by a regular file instead of editing through the link).  If the symlink pointed outside the area, that wouldn’t come into play.  Perhaps that would avoid the need for an option with your original proposal.

Phil Norman

Aug 27, 2020, 11:24:30 AM
to evergre...@googlegroups.com
Hmm, the question of 'where is the symlink pointing' is a good one. For files, presumably overwriting the symlinked file is what we'd want. We could do that by resolving the true pathname of the file on load. Although I suspect that's already done, as loading files under a symlinked directory opens them under their "true" name already.

For symlinked directories things are a little more complex. Listing the same file twice is a bad thing, so we should avoid that. Also we want to avoid infinite recursion.

I think we can get around these by adhering to the following rules (at least during file scanning):
1: If a dir symlink points to the workspace directory or any subdirectory of it, we ignore it.
2: If a dir symlink points to a parent directory of the workspace dir, we ignore it.
3: Multiple symlinks to external dirs would also need careful attention, with similar rules to the above (although we may additionally want to protect against two symlinks, one pointing to dir A and another to A/B, causing repeated files). Interesting.
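
Rules 1 and 2 can be sketched with NIO path comparisons. This is an illustration under assumptions, not the eventual implementation: the class and method names are hypothetical, and point 3 (overlapping external targets) is not handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical policy check for directory symlinks found during a scan.
final class SymlinkScanPolicy {
    // Returns true if the scanner may descend into the directory behind 'link'.
    // Throws an IOException if the link is broken, since toRealPath needs the target to exist.
    static boolean shouldFollowDirLink(Path workspaceRoot, Path link) throws IOException {
        Path root = workspaceRoot.toRealPath();
        Path target = link.toRealPath(); // Resolves every symlink along the way.
        if (!Files.isDirectory(target)) {
            return false; // Only directory symlinks are considered here.
        }
        // Rule 1: the target is the workspace dir or somewhere inside it.
        if (target.startsWith(root)) {
            return false;
        }
        // Rule 2: the target is a parent of the workspace dir.
        if (root.startsWith(target)) {
            return false;
        }
        return true;
    }
}
```

Path.startsWith compares whole name elements, so /src/foo is not treated as being inside /src/fo.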

First step, though, is to switch to NIO. I'll see when I can get some time to work on that.

Cheers,
Phil

Phil Norman

Aug 27, 2020, 2:13:09 PM
to evergre...@googlegroups.com
Hi again.

I've started working on updating us to NIO. I've created a git branch called "nio" tracking this. Feel free to try it out (so far it only includes the deletion of two unused functions), but I don't guarantee not to break it :-)

I'll get back as and when it's in a state where it can be merged back into the main branch.

Cheers,
Phil

Phil Norman

Sep 6, 2020, 10:01:20 AM
to evergre...@googlegroups.com
Hi.

I've just created a pull request for merging the 'nio' branch back onto master. It's quite a big set of changes, so I've sent it to enh to have a second pair of eyes on it. I'll be using it for everyday work, so it'll get a reasonable amount of real live testing. If anyone wants to give it a go in the meantime, please feel free.

Once this is in, I have some plans on improving how file update detection works. Currently, we just stick a listener on the root dir of each workspace, so if a file changes there the entire workspace is re-scanned, while changes elsewhere require a manual 'rescan'.

My plan is to add all the dirs in the workspace to the watch list, and then perform only partial updates on the workspace file list. This should make things both more correct, and considerably more efficient.
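
A minimal sketch of adding every directory to a watch list, using the standard java.nio.file.WatchService. The class and method names are hypothetical and this is not the code on the branch.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that watches a whole workspace tree.
final class WorkspaceWatcher {
    // Registers 'root' and every directory below it; returns each watch key with its directory.
    static Map<WatchKey, Path> registerAll(Path root, WatchService watcher) throws IOException {
        Map<WatchKey, Path> keys = new HashMap<>();
        // walkFileTree does not follow symlinks unless asked to.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                WatchKey key = dir.register(watcher,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_MODIFY,
                        StandardWatchEventKinds.ENTRY_DELETE);
                keys.put(key, dir);
                return FileVisitResult.CONTINUE;
            }
        });
        return keys;
    }
}
```

The map matters because a watch event carries only a name relative to its directory; the key is what identifies which directory changed. Directories created later would need registering as their ENTRY_CREATE events arrive.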

I also intend to do similar things to the 'find in files' dialog, as doing a complete re-match as a result of saving one file is both inefficient and annoying.

Cheers,
Phil

Phil Norman

Sep 27, 2020, 10:27:59 AM
to evergre...@googlegroups.com
Hi again.

I've continued to work on the "nio" branch. So far it contains the following changes:

It supports symlinks, as discussed previously.

The 'find in files' dialog is no longer updated if it's not open. Previously, rescans would happen every time a file in the root dir was modified, or whenever a file was saved, even if the dialog was closed. This is no longer the case. A full rescan is still done when the dialog opens, so any stale state won't be visible.

Previously, any time the root dir of the workspace was changed, or a file was saved, the 'find in files' dialog would perform a complete rescan of the workspace. This was inefficient, but also would force all tree nodes open, so if the user was collapsing parts of the tree as they went through the matches, every save would undo that (and scroll back to the top of the matches). I've changed this so that Evergreen now watches all dirs using the file watcher provided by the NIO API. When a file is changed, added or deleted, only that file will be rescanned (or removed from the matches tree). This is quicker, and also preserves the state of the tree.
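
The per-file update described above might look roughly like this. It is a sketch under assumptions: the match tree is reduced to a map from file to match count, the matcher is passed in as a function, and all names are hypothetical rather than taken from the branch.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for the 'find in files' match tree.
final class MatchIndex {
    // File -> number of matches; files without matches are not stored.
    private final Map<Path, Integer> matches = new TreeMap<>();
    private final ToIntFunction<Path> matchCounter;

    MatchIndex(ToIntFunction<Path> matchCounter) {
        this.matchCounter = matchCounter;
    }

    // Applies one file-watcher event by rescanning or dropping just that file.
    void onEvent(WatchEvent.Kind<?> kind, Path file) {
        if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            matches.remove(file);
        } else if (kind == StandardWatchEventKinds.ENTRY_CREATE
                || kind == StandardWatchEventKinds.ENTRY_MODIFY) {
            int count = matchCounter.applyAsInt(file);
            if (count > 0) {
                matches.put(file, count);
            } else {
                matches.remove(file);
            }
        }
        // OVERFLOW means events were lost; a real implementation would fall back to a full rescan.
    }

    // Copy of the current state, sorted by path.
    Map<Path, Integer> snapshot() {
        return new TreeMap<>(matches);
    }
}
```

Entries for untouched files are never rebuilt, which is what would let a UI keep their expanded or collapsed state.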

If anyone wants to try out the 'nio' branch, I'd be grateful for any bug reports. I'm running it for work, and so far it seems to be functioning correctly, but a second pair of eyes is always helpful.

Cheers,
Phil
