Support symbolic links in FileTree

496 views
Skip to first unread message

Janito Vaqueiro Ferreira Filho

unread,
Dec 7, 2016, 7:09:40 AM12/7/16
to gradle-dev
Hello,

I encountered a need to archive symbolic links, and following a discussion from 2013 (http://gradle.1045684.n5.nabble.com/Symbolic-links-td5711601.html) I ended up on a design document (https://github.com/gradle/gradle/blob/master/design-docs/copy-spec-improvements.md) to add symlink support to FileTree. I started implementing the first easy steps from the story, but I had a few questions.

First of all, sorry for having this duplicated in the GitHub issue tracker. I only found out about the developer list today after reading the contributing guide. Is there a way to maybe clean up the issue?

Regarding the implementation (more specifically, starting at step 6), should I update the FileVisitor interface? I thought about adding a new method visitSymlink (or visitSymbolicLink?), but that would break backwards compatibility. Another option I thought about was to create a new interface called SymlinkAwareFileVisitor or something like that that extends FileVisitor by adding the extra method. Then FileTree could be updated to have overloads for this type of visitor.

I added the FileTree.visit(Action, SymlinkStrategy) method, but should I also add a FileTree.visit(FileVisitor, SymlinkStrategy) overload?

Lastly, are there any tips on where I should look to implement step 6: "Change the implementation of the file system file tree to honour the symlink strategy"?

Thanks in advance,

Janito

Krzysztof Malinowski

unread,
Dec 8, 2016, 7:24:33 AM12/8/16
to gradle-dev
Hi Janito,

I have not been contributing to the project, but I was investigating this issue as well and I would like to share my thoughts, maybe some of them could be useful.

I believe that adding SymlinkAwareFileVisitor instead of changing FileVisitor interface is better, since it is quite probable that instances of FileVisitor are created in user gradle scripts. If you choose to go that way, I would propose to use visitSymbolicLink instead of visitSymlink, as this is spelled in java.nio.file.Files.isSymbolicLink, but the code currently is non-consistent (and the best example is org.gradle.internal.nativeintegration.filesystem.FileSystem spelling both isSymlink and (can)createSymbolicLink).

IMHO it would be beneficial to implement FileTree.visit(FileVisitor, SymlinkStrategy) as callers may not be interested in processing symlinks on their own (otherwise they would use SymlinkAwareFileVisitor) but rather have them processed appropriately by FileTree.

I would also like to propose to extract isSymlink/canCreateSymbolicLink/createSymbolicLink from FileSystem interface to a separate interface and make FileSystem implement it, as it is done with Chmod and Stat interface. This new interface could be then used in FileTreeElement to support handling of symbolic links (and possibly allow different implementation on the FileSystem implementation, e.g. copying file in createSymbolicLink if symbolic links are not supported etc.).

Of course these are just suggestions, feel free to comment.

Best regards,
Krzysztof

Janito Vaqueiro Ferreira Filho

unread,
Dec 8, 2016, 10:50:38 AM12/8/16
to gradl...@googlegroups.com
Hello,

Sounds good. Should the class be called SymlinkAwareFileVisitor or SymbolicLinkAwareFileVisitor? One is shorter, the other (maybe) follows more the convention.

Here's what I did so far:
- Created the extra interface
- Updated FileTree to have an overload for visit(SymlinkAwareFileVisitor)
- Implemented the method in AbstractFileTree so that it handles symbolic links as regular files
- Implemented the method in CompositeFileTree
- Implemented the method in some other classes keeping the default behavior (preserve the symbolic links)
- Changed MinimalFileTree to have the same overload
- Created a helper AbstractMinimalFileTree class that implements the method falling back to visit(FileVisitor)
- Updated alll classes that I found to directly implement MinimalFileTree to extend AbstractMinimalFileTree, to keep the original behaviour
- Also implemented the method in classes that implement MinimalFileTree, but keeping the default behaviour

My idea is that when symlink handling is supported, the classes will be incrementally changed from extending AbstractMinimalFileTree to implement its own symlink handling code. That's where I'm at now, I've started looking at how to actually handle the symbolic links. I've started at ZipFileTree, and using ZipEntry.getUnixMode() I can find out if the file is a symlink. The problem will be how to handle this in the future, for example, how to traverse symlinks that refer to something not in the archive.

I'll implement tomorrow the FileTree.visit(FileVisitor, SymlinkStrategy) overload in the interface and in AbstractFileTree. And I'll also start looking into the FileSystem changes, thanks for the tips!

I'll probably manage to push all these changes tomorrow, so I'll send another update then.

Thanks!

Janito

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+unsubscribe@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/b384ac43-a776-4f76-97e4-93c7af70afbc%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jörn Huxhorn

unread,
Dec 8, 2016, 11:43:38 AM12/8/16
to gradl...@googlegroups.com
Keep in mind that symlinks can cause a recursion.

That situation should be detected and handled in some reasonable way, either by
causing a meaningful error message or by skipping the symlink once it has been 
reached the second time.

Cheers,
Jörn.
> > email to gradle-dev+...@googlegroups.com.
> > To post to this group, send email to gradl...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/
> > msgid/gradle-dev/b384ac43-a776-4f76-97e4-93c7af70afbc%40googlegroups.com
> >
> > .
> >
> > For more options, visit https://groups.google.com/d/optout.
> >
>
> --
> You received this message because you are subscribed to the Google Groups "gradle-dev"
> group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
> To post to this group, send email to gradl...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/CAO3EQkyuefuVEfoo7nKzcs5wRa6c%2Bq%3DTjQEZqudp9x2%2BieFXfA%40mail.gmail.com.

Janito Vaqueiro Ferreira Filho

unread,
Dec 8, 2016, 12:02:46 PM12/8/16
to gradl...@googlegroups.com
Hello, 

Good point. I'll add it as an issue to my repository so I don't forget to handle that situation. Thanks! 

Regards, 

Janito 


> > To post to this group, send email to gradl...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/
> > msgid/gradle-dev/b384ac43-a776-4f76-97e4-93c7af70afbc%40googlegroups.com
> >
> > .
> >
> > For more options, visit https://groups.google.com/d/optout.
> >
>
> --
> You received this message because you are subscribed to the Google Groups "gradle-dev"
> group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+unsubscribe@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+unsubscribe@googlegroups.com.

To post to this group, send email to gradl...@googlegroups.com.

Janito Vaqueiro Ferreira Filho

unread,
Dec 12, 2016, 12:21:50 PM12/12/16
to gradl...@googlegroups.com
Hello,

a quick update and some more questions.

First, I pushed the current state to my public repo: https://github.com/jvff/gradle/tree/copy_spec_improvements .

I started implementing the AbstractFileTree.visit(FileVisitor, SymlinkStrategy) overload, but I came to a dilemma. On one hand, I can leave this method unimplemented (just like I did with visit(SymlinkAwareFileVisitor) ) so that only concrete classes implement the method according to their specifications. On the other hand, I can have a "broker implementation" that calls visit(FileVisitor) when the strategy is to preserve files (which is the default strategy), and call a protected visitFollowingSymbolicLinks(FileVisitor) abstract method that the concrete classes have to implement.

The first idea might be simpler, but may lead to repeated code. I like the second idea, which is what I started to do, but I got blocked in FileTreeAdapter. It's mostly because the implementation has to call a method in MinimalFileTree, and I'm not sure if MinimalFileTree should have visit(FileVisitor, FileTree.SymlinkStrategy) or if it should have something closer to visitFollowingSymbolicLinks(FileVisitor) method.

Any opinions on this?

Also, regarding the FileSystem changes, I haven't started them yet because I'm not sure how to name the new interface, what methods should be passed to the new interface and if they should keep their names as is or be renamed into something more generic.

Regards,

Janito

Janito Vaqueiro Ferreira Filho

unread,
Jan 3, 2017, 8:16:12 AM1/3/17
to gradl...@googlegroups.com
Hello,

I'm still having progress, but I'm making some decisions to avoid stalling development. These decisions are so far more low-level implementation details than user-visible behavior, so if any suggestions appear I can change the code later to improve it.

I reached the first user-visible decision though. When handling symlinks in archives, I'm not sure exactly what to do. For starters, all files in the archived will be visited, so I don't think there's an actual need for a symlink strategy. That said, an archive can have symlinks to external files (for example system files) that may or may not be valid. Should this case be handled? Should it at least detect invalid (or external) symlinks and report them as an exception?

Regards,

Janito

Krzysztof Malinowski

unread,
Jan 4, 2017, 6:23:53 AM1/4/17
to gradle-dev
I would prefer not to raise exceptions on invalid symlinks in archives. I have a case where an archive my build depends on has invalid symlinks, pointing to file that are available in other dependency (which may be extracted later). I would suggest to create symlinks as they are, an exception would probably be thrown later if the symlink is actually used.

Best regards,
Krzysztof

Janito Vaqueiro Ferreira Filho

unread,
Jan 4, 2017, 6:33:16 AM1/4/17
to gradl...@googlegroups.com
Hello, 

I agree exceptions shouldn't be thrown when creating or extracting archives, but what about traversal of an archive's contents. In your case, if you requested to traverse an archive following symbolic links, what would you expect as a behavior for external or invalid symlinks? 

Should it access files external to archive? Should it report invalid links or just ignore them? 

Thanks for the feedback, 

Janito 


--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+unsubscribe@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.

Krzysztof Malinowski

unread,
Jan 5, 2017, 5:22:41 AM1/5/17
to gradle-dev
I'm not sure which case you have in mind, but my case was to use Copy task to get files extracted from a TarTree. From what I got from the code (I may be wrong though) when a file from archive is visited it is actually extracted into temporary directory and accessed from there. Current behavior is that an empty file is created instead of symbolic link (in this temporary directory) and I would prefer to have the symbolic link created as is. Following symlinks or not would rather apply to the Copy task not to extracting archive itself; the strategy could be made configurable on a copy spec.

Best regards,
Krzysztof

Janito Vaqueiro Ferreira Filho

unread,
Jan 5, 2017, 6:21:45 AM1/5/17
to gradl...@googlegroups.com
Hello, 

I think a divergence point I just realized we had is in the following sentence:

"Following symlinks or not would rather apply to the Copy task not to extracting archive itself"

I was missing the forest for the trees, and was actually only thinking about how the archive should handle it. 

This is an important insight, and it also leads to different scenarios. For example: consider two archives. If they are extracted to the same directory, then a symlink in the first archive successfully references a file from the second archive. If they are visited separately, the symlink is handled as invalid. 

Thanks for shining a light on this. I'll take a look from a higher abstraction level to rethink the implementation to support more generic scenarios. 

Regards, 

Janito 



Best regards,
Krzysztof

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+unsubscribe@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.

Janito Vaqueiro Ferreira Filho

unread,
Jan 6, 2017, 11:47:03 AM1/6/17
to gradl...@googlegroups.com
Hello,

after giving the scenario some thought, I'm unsure how to proceed. One idea is to add a FOLLOW_OR_PRESERVE symbolic link traversal strategy, that tries to follow symbolic links but preserves them if not possible. The CompositeFileTree class can then be changed to collect preserved symbolic links (when the strategy is FOLLOW) and attempt to resolve them using each child FileTree.

Some problems might arise though. I can think of two:

1. What if a symbolic link can be resolved by two FileTrees? Possible solution: duplicate the entry, letting DuplicateStrategy apply. Would work for regular files, but would merge directories.
2. What if a symbolic link is not relative, but an absolute path? Should an attempt always be made to resolve the link using the file system?

Any ideas if this is a valid path?

Regards,

Janito

Daniel Lacasse

unread,
Jan 20, 2017, 1:50:01 PM1/20/17
to gradle-dev
Hi Janito,

I'm a bit late to the game, and this part of the codebase isn't my expertise. Since the follow/not follow a symbolic link seems to be the biggest hesitation, did you consider introducing a "follow policy" that could you configure as required? By default, we provide two implementations of that policy which are follow/don't follow symlink. In the other cases, a user could provide its implementation through a Closure or a class implementation. A user can then implement that ever follow policy it wants. I would suggest the next step would be to open a PR with your change and we will provide comments on the specific of the implementation.

Don't be shy to correct me if I didn't catch the extent of the problem,

Daniel


On Friday, January 6, 2017 at 11:47:03 AM UTC-5, Janito Vaqueiro Ferreira Filho wrote:
Hello,

after giving the scenario some thought, I'm unsure how to proceed. One idea is to add a FOLLOW_OR_PRESERVE symbolic link traversal strategy, that tries to follow symbolic links but preserves them if not possible. The CompositeFileTree class can then be changed to collect preserved symbolic links (when the strategy is FOLLOW) and attempt to resolve them using each child FileTree.

Some problems might arise though. I can think of two:

1. What if a symbolic link can be resolved by two FileTrees? Possible solution: duplicate the entry, letting DuplicateStrategy apply. Would work for regular files, but would merge directories.
2. What if a symbolic link is not relative, but an absolute path? Should an attempt always be made to resolve the link using the file system?

Any ideas if this is a valid path?

Regards,

Janito
On 5 January 2017 at 09:21, Janito Vaqueiro Ferreira Filho <janit...@gmail.com> wrote:
Hello, 

I think a divergence point I just realized we had is in the following sentence:

"Following symlinks or not would rather apply to the Copy task not to extracting archive itself"

I was missing the forest for the trees, and was actually only thinking about how the archive should handle it. 

This is an important insight, and it also leads to different scenarios. For example: consider two archives. If they are extracted to the same directory, then a symlink in the first archive successfully references a file from the second archive. If they are visited separately, the symlink is handled as invalid. 

Thanks for shining a light on this. I'll take a look from a higher abstraction level to rethink the implementation to support more generic scenarios. 

Regards, 

Janito 
On Jan 5, 2017 08:22, "Krzysztof Malinowski" <ras...@gmail.com> wrote:
I'm not sure which case you have in mind, but my case was to use Copy task to get files extracted from a TarTree. From what I got from the code (I may be wrong though) when a file from archive is visited it is actually extracted into temporary directory and accessed from there. Current behavior is that an empty file is created instead of symbolic link (in this temporary directory) and I would prefer to have the symbolic link created as is. Following symlinks or not would rather apply to the Copy task not to extracting archive itself; the strategy could be made configurable on a copy spec.

Best regards,
Krzysztof

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.

To post to this group, send email to gradl...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages