[Python-ideas] os.path.commonpath()


Serhiy Storchaka

Nov 6, 2012, 10:27:28 AM
to python...@python.org
See http://bugs.python.org/issue10395.

os.path.commonpath() should be a function which returns the correct longest common sub-path for the specified paths (os.path.commonprefix() is completely useless for this).
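
To illustrate why os.path.commonprefix() does not fit this job: it compares the strings character by character, so the result may stop in the middle of a path component. The two paths below are made-up examples, not taken from the issue.

```python
import os.path

paths = ['/usr/lib', '/usr/local']

# commonprefix() knows nothing about path separators; it simply
# returns the longest common leading run of characters.
prefix = os.path.commonprefix(paths)
print(prefix)  # prints /usr/l, which is not a parent of either path
```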

There are some open questions about the details of the *right* behavior.



What should be a common prefix of '/var/log/apache2' and
'/var//log/mysql'?
What should be a common prefix of '/usr' and '//usr'?
What should be a common prefix of '/usr/local/' and '/usr/local/'?
What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
What should be a common prefix of '/usr/bin/..' and '/usr/bin'?

Please, those who are interested in this feature, give consistent answers to these questions.

_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Ronald Oussoren

Nov 6, 2012, 10:49:42 AM
to Serhiy Storchaka, python...@python.org

On 6 Nov, 2012, at 16:27, Serhiy Storchaka <stor...@gmail.com> wrote:

> See http://bugs.python.org/issue10395.
>
> os.path.commonpath() should be a function which returns right longest common sub-path for specified paths (os.path.commonprefix() is completely useless for this).
>
> There are some open questions about details of *right* behavior.
>
>
>
> What should be a common prefix of '/var/log/apache2' and
> '/var//log/mysql'?

/var/log

> What should be a common prefix of '/usr' and '//usr'?

/usr

> What should be a common prefix of '/usr/local/' and '/usr/local/'?

/usr/local

> What should be a common prefix of '/usr/local/' and '/usr/local/bin'?

/usr/local

> What should be a common prefix of '/usr/bin/..' and '/usr/bin'?

/usr/bin

In all cases the path is first split into its elements, then the longest common prefix of the two sequences of elements is computed, and then those elements are joined back up again.
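
A minimal sketch of that split/compare/join approach, assuming absolute POSIX-style paths only. The name commonpath_sketch is hypothetical and is not taken from the issue or any patch; '..' is treated as an ordinary element, and drives and relative paths are not handled at all.

```python
def commonpath_sketch(paths):
    """Longest common sub-path of absolute POSIX-style paths (illustration only)."""
    # Split each path into its elements; dropping empty strings
    # ignores repeated and trailing slashes.
    split_paths = [[part for part in path.split('/') if part] for path in paths]

    common = []
    # Walk the element lists in parallel and stop at the first mismatch.
    for parts in zip(*split_paths):
        if any(part != parts[0] for part in parts):
            break
        common.append(parts[0])

    # Join the shared elements back up; with nothing shared this is the root.
    return '/' + '/'.join(common)


print(commonpath_sketch(['/var/log/apache2', '/var//log/mysql']))  # /var/log
print(commonpath_sketch(['/usr/bin/..', '/usr/bin']))              # /usr/bin
print(commonpath_sketch(['/usr', '/var']))                         # /
```

With this sketch the result never carries a trailing slash except when it is the root itself.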

Some cases you don't mention:

* Relative paths that don't share a prefix should raise an exception
* On windows two paths that don't have the same drive should raise an exception

The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.

>
> Please, those who are interested in this feature, give consistent answers to these questions.

Ronald

Eli Bendersky

Nov 6, 2012, 8:01:50 PM
to Ronald Oussoren, Serhiy Storchaka, python-ideas

On 6 Nov, 2012, at 16:27, Serhiy Storchaka <stor...@gmail.com> wrote:

> See http://bugs.python.org/issue10395.
>
> os.path.commonpath() should be a function which returns right longest common sub-path for specified paths (os.path.commonprefix() is completely useless for this).
>
> There are some open questions about details of *right* behavior.
>
>
>
> What should be a common prefix of '/var/log/apache2' and
> '/var//log/mysql'?

/var/log

> What should be a common prefix of '/usr' and '//usr'?

/usr

> What should be a common prefix of '/usr/local/' and '/usr/local/'?

/usr/local

> What should be a common prefix of '/usr/local/' and '/usr/local/bin'?

/usr/local

> What should be a common prefix of '/usr/bin/..' and '/usr/bin'?

/usr/bin

In all cases the path is first split into its elements, then calculate the largest common prefix of the two sets of elements, then join the elements back up again.

+1

Eli

Bruce Leban

Nov 6, 2012, 9:05:30 PM
to Ronald Oussoren, Serhiy Storchaka, python...@python.org
It would be nice if, in conjunction with this, os.path.commonprefix were renamed to string.commonprefix, with os.path.commonprefix kept for backwards compatibility (and deprecated).

more inline

On Tue, Nov 6, 2012 at 7:49 AM, Ronald Oussoren <ronaldo...@mac.com> wrote:

On 6 Nov, 2012, at 16:27, Serhiy Storchaka <stor...@gmail.com> wrote:
> What should be a common prefix of '/var/log/apache2' and '/var//log/mysql'?
/var/log

> What should be a common prefix of '/usr' and '//usr'?
/usr

> What should be a common prefix of '/usr/local/' and '/usr/local/'?
/usr/local

It appears that you want the result to never include a trailing /. However, you've left out one key test case:

What is commonpath('/usr', '/var')?

It seems to me that the only reasonable value is '/'.

If you change the semantics so that it either (1) always includes a trailing / or (2) includes a trailing slash if the two paths have it in common, then you don't have the weirdness that in this case it returns a slash and in others it doesn't. I am slightly inclined to (1) at this point.

It would also be a bit surprising that there are cases where commonpath(a,a) != a.

 
> What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
/usr/local

> What should be a common prefix of '/usr/bin/..' and '/usr/bin'?
/usr/bin

seems better than the alternative of interpreting the '..'.

* Relative paths that don't share a prefix should raise an exception

Why? Why is an empty path not a reasonable result?
 
* On windows two paths that don't have the same drive should raise an exception

I disagree. On unix systems, should two paths that don't have the same drive also raise an exception? What if I'm using this function on windows to compare two http paths or two paths to a remote unix system? Raising an exception in either case would be wrong.


The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.

Yes, don't return a useless value. An empty string is useful in the relative path case, and '/' is useful when the paths are absolute but have no common prefix at all.


--- Bruce

Greg Ewing

Nov 7, 2012, 12:15:48 AM
to python...@python.org
Bruce Leban wrote:

> If you change the semantics so that it either (1) it always always
> includes a trailing / or (2) it includes a trailing slash if the two
> paths have it in common, then you don't have the weirdness that in this
> case it returns a slash and in others it doesn't. I am slightly inclined
> to (1) at this point.

But then the common prefix of "/a/b" and "/a/c" would be "/a/",
which would be very unexpected -- usually the dirname of a path is
not considered to include a trailing slash.

The special treatment of the root directory is no weirder than it
is anywhere else. It's already special, since in unix it's the
only case where a trailing slash is semantically significant.
(To the kernel, at least -- a few command line utilities break this
rule, but they're screwy.)

--
Greg

David Townshend

Nov 7, 2012, 12:59:49 AM
to Greg Ewing, python...@python.org
This seems to overlap quite a lot with the recent discussion on object-oriented paths (http://mail.python.org/pipermail/python-ideas/2012-October/016338.html), where the question of how paths are represented on different systems was discussed quite extensively. I'm not sure where the thread left off, but if PEP 428 is still going ahead then maybe this is something that should be brought into it.

David

Bruce Leban

Nov 7, 2012, 1:05:58 AM
to Greg Ewing, python...@python.org
On Tue, Nov 6, 2012 at 9:15 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
Bruce Leban wrote:

If you change the semantics so that it either (1) always includes a trailing / or (2) includes a trailing slash if the two paths have it in common, then you don't have the weirdness that in this case it returns a slash and in others it doesn't. I am slightly inclined to (1) at this point.

But then the common prefix of "/a/b" and "/a/c" would be "/a/",
which would be very unexpected -- usually the dirname of a path is
not considered to include a trailing slash.

Although less confusing than the current behavior :-) 

The special treatment of the root directory is no weirder than it
is anywhere else. It's already special, since in unix it's the
only case where a trailing slash is semantically significant.
(To the kernel, at least -- a few command line utilities break this
rule, but they're screwy.)

That's reasonable. Perhaps it's sufficient to document it clearly.

--- Bruce
 

Ronald Oussoren

Nov 7, 2012, 2:22:40 AM
to Bruce Leban, Serhiy Storchaka, python...@python.org
On 7 Nov, 2012, at 3:05, Bruce Leban <br...@leapyear.org> wrote:

It would be nice if in conjunction with this os.path.commonprefix is renamed as string.commonprefix with the os.path.commonprefix kept for backwards compatibility (and deprecated).

more inline

On Tue, Nov 6, 2012 at 7:49 AM, Ronald Oussoren <ronaldo...@mac.com> wrote:

On 6 Nov, 2012, at 16:27, Serhiy Storchaka <stor...@gmail.com> wrote:
> What should be a common prefix of '/var/log/apache2' and '/var//log/mysql'?
/var/log

> What should be a common prefix of '/usr' and '//usr'?
/usr

> What should be a common prefix of '/usr/local/' and '/usr/local/'?
/usr/local

It appears that you want the result to never include a trailing /. However, you've left out one key test case:

What is commonpath('/usr', '/var')?

It seems to me that the only reasonable value is '/'.

I agree


If you change the semantics so that it either (1) always includes a trailing / or (2) includes a trailing slash if the two paths have it in common, then you don't have the weirdness that in this case it returns a slash and in others it doesn't. I am slightly inclined to (1) at this point.

I'd prefer to only have a path separator at the end when it has semantic meaning. That would mean that only the root of a filesystem tree ("/" on Unix, but also "C:\" and "\\server\share\" on Windows) has a separator at the end.


It would also be a bit surprising that there are cases where commonpath(a,a) != a.

That's already true, commonpath('/usr//bin', '/usr//bin') would be  '/usr/bin' and not '/usr//bin'.


 
> What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
/usr/local

> What should be a common prefix of '/usr/bin/..' and '/usr/bin'?
/usr/bin

seems better than the alternative of interpreting the '..'.

That was the hard choice in the list. My reason for picking this result is that interpreting '..' can change the meaning of a path when dealing with symbolic links, and therefore would make the function less useful (and you can always call os.path.normpath when you do want to interpret '..').

Stripping '.' elements would be fine, e.g. commonpath('/usr/./bin/ls', '/usr/bin/sh') could be '/usr/bin'. 
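
As a small illustration of the point about normpath: it rewrites the string without ever consulting the filesystem, so collapsing '..' is purely lexical. If /usr/bin were a symbolic link to a directory elsewhere, '/usr/bin/..' would refer to the parent of the link target, while the collapsed string still says '/usr'. The snippet uses posixpath so that it behaves the same on any platform.

```python
import posixpath

# normpath() is a pure string operation: '..' removes the preceding
# element and '.' is dropped, regardless of what exists on disk.
collapsed = posixpath.normpath('/usr/bin/..')
dot_removed = posixpath.normpath('/usr/./bin/ls')

print(collapsed)    # /usr
print(dot_removed)  # /usr/bin/ls
```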


* Relative paths that don't share a prefix should raise an exception

Why? Why is an empty path not a reasonable result?

An empty string is not a valid path.  Now that I reconsider this question: "." would be a valid path, and would have a sane meaning.

 
* On windows two paths that don't have the same drive should raise an exception

I disagree. On unix systems, should two paths that don't have the same drive also raise an exception? What if I'm using this function on windows to compare two http paths or two paths to a remote unix system? Raising an exception in either case would be wrong.

The paths in URLs don't have a drive, hence both URL paths would have the "same" drive.   More importantly: posixpath.commonpath would be better to compare two http or remote unix paths as that function uses the correct separator (ntpath.commonpath uses a backslash as separator)
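
A short illustration of the separator difference between the two modules, using made-up path elements. Both modules can be imported on any platform, which is what makes posixpath usable for URL or remote Unix paths on Windows.

```python
import ntpath
import posixpath

parts = ['docs', 'api', 'index.html']

# Each module joins with its own separator, independent of the host OS.
posix_joined = posixpath.join(*parts)
nt_joined = ntpath.join(*parts)

print(posix_joined)  # docs/api/index.html
print(nt_joined)     # docs\api\index.html
```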

Also: when two paths have a different drive letter or UNC share name there is no way to have a value for the prefix that allows for the construction of a path from the common prefix to one of those paths.

That is,

     path1 = r"c:\windows"
     path2 = r"d:\data"

     pfx = commonpath(path1, path2)

The only value of pfx that would result in there being a value of 'sfx' such that os.path.join(pfx, sfx) == path1 is the empty string, but that value does not refer to a filesystem location. That means you have to explicitly test if commonpath returns the empty string, because you likely have to behave differently when there is no shared prefix. I'd then prefer if commonpath raises an exception, because it would be too easy to forget to check for this (especially when developing on a unix based platform and later porting to windows). An exception would mean code blows up, instead of giving unexpected results (leading to questions like "Why is your program writing junk in my home directory?")
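
A rough sketch of the "blow up early" behavior argued for here, limited to the drive check. The helper name common_drive and the choice of ValueError are assumptions made for illustration; nothing in this thread fixes either. It uses ntpath.splitdrive so it can be tried on any platform.

```python
import ntpath


def common_drive(paths):
    """Return the drive shared by all paths, or raise ValueError (illustration only)."""
    # splitdrive() returns (drive, rest); compare drives case-insensitively.
    drives = {ntpath.splitdrive(path)[0].lower() for path in paths}
    if len(drives) != 1:
        # Covers both mixed drives and an empty list of paths.
        raise ValueError('paths do not share a single drive: %r' % sorted(drives))
    return drives.pop()


print(common_drive([r'c:\windows', r'C:\data']))  # c:

try:
    common_drive([r'c:\windows', r'd:\data'])
except ValueError as exc:
    print('error:', exc)
```

The caller cannot forget the check: the mismatch surfaces at the call site rather than later, when an empty prefix is joined with something else.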



The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.

Yes, don't return a useless value. An empty string is useful in the relative path case, and '/' is useful when the paths are absolute but have no common prefix at all.

"/" *is* the common prefix for absolute paths on Unix that don't share any path elements.  As mentioned above "." (or rather os.path.curdir) would be a sane result for relative paths.

Ronald 


--- Bruce

Serhiy Storchaka

Nov 7, 2012, 3:20:59 AM
to python...@python.org
On 06.11.12 17:49, Ronald Oussoren wrote:
> On 6 Nov, 2012, at 16:27, Serhiy Storchaka <stor...@gmail.com> wrote:
>> There are some open questions about details of *right* behavior.

I only asked the questions on which there are different opinions or about which I myself have doubts.

>> What should be a common prefix of '/var/log/apache2' and
>> '/var//log/mysql'?
> /var/log

I think so too.

>> What should be a common prefix of '/usr' and '//usr'?
> /usr

normpath() preserves leading double slash (but not triple). That's why I asked the question.
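
For reference, the normpath() behavior mentioned here can be checked directly. posixpath is used so the result does not depend on the platform running the snippet.

```python
import posixpath

# Exactly two leading slashes are kept; three or more collapse to one.
double = posixpath.normpath('//usr')
triple = posixpath.normpath('///usr')

print(double)  # //usr
print(triple)  # /usr
```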

>> What should be a common prefix of '/usr/local/' and '/usr/local/'?
> /usr/local

os.path.split('/usr/local/') is ('/usr/local', ''). Repeated application of os.path.split() gives us ('/', 'usr', 'local', ''). That's why I think it may be appropriate here to preserve the trailing slash. I'm not sure.
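
The repeated application of split() can be sketched as below. The helper name split_all is made up for this illustration, and posixpath is used so the result is the same on every platform.

```python
import posixpath


def split_all(path):
    """Apply posixpath.split() repeatedly and collect the pieces (illustration only)."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:
            # split() no longer makes progress: head is the root or ''.
            parts.append(head)
            break
        parts.append(tail)
        path = head
    parts.reverse()
    return parts


print(split_all('/usr/local/'))  # ['/', 'usr', 'local', '']
print(split_all('/usr/local'))   # ['/', 'usr', 'local']
```

The trailing empty element is what distinguishes '/usr/local/' from '/usr/local' under this way of splitting.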

>> What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
> /usr/local

The same considerations apply here as for the previous question. In any case a common prefix of '/usr/local/etc' and '/usr/local/bin' should be '/usr/local'.

> * Relative paths that don't share a prefix should raise an exception

I disagree. A common prefix for relative paths on the same drive is the current directory on that drive (if we decide to drop '..').

> * On windows two paths that don't have the same drive should raise an exception
> The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidently pass an useless value to some other API and get a confusing exeption later on.

Maybe. This should be the same result (None or an exception) as for an empty list or a mix of absolute and relative paths.

Thank you for your answers.

Serhiy Storchaka

Nov 7, 2012, 3:30:55 AM
to python...@python.org
On 07.11.12 04:05, Bruce Leban wrote:
> It would be nice if in conjunction with this os.path.commonprefix is
> renamed as string.commonprefix with the os.path.commonprefix kept for
> backwards compatibility (and deprecated).

Agree.

> more inline

In most cases I agree with Greg and Ronald.

Serhiy Storchaka

Nov 7, 2012, 3:51:46 AM
to python...@python.org
On 07.11.12 09:22, Ronald Oussoren wrote:
>> It would also be a bit surprising that there are cases where
>> commonpath(a,a) != a.
>
> That's already true, commonpath('/usr//bin', '/usr//bin') would be
> '/usr/bin' and not '/usr//bin'.

Yes, the current implementation does not preserve the repeated slashes; this is an argument for the answer that commonpath(['/usr//bin', '/usr/bin']) should return '/usr/bin' and not '/usr'.

However, it would be a bit surprising if there were cases where commonpath([normpath(a), normpath(a)]) != normpath(a).

> Stripping '.' elements would be fine, e.g. commonpath('/usr/./bin/ls',
> '/usr/bin/sh') could be '/usr/bin'.

Maybe.

> An empty string is not a valid path. Now that I reconsider this
> question: "." would be a valid path, and would have a sane meaning.

Looks reasonable, but I am not sure. The returned value will most probably be used in join(), and this will add an unexpected './' at the start of the path.
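
The effect on join() is easy to see with a made-up relative suffix; posixpath is used here so the output is the same on any platform.

```python
import posixpath

# A '.' prefix survives the join, while an empty prefix disappears.
with_dot = posixpath.join('.', 'usr/bin')
with_empty = posixpath.join('', 'usr/bin')

print(with_dot)    # ./usr/bin
print(with_empty)  # usr/bin
```

Both results refer to the same location, but only the empty prefix gives back the suffix unchanged.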