How to build unix paths on windows with build-path/convention-type

Alex Harsanyi

2018/05/22 6:51:03
To: Racket Users
I am trying to create a path named "/foo/bar" in Racket on a Windows machine. build-path produces "/foo\\bar", and build-path/convention-type does not seem to work:

    > (path->string (build-path "/" "foo" "bar"))
    "/foo\\bar" ; I am running on a Windows machine, so this is expected
    > (path->string (build-path/convention-type 'windows "/" "foo" "bar"))
    "/foo\\bar"
    > (path->string (build-path/convention-type 'unix "/" "foo" "bar"))
    ; build-path/convention-type: specified convention incompatible with string path element
    ;   path element: "/"
    ;   convention: 'unix
    > (path->string (build-path/convention-type 'unix "/foo" "bar"))
    ; build-path/convention-type: specified convention incompatible with string path element
    ;   path element: "/foo"
    ;   convention: 'unix

It seems that I cannot specify the root path, "/", when the convention type is set to 'unix.  Technically, the error is correct, as "/" is not a valid directory name, but I am not sure what to replace it with. The empty string does not work either.   The unix convention type seems to be more strict than the windows one:

    > (path->string (build-path/convention-type 'windows "./foo/" "bar"))
    "./foo/bar"
    > (path->string (build-path/convention-type 'unix "./foo/" "bar"))
    ; build-path/convention-type: specified convention incompatible with string path element
    ;   path element: "./foo/"
    ;   convention: 'unix

Is this a bug, or am I missing something?

Alex.

Matthew Flatt

2018/05/22 7:56:21
To: Alex Harsanyi, Racket Users
To build paths for a convention other than the current machine's
convention, you have to work in bytes instead of strings.

(define (bs->p bs) (bytes->path bs 'unix))
(build-path/convention-type 'unix (bs->p #"/") (bs->p #"foo") (bs->p #"bar"))

Roughly, strings don't work, because they have to be converted to bytes
using the locale's default encoding. Although strings are allowed for
the current platform's convention on the assumption that the current
locale's encoding is the right one, we've avoided building in any
assumption about the encoding for the other convention.
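
A minimal sketch of how the result of the byte-based call above can be inspected, using only functions from `racket/base`. Because the result is a path for the 'unix convention rather than for the current platform, `path->bytes` is used here instead of `path->string`. The name `unix-path` is made up for illustration.

```racket
#lang racket/base

;; Convert a byte string to a path that follows the Unix convention,
;; regardless of the platform this code runs on.
(define (bs->p bs) (bytes->path bs 'unix))

;; `unix-path` is a hypothetical name used only in this sketch.
(define unix-path
  (build-path/convention-type 'unix (bs->p #"/") (bs->p #"foo") (bs->p #"bar")))

(path-convention-type unix-path) ; 'unix
(path->bytes unix-path)          ; #"/foo/bar"
```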

Alex Harsanyi

2018/05/22 20:11:46
To: Racket Users
Hi Matthew,

Thanks for clarifying this.

It seems to me that `build-path/convention-type` is very difficult to use correctly: it is very easy for someone working on a Linux machine to just use this function with a 'unix convention and plain strings, under the assumption that it will work correctly on a Windows machine. The documentation for the function seems to imply that it is this easy, but it is not. Even the return value of the function seems to be a "unix-path" if I run it with a 'unix convention, but a "path" if I run it with a 'windows convention, so `path->string` will not always work on the return value.

Even when it fails, the error message is confusing, as "foo" is in fact a valid "path element" :

    > (build-path/convention-type 'unix "foo" "bar")
    build-path/convention-type: specified convention incompatible with string path element
      path element: "foo"
      convention: 'unix

I would suggest that the documentation for this function be updated to at least mention this caveat. I'm not sure how the error message could be improved, but it is confusing as it is. For me, the following wrapper function seems to work OK:

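    ;; Build a path for convention `type` from plain strings, going through
    ;; bytes (assumed UTF-8) so that it works on any host platform.
    (define (build-path/ct type base . sub)
      (define b-base (string->bytes/utf-8 base))
      (define b-sub (map string->bytes/utf-8 sub))
      (define result (apply build-path/convention-type
                            type
                            (bytes->path b-base type)
                            (map (lambda (b) (bytes->path b type)) b-sub)))
      ;; Return a string rather than a path of a possibly foreign convention.
      (bytes->string/utf-8 (path->bytes result)))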
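
For comparison, a sketch of a similar helper built on `string->some-system-path` and `some-system-path->string` from `racket/path`, which convert between strings and paths of a given convention using UTF-8. The name `build-path-string/ct` is hypothetical, and the UTF-8 assumption may not suit every use.

```racket
#lang racket/base
(require racket/path)

;; Hypothetical helper: build a path string for the convention `type`
;; ('unix or 'windows) from plain strings, assuming UTF-8 throughout.
(define (build-path-string/ct type base . subs)
  (define (->p s) (string->some-system-path s type))
  (some-system-path->string
   (apply build-path/convention-type type (->p base) (map ->p subs))))

(build-path-string/ct 'unix "/" "foo" "bar") ; "/foo/bar"
```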

BTW, I found this problem when working on the "frog" blog generator: its unit tests fail on Windows because it generates Windows-style paths and compares them to Unix-style path strings, and it also generates the internal links for the HTML pages using the Windows convention.

Best regards,
Alex

David K. Storrs

2019/01/10 9:14:51
To: Racket Users


On Tuesday, May 22, 2018 at 7:56:21 AM UTC-4, Matthew Flatt wrote:
> To build paths for a convention other than the current machine's
> convention, you have to work in bytes instead of strings.
>
>   (define (bs->p bs) (bytes->path bs 'unix))
>   (build-path/convention-type 'unix (bs->p #"/") (bs->p #"foo") (bs->p #"bar"))
>
> Roughly, strings don't work, because they have to be converted to bytes
> using the locale's default encoding. Although strings are allowed for
> the current platform's convention on the assumption that the current
> locale's encoding is the right one, we've avoided building in any
> assumption about the encoding for the other convention.


Why is this the case?  For that matter, why are strings and paths even separate datatypes?  Racket is the only language I've ever seen that does this, and I can't figure out what the value add is.  Especially since most 'path' functions will accept either paths or strings, it's clear that Racket *can* do all the relevant things without needing the 'path' type.  Certainly the 'path' type causes a lot of friction when serializing or persisting.

 

George Neuner

2019/01/10 11:45:02
To: David K. Storrs, racket users

On 1/10/2019 9:14 AM, David K. Storrs wrote:

> Why is this the case?  For that matter, why are strings and paths even separate datatypes?  Racket is the only language I've ever seen that does this, and I can't figure out what the value add is.  Especially since most 'path' functions will accept either paths or strings, it's clear that Racket *can* do all the relevant things without needing the 'path' type.  Certainly the 'path' type causes a lot of friction when serializing or persisting.

Lisp does it too.  The idea, naturally, was code portability: Lisp comes from a time when there were many competing [and incompatible] machines and operating systems.  Although the world largely has been reduced now to  *nix  and Windows, the idea of a portable solution still has attraction.

Obviously a given path may not be usable on a system having a different filesystem structure ... that's not the purpose.  The purpose is that the program can locate things and manipulate paths in the same manner using the same functions on any supported system.

Having to work with paths is a PITA if you mostly work on one system.  But if a program really needs to be portable, they can save you a lot of grief.

YMMV,
George

Matthew Flatt

2019/01/10 12:03:33
To: David K. Storrs, Racket Users
At Thu, 10 Jan 2019 06:14:51 -0800 (PST), "David K. Storrs" wrote:
> Why is this the case?

Do you mean "why are paths fundamentally byte strings instead of
Unicode character strings?"?

The byte-string API for paths is a property of the OS and filesystem.
Although some filesystems can be configured to reject paths that do not
correspond to some specific encoding of Unicode strings (and a Mac OS
filesystem is normally configured to allow only UTF-8 encodings), most
filesystems do not have any such constraint. To provide full access to
the filesystem, Racket works with byte strings as the fundamental
representation of paths.

Often, you want to view paths as strings, and Racket helps with that as
much as it can. But that view is an approximation --- in part because
there's not a 1-to-1 mapping between paths and strings, but also
because the mapping is sometimes intended to be dynamically selected by
the locale. As soon as you start saving paths to use later, you need to
be more careful about the representation of paths, and you should avoid
conversions to and from strings.
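
The point about saving paths can be sketched as follows, assuming a path whose bytes are not valid UTF-8. The byte string and variable names are made up for illustration; the sketch relies only on `bytes->path`, `path->bytes`, and `bytes-utf-8-length` from `racket/base`.

```racket
#lang racket/base

;; A file name ending in the single byte #xE9 (octal 351), which is
;; Latin-1 "é" but not a valid UTF-8 sequence on its own.
(define raw #"caf\351")
(define p (bytes->path raw 'unix))

;; With no error character supplied, bytes-utf-8-length returns #f when
;; the bytes are not a UTF-8 encoding, so no string matches them exactly.
(bytes-utf-8-length raw) ; #f

;; Persisting the bytes together with the convention loses nothing.
(define saved (path->bytes p))
(equal? p (bytes->path saved (path-convention-type p))) ; #t
```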

David K. Storrs

2019/01/10 12:06:50
To: Racket Users


On Thursday, January 10, 2019 at 11:45:02 AM UTC-5, gneuner2 wrote:

> Lisp does it too.  The idea, naturally, was code portability: Lisp comes from a time when there were many competing [and incompatible] machines and operating systems.  Although the world largely has been reduced now to  *nix  and Windows, the idea of a portable solution still has attraction.
>
> Obviously a given path may not be usable on a system having a different filesystem structure ... that's not the purpose.  The purpose is that the program can locate things and manipulate paths in the same manner using the same functions on any supported system.
>
> Having to work with paths is a PITA if you mostly work on one system.  But if a program really needs to be portable, they can save you a lot of grief.
>
> YMMV,
> George

Hm.  I'm not seeing it.  Perl, Python, and (ugh) Java can all handle strings for paths and manage them portably.  (e.g. Perl will understand that, when on Windows, "/foo/bar" should be equivalent to "\\foo\\bar".)   Sure, if you pass "/usr/bin/touch" to a freshly-installed Windows box it will tell you to take a hike because none of the elements of that path exist.  That's not the point.  If you pass a string that models a path that is valid on a machine then it should work regardless of OS.  Granted, Windows is a bit of a special snowflake in that "/foo/bar" is relative to the current drive on Windows but absolute on a Unix box.  Still, provided that I manage my expectations correctly and ensure that the necessary file structures exist, I see no reason why it shouldn't work.

What, precisely, requires Racket to have separate string, path, and windows-path datatypes?  Is it simply historical reasons, or is there an actual reason?

Matthew Flatt

2019/01/10 12:39:50
To: David K. Storrs, Racket Users
At Thu, 10 Jan 2019 09:06:50 -0800 (PST), "David K. Storrs" wrote:
> Hm. I'm not seeing it. Perl, Python, and (ugh) Java can all handle
> strings for paths and manage them portably. (e.g. Perl will understand
> that, when on Windows, "/foo/bar" should be equivalent to "\\foo\\bar".)
> Sure, if you pass "/usr/bin/touch" to a freshly-installed Windows box it
> will tell you to take a hike because none of the elements of that path
> exist. That's not the point. If you pass a string that models a path that
> is valid on a machine then it should work regardless of OS. Granted,
> Windows is a bit of a special snowflake in that "/foo/bar" is relative to
> the current drive on Windows but absolute on a Unix box. Still, provided
> that I manage my expectations correctly and ensure that the necessary file
> structures exist, I see no reason why it shouldn't work.

No, Windows paths are not remotely that straightforward. As the
simplest example I can think of, "//foo/bar" is completely different
from "/foo/bar". If you manipulate the Windows path "//foo/bar" as a
Unix path, then you might end up thinking that the directory path
"//foo" or "/foo" is a prefix of "//foo/bar" (which is a UNC drive).

Another example: suppose that you read "C:/foo " (with a trailing
space) from some input, which pretty much all Windows tools
recognize as a reference to a "foo" directory on your "C:" filesystem.
If you then simply append "/bar" to that string, you don't get a
path that refers to the "bar" file in "foo", because "C:/foo /bar" has
a space in the middle, and that's different than a space at the end.
Replace " " with "."; same deal.

Of course, if you want to directly refer to a directory named "foo "
with a trailing space, you can do that by using the form "\\?\c:\foo "
--- because "\\?" means "no, really, I meant it", and it changes how
"/" and "\" work. (I am leaving out the escaping "\" needed to render
those as Racket strings.) And "\\?\" has its own parsing rules.

If your program somehow only needs to generate paths and never needs to
consume any path --- not even through `current-directory` --- then you
can probably stick to a subset of pathnames where string functions and
"/" will work. Otherwise, Racket's path API is trying to help you see
where you're doing it wrong.
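
The UNC example can be checked from any platform by parsing the same bytes under each convention. A rough sketch, assuming `complete-path?` accepts paths of either convention (as its documentation states); the variable names are illustrative.

```racket
#lang racket/base

;; Similar-looking bytes mean different things under each convention.
(define unc-win  (bytes->path #"//foo/bar" 'windows)) ; UNC root: machine "foo", volume "bar"
(define abs-win  (bytes->path #"/foo/bar" 'windows))  ; relative to the current drive
(define abs-unix (bytes->path #"/foo/bar" 'unix))     ; fully determined on Unix

(complete-path? unc-win)  ; #t
(complete-path? abs-win)  ; #f
(complete-path? abs-unix) ; #t
```

Treating "//foo/bar" as "/foo" plus "bar" with string operations would lose the distinction that the first two lines draw.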

David K. Storrs

2019/01/10 13:39:45
To: Racket Users
Okay, fair enough.  I grant you that these are valid issues, although I will note that I have never actually had a problem under real-world usage.  (Granted, I haven't done that much cross-platform work.)  It would be nice if Racket would DWIM a bit more generously when given strings as paths and then be strict when given paths as paths.  I think that would even make sense; the fact that Racket has two ways of representing paths could be a semantic difference: using strings for paths means "DWIM for me and if I screw up it's on me" and paths would mean "Use exactly what I'm giving you, no DWIMming allowed."  Although, at 7th RacketCon John Clements (I think it was John?) made the point to me that Racket is a B&D language that offers strictness as a feature, so I understand why this wouldn't work with that philosophy.

At the very least, though, the error messages on build-path/convention-type could be rephrased.  For example:

> (build-path/convention-type 'windows (build-path "foo"))
; build-path/convention-type: specified convention incompatible with given path
;   element
;   path element: #<path:foo>
;   convention: 'windows
; [,bt for context]

I would suggest:

; build-path/convention-type: specified convention incompatible with given path
;   element.  Elements must satisfy: windows-path?
;   path element: #<path:foo>
;   convention: 'windows
; [,bt for context]

It could even be helpful and say:

; build-path/convention-type: specified convention incompatible with given path
;   element.  Elements must satisfy: windows-path?  Use (bytes->path #"foo" 'windows)
