RFC: dirstream-1.0.0

312 views
Skip to first unread message

Gabriel Gonzalez

unread,
Sep 15, 2013, 12:49:11 AM9/15/13
to haskel...@googlegroups.com
`dirstream` is pretty close to completion. Now it supports Windows, and
I am guessing it supports OSX by virtue of the Unix support. Is there
somebody here who has a Mac and can verify it works on OSX?

I remember that some of you previously mentioned that you didn't like
the name although I'm still partial to `dirstream`. The main reason is
that I'm trying to gradually move away from the `pipes` prefix the
further out the libraries get. However, it will still be under the
`Pipes` category of Hackage.

You can find the release candidate here:

https://github.com/Gabriel439/Haskell-DirStream-Library

Oliver Charles

unread,
Sep 15, 2013, 5:36:05 AM9/15/13
to haskel...@googlegroups.com

I like it in its current incantation. I haven't used it a lot, but when making a little example with it for my talk, I found it natural enough. The only thing I'm unclear on is why the tutorial uses 'every' - isn't 'enumerate' enough?

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscribe@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Gabriel Gonzalez

unread,
Sep 15, 2013, 11:15:35 AM9/15/13
to haskel...@googlegroups.com, Oliver Charles
On 09/15/2013 02:36 AM, Oliver Charles wrote:

I like it in its current incantation. I haven't used it a lot, but when making a little example with it for my talk, I found it natural enough. The only thing I'm unclear on is why the tutorial uses 'every' - isn't 'enumerate' enough?


Yes.  I just like teaching people to use `every` because it's more generic and works with any `Enumerable`.  It also is designed to sound nice when used with `for` if you read it as "for every childOf "/tmp"".

On 15 Sep 2013 05:49, "Gabriel Gonzalez" <gabri...@gmail.com> wrote:
`dirstream` is pretty close to completion.  Now it supports Windows, and I am guessing it supports OSX by virtue of the Unix support.  Is there somebody here who has a Mac and can verify it works on OSX?

I remember that some of you previously mentioned that you didn't like the name although I'm still partial to `dirstream`.  The main reason is that I'm trying to gradually move away from the `pipes` prefix the further out the libraries get.  However, it will still be under the `Pipes` category of Hackage.

You can find the release candidate here:

https://github.com/Gabriel439/Haskell-DirStream-Library

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Oliver Charles

unread,
Sep 15, 2013, 11:43:00 AM9/15/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Then is it worth even exporting 'enumerate'? Things would be simplified if we just had Select and 'every'

Gabriel Gonzalez

unread,
Sep 15, 2013, 12:00:11 PM9/15/13
to Oliver Charles, haskel...@googlegroups.com
On 09/15/2013 08:43 AM, Oliver Charles wrote:

Then is it worth even exporting 'enumerate'? Things would be simplified if we just had Select and 'every'


The main advantage of `enumerate` is efficiency because GHC knows that it is only unwrapping a newtype so it compiles to a no-op.  However, the performance difference is probably tiny and matters more to implementers than end-users.

John Wiegley

unread,
Sep 15, 2013, 4:30:47 AM9/15/13
to haskel...@googlegroups.com
>>>>> Gabriel Gonzalez <gabri...@gmail.com> writes:

> `dirstream` is pretty close to completion. Now it supports Windows, and I
> am guessing it supports OSX by virtue of the Unix support. Is there
> somebody here who has a Mac and can verify it works on OSX?

The example from the tutorial seems to work just fine. Do you have a more
representative stress test that you think may not work on Mac?

--
John Wiegley
FP Complete Haskell tools, training and consulting
http://fpcomplete.com johnw on #haskell/irc.freenode.net

Gabriel Gonzalez

unread,
Sep 15, 2013, 2:44:22 PM9/15/13
to haskel...@googlegroups.com, John Wiegley

>> `dirstream` is pretty close to completion. Now it supports Windows, and I
>> am guessing it supports OSX by virtue of the Unix support. Is there
>> somebody here who has a Mac and can verify it works on OSX?
> The example from the tutorial seems to work just fine. Do you have a more
> representative stress test that you think may not work on Mac?
>
I usually stress test it on large system directories both to make sure
it doesn't leak and also to see how it handles permissions. It should
route around directories with insufficient permissions by default, so if
you get an exception from any particular directory that is useful to
know and then I can decide whether or not to route around that by default.

Gabriel Gonzalez

unread,
Sep 15, 2013, 2:51:42 PM9/15/13
to haskel...@googlegroups.com
It just occurred to me that there is one problem with `dirstream`, which is that it uses `system-filepath`'s FilePath, but everything else in the `pipes` ecosystem uses the Prelude's `FilePath`.  There are two main solutions I can think of:

* Switch `dirstream` to the Prelude `FilePath` and provide a separate `pipes-filesystem` library which exports `system-filepath` equivalents of `pipes` functions from other libraries
* Switch `pipes-safe` (`readFile`/`writeFile`/`withFile`), and `pipes-bytestring` (`readFile`/`writeFile`) to use `system-filepath`

Right now I slightly lean towards the first solution, but I'd like to hear feedback from you all.

Pierre R

unread,
Sep 15, 2013, 3:51:41 PM9/15/13
to haskel...@googlegroups.com
Not answering your question but I would vote to remove readFile and writeFile from pipes-safe (Prelude) and question the existence of a Prelude module for pipes-safe

Gabriel Gonzalez

unread,
Sep 15, 2013, 3:52:19 PM9/15/13
to haskel...@googlegroups.com, Pierre R
On 09/15/2013 12:51 PM, Pierre R wrote:
> Not answering your question but I would vote to remove readFile and
> writeFile from pipes-safe (Prelude) and question the existence of a
> Prelude module for pipes-safe

Then where does `withFile` go?

Pierre R

unread,
Sep 15, 2013, 4:06:05 PM9/15/13
to haskel...@googlegroups.com, Pierre R
Maybe an extra, utilities or file related package ? Or just inside an utility section on the main pipes-safe package ?

It just feels odd to have an extra Pipes.Safe.Prelude with only one useful abstraction

Gabriel Gonzalez

unread,
Sep 15, 2013, 4:17:16 PM9/15/13
to haskel...@googlegroups.com, Pierre R
On 09/15/2013 01:06 PM, Pierre R wrote:


On Sunday, 15 September 2013 21:52:19 UTC+2, Gabriel Gonzalez wrote:
On 09/15/2013 12:51 PM, Pierre R wrote:
> Not answering your question but I would vote to remove readFile and
> writeFile from pipes-safe (Prelude) and question the existence of a
> Prelude module for pipes-safe

Then where does `withFile` go?

Maybe an extra, utilities or file related package ? Or just inside an utility section on the main pipes-safe package ?

I'll make a deal, then.  When I release `pipes-text` I will remove `readFile` and `writeFile` from `pipes-safe` and move `withFile` to the `Pipes.Safe`. module.

Pierre R

unread,
Sep 15, 2013, 4:19:36 PM9/15/13
to haskel...@googlegroups.com, Pierre R
That said if `readFile`, `writeFile` and `withFile` move out from pipes-parse and pipes-bytestring, the second solution (use `system-filepath`) looks more appealing ;-)

Gabriel Gonzalez

unread,
Sep 15, 2013, 4:21:36 PM9/15/13
to haskel...@googlegroups.com, Pierre R
Then maybe I could generalize the scope of `dirstream` to be general file system operations and rename it to `pipes-system`.  Then I could move all of those inside `pipes-system`.
--

Renzo Carbonara

unread,
Sep 15, 2013, 4:46:29 PM9/15/13
to haskell-pipes, Pierre R
On Sun, Sep 15, 2013 at 5:21 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
Then maybe I could generalize the scope of `dirstream` to be general file system operations and rename it to `pipes-system`.  Then I could move all of those inside `pipes-system`.

This approach sounds good to me. However, perhaps `pipes-filesystem` is a less ambiguous name.


Regards,

Renzo Cabonara.

Gabriel Gonzalez

unread,
Sep 15, 2013, 4:52:10 PM9/15/13
to haskel...@googlegroups.com, Renzo Carbonara, Pierre R
Alright, `pipes-filesystem` it is, then.  I will move `dirstream` to that name and then move `readFile` and `writeFile` out of `pipes-bytestring` to `pipes-filesystem`.  I'll also wait until `pipes-text` is out before releasing `pipes-filesystem` so that it can also include `readFile` and `writeFile` for Text, too.


Regards,

Renzo Cabonara.

Dag Odenhall

unread,
Sep 15, 2013, 6:09:40 PM9/15/13
to haskel...@googlegroups.com, Renzo Carbonara, Pierre R

I always thought a pipes version of the system-fileio package could be interesting.

Andrew Cowie

unread,
Sep 15, 2013, 6:43:57 PM9/15/13
to haskel...@googlegroups.com
On Sun, 2013-09-15 at 11:51 -0700, Gabriel Gonzalez wrote:
> It just occurred to me that there is one problem with `dirstream`,
> which is that it uses `system-filepath`'s FilePath, but everything
> else in the `pipes` ecosystem uses the Prelude's `FilePath`. There
> are two main solutions I can think of:

The other day I was asking what to use for testing the existence of a
file; Haskell has about 8 million different permutations on "file" and
"path", and not very many of them actually do filesystem operations :/

The **filepath** package (which is the one with System.FilePath.Posix)
re-exports System.IO's FilePath.

The **system-filepath** package has it's own opaque FilePath type (and
no I/O operations).

Gregory recommended System.Directory's `doesFileExist` from the
**directory** package, which, as it happens takes a System.IO FilePath
(aka String).

At which point my head exploded and I went to the pub for a beer.

++

I daresay that whichever FilePath you pick will have a significant
effect on normalizing future Haskell programs.

AfC
Sydney


Andrew Cowie

unread,
Sep 15, 2013, 6:44:09 PM9/15/13
to haskel...@googlegroups.com
On Sun, 2013-09-15 at 13:52 -0700, Gabriel Gonzalez wrote:


> Alright, `pipes-filesystem` it is, then.

I love how you can go to sleep with a thread marked "I need to comment
on that one" and wake up the next day to find the community has already
converged on what you were going to suggest.

AfC
Sydney


Gabriel Gonzalez

unread,
Sep 15, 2013, 7:16:15 PM9/15/13
to haskel...@googlegroups.com, Andrew Cowie
So there are three major contenders, each with one major advantage:

* `system-filepath` is more strongly typed and is much better at
avoiding programming errors
* `posix-paths` (based off of `System.Posix.ByteString`'s `RawFilePath)
is the fastest
* The `Prelude`'s `FilePath` is the most widespread and least cumbersome
to use

Generally, my impression is that `system-filepath` is the better of the
three, and I'll explain why I think so.

First, `system-filepath` is the only one that has an opaque internal
implementation. This is the most important features and bodes the best
for its long-term viability. This allows it to make changes to its
internal implementation and define its own type class instances.

Second, although `system-filepath` is slower than `posix-paths` because
it uses `String`s internally, this is fixable in principle and doesn't
require any backwards incompatible changes because the internal
implementation is opaque.

Third, `system-filepath` is maintained by John Millikin who is an
incredibly good maintainer. He is very careful about backwards
compatibility and has long deprecation cycles and pays careful attention
to defining a clear upgrade path when he does make breaking changes. He
also tests every one of his libraries on a wide variety of GHC versions
and operating systems.

Fourth, `system-filepath` has a pretty decent dependency list:

http://packdeps.haskellers.com/reverse/system-filepath

... so it is a safe choice.

Dag Odenhall

unread,
Sep 15, 2013, 7:25:02 PM9/15/13
to haskel...@googlegroups.com

I just wish system-filepath had the phantom types found in the pathtype package. And perhaps a better name. It's crazy how system-filepath is not the package that provides the System.FilePath module. But, alas, too late for that now.



--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscribe@googlegroups.com.

Andrew Cowie

unread,
Sep 15, 2013, 9:08:18 PM9/15/13
to haskel...@googlegroups.com
On Sun, 2013-09-15 at 16:16 -0700, Gabriel Gonzalez wrote:

> Fourth, `system-filepath` has a pretty decent dependency list:
>
> http://packdeps.haskellers.com/reverse/system-filepath
>
> ... so it is a safe choice.

Reading that, I discovered **system-fileio** which uses
**system-filepath**'s Filesystem.Path's FilePath and has a `isFile`.

Which answers the question of how one does I/O operations on these
FilePaths. Hooray. Thanks!

AfC
Sydney


Michael Thompson

unread,
Sep 16, 2013, 2:18:35 AM9/16/13
to haskel...@googlegroups.com

I was fiddling with dirstream and pipes-posix the other day. 
It was mostly fruitless, but I did convince myself that the 
operations in dirstream would be much faster without the fancy
FilePath type -- which has many fields, including a couple of lists. 
That the principal operations are written with something like the 
`RawByteString` type wouldn't, I think, entail that operations
using the (I guess) more humane FilePath type couldn't be exported, 
or also exported. 

I would be more confident in this, if I had had more success 
evading egregious incorrectness while trying to emulate
posix-path's approach.  The functions here 


are about the first thing I came up with -- basically quoting a bit from dirstream 
and a bit from posix-paths. They are almost correct, but some of my
more error-ridden attempts convinced me that `descendentOf` could be much faster
if the Producer were written in a more direct style. (There is for example a lot of
opening and closing of the dirstream which seems unnecessary?)

In any event, if you look at the first bar chart here you can see differences of 
the sort I was seeing 


The names in the chart are partly an attempt to evade peculiarities of the template I was using,
The first function is from 'Real World Haskell' ; it is distinctly faster than dirstream but a catastrophic disaster 
when run on a large file system, an important issue I'm not addressing here. (If you haven't seen it, the once famous
The next two are from Lato's library, the third being his preferred `traverseDirectory` 


It is extremely fast; this criterion module doesn't show it but it is also extremely well behaved
on large file systems. 

The next three are written as Producers, or rather ListT's.  The first and slowest of them 
`pipes.string`, Lato wrote; the second, I wrote using `for`, hoping for a distinct advantage, 
but is negligibly faster here.  Both take about seven times as much time as `traverseDirectory`. 

The third, `pipes.bs` is the slapdash construction I mentioned above. 
It is half as fast as `traverseDirectory` but still more than three times as fast as the 
pipes defined with the fancy filepath type. 

I wouldn't bother mentioning this but I am pretty sure that a properly written Producer
could be about as fast as `paths-posix`, which really is a giant step forward.  

Note also, by the way, the type of 

     traverseDirectory  :: (s -> RawFilePath -> IO s) -> s -> RawFilePath -> IO s

it is almost already in the form needed for a `fold` function that folds a directory-related
 `FoldlM IO RawFilePath b`  over `RawFilePath`s -- i.e., the directory trees that start at them. 

yours Michael

Aleksey Khudyakov

unread,
Sep 16, 2013, 8:41:44 AM9/16/13
to haskel...@googlegroups.com
On 16.09.2013 03:16, Gabriel Gonzalez wrote:
> So there are three major contenders, each with one major advantage:
>
> * `system-filepath` is more strongly typed and is much better at
> avoiding programming errors
> * `posix-paths` (based off of `System.Posix.ByteString`'s `RawFilePath)
> is the fastest
> * The `Prelude`'s `FilePath` is the most widespread and least cumbersome
> to use
>
There is also a correctness issue. File names are not strings. On linux
names are sequences of bytes which may or may not be decoded to strings
according to current locale. No idea about windows.

It's not a contrived problem and does happen in practice. Most common
source is .zip files with cyrillic names created in Windows. If they are
unpacked in linux you get gibberish. KDE apps are (or maybe were) unable
to do anything with them because they apparently assumed that file names
_are_ strings.


And posix-paths is not supported under Windows. I think it removes this
package from list of options.

Gabriel Gonzalez

unread,
Sep 16, 2013, 11:16:53 AM9/16/13
to haskel...@googlegroups.com, Michael Thompson
Yes, John Lato already pointed this out to me. `system-filepath` is
really slow. This is for several reasons, some of which you already
mentioned:

* It uses `String`s and lists internally
* You have to decode to and from strings repeatedly.

However, `posix-paths` has the really huge disadvantage that it doesn't
hide the internal type behind a newtype. If it did then I would more
seriously consider switching.

The other option is having `system-filepath` modify its internal
implementation to be faster.

I'll address the rest of your points inline below:

On 09/15/2013 11:18 PM, Michael Thompson wrote:
>
> I was fiddling with dirstream and pipes-posix the other day.
> It was mostly fruitless, but I did convince myself that the
> operations in dirstream would be much faster without the fancy
> FilePath type -- which has many fields, including a couple of lists.
> That the principal operations are written with something like the
> `RawByteString` type wouldn't, I think, entail that operations
> using the (I guess) more humane FilePath type couldn't be exported,
> or also exported.
>
> I would be more confident in this, if I had had more success
> evading egregious incorrectness while trying to emulate
> posix-path's approach. The functions here
>
> https://github.com/michaelt/pipes-dirstream/blob/master/src/Data/DirStream/Posix.hs
>
> are about the first thing I came up with -- basically quoting a bit
> from dirstream
> and a bit from posix-paths. They are almost correct, but some of my
> more error-ridden attempts convinced me that `descendentOf` could be
> much faster
> if the Producer were written in a more direct style. (There is for
> example a lot of
> opening and closing of the dirstream which seems unnecessary?)
>

It opens and closes each directory once.

Note that you might be able to speed up the inner loop by having the
`loop` worker close over the `path` and `dirp` arguments like the
original code in `dirstream` does.

> In any event, if you look at the first bar chart here you can see
> differences of
> the sort I was seeing
>
> http://michaelt.github.io/bench/report.html
> (the output of
> https://github.com/michaelt/pipes-dirstream/blob/master/benchmarks/Bench.hs
> )
>
> The names in the chart are partly an attempt to evade peculiarities of
> the template I was using,
> The first function is from 'Real World Haskell' ; it is distinctly
> faster than dirstream but a catastrophic disaster
> when run on a large file system, an important issue I'm not addressing
> here. (If you haven't seen it, the once famous
> http://www.reddit.com/r/haskell/comments/cs54i/how_would_you_write_du_in_haskell/
> might be worth a look)
> The next two are from Lato's library, the third being his preferred
> `traverseDirectory`
>
> https://github.com/JohnLato/posix-paths/blob/master/src/System/Posix/Directory/Traversals.hs#L78
>
> It is extremely fast; this criterion module doesn't show it but it is
> also extremely well behaved
> on large file systems.

So I just want to point out one thing. While both `dirstream` and
Lato's code are well-behaved on large filesystems, `dirstream` has an
additional advantage which is that you only traverse as much of the file
system as you actually need, whereas Lato's traversal must complete the
recursive traversal, even if you didn't need all the files.

The simplest example of this is the following `pipes` code:

every (descendentOf "/") >-> P.take 5

That only traverses 5 files and then stops, whereas Lato's code would
have to traverse my entire filesystem. So while `dirstream` is losing
on raw efficiency it has a (huge) performance edge in scenarios where it
can be lazy.

Of course, having raw efficiency would be nice, too.

>
> The next three are written as Producers, or rather ListT's. The first
> and slowest of them
> `pipes.string`, Lato wrote; the second, I wrote using `for`, hoping
> for a distinct advantage,
> but is negligibly faster here. Both take about seven times as much
> time as `traverseDirectory`.
>
> The third, `pipes.bs` is the slapdash construction I mentioned above.
> It is half as fast as `traverseDirectory` but still more than three
> times as fast as the
> pipes defined with the fancy filepath type.
>
> I wouldn't bother mentioning this but I am pretty sure that a properly
> written Producer
> could be about as fast as `paths-posix`, which really is a giant step
> forward.
>

So here's an important benchmark to try. Could you modify your
`pipes.bs` benchmark so that it accepts and produces `system-filepath`
filepaths, but internally uses `RawFilePath` so that the traversal goes
quickly? Whether or not this will work depends on whether conversion
from `RawFilePath`s to `system-filepath` filepaths is efficient or not.

> Note also, by the way, the type of
>
> traverseDirectory :: (s -> RawFilePath -> IO s) -> s ->
> RawFilePath -> IO s
>
> it is almost already in the form needed for a `fold` function that
> folds a directory-related
> `FoldlM IO RawFilePath b` over `RawFilePath`s -- i.e., the directory
> trees that start at them.
>

Yes, but if you have a `ListT` you can already fold it anyway:

P.foldM step begin done (every (descendentOf "someDirectory"))

So just focus on making the `ListT` implementation go efficiently and
then you'll get the fold for free.

Michael Thompson

unread,
Sep 16, 2013, 11:21:46 AM9/16/13
to haskel...@googlegroups.com, Michael Thompson
(this is in response to Alexei's message, I will think about the other points in a minute)


And posix-paths is not supported under Windows. I think it removes this 
package from list of options. 

The present library is pretty emphatically posix only, no? This could I guess be repaired it had a 
separate implementation for `childOf`  suitable for Windows. --Maybe following the model of 
`System.Directory.getContents` , but `yield` ing instead of collecting a list?

    getDirectoryContents :: FilePath -> IO [FilePath]
    getDirectoryContents path =
      modifyIOError ((`ioeSetFileName` path) . 
                     (`ioeSetLocation` "getDirectoryContents")) $ do

    -- Windows branch: 

      bracket
         (Win32.findFirstFile (path </> "*"))
         (\(h,_) -> Win32.findClose h)
         (\(h,fdat) -> loop h fdat [])
      where
        loop :: Win32.HANDLE -> Win32.FindData -> [FilePath] -> IO [FilePath]
        loop h fdat acc = do
           filename <- Win32.getFindDataFileName fdat
           more <- Win32.findNextFile h fdat
           if more
              then loop h fdat (filename:acc)
              else return (filename:acc)


-- which, like the posix branch of the conditional, kind of says "Why 
didn't they start writing this stuff in terms of a proper ListT IO 
years ago?" but never mind.  It seems the windows version is necessarily String based, 
there being no equivalent of the fairly recent `.ByteString`
modules in the `unix` package.  Again it looks to me as though something like 
the `childOf` and `descendentOf` functions should be defined separately for 
different infrastructures.  

Gabriel Gonzalez

unread,
Sep 16, 2013, 11:41:45 AM9/16/13
to haskel...@googlegroups.com, Michael Thompson, Aleksey Khudyakov

dirStream already has support for windows.  It does exactly what you said: it copies the code from the `directory` library and uses `yield` instead.

I agree with Aleksey that Windows support is a must, but we can always use the fast implementation for POSIX and a slower implementation for Windows if necessary, so I don't see any reason why optimizing `dirstream` will interfere with Windows support.

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Michael Thompson

unread,
Sep 16, 2013, 11:48:28 AM9/16/13
to haskel...@googlegroups.com, Michael Thompson, Aleksey Khudyakov


dirStream already has support for windows.  It does exactly what you said: it copies the code from the `directory` library and uses `yield` instead.

I see! As I said, I was fiddling with this a couple days ago -- look again and suddenly there's windows support...

The point about  

>      every (descendentOf "/") >-> P.take 5 
>
> That only traverses 5 files and then stops, whereas Lato's code would 
> have to traverse my entire filesystem.

is indeed completely devastating. Note however that my imaginary `Data.Dirstream.Posix` imports only `(</>)` 
and  `readDirEnt`  from Lato. The latter is pure lower level ick 


I suspect it is the principal source of his seeming advantage, since it is certainly the source of the advantage 
of `Data.Dirstream.Posix.descendentsOf` ?


I will study the other parts of your message. 

Gabriel Gonzalez

unread,
Sep 16, 2013, 1:41:38 PM9/16/13
to haskel...@googlegroups.com, Dag Odenhall, practica...@gmail.com, Aleksey Khudyakov
So I think I will delay `pipes-filesystem` (the renamed `dirstream`) for a while, since it's still not clear what `FilePath` library to use.  Plus, I should delay it anyway until `pipes-text` comes out since the scope of `pipes-filesystem` would include a `readFile` and `writeFile` for `Text`.

It seems that the two major issues that need to be resolved for `FilePath` are:

* efficiency
* type safety

I'm a bit reluctant to propose a competing filepath library, so it really boils down to improving the efficiency of `system-filepath`, ignoring the efficiency loss, or asking John Lato to provide a more strongly typed layer on top of `posix-paths` (and to also make it support windows).


--

Dag Odenhall

unread,
Sep 16, 2013, 4:41:51 PM9/16/13
to Gabriel Gonzalez, haskel...@googlegroups.com, practica...@gmail.com, Aleksey Khudyakov

Call me crazy but I actually do want yet another completely new package for file paths. :-)

Why? Well, none of the existing solutions are good enough and the changes that would be required to satisfy my ideals would break backwards compatibility completely anyway.

My half-baked idea is something like:

  • Use ByteString to represent path segments. Paths are addresses, not content, and we can't decode correctly and reliably in a portable fashion anyway. On WIndows this might mean storing the UTF-16 bytes undecoded.
  • Use phantom types and/or GADTs to track whether a path is relative or absolute and whether it's for a file or a directory. Maybe even go crazy with DataKinds here. I know you lot loathe GHC extensions but this is all in the name of static guarantees, people! :-)
  • Use Category to combine path pieces into a full path. Thinking about this more though, I don't think it works. Probably need a custom operator for the phantom types to work out.
  • Completely rely on pipes for any and all IO operations.
  • Maybe: don't provide any special Text pipes, but instead only ByteString pipes and pipes for encoding/decoding a bytestring pipe using a given encoding or the system locale. This might be taking things too far but honestly the whole idea of a “system locale” encoding is broken and not reliable. It encourages people to write bad code. On the other hand, so does using bytestring when an encoding is in fact known. No easy answers here.
  • Maybe: a QuasiQuoter for file paths that fail for invalid bytes at compile-time.

root :: Path Absolute Directory

dir :: ByteString -> Path c Directory

file :: ByteString -> Path c File

etcPasswd :: Path Absolute File  -- should also be correctly inferred

etcPasswd = file "passwd" . dir "etc" . root

-- Or without Char8, and with order as we're used to
etcPasswd' = root >>> [dir|etc|] >>> [file|passwd|]

-- Of course helpers could easily be provided to make that more convenient

etcPasswd'' = [filePath|/etc/passwd|]  -- we get 'Absolute' from the prefix slash

Maybe I‘m crazy, but these are rough sketches of my ideal Haskell filepath library. I’ve been meaning to try implementing a proof of concept myself, is there any interest at all?

Michael Thompson

unread,
Sep 16, 2013, 11:02:52 PM9/16/13
to haskel...@googlegroups.com, Gabriel Gonzalez, practica...@gmail.com, Aleksey Khudyakov, dag.od...@gmail.com
I defined the obvious functions with `RawByteString` on the inside and `FileSystem.FilePath` on the outside in
https://github.com/michaelt/pipes-dirstream/blob/master/src/Data/DirStream/Posix.hs
I also experimented with removing the `SafeT` lining to see what difference that might be making. 

A somewhat more rational use of criterion shows up in http://michaelt.github.io/bench/report.html
which I think gives a good idea of how the different factors are interacting.  The second battery of
tests -- which just measure 'length' in the sense of Pipes.Prelude -- make a pretty good 
demonstration of my inarticulate suspicion, that the use of the fancy FilePath type
*inside* `descendentsOf` throws a wrench into the works.  If it is just used to enter and exit the system
as in the `pipes bs+fs` functions, then it is just an ordinary expenditure.

I haven't labored much on convincing myself of the correctness or equivalence of the different variants 
of `descendentsOf`.

yours ever, Michael

Gabriel Gonzalez

unread,
Sep 17, 2013, 6:35:32 PM9/17/13
to Michael Thompson, haskel...@googlegroups.com, Aleksey Khudyakov, dag.od...@gmail.com
So if I understand that correctly, then if you only use a
bytestring-based representation and never convert to
`system-filepath:FilePath` we are about 2x as slow as `posix-paths`,
which I think is acceptable. Converting to system-filepath's FilePath
type makes it 3x as slow (assuming you force evaluation of the filepath,
which is what the first benchmark does).

So I think I wouldn't mind waiting for Dag's solution. If he can get
something that performs faster than `system-filepath` and provides
better type safety then I'd be willing to adopt that instead. I think
being 3x as slow is a little bit too much, since my rule of thumb is you
can't claim to be competitive unless you are within a factor of "e".

Dag Odenhall

unread,
Sep 18, 2013, 7:12:52 AM9/18/13
to haskel...@googlegroups.com, Michael Thompson, Aleksey Khudyakov

My Path type might be fast because it uses ByteString and Text and because we can avoid string manipulations, but on the other hand it might be slow because it has more structure. What sort of operations matter for performance here? I'd like to write a benchmark suite and compare against posix-paths. My main concern personally is correctness, and I want to avoid sacrificing that for performance as long as the overhead isn't horrible, but I do love tuning for performance too. And maybe, if necessary, we can provide unsafe operations that libraries like pipes-filesystem can use internally while the user still benefits from the safety properties.

I pushed a preview to GitHub if anyone's interested. Lots of actual path manipulation not yet implemented and it currently requires unix but the design doesn‘t dictate it. There are some obvious optimization targets I haven’t bothered with yet. I ended up adding support for both ByteString and Text because it makes it possible to express “use the system locale” in pure code and deferring the encoding operations, and I‘m not sure we can do without system locale: it has its uses on POSIX, and on Windows we need it because I don’t know a pure way to encode as UTF-16 using the architecture-dependent byte order. But nothing is set in stone here, of course.



--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscribe@googlegroups.com.

Gabriel Gonzalez

unread,
Sep 18, 2013, 10:56:46 AM9/18/13
to haskel...@googlegroups.com, Dag Odenhall, Michael Thompson, Aleksey Khudyakov
On 09/18/2013 04:12 AM, Dag Odenhall wrote:

My Path type might be fast because it uses ByteString and Text and because we can avoid string manipulations, but on the other hand it might be slow because it has more structure. What sort of operations matter for performance here? I'd like to write a benchmark suite and compare against posix-paths. My main concern personally is correctness, and I want to avoid sacrificing that for performance as long as the overhead isn't horrible, but I do love tuning for performance too. And maybe, if necessary, we can provide unsafe operations that libraries like pipes-filesystem can use internally while the user still benefits from the safety properties.


I agree that correctness matters the most.

The directory traversals only need fast concatenation of paths.

Also, it helps to have fast conversion to and from unstructured types (i.e. `Text`/`ByteString/`String`) until we can define FFI bindings to replace existing bindings.� Also, it's important to always have fast encoding and decoding to `ByteString`s so that the FFI bindings will be fast.

I pushed a preview to GitHub if anyone's interested. Lots of actual path manipulation not yet implemented and it currently requires unix but the design doesn�t dictate it. There are some obvious optimization targets I haven�t bothered with yet. I ended up adding support for both ByteString and Text because it makes it possible to express �use the system locale� in pure code and deferring the encoding operations, and I�m not sure we can do without system locale: it has its uses on POSIX, and on Windows we need it because I don�t know a pure way to encode as UTF-16 using the architecture-dependent byte order. But nothing is set in stone here, of course.



I really like the idea of also specifying a `Remote` root.� That never occurred to me but it makes a lot of sense.

One thing I'd recommend is using `UNPACK` wherever possible, such as the fields of each constructor of the `Component` type.

Also, if you want to improve speed you should minimize bytestring and text concatenations.� The trick to doing this is to use the `B.concat` operator when combining more than 2 bytestrings together into a single string, like this:

��� windows :: Path r n Decoded -> Text
��� windows path = case path of
��� ��� RootDirectory -> "\\"
��� ��� DriveName c -> B.concat [component c, ":", "\\"]
��� ��� ...

Alternatively, you can go through the blaze builder interface where concatenation is basically free since it defers building the string until you need it.

The other thing I like about your interface is that it is syntactically much cleaner than `system-filepath`.


On Wed, Sep 18, 2013 at 12:35 AM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
So if I understand that correctly, then if you only use a bytestring-based representation and never convert to `system-filepath:FilePath` we are about 2x as slow as `posix-paths`, which I think is acceptable. �Converting to system-filepath's FilePath type makes it 3x as slow (assuming you force evaluation of the filepath, which is what the first benchmark does).

So I think I wouldn't mind waiting for Dag's solution. �If he can get something that performs faster than `system-filepath` and provides better type safety then I'd be willing to adopt that instead. �I think being 3x as slow is a little bit too much, since my rule of thumb is you can't claim to be competitive unless you are within a factor of "e".


On 09/16/2013 08:02 PM, Michael Thompson wrote:
I defined the obvious functions with `RawByteString` on the inside and `FileSystem.FilePath` on the outside in
https://github.com/michaelt/pipes-dirstream/blob/master/src/Data/DirStream/Posix.hs
I also experimented with removing the `SafeT` lining to see what difference that might be making.

A somewhat more rational use of criterion shows up in http://michaelt.github.io/bench/report.html
which I think gives a good idea of how the different factors are interacting. �The second battery of

tests -- which just measure 'length' in the sense of Pipes.Prelude -- make a pretty good
demonstration of my inarticulate suspicion, that the use of the fancy FilePath type
*inside* `descendentsOf` throws a wrench into the works. �If it is just used to enter and exit the system

as in the `pipes bs+fs` functions, then it is just an ordinary expenditure.

I haven't labored much on convincing myself of the correctness or equivalence of the different variants
of `descendentsOf`.

yours ever, Michael

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Dag Odenhall

unread,
Sep 18, 2013, 12:09:38 PM9/18/13
to haskel...@googlegroups.com, Michael Thompson, Aleksey Khudyakov

On Wed, Sep 18, 2013 at 4:56 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

The directory traversals only need fast concatenation of paths.

I imagine “concatenation” can be really fast with my library because it doesn‘t actually manipulate any string type - it’s basically (:). But I'm not sure if pipes-filetype can work with my Path directly without encoding and rendering it to a ByteString anyway? Perhaps with low-level bindings to a filesystem library; directly traversing the inodes etc, but then you wonder if that's really going to be faster than writing a bytestring and calling out to a higher-level C library that works with *char paths. It would certainly be interesting to have comparative benchmarks here!

Also, it helps to have fast conversion to and from unstructured types (i.e. `Text`/`ByteString/`String`) until we can define FFI bindings to replace existing bindings.  Also, it's important to always have fast encoding and decoding to `ByteString`s so that the FFI bindings will be fast.

That's in part why I have the Encoding kind. It means we can encode the initial path once, and then when we traverse directories under it we can keep things in the encoded state which means no further decoding or encoding necessary unless the user wants to work with the system locale and Text. We might then be able to convert to unstructured types quickly with concatenation/builders.

One problem here though is that encode is “lossy” in the sense that we lose the knowledge of what parts have a known encoding and what parts are raw bytestrings, so the only safe way to go back to Mixed is to transform all components to ByteString - even those that were Text before. I don‘t think there’s any way around this though: any idea for working around it would probably amount to something similar to “don't encode in the first place” (because this is exactly the problem that Mixed solves). I suppose we could turn Encode into Encode{Text,ByteString} :: ByteString -> Component Encoded but my gut says it feels wrong and unclean. Maybe if it has a significant effect on performance…

One thing I'd recommend is using `UNPACK` wherever possible, such as the fields of each constructor of the `Component` type.

In my understanding UNPACK can be hurtful, at least for larger types (and in this context even a small path component is “large”). This is the kind of thing I want to benchmark properly first.

Also, if you want to improve speed you should minimize bytestring and text concatenations.  The trick to doing this is to use the `B.concat` operator when combining more than 2 bytestrings together into a single string, like this:


    windows :: Path r n Decoded -> Text
    windows path = case path of
        RootDirectory -> "\\"

        DriveName c -> B.concat [component c, ":", "\\"]
        ...

Alternatively, you can go through the blaze builder interface where concatenation is basically free since it defers building the string until you need it.

Yeah; using the bytestring and text Builder monoids was one of the things I was thinking of when I wrote “obvious optimization targets”. On the other hand they build lazy types and are optimized for larger chunks, so, again, benchmarks!

Another one is initializing the system locale once per call to the encode and decode functions, moving the recursion to a local binding. We still need to initialize it for each “external” call because the system locale can be changed at runtime.

Dag Odenhall

unread,
Sep 18, 2013, 1:05:04 PM9/18/13
to haskel...@googlegroups.com

On Wed, Sep 18, 2013 at 5:39 PM, James Deikun <ja...@place.org> wrote:

Comments:

It seems like you've thought through a lot of complications of
processing paths in pure code--like having a special anchor for 'home'
so one doesn't need to know where the home directory actually is when
constructing an anchored path.

Theoretically, unanchored paths should form a monoid that acts on
anchored paths from the right.  However, you seem to have given up on
the idea of unanchored paths and unified them with relative paths, and
introduced a distinction between directory and file paths instead.  I
wonder why this is?  I don't see much obvious use for the file/directory
distinction; there's no way to make the filesystem respect it, and it's
not as if the special functionality makes sense only for files.
Directories with extensions aren't common in the traditional unix world
or windows, but they're all over mac os x.  I suppose it's meant to
represent intention, of the sort that might be signalled by ending a
path with a '/' in a `mv` invocation, but it seems a little heavyweight
at first sight.

The idea is that a File is a leaf, so the types will prevent you from appending a path to a file. FileExtension could easily be changed to allow directories, if people think that's a good idea; the main point here is </> which is also why it can't be a monoid for file paths. Basically the whole point of my library is to prevent logic errors using the type system, and as always with types they will prevent things that could be valid under some circumstances. If we do away with that we might as well be doing newtype Path = Path [Either ByteString Text]. :-)

Now you instead have a monoid of relative directory paths that acts on
anchored directory paths from the right and on relative file paths from
the left, and there's a gap in the theory where you need another special
operation to fuse relative file paths to anchored directory paths.

I agree this feels a bit unelegant. It's heritage from the pathtype package I‘m basing my work on. I’m considering changing it so Current really means current working directory and you can't prepend to it etc, and add another Reference type for what I think you mean by “unanchored” paths.

Alternately, I suppose you could make all of it into a semigroupoid and
have only a single composition operator (but still have a monoid
instance for relative directory paths).  This would cover all uses of
your `</>` under a single algebraic roof, which is nice, and
semigroupoids are actually more established than left and right monoid
actions in haskell anyway, so it's probably not as big of a loss to have
the file/directory distinction here as it might be in the math world.

This sounds interesting; I'll look into it.

`<//>` and `<~/>` seem too confusing to me to be worth any finger typing
they would save, and encouraging hardcoded `root` seems bad for
portability when windows doesn't have a single root.  (There's obviously
too much investment in windows compatibility in the library as it stands
to encourage people to screw it up with things like that.)

They may well be removed. Mostly I noticed I was typing root </> and home </> a lot in GHCi while testing it, but they do feel rather backwards in this infix form.

Windows actually sort of does have a “root”: it has “root relative to current drive”. However, this is arguably not an “absolute” path but my library currently treats it as one. Portability is hard. Part of the problem here is also that I want to have a single API for manipulating paths for either environment, on either environment. That is, I want to be able to express Windows paths on a Linux system. Other libraries usually make these things compile-time conditional, and you can only work with local paths.

Both `windows` and `posix` will blithely produce paths that can't be
dereferenced successfully on the systems in question, mostly because
path components are never checked for illegal or special characters, and
because on windows there are special entire path components like 'com1'
and 'nul'.

Yes, I haven‘t gotten to sanitizing path components yet but it’s certainly part of the plan.

Also, `windows` and `posix` aren't really specific enough--if you're
passing a path to posix file operations on a unix system, the
conventions used for home-anchored and remote-anchored paths won't work;
for windows syscalls the remote-anchored paths will work but the
home-anchored will not (iirc); commands like scp, rsync, or (sometimes)
tar will accept the remote paths as arguments when you exec them on unix
but generally only interpretation by a shell will expand initial '~';
most contexts on unix won't accept remote-anchored paths period, and
often the ones that will require a different encoding of them than the
bog-standard 'hostname:path'.  Also you probably need something to
create 'file' urls, which work differently from posix paths.

I was wondering how long it would take for someone to comment on this. ;-)

posix actually started out as a hand-written Show Path instance with a hard-coded UTF-8 assumption. Then I wanted a normal Show which I first wrote by hand before I realized I could derive it with StandaloneDeriving, but anyway at this point posix was a separate render function that I then started to generalize to take an encoding function and a data Representation = Posix | Windows

Anyway what I‘m trying to hint at here is that they’re only meant to be “pretty printers” and as a sort of sanity check for myself that both POSIX and Windows systems could interpret everything in a Path. They may not even be part of the public API in the end. I haven't even gotten to interacting with actual file systems yet.

Also regarding Remote: I felt I could add things like that because they‘re tracked in the types. If you have a function that couldn’t possibly do anything with a remote path, well, you can make the type system prevent people from passing you remote paths! Similarly, you can forbid “home-anchored” paths by only accepting Path Root n e and then the user has to call absolute on it first.

(Examples of different remote path syntax: besides file urls, rsync
accepts a remote path with a double : to mean something different; other
programs accept other forms of uris as remote paths; some systems still
have distributed filesystems like afs or coda that place remote files
somewhere under the filesystem root; gnome vfs allows a more
windows-like form of remote access as long as you use their special
functions instead of raw syscalls, but it works through a uri-like
syntax.)

Yeah, again: posix and windows are meant for pretty-printing, for debugging; not for passing to other things, and I haven't gotten to that yet.

It seems like it would be better to have a larger suite of conversion
functions that checks for path validity, using escaping when possible
and flagging an error when it's impossible; and is typed to accept only
paths anchored in a way that the receiving context can understand.  This
might add up to a lot of conversion functions, unfortunately, but that's
the price of correctness first.  At the very least, there should be
functions that construct paths that can be safely and correctly passed
to raw posix syscalls/C library functions, to raw win32 syscalls, and as
arguments for `System.Process` on the current system.  (This last might
require IO, but that's not too bad since you're almost certainly about
to do IO with the result anyway.)

Absolutely! I envision it something like:


data Representation = Posix | Windows | URI | Rsync | ...

class RenderPath e where

    type RenderedPath e

    render :: Representation -> Path r n e -> RenderedPath e

instance RenderPath Encoded where

    type RenderedPath Encoded = ByteString

    render Posix = posix

    render ...

instance RenderPath Decoded where

    type RenderedPath Decoded = Text

    render Windows = windows

    render ...

It would also be possible to take this further to statically prevent things like render Posix home.

I actually think we can avoid IO because of the Encoding kind: we do the sanitation in encode and decode which run in IO and then don't write any RenderPath Mixed instance, which we can't really do anyway, at least not without hard-coding some encoding.

It's great that you're doing this work on a path type; it's so hard to
get a good path type because there are all these annoying details and
not many people even want to work on it except people who are
happy-go-lucky enough to not notice the problems they're getting
themselves into; you seem like one of the rare people who actually
thinks about this problem and still wants to tackle it.  (I was actually
impressed the haskell ecosystem even had something like system-filepath,
and as you can probably tell i'm not that easy to impress.  :)

Thanks for the kind words and the feedback!

Gabriel Gonzalez

unread,
Sep 18, 2013, 1:06:25 PM9/18/13
to haskel...@googlegroups.com, James Deikun, dag.od...@gmail.com

On 09/18/2013 08:39 AM, James Deikun wrote:
> On Wed, 2013-09-18 at 13:12 +0200, Dag Odenhall wrote:
> Comments:
>
> It seems like you've thought through a lot of complications of
> processing paths in pure code--like having a special anchor for 'home'
> so one doesn't need to know where the home directory actually is when
> constructing an anchored path.
>
> Theoretically, unanchored paths should form a monoid that acts on
> anchored paths from the right.

Actually, I was thinking some more on this, but why not unify
`Reference` and `Node` into a single data type (let me hypothetically
call it `PWD`):

data PWD = Root | Home | Remote | Directory | File

The implication is that `Directory` would be distinct from
`Home`/`Root`/`Remote`, and you would need to explicitly convert them to
`Directory` (see below).

Then you can make a category where the objects are members of `PWD` and
paths are the morphisms:

In other words, the `Path` type would instead be:

data Path :: Encoding -> PWD -> PWD -> * where ...

instance Category (Path encoding) where ...

Then `(</>)` would just be `(.)`, and `cwd` would be `id`.

Now, the "casting" from `Remote`/`Home`/`Root` to `Directory` is
actually not a problem at all because you can define:

home :: Path e Home Directory

root :: Path e Root Directory

remote :: Path e Remote Directory

Similarly, you would have:

dir :: Text -> Path Mixed Directory Directory

file :: Text -> Path Mixed Directory File

So then you could just write something like:

home >>> dir "gabriel" >>> file ".bashrc":: Path Mixed Home File

Dag Odenhall

unread,
Sep 18, 2013, 1:17:19 PM9/18/13
to haskel...@googlegroups.com, James Deikun
This looks very promising; I'll give it a try in a branch.


Bardur Arantsson

unread,
Sep 18, 2013, 1:44:18 PM9/18/13
to haskel...@googlegroups.com
On 2013-09-18 17:39, James Deikun wrote:
> Directories with extensions aren't common in the traditional unix world
> or windows, but they're all over mac os x.

Just a minor nit, but look at /etc under Linux. There's usually quite a
few directories named

*.d

in there :).

Of course, it's true that one usually does treat these "semantically" in
the sense that all *.d directories are treated alike, like say, *.jpg files.

Anyway...

Regards,


Dag Odenhall

unread,
Sep 18, 2013, 1:47:44 PM9/18/13
to haskel...@googlegroups.com

Yeah, I think the main point of <.> is that you actually want to append extensions to filenames, for example let outfile = infile <.> "gz" in an implementation of gzip, say, but I don‘t think this is commonly done with directories where it’s more the case that the dot is part of the name and mostly fixed. Thoughts?



--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Bardur Arantsson

unread,
Sep 18, 2013, 2:07:52 PM9/18/13
to haskel...@googlegroups.com
On 2013-09-18 19:44, Bardur Arantsson wrote:
> On 2013-09-18 17:39, James Deikun wrote:
>> Directories with extensions aren't common in the traditional unix world
>> or windows, but they're all over mac os x.
>
> Just a minor nit, but look at /etc under Linux. There's usually quite a
> few directories named
>
> *.d
>
> in there :).
>
> Of course, it's true that one usually does treat these "semantically" in
^^^^ does NOT

Michael Thompson

unread,
Sep 18, 2013, 2:30:03 PM9/18/13
to haskel...@googlegroups.com

This `paths` package is a really interesting idea. I added a little benchmark to the others 
I was making. The result was this:  http://michaelt.github.io/bench/pathsreport.html which 
is not too bad considering what is happening.

For `childOf` and `descendentOf` it is using `childOfCooked` and `desc` defined here


It was a little confusing, should the ideal `childOf` function return some type existentially quantified 
over the Directory/File index ?  Or should such a thing be in `descendentsOf`?  I suppose the
ideal `descendentsOf` would be somehow leveraging all the available type information? 
And checking for it?  Here I was just deciding if something was a DirectoryPath and ignoring the
FilePath / FileExtension distinction.  More checks of the form `(fmap Posix.isDirectory (Posix.getFileStatus child))`
would probably start getting expensive.

Anyway, a look at the definitions may give you an idea of how confused your new users are going to be ;)

yours Michael 

Michael Thompson

unread,
Sep 18, 2013, 2:44:45 PM9/18/13
to haskel...@googlegroups.com

I should have added that the interesting lines of the two tests are `pipes fs` `pipes paths` and `pipes bs`:
`pipes fs` uses the `descendentsOf` from the  existing `dirstream` library, `pipes paths` substitutes Dag's 
type for `FileSystem.FilePath` and `pipes bs` just uses Bytestring, importing a helper from `posix-paths`

Dag Odenhall

unread,
Sep 18, 2013, 4:25:27 PM9/18/13
to haskel...@googlegroups.com, James Deikun

Ok so after some experimenting with this:

  1. We need a poly-kinded Category, which GHC 7.8 will have but we don‘t have currently. This isn’t a big problem because just like with pipes we can use Category for reasoning even if we don't (yet) have an actual instance.
  2. Your suggestion to “cast” to Directory doesn't quite work out because we need IO to resolve the working directory and the user home, which is exactly why I put abstractions into Path to let you refer to those places in pure code.
  3. I tried to work around that problem:
data Branch = Root | Drive | Home | Working | Remote | Tree

data Node = Directory Branch | File

data Path :: Encoding -> Node -> Node -> * where
    RootDirectory :: Path e (Directory Root) b
    HomeDirectory :: Path e (Directory Home) b
    ...

But now the types don't prevent me from doing things like home </> root!

So, I'm not sure how to make this a Category while keeping all the safety properties and allowing to express everything in pure code. Any ideas?



Gabriel Gonzalez

unread,
Sep 18, 2013, 4:30:37 PM9/18/13
to haskel...@googlegroups.com, James Deikun
On Wed, Sep 18, 2013 at 1:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ok so after some experimenting with this:

  1. We need a poly-kinded Category, which GHC 7.8 will have but we don‘t have currently. This isn’t a big problem because just like with pipes we can use Category for reasoning even if we don't (yet) have an actual instance.
  2. Your suggestion to “cast” to Directory doesn't quite work out because we need IO to resolve the working directory and the user home, which is exactly why I put abstractions into Path to let you refer to those places in pure code.
  3. I tried to work around that problem:
data Branch = Root | Drive | Home | Working | Remote | Tree

data Node = Directory Branch | File

data Path :: Encoding -> Node -> Node -> * where
    RootDirectory :: Path e (Directory Root) b
    HomeDirectory :: Path e (Directory Home) b
    ...


Shouldn't it be this?

    RootDirectory :: Path e (Directory Root) (Directory Tree)
 
I'm assuming that you are using `Directory Tree` to represent what I was calling `Directory`.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Dag Odenhall

unread,
Sep 18, 2013, 4:48:06 PM9/18/13
to haskel...@googlegroups.com, James Deikun

Well it's perfectly valid to have a File under root, yeah? And maybe we could live without that but then the same problem occurs for HomeDirectory where we certainly want to be able to refer to files! I have an idea though that I'm experimenting with now.

Gabriel Gonzalez

unread,
Sep 18, 2013, 4:51:12 PM9/18/13
to haskel...@googlegroups.com
My understanding was that if you wanted a file under root you would write:

    root >>> file "Myfile.txt"

... where:

    root :: Path e (Directory Root) (Directory Tree)
    root = RootDirectory
    
    file :: Text -> Path e (Directory Tree) File
    file = ...

Dag Odenhall

unread,
Sep 18, 2013, 4:55:11 PM9/18/13
to haskel...@googlegroups.com

Ah of course, let me try that…

Dag Odenhall

unread,
Sep 18, 2013, 5:25:33 PM9/18/13
to haskel...@googlegroups.com

Alright I think I have this working now, with the same properties as before! The remaining issue seems to be a GHC bug: with -Wall it says my patterns are non-exhaustive suggesting cases that wouldn‘t type check. I haven’t ported all the code yet but at least the Category (with PolyKinds) seems to work out — cool! It's what I wanted from the start anyway, so thanks for the pointers.

Gabriel Gonzalez

unread,
Sep 18, 2013, 5:27:49 PM9/18/13
to haskel...@googlegroups.com, Dag Odenhall
GHC always complains about non-exhaustive checks when using GADTs.  It's a known bug.  See: http://ghc.haskell.org/trac/ghc/ticket/6124

James Deikun

unread,
Sep 18, 2013, 11:39:57 AM9/18/13
to haskel...@googlegroups.com
On Wed, 2013-09-18 at 13:12 +0200, Dag Odenhall wrote:
> My Path type might be fast because it uses ByteString and Text and
> because we can avoid string manipulations, but on the other hand it
> might be slow because it has more structure. What sort of operations
> matter for performance here? I'd like to write a benchmark suite and
> compare against posix-paths. My main concern personally is
> correctness, and I want to avoid sacrificing that for performance as
> long as the overhead isn't horrible, but I do love tuning for
> performance too. And maybe, if necessary, we can provide unsafe
> operations that libraries like pipes-filesystem can use internally
> while the user still benefits from the safety properties.
>
> I pushed a preview to GitHub if anyone's interested. Lots of actual
> path manipulation not yet implemented and it currently requires unix
> but the design doesn‘t dictate it. There are some obvious optimization
> targets I haven’t bothered with yet. I ended up adding support for
> both ByteString and Text because it makes it possible to express “use
> the system locale” in pure code and deferring the encoding operations,
> and I‘m not sure we can do without system locale: it has its uses on
> POSIX, and on Windows we need it because I don’t know a pure way to
> encode as UTF-16 using the architecture-dependent byte order. But
> nothing is set in stone here, of course.

Comments:

It seems like you've thought through a lot of complications of
processing paths in pure code--like having a special anchor for 'home'
so one doesn't need to know where the home directory actually is when
constructing an anchored path.

Theoretically, unanchored paths should form a monoid that acts on
anchored paths from the right. However, you seem to have given up on
the idea of unanchored paths and unified them with relative paths, and
introduced a distinction between directory and file paths instead. I
wonder why this is? I don't see much obvious use for the file/directory
distinction; there's no way to make the filesystem respect it, and it's
not as if the special functionality makes sense only for files.
Directories with extensions aren't common in the traditional unix world
or windows, but they're all over mac os x. I suppose it's meant to
represent intention, of the sort that might be signalled by ending a
path with a '/' in a `mv` invocation, but it seems a little heavyweight
at first sight.

Now you instead have a monoid of relative directory paths that acts on
anchored directory paths from the right and on relative file paths from
the left, and there's a gap in the theory where you need another special
operation to fuse relative file paths to anchored directory paths.

Alternately, I suppose you could make all of it into a semigroupoid and
have only a single composition operator (but still have a monoid
instance for relative directory paths). This would cover all uses of
your `</>` under a single algebraic roof, which is nice, and
semigroupoids are actually more established than left and right monoid
actions in haskell anyway, so it's probably not as big of a loss to have
the file/directory distinction here as it might be in the math world.

`<//>` and `<~/>` seem too confusing to me to be worth any finger typing
they would save, and encouraging hardcoded `root` seems bad for
portability when windows doesn't have a single root. (There's obviously
too much investment in windows compatibility in the library as it stands
to encourage people to screw it up with things like that.)

Both `windows` and `posix` will blithely produce paths that can't be
dereferenced successfully on the systems in question, mostly because
path components are never checked for illegal or special characters, and
because on windows there are special entire path components like 'com1'
and 'nul'.

Also, `windows` and `posix` aren't really specific enough--if you're
passing a path to posix file operations on a unix system, the
conventions used for home-anchored and remote-anchored paths won't work;
for windows syscalls the remote-anchored paths will work but the
home-anchored will not (iirc); commands like scp, rsync, or (sometimes)
tar will accept the remote paths as arguments when you exec them on unix
but generally only interpretation by a shell will expand initial '~';
most contexts on unix won't accept remote-anchored paths period, and
often the ones that will require a different encoding of them than the
bog-standard 'hostname:path'. Also you probably need something to
create 'file' urls, which work differently from posix paths.

(Examples of different remote path syntax: besides file urls, rsync
accepts a remote path with a double : to mean something different; other
programs accept other forms of uris as remote paths; some systems still
have distributed filesystems like afs or coda that place remote files
somewhere under the filesystem root; gnome vfs allows a more
windows-like form of remote access as long as you use their special
functions instead of raw syscalls, but it works through a uri-like
syntax.)

It seems like it would be better to have a larger suite of conversion
functions that checks for path validity, using escaping when possible
and flagging an error when it's impossible; and is typed to accept only
paths anchored in a way that the receiving context can understand. This
might add up to a lot of conversion functions, unfortunately, but that's
the price of correctness first. At the very least, there should be
functions that construct paths that can be safely and correctly passed
to raw posix syscalls/C library functions, to raw win32 syscalls, and as
arguments for `System.Process` on the current system. (This last might
require IO, but that's not too bad since you're almost certainly about
to do IO with the result anyway.)

James Deikun

unread,
Sep 18, 2013, 3:42:36 PM9/18/13
to haskel...@googlegroups.com
On Wed, 2013-09-18 at 19:47 +0200, Dag Odenhall wrote:
> Yeah, I think the main point of <.> is that you actually want to
> append extensions to filenames, for example let outfile = infile <.>
> "gz" in an implementation of gzip, say, but I don‘t think this is
> commonly done with directories where it’s more the case that the dot
> is part of the name and mostly fixed. Thoughts?

let outfile = indir <.> "zip" ?


> Of course, it's true that one usually does[ not] treat these
> "semantically" in
> the sense that all *.d directories are treated alike, like
> say, *.jpg files.

Yeah, that's why i referenced the mac os '.app'/'.framework' and such
(inherited from NeXTstep).



Dag Odenhall

unread,
Sep 19, 2013, 7:33:24 AM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

New problem! cwd doesn't actually work as id because it needs to be Path e a a and in fact as hinted at above I changed cwd to be more like root and home. So I added a CurrentPath :: Path e a a to make the fake Category work out but then that means you can do things like absolute (id :: Path (Directory Home) (Directory Home) which type checks but results in this error:

*** Exception: src/System/Path/Internal.hs:(147,9)-(150,67): Non-exhaustive patterns in function relative

where

relative :: Path e (Directory Home) b -> Path e (Directory Tree) b
relative CurrentPath = ???

I don't have any way to construct a value of type Path e (Directory Tree) (Directory Home)?

Dag Odenhall

unread,
Sep 19, 2013, 8:43:17 AM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Maybe what I have is actually a semigroupoid as James suggested. I can then have CurrentDirectory :: Path e (Directory Tree) (Directory Tree) which will prevent it from type check in relative

Dag Odenhall

unread,
Sep 19, 2013, 10:21:57 AM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Alright that‘s better, but now I discovered that the (remaining) warnings for non-exhaustive patterns aren’t simply “buggy”: they actually do type check (even if the only possible value is bottom). This means for example undefined </> root type checks, whereas with my master branch:

>>> :t undefined </> root

<interactive>:1:15:
    Couldn't match type 'Root with 'Current
    Expected type: Path 'Current 'Directory e0
      Actual type: Path 'Root 'Directory e0
    In the second argument of `(</>)', namely `root'
    In the expression: undefined </> root

Dag Odenhall

unread,
Sep 19, 2013, 10:38:59 AM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

And if I try to keep the “morphism” structure (what do you call it?) but type-restricting it so it can't even be a Semigroupoid:

(</>) :: Path e (Directory t) b -> Path e b (Directory Tree) -> Path e (Directory t) (Directory Tree)

Well now we can't append files to paths!

Gabriel Gonzalez

unread,
Sep 19, 2013, 12:35:03 PM9/19/13
to dag.od...@gmail.com, haskel...@googlegroups.com
I haven't type-checked this, but can't you just add another case to `absolute`, defined as:

��� absolute CurrentPath = do
� �� �� Just homeDir <- Posix.getEnv "HOME"
������� let dirs = tail (Posix.splitDirectories homeDir)
����������� rootPath = mconcat (map (DirectoryPath cur . ByteString) dirs)
� �� �� return rootPath


On 09/19/2013 04:33 AM, Dag Odenhall wrote:

Gabriel Gonzalez

unread,
Sep 19, 2013, 12:40:05 PM9/19/13
to dag.od...@gmail.com, haskel...@googlegroups.com
Never mind.� I see the flaw in what I suggested.� The issue is that you have no morphism from `Root` to `Home`.

I think the real issue is that you don't have such a morphism.� Why not provide one as another term in your `Path` type, i.e.: `RootToHome` or something like that?


On 09/19/2013 04:33 AM, Dag Odenhall wrote:

New problem! cwd doesn't actually work as id because it needs to be Path e a a and in fact as hinted at above I changed cwd to be more like root and home. So I added a CurrentPath :: Path e a a to make the fake Category work out but then that means you can do things like absolute (id :: Path (Directory Home) (Directory Home) which type checks but results in this error:

*** Exception: src/System/Path/Internal.hs:(147,9)-(150,67): Non-exhaustive patterns in function relative

where

relative :: Path e (Directory Home) b -> Path e (Directory Tree) b
relative CurrentPath = ???

I don't have any way to construct a value of type Path e (Directory Tree) (Directory Home)?

On Wed, Sep 18, 2013 at 11:27 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
GHC always complains about non-exhaustive checks when using GADTs. �It's a known bug. �See:�http://ghc.haskell.org/trac/ghc/ticket/6124


On Wed, Sep 18, 2013 at 2:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Alright I think I have this working now, with the same properties as before! The remaining issue seems to be a GHC bug: with -Wall it says my patterns are non-exhaustive suggesting cases that wouldn�t type check. I haven�t ported all the code yet but at least the Category (with PolyKinds) seems to work out � cool! It's what I wanted from the start anyway, so thanks for the pointers.



On Wed, Sep 18, 2013 at 10:55 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ah of course, let me try that�



On Wed, Sep 18, 2013 at 10:51 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
My understanding was that if you wanted a file under root you would write:

� � root >>> file "Myfile.txt"

... where:

� � root :: Path e (Directory Root) (Directory Tree)
� � root = RootDirectory
� ��
� � file :: Text -> Path e (Directory Tree) File
� � file = ...


On Wed, Sep 18, 2013 at 1:48 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Well it's perfectly valid to have a File under root, yeah? And maybe we could live without that but then the same problem occurs for HomeDirectory where we certainly want to be able to refer to files! I have an idea though that I'm experimenting with now.

On Wed, Sep 18, 2013 at 10:30 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
On Wed, Sep 18, 2013 at 1:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ok so after some experimenting with this:

  1. We need a poly-kinded Category, which GHC 7.8 will have but we don�t have currently. This isn�t a big problem because just like with pipes we can use Category for reasoning even if we don't (yet) have an actual instance.
  2. Your suggestion to �cast� to Directory doesn't quite work out because we need IO to resolve the working directory and the user home, which is exactly why I put abstractions into Path to let you refer to those places in pure code.
  1. I tried to work around that problem:
data Branch = Root | Drive | Home | Working | Remote | Tree

data Node = Directory Branch | File

data Path :: Encoding -> Node -> Node -> * where
    RootDirectory :: Path e (Directory Root) b
    HomeDirectory :: Path e (Directory Home) b
    ...
Shouldn't it be this?

� � RootDirectory :: Path e (Directory Root) (Directory Tree)
�
I'm assuming that you are using `Directory Tree` to represent what I was calling `Directory`.

But now the types don't prevent me from doing things like home </> root!

So, I'm not sure how to make this a Category while keeping all the safety properties and allowing to express everything in pure code. Any ideas?

On Wed, Sep 18, 2013 at 7:06 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

On 09/18/2013 08:39 AM, James Deikun wrote:
On Wed, 2013-09-18 at 13:12 +0200, Dag Odenhall wrote:
Comments:

It seems like you've thought through a lot of complications of
processing paths in pure code--like having a special anchor for 'home'
so one doesn't need to know where the home directory actually is when
constructing an anchored path.

Theoretically, unanchored paths should form a monoid that acts on
anchored paths from the right.

Actually, I was thinking some more on this, but why not unify `Reference` and `Node` into a single data type (let me hypothetically call it `PWD`):

� � data PWD = Root | Home | Remote | Directory | File


The implication is that `Directory` would be distinct from `Home`/`Root`/`Remote`, and you would need to explicitly convert them to `Directory` (see below).

Then you can make a category where the objects are members of `PWD` and paths are the morphisms:

In other words, the `Path` type would instead be:

� � data Path :: Encoding -> PWD -> PWD -> * where ...

� � instance Category (Path encoding) where ...


Then `(</>)` would just be `(.)`, and `cwd` would be `id`.

Now, the "casting" from `Remote`/`Home`/`Root` to `Directory` is actually not a problem at all because you can define:

� � home :: Path e Home Directory

� � root :: Path e Root Directory

� � remote :: Path e Remote Directory

Similarly, you would have:

� � dir :: Text -> Path Mixed Directory Directory

� � file :: Text -> Path Mixed Directory File


So then you could just write something like:

� � home >>> dir "gabriel" >>> file ".bashrc":: Path Mixed Home File


--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Gabriel Gonzalez

unread,
Sep 19, 2013, 1:00:26 PM9/19/13
to dag.od...@gmail.com, haskel...@googlegroups.com
Yet another option is that instead of adding a `RootToHome` term in your `Path` type you instead make it so that `RootDirectory` can terminate either on an ordinary directory or the home directory.� This would require refactoring your `Node` type to group home and ordinary directory together.


On 09/19/2013 04:33 AM, Dag Odenhall wrote:

New problem! cwd doesn't actually work as id because it needs to be Path e a a and in fact as hinted at above I changed cwd to be more like root and home. So I added a CurrentPath :: Path e a a to make the fake Category work out but then that means you can do things like absolute (id :: Path (Directory Home) (Directory Home) which type checks but results in this error:

*** Exception: src/System/Path/Internal.hs:(147,9)-(150,67): Non-exhaustive patterns in function relative

where

relative :: Path e (Directory Home) b -> Path e (Directory Tree) b
relative CurrentPath = ???

I don't have any way to construct a value of type Path e (Directory Tree) (Directory Home)?

On Wed, Sep 18, 2013 at 11:27 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
GHC always complains about non-exhaustive checks when using GADTs. �It's a known bug. �See:�http://ghc.haskell.org/trac/ghc/ticket/6124


On Wed, Sep 18, 2013 at 2:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Alright I think I have this working now, with the same properties as before! The remaining issue seems to be a GHC bug: with -Wall it says my patterns are non-exhaustive suggesting cases that wouldn�t type check. I haven�t ported all the code yet but at least the Category (with PolyKinds) seems to work out � cool! It's what I wanted from the start anyway, so thanks for the pointers.



On Wed, Sep 18, 2013 at 10:55 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ah of course, let me try that�



On Wed, Sep 18, 2013 at 10:51 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
My understanding was that if you wanted a file under root you would write:

� � root >>> file "Myfile.txt"

... where:

� � root :: Path e (Directory Root) (Directory Tree)
� � root = RootDirectory
� ��
� � file :: Text -> Path e (Directory Tree) File
� � file = ...


On Wed, Sep 18, 2013 at 1:48 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Well it's perfectly valid to have a File under root, yeah? And maybe we could live without that but then the same problem occurs for HomeDirectory where we certainly want to be able to refer to files! I have an idea though that I'm experimenting with now.

On Wed, Sep 18, 2013 at 10:30 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
On Wed, Sep 18, 2013 at 1:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ok so after some experimenting with this:

  1. We need a poly-kinded Category, which GHC 7.8 will have but we don�t have currently. This isn�t a big problem because just like with pipes we can use Category for reasoning even if we don't (yet) have an actual instance.
  2. Your suggestion to �cast� to Directory doesn't quite work out because we need IO to resolve the working directory and the user home, which is exactly why I put abstractions into Path to let you refer to those places in pure code.
  1. I tried to work around that problem:
data Branch = Root | Drive | Home | Working | Remote | Tree

data Node = Directory Branch | File

data Path :: Encoding -> Node -> Node -> * where
    RootDirectory :: Path e (Directory Root) b
    HomeDirectory :: Path e (Directory Home) b
    ...
Shouldn't it be this?

� � RootDirectory :: Path e (Directory Root) (Directory Tree)
�
I'm assuming that you are using `Directory Tree` to represent what I was calling `Directory`.

But now the types don't prevent me from doing things like home </> root!

So, I'm not sure how to make this a Category while keeping all the safety properties and allowing to express everything in pure code. Any ideas?

On Wed, Sep 18, 2013 at 7:06 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

On 09/18/2013 08:39 AM, James Deikun wrote:
On Wed, 2013-09-18 at 13:12 +0200, Dag Odenhall wrote:
Comments:

It seems like you've thought through a lot of complications of
processing paths in pure code--like having a special anchor for 'home'
so one doesn't need to know where the home directory actually is when
constructing an anchored path.

Theoretically, unanchored paths should form a monoid that acts on
anchored paths from the right.

Actually, I was thinking some more on this, but why not unify `Reference` and `Node` into a single data type (let me hypothetically call it `PWD`):

� � data PWD = Root | Home | Remote | Directory | File


The implication is that `Directory` would be distinct from `Home`/`Root`/`Remote`, and you would need to explicitly convert them to `Directory` (see below).

Then you can make a category where the objects are members of `PWD` and paths are the morphisms:

In other words, the `Path` type would instead be:

� � data Path :: Encoding -> PWD -> PWD -> * where ...

� � instance Category (Path encoding) where ...


Then `(</>)` would just be `(.)`, and `cwd` would be `id`.

Now, the "casting" from `Remote`/`Home`/`Root` to `Directory` is actually not a problem at all because you can define:

� � home :: Path e Home Directory

� � root :: Path e Root Directory

� � remote :: Path e Remote Directory

Similarly, you would have:

� � dir :: Text -> Path Mixed Directory Directory

� � file :: Text -> Path Mixed Directory File


So then you could just write something like:

� � home >>> dir "gabriel" >>> file ".bashrc":: Path Mixed Home File


--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Dag Odenhall

unread,
Sep 19, 2013, 12:59:41 PM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Even if we find a solution to that problem, the issue with the lessened type safety remains. I'd like to find a solution to that first.

On Thu, Sep 19, 2013 at 6:40 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

Never mind.  I see the flaw in what I suggested.  The issue is that you have no morphism from `Root` to `Home`.

I think the real issue is that you don't have such a morphism.  Why not provide one as another term in your `Path` type, i.e.: `RootToHome` or something like that?


On 09/19/2013 04:33 AM, Dag Odenhall wrote:

New problem! cwd doesn't actually work as id because it needs to be Path e a a and in fact as hinted at above I changed cwd to be more like root and home. So I added a CurrentPath :: Path e a a to make the fake Category work out but then that means you can do things like absolute (id :: Path (Directory Home) (Directory Home) which type checks but results in this error:

*** Exception: src/System/Path/Internal.hs:(147,9)-(150,67): Non-exhaustive patterns in function relative

where

relative :: Path e (Directory Home) b -> Path e (Directory Tree) b
relative CurrentPath = ???

I don't have any way to construct a value of type Path e (Directory Tree) (Directory Home)?

On Wed, Sep 18, 2013 at 11:27 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
GHC always complains about non-exhaustive checks when using GADTs.  It's a known bug.  See: http://ghc.haskell.org/trac/ghc/ticket/6124


On Wed, Sep 18, 2013 at 2:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Alright I think I have this working now, with the same properties as before! The remaining issue seems to be a GHC bug: with -Wall it says my patterns are non-exhaustive suggesting cases that wouldn‘t type check. I haven’t ported all the code yet but at least the Category (with PolyKinds) seems to work out — cool! It's what I wanted from the start anyway, so thanks for the pointers.



On Wed, Sep 18, 2013 at 10:55 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ah of course, let me try that…



On Wed, Sep 18, 2013 at 10:51 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
My understanding was that if you wanted a file under root you would write:

    root >>> file "Myfile.txt"

... where:

    root :: Path e (Directory Root) (Directory Tree)
    root = RootDirectory
    
    file :: Text -> Path e (Directory Tree) File
    file = ...


On Wed, Sep 18, 2013 at 1:48 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Well it's perfectly valid to have a File under root, yeah? And maybe we could live without that but then the same problem occurs for HomeDirectory where we certainly want to be able to refer to files! I have an idea though that I'm experimenting with now.

On Wed, Sep 18, 2013 at 10:30 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
On Wed, Sep 18, 2013 at 1:25 PM, Dag Odenhall <dag.od...@gmail.com> wrote:

Ok so after some experimenting with this:

  1. We need a poly-kinded Category, which GHC 7.8 will have but we don‘t have currently. This isn’t a big problem because just like with pipes we can use Category for reasoning even if we don't (yet) have an actual instance.
  2. Your suggestion to “cast” to Directory doesn't quite work out because we need IO to resolve the working directory and the user home, which is exactly why I put abstractions into Path to let you refer to those places in pure code.
  1. I tried to work around that problem:
data Branch = Root | Drive | Home | Working | Remote | Tree

data Node = Directory Branch | File

data Path :: Encoding -> Node -> Node -> * where
    RootDirectory :: Path e (Directory Root) b
    HomeDirectory :: Path e (Directory Home) b
    ...
Shouldn't it be this?

    RootDirectory :: Path e (Directory Root) (Directory Tree)
 
I'm assuming that you are using `Directory Tree` to represent what I was calling `Directory`.

But now the types don't prevent me from doing things like home </> root!

So, I'm not sure how to make this a Category while keeping all the safety properties and allowing to express everything in pure code. Any ideas?

On Wed, Sep 18, 2013 at 7:06 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

On 09/18/2013 08:39 AM, James Deikun wrote:
On Wed, 2013-09-18 at 13:12 +0200, Dag Odenhall wrote:
Comments:

It seems like you've thought through a lot of complications of
processing paths in pure code--like having a special anchor for 'home'
so one doesn't need to know where the home directory actually is when
constructing an anchored path.

Theoretically, unanchored paths should form a monoid that acts on
anchored paths from the right.

Actually, I was thinking some more on this, but why not unify `Reference` and `Node` into a single data type (let me hypothetically call it `PWD`):

    data PWD = Root | Home | Remote | Directory | File

The implication is that `Directory` would be distinct from `Home`/`Root`/`Remote`, and you would need to explicitly convert them to `Directory` (see below).

Then you can make a category where the objects are members of `PWD` and paths are the morphisms:

In other words, the `Path` type would instead be:

    data Path :: Encoding -> PWD -> PWD -> * where ...

    instance Category (Path encoding) where ...

Then `(</>)` would just be `(.)`, and `cwd` would be `id`.

Now, the "casting" from `Remote`/`Home`/`Root` to `Directory` is actually not a problem at all because you can define:

    home :: Path e Home Directory

    root :: Path e Root Directory

    remote :: Path e Remote Directory

Similarly, you would have:

    dir :: Text -> Path Mixed Directory Directory

    file :: Text -> Path Mixed Directory File

So then you could just write something like:

Dag Odenhall

unread,
Sep 19, 2013, 3:42:57 PM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

I feel like maybe the purpose of something like Category is only to track how a path was built up — not actually forbidding any combinations. My phantom types are indexes not parameters, something like that?

Dag Odenhall

unread,
Sep 19, 2013, 3:49:53 PM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

I think I have a much saner Path as a Category formulation; gonna see how it plays out. :-)

Dag Odenhall

unread,
Sep 19, 2013, 4:17:37 PM9/19/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Here is the changes to make Path more “categorical”. As far as I can tell it is exactly the same in what it allows and forbids, but cleaner and and without any non-exhaustive patterns etc.

This still allows things like root . undefined to type check, but I moved </> to a class:

class Append b where
    (</>) :: Path e (Directory t) b -> Path e b c -> Path e (Directory t) c
    (</>) = Path

instance Append (Directory Tree)
instance Append File

Thus if you avoid (.):

>>> :t undefined </> root

<interactive>:1:11:
    No instance for (Append ('Directory 'Root))
      arising from a use of `</>'
    Possible fix:
      add an instance declaration for (Append ('Directory 'Root))
    In the expression: undefined </> root

It does what I want. I‘m not sure how I feel about this though. What’s the point of making it a Category if we lose safety when we use it?

Gabriel Gonzalez

unread,
Sep 19, 2013, 7:23:30 PM9/19/13
to dag.od...@gmail.com, haskel...@googlegroups.com
What's the lessened type safety you are referring to?


On 09/19/2013 09:59 AM, Dag Odenhall wrote:

Even if we find a solution to that problem, the issue with the lessened type safety remains. I'd like to find a solution to that first.

On Thu, Sep 19, 2013 at 6:40 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

Never mind.� I see the flaw in what I suggested.� The issue is that you have no morphism from `Root` to `Home`.

I think the real issue is that you don't have such a morphism.� Why not provide one as another term in your `Path` type, i.e.: `RootToHome` or something like that?

Gabriel Gonzalez

unread,
Sep 19, 2013, 7:24:20 PM9/19/13
to dag.od...@gmail.com, haskel...@googlegroups.com
No, a category should definitely be able to forbid combinations.� The idea is that the only valid paths are those that you enumerate in your `Path` type.� Anything you leave out is forbidden.

On 09/19/2013 12:42 PM, Dag Odenhall wrote:

I feel like maybe the purpose of something like Category is only to track how a path was built up � not actually forbidding any combinations. My phantom types are indexes not parameters, something like that?



Dag Odenhall

unread,
Sep 20, 2013, 5:29:46 AM9/20/13
to haskel...@googlegroups.com

It‘s not forbidden, it’s just not possible to construct. But you can still write the types abstractly:

bad :: Path e a (Directory Root) -> Path e a (Directory Tree)
bad = (root .)

Which is nonsense but type checks. Actually with Category it isn't nonsense because id satisfies it, but with Semigroupoids it would be nonsense and only undefined could satisfy it.

Maybe I'm overthinking this.



On Fri, Sep 20, 2013 at 1:24 AM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
No, a category should definitely be able to forbid combinations.  The idea is that the only valid paths are those that you enumerate in your `Path` type.  Anything you leave out is forbidden.


On 09/19/2013 12:42 PM, Dag Odenhall wrote:

I feel like maybe the purpose of something like Category is only to track how a path was built up — not actually forbidding any combinations. My phantom types are indexes not parameters, something like that?



Dag Odenhall

unread,
Sep 20, 2013, 5:35:57 AM9/20/13
to haskel...@googlegroups.com

On Wed, Sep 18, 2013 at 9:42 PM, James Deikun <ja...@place.org> wrote:

let outfile = indir <.> "zip"  ?

This is actually an interesting example because it wouldn't type check even without <.> because you're treating a Directory as a File. Presumably in the final version we have combinators for converting between the two, and then the <.> works in this case anyway by merit of being applied to a File; something like let outfile = file indir <.> "zip".

Dag Odenhall

unread,
Sep 20, 2013, 6:25:09 AM9/20/13
to haskel...@googlegroups.com

On Fri, Sep 20, 2013 at 11:29 AM, Dag Odenhall <dag.od...@gmail.com> wrote:

Maybe I'm overthinking this.

I have a sneaking suspicion I'm trying to forbid the very things that make Category so friendly to composition. I should probably give in to the power of sound theory! ;-)

Gabriel Gonzalez

unread,
Sep 20, 2013, 10:25:13 AM9/20/13
to haskel...@googlegroups.com, Dag Odenhall
I feel like I'm missing something.� Why is that bad?

Can you just reduce the problem to the simplest problematic scenario that having having a `Category` causes?� I still don't understand how having a `Category` messes up safety or soundness.

On 09/20/2013 02:29 AM, Dag Odenhall wrote:

It�s not forbidden, it�s just not possible to construct. But you can still write the types abstractly:

bad :: Path e a (Directory Root) -> Path e a (Directory Tree)
bad = (root .)

Which is nonsense but type checks. Actually with Category it isn't nonsense because id satisfies it, but with Semigroupoids it would be nonsense and only undefined could satisfy it.

Maybe I'm overthinking this.

On Fri, Sep 20, 2013 at 1:24 AM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
No, a category should definitely be able to forbid combinations.� The idea is that the only valid paths are those that you enumerate in your `Path` type.� Anything you leave out is forbidden.


On 09/19/2013 12:42 PM, Dag Odenhall wrote:

I feel like maybe the purpose of something like Category is only to track how a path was built up � not actually forbidding any combinations. My phantom types are indexes not parameters, something like that?



--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Dag Odenhall

unread,
Sep 20, 2013, 12:03:24 PM9/20/13
to haskel...@googlegroups.com

The idea was that anything </> root should fail to type check, which it did in my original implementation but not the Category-based experiments. But I learned to stop worrying and love the Category, so, I just pushed a big rewrite to master:

  1. Path is now a Category (defined locally until we have GHC 7.8 with PolyKinds in Control.Category)
  2. I dropped the whole Encoding thing
  3. I dropped absolute
  4. Instead, posix and windows now run in IO and handle encodings and resolving references — and forbidding references that they can't handle — and then return a Builder

I think it's all much cleaner. The idea then is that things like pipes-filesystem can take a Path, encode and resolve it with posix and then keep that original builder around to concatenate children as it traverses a directory tree. I'm hoping this has the potential to provide good performance.

Of course we‘ll also need a platform agnostic API, and I have some ideas for that I’m going to work on next.



On Fri, Sep 20, 2013 at 4:25 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
I feel like I'm missing something.  Why is that bad?

Can you just reduce the problem to the simplest problematic scenario that having having a `Category` causes?  I still don't understand how having a `Category` messes up safety or soundness.


On 09/20/2013 02:29 AM, Dag Odenhall wrote:

It‘s not forbidden, it’s just not possible to construct. But you can still write the types abstractly:

bad :: Path e a (Directory Root) -> Path e a (Directory Tree)
bad = (root .)

Which is nonsense but type checks. Actually with Category it isn't nonsense because id satisfies it, but with Semigroupoids it would be nonsense and only undefined could satisfy it.

Maybe I'm overthinking this.

On Fri, Sep 20, 2013 at 1:24 AM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
No, a category should definitely be able to forbid combinations.  The idea is that the only valid paths are those that you enumerate in your `Path` type.  Anything you leave out is forbidden.


On 09/19/2013 12:42 PM, Dag Odenhall wrote:

I feel like maybe the purpose of something like Category is only to track how a path was built up — not actually forbidding any combinations. My phantom types are indexes not parameters, something like that?



--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Gabriel Gonzalez

unread,
Sep 20, 2013, 4:36:32 PM9/20/13
to haskel...@googlegroups.com
This looks really good to me.

If you want even better performance, I want to point out that you can store `Builder`s directly within your `Component` type.  As a bonus, the `Builder`s for `Text` and `ByteString` have `IsString` instances that you can reuse when defining the `IsString` instance for `Component`.

Dag Odenhall

unread,
Sep 20, 2013, 5:15:41 PM9/20/13
to haskel...@googlegroups.com

On Fri, Sep 20, 2013 at 10:36 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

If you want even better performance, I want to point out that you can store `Builder`s directly within your `Component` type.  As a bonus, the `Builder`s for `Text` and `ByteString` have `IsString` instances that you can reuse when defining the `IsString` instance for `Component`.

I was actually going to do that at first but then I realized it would complicate and slow down the cross-platform and component sanitation code, because then to work with the text/bytestring we have to unwrap and rewrap the builders.

It might still be possible though; I'll take a look at it again once the relevant code is in place.

Gabriel Gonzalez

unread,
Sep 20, 2013, 5:18:17 PM9/20/13
to haskel...@googlegroups.com
Oh yeah, I hadn't thought of that.  Don't worry about it, then! :)


--

Dag Odenhall

unread,
Sep 22, 2013, 2:26:47 PM9/22/13
to haskel...@googlegroups.com

I pushed another big rewrite to master.

Turns out, I only need one unified Node kind for the objects in the path category. Who'd have thought!

I also decoupled locale handling and path rendering again, but without reintroducing the Encoding kind. Instead I made my own very simple but hopefully efficient Builder that can work with both Text and ByteString at the same time. I can then abstract cross-platform handling into a single type. This also means you can build a POSIX path as Text (if the locale can decode the bytes) or a Windows path as ByteString, should you want to for whatever reason, but internally I use posix-paths and unix whenever possible and otherwise fall back on filepath and directory.

Question time! Are any of these claims true?

  1. All Node types except for Directory and File are “initial objects” in the path category
  2. File is a “terminal object”
  3. DirectoryName and FileExtensions are “endomorphisms”
  4. Path is a “homomorphism”

Dag Odenhall

unread,
Sep 25, 2013, 6:01:17 AM9/25/13
to haskel...@googlegroups.com

Is Path really properly a category/semigroupoid given this [lack of] associativity?

>>> "a" </> ("b" </> "c")
Path (DirectoryName (Text "a")) (Path (DirectoryName (Text "b")) (DirectoryName (Text "c")))
>>> ("a" </> "b") </> "c"
Path (Path (DirectoryName (Text "a")) (DirectoryName (Text "b"))) (DirectoryName (Text "c"))

These will always refer to the same concrete path but if we export the constructors you can observe the difference.

Similarly, was Edge a true identity? With the Category instance I think it was but again, with the constructors you could write things like Path Edge Edge which is treated the same as Edge in the package itself but is observably different if you have the constructors (or just the Show instance).

Dag Odenhall

unread,
Sep 25, 2013, 9:38:39 AM9/25/13
to haskel...@googlegroups.com

It is easy enough to ensure associativity without hiding any constructors; see experimental commit. This means another type to deal with, although really most users won't have to and we can organize the documentation to de-emphasize some types. I might need better names for some types too.

Now of course linked list have really inefficient concatenation, which is really what the composition of morphisms is now. Does this matter? Is it worth it? Is there a more efficient structure that still enforces associativity? I suspect people won't be constructing insanely long chains of abstract paths (and things like pipes-filesystem will mostly be dealing with the concrete PathName type) and we could optimize for the common case and make sure things like a </> b </> c behave more like a:b:c:[] than like [a]++[b,c].

Anyway:

>>> "a" </> ("b" </> "c")
Path (DirectoryName (Text "a")) (Path (DirectoryName (Text "b")) (Path (DirectoryName (Text "c")) End))
>>> ("a" </> "b") </> "c"
Path (DirectoryName (Text "a")) (Path (DirectoryName (Text "b")) (Path (DirectoryName (Text "c")) End))

What do you think, is this better?

Gabriel Gonzalez

unread,
Sep 25, 2013, 10:08:43 AM9/25/13
to haskel...@googlegroups.com
So what you've built is basically the same way a free category is designed.  I think that just ensuring the correct fixity is good enough.  Also, even if you used the wrong fixity for whatever reason, I don't anticipate this being used on a really deeply nested directory tree where the performance cost would matter.

Dag Odenhall

unread,
Sep 25, 2013, 11:25:07 AM9/25/13
to haskel...@googlegroups.com

Free category eh, is that defined anywhere on Hackage? I'm guessing it looks something like this?

data Free :: (a -> a -> *) -> a -> a -> * where
    Id :: Free hom a a
    Free :: hom a b -> Free hom b c -> Free hom a c

Interestingly, according to Wikipedia the free category is also called the “path category”!



On Wed, Sep 25, 2013 at 4:08 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:
So what you've built is basically the same way a free category is designed.  I think that just ensuring the correct fixity is good enough.  Also, even if you used the wrong fixity for whatever reason, I don't anticipate this being used on a really deeply nested directory tree where the performance cost would matter.

--

Gabriel Gonzalez

unread,
Sep 25, 2013, 11:29:01 AM9/25/13
to haskel...@googlegroups.com, Dag Odenhall

Yes, that is the most popular way to define it, as a linked list of morphisms, since that parallels the most popular implementation for free monoids: linked lists. 

Dag Odenhall

unread,
Sep 25, 2013, 1:51:20 PM9/25/13
to Gabriel Gonzalez, haskel...@googlegroups.com

I'm not sure I like having an identity though: it always renders as the empty string but type checks in pretty much any context including those that might be expecting a fully qualified path, so then we have to deal with that case everywhere and either throw errors or special-case it contextually and say things like "resolving the identity as Root is the same as resolving RootDirectory itself" which doesn't feel very elegant to me.

How strongly do you feel this needs to be a Category as opposed to “just” a Semigroupoid?

Dag Odenhall

unread,
Sep 25, 2013, 1:57:05 PM9/25/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Actually, if I resolve root the same way I resolve home it does work out and doesn't feel any less elegant. Hm.

Dag Odenhall

unread,
Sep 25, 2013, 2:01:01 PM9/25/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Oh except I can't do that for Drive because I have no drive letter with just the identity! Gah.

Gabriel Gonzalez

unread,
Sep 25, 2013, 3:57:03 PM9/25/13
to Dag Odenhall, haskel...@googlegroups.com
You can always prevent the identity morphism from type-checking by making sure that your functions don't accept morphisms with the same domain and codomain, which is probably the correct behavior anyway.

For example, a function that accepts absolute paths would accept morphisms of type `Path Root (Tree Directory)`.  Same thing for things rooted in `Home` or a drive letter.  That prevents the identity morphism from type-checking.

I would definitely keep the identity morphism.  I know it's causing problems but in cases like these when I have a conflict in my design and the theory, I usually suspect my design is at fault.  Think of the identity morphism as a "test case" for your design that will catch design mistakes.

Dag Odenhall

unread,
Sep 25, 2013, 4:22:30 PM9/25/13
to Gabriel Gonzalez, haskel...@googlegroups.com

On Wed, Sep 25, 2013 at 9:57 PM, Gabriel Gonzalez <gabri...@gmail.com> wrote:

You can always prevent the identity morphism from type-checking by making sure that your functions don't accept morphisms with the same domain and codomain, which is probably the correct behavior anyway.

For example, a function that accepts absolute paths would accept morphisms of type `Path Root (Tree Directory)`.  Same thing for things rooted in `Home` or a drive letter.  That prevents the identity morphism from type-checking.

This could work. I'll have to write duplicate instances to deal with File too (or reintroduce a new kind) but it should work.

I would definitely keep the identity morphism.  I know it's causing problems but in cases like these when I have a conflict in my design and the theory, I usually suspect my design is at fault.  Think of the identity morphism as a "test case" for your design that will catch design mistakes.

Yeah I was thinking that too, although at the same time I was feeling that maybe it really is a semigroupoid and I'm just trying to shoehorn it into a category. ;-)

Gabriel Gonzalez

unread,
Sep 25, 2013, 4:56:18 PM9/25/13
to Dag Odenhall, haskel...@googlegroups.com
I always try to stick to "brand name" category theory abstractions (i.e. category/monoid/functor) and stay away from their close cousins like semigroups, semigroupoids, profunctors, actions, etc.  This is one case where Edward Kmett and I philosophically differ.  He's happy with an abstraction as long as it has some mathematical name (or he makes up a new name for it.  See: "affine traversals" or "improper isomorphisms").  However, this sort of taxonomy has always seemed to me like the lowest form of mathematics.

I prefer to restrict myself to a much narrower range of abstractions and assume that if something does not fit within that curated set of abstractions then I must have missed the mark and therefore need to rethink my approach.  In other words, I don't think that all mathematical abstractions are created equal and that a category is in a very different league from a semigroupoid.

In fact, this was one of the reasons I bailed out on making `pipes` depend on `foldl` at the last minute.  After I released `foldl`, I discovered that the`Fold` type forms a profunctor but not a category and this was a big warning sign to me that I needed to rethink the `foldl` library because there is probably a deeper and better abstraction that I had just missed.

Dag Odenhall

unread,
Sep 25, 2013, 5:05:07 PM9/25/13
to Gabriel Gonzalez, haskel...@googlegroups.com

Actually I'm being stupid: I just need a new, empty class with Directory and File as the lone instances, and I can constrain the b of a path with that class. Obviously.

Dan Burton

unread,
Sep 25, 2013, 5:16:31 PM9/25/13
to haskel...@googlegroups.com, Dag Odenhall
Do realize, though, that [X] *not* being a [SomeAbstraction] can mean that [X] is actually interesting and useful. (e.g. things which are Applicative but not Monad.) Obeying too many laws eventually limits you to boring old Identity.

-- Dan Burton


Gabriel Gonzalez

unread,
Sep 25, 2013, 5:21:46 PM9/25/13
to haskel...@googlegroups.com
I don't view `Applicative` as a weaker form of a `Monad`.  The significance of an `Applicative` is from its behavior as a lax monoidal functor, totally independent of its relationship to `Monad`.  However, `Apply` (the semigroup analog of `Applicative`) is definitely one of the abstractions I avoid.

Dag Odenhall

unread,
Sep 25, 2013, 5:28:12 PM9/25/13
to haskel...@googlegroups.com

I bet you have recurring nightmares about Pointed.

Gabriel Gonzalez

unread,
Sep 25, 2013, 5:29:51 PM9/25/13
to haskel...@googlegroups.com, Dag Odenhall
Yes.  Damn you, `Pointed` and `Bind`! :(

Oliver Charles

unread,
Sep 25, 2013, 5:38:29 PM9/25/13
to haskel...@googlegroups.com
On 09/25/2013 09:56 PM, Gabriel Gonzalez wrote:
> I always try to stick to "brand name" category theory abstractions (i.e.
> category/monoid/functor) and stay away from their close cousins like
> semigroups, semigroupoids, profunctors, actions, etc. This is one case
> where Edward Kmett and I philosophically differ. He's happy with an
> abstraction as long as it has some mathematical name (or he makes up a
> new name for it. See: "affine traversals" or "improper isomorphisms").
> However, this sort of taxonomy has always seemed to me like the lowest
> form of mathematics.

Commentary like that feels like the lowest form of constructive
discussion imo. You might disagree with how Edward designs things, but
calling him out on it in a discussion he's not even involved in is out
of line. (And that's not at all what makes Edward happy - profunctor and
other lesser known cousins are very useful in lens, but I don't want to
go down that path).

- ocharles

signature.asc

Gabriel Gonzalez

unread,
Sep 25, 2013, 5:43:43 PM9/25/13
to haskel...@googlegroups.com, Oliver Charles
That's not an attack on him.  I have immense respect for Edward.  I disagree with ideas, not people.

I named him explicitly because I was worried that if I didn't then people would interpret it as a thinly veiled attack on him because he's the biggest proponent of all the concepts I'm denouncing (`Bind`, `Apply`, `profunctors`, etc.).

Aleksey Khudyakov

unread,
Sep 26, 2013, 6:12:32 AM9/26/13
to haskel...@googlegroups.com
On 26.09.2013 00:56, Gabriel Gonzalez wrote:
> I always try to stick to "brand name" category theory abstractions (i.e.
> category/monoid/functor) and stay away from their close cousins like
> semigroups, semigroupoids, profunctors, actions, etc. This is one case
> where Edward Kmett and I philosophically differ. He's happy with an
> abstraction as long as it has some mathematical name (or he makes up a
> new name for it. See: "affine traversals" or "improper isomorphisms").
> However, this sort of taxonomy has always seemed to me like the lowest
> form of mathematics.
>
> I prefer to restrict myself to a much narrower range of abstractions and
> assume that if something does not fit within that curated set of
> abstractions then I must have missed the mark and therefore need to
> rethink my approach. In other words, I don't think that all
> mathematical abstractions are created equal and that a category is in a
> very different league from a semigroupoid.
>
> In fact, this was one of the reasons I bailed out on making `pipes`
> depend on `foldl` at the last minute. After I released `foldl`, I
> discovered that the`Fold` type forms a profunctor but not a category and
> this was a big warning sign to me that I needed to rethink the `foldl`
> library because there is probably a deeper and better abstraction that I
> had just missed.
>
Being a profunctor is just a nature of fold. Fold result is in covariant
position, fold argument is in the contravariant position.
Arrange type parameters suitably and you'll get a profunctor.

And then what is meaning of category of folds? With fold you have only
one result so if you feed it to next fold it's actually equivalent to fmap.

There is nothing wrong with fancy (for the lack of better word)
abstractions. It turns out that some thing are those objects like fold
is profunctor so we need name for such abstraction. Next question
whether it's useful name or not.

Iain Nicol

unread,
Sep 26, 2013, 8:47:55 AM9/26/13
to haskel...@googlegroups.com
On 2013-09-25, Dag Odenhall wrote:
> Now of course linked list have really inefficient
> concatenation, which is really what the composition of
> morphisms is now. Does this matter? Is it worth it? Is there
> a more efficient structure that still enforces associativity?

Regardless of whether it matters for this use case, future reference
look at `Data.Sequence`[1]. Prepends, appends, concatenations, and
random access all have quite nice asymptotic behaviour.

[1]
http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Sequence.html

Dag Odenhall

unread,
Sep 26, 2013, 9:04:12 AM9/26/13
to haskel...@googlegroups.com

Yeah, that was one structure I was thinking of. Does it enforce associativity in the types? I‘ll have to read the paper on fingertrees. It wouldn’t be hard to enforce associativity in a function, but I‘d prefer something that is correct by construction such that I can export the constructors in the public API. Another possible contender is the difference list, under the assumption that we don’t usually need access to the Path parts until it's completely built up anyway.

But yeah, this is all probably premature optimization talk. Interesting, none the less. :-)



Gabriel Gonzalez

unread,
Sep 26, 2013, 10:30:44 AM9/26/13
to haskel...@googlegroups.com, Aleksey Khudyakov
What I meant was that there may be a slight variation on a fold that
still has these nice properties in addition to being a category. If
it's a category then it will automatically be a profunctor too, except
we wouldn't need the profunctor any longer.

Aleksey Khudyakov

unread,
Sep 26, 2013, 10:41:47 AM9/26/13
to haskel...@googlegroups.com
I doubt it exists. Or maybe it require not so slight change. In essesnce
fold is mapping from n-elements set to 1 element. Then nested fold is
mapping from set of sets to single element. So types are not quite right
and there's no reasonable identity as well.

I of course can miss something.

Gabriel Gonzalez

unread,
Sep 26, 2013, 8:18:14 PM9/26/13
to haskel...@googlegroups.com, alexey....@gmail.com

> I doubt it exists. Or maybe it require not so slight change. In
> essesnce fold is mapping from n-elements set to 1 element. Then nested
> fold is mapping from set of sets to single element. So types are not
> quite right and there's no reasonable identity as well.

Right, any type of fold has no identity, but maybe that's because a fold
is not the best solution, even for problems that are ostensibly only
about folding. For example, maybe the real solution is something like:

b -> Fold a b

I have no idea if that forms a category, but it's just a hypothetical
idea to make a point. Maybe the truly elegant abstraction is a slight
variation on a fold that is still equally useful and powerful.
Reply all
Reply to author
Forward
0 new messages