Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Haskell] [ANN] Safe Lazy IO in Haskell

60 views
Skip to first unread message

Nicolas Pouillard

unread,
Mar 20, 2009, 2:44:13 PM3/20/09
to Haskell-Cafe, Haskell
Hi folks,

We have good news (nevertheless we hope) for all the lazy guys standing there.
Since their birth, lazy IOs have been a great way to modularly leverage all the
good things we have with *pure*, *lazy*, *Haskell* functions to the real world
of files.

We are happy to present the safe-lazy-io package [1] that does exactly this
and is going to be explained and motivated in the rest of this post.

=== The context ===

Although these times were hard with the Lazy/IO technique, some people continue
to defend them arguing that all discovered problems about it was not that harmful
and that taking care was sufficient. Indeed some issues have been discovered about
Lazy/IOs, some have been fixed in the underlying machinery, some have just been
hidden and some others are still around.

== An alternative ==

An alternative design has been proposed --and is still evolving--, it is called
"Iteratee" [2] and has been designed by Oleg Kiselyov. This new design has tons
of advantages over standard imperative IOs, and shares some of the goals of
Lazy/IOs. Iteratee provides a way to do incremental processing in a
high-level style. Indeed both processed data (via enumerators) and processing
code (called iteratee) can be modularly composed. The handling of file-system
resources is precise and safe. Catching errors can be done precisely and can
be interleaved with the processing. In spite of all this, there is an important
drawback: a lot of code has to be re-written and thought in another way.
Processing becomes explicitly chunked which is not always needed and, even
worse, exceptions handling also becomes very explicit. While this makes sense
in a wide range of applications it makes things less natural than the general
case of pure functions. We think that Iteratee have too be studied more, and
we recommend them when you have incrementally react to IO errors.

== Issues of Standard Lazy/IO ==

We think that we can save Lazy/IO cheaply, but before explaining the way we
solve such and such issue, let's first expose Lazy/IO and its issues.

One of the main Lazy/IO functions is 'readFile': it takes a file path opens it
and returns the list of characters until the end of the file is reached. The
characteristic of 'readFile' is that only the opening is done strictly, while
the reading is performed lazily as much as the output list is processed.

Cousins of 'readFile' are 'hGetContents' that takes a file handle and 'getContents'
that reads on the standard input.

This technique enables to process a file as if the file was completely stored
in memory. Because it is read lazily one knows that only the required part of
the file will be read. Even better, if the input is consumed to produce a
small output or the output is emitted incrementally, then the processing can
be done in constant memory space.

Examples:
-- Prints the number of words read on stdin
> countWords = print . length . words =<< getContents
-- Prints the length of the longest line
> maxLineLen = print . maximum . map length . lines =<< getContents
-- Prints in lower case the text read on stdin
> lowerText = interact (map toLower)
-- Alternatively
> lowerText = putStr . map toLower =<< getContents

All these examples are pretty idiomatic Haskell code and make a simple
use of Lazy/IOs. Each of them runs in constant memory space even
if they are declared as if the whole contents were available at once.

By using stream fusion or 'ByteString''s one can get even faster code while
keeping the code almost the same. Here we will stay with the default list
of 'Char''s data type. However one goal of our approach is to be trivially
adaptable to those data types.

Using our library will be rougly a matter of namespace switch plus a running
function:

> lowerText = LI.run' (SIO.putStr . map toLower <$> LI.getContents)

However we will introducing this library as one goes along.

Here is another example where the Lazy/IO are still easy to use but no longer
scales well. This program counts the lines of all the files given in arguments:

> countLines = print . length . lines . concat =<< mapM readFile =<< getArgs

Here the problem is the limitation of simultaneous opened files. Indeed,
all the files are opened at the beginning therefore reaching the limit easily.

It's time to recall when the files are closed. With standard Lazy/IOs the
handle is closed when you reach the end of the file, and so when you've
explored the whole list returned by 'readFile'.

Note also that if you manually open the file and get a handle, then you can
manually close the file, however if by misfortune you close the file and
then still consume the lazy list you will get a truncated list, observing
how much of the file has been read. This last point is due to the fact
that 'readFile' considers the reading error as the end of the file.

In particular one can fix this program, by simply counting the number of lines
of each file separately and then compute the sum to get the final result.

> countLines = print . sum =<< mapM (fmap (length . lines) . readFile) =<< getArgs

However once again this program exhausts the handle resources. Trying
to close the files will not save us either, one just risks getting truncated
files. Indeed the list of intermediate results is produced eagerly but each
intermediate result is lazy and then each file is opened but not immediately
closed since the computation is delayed.
Hopefully adding a bit of strictness cures the problem:

> countLines = print . sum =<< mapM ((return' . length . lines =<<) . readFile) =<< getArgs
> where return' x = x `seq` return x

Until there, we have disclosed three problems:
* while reading is lazy, opening is strict, which leads to a less
natural processing of multiple files
* the closing of files is hard to predict
* the errors during reading are silently discarded

The last one is a bit trickier and has recently been exposed by Oleg Kiselyov [3].
The problem appears when one gets twice the contents of the same stream---or some kind
of inter-dependent streams. Because reading is implicitly driven by the consumer, the interleaving
of reading may then depend on the reduction strategy employed. This is the case even
if the consumer is a pure function.

Basically in this example one can observe different values when using one of these functions:
> f1 x y = x `seq` y `seq` x - y
> f2 x y = y `seq` x `seq` x - y

In this example one reads stdin twice and relies on the error handling to end one stream while
keeping the other opened. Moreover there are other ways to achieve this like the
use of unix fifo files, or using 'getChanContents' from the "Control.Concurrent.Chan"
module.

=== Our solution ===

Here we will present another design, based on a very simple idea. Our goal is
to provide IO processing in a style very similar to standard Lazy/IO with the
following differences:
- preservation of the determinism;
- a simple control exceptions;
- and a precise management of resources.

Our solution is made of three key ingredients: a bit of strictness, some predefined
schemas to interleave inputs, some scopes and abstract types to delimit lazy input
operations from strict IO operations.

== Dealing with a single input ==

Let's present the first ingredient alone through a first example:

> mapHandleContents :: NFData sa => Handle -> (String -> sa) -> IO sa
> mapHandleContents h f = do
> s <- hGetContents h
> return' (f s) `finally` hClose h

> return' :: (Monad m, NFData sa) => sa -> m sa
> return' x = rnf x `seq` return x

It implements some combination of 'fmap' and 'hGetContents'.
Actually some of our examples fit nicely in that model:

> countWords = print =<< mapHandleContents stdin (length . words)
> maxLineLen = print =<< mapHandleContents stdin (maximum . map length . lines)
> lowerText = putStr =<< mapHandleContents stdin (map toLower)

However while the two first examples work well in this setting, the third one
tries to allocate the whole result in memory before printing it.

Here the ingredient that is used is strictness: the purpose in forcing the
result is to be sure that all the needed input is read, before the file is
closed.

So here we rely on 'NFData' instances to really perform deep forcing---this
kind of assumption is a bit like 'Typeable' instances.
In particular functions must not be an instance of 'NFData': indeed, we have
no way to force lazy values that are stored in the closure of a function.

The same remark applies to the 'IO' monad for at least three reasons:
'IO' if often represented by functions; lazy 'IORef''s could be used
to hide one input for later use; exceptions with a lazy contents could
also be used to make a lazy value escape.

Let's now add some more strictness into the meal: the 'SIO' monad!

== The 'SIO' monad ==

The 'SIO' monad is a thin layer over the 'IO' monad, populated only by
strict 'IO' operations. In particular these operations are strict
in the output, which means that once the output is produced then we know
that the given arguments cannot be further evaluated/forced.
Here is an example of strict IO using the 'SIO' monad:

> import qualified System.IO.Strict as SIO
> import System.IO.Strict (SIO)
> countWords = SIO.run (SIO.print . length . words =<< SIO.getContents)

Of course this function does not scale well since it reads the whole
contents in memory before processing it.

For now the strict-io [4] package contains wrappers for functions
in "System.IO", and strict 'IORef''s.

One can now introduce a function in lines of 'mapHandleContents':

> withHandleContents :: NFData sa => Handle -> (String -> SIO sa) -> IO sa
> withHandleContents h f = do
> s <- hGetContents h
> SIO.run (f s) `finally` hClose h

One can then rewrite 'lowerText' as follow:

> lowerText = withHandleContents stdin (SIO.putStr . map toLower)

Until there one can deal quite nicely with single input, many outputs
processing. Currently the only requirement is to delimit a scope where
the resource will be used to return a strict value.

Dealing with multiple inputs will lead us to our final design of lazy
inputs.

== Introducing 'LI', Lazy Inputs ==

We first introduce a type for these lazy inputs namely 'LI'.
This type is abstract and we will progressively introduce functions
to build, combine and run them.

The 'LI' type is a pointed functor, but not necessarily a monad nor
an applicative functor.

Therefore one exports the 'pure' function as 'pureLI'. Building primitives
allow to read files or handles:

> LI.pureLI :: a -> LI a
> LI.hGetContents :: Handle -> LI String
> LI.getContents :: LI String
> LI.readFile :: FilePath -> LI String
> LI.getChanContents :: Chan a -> LI [a]

Being a functor is important: it allows to wholly transform the underlying
input lazily using standard functions about lists for instance:

> length <$> LI.readFile "foo"
> words <$> LI.readFile "foo"

Extracting a final value of a lazy input ('LI' type) is a matter of using:

> LI.run :: (NFData sa) => LI sa -> IO sa
Or
> LI.run' :: (NFData sa) => LI (SIO a) -> IO sa

One can therefore re-write our examples using lazy inputs:

> -- embedded printing
> countWords = LI.run' (SIO.print . length . words <$> LI.getContents)
> -- external printing
> maxLineLen = print =<< LI.run (maximum . map length . lines <$> LI.getContents)
> lowerText = LI.run' (SIO.putStr . map toLower <$> LI.getContents)

== Combining inputs ==

Finally we come to managing multiple inputs. To get both laziness and
precise resource management we will provide dedicated combinators.
The first one is as simple as appending.

> LI.append :: NFData sa => LI [sa] -> LI [sa] -> LI [sa]

This one produces a single stream out that sequences the two given streams.
It also sequences the usage of resources: the first resource is closed and
then the second one is opened.

Note that this combinator is still quite general since one can process each
input beforehand:

> -- one can drop parts of the inputs
> (take 100 <$> i1) `LI.append` (drop 100 <$> i2)
> -- one can tag each input to know where they come from
> Left <$> i1 `LI.append` Right <$> i2

The second one is 'LI.zipWith' which opens the two resources and joins the items
into a single stream. Again, since 'LI' is a functor one can join not only
characters but also words, lines, chunks... A problem with zipping is that it
stops on the shorter input (loosing a part of the other), hopefully one can
prolongate them:

> zipMaybesWith :: (NFData sa, NFData sb) -> (Maybe sa -> Maybe sb -> c) -> LI [sa] -> LI [sb] -> LI [c]
> zipMaybesWith f xs ys =
> map (uncurry f) . takeWhile someJust <$> zip (prolongate <$> xs) (prolongate <$> ys)
> where someJust (Nothing, Nothing) = False
> someJust _ = True
> prolongate :: [a] -> [Maybe a]
> prolongate zs = map Just zs ++ repeat Nothing

The last one is 'LI.interleave':

> LI.interleave :: (NFData sa) -> LI [sa] -> LI [sa] -> LI [sa]

This function is currently left biased, moreover each resource is closed as soon
as we reach its end. However since the inputs are mixed up together one generally
prefers a tagged version trivially build upon this one:

> interleaveEither :: (NFData sa, NFData sb) => LI [sa] -> LI [sb] -> LI [Either sa sb]
> interleaveEither a b = interleave (map Left <$> a) (map Right <$> b)

Here are some final programs that scale well with the number of files.

> -- number of words in the given files
> main = print =<< LI.run . fmap (length . words) . LI.concat . map LI.readFile =<< getArgs

> -- almost the same thing but counts words independently in each file
> main = print
> =<< LI.run . fmap sum . LI.sequence . map (fmap (length . words) . LI.readFile)
> =<< getArgs

> -- prints to stdout swap-cased concatenation of all input files
> main = LI.run' . (fmap (SIO.putStr . fmap swapCase) . LI.concat . map LI.readFile) =<< getArgs
> where swapCase c | isUpper c = toLower c
> | otherwise = toUpper c

> -- sums character code points of inputs
> main = print =<< LI.run . fmap (sum . map (toInteger . ord)) . LI.concat . map LI.readFile =<< getArgs

Our solution is from now widely available as an Hackage package named "safe-lazy-io" [4].

We hope you will freely enjoy using Lazy/IO again!

As usual, criticisms, comments, and help are expected!

Finally, I would like to thank Benoit Montagu and Francois Pottier for helping
me out to polish this work!

[1]: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/safe-lazy-io
[2]: http://okmij.org/ftp/Streams.html
[3]: http://www.haskell.org/pipermail/haskell/2009-March/021064.html
[4]: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/strict-io

--
Nicolas Pouillard
_______________________________________________
Haskell mailing list
Has...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

Felipe Lessa

unread,
Mar 20, 2009, 6:56:48 PM3/20/09
to Haskell-Cafe
On Fri, Mar 20, 2009 at 07:42:28PM +0100, Nicolas Pouillard wrote:
> We have good news (nevertheless we hope) for all the lazy guys standing there.
> Since their birth, lazy IOs have been a great way to modularly leverage all the
> good things we have with *pure*, *lazy*, *Haskell* functions to the real world
> of files.

Hey, that's really great! Even if I can't tell you that I used your
library and found out that it works fine, it sure looks handy.

I was reading the sources, and for 'interleaveHandles' you can
probably use forkIO. Internally GHC will use select whenever a forkIO
blocks on something. Probably telling the forkIO's to write on a Chan
would suffice, but something more elaborate to get as much data as
possible because of the Chan's overhead should be better, maybe block
with hWaitForInput and then use hGetBufNonBlocking? Something on the
lines the untested code below:

> import Control.Concurrent (forkIO)
> import Data.Char (chr)
> import Data.Function (fix)
> import Data.Word8 (Word8)
> import Foreign.Marshal.Alloc (allocaBytes)
> import Foreign.Storable (peekByteOf)
>
> interleaveHandlesHelper :: Handle -> Handle -> IO [Either [Char] [Char]]
> interleaveHandlesHelper h1 h2 = do chan <- newChan
> forkIO $ forkFor Left h1 chan
> forkIO $ forkFor Right h2 chan
> getThem chan
> where
> timeout = -1 -- block forever, that's what we want
> bufSize = 4096 -- more? less?
> forkFor tag h chan = allocaBytes bufSize $ \buf ->
> (fix $ \f -> do hWaitForInput h timeout
> cnt <- hGetBufNonBlocking h buf bufSize
> readBuf buf cnt >>= writeChan chan . Just . tag
> f) `catchEOF` (writeChan chan Nothing)
> getThem chan = go 2 -- two is the number of handles
> where go 0 = return []
> go n = unsafeInterleaveIO $ do -- lazy
> c <- readChan chan
> case c of
> Nothing -> go (n-1)
> Just d -> (d:) `fmap` go n
>
> readBuf :: Ptr Word8 -> Int -> IO [Char]
> readBuf ptr cnt = mapM (toChar `fmap` peekByteOf ptr) [0..cnt-1]
> where toChar = chr. fromIntegral -- maybe use Data.ByteString.Internal.w2c


Thanks!

--
Felipe.
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Henning Thielemann

unread,
Mar 21, 2009, 5:27:28 PM3/21/09
to Nicolas Pouillard, Haskell-Cafe

On Fri, 20 Mar 2009, Nicolas Pouillard wrote:

> Hi folks,
>
> We have good news (nevertheless we hope) for all the lazy guys standing there.
> Since their birth, lazy IOs have been a great way to modularly leverage all the
> good things we have with *pure*, *lazy*, *Haskell* functions to the real world
> of files.

Maybe you know of my packages lazy-io and explicit-exception which also
aim at lazy I/O and asynchronous exception handling. With lazy-io, you are
able to write more complicated things than getContents. I needed this for
HTTP communication that is run by demand. That is when the HTTP response
header is requested, then the function could send a HTTP request first. Is
it possible and sensible to combine this with safe-lazy-io?

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/lazyio
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/explicit-exception

I have also code that demonstrates the usage of explicit asynchronous
exceptions. I have however still not a set of combinators that makes
working with asynchronous exceptions as simple as working with synchronous
ones:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/spreadsheet

nicolas.pouillard

unread,
Mar 22, 2009, 1:52:51 PM3/22/09
to Henning Thielemann, Haskell-Cafe
Excerpts from Henning Thielemann's message of Sat Mar 21 22:27:08 +0100 2009:

>
> On Fri, 20 Mar 2009, Nicolas Pouillard wrote:
>
> > Hi folks,
> >
> > We have good news (nevertheless we hope) for all the lazy guys standing there.
> > Since their birth, lazy IOs have been a great way to modularly leverage all the
> > good things we have with *pure*, *lazy*, *Haskell* functions to the real world
> > of files.
>
> Maybe you know of my packages lazy-io and explicit-exception which also
> aim at lazy I/O and asynchronous exception handling.

I was indeed aware of these two packages but I think they hold orthogonal
ideas.

About the lazy-io package, as explained in the documentation one has to
carefully choose which operations can be lifted. In safe-lazy-io I try
to choose a set of well behaving combinators to replace 'getContents' in the
IO monad.

Moreover if I take the three problems of standard lazy IO in turn:
1/ Control of resources: One advantage over standard lazy IO is that
the file opening can also be done lazily, avoiding an immediate
resource exhaustion. However one still relies on evaluation and garbage
collection to take care of closing handles, which is not satisfying since
handles are scarce resources.
2/ Control of exceptions: If one writes a 'getContents' function such that
it no longer hides I/O errors during reading, how do you guarantee
that exceptions will happen during the LazyIO.run and not after?
3/ Determinism: when freely combining multiple inputs one risks the problem
mentioned by Oleg [1], when using your package it will depend on
the 'getContents' function we use:
a) if we 'liftIO' the standard 'getContents' function, we can have the issue.
b) if we write a new 'getContents' as below [2], then (if I got right
your lazy IO monad) all reads are chained. And then one has
to process inputs in the same order.

However I've found the underlying idea of your monad brilliant. I've tried
a little to use something similar as a base for the implementation but didn't
succeed.

> With lazy-io, you are
> able to write more complicated things than getContents. I needed this for
> HTTP communication that is run by demand. That is when the HTTP response
> header is requested, then the function could send a HTTP request first. Is
> it possible and sensible to combine this with safe-lazy-io?

While currently focusing only on reading file handles, the long term purpose
for this technique is to have new primitives like reading on sockets, using
bytestrings...

> I have also code that demonstrates the usage of explicit asynchronous
> exceptions. I have however still not a set of combinators that makes
> working with asynchronous exceptions as simple as working with synchronous
> ones:
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/spreadsheet

I also think that explicit asynchronous exceptions could be part of the equation,
however I currently don't know how to mix them well.

Best regards,

[1]: http://www.haskell.org/pipermail/haskell/2009-March/021064.html
[2]
hGetContents :: Handle -> LIO.T String
hGetContents h = lazyRead
where lazyRead = do
isEOF <- liftIO $ hIsEOF h
if isEOF
then do
unit <- liftIO $ hClose h
return $ unit `seq` []
else do
c <- liftIO $ hGetChar h
cs <- lazyRead
return $ c : cs


--
Nicolas Pouillard

Henning Thielemann

unread,
Mar 22, 2009, 5:53:27 PM3/22/09
to nicolas.pouillard, Haskell-Cafe

On Sun, 22 Mar 2009, nicolas.pouillard wrote:

> Excerpts from Henning Thielemann's message of Sat Mar 21 22:27:08 +0100 2009:
>>

>> Maybe you know of my packages lazy-io and explicit-exception which also
>> aim at lazy I/O and asynchronous exception handling.
>
> I was indeed aware of these two packages but I think they hold orthogonal
> ideas.
>
> About the lazy-io package, as explained in the documentation one has to
> carefully choose which operations can be lifted. In safe-lazy-io I try
> to choose a set of well behaving combinators to replace 'getContents' in the
> IO monad.
>
> Moreover if I take the three problems of standard lazy IO in turn:
> 1/ Control of resources: One advantage over standard lazy IO is that
> the file opening can also be done lazily, avoiding an immediate
> resource exhaustion. However one still relies on evaluation and garbage
> collection to take care of closing handles, which is not satisfying since
> handles are scarce resources.
> 2/ Control of exceptions: If one writes a 'getContents' function such that
> it no longer hides I/O errors during reading, how do you guarantee
> that exceptions will happen during the LazyIO.run and not after?

Currently I cannot guarantee anything. However my idea is to stay away
from built-in exceptions in IO. In explicit-exception there is an
experimental hidden module which provides an IO monad wrapper called SIO
which cannot throw any IO exception.
http://code.haskell.org/explicit-exception/src/System/IO/Straight.hs
Actually, I think it's the wrong way round to build an exception-free
monad on top of one with exceptions. Instead IO should be built on top of
SIO, but that's not possible for historical reasons.
The only safe operation to get into SIO is
ioToExceptionalSIO :: IO a -> ExceptionalT IOException SIO a
That is, it makes exceptions explicit and SIO operations can never throw
IO exceptions. You should convert synchronous explicit exceptions of
atomic operations like getChar into asynchronous explicit exceptions,
combine them lazily to big operations like getContents. Then you get
getContents :: SIO (Asynchronous.Exception IOException String)
If you run lazy SIO operations you can't become surprised by exceptions.

> 3/ Determinism: when freely combining multiple inputs one risks the problem
> mentioned by Oleg [1], when using your package it will depend on
> the 'getContents' function we use:
> a) if we 'liftIO' the standard 'getContents' function, we can have the issue.
> b) if we write a new 'getContents' as below [2], then (if I got right
> your lazy IO monad) all reads are chained. And then one has
> to process inputs in the same order.

I wouldn't build hClose into getContents, because you never know, whether
the file is read until it's end. If you call a LazyIO.getContents twice,
the contents are read sequential. In order to read file contents
simultaneously you must call (LazyIO.run LazyIO.getContents) twice in the
IO monad.

nicolas.pouillard

unread,
Mar 22, 2009, 6:30:21 PM3/22/09
to Henning Thielemann, Haskell-Cafe
Excerpts from Henning Thielemann's message of Sun Mar 22 22:52:48 +0100 2009:

It sounds like a nice idea, it would be great to have a straight-io package
to play a bit more with explicit exceptions in things like 'IO'.

For safe-lazy-io I wanted to keep the exception management as light as
possible. In particular when writing programs where most of the 'IO' errors
are considered fatals---EOF is not fatal of course but using getContents one
do not see it.

> > 3/ Determinism: when freely combining multiple inputs one risks the problem
> > mentioned by Oleg [1], when using your package it will depend on
> > the 'getContents' function we use:
> > a) if we 'liftIO' the standard 'getContents' function, we can have the issue.
> > b) if we write a new 'getContents' as below [2], then (if I got right
> > your lazy IO monad) all reads are chained. And then one has
> > to process inputs in the same order.
>
> I wouldn't build hClose into getContents, because you never know, whether
> the file is read until it's end. If you call a LazyIO.getContents twice,
> the contents are read sequential. In order to read file contents
> simultaneously you must call (LazyIO.run LazyIO.getContents) twice in the
> IO monad.

Right but one of the purposes of safe-lazy-io is to provides a good
management of file handles in particular closing them. Actually the
implementation of lazy inputs focus particularly on that---through the
'Finalized' values.

http://hackage.haskell.org/packages/archive/safe-lazy-io/0.1/doc/html/src/System-IO-Lazy-Input-Internals.html

--
Nicolas Pouillard

Henning Thielemann

unread,
Mar 22, 2009, 6:59:15 PM3/22/09
to nicolas.pouillard, Haskell-Cafe

On Sun, 22 Mar 2009, nicolas.pouillard wrote:

> It sounds like a nice idea, it would be great to have a straight-io package
> to play a bit more with explicit exceptions in things like 'IO'.

Maybe I should then restrict lifting to LazyIO to SIO actions. That would
not make LazyIO safe, but reduces surprises.

nicolas.pouillard

unread,
Mar 23, 2009, 5:56:36 AM3/23/09
to Henning Thielemann, Haskell-Cafe
Excerpts from Henning Thielemann's message of Sun Mar 22 23:58:44 +0100 2009:

>
> On Sun, 22 Mar 2009, nicolas.pouillard wrote:
>
> > It sounds like a nice idea, it would be great to have a straight-io package
> > to play a bit more with explicit exceptions in things like 'IO'.
>
> Maybe I should then restrict lifting to LazyIO to SIO actions. That would
> not make LazyIO safe, but reduces surprises.

By SIO you actually mean straight-io right? I was confused because I also
have an SIO monad in the strict-io package.

--
Nicolas Pouillard

Henning Thielemann

unread,
Mar 23, 2009, 6:06:53 AM3/23/09
to nicolas.pouillard, Haskell-Cafe

On Mon, 23 Mar 2009, nicolas.pouillard wrote:

> Excerpts from Henning Thielemann's message of Sun Mar 22 23:58:44 +0100 2009:
>>
>> On Sun, 22 Mar 2009, nicolas.pouillard wrote:
>>
>>> It sounds like a nice idea, it would be great to have a straight-io package
>>> to play a bit more with explicit exceptions in things like 'IO'.
>>
>> Maybe I should then restrict lifting to LazyIO to SIO actions. That would
>> not make LazyIO safe, but reduces surprises.
>
> By SIO you actually mean straight-io right?

Yes

> I was confused because I also have an SIO monad in the strict-io
> package.

Sorry

nicolas.pouillard

unread,
Mar 23, 2009, 6:35:33 AM3/23/09
to Henning Thielemann, Haskell-Cafe
Excerpts from Henning Thielemann's message of Mon Mar 23 11:06:20 +0100 2009:

>
> On Mon, 23 Mar 2009, nicolas.pouillard wrote:
>
> > Excerpts from Henning Thielemann's message of Sun Mar 22 23:58:44 +0100 2009:
> >>
> >> On Sun, 22 Mar 2009, nicolas.pouillard wrote:
> >>
> >>> It sounds like a nice idea, it would be great to have a straight-io package
> >>> to play a bit more with explicit exceptions in things like 'IO'.
> >>
> >> Maybe I should then restrict lifting to LazyIO to SIO actions. That would
> >> not make LazyIO safe, but reduces surprises.
> >
> > By SIO you actually mean straight-io right?
>
> Yes

Then what do you mean by "lifting to LazyIO to SIO actions"?

Do you mean

liftSIO :: SIO a -> LazyIO.T a

which says that we only lift computations that explicitly throws exceptions.

In that case it be actually safer, but all of this greatly depends on how
reasonable is the explicit exception handling.

In particular in the case 'IO', using explicit exception is maybe too heavy.

--
Nicolas Pouillard

Henning Thielemann

unread,
Mar 23, 2009, 6:52:52 AM3/23/09
to nicolas.pouillard, Haskell-Cafe

On Mon, 23 Mar 2009, nicolas.pouillard wrote:

> Excerpts from Henning Thielemann's message of Mon Mar 23 11:06:20 +0100 2009:
>>

>> Yes
>
> Then what do you mean by "lifting to LazyIO to SIO actions"?
>
> Do you mean
>
> liftSIO :: SIO a -> LazyIO.T a
>
> which says that we only lift computations that explicitly throws exceptions.

Yes.

> In that case it be actually safer, but all of this greatly depends on how
> reasonable is the explicit exception handling.

If it does not fit, you can change it. :-) That's the advantage over
built-in IO exceptions.

> In particular in the case 'IO', using explicit exception is maybe too heavy.

I think it's precisely the best thing to do, given all the problems with
asynchronous, imprecise and what-know-I exceptions.

Wei Hu

unread,
Mar 23, 2009, 12:37:41 PM3/23/09
to haskel...@haskell.org
Nicolas Pouillard <nicolas.pouillard <at> gmail.com> writes:

>
> Hi folks,
>
> We have good news (nevertheless we hope) for all the lazy guys standing there.
> Since their birth, lazy IOs have been a great way to modularly leverage all the
> good things we have with *pure*, *lazy*, *Haskell* functions to the real world
> of files.
>

> We are happy to present the safe-lazy-io package [1] that does exactly this
> and is going to be explained and motivated in the rest of this post.

Hi,

Please let me know if I understood your code correctly. So, the SIO
module is used only to ensure that the file processing is finished
before the finalizer closes the file, right?

In System.IO.Lazy.Input, run is defined as

> run :: NFData sa => LI sa -> IO sa
> run = run' . fmap return'

Can I change it to

> run = run' . fmap return
?

I think the semantics is the same because run' will strictly force the
processing anyway?

nicolas.pouillard

unread,
Mar 24, 2009, 8:45:06 AM3/24/09
to Wei Hu, haskell-cafe
Excerpts from Wei Hu's message of Mon Mar 23 17:37:15 +0100 2009:

> Nicolas Pouillard <nicolas.pouillard <at> gmail.com> writes:
>
> >
> > Hi folks,
> >
> > We have good news (nevertheless we hope) for all the lazy guys standing there.
> > Since their birth, lazy IOs have been a great way to modularly leverage all the
> > good things we have with *pure*, *lazy*, *Haskell* functions to the real world
> > of files.
> >
> > We are happy to present the safe-lazy-io package [1] that does exactly this
> > and is going to be explained and motivated in the rest of this post.
>
> Hi,

Hi,

> Please let me know if I understood your code correctly. So, the SIO
> module is used only to ensure that the file processing is finished
> before the finalizer closes the file, right?
>
> In System.IO.Lazy.Input, run is defined as
>
> > run :: NFData sa => LI sa -> IO sa
> > run = run' . fmap return'
>
> Can I change it to
>
> > run = run' . fmap return
> ?

Yes

> I think the semantics is the same because run' will strictly force the
> processing anyway?

Exactly I've pushed the change to the repository [1]

Thanks for spotting this!

[1]: http://patch-tag.com/publicrepos/safe-lazy-io

--
Nicolas Pouillard

Jason Dusek

unread,
May 17, 2009, 9:45:46 AM5/17/09
to Nicolas Pouillard, Haskell, Haskell-Cafe
From the documentation:

" LI could be a strict monad and a strict applicative functor.
However it is not a lazy monad nor a lazy applicative
functor as required Haskell. Hopefully it is a lazy
(pointed) functor at least.

I'd like to understand this better -- how is LI incompatible
with being a lazy monad, exactly?

--
Jason Dusek

Nicolas Pouillard

unread,
May 18, 2009, 1:32:16 PM5/18/09
to Jason Dusek, Haskell, Haskell-Cafe
Excerpts from Jason Dusek's message of Sun May 17 15:45:25 +0200 2009:

> From the documentation:
>
> " LI could be a strict monad and a strict applicative functor.
> However it is not a lazy monad nor a lazy applicative
> functor as required Haskell. Hopefully it is a lazy
> (pointed) functor at least.

The type I would need for bind is this one:

(>>=) :: NFData sa => LI sa -> (sa -> LI b) -> LI b

And because of the NFData constraint this type bind is less general than the
required one.

BTW this operator is exported as (!>>=) by System.IO.Lazy.Input.Extra.

By using the rmonad we could add this NFData constraint, but that's not like
having a Monad instance directly.

Best regards,

--
Nicolas Pouillard

Taral

unread,
May 18, 2009, 6:06:20 PM5/18/09
to Nicolas Pouillard, Haskell-Cafe
On Mon, May 18, 2009 at 10:30 AM, Nicolas Pouillard
<nicolas....@gmail.com> wrote:
> The type I would need for bind is this one:
>
> �(>>=) :: NFData sa => LI sa -> (sa -> LI b) -> LI b

Will this do?

(>>=) :: (NFData sa, NFData b) => LI sa -> (sa -> LI b) -> LI b

--
Taral <tar...@gmail.com>
"Please let me know if there's any further trouble I can give you."
-- Unknown

Ryan Ingram

unread,
May 19, 2009, 1:07:21 AM5/19/09
to Taral, Haskell-Cafe
On Mon, May 18, 2009 at 3:05 PM, Taral <tar...@gmail.com> wrote:
> Will this do?
>
> (>>=) :: (NFData sa, NFData b) => LI sa -> (sa -> LI b) -> LI b

No, the problem is that >>= on monads has no constraints, it must have the type
> LI a -> (a -> LI b) -> LI b

This is a common problem with trying to use do-notation; there are
some cases where you can't make the object an instance of Monad. The
same problem holds for Data.Set; you'd can write

setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind m f = unions (map f $ toList m)

but there is no way to use setBind for a definition of >>=

-- ryan

Miguel Mitrofanov

unread,
May 19, 2009, 2:19:14 AM5/19/09
to Ryan Ingram, Taral, Haskell-Cafe

On 19 May 2009, at 09:06, Ryan Ingram wrote:

> This is a common problem with trying to use do-notation; there are
> some cases where you can't make the object an instance of Monad. The
> same problem holds for Data.Set; you'd can write
>
> setBind :: Ord b => Set a -> (a -> Set b) -> Set b
> setBind m f = unions (map f $ toList m)
>
> but there is no way to use setBind for a definition of >>=

You can use a continuation trick.

Jason Dusek

unread,
May 19, 2009, 2:29:15 AM5/19/09
to Miguel Mitrofanov, Taral, Haskell-Cafe
2009/05/18 Miguel Mitrofanov <migue...@yandex.ru>:

> On 19 May 2009, at 09:06, Ryan Ingram wrote:
>
>> This is a common problem with trying to use do-notation; there are
>> some cases where you can't make the object an instance of Monad.  The
>> same problem holds for Data.Set; you'd can write
>>
>> setBind :: Ord b => Set a -> (a -> Set b) -> Set b
>> setBind m f = unions (map f $ toList m)
>>
>> but there is no way to use setBind for a definition of >>=
>
> You can use a continuation trick.

Trick?

--
Jason Dusek

Henning Thielemann

unread,
May 19, 2009, 3:46:44 AM5/19/09
to Nicolas Pouillard, Haskell-Cafe

On Mon, 18 May 2009, Nicolas Pouillard wrote:

> Excerpts from Jason Dusek's message of Sun May 17 15:45:25 +0200 2009:
>> From the documentation:
>>
>> " LI could be a strict monad and a strict applicative functor.
>> However it is not a lazy monad nor a lazy applicative
>> functor as required Haskell. Hopefully it is a lazy
>> (pointed) functor at least.
>
> The type I would need for bind is this one:
>
> (>>=) :: NFData sa => LI sa -> (sa -> LI b) -> LI b
>
> And because of the NFData constraint this type bind is less general than the
> required one.

Looks very similar to the operator I need for binding with
respect to asynchronous exceptions:

bind :: (Monoid a, Monad m) =>
ExceptionalT e m a -> (a -> ExceptionalT e m b) -> ExceptionalT e m b

Nicolas Pouillard

unread,
May 19, 2009, 4:00:59 AM5/19/09
to Taral, Haskell-Cafe
Excerpts from Taral's message of Tue May 19 00:05:39 +0200 2009:

> On Mon, May 18, 2009 at 10:30 AM, Nicolas Pouillard
> <nicolas....@gmail.com> wrote:
> > The type I would need for bind is this one:
> >
> >  (>>=) :: NFData sa => LI sa -> (sa -> LI b) -> LI b
>
> Will this do?
>
> (>>=) :: (NFData sa, NFData b) => LI sa -> (sa -> LI b) -> LI b

No this one would be too strict. In particular functions are not member of
NFData (and for good reasons) and we may want to have LI values holding non
"forcable" values.

However I got your idea and it can be useful.

Thanks,

--
Nicolas Pouillard

Ryan Ingram

unread,
May 19, 2009, 4:23:20 AM5/19/09
to Henning Thielemann, Haskell-Cafe
To be fair, you can do this with some extensions; I first saw this in
a paper on Oleg's site [1]. Here's some sample code:

{-# LANGUAGE NoImplicitPrelude, TypeFamilies, MultiParamTypeClasses #-}
module SetMonad where
import qualified Data.Set as S
import qualified Prelude as P (Monad, (>>=), (>>), return, fail)
import Prelude hiding (Monad, (>>=), (>>), return, fail)

class ConstrainedPoint pa where
type PointElem pa
return :: PointElem pa -> pa

class ConstrainedBind ma mb where
type BindElem ma
(>>=) :: ma -> (BindElem ma -> mb) -> mb
(>>) :: ma -> mb -> mb
m >> n = m >>= const n

class ConstrainedFail pa where
fail :: String -> pa

instance ConstrainedPoint (S.Set a) where
type PointElem (S.Set a) = a
return = S.singleton

instance Ord b => ConstrainedBind (S.Set a) (S.Set b) where
type BindElem (S.Set a) = a
m >>= f = S.unions $ map f $ S.toList m

test :: S.Set Int
test = do
x <- S.fromList [1,2,3]
y <- S.fromList [1,2,3]
return (x+y)

-- ghci> test
-- fromList [2,3,4,5,6]

-- ryan

[1] http://www.okmij.org/ftp/Haskell/types.html#restricted-datatypes

On Tue, May 19, 2009 at 12:46 AM, Henning Thielemann
<lem...@henning-thielemann.de> wrote:
>
> On Mon, 18 May 2009, Nicolas Pouillard wrote:
>
>> Excerpts from Jason Dusek's message of Sun May 17 15:45:25 +0200 2009:
>>>

>>> �ソスFrom the documentation:
>>>
>>> �ソス" �ソスLI could be a strict monad and a strict applicative functor.
>>> �ソス �ソスHowever it is not a lazy monad nor a lazy applicative
>>> �ソス �ソスfunctor as required Haskell. Hopefully it is a lazy
>>> �ソス �ソス(pointed) functor at least.


>>
>> The type I would need for bind is this one:
>>

>> �ソス(>>=) :: NFData sa => LI sa -> (sa -> LI b) -> LI b


>>
>> And because of the NFData constraint this type bind is less general than
>> the
>> required one.
>
> Looks very similar to the operator I need for binding with respect to
> asynchronous exceptions:
>
> bind :: (Monoid a, Monad m) =>

> �ソス ExceptionalT e m a -> (a -> ExceptionalT e m b) -> ExceptionalT e m b

Miguel Mitrofanov

unread,
May 19, 2009, 4:30:12 AM5/19/09
to Jason Dusek, Taral, Haskell-Cafe
I've posted it once or twice.

newtype C m r a = C ((a -> m r) -> m r)

It's a monad, regardless of whether m is one or not. If you have something like "return" and "bind", but not exactly the same, you can make
"casting" functions

m a -> C m r a

and backwards.

Sittampalam, Ganesh

unread,
May 19, 2009, 4:51:29 AM5/19/09
to Nicolas Pouillard, Ryan Ingram, Henning Thielemann, Haskell-Cafe
Nicolas Pouillard wrote:
> Excerpts from Ryan Ingram's message of Tue May 19 10:23:01 +0200 2009:

>> To be fair, you can do this with some extensions; I first saw this in
>> a paper on Oleg's site [1]. Here's some sample code:
>
> This seems like the same trick as the rmonad package:
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/rmonad

It's similar, but rmonad uses an associated datatype to wrap up the
constraint, and doesn't split the Monad class up into separate pieces
(which generally makes type inference harder).

rmonad also supplies an embedding to turn any restricted monad into a
normal monad at the cost of using embed/unEmbed to get into and out of
the embedding.

Ganesh

===============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===============================================================================

Ryan Ingram

unread,
May 19, 2009, 5:31:30 AM5/19/09
to migue...@yandex.ru, Haskell-Cafe, Taral
On Tue, May 19, 2009 at 12:54 AM, Miguel Mitrofanov
<migue...@yandex.ru> wrote:
> I've posted it once or twice.
>
> newtype C m r a = C ((a -> m r) -> m r)
>
> It's a monad, regardless of whether m is one or not. If you have something
> like "return" and "bind", but not exactly the same, you can make "casting"
> functions
>
> m a -> C m r a
>
> and backwards.

This isn't great, though. Consider this (slightly generalized) version:

> newtype CpsM c t a = CpsM { unCpsM :: forall b. c b -> (a -> t b) -> t b }

We can easily make this a monad for any c & t:

> instance Monad (CpsM c t) where
> return x = CpsM $ \_ k -> k x
> m >>= f = CpsM $ \c k -> unCpsM m c $ \x -> unCpsM (f x) c k

Here's a useful one:

> -- reify Ord constraint in a data structure
> data OrdConstraint a where
> HasOrd :: Ord a => OrdConstraint a
> type M = CpsM OrdConstraint S.Set

along with your "casting" functions:

> liftS :: S.Set a -> M a
> liftS s = CpsM $ \c@HasOrd k -> S.unions $ map k $ S.toList s

> runS :: Ord a => M a -> S.Set a
> runS m = unCpsM m HasOrd S.singleton

Now consider this code:

> inner = do
> x <- liftS (S.fromList [1..3])
> y <- liftS (S.fromList [1..3])
> return (x+y)

> outer = do
> x <- inner
> y <- inner
> return (x+y)

If you evaluate (runS outer), eventually you get to a state like this:

= let f x = inner >>= \y -> return (x+y)
g x2 = liftS (S.fromList [1..3]) >>= \y2 -> return (x2+y2)
h = HasOrd
k = \a2 -> unCpsM (g a2) h $ \a -> unCpsM (f a) h S.singleton
in S.unions $ map k [1,2,3]

which, after all the evaluation, leads to this:

= S.unions
[S.fromList [4,5,6,7,8,9,10],
S.fromList [5,6,7,8,9,10,11],
S.fromList [6,7,8,9,10,11,12]]

We didn't really do any better than if we just stuck everything in a
list and converted to a set at the end!

Compare to the result of the same code using the restricted monad
solution (in this case runS = id, liftS = id):

inner >>= \x -> inner >>= \y -> return (x+y)
= (Set [1,2,3] >>= \x -> Set [1,2,3] >>= \y -> return (x+y))
>>= \x -> inner >>= \y -> return (x+y)
= (S.unions (map (\x -> Set [1,2,3] >>= \y -> return (x+y)) [1,2,3]))
>>= \x -> inner >>= \y -> return (x+y)
= S.unions [Set [2,3,4], Set [3,4,5], Set [4,5,6]]
>>= \x -> inner >>= \y -> return (x+y)
= Set [2,3,4,5,6]
>>= \x -> inner >>= \y -> return (x+y)

Notice how we've already snipped off a bunch of the computation that
the continuation-based version ran; the left-associated >>= let us
pre-collapse parts of the set down, which we will never do until the
end of the CPS version. (This is obvious if you notice that in the
CPS version, the only HasOrd getting passed around is for the final
result type; we never call S.unions at any intermediate type!)

Of course, you can manually cache the result yourself by wrapping "inner":

> cacheS = liftS . runS
> inner_cached = cacheS inner

A version of "outer" using this version has the same behavior as the
non-CPS version. But it sucks to have to insert the equivalent of
"optimize this please" everywhere in your code :)

-- ryan

0 new messages