[Haskell-cafe] Reading and writing to handles made with System.Process far too slow

Mateusz Kowalczyk

May 12, 2014, 10:56:46 AM
to haskell-cafe
Hi,

I have some business in piping some data and reading some data back
out of a socket so I thought that I'd just use the ‘socat’ tool. I went
off to System.Process just to find out that reading and writing are
taking far too long.

I put together a small example which only requires that you have ‘cat’
on your system:


{-# LANGUAGE UnicodeSyntax #-}
module Uzbl.WithSource where

import GHC.IO.Handle ( hPutStr, hGetContents, hSetBuffering
                     , BufferMode(..))
import System.Process ( createProcess, proc
                      , StdStream(CreatePipe), std_out, std_in)

gs ∷ IO String
gs = do
  let sp = (proc "cat" [])
             { std_out = CreatePipe, std_in = CreatePipe }
  (Just hin, Just hout, _, _) ← createProcess sp
  -- hSetBuffering hin NoBuffering
  -- hSetBuffering hout NoBuffering
  hPutStr hin "Test data"
  hGetContents hout


All this should effectively do is to give you back "Test data". While it
*does* do that, it takes far too long. When I run ‘gs’, it will start to
(lazily) print the result, printing nothing but opening ‘"’ and then
after about 2-3 seconds printing the rest and finishing.

If we set buffering on the in-handle (hin) to NoBuffering, we get a
slightly different behaviour: pretty much straight away we'll have
‘"Test data’ but then it will wait for the same amount of time to
conclude that it's the end of the response. Changing buffering mode on
‘hout’ seems to make no difference. Setting a precise size in
BlockBuffering seems to be no improvement and in the actual application
I will not know how long the data I'm piping in and out will be.

GHC 7.8.2, process-1.2.2.0; I'm running ‘gs’ in GHCi. It seems that if I
change the module name to Main, make ‘main = gs >>= putStrLn’, compile
the file and run it, it just hangs there! If I add a newline at the end,
it will print but the program will not finish. This makes me think that
perhaps I should be closing handles somewhere (but if I try inside the
function, I get no output, thanks lazy I/O).

What I would expect this program to do is to produce the same result as
‘print "Test data" | cat’.

--
Mateusz K.
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Mateusz Kowalczyk

May 12, 2014, 11:00:43 AM
to haskel...@haskell.org

As it often happens, I solved it straight away after posting to the
list. Here's the program that behaves how I wanted it to from the start:

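-- hClose must be added to the GHC.IO.Handle import list above
gs ∷ IO String
gs = do
  let sp = (proc "cat" [])
             { std_out = CreatePipe, std_in = CreatePipe }
  (Just hin, Just hout, _, _) ← createProcess sp

  hPutStr hin "Test data"
  -- closing stdin flushes the buffer and lets cat see end-of-input
  hClose hin
  c ← hGetContents hout
  -- force the whole lazy string before closing the handle
  length c `seq` hClose hout
  return c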
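
For the simple case of feeding a string to a command and collecting everything it prints, readProcess from System.Process performs the same steps (write stdin, close it, read stdout to the end) in a single call. A minimal sketch, assuming ‘cat’ is on the PATH; the name gsSimple is made up for illustration:

```haskell
import System.Process (readProcess)

-- gsSimple is a hypothetical name; readProcess is part of System.Process.
-- It writes the input to the child's stdin, closes it, reads stdout to the
-- end, and throws an IOError if the command exits with a failure code.
gsSimple :: IO String
gsSimple = readProcess "cat" [] "Test data"
```

Because the output is read strictly, there is no lazy I/O to reason about once the call returns.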

Krzysztof Skrzętnicki

May 13, 2014, 8:00:21 AM
to Mateusz Kowalczyk, Haskell Cafe
Bear in mind that this program will also hang if you write enough data to it. There is an implicit, fixed-size buffer when piping data between processes. When it fills up, the process trying to write to it blocks until something reads from the other end. Your program is still inside hPutStr and has not started reading, so the "cat" you are spawning ends up blocked too once its output pipe is full, and neither side can make progress. The solution is to perform the writing and the reading concurrently. Just try this program:

module Main where

import System.Process
import System.IO
import Control.Concurrent

r n = replicate n '.'

gs :: Int -> IO String
gs n = do
  print n

  let sp = (proc "cat" [])
              { std_out = CreatePipe, std_in = CreatePipe }
  (Just hin, Just hout, _, _) <- createProcess sp
  let cb = do
             hPutStr hin (r n)
             hClose hin
  -- forkIO cb
  cb
  c <- hGetContents hout

  length c `seq` hClose hout
  return c

main = do
  print "welcome"
  mapM_ gs [ 2 ^ x | x <- [0..20]]
  print "goodbye"

Without forkIO it hangs on my system with n = 2^18. If you replace "cb" with "forkIO cb" it will finish without hanging.
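
The same idea can be packaged so that the caller also waits for the writer thread and for the child process to exit. This is only a sketch under a few assumptions: ‘cat’ is on the PATH, the helper name catThrough is invented here, and error handling is kept to the bare minimum.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate, finally)
import System.IO (hClose, hGetContents, hPutStr)
import System.Process
  ( CreateProcess (..), StdStream (CreatePipe)
  , createProcess, proc, waitForProcess )

-- catThrough is a hypothetical helper: pipe a string through `cat`
-- without deadlocking on a full pipe buffer.
catThrough :: String -> IO String
catThrough input = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
  done <- newEmptyMVar
  -- Writer thread: fill stdin, close it so cat sees end-of-input,
  -- and signal completion even if writing fails.
  _ <- forkIO $
    (hPutStr hin input >> hClose hin) `finally` putMVar done ()
  -- The main thread drains stdout while the writer is still running.
  out <- hGetContents hout
  _ <- evaluate (length out)  -- force the whole lazy string
  takeMVar done               -- wait until the writer has finished
  hClose hout
  _ <- waitForProcess ph      -- reap the child process
  return out
```

With this, something like catThrough (replicate (2^20) '.') should come back with all the data instead of hanging, since the pipe is being emptied while it is being filled.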

Best regards,
Krzysztof Skrzętnicki

Daniel Díaz

May 13, 2014, 2:09:56 PM
to haskel...@googlegroups.com, haskell-cafe
I hope you don't mind a bit of self-promotion. My process-streaming library provides helper functions for the process library, built on the pipes streaming library. The cat example would be written like this (taken verbatim from the tutorial):

 example6 = exitCode show $  
     execute3 (shell "cat") show  
         (surely . useProducer $ yield "aaaaaa\naaaaa")
         (separate 
             (encoding T.decodeIso8859_1 ignoreLeftovers $ surely $ T.toLazyM)  
             nop
         )

Returns:

>>> Right ((),("aaaaaa\naaaaa",()))
Writing stdin is done concurrently with the reading of stdout. When an exception is encountered, the library ensures that the handles are closed, the extant concurrent threads terminated, and the external process killed. Also stderr is drained even if you ignore it, to avoid deadlocks due to full buffers.
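
For comparison only, and without using process-streaming at all, the plain process library covers part of this with readProcessWithExitCode: it feeds stdin and collects both stdout and stderr, so a chatty stderr cannot fill its pipe and stall the child. A rough sketch, assuming ‘cat’ is on the PATH; runCat is a made-up name and the Either result type is just one way to present the outcome:

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- runCat is a hypothetical helper. On success it returns the child's
-- stdout; on failure it returns the exit status together with whatever
-- the child wrote to stderr.
runCat :: String -> IO (Either (Int, String) String)
runCat input = do
  (code, out, err) <- readProcessWithExitCode "cat" [] input
  return $ case code of
    ExitSuccess   -> Right out
    ExitFailure n -> Left (n, err)
```

This gives none of the streaming that the pipes-based approach offers, since both outputs are held in memory as whole strings.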
