Question about using the system encoding in Pipes.Prelude.Text


Daniel Díaz

Jun 5, 2016, 04:33:09
to Haskell Pipes
Hi,

In the documentation for Pipes.Prelude.Text, we find the following:
  • The line-based operations, like those in Data.Text.IO, use the system encoding (and T.hGetLine, T.hPutLine etc.) and thus are slower than the 'official' route, which would use the very fast bytestring IO operations from Pipes.ByteString and the encoding and decoding functions in Pipes.Text.Encoding, which are also quite fast thanks to the streaming-commons package.

I'm curious: why is using the system encoding slower?

Michael Thompson

Jun 7, 2016, 11:14:39
to Haskell Pipes
I never looked into why, but you can observe that e.g. `fmap decodeUtf8 . B.readFile` is several times as fast as `T.readFile`. It's the same with the other material in `Data.Text.(Lazy.)IO`. I think this is why he doesn't include the IO functions in `Data.Text`: the official IO is via ByteString, using the encoding and decoding functions, same as with pipes-text.


    import qualified Data.Text as T
    import qualified Data.Text.IO as T
    import qualified Data.Text.Encoding as T
    import qualified Data.ByteString.Char8 as B
    import System.Environment (getArgs)

    main :: IO ()
    main = do
      args <- getArgs
      case args of
        -- No arguments: read through Data.Text.IO (system encoding).
        [] -> do
          txt <- T.readFile "txt/words3d.txt"
          print $ T.length txt
        -- Any argument: read raw bytes, then decode them as UTF-8.
        _ -> do
          bs <- B.readFile "txt/words3d.txt"
          print $ T.length (T.decodeUtf8 bs)
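
To compare the two routes in a single run, something like the following rough sketch might help. It is not a proper benchmark: the file name `sample-words.txt`, the helper names (`decodeBytes`, `readViaHandle`, `readViaBytes`, `timed`) and the generated sample text are all made up for illustration, and the handle is pinned to UTF-8 rather than left on the locale default so that the run does not depend on the machine's locale. The handle route still goes through the `Handle` decoding layer that `Data.Text.IO` uses. Timings will vary with the machine and with the versions of `text` and `bytestring` installed.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import Control.Exception (evaluate)
import GHC.IO.Encoding (getLocaleEncoding)
import System.CPUTime (getCPUTime)
import System.IO (IOMode (ReadMode), hSetEncoding, utf8, withFile)

-- Pure decoding step of the ByteString route.
decodeBytes :: B.ByteString -> T.Text
decodeBytes = TE.decodeUtf8

-- Data.Text.IO route: decoding happens inside the Handle layer.
-- The encoding is set explicitly here (an assumption of this sketch);
-- T.readFile would use the locale encoding instead.
readViaHandle :: FilePath -> IO T.Text
readViaHandle path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  TIO.hGetContents h

-- ByteString route: read raw bytes, then decode them in one pass.
readViaBytes :: FilePath -> IO T.Text
readViaBytes path = decodeBytes <$> B.readFile path

-- Run an action, force the length of its result, and report CPU time.
-- getCPUTime is in picoseconds; 10^9 ps = 1 ms.
timed :: String -> IO T.Text -> IO Int
timed label act = do
  t0 <- getCPUTime
  n  <- act >>= evaluate . T.length
  t1 <- getCPUTime
  putStrLn $ label ++ ": " ++ show n ++ " chars, "
          ++ show ((t1 - t0) `div` 1000000000) ++ " ms CPU"
  return n

main :: IO ()
main = do
  enc <- getLocaleEncoding
  putStrLn $ "locale (system) encoding: " ++ show enc
  -- Hypothetical sample file, written as UTF-8 bytes so that writing
  -- does not depend on the locale either.
  let path   = "sample-words.txt"
      sample = T.unlines (replicate 200000 (T.pack "naïve café"))
  B.writeFile path (TE.encodeUtf8 sample)
  n1 <- timed "Data.Text.IO via Handle" (readViaHandle path)
  n2 <- timed "ByteString + decodeUtf8" (readViaBytes path)
  putStrLn $ "same length: " ++ show (n1 == n2)
```

Forcing `T.length` inside `timed` matters: without it, laziness in the surrounding `IO` code could let part of the decoding work escape the measured interval. For anything more serious than a quick look, a benchmarking library would be the better tool.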