Question about using the system encoding in Pipes.Prelude.Text

Daniel Díaz

Jun 5, 2016, 04:33:09

to Haskell Pipes

Hi,

In the documentation for Pipes.Prelude.Text, we find the following:

The line-based operations, like those in Data.Text.IO, use the system encoding (and T.hGetLine, T.hPutLine etc.) and thus are slower than the 'official' route, which would use the very fast bytestring IO operations from Pipes.ByteString and the encoding and decoding functions in Pipes.Text.Encoding, which are also quite fast thanks to the streaming-commons package.

I'm curious: why is using the system encoding slower?

Michael Thompson

Jun 7, 2016, 11:14:39

to Haskell Pipes

I never looked into why, but you can observe that, for example, `fmap decodeUtf8 . B.readFile` is several times as fast as `T.readFile`. The same holds for the other functions in `Data.Text.(Lazy.)IO`. I think this is why he doesn't include the IO functions in `Data.Text`: the official IO goes through ByteString, using the encoding and decoding functions, the same as with pipes-text.

import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Encoding as T
import qualified Data.ByteString.Char8 as B
import System.Environment

-- With no arguments, read via Data.Text.IO (system encoding);
-- with any argument, read the bytes and decode them as UTF-8.
main = do
  x <- getArgs
  case x of
    [] -> do
      txt <- T.readFile "txt/words3d.txt"
      print $ T.length txt
    _ -> do
      bs <- B.readFile "txt/words3d.txt"
      print $ T.length (T.decodeUtf8 bs)
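
To check that the two routes agree on content, and not only compare their speed, here is a minimal sketch. It assumes the file is UTF-8 and pins the handle's encoding explicitly, so the result does not depend on the system locale. The names `readViaHandle` and `readViaBytes` and the scratch file `encoding-demo.txt` are made up for illustration; they are not part of the thread or of any library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (IOMode (ReadMode), hSetEncoding, utf8, withFile)

-- Route 1: handle-based Text IO. The handle's encoding is set to UTF-8
-- explicitly instead of being taken from the system locale.
readViaHandle :: FilePath -> IO T.Text
readViaHandle path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  TIO.hGetContents h  -- strict: the whole file is read before the handle closes

-- Route 2: strict ByteString IO followed by an explicit UTF-8 decode.
readViaBytes :: FilePath -> IO T.Text
readViaBytes path = TE.decodeUtf8 <$> B.readFile path

main :: IO ()
main = do
  let path = "encoding-demo.txt"  -- hypothetical scratch file
  -- Write a small UTF-8 sample containing non-ASCII characters.
  B.writeFile path (TE.encodeUtf8 (T.pack "na\239ve caf\233\n"))
  a <- readViaHandle path
  b <- readViaBytes path
  -- Report whether both routes produced the same Text.
  print (a == b)
```

Note that `TE.decodeUtf8` throws an exception on invalid input; `TE.decodeUtf8'` returns an `Either` instead, which may be preferable for files of unknown origin.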
