A question about Haskell performance


Yuankuns Shi

Jan 25, 2014, 3:18:36 AM

to sh...@googlegroups.com

I'm using Haskell to process a text file. It's just a simple string split and join:
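import System.IO
import Data.List (intercalate)
import Data.List.Split (splitOn)   -- from the "split" package

main :: IO ()
main = do
    handle <- openFile "20131008.test" ReadMode
    contents <- hGetContents handle   -- lazily read the whole file as a String
    let result = process contents
    write result
    hClose handle

-- Write the result to the output file
write :: String -> IO ()
write output = do
    h <- openFile "20131008.csv" WriteMode
    hPutStr h output
    hClose h

-- Convert every line, then rejoin the lines with "\n"
process :: String -> String
process istr = intercalate "\n" (map rep_spt (lines istr))

-- Split one line on "," and rejoin the pieces with ";"
rep_spt :: String -> String
rep_spt istr = intercalate ";" (splitOn "," istr)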

The equivalent Python version is:
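f_r = open('20131008.test', 'r')
lines = f_r.readlines()   # each line keeps its trailing newline
f_r.close()
result = []
for line in lines:
    # split on ',' and rejoin the pieces with ';'
    result.append(';'.join(line.split(',')))
f_w = open('20131008.csv', 'w')
f_w.writelines(result)
f_w.close()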

On a 1.7 GB file, the compiled Haskell program takes 4 minutes, while the Python version takes 50 seconds. Did I write something wrong in the Haskell code? Thanks.

yi lu

Jan 26, 2014, 8:34:57 AM

to sh...@googlegroups.com

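I suspect it's a laziness issue. At http://hackage.haskell.org/package/bytestring you can see two variants, Data.ByteString.Char8 and Data.ByteString.Lazy.Char8. I don't know exactly how they work internally, but the first seems to read everything in before processing, while the second reads and processes a little at a time (lazily).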
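A minimal sketch of the ByteString route, assuming the bytestring package and the same file names as the question. One likely contributor to the slowdown, beyond laziness itself, is that String is [Char], a linked list with a separate heap cell per character, so lines/splitOn/intercalate over 1.7 GB allocate a great deal. Data.ByteString.Lazy.Char8 instead reads the file as a lazy sequence of packed byte chunks. The names repSpt and process are illustrative counterparts of the original rep_spt/process; this has not been benchmarked, so no timing is implied.

```haskell
-- Sketch: the same comma-to-semicolon transformation on lazy ByteStrings.
import qualified Data.ByteString.Lazy.Char8 as BL

-- Split one line on ',' and rejoin the pieces with ';'
-- (counterpart of the original rep_spt).
repSpt :: BL.ByteString -> BL.ByteString
repSpt = BL.intercalate (BL.pack ";") . BL.split ','

-- Apply repSpt to every line. BL.unlines ends every line with '\n',
-- including the last one, unlike intercalate "\n" in the original.
process :: BL.ByteString -> BL.ByteString
process = BL.unlines . map repSpt . BL.lines

main :: IO ()
main = do
  -- BL.readFile reads the file lazily, chunk by chunk, not all at once.
  contents <- BL.readFile "20131008.test"
  -- Output is written as it is produced, so memory use should stay
  -- bounded as long as individual lines are reasonably short.
  BL.writeFile "20131008.csv" (process contents)
```

Compile with optimizations (e.g. `ghc -O2`). Because ',' and ';' are both single ASCII bytes and UTF-8 multi-byte sequences never contain those byte values, splitting at the byte level is safe for UTF-8 input. Since a single-character split followed by a join is just a character replacement, `BL.map (\c -> if c == ',' then ';' else c)` would produce the same output while also preserving the original line endings exactly.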

Hope this gives you some ideas.

--
-- You received this message because you are subscribed to the Google Groups Shanghai Linux User Group group. To post to this group, send email to sh...@googlegroups.com. To unsubscribe from this group, send email to shlug+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/shlug?hl=zh-CN