A question about Haskell performance

Yuankuns Shi

Jan 25, 2014, 3:18:36 AM
to sh...@googlegroups.com

I'm using Haskell to process a text file. It's a very simple string split and join:
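import System.IO
import Data.List (intercalate)
import Data.List.Split (splitOn)  -- from the "split" package

main :: IO ()
main = do
    handle <- openFile "20131008.test" ReadMode
    contents <- hGetContents handle   -- lazy read into a String, i.e. [Char]
    let result = process contents
    write result                      -- writing forces the input to be read
    hClose handle

-- Write the converted text to the output file.
write :: String -> IO ()
write output = do
    h <- openFile "20131008.csv" WriteMode
    hPutStr h output
    hClose h

-- Convert every line, then join the lines with newlines.
process :: String -> String
process istr = intercalate "\n" (map rep_spt (lines istr))

-- Split one line on "," and re-join the fields with ";".
rep_spt :: String -> String
rep_spt istr = intercalate ";" (splitOn "," istr)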

The equivalent Python version is:
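# Python 2: file() and string.join() do not exist in Python 3
import string
f_r = file('20131008.test', 'r')
a = f_r.readlines()
result = []
for i in a:
    # each line keeps its trailing newline, so writelines() needs no separator
    result.append(string.join(i.split(','), ';'))
f_w = file('20131008.csv', 'w')
f_w.writelines(result)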

On a 1.7 GB file, the compiled Haskell program took 4 minutes, while the Python version took 50 seconds. Is there something wrong with my Haskell code? Thanks.

yi lu

Jan 26, 2014, 8:34:57 AM
to sh...@googlegroups.com
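This feels like a laziness issue. See http://hackage.haskell.org/package/bytestring, which provides both Data.ByteString.Char8 and Data.ByteString.Lazy.Char8. I'm not sure of the exact details, but the first seems to read the whole file in before processing, while the second reads and processes a little at a time (lazy).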

Hope this gives you some ideas.
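
As a rough sketch of the lazy ByteString approach mentioned above (not benchmarked against the original 1.7 GB file), the program could be rewritten along the following lines. The name `convert` and the choice to pass the file names as command-line arguments are assumptions made here for illustration. Since the job only turns each "," into ";", the sketch maps over the bytes directly instead of splitting and re-joining lines. Unlike the original Haskell program it keeps a trailing newline, as the Python version does, and it assumes an ASCII-compatible encoding.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import System.Environment (getArgs)

-- Replace every comma with a semicolon. Newlines pass through unchanged,
-- so the input never has to be split into lines and joined again.
-- (Hypothetical helper name, introduced for this sketch.)
convert :: BL.ByteString -> BL.ByteString
convert = BL.map (\c -> if c == ',' then ';' else c)

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Lazy ByteStrings are read and written chunk by chunk, so the whole
    -- file does not need to be held in memory at once.
    [inPath, outPath] -> BL.readFile inPath >>= BL.writeFile outPath . convert
    _                 -> putStrLn "usage: convert INPUT OUTPUT"
```

Compiled with `ghc -O2`, this would be run as `./convert 20131008.test 20131008.csv`. Any speedup probably owes as much to avoiding String, which is a linked list of Char, as to the laziness itself.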


--
-- You received this message because you are subscribed to the Google Groups Shanghai Linux User Group group. To post to this group, send email to sh...@googlegroups.com. To unsubscribe from this group, send email to shlug+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/shlug?hl=zh-CN
