An efficiency problem with a Python program


俊杰蔡

Apr 3, 2008, 5:50:31 AM
to pyth...@googlegroups.com
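Hi all,

A colleague and I were chatting and decided to compare the efficiency of Perl and Python. The programs below compute 5,000,000 seven-digit random numbers modulo 3 and write them to a file, one per line. In my test Python came out 12 times slower than Perl. Does anyone have a good way to optimize it?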

perl:

#!/usr/bin/perl -w
use strict;


open (WW,"> 500000") or die "$!";
foreach(1..5000000){
my $i = int(rand 10000000) % 3;
print WW $i."\n";
}
close WW;

python:

import random
import time

__revision__ = '0.1'

def test():
    fh = open("test_cjj","w")
   
    for i in range(5000000):
        data = random.randrange(1000000,9999999,1)
        yu = data % 3
        fh.write(str(yu)+"\n")
                                                  
    fh.close()

if __name__ == "__main__" :

    test()


Results:
time ./500.pl
real 0m2.119s
user 0m2.111s
sys 0m0.008s

time ./a.py
real    6m35.764s
user    4m42.762s
sys     1m47.011s




xxmplus

Apr 3, 2008, 5:57:53 AM
to pyth...@googlegroups.com
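I don't know Perl, but it looks like the Perl version opens and closes the file only once, whereas the Python version puts the open and close inside a function. Doesn't that mean it operates on the file 5,000,000 times?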

--
Any complex technology which doesn't come with documentation must be the best
available.

Albert Lee

Apr 3, 2008, 5:59:07 AM
to pyth...@googlegroups.com
xrange

Albert Lee

Apr 3, 2008, 6:06:29 AM
to pyth...@googlegroups.com
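Python also opens and closes the file only once. Using xrange gives a small improvement. Beyond that there is really nothing to be done; it is probably a performance problem in the I/O functions.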

xxmplus

Apr 3, 2008, 6:08:00 AM
to pyth...@googlegroups.com
Why is it also only once?

HYRY

Apr 3, 2008, 6:11:04 AM
to python-cn`CPyUG`华蟒用户组
randrange is fairly slow; try replacing it with int(random()*(9999999-1000000)+1000000). Reducing the number of file writes can also improve speed.
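A rough way to check this suggestion is to time both expressions with timeit. The sketch below assumes modern Python 3 (the thread itself uses Python 2); the helper names via_randrange and via_random are made up for illustration, and the timings will differ by machine and interpreter version.

```python
import random
import timeit

LOW, HIGH = 1000000, 9999999  # same bounds as the original post


def via_randrange():
    # randrange is implemented in Python and validates its arguments on every call
    return random.randrange(LOW, HIGH, 1)


def via_random():
    # one call into the C generator plus float arithmetic, truncated to an int
    return int(random.random() * (HIGH - LOW) + LOW)


if __name__ == "__main__":
    for func in (via_randrange, via_random):
        seconds = timeit.timeit(func, number=200000)
        print(func.__name__, round(seconds, 3))
```

Both helpers return integers in roughly the same range, so swapping one for the other should not change what ends up in the output file.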

Bruce Wang

Apr 3, 2008, 6:48:54 AM
to pyth...@googlegroups.com


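2008/4/3 Albert Lee <hanzh...@gmail.com>:
> Python also opens and closes the file only once. Using xrange gives a small improvement. Beyond that there is really nothing to be done; it is probably a performance problem in the I/O functions.

Buffering the results with StringIO cuts the time to about 1 minute: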


import cStringIO as StringIO

import random
import time

__revision__ = '0.1'

def test():
    fh = open("test_cjj","w")
    output = StringIO.StringIO()
  
    for i in xrange(5000000):

        data = random.randrange(1000000,9999999,1)
        yu = data % 3
        print >>output, yu                                          
   
    fh.write(output.getvalue())

    fh.close()

if __name__ == "__main__" :
    test()
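
For readers on Python 3, where cStringIO and the `print >>` form no longer exist, the same buffering idea might look like the sketch below. This is a loose translation under my own assumptions, not Bruce Wang's code: write_buffered and its parameters are hypothetical names, and io.StringIO stands in for cStringIO.

```python
import io
import random


def write_buffered(fh, count):
    """Collect every line in an in-memory buffer, then write it with one call."""
    output = io.StringIO()
    for _ in range(count):
        data = random.randrange(1000000, 9999999, 1)
        print(data % 3, file=output)  # Python 3 spelling of `print >>output, yu`
    fh.write(output.getvalue())


if __name__ == "__main__":
    with open("test_cjj", "w") as fh:
        write_buffered(fh, 5000000)
```

Taking an already open file object rather than a path keeps the helper easy to try against an in-memory buffer before pointing it at a real file.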


 
--
simple is good
http://brucewang.net
http://io.brucewang.net
http://twitter.com/number5
skype: number5

ygao

Apr 3, 2008, 7:15:32 AM
to pyth...@googlegroups.com


2008/4/3 俊杰蔡 <yzcai...@gmail.com>:
> def test():
>     fh = open("test_cjj","w")

Pass a buffer size to open() and test again:

    fh = open("test_cjj","w",500000)


--
※※※※※※※※※※※※※※※※※
My blog: http://blog.donews.com/ygao

俊杰蔡

Apr 3, 2008, 10:33:34 AM
to pyth...@googlegroups.com
I compared several approaches:

(1)

def test():
    fh = open("test_cjj","w",500000)
    output = StringIO.StringIO()
    for i in range(5000000):
        data = int(random()*(9999999-1000000)+1000000)

        yu = data % 3
        print >>output, yu 
    fh.write(output.getvalue())  # write the whole buffer in one call
    fh.close()

time ./a.py

real    0m26.322s
user    0m25.654s
sys     0m0.232s


(2)

def test():
    fh = open("test_cjj","w",500000)
    for i in range(5000000):
        data = random.randrange(1000000,9999999,1)
        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

time ./a.py

real    0m26.242s
user    0m25.110s
sys     0m0.184s

(3)

def test():
    fh = open("test_cjj","w")
    for i in range(5000000):
        data = int(random()*(9999999-1000000)+1000000)

        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

time ./a.py

real    0m12.044s
user    0m11.497s
sys     0m0.152s

(4)

def test():
    fh = open("test_cjj","w",500000)
    for i in xrange(5000000):
        data = int(random()*(9999999-1000000)+1000000)

        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

time ./a.py

real    0m12.057s
user    0m11.561s
sys     0m0.088s

(5)

def test():
    fh = open("test_cjj","w",500000)
    for i in range(5000000):
        data = int(random()*(9999999-1000000)+1000000)

        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

time ./a.py

real    0m11.520s
user    0m11.173s
sys     0m0.164s

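From the comparisons above:

(a) random.random() is faster than random.randrange().
(b) xrange is not necessarily faster than range.
(c) Buffering all the output with StringIO and writing it in one go is not necessarily faster either.
(d) Writing in blocks with write() improves efficiency.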


Lich_Ray

Apr 3, 2008, 10:42:21 AM
to pyth...@googlegroups.com
So it seems that, for Python, the older approaches and ways of thinking actually give better efficiency?



--
Ray Stinger, nickname Lich_Ray
God is in his heaven, all's right with the world.
-------------------------------------------------
let focus = 'computing' in where:
http://lichray.javaeye.com
let focus = 'computing' in here:
http://inblogs.net/let-in

Zoom.Quiet

Apr 3, 2008, 10:45:11 AM
to pyth...@googlegroups.com
Indeed, indeed. The efficiency comes down to two things, the computation and the writing:
http://wiki.woodpecker.org.cn/moin/MicroProj/2008-04-03

--
'''过程改进乃是开始催生可促生靠谱的人的组织!
PI keeps evolving organizations which promoting people be good!
'''http://zoomquiet.org
Pls. usage OOo to replace M$ Office. http://zh.openoffice.org
Pls. usage 7-zip to replace WinRAR/WinZip. http://7-zip.org
You can get the truely Freedom 4 software.

俊杰蔡

Apr 3, 2008, 10:52:45 AM
to pyth...@googlegroups.com
In this example, the root cause of Python losing to Perl is probably that everything in Python is an object.

qgg

Apr 3, 2008, 10:53:29 AM
to python-cn`CPyUG`华蟒用户组
I suggest installing psyco, and then:
import psyco
psyco.full()


俊杰蔡

Apr 3, 2008, 11:10:35 AM
to pyth...@googlegroups.com
def test():
    fh = open("test_cjj","w",500000)
    for i in range(5000000):
        data = int(random()*(9999999-1000000)+1000000)

        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

if __name__ == "__main__" :
 
    try:
        import psyco
        psyco.full()
    except ImportError:
        pass


    test()

time ./a.py

real    0m6.066s
user    0m5.892s
sys     0m0.044s


Very nice, very powerful.

Zoom.Quiet

Apr 3, 2008, 11:33:10 AM
to pyth...@googlegroups.com, SPyUG~上海及长江三角区Py用户组, ZPyUG~珠江三角区Py用户组, pyth...@googlegroups.com, CPUG-华东南用户组
2008/4/3 俊杰蔡 <yzcai...@gmail.com>:
> Very nice, very powerful.

Detailed notes are at:
http://wiki.woodpecker.org.cn/moin/MicroProj/2008-04-03

Indeed: very Python, very brutal!


高榕

Apr 3, 2008, 12:46:28 PM
to pyth...@googlegroups.com

Python's efficiency really isn't great; PHP and Perl are both faster, to say nothing of C. This is Python's weak spot.


qgg

Apr 3, 2008, 2:29:40 PM
to python-cn`CPyUG`华蟒用户组
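I think it is unfair to call Python inefficient on the basis of this particular case, because it is hard to guarantee that the file-writing method and the random functions are equivalent to Perl's.

On Apr 4, 12:46 am, "高榕" <mageguo...@gmail.com> wrote:
> Python's efficiency really isn't great; PHP and Perl are both faster, to say nothing of C. This is Python's weak spot.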

boost...@googlemail.com

Apr 3, 2008, 2:40:56 PM
to python-cn`CPyUG`华蟒用户组
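> Python's efficiency really isn't great; PHP and Perl are both faster, to say nothing of C. This is Python's weak spot.
I'm not trying to promote Python; I've simply been forced into this :-(

Can anyone see the difference between the numbers below? That's about 11 times as well! I don't know how to work out the split between real, sys, and user time.
34.6840000153 # original program
3.06199979782 # my program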

import random
import time

__revision__ = '0.1'

def test_org():
    fh = open("test_cjj","w")

    for i in range(5000000):
        data = random.randrange(1000000,9999999,1)
        yu = data % 3
        fh.write(str(yu)+"\n")

    fh.close()

def test_tryme_once():
    fh = open("test_cjj","w")
    ran = random.random
    for i in xrange(5000000):
        data = ran()*(9999999-1000000)+1000000
        yu = data % 3
        fh.write(str(yu)+"\n")
    fh.close()

if __name__ == "__main__" :
    beginTime = time.time()
    test_org()
    endTime = time.time()

    diffTime = endTime - beginTime
    print diffTime
    beginTime = time.time()
    test_tryme_once()
    endTime = time.time()

    diffTime = endTime - beginTime
    print diffTime

boost...@googlemail.com

Apr 3, 2008, 2:46:47 PM
to python-cn`CPyUG`华蟒用户组
psyco is nice. We use it in our programs too, though we have never benchmarked it.
#15.0340001583
#1.40799999237
Results without psyco:
> 34.6840000153 # original program
> 3.06199979782 # my program

qgg

Apr 3, 2008, 2:49:22 PM
to python-cn`CPyUG`华蟒用户组
Your program isn't right, is it? It writes only a single number, and not even an integer.

jerryji

Apr 3, 2008, 10:53:50 PM
to python-cn`CPyUG`华蟒用户组
Generating all the data in memory and writing it in one go (OK, that's cheating) improves efficiency by about 30% without psyco:

$ cat rand7.py
#!/usr/bin/env python

import random

five_million = 5000000

def gen_yu():
    for i in xrange(five_million):
        data = int(random.random()*(9999999))
        yu = data % 3
        yield yu

def test():
    fh = file(str(five_million), 'w')
    content = '\n'.join([str(m) for m in gen_yu()])
    fh.write(content)
    fh.close()

if __name__ == '__main__':
    test()

$ time ./rand7.py
real 0m8.221s
user 0m8.145s

$ cat rand2.py
#!/usr/bin/env python

import random

def test():
    five_million = 5000000
    fh = file(str(five_million), 'w')
    for i in xrange(five_million):
        data = int(random.random()*(9999999))
        yu = data % 3
        fh.write('%d\n' % yu)
    fh.close()

if __name__ == '__main__':
    test()

$ time ./rand2.py
real 0m11.582s
user 0m11.521s
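
A Python 3 rendering of the generate-in-memory, write-once idea from rand7.py could look roughly like this. It is a sketch rather than a port: gen_remainders and write_joined are hypothetical names, and unlike rand7.py it adds a trailing newline so that the last value also ends its line.

```python
import io
import random


def gen_remainders(count):
    """Yield `count` random remainders as strings."""
    rand = random.random
    for _ in range(count):
        yield str(int(rand() * 9999999) % 3)


def write_joined(fh, count):
    """Build the whole output in memory and write it with a single call."""
    fh.write("\n".join(gen_remainders(count)))
    fh.write("\n")  # trailing newline so every value ends its own line


if __name__ == "__main__":
    buf = io.StringIO()  # in-memory stand-in for a real file
    write_joined(buf, 10)
    print(buf.getvalue(), end="")
```

The trade-off is memory: the joined string for 5,000,000 lines has to fit in RAM before anything reaches the disk.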

Jerry


Zoom.Quiet

Apr 4, 2008, 1:50:39 AM
to pyth...@googlegroups.com, ZPyUG~珠江三角区Py用户组, SPyUG~上海及长江三角区Py用户组, CPUG-华东南用户组
After switching to iteration, accelerating with psyco actually brings no benefit:
http://wiki.woodpecker.org.cn/moin/MicroProj/2008-04-03

In the end the most intuitive approach is the fastest: Simple is better ;)
That said, before bringing in C acceleration, iterating with a generator is indeed faster.

boost...@googlemail.com

Apr 4, 2008, 3:36:18 AM
to python-cn`CPyUG`华蟒用户组


On 3 Apr., 20:49, qgg <qggjo...@msn.com> wrote:
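> Your program isn't right, is it? It writes only a single number, and not even an integer.
My program is correct; the copy & paste just shifted the indentation. Your other point is right: I did not use integers. After switching to integers the results are
#38.3469998837
#6.48200011253
and with psyco
#15.8539998531
#3.09300017357

At least six times faster!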

boost...@googlemail.com

Apr 4, 2008, 3:41:13 AM
to python-cn`CPyUG`华蟒用户组


On 4 Apr., 04:53, jerryji <jerryji1...@gmail.com> wrote:
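> Generating all the data in memory and writing it in one go (OK, that's cheating) improves efficiency by about 30% without psyco:
> data = int(random.random()*(9999999))

Look at my code: ran = random.random first, and then use that. This is advice from an EXPERT.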

yuting cui

Apr 4, 2008, 9:50:37 AM
to pyth...@googlegroups.com
Uh... randrange carries a lot of extra overhead so that it stays statistically correct no matter how large the numbers you pass in.

jerryji

Apr 4, 2008, 10:22:40 AM
to python-cn`CPyUG`华蟒用户组
With all due respect, I don't see any difference between:

<code type="python">
rand = random.random
rand()
</code>

and

<code type="python">
random.random()
</code>

Is there any reference for the expert advice?

Thanks.

Jerry


boost...@googlemail.com

Apr 4, 2008, 1:00:18 PM
to python-cn`CPyUG`华蟒用户组
> With all due respect, I don't see any difference between:
rand = random.random
type(rand)
<type 'builtin_function_or_method'>
type(random)
<type 'module'>

If you use random.random(), Python first looks up the name random (the module), then looks up the attribute random on it, and only then calls the function.
With rand = random.random, that attribute lookup is done once, and rand refers to the builtin function directly.

I have read that somewhere before.
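
One way to see whether the alias matters is to time both spellings. The following is a minimal Python 3 sketch with made-up helper names (with_attribute_lookup, with_local_alias); how large the gap is, if any, depends on the interpreter version.

```python
import random
import timeit


def with_attribute_lookup(n):
    total = 0.0
    for _ in range(n):
        total += random.random()  # global lookup of `random`, then attribute lookup
    return total


def with_local_alias(n):
    rand = random.random  # resolve the attribute once, outside the loop
    total = 0.0
    for _ in range(n):
        total += rand()  # plain local-variable lookup
    return total


if __name__ == "__main__":
    for func in (with_attribute_lookup, with_local_alias):
        seconds = timeit.timeit(lambda: func(100000), number=10)
        print(func.__name__, round(seconds, 3))
```

Both functions do the same work, so any difference in the printed times comes from how the callable is looked up on each iteration.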

俊杰蔡

Apr 4, 2008, 2:03:43 PM
to pyth...@googlegroups.com
cjj.c:
#include <stdio.h>
#include <stdlib.h>  /* rand() */
void amaze()
{
        FILE *fp;
        int i,num;
        fp = fopen("test_cjj","w");
        for(i=0;i<5000000;i++)
        {
                num = (rand()%9000000+1000000) % 3;
                fprintf(fp,"%d\n",num);   
        }
        fclose(fp);
}

cjj.i:
%module cjj
%{
extern void amaze();
%}
extern void amaze();

swig -python cjj.i
gcc -c cjj.c cjj_wrap.c -I/usr/include/python2.5
ld -shared cjj.o cjj_wrap.o -o _cjj.so

a.py:
import cjj
__revision__ = '0.1'

if __name__ == "__main__" :
    try:
        import psyco
        psyco.full()
    except ImportError:
        pass
    cjj.amaze()

time ./a.py

real    0m1.271s
user    0m1.228s
sys     0m0.036s


That's the charm of Python!



小龙

Apr 4, 2008, 4:17:55 PM
to pyth...@googlegroups.com
How were the execution times measured? With a Linux command, or with some tool?



--
deSign thE  fuTure
http://www.freeis.cn/

Zoom.Quiet

Apr 4, 2008, 10:20:11 PM
to pyth...@googlegroups.com, ZPyUG~珠江三角区Py用户组, SPyUG~上海及长江三角区Py用户组
Well said! This can count as the ultimate version! Recorded here:
http://wiki.woodpecker.org.cn/moin/MicroProj/2008-04-03


Let's hope nobody comes back with an assembly version.. ;)

lan haiping

Apr 5, 2008, 12:28:23 AM
to pyth...@googlegroups.com
time
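
The time shell command reports real, user and sys for the whole process. To get a similar split from inside a script, something like the sketch below may work on Python 3; measure is a hypothetical helper, and os.times() only covers CPU time spent by the current process.

```python
import os
import time


def measure(func, *args):
    """Return (real, user, sys) seconds spent running func(*args) in this process."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func(*args)
    cpu_end = os.times()
    real = time.perf_counter() - wall_start
    return real, cpu_end.user - cpu_start.user, cpu_end.system - cpu_start.system


if __name__ == "__main__":
    real, user, sys_time = measure(sum, range(1000000))
    print("real %.3fs  user %.3fs  sys %.3fs" % (real, user, sys_time))
```

Unlike the shell command, this leaves out interpreter start-up, so the numbers are not directly comparable with the figures posted in this thread.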




--
Hai-Ping Lan
Department of Electronics ,
Peking University , Bejing, 100871
lanha...@gmail.com, hp...@pku.edu.cn