C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite2.py')"
raw times: 123 126 125
3 loops, best of 3: 41 sec per loop
C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite5.py')"
raw times: 34 34.3 34
3 loops, best of 3: 11.3 sec per loop
C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite6.py')"
raw times: 0.4 0.447 0.391
3 loops, best of 3: 130 msec per loop
If you can just copy a preexisting file, that will surely get you to the speed you need, but building the data with cStringIO first can reduce the time by 72% (from 41s to 11.3s per loop).
Strangely, I just realised that these scripts take the same time to complete no matter which drive I run them on. The results are the same for an SSD (main drive) and an HDD.
I find it very strange that it takes 11.3s to write 50MB (4.4MB/s) sequentially to an SSD that is capable of 140MB/s.
Is that a Python problem? Why does it take the same time on the HDD?
### fastwrite2.py ### <<< this is your code
size = 50*1024*1024
value = 0
filename = 'fastwrite2.dat'
with open(filename, "w") as f:
    while f.tell() < size:
        f.write("{0}\n".format(value))
        value += 1
f.close()
### fastwrite5.py ###
import cStringIO
size = 50*1024*1024
value = 0
filename = 'fastwrite5.dat'
x = 0
b = cStringIO.StringIO()
while x < size:
    line = '{0}\n'.format(value)
    b.write(line)
    value += 1
    x += len(line)+1
f = open(filename, 'w')
f.write(b.getvalue())
f.close()
b.close()
### fastwrite6.py ###
import shutil
src = 'fastwrite.dat'
dst = 'fastwrite6.dat'
shutil.copyfile(src, dst)
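
One way to look into why the drive makes no difference is to time the two phases separately: building the text in memory, and writing it out. The sketch below is written for Python 3 (io.StringIO stands in for cStringIO); build_text, time_phases and the small default size are made-up names and values for illustration, not part of the scripts above.

```python
import io
import os
import tempfile
import time


def build_text(size):
    """Build at least 'size' characters of newline-terminated integers."""
    buf = io.StringIO()
    written = 0
    value = 0
    while written < size:
        line = '{0}\n'.format(value)
        buf.write(line)
        written += len(line)
        value += 1
    return buf.getvalue()


def time_phases(size=1024 * 1024):
    """Return (build_seconds, write_seconds, text_length) for one run."""
    t0 = time.perf_counter()
    text = build_text(size)
    t1 = time.perf_counter()
    fd, path = tempfile.mkstemp(suffix='.dat')
    try:
        # newline='\n' disables newline translation, so the bytes written
        # match len(text) on every platform.
        with os.fdopen(fd, 'w', newline='\n') as f:
            f.write(text)
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    return t1 - t0, t2 - t1, len(text)


if __name__ == '__main__':
    build, write, length = time_phases()
    print('build: %.3fs  write: %.3fs  chars: %d' % (build, write, length))
```

If the build phase turns out to dominate, the job is bound by the interpreter rather than by the disk, which would be consistent with identical timings on an SSD and an HDD.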
----------------------------------------
> Date: Fri, 17 May 2013 07:58:43 -0400
> From: da...@davea.name
> To: pytho...@python.org
> Subject: Re: How to write fast into a file in python?
>
> On 05/17/2013 12:35 AM, lokesh...@gmail.com wrote:
>> On Friday, May 17, 2013 8:50:26 AM UTC+5:30, lokesh...@gmail.com wrote:
>>> I need to write numbers into a file upto 50mb and it should be fast
>>>
>>> can any one help me how to do that?
>>>
>>> i had written the following code..
>>>
>>> <SNIP>
>>> value = 0
>>>
>>> with open(filename, "w") as f:
>>>
>>> while f.tell()< size:
>>>
>>> f.write("{0}\n".format(value))
>>> <SNIP more double-spaced nonsense from googlegroups>
> If you must use googlegroups, at least read this
> http://wiki.python.org/moin/GoogleGroupsPython.
>>>
>>>
>>> it takes about 20sec i need 5 to 10 times less than that.
>> size = 50mb
>>
>
> Most of the time is spent figuring out whether the file has reached its
> limit size. If you want Python to go fast, just specify the data. On
> my Linux system, it takes 11 seconds to write the first 6338888 values,
> which is just under 50mb. If I write the obvious loop, writing that
> many values takes .25 seconds.
>
> --
> DaveA
It takes about 0.2s for f.write() to return, presumably because it writes to the system file cache (~250MB/s).
Using a slightly different approach, I got:
C:\src\Python>python -m timeit -cvn3 -r3 -s"from fastwrite5r import run" "run()"
raw times: 24 25.1 24.4
3 loops, best of 3: 8 sec per loop
This time it took 8s to complete, down from the previous 11.3s.
Are those 3.3s the time for the "open, read, parse, compile" steps you told me about?
If so, the execute step is really taking 8s, right?
Why does it take so long to build the string to be written? Can it get faster?
Thanks in advance!
### fastwrite5r.py ###
def run():
    import cStringIO
    size = 50*1024*1024
    value = 0
    filename = 'fastwrite5.dat'
    x = 0
    b = cStringIO.StringIO()
    while x < size:
        line = '{0}\n'.format(value)
        b.write(line)
        value += 1
        x += len(line)+1
    f = open(filename, 'w')
    f.write(b.getvalue())
    f.close()
    b.close()
if __name__ == '__main__':
    run()
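
On the "can it get faster?" question, one commonly suggested alternative is to drop the per-line format() and write() calls and let str.join assemble the text in a single pass. This is a minimal Python 3 sketch, assuming the number of values is known up front; numbers_text is a hypothetical helper name, and no speedup figure is claimed because that depends on the interpreter and the machine.

```python
def numbers_text(count):
    """Return '0\\n1\\n...' for the first 'count' non-negative integers."""
    if count <= 0:
        return ''
    # In CPython, map(str, ...) and str.join iterate in C, so there is
    # no Python-level loop body executed once per line.
    return '\n'.join(map(str, range(count))) + '\n'


if __name__ == '__main__':
    print(repr(numbers_text(5)))
```

The resulting string can then be handed to a single f.write() call, as fastwrite5r.py does with b.getvalue().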
----------------------------------------
> From: steve+comp....@pearwood.info
> Subject: Re: How to write fast into a file in python?
> Date: Fri, 17 May 2013 16:42:55 +0000
> To: pytho...@python.org
Thanks a lot!!!
> Oh, I forgot to mention: you have a bug in this function. You're already
> including the newline in the len(line), so there is no need to add one.
> The result is that you only generate 44MB instead of 50MB.
That's because I'm running on Windows.
What's the fastest way to check if '\n' translates to 2 bytes on file?
> Here are the results of profiling the above on my computer. Including the
> overhead of the profiler, it takes just over 50 seconds to run your file
> on my computer.
>
> [steve@ando ~]$ python -m cProfile fastwrite5.py
> 17846645 function calls in 53.575 seconds
>
I didn't know about the cProfile module. Thanks a lot!
> Ordered by: standard name
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 30.561 30.561 53.575 53.575 fastwrite5.py:1(<module>)
> 1 0.000 0.000 0.000 0.000 {cStringIO.StringIO}
> 5948879 5.582 0.000 5.582 0.000 {len}
> 1 0.004 0.004 0.004 0.004 {method 'close' of 'cStringIO.StringO' objects}
> 1 0.000 0.000 0.000 0.000 {method 'close' of 'file' objects}
> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
> 5948879 9.979 0.000 9.979 0.000 {method 'format' of 'str' objects}
> 1 0.103 0.103 0.103 0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
> 5948879 7.135 0.000 7.135 0.000 {method 'write' of 'cStringIO.StringO' objects}
> 1 0.211 0.211 0.211 0.211 {method 'write' of 'file' objects}
> 1 0.000 0.000 0.000 0.000 {open}
>
>
> As you can see, the time is dominated by repeatedly calling len(),
> str.format() and StringIO.write() methods. Actually writing the data to
> the file is quite a small percentage of the cumulative time.
>
> So, here's another version, this time using a pre-calculated limit. I
> cheated and just copied the result from the fastwrite5 output :-)
>
> # fasterwrite.py
> filename = 'fasterwrite.dat'
> with open(filename, 'w') as f:
> for i in xrange(5948879): # Actually only 44MB, not 50MB.
> f.write('%d\n' % i)
>
I had the same idea but kept the original method because I didn't want to waste time creating a function for calculating the actual number of iterations needed to deliver 50MB of data. ;)
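
For reference, the iteration count can be computed without generating any data, because every value with d digits occupies d bytes plus the newline. The following is only a sketch in Python 3; lines_for_size and its newline_len parameter are hypothetical names introduced here (newline_len=2 would model '\r\n' on disk).

```python
def lines_for_size(target_bytes, newline_len=1):
    """Return how many lines '0', '1', '2', ... are needed to reach target_bytes."""
    if target_bytes <= 0:
        return 0
    count = 0    # lines accounted for so far
    total = 0    # bytes accounted for so far
    digits = 1   # width of the values in the current block
    first = 0    # first value that has this many digits
    while True:
        last = 10 ** digits - 1
        per_line = digits + newline_len
        block = (last - first + 1) * per_line
        if total + block >= target_bytes:
            remaining = target_bytes - total
            # Ceiling division: lines of this width needed to cover the rest.
            return count + -(-remaining // per_line)
        total += block
        count += last - first + 1
        first = last + 1
        digits += 1


if __name__ == '__main__':
    print(lines_for_size(50 * 1024 * 1024))
```

The arithmetic agrees with the figures quoted in this thread: 5948879 lines add up to exactly 46479922 bytes, the size dd reports for fasterwrite.dat.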
> And the profile results are about twice as fast as fastwrite5 above, with
> only 8 seconds in total writing to my HDD.
>
> [steve@ando ~]$ python -m cProfile fasterwrite.py
> 5948882 function calls in 28.840 seconds
>
> Ordered by: standard name
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 20.592 20.592 28.840 28.840 fasterwrite.py:1(<module>)
> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
> 5948879 8.229 0.000 8.229 0.000 {method 'write' of 'file' objects}
> 1 0.019 0.019 0.019 0.019 {open}
>
I thought there would be a call to the format method for "'%d\n' % i". It seems the % operator is a lot faster than format.
I only stopped using it because I read it was going to be deprecated. :(
Why replace such a great and fast operator with a slow method? I mean, why is format being preferred over %?
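
The two spellings can be compared directly with the timeit module instead of reading the difference off a profile. This is a rough Python 3 sketch; compare_formatting and the sample value 12345 are arbitrary choices made here, and the result will vary with the interpreter version and the machine, so no winner is asserted.

```python
import timeit


def compare_formatting(number=200000):
    """Time '%d\\n' % i against '{0}\\n'.format(i); return seconds for each."""
    # i is defined in the setup so that neither statement is a constant
    # expression the compiler could fold away.
    percent = timeit.timeit("'%d\\n' % i", setup='i = 12345', number=number)
    fmt = timeit.timeit("'{0}\\n'.format(i)", setup='i = 12345', number=number)
    return percent, fmt


if __name__ == '__main__':
    percent, fmt = compare_formatting()
    print('%% operator: %.3fs   str.format: %.3fs' % (percent, fmt))
```

Both forms produce the same string for an integer argument, so the choice here is about speed and style only.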
> Without the overhead of the profiler, it is a little faster:
>
> [steve@ando ~]$ time python fasterwrite.py
>
> real 0m16.187s
> user 0m13.553s
> sys 0m0.508s
>
>
> It is still slower than the heavily optimized dd command,
> but not unreasonably slow for a high-level language:
>
> [steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
> 90781+1 records in
> 90781+1 records out
> 46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s
>
> real 0m0.786s
> user 0m0.071s
> sys 0m0.595s
>
>
>
>
> --
> Steven
x += len(line)+len(os.linesep)-1
Not sure if it's the fastest way to achieve that. :/
On 17 May 2013 19:38, "Carlos Nepomuceno" <carlosne...@outlook.com> wrote:
>
> Think the following update will make the code more portable:
>
> x += len(line)+len(os.linesep)-1
>
> Not sure if it's the fastest way to achieve that. :/
>
Putting len(os.linesep)'s value into a local variable will make accessing it quite a bit faster. But why would you want to do that?
You mentioned "\n" translating to two bytes, but this won't happen. Windows will not mess with what you write to your file. It's just that traditionally Windows and Windows programs use \r\n instead of just \n. I think it was for compatibility with OS/2 or Macintosh (I don't remember which), which used \r for newlines.
You don't have to follow this convention. If you open a \n-separated file with *any* text editor other than Notepad, your newlines will be okay.
Internal representations only keep '\n' for simplicity, but if you wanna keep track of the file length you have to take that into account. ;)
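
Whether the translation happens depends on how the file is opened, and it can be checked empirically by writing a single '\n' and looking at the file size. The sketch below is for Python 3, where open() accepts a newline argument; bytes_per_newline is a name invented for this example. In Python 2 the equivalent switch is the mode: 'w' translates on Windows, 'wb' does not.

```python
import os
import tempfile


def bytes_per_newline(newline=None):
    """Write one '\\n' in text mode and return how many bytes reach the file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'w', newline=newline) as f:
            f.write('\n')
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    # newline=None is the default: '\n' is translated to os.linesep.
    print('default text mode:', bytes_per_newline())
    # newline='\n' turns the translation off.
    print('translation off:  ', bytes_per_newline('\n'))
```

With the default setting the result equals len(os.linesep), which is why a byte counter that wants to match the size on disk has to add the extra byte on Windows.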
________________________________
> Date: Sat, 18 May 2013 08:49:55 +0100
> Subject: RE: How to write fast into a file in python?
> From: fabiosa...@gmail.com
> To: carlosne...@outlook.com
> CC: pytho...@python.org
>
>
> On 17 May 2013 19:38, "Carlos Nepomuceno"
> <carlosne...@outlook.com<mailto:carlosne...@outlook.com>>
> wrote:
> >
> > Think the following update will make the code more portable:
> >
> > x += len(line)+len(os.linesep)-1
> >
> > Not sure if it's the fastest way to achieve that. :/
> >
>
Indeed! My mistake just made me find out that Acorn used that inversion on Acorn MOS.
According to this[1] (at page 449) the OSNEWL routine outputs '\n\r'.
What the hell were those guys thinking??? :p
"OSNEWL
This call issues an LF CR (line feed, carriage return) to the currently selected
output stream. The routine is entered at &FFE7."
I think the main difference between your create_file_numbers_file_like()
and the fastwrite5.py I sent earlier is that I've used cStringIO
instead of StringIO. It took 12s less using cStringIO.
My numbers are much greater, but I've used Python 2.7.5 instead:
C:\src\Python>python create_file_numbers.py
time taken to write a file of size 52428800 is 39.1199457743 seconds
time taken to write a file of size 52428800 is 14.8704800436 seconds
time taken to write a file of size 52428800 is 23.0011990985 seconds
I've downloaded bufsock.py and python2x3.py. For the latter, it was hard to extract the source code from the web page.
Can I use them in my projects? I'm not familiar with the UCI license[1]. How does it differ from the GPL?
[1] http://stromberg.dnsalias.org/~dstromberg/UCI-license.html
________________________________
> Date: Sat, 18 May 2013 12:38:30 -0700
> Subject: Re: How to write fast into a file in python?
> From: drsa...@gmail.com
> To: lokesh...@gmail.com
> CC: pytho...@python.org
>
>
> With CPython 2.7.3:
> ./t
> time taken to write a file of size 52428800 is 15.86 seconds
>
> time taken to write a file of size 52428800 is 7.91 seconds
>
> time taken to write a file of size 52428800 is 9.64 seconds
>
>
> With pypy-1.9:
> ./t
> time taken to write a file of size 52428800 is 3.708232 seconds
>
> time taken to write a file of size 52428800 is 4.868304 seconds
>
> time taken to write a file of size 52428800 is 1.93612 seconds
>
> Here's the code:
> #!/usr/local/pypy-1.9/bin/pypy
> #!/usr/bin/python
>
> import sys
> import time
> import StringIO
>
> sys.path.insert(0, '/usr/local/lib')
> import bufsock
>
> def create_file_numbers_old(filename, size):
>     start = time.clock()
>
>     value = 0
>     with open(filename, "w") as f:
>         while f.tell() < size:
>             f.write("{0}\n".format(value))
>             value += 1
>
>     end = time.clock()
>
>     print "time taken to write a file of size", size, " is ", (end - start), "seconds \n"
>
> def create_file_numbers_bufsock(filename, intended_size):
>     start = time.clock()
>
>     value = 0
>     with open(filename, "w") as f:
>         bs = bufsock.bufsock(f)
>         actual_size = 0
>         while actual_size < intended_size:
>             string = "{0}\n".format(value)
>             actual_size += len(string) + 1
>             bs.write(string)
>             value += 1
>         bs.flush()
>
>     end = time.clock()
>
>     print "time taken to write a file of size", intended_size, " is ", (end - start), "seconds \n"
>
>
> def create_file_numbers_file_like(filename, intended_size):
>     start = time.clock()
>
>     value = 0
>     with open(filename, "w") as f:
>         file_like = StringIO.StringIO()
>         actual_size = 0
>         while actual_size < intended_size:
>             string = "{0}\n".format(value)
>             actual_size += len(string) + 1
>             file_like.write(string)
>             value += 1
>         file_like.seek(0)
>         f.write(file_like.read())
>
>     end = time.clock()
>
>     print "time taken to write a file of size", intended_size, " is ", (end - start), "seconds \n"
>
> create_file_numbers_old('output.txt', 50 * 2**20)
> create_file_numbers_bufsock('output2.txt', 50 * 2**20)
> create_file_numbers_file_like('output3.txt', 50 * 2**20)
http://stromberg.dnsalias.org/svn/bufsock/trunk/bufsock.py
http://stromberg.dnsalias.org/~dstromberg/backshift/documentation/html/python2x3-pysrc.html
Are those the latest versions?
----------------------------------------
> From: carlosne...@outlook.com
> To: pytho...@python.org
> Subject: RE: How to write fast into a file in python?
> Date: Sun, 19 May 2013 08:31:08 +0300
> CC: lokesh...@gmail.com
>
> Thanks Dan! I've never used CPython or PyPy. Will try them later.
>
> I think the main difference between your create_file_numbers_file_like()
> and the fastwrite5.py I sent earlier is that I've used cStringIO
> instead of StringIO. It took 12s less using cStringIO.
>
> My numbers are much greater, but I've used Python 2.7.5 instead:
>
> C:\src\Python>python create_file_numbers.py
> time taken to write a file of size 52428800 is 39.1199457743 seconds
>
> time taken to write a file of size 52428800 is 14.8704800436 seconds
>
> time taken to write a file of size 52428800 is 23.0011990985 seconds
>
>
> I've downloaded bufsock.py and python2x3.py. For the latter, it was hard to extract the source code from the web page.
>
> Can I use them in my projects? I'm not familiar with the UCI license[1]. How does it differ from the GPL?
>
>
>
>
> [1] http://stromberg.dnsalias.org/~dstromberg/UCI-license.html
>
----------------------------------------
> Date: Sun, 19 May 2013 19:21:54 +1000
> Subject: Re: How to write fast into a file in python?
> From: ros...@gmail.com
> To: pytho...@python.org
Dirty deeds done dirt cheap! lol
----------------------------------------
> Date: Sun, 19 May 2013 16:44:55 +0100
> From: pyt...@mrabarnett.plus.com
> To: pytho...@python.org
> Subject: Re: How to write fast into a file in python?
>
> On 19/05/2013 04:53, Carlos Nepomuceno wrote:
>> ----------------------------------------
>>> Date: Sat, 18 May 2013 22:41:32 -0400
>>> From: da...@davea.name
>>> To: pytho...@python.org
>>> Subject: Re: How to write fast into a file in python?
>>>
>>> On 05/18/2013 01:00 PM, Carlos Nepomuceno wrote:
>>>> Python really writes '\n\r' on Windows. Just check the files.
>>>
>>> That's backwards. '\r\n' on Windows, IF you omit the b in the mode when
>>> creating the file.
>>
>> Indeed! My mistake just made me find out that Acorn used that inversion on Acorn MOS.
>>
>> According to this[1] (at page 449) the OSNEWL routine outputs '\n\r'.
>>
>> What the hell were those guys thinking??? :p
>>
> Doing it that way saved a few bytes.
>
> Code was something like this:
>
> FFE3 .OSASCI CMP #&0D
> FFE5 BNE OSWRCH
> FFE7 .OSNEWL LDA #&0A
> FFE9 JSR OSWRCH
> FFEC LDA #&0D
> FFEE .OSWRCH ...
>
> This means that the contents of the accumulator would always be
> preserved by a call to OSASCI.
>
>> "OSNEWL
>> This call issues an LF CR (line feed, carriage return) to the currently selected
>> output stream. The routine is entered at &FFE7."
>>
>> [1] http://regregex.bbcmicro.net/BPlusUserGuide-1.07.pdf
>>
>