import urllib2
url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
a = open('adobe.pdf', 'w')
for line in urllib2.urlopen(url):
    a.write(line)
PDF is /not/ text. You're processing it as if it were a text file (and
storing it as text, which on Windows is most likely a no-no, since text
mode translates each "\n" into "\r\n" when writing).
import urllib2
url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
response = urllib2.urlopen(url)
fh = open('adobe.pdf', 'wb')
fh.write(response.read())
fh.close()
response.close()
--
John Bokma j3b
Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
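To see what binary mode buys you, here is a small sketch (not from the thread). It writes a few PDF-like bytes to a temporary file opened with 'wb' and reads them back with 'rb'. The bytes include a CR/LF pair, high-bit bytes and a NUL. Binary mode performs no newline translation, so the data should come back byte-for-byte on any platform. The sample bytes and the temporary file are illustrative assumptions.

```python
import os
import tempfile

# Illustrative bytes resembling the start of a PDF: a CR/LF pair,
# high-bit bytes and a NUL, all of which must survive unchanged.
SAMPLE = b'%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\n\x00binary\r\n'

fd, path = tempfile.mkstemp(suffix='.pdf')
os.close(fd)  # reopened by name below

# 'wb' / 'rb': binary mode, no newline translation on any platform
with open(path, 'wb') as out:
    out.write(SAMPLE)
with open(path, 'rb') as back:
    round_trip = back.read()
os.remove(path)

print(round_trip == SAMPLE)  # True
```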
Sure you don't need this to be 'wb' instead of 'w'?
> for line in urllib2.urlopen(url):
>     a.write(line)
I also don't know if this "for line...a.write(line)" loop is
doing newline translation. If it's a binary file, you should use
.read() (perhaps with a modest-sized block size, writing it in a
loop if the file can end up being large).
-tkc
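Reading in modest-sized blocks is what the standard library's shutil.copyfileobj(fsrc, fdst, length) already does. It calls fsrc.read(length) until it gets an empty result and writes each block to fdst. The sketch below uses io.BytesIO objects as stand-ins for the urlopen() response and the output file, which lets it run without a network connection. The 4096-byte block size is an arbitrary choice.

```python
import io
import shutil

# Stand-ins (assumptions): in real use, src would be urllib2.urlopen(url)
# and dst would be open('adobe.pdf', 'wb').
src = io.BytesIO(b'%PDF-1.4\n' + b'\x00\xff' * 10000)
dst = io.BytesIO()

# Copy in 4 KB blocks; only one block is held in memory at a time.
shutil.copyfileobj(src, dst, 4096)

copied = dst.getvalue()
print(len(copied))  # 20009
```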
Two guesses:
First, you need to call a.close() when you're done writing to the file.
This will happen automatically when you have no more references to the
file, but I'm guessing that you're running this code in IDLE or some
other IDE, and a is still a valid reference to the file after you run
that snippet.
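On the first point, a `with` block closes the file as soon as the block ends, even if an exception is raised inside it. It is always enabled as of Python 2.6. Closing the file also flushes it, whether or not the interactive session keeps a reference such as `a` alive. A minimal sketch, where the temporary file is an illustrative assumption:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.pdf')
os.close(fd)

with open(path, 'wb') as a:
    a.write(b'%PDF-1.4\n')
    # ... further writes would go here ...

# The with block has ended, so the file is closed and its data flushed,
# even though the name `a` still refers to the file object.
print(a.closed)  # True
size = os.path.getsize(path)
print(size)  # 9
os.remove(path)
```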
Second, you're treating the pdf file as text (you're assuming it has
lines, you're not writing the file in binary mode, etc.). I don't
know if that's correct for a pdf file. I would do something like this
instead:
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)] on win32
IDLE 2.6.4
>>> import urllib2
>>> url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
>>> a = open('C:/test.pdf', 'wb')
>>> data = urllib2.urlopen(url).read()
>>> a.write(data)
>>> a.close()
That seems to work for me, in that it downloads a 16-page PDF
document, and that document opens without error or any other obvious
problems.
--
Jerry
If you're running Windows, try
a = open('adobe.pdf', 'wb')
[Works for me]
> I used the following code to download a PDF file, but the
> file was invalid after running the code, is there problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
> a = open('adobe.pdf', 'w')
Try 'wb', just in case.
S
> for line in urllib2.urlopen(url):
>     a.write(line)
'wb' does the trick. Thanks all!
Here is the final working code. I used a counter (i) to see how
many reads took place; I have to assume there is a default buffer
size:
import urllib2
a = open('adobe.pdf', 'wb')
i = 0
for line in urllib2.urlopen('http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'):
    i = i + 1
    a.write(line)
print "Number of reads: %d" % i
a.close()
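A note on that count: iterating over the urlopen() response yields newline-terminated lines, as iterating over a file does. So `i` counts the b'\n'-separated pieces of the PDF, not reads of some default buffer size. The underlying socket reads are buffered separately. The sketch below uses io.BytesIO as a stand-in for the response (an assumption so it runs offline), and the sample bytes are made up.

```python
import io

# Made-up binary content with three newline bytes at arbitrary positions.
data = b'%PDF-1.4\n\x00\x01\x02\n' + b'\xff' * 5000 + b'\nend'
response = io.BytesIO(data)  # stand-in for urllib2.urlopen(url)

# Each iteration yields everything up to and including the next b'\n'.
lengths = [len(line) for line in response]
print(len(lengths))  # 4: one per b'\n', plus the trailing b'end'
print(lengths)       # [9, 4, 5001, 3]
```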
NEW QUESTION if y'all are still reading:
Is there an integer increment operation in Python? I tried
using i++ but had to revert to 'i = i + 1'
>
>
> NEW QUESTION if y'all are still reading:
>
> Is there an integer increment operation in Python? I tried
> using i++ but had to revert to 'i = i + 1'
i+=1
<snip>
Nope, but try i += 1.
~Ethan~
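Python has no ++ operator. `i++` is a syntax error. `++i` is accepted but just applies unary plus twice, so it leaves `i` unchanged, which can hide the mistake. A quick check:

```python
i = 5
j = ++i  # parsed as +(+i): no increment happens
assert i == 5 and j == 5

i += 1   # augmented assignment is the idiomatic increment
print(i)  # 6
```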
Instead, use enumerate:
for i, line in enumerate(...):
    ...
Diez
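Here is a sketch of the enumerate() approach. io.BytesIO stands in for both the urlopen() response and the output file, an assumption so it runs offline. enumerate() has accepted a start argument since Python 2.6, so with a start of 1 the last index equals the line count. `i` is initialised first so that it is still defined if the stream turns out to be empty.

```python
import io

response = io.BytesIO(b'line one\nline two\n\x00\xff')  # stand-in for urlopen()
out = io.BytesIO()  # stand-in for open('adobe.pdf', 'wb')

i = 0  # keeps i defined even if the response is empty
for i, line in enumerate(response, 1):
    out.write(line)

print("Number of lines: %d" % i)  # Number of lines: 3
```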
Using a for loop here is still a BAD IDEA -- line could easily end up
megabytes in size (though that is statistically unlikely).
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
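To illustrate the worst case: a "line" is whatever lies between b'\n' bytes, and nothing guarantees that binary data contains any. In this sketch, 5 MB of made-up data with no newline byte comes out of the loop as a single 5 MB line, all of which is held in memory at once.

```python
import io

# Made-up data: 5 MB with no b'\n' byte at all.
blob = b'\x00' * (5 * 1024 * 1024)

# Iterating "by line" yields the whole blob as one line.
lines = [len(line) for line in io.BytesIO(blob)]
print(lines)  # [5242880]
```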
Just so the OP has it, dealing with binary files without reading
the entire content into memory would look something like this:
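from urllib2 import urlopen
URL = 'http://www.whirlpoolwaterheaters.com/downloads/6510413.pdf'
CHUNK_SIZE = 1024*4 # 4k, why not?
OUT_NAME = 'out.pdf'
a = open(OUT_NAME, 'wb')
u = urlopen(URL)
bytes_read = 0
while True:
    data = u.read(CHUNK_SIZE)  # at most CHUNK_SIZE bytes per call
    if not data: break         # empty string means end of stream
    a.write(data)
    bytes_read += len(data)
u.close()
a.close()
print "Wrote %i bytes to %s" % (
    bytes_read, OUT_NAME)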
-tkc
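The same loop can be wrapped in a function that takes any pair of file-like objects, which makes it easy to test without a network connection. `copy_in_chunks` is a hypothetical helper name. io.BytesIO objects stand in for the urlopen() response and the output file.

```python
import io

CHUNK_SIZE = 1024 * 4  # 4k, as in the example above

def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst chunk_size bytes at a time; return bytes copied."""
    bytes_read = 0
    while True:
        data = src.read(chunk_size)
        if not data:  # empty result means end of stream
            break
        dst.write(data)
        bytes_read += len(data)
    return bytes_read

src = io.BytesIO(b'%PDF' + b'\x00' * 10000)  # stand-in for urlopen(URL)
dst = io.BytesIO()                           # stand-in for open(OUT_NAME, 'wb')
n = copy_in_chunks(src, dst)
print("Wrote %i bytes" % n)  # Wrote 10004 bytes
```

Because only one chunk is held at a time, memory use stays bounded by `chunk_size` however large the download is.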