Something like the Python equivalent of curl http://url.com/file.xml |
head -c 2048
Thanks!
Erik
If you're OK calling curl and head from within python:
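from subprocess import Popen, PIPE
url = "http://docs.python.org/"
# -s silences curl's progress meter, so stderr needs no pipe of its own
p1 = Popen(["curl", "-s", url], stdout = PIPE)
p2 = Popen(["head", "-c", "1024"], stdin = p1.stdout, stdout = PIPE)
p1.stdout.close()  # lets curl receive SIGPIPE once head exits
p2.communicate()[0]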
If you want a pure python approach:
import urllib2
url = "http://docs.python.org/"
req = urllib2.Request(url)
f = urllib2.urlopen(req)
f.read(1024)
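For anyone reading this on Python 3, where urllib2 was folded into urllib.request, the same idea might look roughly like the sketch below. The helper name fetch_prefix and its nbytes parameter are made up for illustration; only urlopen and read come from the standard library.

```python
import urllib.request


def fetch_prefix(url, nbytes=1024):
    """Return at most the first nbytes bytes of the resource at url.

    fetch_prefix is a hypothetical helper name, not a library function.
    """
    # The with-block closes the connection as soon as the prefix is read.
    with urllib.request.urlopen(url) as response:
        return response.read(nbytes)
```

Calling fetch_prefix("http://docs.python.org/") should then behave like the urllib2 snippet above.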
HTH,
Ben
urllib.urlopen gives you a file-like object, which you can then read
line by line or in fixed-size chunks. For example:
import urllib
chunk = urllib.urlopen('http://url.com/file.xml').read(2048)
At that point, chunk is just bytes, which you can write to a local
file, print, or whatever it is you want.
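As a rough illustration of the "fixed-size chunks" option, here is a small sketch that reads from any file-like object until a byte limit is reached. The function name read_up_to and the io.BytesIO stand-in for a network response are assumptions made for the example, not part of urllib.

```python
import io


def read_up_to(fileobj, limit, chunk_size=512):
    """Read from fileobj in chunk_size pieces until limit bytes or EOF."""
    pieces = []
    remaining = limit
    while remaining > 0:
        # Never ask for more than is still needed to reach the limit.
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # an empty read means end of stream
            break
        pieces.append(chunk)
        remaining -= len(chunk)
    return b"".join(pieces)


# io.BytesIO stands in for the object returned by urlopen.
fake_response = io.BytesIO(b"x" * 5000)
prefix = read_up_to(fake_response, 2048)
```

The same function should work unchanged on the object returned by urlopen, since it only relies on read().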
John
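As the OP wants to save bandwidth, it's better to ask for exactly the amount
of data needed. That is, add a Range header field [1] to the request, and
inspect the response for a corresponding Content-Range header [2]. A server
that does not support ranges may ignore the header and send the whole
resource with a plain 200 status, so check the status code as well.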
py> import urllib2
py> url = "http://www.python.org/"
py> req = urllib2.Request(url)
py> req.add_header('Range', 'bytes=0-10239') # first 10K
py> f = urllib2.urlopen(req)
py> data = f.read()
py> print repr(data[-30:]), len(data)
'\t <a href="http://www.zope.' 10240
py> f.headers['Content-Range']
'bytes 0-10239/18196'
py> f.getcode()
206 # 206=Partial Content
py> f.close()
[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16
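The session above could be wrapped up along these lines for Python 3. This is only a sketch: the three function names are hypothetical, and the fallback for servers that ignore Range is my own assumption about what a caller would want.

```python
import re
import urllib.request


def make_range_request(url, nbytes):
    """Build a request asking only for the first nbytes bytes of url."""
    req = urllib.request.Request(url)
    # Byte ranges are inclusive, so the first N bytes are 0..N-1.
    req.add_header("Range", "bytes=0-%d" % (nbytes - 1))
    return req


def parse_content_range(value):
    """Split 'bytes first-last/total' into ints; total is None if '*'."""
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", value)
    if match is None:
        raise ValueError("unrecognised Content-Range: %r" % value)
    first, last, total = match.groups()
    return int(first), int(last), (None if total == "*" else int(total))


def fetch_first_bytes(url, nbytes):
    """Return (status, data) for the first nbytes bytes of url."""
    with urllib.request.urlopen(make_range_request(url, nbytes)) as response:
        # 206 means the range was honoured. A plain 200 means the server
        # ignored it, so the read is capped here as a fallback.
        return response.status, response.read(nbytes)
```

With the header from the session, parse_content_range('bytes 0-10239/18196') gives (0, 10239, 18196).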
--
Gabriel Genellina
No, the entire file is not downloaded. My understanding (which could be
wrong) is that the output of curl is piped to head, and once head has read
the first 2k it exits, which closes its end of the pipe. When curl then tries
to write to the pipe again, it is sent SIGPIPE, at which point it exits.
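One way to check this without touching the network is the sketch below, which assumes a POSIX system with yes and head on the PATH. Here yes stands in for curl as a producer that never stops writing by itself; the function name is made up for the example.

```python
import signal
from subprocess import Popen, PIPE


def head_of_endless_stream(nbytes=2048):
    """Pipe an endless producer into head and report how the producer ended."""
    producer = Popen(["yes"], stdout=PIPE)  # writes "y\n" forever
    consumer = Popen(["head", "-c", str(nbytes)],
                     stdin=producer.stdout, stdout=PIPE)
    # Drop the parent's copy of the read end so head is the only reader;
    # otherwise the producer would block instead of being signalled.
    producer.stdout.close()
    data = consumer.communicate()[0]
    returncode = producer.wait()
    return data, returncode


data, returncode = head_of_endless_stream()
# On POSIX, Popen reports death by signal as a negative return code.
killed_by_sigpipe = (returncode == -signal.SIGPIPE)
```

If the explanation holds, the call returns promptly with exactly 2048 bytes, and killed_by_sigpipe tells you whether the producer was ended by the signal.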
Cheers,
Ben