New issue 69 by cuiheng....@gmail.com: is dpkt able to parse the HTTP
response ????
http://code.google.com/p/dpkt/issues/detail?id=69
Hi,
I am currently using dpkt to parse a libpcap-format file. What I want to
do is extract the raw content of the HTTP responses. However, I found that
my script ONLY works for HTTP responses with a small content length
(< 1500 bytes). My script is something like this:
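#!/usr/bin/env python
import dpkt

f = open('test.cap', 'rb')        # pcap files are binary
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    # skip non-IP and non-TCP packets (ARP, UDP, ...), which have no .sport
    if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
        continue
    tcp = ip.data
    if tcp.sport == 80:
        try:
            http2 = dpkt.http.Response(tcp.data)
            print(http2.headers)
        except dpkt.UnpackError:
            # tcp.data is a single segment: a truncated response raises
            # dpkt.NeedData (a subclass of dpkt.UnpackError), and a segment
            # that does not start with a status line is rejected as invalid
            pass
f.close()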
This is just a very simple example that tries to print the HTTP headers of
every response.
However, when I compare the output against the original test.cap file, the
script does NOT print all the response headers. Only responses with small
contents (e.g. < 1500 B) are printed. Why does it behave like that, and is
dpkt really able to extract ALL of the HTTP response contents (e.g. HTML
files)?
P.S. I am using dpkt 1.7 on Fedora.
Thanks for your comments.
heng
By the way, my Python version is 2.4.3.
Comment #2 on issue 69 by dugsong:
dpkt doesn't do any TCP stream reassembly. You need to do that yourself.
Here's an example:
http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/reasm.py
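Why only small responses get through: an Ethernet frame carries at most 1500 bytes of IP packet (the MTU), so a larger HTTP response is split across several TCP segments. dpkt.http.Response then sees only the first segment, fails because the body is short, and the try/except silently swallows the error. Below is a minimal pure-Python sketch of the reassembly step (Python 3, stdlib only). It is not the dsniff reasm.py code; `reassemble` and `group_by_flow` are hypothetical names. It assumes no sequence-number wraparound, no port reuse within the capture, and that the capture contains the start of each connection; it stops at the first gap.

```python
from collections import defaultdict


def reassemble(segments):
    """Rebuild one direction of a TCP byte stream from (seq, payload) pairs.

    Assumptions (sketch only): no sequence-number wraparound, and the lowest
    seq carrying data marks the start of the stream. Stops at the first gap.
    """
    # Drop empty payloads (SYN, pure ACKs) and order segments by sequence number.
    segments = sorted((seq, data) for seq, data in segments if data)
    if not segments:
        return b""
    stream = bytearray()
    start = segments[0][0]
    for seq, data in segments:
        offset = seq - start          # where this payload belongs in the stream
        end = len(stream)
        if offset > end:
            break                     # gap: a segment is missing, stop here
        # Skip bytes we already have (retransmissions or overlapping segments).
        stream += data[end - offset:]
    return bytes(stream)


def group_by_flow(packets):
    """Group (src, sport, dst, dport, seq, payload) tuples by direction.

    Returns {(src, sport, dst, dport): reassembled_bytes}.
    """
    flows = defaultdict(list)
    for src, sport, dst, dport, seq, payload in packets:
        flows[(src, sport, dst, dport)].append((seq, payload))
    return {key: reassemble(segs) for key, segs in flows.items()}
```

In the dpkt loop, each TCP packet would contribute a tuple `(ip.src, tcp.sport, ip.dst, tcp.dport, tcp.seq, tcp.data)`; the rebuilt bytes of a flow with source port 80 can then be handed to dpkt.http.Response in place of a single tcp.data.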
If you're doing it live, you need to do your own stream parsing instead,
e.g. dsniff's HTTP stream parser:
http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/http.py
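With live traffic, bytes arrive piece by piece, so the parser has to buffer each connection and emit a response only once all of it has arrived. A minimal sketch of that idea (Python 3, stdlib only; `HTTPResponseStream` is a hypothetical name, not dsniff's API), assuming bodies are delimited by Content-Length: chunked transfer encoding is rejected, and bodies that end at connection close are not handled.

```python
class HTTPResponseStream:
    """Incrementally split a server-to-client byte stream into HTTP responses.

    Sketch only: supports Content-Length bodies; a missing Content-Length is
    treated as an empty body, and chunked encoding raises NotImplementedError.
    """

    def __init__(self):
        self.buf = b""

    def feed(self, data):
        """Append newly arrived bytes and return the responses completed so far
        as (status_line, headers_dict, body) tuples."""
        self.buf += data
        responses = []
        while True:
            head_end = self.buf.find(b"\r\n\r\n")
            if head_end < 0:
                break                 # headers incomplete: wait for more data
            lines = self.buf[:head_end].decode("iso-8859-1").split("\r\n")
            status_line, headers = lines[0], {}
            for line in lines[1:]:
                name, _, value = line.partition(":")
                headers[name.strip().lower()] = value.strip()
            if headers.get("transfer-encoding", "").lower() == "chunked":
                raise NotImplementedError("chunked bodies are not handled")
            length = int(headers.get("content-length", "0"))
            body_start = head_end + 4
            if len(self.buf) < body_start + length:
                break                 # body incomplete: wait for more data
            body = self.buf[body_start:body_start + length]
            self.buf = self.buf[body_start + length:]   # keep any pipelined remainder
            responses.append((status_line, headers, body))
        return responses
```

One instance would be kept per server-to-client connection and fed each segment's payload in sequence order. Treating a missing Content-Length as an empty body is right for e.g. 304 responses but wrong for responses that end at connection close.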
Good luck!
Thanks dugsong,
In that case, maybe dpkt is not the best option for parsing HTTP responses.
Do you have any idea which Python modules might be a better choice for
parsing HTTP out of a libpcap file?