Issue 69 in dpkt: is dpkt able to parse the HTTP response ????

dp...@googlecode.com

Apr 13, 2011, 9:11:49 AM
to dp...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 69 by cuiheng....@gmail.com: is dpkt able to parse the HTTP
response ????
http://code.google.com/p/dpkt/issues/detail?id=69

Hi,

I am currently using dpkt to parse a libpcap-format file. What I want to
do is extract the raw content of the HTTP responses. However, I found that my
script ONLY works for HTTP responses with a small content length
(< 1500 bytes). My script looks like this:

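#!/usr/bin/env python
import dpkt

f = open('test.cap', 'rb')  # pcap files are binary
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    # skip frames that are not IP carrying TCP (ARP, UDP, ...)
    if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
        continue
    tcp = ip.data
    # only server-to-client segments that carry data
    if tcp.sport == 80 and len(tcp.data) > 0:
        try:
            http2 = dpkt.http.Response(tcp.data)
            print(http2.headers)
        except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
            pass  # payload is not a complete HTTP response
f.close()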

This is just a very simple example that tries to print the HTTP headers of all
the responses.
However, compared with my original test.cap file, the script does
NOT print all of the response headers. Only responses with small
content (e.g. < 1500 bytes) are printed. I am wondering why it behaves
like that: is dpkt really able to extract ALL of the HTTP response content
(e.g. HTML files)?

P.S. I am using dpkt 1.7 on Fedora.

Thanks for your comments.
heng

dp...@googlecode.com

Apr 13, 2011, 9:28:57 AM
to dp...@googlegroups.com

Comment #1 on issue 69 by cuiheng....@gmail.com: is dpkt able to parse the
HTTP response ????

By the way, my Python version is 2.4.3.

dp...@googlecode.com

Apr 13, 2011, 9:52:09 AM
to dp...@googlegroups.com
Updates:
Status: WontFix

Comment #2 on issue 69 by dugsong: is dpkt able to parse the HTTP
response ????
http://code.google.com/p/dpkt/issues/detail?id=69

dpkt doesn't do any TCP stream reassembly. You need to do that yourself.

Here's an example:

http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/reasm.py
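A minimal sketch of the kind of reassembly described above, in plain Python. It is not dsniff's reasm.py, just an illustration. It assumes the segments have already been pulled out of dpkt objects as tuples such as (ip.src, tcp.sport, ip.dst, tcp.dport, tcp.seq, tcp.data); the function name `reassemble` is hypothetical. It ignores sequence-number wraparound and SYN/FIN sequence accounting, and it silently skips over gaps left by lost packets.

```python
from collections import defaultdict


def reassemble(segments):
    """Rebuild each one-way TCP byte stream from captured segments.

    `segments` is an iterable of (src, sport, dst, dport, seq, payload)
    tuples. Returns a dict mapping (src, sport, dst, dport) to the
    reassembled payload bytes of that direction.

    Assumptions of this sketch: no sequence-number wraparound, no
    SYN/FIN sequence accounting, and gaps (lost segments) are skipped.
    """
    flows = defaultdict(list)
    for src, sport, dst, dport, seq, payload in segments:
        if payload:  # pure ACKs carry no data
            flows[(src, sport, dst, dport)].append((seq, payload))

    streams = {}
    for key, segs in flows.items():
        segs.sort(key=lambda s: s[0])  # order by sequence number
        data = bytearray()
        next_seq = segs[0][0]  # sequence number of the next byte we expect
        for seq, payload in segs:
            if seq + len(payload) <= next_seq:
                continue  # retransmission of bytes we already have
            if seq < next_seq:
                payload = payload[next_seq - seq:]  # trim the overlapping part
                seq = next_seq
            data += payload  # if seq > next_seq, a gap is silently skipped
            next_seq = seq + len(payload)
        streams[key] = bytes(data)
    return streams
```

The server-to-client stream for one connection can then be handed to dpkt.http.Response, which should parse the first response in it; with keep-alive connections the stream may hold several responses back to back.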

If you're doing it live, you need to do your own stream parsing instead,
e.g. dsniff's HTTP stream parser:

http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/http.py
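For the live case, a rough sketch of an incremental parser: bytes from one server-to-client stream are fed in as they arrive (assumed to be already in sequence order), and complete responses come out. The class `ResponseStreamParser` is hypothetical and is not dsniff's parser. It only understands bodies framed by Content-Length or by connection close; chunked bodies and responses that never carry a body (such as 304 replies) are out of scope here.

```python
class ResponseStreamParser:
    """Incrementally split a server-to-client byte stream into HTTP responses.

    Hypothetical helper: handles Content-Length and close-delimited bodies
    only; chunked transfer-coding is not handled.
    """

    def __init__(self):
        self.buf = b""

    def feed(self, data):
        """Add newly captured bytes; return a list of complete raw responses."""
        self.buf += data
        done = []
        while True:
            end = self.buf.find(b"\r\n\r\n")
            if end < 0:
                break  # header block not complete yet
            head = self.buf[:end].decode("iso-8859-1")
            length = None
            for line in head.split("\r\n")[1:]:  # skip the status line
                name, _, value = line.partition(":")
                if name.strip().lower() == "content-length":
                    length = int(value.strip())
            if length is None:
                break  # body ends when the connection closes; see close()
            total = end + 4 + length
            if len(self.buf) < total:
                break  # body not complete yet
            done.append(self.buf[:total])
            self.buf = self.buf[total:]
        return done

    def close(self):
        """Connection closed (FIN seen): whatever is buffered is the last response."""
        rest, self.buf = self.buf, b""
        return [rest] if rest else []
```

Each raw response this returns could then be passed to dpkt.http.Response for header and body parsing.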

Good luck!

dp...@googlecode.com

Apr 13, 2011, 10:29:48 AM
to dp...@googlegroups.com

Comment #3 on issue 69 by cuiheng....@gmail.com: is dpkt able to parse the
HTTP response ????

Thanks, dugsong,

In that case, maybe dpkt is not the best option for HTTP response
parsing.

Do you have any idea which Python modules might be a better choice for
parsing HTTP from a libpcap file?

Jeff Silverman

Apr 13, 2011, 5:30:40 PM
to dp...@googlegroups.com
Heng,

The problem is that dpkt works packet by packet. When it encounters a packet with a source port of 80, it tries to parse that packet's payload as an HTTP response. If the packet happens to hold an entire response, this works. However, a response can span many packets, so you have to collect all of the packets in the stream and reassemble them before parsing. That is why only responses small enough to fit in a single Ethernet frame (about 1500 bytes) were printed.

There is a complication, because the end of a response can be signaled in several ways: by a byte count in the Content-Length header, by chunked transfer encoding (the body is sent as a series of "chunks" ending with a zero-length chunk), or by the server closing the connection with a TCP packet that has the FIN bit set. Refer to RFC 2616 section 4.4 for details (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4).
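The RFC 2616 section 4.4 rules mentioned above can be sketched as a function that decides whether a reassembled buffer already holds a complete response. `response_body` is a hypothetical helper; it returns the body, or None when more data is still needed. As simplifications, it skips rule 4 (multipart/byteranges), treats any Transfer-Encoding containing "chunked" as chunked, ignores chunk extensions and trailers, and cannot know that a response answers a HEAD request.

```python
def response_body(raw, connection_closed=False):
    """Return the body of the HTTP response at the start of `raw`,
    or None if more data is needed (RFC 2616 section 4.4 order)."""
    end = raw.find(b"\r\n\r\n")
    if end < 0:
        return None  # headers not complete yet
    lines = raw[:end].decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    rest = raw[end + 4:]

    # Rule 1: 1xx, 204 and 304 responses never carry a body.
    if 100 <= status < 200 or status in (204, 304):
        return b""
    # Rule 2: chunked transfer-coding ends with a zero-size chunk.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body, pos = b"", 0
        while True:
            eol = rest.find(b"\r\n", pos)
            if eol < 0:
                return None  # chunk-size line not captured yet
            size = int(rest[pos:eol].split(b";")[0], 16)  # drop chunk extensions
            if size == 0:
                return body
            start = eol + 2
            if len(rest) < start + size + 2:
                return None  # chunk data not fully captured yet
            body += rest[start:start + size]
            pos = start + size + 2  # skip chunk data and its trailing CRLF
    # Rule 3: Content-Length gives the exact number of body bytes.
    if "content-length" in headers:
        n = int(headers["content-length"])
        return rest[:n] if len(rest) >= n else None
    # Rule 5: otherwise the server closes the connection to end the body.
    return rest if connection_closed else None
```

Passing connection_closed=True once a FIN has been seen is what makes the last rule work: without it, a response with neither Content-Length nor chunked encoding can never be known to be complete.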

I am working on some software that will pick up the HTTP response and return it as a string, but it isn't ready yet. You can follow my progress at git://github.com/jeffsilverm/dpkt_doc.git or at http://www.commercialventvac.com/dpkt.html 


Jeff



--
Jeff Silverman, linux sysadmin
nine two four   twentieth avenue east
Seattle, WA, nine eight one one two -3507
(2O6) 329-1O94
jeffs...@gmail.c0m (note the zero!)
http://www.commercialventvac.com/~jeffs/
Read my book, "Failure is Not an Option: How to build reliable computer systems from unreliable parts using Open Source software" http://www.commercialventvac.com/finao