Issue 69 in dpkt: is dpkt able to parse the HTTP response ????

dp...@googlecode.com

Apr 13, 2011, 9:11:49 AM

to dp...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 69 by cuiheng....@gmail.com: is dpkt able to parse the HTTP
response ????
http://code.google.com/p/dpkt/issues/detail?id=69

Hi,

I am currently using dpkt to parse a libpcap-format file. What I want to
do is extract the raw content of the HTTP responses. However, I found that my
script ONLY works for HTTP responses that have a small content length
(<1500 bytes). My script looks something like this:

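#!/usr/bin/env python
import dpkt

# pcap files are binary, so open in binary mode
f = open('test.cap', 'rb')
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    # skip anything that is not TCP over IP
    if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
        continue
    tcp = ip.data
    if tcp.sport == 80:
        try:
            # parses this one packet only, not the whole TCP stream
            http2 = dpkt.http.Response(tcp.data)
            print(http2.headers)
        except dpkt.UnpackError:
            # this packet does not hold a complete response; skip it
            pass
f.close()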

This is just a very simple example that tries to print the HTTP headers of all
the responses.
However, compared with my original test.cap file, I found that the script does
NOT print all of the response headers. Only responses with small
contents (e.g. <1500 bytes) are printed. I am wondering why it behaves
like that, and whether dpkt is really able to extract ALL of the HTTP response
contents (e.g. HTML files)?

P.S. I am using dpkt-1.7 on Fedora.

Thanks for your comments.
heng

dp...@googlecode.com

Apr 13, 2011, 9:28:57 AM

to dp...@googlegroups.com

Comment #1 on issue 69 by cuiheng....@gmail.com: is dpkt able to parse the
HTTP response ????
http://code.google.com/p/dpkt/issues/detail?id=69

By the way, my Python version is 2.4.3.

dp...@googlecode.com

Apr 13, 2011, 9:52:09 AM

to dp...@googlegroups.com

Updates:
Status: WontFix

Comment #2 on issue 69 by dugsong: is dpkt able to parse the HTTP
response ????
http://code.google.com/p/dpkt/issues/detail?id=69

dpkt doesn't do any TCP stream reassembly. You need to do that yourself.

Here's an example:

http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/reasm.py

If you're doing it live, you need to do your own stream parsing instead,
e.g. dsniff's HTTP stream parser:

http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/http.py

Good luck!

dp...@googlecode.com

Apr 13, 2011, 10:29:48 AM

to dp...@googlegroups.com

Comment #3 on issue 69 by cuiheng....@gmail.com: is dpkt able to parse the
HTTP response ????
http://code.google.com/p/dpkt/issues/detail?id=69

Thanks dugsong,

In this case, maybe dpkt is not the best option for HTTP response
parsing.

Do you have any idea which Python modules may be a better choice for
parsing HTTP from a libpcap file?

Jeff Silverman

Apr 13, 2011, 5:30:40 PM

to dp...@googlegroups.com

Heng,

The problem is that dpkt goes packet by packet. When your script encounters a packet with a source port of 80, it tries to parse that single packet as a complete HTTP response. If the whole response fits in one packet, this works. However, a response larger than one TCP segment (roughly 1500 bytes on Ethernet) spans many packets, so you have to collect all of the packets in the stream and put their payloads back together before parsing.
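
As a rough illustration of what putting a stream back together involves, here is a minimal sketch written for a current Python. The function name `reassemble` and the `(seq, payload)` tuple layout are hypothetical, not part of dpkt. It assumes all segments belong to one direction of one TCP connection, that sequence numbers do not wrap around, and that a missing segment simply ends the output.

```python
def reassemble(segments):
    """Join (seq, payload) pairs from ONE direction of ONE TCP flow.

    Simplifying assumptions (not a full TCP implementation):
    no sequence-number wraparound, and a gap ends the output.
    """
    # Ignore empty segments (pure ACKs) and order the rest by sequence number.
    data = sorted([s for s in segments if s[1]], key=lambda s: s[0])
    if not data:
        return b""
    next_seq = data[0][0]      # sequence number of the next byte we expect
    chunks = []
    for seq, payload in data:
        if seq > next_seq:
            break              # a segment is missing from the capture
        skip = next_seq - seq  # bytes already seen (retransmission or overlap)
        if skip < len(payload):
            chunks.append(payload[skip:])
            next_seq = seq + len(payload)
    return b"".join(chunks)


# Out-of-order delivery plus one retransmitted segment.
segments = [(109, b"200 OK"), (100, b"HTTP/1.1 "), (100, b"HTTP/1.1 ")]
stream = reassemble(segments)
```

With dpkt, the pairs could be gathered by appending `(tcp.seq, tcp.data)` to a list kept per `(ip.src, tcp.sport, ip.dst, tcp.dport)` key while iterating over the capture. The joined bytes for the server-to-client direction are then what a parser such as `dpkt.http.Response` would need to see, provided the capture holds the complete response.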

There is a complication: the end of the response body can be signalled in three different ways. It can be given by a byte count in the header, a field called Content-Length; by chunked transfer encoding, where the data is returned in "chunks" and a zero-length chunk marks the end; or by the server closing the connection with a TCP packet with the FIN bit set. Refer to RFC 2616 section 4.4 for details (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4).
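
The three cases can be told apart from the response headers. The sketch below is a simplified reading of RFC 2616 section 4.4, not a complete implementation: the function name `body_framing` is hypothetical, and it ignores responses that never carry a body (such as 204 and 304), multipart/byteranges, and folded header lines. It expects raw bytes that start at the status line.

```python
def body_framing(raw):
    """Report how the end of an HTTP response body is signalled.

    Returns one of:
      ("incomplete", None)      header block not fully received yet
      ("chunked", None)         Transfer-Encoding other than identity
      ("content-length", n)     body is exactly n bytes long
      ("until-close", None)     body ends when the server closes (FIN)
    """
    head, sep, _ = raw.partition(b"\r\n\r\n")
    if not sep:
        return ("incomplete", None)
    headers = {}
    # Skip the status line, then split each "Name: value" header line.
    for line in head.split(b"\r\n")[1:]:
        name, colon, value = line.partition(b":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    encoding = headers.get(b"transfer-encoding", b"").lower()
    # Transfer-Encoding takes precedence over Content-Length.
    if encoding and encoding != b"identity":
        return ("chunked", None)
    if b"content-length" in headers:
        return ("content-length", int(headers[b"content-length"]))
    return ("until-close", None)


first_packet = b"HTTP/1.1 200 OK\r\nContent-Length: 5000\r\n\r\n<html>"
mode = body_framing(first_packet)
```

In the example, the first packet announces a 5000-byte body but carries only a few bytes of it, which is the situation where parsing that packet on its own cannot succeed.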

I am working on some software that will pick up the HTTP response and return it as a string, but it isn't ready yet. You can follow my progress at git://github.com/jeffsilverm/dpkt_doc.git or at http://www.commercialventvac.com/dpkt.html

Jeff

--
Jeff Silverman, linux sysadmin
nine two four twentieth avenue east
Seattle, WA, nine eight one one two -3507
(2O6) 329-1O94
jeffs...@gmail.c0m (note the zero!)
http://www.commercialventvac.com/~jeffs/
Read my book, "Failure is Not an Option: How to build reliable computer systems from unreliable parts using Open Source software" http://www.commercialventvac.com/finao
