Analysis of Network IP Trace

Swapnil Debarshi
Jul 28, 2015, 1:05:10 PM
to PyData

I'm new to pandas and am trying to analyze a network trace dump. I have read the dump file and created the following dataframe:

To detect the individual flows in the dataframe (data2), I grouped the entire dataframe by ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service'] using the following code:
flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']
grp1 = data2.groupby(flow, sort=False)

When I call grp1.size() on the first twenty rows of data2, I get the following information:

What I would like to do now is calculate, for each flow, the mean and variance of ip_len and packet_len, and the mean of the interpacket arrival times (using the timestamps of packets belonging to the same flow). How can I accomplish this in pandas so that the dataframe is transformed into the required per-flow statistics, i.e. the columns should contain ip_src, ip_dst, sport, dport, ip_proto, service, and the mean and variance values described above? I have tried both the agg and apply methods, but haven't been able to do it. Thanks in advance!
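
For reference, here is a minimal sketch of one way the per-flow statistics described above might be computed. It is not a confirmed solution: the column name `timestamp`, the helper column `iat`, and the small sample frame are assumptions made for illustration, since the real dataframe is not shown. The idea is to take the per-flow difference of sorted timestamps first, then aggregate everything in one groupby.

```python
import pandas as pd

flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']

# Made-up sample trace; 'timestamp' is an assumed column name (seconds).
data2 = pd.DataFrame({
    'ip_src': ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2'],
    'ip_dst': ['10.0.0.9'] * 5,
    'sport': [1234, 1234, 5555, 1234, 5555],
    'dport': [80, 80, 53, 80, 53],
    'ip_proto': [6, 6, 17, 6, 17],
    'service': ['http', 'http', 'dns', 'http', 'dns'],
    'ip_len': [40, 60, 70, 80, 90],
    'packet_len': [54, 74, 84, 94, 104],
    'timestamp': [0.0, 1.0, 1.5, 3.0, 2.0],
})

# Interpacket arrival time: difference between consecutive timestamps
# within the same flow. The first packet of each flow gets NaN, which
# mean() skips by default.
data2 = data2.sort_values('timestamp')
data2['iat'] = data2.groupby(flow, sort=False)['timestamp'].diff()

# One row per flow with mean/variance of the lengths and mean of 'iat'.
# Note that 'var' is the sample variance (ddof=1).
stats = data2.groupby(flow, sort=False).agg(
    {'ip_len': ['mean', 'var'],
     'packet_len': ['mean', 'var'],
     'iat': ['mean']})

# Flatten the two-level column index, e.g. ('ip_len', 'mean') -> 'ip_len_mean',
# and turn the flow keys back into ordinary columns.
stats.columns = ['_'.join(col) for col in stats.columns]
stats = stats.reset_index()
print(stats)
```

A flow with a single packet has no interpacket time and a NaN variance, so such rows may need separate handling depending on the analysis.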