I'm new to pandas and am trying to analyze a network trace dump. I have read the dump file and created the following dataframe:
To detect the individual flows in the dataframe (data2), I grouped it by ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service'] using the following code:
flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']
grp1 = data2.groupby(flow, sort=False)
When I call grp1.size() on the first twenty rows of data2, I get the following information:
What I would like to do now is calculate, for each flow, the mean and variance of ip_len and packet_len, and the mean of the inter-packet arrival times (using the timestamps of packets belonging to the same flow). How can I do this in pandas so that the result is a dataframe of per-flow statistics, i.e. with the columns ip_src, ip_dst, sport, dport, ip_proto and service plus the mean and var values described above? I have tried both the agg and apply methods, but haven't been able to get it to work. Thanks in advance!
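
For reference, the result I am after could be sketched roughly like this. This is only an illustration on made-up data, not my real trace: the column name "timestamp" and every value in the toy dataframe are assumptions, and the output column names (ip_len_mean, iat_mean, etc.) are just placeholders. It relies on named aggregation, which as far as I know needs pandas 0.25 or later.

```python
import pandas as pd

# Toy stand-in for data2. All values are invented, and the column name
# "timestamp" is an assumption about how packet times are stored.
data2 = pd.DataFrame({
    "timestamp": [0.0, 0.5, 1.0, 2.0, 2.5, 4.0],
    "ip_src": ["10.0.0.1", "10.0.0.2", "10.0.0.1",
               "10.0.0.1", "10.0.0.2", "10.0.0.2"],
    "ip_dst": ["10.0.0.9"] * 6,
    "sport": [1234, 5678, 1234, 1234, 5678, 5678],
    "dport": [80, 443, 80, 80, 443, 443],
    "ip_proto": [6] * 6,
    "service": ["http", "https", "http", "http", "https", "https"],
    "ip_len": [40, 60, 60, 80, 100, 80],
    "packet_len": [54, 74, 74, 94, 114, 94],
})

flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']

# Inter-arrival time: difference between consecutive timestamps within
# each flow. The first packet of every flow gets NaN.
data2 = data2.sort_values("timestamp")
data2["iat"] = data2.groupby(flow, sort=False)["timestamp"].diff()

# One row per flow, with the flow keys restored as ordinary columns.
stats = (
    data2.groupby(flow, sort=False)
    .agg(
        ip_len_mean=("ip_len", "mean"),
        ip_len_var=("ip_len", "var"),          # sample variance (ddof=1)
        packet_len_mean=("packet_len", "mean"),
        packet_len_var=("packet_len", "var"),
        iat_mean=("iat", "mean"),              # NaN values are skipped
    )
    .reset_index()
)

print(stats)
```

Note that with this approach a flow containing a single packet would get NaN for both the variances and the mean inter-arrival time, since there is nothing to difference.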