A beginner question


twashburn

Mar 6, 2013, 4:24:04 PM
to dumbo...@googlegroups.com
I'm logging events from various sources (firewalls, proxies, anti-virus, etc.) into a SIEM. From the SIEM, the events are forwarded via syslog to a Hadoop cluster. I've worked with the example ipcount.py and figured out which field gives me an IP address to count. What I'd like to do is use a regex to isolate the source IP address or the destination IP address. Eventually I want to count the destination IPs and group the results by source IP.
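For illustration, the regex step could look roughly like the sketch below. The log layout is an assumption: the thread never shows a sample line, so the `src=` and `dst=` field names, the pattern, and the `extract_ips` helper are all hypothetical and would need adjusting to whatever the SIEM actually forwards.

```python
import re

# ASSUMPTION: each event carries "src=<ipv4>" and, later on the same line,
# "dst=<ipv4>". These field names are hypothetical; adapt them to the real feed.
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
SRC_DST = re.compile(
    r"\bsrc=(?P<src>" + IPV4 + r")\b.*?\bdst=(?P<dst>" + IPV4 + r")\b"
)


def extract_ips(line):
    """Return (source_ip, destination_ip) for a matching line, else None."""
    match = SRC_DST.search(line)
    if match is None:
        return None
    return match.group("src"), match.group("dst")
```

The pattern only checks the dotted-quad shape, not that each octet is at most 255, which is usually enough for picking addresses out of machine-generated logs.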

Thoughts?

Regards
TimW

Tobias Speckbacher

Mar 6, 2013, 4:57:13 PM
to dumbo...@googlegroups.com
Emit a key containing the source and destination IP from the mapper, and sum up the number of occurrences in the reducer.
This will give you the number of times each source and destination pair occurred in your logs.

The output would be something like

source, destination, count
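A minimal sketch of that idea follows, using the same `mapper(key, value)` / `reducer(key, values)` generator shape as the ipcount.py example. The `src=`/`dst=` log layout and the `PAIR` pattern are assumptions, and `run_locally` is a hypothetical stand-in for Hadoop's shuffle so the logic can be tried without a cluster.

```python
import re
from collections import defaultdict

# ASSUMPTION: lines contain "src=<ipv4>" followed later by "dst=<ipv4>".
# The field names are hypothetical; adjust the pattern to the real log format.
PAIR = re.compile(
    r"\bsrc=(\d{1,3}(?:\.\d{1,3}){3})\b.*?\bdst=(\d{1,3}(?:\.\d{1,3}){3})\b"
)


def mapper(key, value):
    """Emit ((source, destination), 1) for each line holding both addresses."""
    match = PAIR.search(value)
    if match:
        yield (match.group(1), match.group(2)), 1


def reducer(key, values):
    """Sum the occurrences of one (source, destination) pair."""
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical local stand-in for the shuffle phase (not part of Dumbo)."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for pair, one in mapper(offset, line):
            grouped[pair].append(one)
    results = []
    # Sorting by key keeps all rows for one source IP next to each other.
    for pair in sorted(grouped):
        for (src, dst), count in reducer(pair, grouped[pair]):
            results.append((src, dst, count))
    return results
```

On a cluster the two functions would presumably be passed to `dumbo.run(mapper, reducer, combiner=reducer)`, as ipcount.py does. Because the key starts with the source IP, the sorted output is effectively already grouped by source.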

Does this solve your problem?

