Hadi Asghari
Jan 6, 2012, 9:17:20 AM
to SANS Internet Storm Center / DShield
Hello everyone,
Our research group uses the DShield data to look at patterns of botnet
activity, on the assumption that much of the malicious activity
recorded comes from infected hosts. A key question for us is: what is
the best way to reduce the number of false positives (i.e., non-bots) in
the data?
The feed we have contains attack sources (IPs), the number of times
each source has been reported by the sensors ("reports" and
"targets") as well as protocol and port numbers. As I understand it,
some of the sensors - firewalls and IDS devices - will report any
unauthorized connection attempt as malicious, including ones that are
due to configuration errors, stale bookmarks or typos. Ideally, we
would like to filter out as many of these sources as possible.
A simple technique, excluding all sources reported only once in a day,
already halves the dataset, but where should one draw the line? Any
ideas are appreciated!
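One way to look for "the line" is to generalise the once-a-day cutoff into a sweep over several thresholds and see how fast the set of surviving sources shrinks. The sketch below is only an illustration under stated assumptions: it assumes each feed row has already been parsed into a source IP, a day, and a report count (the `Report` class, its field names, and the helper functions are hypothetical, not the DShield feed format), and the sample IPs are documentation addresses, not real data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical parsed feed row; field names are assumptions."""
    source_ip: str
    day: str       # e.g. "2012-01-06"
    reports: int   # times the sensors reported this source
    targets: int
    protocol: int
    port: int


def daily_totals(rows):
    """Sum reports per (source IP, day) across all ports/protocols."""
    totals = Counter()
    for r in rows:
        totals[(r.source_ip, r.day)] += r.reports
    return totals


def filter_sources(rows, min_reports):
    """Keep rows whose source reached at least min_reports reports that day."""
    totals = daily_totals(rows)
    return [r for r in rows if totals[(r.source_ip, r.day)] >= min_reports]


def threshold_sweep(rows, thresholds):
    """For each threshold, count the (source, day) pairs that survive it."""
    totals = daily_totals(rows)
    return {t: sum(1 for n in totals.values() if n >= t) for t in thresholds}


# Made-up sample rows (RFC 5737 documentation addresses).
SAMPLE_ROWS = [
    Report("192.0.2.1", "2012-01-06", 1, 1, 6, 80),
    Report("192.0.2.2", "2012-01-06", 5, 3, 6, 445),
    Report("192.0.2.2", "2012-01-06", 2, 1, 6, 139),
    Report("192.0.2.3", "2012-01-06", 40, 25, 6, 22),
]

print(threshold_sweep(SAMPLE_ROWS, [1, 2, 5, 10]))
```

Plotting the sweep over a real day of data would show whether there is a natural "knee" beyond the one-report cutoff; the same filter could equally be applied to `targets` instead of `reports`.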
Thanks in advance,
Hadi Asghari
--
Faculty of Technology, Policy and Management
Delft University of Technology