Optimization Opportunities


chudel

Sep 20, 2012, 11:25:38 AM
to ope...@googlegroups.com
Good day,
 
I am pursuing optimizing the search for IOCs as a possible dissertation topic, probably focusing on the OpenIOC language and toolset. I think optimizations may be possible in the following areas and was curious to know your thoughts on the subject:
 
- Validation of the IOC specification; it seems possible to write an IOC that is technically invalid (e.g. looking for two separate hash values from the same single file) or that simply doesn't make sense. The first step in optimization is to ensure all IOCs are logically valid, i.e. that individual elements within an IOC definition do not cancel each other out (a rough sketch of such a check follows after this list).
 
- Optimization of the IOCs. If the goal is to know whether any one IOC matches, can a set of IOCs be reduced (e.g. via binary decision diagrams) to the smallest set of checks needed to match any one of them? (One could then follow up with the full set to identify which one.)
 
- Applying cost to indicator retrieval. By knowing the cost (time penalty) of obtaining individual indicators, is it possible to "speed up" the process by evaluating individual IOC elements on a least-cost basis? For example, if an IOC is "A" OR "B" and "B" costs less than "A", do B first (sketched after this list). An experiment should probably be performed to determine the cost of each indicator relative to the others.
- I'll show a bias: I feel (without having run the experiment above) that hashing is going to be the most expensive indicator to retrieve. Are there options to improve this component:
 - what about a "Fast Hash" and a "Slow Hash", where the "Fast Hash" is perhaps just a hash of the first 64KB of the file - is that much less expensive? Is the collision rate on that too high?
 - what about a filesystem modification to maintain an active state table of all files and their hashes (think: /proc/fshash/a3/d1/24/16/92/ad/22/25/12/26/99/a5/f2/c50f63)?
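To make the cost idea a bit more concrete, here is a rough Python sketch of least-cost evaluation of an OR of indicators, plus the 64KB "fast hash". The cost figures and the indicator stand-ins are assumptions for illustration only, not measurements, and none of this is part of the OpenIOC toolset:

import hashlib

def fast_hash(path, limit=64 * 1024):
    """MD5 of at most the first 64KB of a file -- a cheap pre-filter, not a full hash."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        h.update(f.read(limit))
    return h.hexdigest()

def evaluate_or(indicators):
    """indicators: list of (estimated_cost, check_fn) pairs.
    Run the cheapest checks first and short-circuit on the first match."""
    for _cost, check in sorted(indicators, key=lambda pair: pair[0]):
        if check():
            return True
    return False

# "A" OR "B" where B is assumed cheaper than A: B runs first, A may never run.
indicators = [
    (10.0, lambda: False),  # stand-in for an expensive check, e.g. a full-disk hash sweep
    (0.1,  lambda: True),   # stand-in for a cheap check, e.g. a running-process lookup
]
print(evaluate_or(indicators))  # True -- only the cheap branch was evaluated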
 
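And, going back to the first point, a minimal sketch of the kind of validation I have in mind, assuming IOC logic is modelled as nested AND/OR dicts with (term, value) leaves rather than the real OpenIOC XML:

SINGLE_VALUED = {"FileItem/Md5sum", "FileItem/SizeInBytes"}  # illustrative terms only

def find_contradictions(node, problems=None):
    """Collect (term, values) pairs inside an AND group that can never all hold."""
    if problems is None:
        problems = []
    if "term" in node:                 # leaf indicator: nothing to check on its own
        return problems
    if node["op"] == "AND":
        seen = {}
        for child in node["children"]:
            if "term" in child and child["term"] in SINGLE_VALUED:
                seen.setdefault(child["term"], set()).add(child["value"])
        for term, values in seen.items():
            if len(values) > 1:        # one single-valued term required to equal two values
                problems.append((term, values))
    for child in node["children"]:
        if "term" not in child:        # recurse into nested AND/OR groups
            find_contradictions(child, problems)
    return problems

# Two different hash values demanded of the same single file: never satisfiable.
ioc = {"op": "AND", "children": [
    {"term": "FileItem/Md5sum", "value": "deadbeef"},
    {"term": "FileItem/Md5sum", "value": "beef123"},
]}
print(find_contradictions(ioc))        # reports the md5sum contradiction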
Cheers,
Christopher.
 
 

Jeff Bryner

Sep 20, 2012, 4:00:37 PM
to ope...@googlegroups.com, chudel
I can't agree enough. I'm relatively new to OpenIOC, but from playing with it over the last month or so, there are lots of ways to go wrong when implementing a program to handle IOCs.

I think an optimizer would be a great first step, especially if it worked along the lines of database stored-proc 'compilers', which work out an execution plan that takes a least-cost route to completing the IOC query.

For example, on live systems, examining the running processes is almost always the fastest route to completion, so if an IOC has that OR a mass 'file like *exe' with hash zbcd... then the best execution path is to defer the file test and complete the process check first.
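Very roughly, and only as a sketch: this assumes the third-party psutil package for the cheap process enumeration, and the process name, scan root and hash below are placeholders, not real indicators.

import fnmatch
import hashlib
import os
import psutil  # third-party; assumed available for the cheap process check

def process_name_running(name):
    """Cheap check: is a process with this name currently running?"""
    return any((p.info.get("name") or "").lower() == name.lower()
               for p in psutil.process_iter(["name"]))

def exe_with_md5_exists(root, wanted_md5):
    """Expensive check: walk the disk, hashing every *.exe until one matches."""
    for dirpath, _dirs, files in os.walk(root):
        for fname in fnmatch.filter(files, "*.exe"):
            try:
                with open(os.path.join(dirpath, fname), "rb") as f:
                    if hashlib.md5(f.read()).hexdigest() == wanted_md5:
                        return True
            except OSError:
                continue
    return False

def ioc_matches():
    # Python's "or" short-circuits, so the disk walk only runs if the
    # process check misses -- the least-cost path described above.
    return process_name_running("evil.exe") or exe_with_md5_exists("C:/", "ab" * 16)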

Coding that logic, however, is a real challenge! Especially since, as you note, you can easily create a nonsense IOC, a contradictory IOC, etc.

Jeff.

David Ross

Oct 2, 2012, 5:13:38 PM
to ope...@googlegroups.com
This is a great thread.
Validating IOCs can be rather tricky, as the format is intentionally very flexible. That is what has allowed it to survive the last several years without changes to the schema, but it does allow for many different ways to skin a cat.
Any validation (beyond the schema) would almost always have to happen in the application that is interpreting the IOC.
Some specific examples are obvious (md5sum is 'deadbeef' AND md5sum is 'beef123'), but there are actually very few that are that obvious.

Optimization, as described here, certainly belongs in the application and couldn't happen in the IOC schema. The application or tool would have to decide what is fast or slow to compute.
But it is very possible, don't get me wrong.

Regarding fast hashes or slow hashes: it's been my experience that calculating the md5sum of a file is a rather inexpensive operation for what the value provides (mostly false positive identification).
It's a different story if you are hashing very large files (gigabytes), as that will slow things down a bit. But when was the last time you found malware that was gigabytes in size? So an application could decide, as a general rule, not to hash files over 10 megs.
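Something along those lines, as a rough sketch (the 10 meg threshold is purely illustrative):

import hashlib
import os

SIZE_LIMIT = 10 * 1024 * 1024  # ~10 megs; illustrative cutoff, not a recommendation

def md5_if_small(path, limit=SIZE_LIMIT):
    """Return the file's md5sum, or None if the file is larger than the limit."""
    if os.path.getsize(path) > limit:
        return None                      # too big: skip it as a general rule
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()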

Just a thought.

David Ross
