in simple text classification, bayes works on tokens. one easy way
to build a corpus is to split the JS into tokens with a
regex. here's an example from the test/ directory:
>>> doc[:100]
'<SCRIPT language="javascript">\n var p_url =
"http://paksusic.cn/nuc/exe.php";\nfunction SS()\n{'
>>> m = re.findall(r'\w+[^\w]', doc)
>>> len(m)
373
>>> m
['SCRIPT ', 'language=', 'javascript"', 'var ', 'p_url ', 'http:',
'paksusic.', 'cn/', 'nuc/', 'exe.', 'php"', 'function ', 'SS(',
'try{', 'ret=', 'new ', 'ActiveXObject(', 'snpvw.', 'Snapshot ',
'Viewer ', 'Control.', '1"', 'var ', 'arbitrary_file ', 'p_url;', 'var
', 'dest ', 'C:', 'Program ', 'Files/', 'Outlook ', 'Express/',
'wab.', "exe'", 'document.', 'write(', 'object ', 'classid=',
'clsid:', 'F0E42D60-', '368C-', '11D0-', 'AD81-', "00A0C90DC8D9'",
'id=', "attack'", 'object>', 'attack.', 'SnapshotPath ', ...
which is truncated but you get the idea. looking at the token
distribution over malicious and then benign JS samples i wonder if
this would work.
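fwiw, here's roughly what i have in mind, sketched end to end. the
corpora below are made-up toy strings (not from the test/ directory),
and the add-one smoothing is just one arbitrary choice:

```python
import math
import re
from collections import Counter

def tokenize(doc):
    # same idea as above: a run of word chars plus one trailing non-word char
    return re.findall(r'\w+\W', doc)

def train(docs):
    # token counts across a list of documents
    counts = Counter()
    for d in docs:
        counts.update(tokenize(d))
    return counts

def score(doc, mal_counts, ben_counts):
    # sum of per-token log-likelihood ratios with add-one smoothing;
    # positive score => tokens look more like the malicious corpus
    mal_total = sum(mal_counts.values())
    ben_total = sum(ben_counts.values())
    s = 0.0
    for tok in tokenize(doc):
        p_mal = (mal_counts[tok] + 1) / (mal_total + 1)
        p_ben = (ben_counts[tok] + 1) / (ben_total + 1)
        s += math.log(p_mal / p_ben)
    return s

# hypothetical toy corpora, one sample each
mal = train(['eval(unescape("%41%42")); document.write("x");'])
ben = train(['function add(a, b) { return a + b; }'])

print(score('eval(unescape("%43"));', mal, ben))              # > 0, leans malicious
print(score('function sub(a, b) { return a - b; }', mal, ben))  # < 0, leans benign
```

with real corpora you'd train on the whole malicious and benign sample
sets and pick a threshold, but the shape of the thing is about this small.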
have you considered a simple strategy like this? it'd be simpler to
implement and test than building the feature vector you described.
-- jose