Static analysis of JS to find malicious obfuscation

tofumatt

Oct 8, 2015, 4:32:27 AM

to dev-stati...@lists.mozilla.org, Andy McKay, Stuart Colville

Hi there static analysis folks!

I’m tofumatt, and I’m working with Stuart Colville on the new add-ons validator, which is written in JS.

One of the things we’d like to improve in this validator is the ability to detect rule bypassing via code obfuscation. For example, mozIndexedDB is a deprecated identifier and that is easy to find with a custom ESLint rule. But if someone types:

var badDB = 'm';
badDB += 'oz';
badDB = badDB + 'IndexedDB';
var myDeprecatedDB = window[badDB];

Neither the existing validator nor our AST-based identifier scans (using ESLint/Esprima) catch it.
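
To make the gap concrete, here is a minimal sketch in plain JavaScript. It is not the validator's actual rule: a simple regular-expression scan stands in for the ESLint check so the example stays self-contained, and `findBannedIdentifier` is a hypothetical helper name.

```javascript
// Hypothetical stand-in for an identifier-based rule: report whether the
// banned name appears literally anywhere in the source text.
function findBannedIdentifier(source, banned) {
  // \b keeps us from matching longer names that merely contain the banned one.
  const pattern = new RegExp("\\b" + banned + "\\b");
  return pattern.test(source);
}

const direct = "var db = window.mozIndexedDB;";
const obfuscated = [
  "var badDB = 'm';",
  "badDB += 'oz';",
  "badDB = badDB + 'IndexedDB';",
  "var myDeprecatedDB = window[badDB];",
].join("\n");

const directFlagged = findBannedIdentifier(direct, "mozIndexedDB");
const obfuscatedFlagged = findBannedIdentifier(obfuscated, "mozIndexedDB");
console.log(directFlagged, obfuscatedFlagged); // true false
```

The banned name never appears as a single token in the second snippet, so any check that only looks at literal names in the source passes it.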

Are there any tools (especially JS ones!) that can be used to at least detect this kind of obfuscation? Without it the validator remains more an advisory/helpful tool than something we could use to automate security validation.

Apologies if this is the wrong list; didn’t know exactly who to turn to for this (I’ve also asked security and spidermonkey folks). If I should check with someone specific, please let me know.

Cheers,

- tofumatt

Ehsan Akhgari

Oct 9, 2015, 10:40:15 AM

to tofumatt, dev-stati...@lists.mozilla.org, Andy McKay, Stuart Colville

This is not possible to achieve through static analysis based on
source-code-level constructs. In other words, you cannot build a tool
that looks at source code, analyzes the tokens appearing in it, and
infers whether a property on an object has been accessed.

To detect this kind of pattern, you should probably look at methods
using symbolic execution. Very simplistically, symbolic execution looks
at the program's input (typically from the user and/or the external
world) and assumes that the input can be any valid value according to
some constraints (in typed languages for example, you start by assuming
that an integer input can take any valid value in the range acceptable
by the type of the variable it is stored in). Then you start evaluating
the program and as you make progress, you learn about the possible
subranges of a value. By following different branches in the program
this way, you collect a set of constraints under which the program
takes a specific path, and sometimes solving that system of constraints
gives you concrete input values that drive the program down that path
(to an unsafe operation, for example).
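
As a rough illustration of that idea, here is a toy sketch in JavaScript. Everything in it is an assumption made for the example: the program has a single integer input `x` in the 32-bit signed range, every branch has the form `x < k`, and `explore` and the tree encoding are hypothetical names, not taken from any of the tools mentioned in this thread.

```javascript
// Toy symbolic execution over one integer input `x`.
// A program is a tree: a leaf { unsafe: boolean }, or a branch
// { lessThan: k, then: <tree>, else: <tree> } meaning `if (x < k)`.
function explore(node, lo, hi, found) {
  if (lo > hi) return found; // path constraints are unsatisfiable
  if ("unsafe" in node) {
    if (node.unsafe) found.push({ lo, hi }); // inputs reaching an unsafe leaf
    return found;
  }
  // True branch narrows x to [lo, k - 1]; false branch narrows it to [k, hi].
  explore(node.then, lo, Math.min(hi, node.lessThan - 1), found);
  explore(node.else, Math.max(lo, node.lessThan), hi, found);
  return found;
}

// Models: if (x < 10) { if (x < 5) { safe } else { unsafe } } else { safe }
const program = {
  lessThan: 10,
  then: { lessThan: 5, then: { unsafe: false }, else: { unsafe: true } },
  else: { unsafe: false },
};

const unsafeRanges = explore(program, -2147483648, 2147483647, []);
console.log(unsafeRanges); // [ { lo: 5, hi: 9 } ]
```

Each recursive call carries the constraints accumulated along one path; here "solving" them is trivial because they are simple interval bounds, which is exactly what stops being true for real programs.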

I'm not really sure how much this technique can be used on JS. Based on
some quick searches, I found a number of resources which you may find
interesting:

* http://webblaze.cs.berkeley.edu/2010/kudzu/kudzu.pdf
* https://code.google.com/p/js-symbolic-executor/ (which I'm exporting
to github to make sure it won't get lost right now:
https://github.com/ehsan/js-symbolic-executor/)
* https://github.com/SRA-SiliconValley/jalangi (the readme says it has
an undocumented symbolic execution engine, not sure how useful it is
since the project is now replaced with jalangi2.)

That all being said, analyses based on symbolic execution, even in typed
languages, have a lot of practical limitations, and using them in JS even
for the simplest cases, such as your example, may very well turn into
a research project that would result in nothing in practice. In my
opinion, it's impractical to detect any useful properties of an add-on's
JS code statically based on the source code.

> _______________________________________________
> dev-static-analysis mailing list
> dev-stati...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-static-analysis
>

tofumatt

Oct 9, 2015, 11:59:56 AM

to dev-stati...@lists.mozilla.org, Ehsan Akhgari, Andy McKay, Stuart Colville

Hi Ehsan,

That’s been the impression I’ve been getting from asking around. I think it might be a kind of rabbit hole of a test case, and I’m especially worried that it could cover 80% of known obfuscation attempts, but that wouldn’t be much good :-)

Thanks a lot for the links, I’ll check into them.

Cheers,

- tofumatt

Joshua Cranmer 🐧

Oct 9, 2015, 1:21:10 PM

to Ehsan Akhgari, tofumatt, Andy McKay, Stuart Colville

On 10/9/2015 9:33 AM, Ehsan Akhgari wrote:
> This is not possible to achieve through static analysis based on the
> source code level constructs. In other words, you cannot build a tool
> that looks at source code, analyzes the tokens appearing in it, and
> infer whether a property on an object has been accessed.

The best static analysis on JS that you can probably get is to tell you
when you're accessing a dynamic property on an object (i.e., filtering
out IndexExpressions based on types of the object). In principle, that
could be a starting point for manual analysis, but even that is likely
to produce way too many false positives.
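
A minimal sketch of that kind of check, assuming an ESTree-shaped AST such as Esprima produces. The AST here is written out by hand so the example runs without a parser, and `findDynamicAccesses` is a hypothetical helper name. It flags every computed member access whose key is not a literal, without any of the type-based filtering mentioned above.

```javascript
// Walk an ESTree-style AST and collect computed member accesses whose
// property is not a plain literal (e.g. obj[expr] but not obj["name"]).
function findDynamicAccesses(node, hits = []) {
  if (node === null || typeof node !== "object") return hits;
  if (Array.isArray(node)) {
    for (const child of node) findDynamicAccesses(child, hits);
    return hits;
  }
  if (node.type === "MemberExpression" && node.computed &&
      node.property.type !== "Literal") {
    hits.push(node);
  }
  for (const key of Object.keys(node)) findDynamicAccesses(node[key], hits);
  return hits;
}

// Hand-built AST fragments for: window[badDB]; items[i]; obj["name"];
const member = (objectName, property) => ({
  type: "ExpressionStatement",
  expression: {
    type: "MemberExpression",
    computed: true,
    object: { type: "Identifier", name: objectName },
    property,
  },
});
const ast = {
  type: "Program",
  body: [
    member("window", { type: "Identifier", name: "badDB" }),
    member("items", { type: "Identifier", name: "i" }),
    member("obj", { type: "Literal", value: "name" }),
  ],
};

const flaggedObjects = findDynamicAccesses(ast).map((hit) => hit.object.name);
console.log(flaggedObjects); // [ 'window', 'items' ]
```

The ordinary array access `items[i]` is reported alongside `window[badDB]`, which is the false-positive problem in miniature.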

> That all being said, analyses based on symbolic execution even in
> typed languages have a lot of practical limitations and using it even
> for the simplest cases such as your example below in JS may very well
> turn into a research project that would result in nothing in
> practice. In my opinion, it's impractical to detect any useful
> properties in an add-on JS code statically based on the source code.

As far as I'm aware, the only "production-scale" symbolic execution
tools that are used are really concolic execution engines (SAGE is the
ur-example here), since the ability to concretely execute code most of
the time gets rid of a few problems with symbolic execution (namely, the
fact that the number of paths is exponential or super-exponential in
program size). Unfortunately, JS has properties that make symbolic
execution particularly difficult--its number system, for example, is
inherently floating point (which causes most symbolic execution engines
to keel over and die), and according to one of your references, string
handling is inherently at least PSPACE-hard (boolean satisfiability is
merely NP-complete).

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Ehsan Akhgari

Oct 9, 2015, 1:48:27 PM

to tofumatt, dev-stati...@lists.mozilla.org, Andy McKay, Stuart Colville

On 2015-10-09 11:59 AM, tofumatt wrote:
> Hi Ehsan,
>
> That’s been the impression I’ve been getting from asking around. I think
> it might be a kind of rabbit hole of a test case, and I’m especially
> worried that it could cover 80% of /known/ obfuscation attempts, but
> that wouldn’t be much good :-)

Yeah. The way to think about this is that if you're looking for bad
code patterns in an add-on, you want your analysis to be sound, meaning
it never misses a real occurrence of the pattern. If it's not, then its
usefulness is hugely diminished.

Cheers,
Ehsan

Ehsan Akhgari

Oct 9, 2015, 1:55:54 PM

to Joshua Cranmer 🐧, Andy McKay, tofumatt, Stuart Colville, dev-stati...@lists.mozilla.org

On Fri, Oct 9, 2015 at 1:21 PM, Joshua Cranmer 🐧 <pidg...@gmail.com>
wrote:

> On 10/9/2015 9:33 AM, Ehsan Akhgari wrote:
>
>> This is not possible to achieve through static analysis based on the
>> source code level constructs. In other words, you cannot build a tool that
>> looks at source code, analyzes the tokens appearing in it, and infer
>> whether a property on an object has been accessed.
>>
>
> The best static analysis on JS that you can probably get is to tell you
> when you're accessing a dynamic property on an object (i.e., filtering out
> IndexExpressions based on types of the object). In principle, that could be
> a starting point for manual analysis, but even that is likely to produce
> way too many false positives

That would only work if you prohibit index expressions on all objects that
can give you back the object you're interested in. For example, if you
prohibited those expressions only on |window|, it wouldn't catch
code like |document.getElementById("foo").ownerDocument.defaultView["bad" +
"property"]|.

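A small sketch of why a |window|-only rule falls short, again assuming ESTree-shaped nodes written by hand rather than produced by a parser. The predicate `isComputedAccessOnWindow` is a hypothetical name for such a rule.

```javascript
// Hypothetical rule: flag obj[expr] only when obj is the identifier `window`.
function isComputedAccessOnWindow(node) {
  return node.type === "MemberExpression" &&
    node.computed === true &&
    node.object.type === "Identifier" &&
    node.object.name === "window";
}

// "bad" + "property"
const key = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: "bad" },
  right: { type: "Literal", value: "property" },
};
const dot = (object, name) => ({
  type: "MemberExpression",
  computed: false,
  object,
  property: { type: "Identifier", name },
});

// window["bad" + "property"]
const directAccess = {
  type: "MemberExpression",
  computed: true,
  object: { type: "Identifier", name: "window" },
  property: key,
};

// document.getElementById("foo").ownerDocument.defaultView["bad" + "property"]
const call = {
  type: "CallExpression",
  callee: dot({ type: "Identifier", name: "document" }, "getElementById"),
  arguments: [{ type: "Literal", value: "foo" }],
};
const chainedAccess = {
  type: "MemberExpression",
  computed: true,
  object: dot(dot(call, "ownerDocument"), "defaultView"),
  property: key,
};

const flaggedDirect = isComputedAccessOnWindow(directAccess);
const flaggedChain = isComputedAccessOnWindow(chainedAccess);
console.log(flaggedDirect, flaggedChain); // true false
```

Both expressions reach the same object at run time, but only the first has |window| as its syntactic receiver.
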
Cheers,
--
Ehsan