The GC hazard static analysis uses sixgill, but really only the part
that produces a simplified version of the control flow graph. Brian
implemented some dataflow and general analysis infrastructure in
sixgill, but we don't make use of any of that; we just process the CFG
with custom JS code. I don't know much about the "built-in" constraint
solver that sixgill uses; Brian would have to talk about that. Also,
sixgill definitely doesn't handle all C++ features. I've had to teach it
about a couple that turned out to be needed for the GC analysis, but
there are a number of others that it simply punts on. (Code using those
features is just discarded.) Implementing more features is certainly
possible, but requires understanding the (undocumented?
underdocumented?) GCC internal data structures. Admittedly, many fancy
C++ features are handled earlier in compilation, before sixgill sees
the code, so it doesn't need to do anything for them. What we have now
is certainly adequate for the GC hazard
analysis, but it's something of an open question as to whether it would
be adequate for exception safety.

The bigger issue is that, at least the way we're using it now, sixgill
doesn't give you anything for dataflow other than a simplified CFG.
You'd have to implement the dataflow analysis yourself. For
intraprocedural stuff, that's probably not too horribly difficult if
there are relatively straightforward ways of recognizing (or
annotating) values of interest, but it's a lot of busywork.
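
To give a feel for that busywork, here is a minimal sketch of an
intraprocedural pass over a simplified CFG: backward liveness, then a
check for "interesting" variables that are held across a node that may
trigger the event you care about (a GC, a throw, whatever). Everything
in it (Node, computeLiveOut, findHazards, the one-statement-per-node
CFG) is a made-up illustration, not sixgill's data model, and it is
written in C++ only to match the snippets in this message; the real
analysis is custom JS.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified CFG node: one statement per node.
struct Node {
    std::set<std::string> defs;      // variables (re)assigned here
    std::set<std::string> uses;      // variables read here
    bool mayTrigger = false;         // e.g. a call that might GC
    std::vector<std::size_t> succs;  // indices of successor nodes
};

using VarSets = std::vector<std::set<std::string>>;

// Classic backward liveness, iterated to a fixed point.
// Returns liveOut[i]: variables whose value may still be read after node i.
VarSets computeLiveOut(const std::vector<Node>& cfg) {
    VarSets liveIn(cfg.size()), liveOut(cfg.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = cfg.size(); i-- > 0;) {
            std::set<std::string> out;
            for (std::size_t s : cfg[i].succs) {
                assert(s < cfg.size());  // successor must be a valid node
                out.insert(liveIn[s].begin(), liveIn[s].end());
            }
            // in = uses + (out - defs)
            std::set<std::string> in = cfg[i].uses;
            for (const std::string& v : out)
                if (!cfg[i].defs.count(v)) in.insert(v);
            if (in != liveIn[i] || out != liveOut[i]) {
                liveIn[i] = std::move(in);
                liveOut[i] = std::move(out);
                changed = true;
            }
        }
    }
    return liveOut;
}

// A hazard: an interesting variable keeps its old value across a node
// that may trigger the event, and that value may be read afterwards.
std::vector<std::pair<std::size_t, std::string>> findHazards(
    const std::vector<Node>& cfg, const std::set<std::string>& interesting) {
    VarSets liveOut = computeLiveOut(cfg);
    std::vector<std::pair<std::size_t, std::string>> hazards;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
        if (!cfg[i].mayTrigger) continue;
        for (const std::string& v : interesting)
            if (liveOut[i].count(v) && !cfg[i].defs.count(v))
                hazards.emplace_back(i, v);
    }
    return hazards;
}
```

For a three-node CFG along the lines of "p = alloc(); mayGC(); use(p);"
with p marked interesting, this reports p at the middle node. The hard
part in practice is the part the sketch assumes away: deciding which
values are interesting and what each statement defines and uses.
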

Also, the call graph is necessarily conservative. Any function pointer
is assumed to be able to call anything. Can function pointers be labeled
noexcept? Also, while it does (conservatively) handle virtual function
calls, it currently allows for the possibility of binary extensions, so
unless annotated otherwise, it assumes that you might override any
virtual method with one that invokes arbitrary code. But maybe we're
comfortable disallowing that now.
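
The conservative rules above can be sketched roughly like this. It is
an illustration under my own assumptions, not the analysis's actual
code: Program, CallSite, sealedMethods, and the "<arbitrary code>"
pseudo-function are all hypothetical names. A function-pointer call
resolves to arbitrary code, and a virtual call resolves to its known
overrides plus arbitrary code unless the method is annotated otherwise.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class CallKind { Direct, FunctionPointer, Virtual };

// Hypothetical model of one call site inside a function body.
struct CallSite {
    CallKind kind;
    std::string target;  // callee (Direct) or method name (Virtual)
};

struct Program {
    std::map<std::string, std::vector<CallSite>> calls;      // caller -> sites
    std::map<std::string, std::set<std::string>> overrides;  // method -> impls
    std::set<std::string> sealedMethods;  // annotated: no outside overrides
};

// Pseudo-function standing in for "code we cannot see".
const std::string kUnknown = "<arbitrary code>";

// Conservative resolution of one call site to its possible callees.
std::set<std::string> possibleCallees(const Program& prog,
                                      const CallSite& site) {
    std::set<std::string> out;
    switch (site.kind) {
    case CallKind::Direct:
        out.insert(site.target);
        break;
    case CallKind::FunctionPointer:
        out.insert(kUnknown);  // a function pointer can call anything
        break;
    case CallKind::Virtual: {
        auto it = prog.overrides.find(site.target);
        if (it != prog.overrides.end()) out = it->second;
        // Unless annotated, an extension might override the method.
        if (!prog.sealedMethods.count(site.target)) out.insert(kUnknown);
        break;
    }
    }
    return out;
}

// True if `from` is in `targets` or may transitively call a function in
// `targets`. Reaching arbitrary code counts as reaching everything.
bool mayReach(const Program& prog, const std::string& from,
              const std::set<std::string>& targets) {
    assert(!from.empty());  // callers must name a starting function
    std::set<std::string> seen;
    std::vector<std::string> work{from};
    while (!work.empty()) {
        std::string fn = work.back();
        work.pop_back();
        if (!seen.insert(fn).second) continue;  // already visited
        if (fn == kUnknown || targets.count(fn)) return true;
        auto it = prog.calls.find(fn);
        if (it == prog.calls.end()) continue;  // leaf: no known calls
        for (const CallSite& site : it->second)
            for (const std::string& callee : possibleCallees(prog, site))
                work.push_back(callee);
    }
    return false;
}
```

The price of being conservative shows up right away: any function that
calls through a function pointer "may reach" every target, so a real
analysis lives or dies by how many of those sites it can annotate away.
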

Oh, and the "annotations on various C++ types" currently take the form
of "a hardcoded list of types stored in a JS script", but that's fixed
in a version that I still haven't managed to deploy because b2g is
giving me trouble for unrelated reasons. Now you can do something like
struct MyTaggedPointer { ... } JS_HAZ_GC_POINTER; which boils down to
__attribute__((tag("GC Pointer"))). The same works for functions, which
introduces the possibility of having
__attribute__((tag("throws exceptions, yo"))) and then reusing the
callgraph traversal code that currently handles the functions tagged
__attribute__((tag("GC Call"))). In other words: it is easy to add
additional analyses, as long as they want to do pretty much the same
thing as the existing GC analysis. I'm sure each analysis will need its
own ugly collection of
special cases and things. As I said, though, tagging things as "leakable
resource" is not at all trivial, due to the lack of dataflow. I don't
even know if the dataflow would be precise enough in practice.
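
For concreteness, here is roughly what the tagging could look like in a
header. Treat it as a sketch with assumptions: JS_HAZ_GC_POINTER and
the tag strings come from the description above, but
ANALYSIS_PLUGIN_BUILD, HAZ_TAG, HAZ_GC_CALL, and HAZ_MAY_THROW are
names invented for this example, and the tag attribute only means
something to the analysis plugin. In an ordinary build the macros
expand to nothing.

```cpp
#include <stdexcept>

// ANALYSIS_PLUGIN_BUILD is a hypothetical macro, assumed to be defined
// only when compiling under the analysis plugin. Ordinary builds see no
// attribute at all, so the annotations cost nothing.
#ifdef ANALYSIS_PLUGIN_BUILD
#  define HAZ_TAG(text) __attribute__((tag(text)))
#else
#  define HAZ_TAG(text)
#endif

#define JS_HAZ_GC_POINTER HAZ_TAG("GC Pointer")
#define HAZ_GC_CALL HAZ_TAG("GC Call")                 // hypothetical name
#define HAZ_MAY_THROW HAZ_TAG("throws exceptions, yo") // hypothetical name

// A type whose instances should be treated as GC pointers.
struct MyTaggedPointer {
    void* raw;
} JS_HAZ_GC_POINTER;

int collections = 0;  // stand-in state so the example does something

// Function tags go on the declaration; the definitions follow below.
void collectGarbage() HAZ_GC_CALL;
int parseDigit(char c) HAZ_MAY_THROW;

void collectGarbage() { ++collections; }

int parseDigit(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}
```

Since the tag is just a string, a second analysis would only need to
look for a different string when it picks its starting set of
functions or types.
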