To a first approximation, cover.go contains three things:
* Coverage instrumentation
* Sonar instrumentation
* Literal extraction
Literal extraction is read-only, and IMHO it does not make sense to
put it into a compiler pass. It is also, incidentally, the part I
have plans to improve (some vague, some already written and ready to
be cleaned up and turned into a PR at some point).
So the instrumentation proper is less than 900 LOC.
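To make the "read-only" point concrete, here is a minimal sketch of literal extraction over an AST. It is not go-fuzz's actual code: the function name extractLiterals and the choice to collect only string and integer literals are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// extractLiterals (hypothetical name) parses src and returns, in source
// order, the unquoted value of every string literal and the source text of
// every integer literal. It only reads the AST; nothing is rewritten.
func extractLiterals(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lits []string
	ast.Inspect(f, func(n ast.Node) bool {
		lit, ok := n.(*ast.BasicLit)
		if !ok {
			return true
		}
		switch lit.Kind {
		case token.STRING:
			if s, err := strconv.Unquote(lit.Value); err == nil {
				lits = append(lits, s)
			}
		case token.INT:
			lits = append(lits, lit.Value)
		}
		return true
	})
	return lits, nil
}

func main() {
	src := `package p

func check(b []byte) bool {
	return len(b) > 4 && string(b[:4]) == "GIF8"
}
`
	lits, err := extractLiterals(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(lits)
}
```

Since nothing here modifies the program, it can run as a separate step over the source regardless of where the instrumentation itself ends up.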
Note also that 'go test' does code coverage with a source-to-source
(s2s) transformation very similar to go-fuzz's. (go-fuzz has fallen
behind, because it doesn't use cmd/internal/edit.) So there are
opportunities for shared code there, and possibly it is much less
than 900 LOC.
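For readers who have not looked at an s2s coverage transformation, here is a rough sketch of the idea. It is heavily simplified and is neither cmd/cover's nor go-fuzz's implementation: it rewrites the AST and prints it with go/printer instead of making textual edits via cmd/internal/edit, it adds one counter per function rather than one per basic block, and the counter array name _cover is made up.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"strconv"
)

// instrument (hypothetical name) inserts "_cover[i]++" at the start of every
// function declaration body in src. It returns the rewritten source and the
// number of counters used. The output does not declare _cover; a real tool
// would also emit that declaration.
func instrument(src string) (string, int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	n := 0
	ast.Inspect(f, func(node ast.Node) bool {
		fn, ok := node.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			return true
		}
		inc := &ast.IncDecStmt{
			X: &ast.IndexExpr{
				X:     ast.NewIdent("_cover"),
				Index: &ast.BasicLit{Kind: token.INT, Value: strconv.Itoa(n)},
			},
			Tok: token.INC,
		}
		// Prepend the counter increment to the existing statements.
		fn.Body.List = append([]ast.Stmt{inc}, fn.Body.List...)
		n++
		return true
	})
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, f); err != nil {
		return "", 0, err
	}
	return buf.String(), n, nil
}

func main() {
	src := `package p

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}
`
	out, n, err := instrument(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d counter(s)\n%s", n, out)
}
```

Even in this toy form, most of the code is generic plumbing (parse, walk, insert, print), which is the part that could plausibly be shared.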
> Q2. Is there an SSA pass that exists today that is not extremely intertwined with other things that happens to do something at least somewhat similar to what a theoretical fuzzing instrumentation SSA pass would need to do?
>
> * For example, is the race detector SSA pass a reasonable example, or maybe there are better examples?
For coverage, no, there isn't a similar existing pass today. Since
coverage only does memory writes, it is probably not too hard to write
and maintain a pass for it, most likely at the generic (arch-agnostic)
level. One challenge is figuring out the details of when to do a
write. The SSA form's CFG doesn't perfectly match the source code CFG:
Some branches get optimized away, and other branches get inserted
(handling nil pointer checks, bounds checks, write barrier checks,
inlining, etc.). This might be fine, but it will require some thought
and experimentation to (a) find the right places to put the writes and
(b) decide what the correct position information is for a single block
in the CFG that may span a wide, disconnected set of positions. Also,
the backend of the compiler is concurrent, so we'll need to put some
thought into how to name the writes. Also, we need to figure out where
the compiler should write the coverage position output file, how that
interacts with cmd/go, etc.
The story on sonar instrumentation is a bit uglier. We don't yet have
a mechanism to introduce function calls during an SSA pass. Because of
that, race detector instrumentation actually happens during
construction of the SSA form. We could do the same, I suppose, for
sonar instrumentation, but it is likely to have lots of challenging
corner cases. The race detector only cares about memory reads and
writes, which are easier to funnel through a single pinch point.
Comparisons are spread out a bunch more. Also, because SSA
construction occurs before the SSA passes, we'd end up
coverage-covering our sonar instrumentation. Another sticking point is
that sonar instrumentation uses interfaces. But handling of interfaces
(including allocation as needed) occurs prior to the Node-to-SSA
conversion, so we would need to change our sonar instrumentation to
use only concrete types. Even then we might end up with some sticky
spots.
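To illustrate the interface-vs-concrete point, here is a sketch of the two shapes a comparison hook could take. All names (sonarAny, sonarInt64, sonarString, checkHeader) and the recorded data layout are hypothetical; this is not go-fuzz's actual sonar API, only an assumption about what "use only concrete types" might look like.

```go
package main

import (
	"fmt"
	"strconv"
)

// sample is one recorded comparison: a site ID plus both operands,
// stored as text to keep the sketch small.
type sample struct {
	site uint32
	a, b string
}

var samples []sample

// sonarAny is the interface-based shape: one hook for every operand type.
// The caller has to convert both operands to interface{} first.
func sonarAny(site uint32, a, b interface{}) {
	samples = append(samples, sample{site, fmt.Sprint(a), fmt.Sprint(b)})
}

// sonarInt64 and sonarString are the concrete-typed shape: one hook per
// operand type, with no interface conversion at the call site.
func sonarInt64(site uint32, a, b int64) {
	samples = append(samples, sample{site, strconv.FormatInt(a, 10), strconv.FormatInt(b, 10)})
}

func sonarString(site uint32, a, b string) {
	samples = append(samples, sample{site, a, b})
}

// checkHeader shows what instrumented user code could look like: each
// comparison is preceded by a call to the hook matching its operand type.
func checkHeader(magic string, n int64) bool {
	sonarString(0, magic, "GIF8")
	if magic != "GIF8" {
		return false
	}
	sonarInt64(1, n, 16)
	return n > 16
}

func main() {
	checkHeader("GIF8", 20)
	sonarAny(2, 1.5, 2.5) // interface-based call, for comparison
	fmt.Println(samples)
}
```

The concrete-typed version needs one hook per operand type, which is more surface area, but each call site is an ordinary function call with plain arguments.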
None of this is to say that it is impossible...just that it is not as
simple as porting what we have to a new domain.
> Q3. Based on how things work in SSA today, would a theoretical fuzzing instrumentation SSA pass most likely primarily be imperative Go code, or primarily be SSA rewrite rules, or something else?
I don't think any of this could be done with rewrite rules. I suspect
you would need code in several places, including cmd/go.
> Q4. Roughly how stable is the SSA pass API now to maintain a pass on the side for some time?
As commented above, for the coverage piece only, probably stable
enough. For the other pieces, less so. (Or so I think...I haven't
tried.)
> Q5. Is there maybe an order-of-magnitude guess as to how many lines of code a first cut fuzzing instrumentation SSA pass could end up being based on the current SSA infrastructure that exists today? (From a sustainability perspective, 50 vs. 500 vs. 5,000 lines of code likely would make a difference).
I suspect that the LOC would be similar, depending on how complicated
the glue pieces are (cmd/go) and how much work goes into getting the
corner cases right.
However, working inside the compiler is generally much more fraught.
If you make an s2s mistake, the compiler usually tells you. If you
make a mistake inside the compiler, you find out later, possibly much
later, when things subtly misbehave. It also substantially raises the
bar for new contributors, who need to understand the compiler as
opposed to needing to understand just an AST.
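As a small illustration of the "the compiler usually tells you" point, here is a sketch that type-checks generated source with go/types, standing in for the compiler. The helper name typeCheck and the two source snippets are made up for this example, and the sketch assumes the generated file has no imports.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeCheck (hypothetical helper) parses and type-checks a single
// import-free source file. It returns the first error found, or nil if the
// file is well-typed.
func typeCheck(src string) error {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "gen.go", src, 0)
	if err != nil {
		return err
	}
	conf := types.Config{} // no Importer set: this sketch assumes no imports
	_, err = conf.Check("p", fset, []*ast.File{f}, nil)
	return err
}

// goodSrc is instrumented output that declares its counter array.
const goodSrc = `package p

var _cover [1]uint8

func f() { _cover[0]++ }
`

// badSrc is the same output with the declaration forgotten, a typical
// s2s mistake.
const badSrc = `package p

func f() { _cover[0]++ }
`

func main() {
	fmt.Println("good:", typeCheck(goodSrc))
	fmt.Println("bad: ", typeCheck(badSrc))
}
```

A forgotten declaration in s2s output fails immediately and loudly. The analogous slip inside a compiler pass has no such safety net.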
> Q6. Is there an opportunity to do first something a bit more generic in the SSA world that would then translate to the fuzzing-specific pieces ending up being much smaller?
I think coverage is the best bet here, particularly since in theory it
could also be used by cmd/cover. In fact, I'd be tempted to start
there, since cmd/cover (test coverage) is already internal magic.
But as I have said before, I think this should stay s2s. :)
>>> Pain point.
>>> Compiler instrumentation will be simpler, faster, easier to maintain
>>> _and_ much simpler to integrate with other build systems. Go team says
>>> they want this on the side as s2s, plugged via go tool, but they also
>>> badly want this in bazel and blaze, which are separate huge Java
>>> systems that are very hard to modify and extend.
A minor rebuttal: There are multiple Go compilers, and compiler
instrumentation would only cover the one it is written for.
Plus, now that go-fuzz uses go/packages, the input side is
build-system-agnostic. It should work out of the box with any system
that has a go/packages driver, which I believe bazel and blaze do. It
is possible there is more work to do on the 'go build' side, but that
could be done without touching Java. If someone wants this to work
with bazel/blaze, I would hope they would try to get it to work; it
might not be that bad.
-josh