A branch predictor is, in effect, an associative memory that maps
branch locations to target locations. For a given number of mappings,
an ANN could be quite a bit smaller than an equivalent table, but it
would take longer to "learn" each mapping. I also doubt that it would
be much faster at retrieving them.
More importantly, ANN based associative memory does not "unlearn"
things either quickly or easily unless you flush *all* the data -
resetting the network to its initial state. And when you exceed the
"memory" capacity of an ANN, it does not necessarily degrade gracefully
but instead can become erratic.
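For what it's worth, you can see both points with a toy Hopfield-style
associative memory (a sketch of the general idea, not a predictor
design; all the names below are mine). Classical analyses put reliable
storage at roughly 0.14N patterns for an N-node net; push past that and
recall turns erratic rather than failing gently, and the only clean
reset is retraining from scratch:

```python
# Toy Hopfield-style associative memory (illustrative sketch only;
# not a branch-predictor design).

def train(patterns, n):
    """Hebbian learning: w[i][j] accumulates correlations across patterns."""
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, n, sweeps=5):
    """Iterate threshold updates from a (possibly noisy) +/-1 cue."""
    s = list(cue)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s

# Store one +/-1 pattern, then recall it from a one-bit-corrupted cue.
n = 8
p = [1, -1, 1, 1, -1, -1, 1, -1]
w = train([p], n)
cue = list(p)
cue[0] = -cue[0]              # corrupt one bit
restored = recall(w, cue, n)  # recovers p
```

Note there is no per-pattern delete here: forgetting one mapping means
rebuilding w without it, which is exactly the unlearning problem.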
The unlearning problem can be mitigated somewhat by using several
smaller ANNs in parallel rather than a single large one. But the
"memory" capacity of an ANN scales superlinearly with the size of the
network: a single 2N-node network can encode far more mappings than
two N-node networks can.
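One back-of-envelope way to see the scaling is to count pairwise
weights in a fully connected net (a simplification, of course - it
ignores training dynamics entirely):

```python
# Pairwise-weight count for a fully connected, symmetric network
# with no self-connections: nodes choose 2.

def weight_count(nodes):
    return nodes * (nodes - 1) // 2

n = 64
one_big = weight_count(2 * n)   # single 2N-node network: 8128 weights
two_small = 2 * weight_count(n) # two N-node networks:    4032 weights
```

So the single large net has roughly twice the weights, plus all the
cross-connections the split version can never form.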
Then also consider that, to really be effective and improve on
existing table predictors, you need to context-switch the ANN along with the
program it models to avoid pollution by foreign code.
The unlearning problem again.
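In software terms it would amount to something like the following
(purely a sketch of the bookkeeping; the interface is hypothetical,
and real hardware would need dedicated save/restore paths):

```python
# Sketch: swap predictor state per process at context-switch time.
# Hypothetical interface, for illustration only.

predictor_state = {}  # pid -> saved weight snapshot

def context_switch(old_pid, new_pid, current_weights):
    predictor_state[old_pid] = list(current_weights)  # save outgoing state
    # restore the incoming process's state, or start cold on first run
    return predictor_state.get(new_pid, [0.0] * len(current_weights))
```

The cost of saving and restoring a whole weight matrix on every switch
is part of why I doubt this pays off against a simple table.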
I suppose you could just punt and model the whole set of running
software at any given time, but the unlearning problem
combined with erratic behavior when/if the ANN's capacity is exceeded
makes this approach rather untenable.
Certainly it is technically possible, but I have to wonder whether it
can improve on table predictors enough to be worth the effort.
YMMV,
George