FP (or logarithmic) probabilistic counters?

Paul A. Clayton

unread,

Dec 10, 2012, 4:37:45 PM12/10/12

to

I have read proposals to reduce counter size by
a factor of N by using probabilistic increment/
decrement (this might be viewed as a fixed point
counter where the "fractional" portion is
estimated by a uniform random number generator),
but I do not recall reading any proposal to use
a floating-point representation of a counter
(which could be degraded to a logarithmic
representation if only the exponent is used).

This might be a fault in my memory (I might
even have mentioned the possibility before) or
represent the limits of my reading, or it might
be that probabilistic FP counters are relatively
useless. On the other hand, such might be
useful to provide a relatively large counter
range with modest storage if storage overhead
is a greater concern than RNG and update
overhead. (For very large arrays of counters,
the less frequent need to update almost
saturated counters may have energy use advantages
even with the added RNG and update complexity.)

One nice aspect of an FP counter is that it
can "degrade" to an ordinary counter at low
magnitude, so probabilistic issues could be
avoided near zero. (It might also be more
helpful to use timer- and value-based adjustments
with a FP format to constrain probabilistic
issues near saturation because FP values near
saturation would--it seems--be more vulnerable
to probabilistic issues.)

The ability of an FP counter to extend the
counter range might be more helpful when various
adjustment amounts are used for different events.
(Probabilistic counters in general would seem
to work well with variable "strength" of events
by extending the range of "strengths" which can
influence the count without increase counter
storage requirements. [Even a 1-bit branch
predictor with probabilistic update might
associate different strengths to different
events, notably backward vs. forward branches
but also hint information {e.g., imagine using
register names to communicate _strength_ of
the hint} or some dynamic information.])

I think I have considered the possibility of FP
counters before (and may have mentioned such here
before, but a quick search did not discover such
a mention). Looking at the presentation for
"Improving Cache Management Policies Using Dynamic
Reuse Distance" (Duong et al., 2012), which used
reuse distance (remaining protection distance)
counters, made me think (again?) of the possibility
of FP counters (though FP counters might not be
appropriate for this application since the counters
tend to be somewhat small [OTOH, such _might_ be
appropriate since reuse distance might vary by
binary orders of magnitude]).

FP probabilistic counters are a somewhat obvious
extension of probabilistic counters. Presumably
one could extend the concept further with other
probability functions for state changes (perhaps
biasing updates to encode probabilistic
information about when the data was last
updated??), but such extension seems to be
beyond my creativity/madness/mathematical ability.

(Analog storage might provide a probabilistic
storage which might fit with the least
significant bits in a counter--whether
"fixed-point" or floating point. DRAM decay
might also be exploited to decay least
significant bits to zero or perhaps an
arbitrary value.)

Paul A. Clayton

unread,

Dec 10, 2012, 4:54:18 PM12/10/12

to

On Monday, December 10, 2012 4:37:45 PM UTC-5, Paul A. Clayton wrote:
[snip longish meandering post about FP-based counter]

I should have googled more before posting! "probabilistic
counter" results pointed to the wikipedia article
http://en.wikipedia.org/wiki/Approximate_counting_algorithm
which indicates that the logarithmic counter was proposed
in 1977 (by Robert Morris of Bell Labs).

(Being 35 years behind someone at Bell Labs might not be
so bad. :-)

Anyway, perhaps this mention will launch some
(interesting) discussion of uses and tradeoffs.
Otherwise "we now return you to your regularly
scheduled program".

Mark Thorson

unread,

Dec 10, 2012, 7:49:58 PM12/10/12

to

"Paul A. Clayton" wrote:
>
> I have read proposals to reduce counter size by
> a factor of N by using probabilistic increment/
> decrement (this might be viewed as a fixed point
> counter where the "fractional" portion is
> estimated by a uniform random number generator),
> but I do not recall reading any proposal to use
> a floating-point representation of a counter
> (which could be degraded to a logarithmic
> representation if only the exponent is used).

I first heard about probabilistic counters when I
mentioned my need to keep a fractional pixel position
to support slow-moving objects in my animation program.
A guy who designed video games for Mattel's Intelevision
told me a trick he used was to use a probabilistic
counter so that objects could be moving at less than
one pixel per time unit without keeping the fraction.
I thought that was clever. That was in 1980, and
keeping the arithmetic in 8 bits was a big win.

What's the application for which you think probabilistic
counters make sense today? What are you counting that
would overflow 64 bits?

Paul A. Clayton

unread,

Dec 10, 2012, 8:05:01 PM12/10/12

to

On Monday, December 10, 2012 7:49:58 PM UTC-5, Mark Thorson wrote:
[snip]

> I first heard about probabilistic counters when I
> mentioned my need to keep a fractional pixel position
> to support slow-moving objects in my animation program.
> A guy who designed video games for Mattel's Intelevision
> told me a trick he used was to use a probabilistic
> counter so that objects could be moving at less than
> one pixel per time unit without keeping the fraction.
> I thought that was clever. That was in 1980, and
> keeping the arithmetic in 8 bits was a big win.

As I noted in a follow-up post, Wikipedia (through
Google) indicates that the logarithmic probabilistic
counter was recognized in 1977; the shifted form may
have been recognized significantly earlier.

> What's the application for which you think probabilistic
> counters make sense today? What are you counting that
> would overflow 64 bits?

The application was intended as a general technique
to support large numbers of hardware counters, so
the storage savings was the main consideration.

For software, there appears to be a significant
use for approximate counting, mainly when storage
overhead of an exact counter would be too great
because one is dealing with "big data". (In theory,
cache capacity or memory bandwidth concerns could
justify lossy compression--which is effectively
what approximate counting is.)

Terje Mathisen

unread,

Dec 11, 2012, 3:02:37 AM12/11/12

to

Paul A. Clayton wrote:
> For software, there appears to be a significant
> use for approximate counting, mainly when storage

The SW interface Novell developed for their distributed Directory
Service included APIs to display a screenfull of data from lists of
arbitrary size, even when the actual total size was unknown (because
most of the data might reside at other servers?).

This API (afair) used percentages to move to approximate locations in
the total list, then allowed you to scroll up/down very efficiently.

> overhead of an exact counter would be too great
> because one is dealing with "big data". (In theory,
> cache capacity or memory bandwidth concerns could
> justify lossy compression--which is effectively
> what approximate counting is.)

A FP counter stops working when the count has passed about 2^54, since
at that point each new increment will be less than 0.5 ulp.

A probabilistic counter corresponds to keeping an additional word of
extra resolution.

I do have a problem seeing any situation where you are so size limited
today, and where you will have so many counters simultaneously growing
past 2^64 or some other "reasonable" size that you cannot employ
compression instead...

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Paul A. Clayton

unread,

Dec 11, 2012, 7:53:05 AM12/11/12

to

On Tuesday, December 11, 2012 3:02:37 AM UTC-5, Terje Mathisen wrote:
> Paul A. Clayton wrote:
>> For software, there appears to be a significant
>> use for approximate counting, mainly when storage
>
> The SW interface Novell developed for their distributed Directory
> Service included APIs to display a screenfull of data from lists of
> arbitrary size, even when the actual total size was unknown (because
> most of the data might reside at other servers?).
>
> This API (afair) used percentages to move to approximate locations in
> the total list, then allowed you to scroll up/down very efficiently.

Interesting. Thanks for sharing.

>> overhead of an exact counter would be too great
>> because one is dealing with "big data". (In theory,
>> cache capacity or memory bandwidth concerns could
>> justify lossy compression--which is effectively
>> what approximate counting is.)
>
> A FP counter stops working when the count has passed about 2^54, since
> at that point each new increment will be less than 0.5 ulp.
>
> A probabilistic counter corresponds to keeping an additional word of
> extra resolution.
>
> I do have a problem seeing any situation where you are so size limited
> today, and where you will have so many counters simultaneously growing
> past 2^64 or some other "reasonable" size that you cannot employ
> compression instead...

(A probabilistic counter _is_ a form of [lossy] compression.)

It seems there is a miscommunication. I was not thinking
of double (or even single) precision floating point but of,
e.g., 8-bit FP or even smaller. I was also thinking of
microarchitectural uses (which would be prediction-oriented
and so tolerant of some information loss and for which
compatibility is less of an issue and overhead would be a
significant issue [NRE forcing mass production which in
turn broadens the scope over which the cost must be
justified {though if a largish niche did benefit greatly,
a very large price premium for not fusing off the
functionality might allow some wasting of area for other
uses; also storage seems to be much easier to reassign to
alternative uses}; reducing the resource requirements may
also make implementing a novel mechanism acceptable
{even if the simulations indicate that providing more
resources would have a greater ROI, uncertainty of
knowledge may motivate a more conservative approach}]).

FP approximate counters _might_ be interesting for
something like a Bloom filter if the counts themselves
provided useful information since it could provide
exact counts for low magnitude counts (which could be
returned to zero without a reset of the filter) while
supporting very large values (which would saturate at
the minimum and maximum inexact values). Perhaps
such could be useful as an L1 filter where the L2
would not be probed if the counts in L1 were too high
(indicating that L2 would be likely to produce a
'hit' also)??

Anyway, I do not know what uses there might be for
such approximate counters, but given that they have
been used by software it seems reasonable that there
would be some potential microarchitectural use.

Stream of consciousness thought: there might be some
use for a timestamp composed of two timestamps, an
epoch number and a value from a portion of that epoch,
i.e., removing intermediate bits. (This is not FP
in that an exponent is not used and the 'mantissa' is
not the most significant bits, but it would seem to
have somewhat similar properties.) Such could
generally distinguish very old (different epoch) from
old (same epoch but different [unrecorded]
intermediate bits) and from young (same epoch and
intermediate bits), which might be useful. (It
might be useful to adjust the time count by a larger
increment when changing epoch--i.e., the first
sub-epoch would not use the lowest counts--or perhaps
"rotate" the count each sub-epoch--i.e., the actual
count would be C - N.)

One might even be able use knowledge of the
approximate nature of such information as confidence
information (e.g., to weigh predictor inputs or
balance against misprediction costs). E.g., for two
approximate timestamps, the probability that one is
larger than the other by some threshold can be
determined based on the probability that values will
be updated.

As noted in a previous post, a probabilistic counter
could be updated with variable probability or
variable amounts for different events which could
include the same events in different phases.

Andy (Super) Glew

unread,

Dec 11, 2012, 2:03:55 PM12/11/12

to

Many academic papers in computer architecture posit changes to hardware
datastructures like "Let's keep an N-bit counter in every cache line or
BTB entry".

Probabilistic counters can make this cheaper.

Of course, one needs a cheap good-enough source of (pseudo-)randomness
to use probabilistic counters at this granularity.

Related:

* probabilistic maintenance of running averages, both perpetual and in
some weighted window.

* probabilistic algorithms looking for most common associations and
subsequences. Used in predictors.

--
The content of this message is my personal opinion only. Although I am
an employee (currently of MIPS Technologies; in the past of companies
such as Intellectual Ventures and QIPS, Intel, AMD, Motorola, and
Gould), I reveal this only so that the reader may account for any
possible bias I may have towards my employer's products. The statements
I make here in no way represent my employers' positions on the issue,
nor am I authorized to speak on behalf of my employers, past or present.

Walter Banks

unread,

Dec 12, 2012, 2:45:37 PM12/12/12

to

Mark Thorson wrote:

> I first heard about probabilistic counters when I
> mentioned my need to keep a fractional pixel position
> to support slow-moving objects in my animation program.
> A guy who designed video games for Mattel's Intelevision
> told me a trick he used was to use a probabilistic
> counter so that objects could be moving at less than
> one pixel per time unit without keeping the fraction.
> I thought that was clever. That was in 1980, and
> keeping the arithmetic in 8 bits was a big win.
>
> What's the application for which you think probabilistic
> counters make sense today?

A lot of years ago I did a variant on this approach to
implement a d-a converter. I did not have any timers
to implement a PWM so when ever possible I
compared a random number to the analog value
I wanted and set a I/O pin high if value > random.

Worked very well.

w..

Niels Fröhling

unread,

Dec 16, 2012, 2:28:37 PM12/16/12

to

> * probabilistic algorithms looking for most common associations and
> subsequences. Used in predictors.

This is where context-models use probabilistic counters at "char" size (8bit).
The more contexts you can capture and blend the better the prediction (and
compression if you use the predictor for binary data compression). It is very
easy to understand that there is no question that if you can use 4 billion
contexts (8bit prob. counters) or 1 billion (32bit regular counters) which you
choose.

Most contexts are inactive, but you have to maintain them for a reasonably long
time as otherwise you could also flip coins (empty context -> 50/50) and the
predictor performance is less good, or you fall back to lower order contexts
with about the same penalty in efficiency.
State-of-the-art PPM-models (PPMd) and CM-models (PAQ) use probabilistic
counters, and sometimes logarithmic counters.

Imagine brain-cells would be four times the size, you'd need to evict a lot more
information constantly from it than you're used to ... ;^)

Salud
Niels

Andy (Super) Glew

unread,

Apr 6, 2013, 1:02:26 AM4/6/13

to

On 12/10/2012 4:49 PM, Mark Thorson wrote:

Many academic papers in computer architecture posit changes to hardware
datastructures like "Let's keep an N-bit counter in every cache line or
BTB entry".

Probabilistic counters can make this cheaper.

Of course, one needs a cheap good-enough source of (pseudo-)randomness
to use probabilistic counters at this granularity.

Related:

* probabilistic maintenance of running averages, both perpetual and in
some weighted window.

* probabilistic algorithms looking for most common associations and
subsequences. Used in predictors.