On Tuesday, December 11, 2012 3:02:37 AM UTC-5, Terje Mathisen wrote:
> Paul A. Clayton wrote:
>> For software, there appears to be a significant
>> use for approximate counting, mainly when storage
>
> The SW interface Novell developed for their distributed Directory
> Service included APIs to display a screenfull of data from lists of
> arbitrary size, even when the actual total size was unknown (because
> most of the data might reside at other servers?).
>
> This API (afair) used percentages to move to approximate locations in
> the total list, then allowed you to scroll up/down very efficiently.
Interesting. Thanks for sharing.
>> overhead of an exact counter would be too great
>> because one is dealing with "big data". (In theory,
>> cache capacity or memory bandwidth concerns could
>> justify lossy compression--which is effectively
>> what approximate counting is.)
>
> A FP counter stops working when the count has passed about 2^54, since
> at that point each new increment will be less than 0.5 ulp.
>
> A probabilistic counter corresponds to keeping an additional word of
> extra resolution.
>
> I do have a problem seeing any situation where you are so size limited
> today, and where you will have so many counters simultaneously growing
> past 2^64 or some other "reasonable" size that you cannot employ
> compression instead...
(A probabilistic counter _is_ a form of [lossy] compression.)
It seems there is a miscommunication. I was not thinking
of double (or even single) precision floating point but of,
e.g., 8-bit FP or even smaller. I was also thinking of
microarchitectural uses (which would be prediction-oriented
and so tolerant of some information loss and for which
compatibility is less of an issue and overhead would be a
significant issue [NRE forcing mass production which in
turn broadens the scope over which the cost must be
justified {though if a largish niche did benefit greatly,
a very large price premium for not fusing off the
functionality might allow some wasting of area for other
uses; also storage seems to be much easier to reassign to
alternative uses}; reducing the resource requirements may
also make implementing a novel mechanism acceptable
{even if the simulations indicate that providing more
resources would have a greater ROI, uncertainty of
knowledge may motivate a more conservative approach}]).
FP approximate counters _might_ be interesting for
something like a Bloom filter if the counts themselves
provided useful information since it could provide
exact counts for low magnitude counts (which could be
returned to zero without a reset of the filter) while
supporting very large values (which would saturate at
the minimum and maximum inexact values). Perhaps
such could be useful as an L1 filter where the L2
would not be probed if the counts in L1 were too high
(indicating that L2 would be likely to produce a
'hit' also)??
Anyway, I do not know what uses there might be for
such approximate counters, but given that they have
been used by software it seems reasonable that there
would be some potential microarchitectural use.
Stream of consciousness thought: there might be some
use for a timestamp composed of two timestamps, an
epoch number and a value from a portion of that epoch,
i.e., removing intermediate bits. (This is not FP
in that an exponent is not used and the 'mantissa' is
not the most significant bits, but it would seem to
have somewhat similar properties.) Such could
generally distinguish very old (different epoch) from
old (same epoch but different [unrecorded]
intermediate bits) and from young (same epoch and
intermediate bits), which might be useful. (It
might be useful to adjust the time count by a larger
increment when changing epoch--i.e., the first
sub-epoch would not use the lowest counts--or perhaps
"rotate" the count each sub-epoch--i.e., the actual
count would be C - N.)
One might even be able use knowledge of the
approximate nature of such information as confidence
information (e.g., to weigh predictor inputs or
balance against misprediction costs). E.g., for two
approximate timestamps, the probability that one is
larger than the other by some threshold can be
determined based on the probability that values will
be updated.
As noted in a previous post, a probabilistic counter
could be updated with variable probability or
variable amounts for different events which could
include the same events in different phases.