On Wednesday, December 26, 2012 9:51:11 PM UTC-5, Mark Thorson wrote:
> "Paul A. Clayton" wrote:
>>
>> On Wednesday, December 26, 2012 5:43:56 PM UTC-5, Mark Thorson wrote:
>>> Let's say we make a distinction between hard and
>>> soft data.
>> [snip]
>>> Would it make sense to make this distinction at the
>>> hardware level?
>>
>> I am posting a quick response now, but I hope
>> to provide a bit more detail later.
>
> I meant to say "at the architecture level".
> That is, you might have functional units,
> memory regions, and maybe even registers
> that are less reliable than those used
> for hard data.

I suspect that in many cases reliability requirements
will correlate with the type of functional activity, so
that more specialized hardware would be used. The dark
silicon idea (that there will be more transistors than
can be active at once, so specializing hardware for
lower utilization at higher efficiency makes sense)
would mesh well with such cases.

However, I suspect that significant work will be done
to increase the flexibility of hardware because of the
costs of communication. That is, the greater efficiency
of specialized hardware can be offset by the greater
cost of communicating data and control between
different units.

The _functional_ division between architecture and
microarchitecture is, of course, based less on
obvious distinguishing features and more on the
availability of information (to the compiler, the
software runtime, or the hardware), the sophistication
and maturity of design tools, etc. In (extremely
abstract) theory, a system similar to transactional
memory with predictors could assign reliability
dynamically without software assistance. The static
and dynamic hardware overhead of the predictors and
rollback mechanisms, and the design complexity
(particularly in such an immature area), presumably
make such an approach very unattractive as an
exclusive option. (However, in cooperation with
software, such a scheme might eventually be made
useful.)

> This might appear in the
> instruction set, at least when referring
> to operations on soft data. Alternatively,
> data might be tagged as soft, for example
> if it is read from a region of the address
> space assigned to soft memory. If one of
> the operands is tagged as soft, it's okay
> to use a soft FU on it.
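
A minimal Python sketch of the operand-tagging rule quoted above may make it concrete. Everything here is hypothetical and not from any real ISA: the `Tagged` value type, the idea that the tag is set on loads from a soft-memory region, and the 4-bit-sliced adder that stands in for a "soft" FU. If either operand is soft, the cheap unit may be used and the result stays soft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value plus a hypothetical 'soft' tag (e.g., set when the value
    was loaded from a soft-memory region of the address space)."""
    value: int
    soft: bool = False


def exact_add(a: int, b: int) -> int:
    return a + b


def approx_add(a: int, b: int) -> int:
    # Hypothetical soft FU: the low 4-bit slice and the upper slice are
    # added independently, so a carry out of bit 3 is simply lost.
    low = ((a & 0xF) + (b & 0xF)) & 0xF
    high = (a & ~0xF) + (b & ~0xF)
    return high | low


def add(x: Tagged, y: Tagged) -> Tagged:
    # If either operand is soft, the cheaper FU may be used, and the
    # result inherits the soft tag.
    soft = x.soft or y.soft
    fu = approx_add if soft else exact_add
    return Tagged(fu(x.value, y.value), soft)


print(add(Tagged(15), Tagged(1)))             # Tagged(value=16, soft=False)
print(add(Tagged(15, soft=True), Tagged(1)))  # Tagged(value=0, soft=True)
```

With this particular approximation, the soft result is either exact or low by exactly 16. Propagating the tag to the result is one possible policy. An architecture could instead require an explicit instruction to mark results soft.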

For approximate computation in functional units
(FUs), there seem to be two approaches: deterministic,
and dependent on arbitrary conditions (temperature,
minor voltage fluctuation, process variability, etc.).
A deterministic approach might be friendlier to device
testing (though residue techniques and checker cores
could provide reliability even in the presence of
unreliable components); it effectively just adjusts
the logic table slightly to allow a more
energy-efficient, sufficiently approximate
implementation.
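
One familiar deterministic example is a truncated multiplier, which simply never generates the partial-product bits in the lowest result columns. Below is a sketch of the idea, assuming unsigned operands and an arbitrary choice of 4 dropped columns. Practical truncated multipliers often add a correction constant to reduce the bias, which is omitted here. The output is a fixed function of the inputs, so a tester can check it against a known table just as it would for an exact unit.

```python
def truncated_mul(a: int, b: int, width: int = 16, k: int = 4) -> int:
    """Unsigned width x width multiply that omits every partial-product
    bit falling in the k least significant result columns. The result
    is deterministic and never exceeds the exact product."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    result = 0
    for i in range(width):
        if (a >> i) & 1:
            for j in range(width):
                if (b >> j) & 1 and i + j >= k:
                    result += 1 << (i + j)
    return result


def max_truncation_error(width: int, k: int) -> int:
    # Column c holds at most (number of (i, j) pairs with i + j == c)
    # partial-product bits, each worth 2**c.
    return sum(
        sum(1 for i in range(width) for j in range(width) if i + j == c) << c
        for c in range(k)
    )


a, b = 0xBEEF, 0x1234
print(a * b, truncated_mul(a, b), max_truncation_error(16, 4))
```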

(For storage, one can adjust cell reliability [in
hardware design and/or through voltage and temperature
choices] and ECC/EDC coverage.)
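
To illustrate the ECC/EDC knob, here is a sketch that compares a Hamming(7,4) single-error-correcting code, as might be used for hard data, with a single even-parity bit that only detects errors, as might suffice for soft data. Hamming(7,4) and even parity are standard textbook codes. The hard/soft assignment is only an assumption for illustration, and real memories use wider codes such as SEC-DED over 64-bit words.

```python
DATA_POS = (3, 5, 6, 7)  # data bit positions in a Hamming(7,4) codeword


def hamming74_encode(nibble: int) -> list:
    """SEC code: 4 data bits + 3 check bits (positions 1, 2, 4)."""
    bits = [0] * 8  # index 0 unused; positions 1..7
    for k, pos in enumerate(DATA_POS):
        bits[pos] = (nibble >> k) & 1
    syndrome = 0
    for pos in DATA_POS:
        if bits[pos]:
            syndrome ^= pos
    for p in (1, 2, 4):
        # Choose check bits so the XOR of all set positions is zero.
        bits[p] = 1 if syndrome & p else 0
    return bits


def hamming74_decode(bits: list) -> int:
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome] ^= 1  # nonzero syndrome = index of the flipped bit
    return sum(bits[pos] << k for k, pos in enumerate(DATA_POS))


def parity_encode(nibble: int) -> list:
    """EDC: 4 data bits + 1 even-parity bit (detects, cannot correct)."""
    data = [(nibble >> k) & 1 for k in range(4)]
    return data + [sum(data) & 1]


def parity_ok(word: list) -> bool:
    return sum(word) % 2 == 0
```

The parity version costs one check bit instead of three per 4 data bits. It can only flag a single-bit error, so the affected soft data would have to be tolerated, recomputed, or dropped.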

Michael Engel gave some links to additional
information. I will add the following:

"Flikker: Saving DRAM Refresh-power through
Critical Data Partitioning" (2011, Song Liu et al.)
This paper proposes allocating less critical data
to DRAM that is refreshed less frequently and so
has a higher error rate. This idea could be
combined with Ravi K. Venkatesan et al.'s
Retention-Aware Placement in DRAM (RAPID), which
exploits DRAM retention-time variability. (Their
2006 paper used the variability only to support a
very low refresh rate when DRAM is not fully used.)
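
Here is a toy model of the critical/non-critical partitioning idea, showing only the allocation policy and its effect. It is not Flikker's actual mechanism. The class name, the two regions, and the per-bit error rates are hypothetical placeholders and not measured DRAM figures.

```python
import random

# Hypothetical per-bit flip probabilities per refresh interval. Real
# retention-error rates depend on the device, temperature, and refresh
# period; these numbers are placeholders for illustration only.
BIT_ERROR_RATE = {"critical": 0.0, "noncritical": 1e-3}


class PartitionedDram:
    """Toy model: critical data lives in a normally refreshed region,
    non-critical data in a slowly refreshed region that may lose bits."""

    def __init__(self, seed: int = 0):
        self.regions = {"critical": {}, "noncritical": {}}
        self.rng = random.Random(seed)

    def store(self, name: str, data: bytes, critical: bool) -> None:
        region = "critical" if critical else "noncritical"
        self.regions[region][name] = bytearray(data)

    def refresh_interval(self) -> None:
        # Inject independent bit flips according to each region's rate.
        for region, buffers in self.regions.items():
            ber = BIT_ERROR_RATE[region]
            if ber == 0.0:
                continue
            for buf in buffers.values():
                for i in range(len(buf)):
                    for bit in range(8):
                        if self.rng.random() < ber:
                            buf[i] ^= 1 << bit

    def load(self, name: str) -> bytes:
        for buffers in self.regions.values():
            if name in buffers:
                return bytes(buffers[name])
        raise KeyError(name)


mem = PartitionedDram()
mem.store("pointers", bytes(range(256)), critical=True)
mem.store("pixels", bytes(4096), critical=False)
for _ in range(10):
    mem.refresh_interval()
```

In this model, the "pointers" buffer survives intact, while the "pixels" buffer picks up scattered bit errors. The hard part in a real system is deciding what counts as critical, which is the programmer/compiler/runtime side of the problem.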

Reduced precision is similar in concept to
significance compression ("Very Low Power
Pipelines using Significance Compression", 2000,
Ramon Canal et al.). Significance compression
removes redundancy in the most significant bits
(MSbits) and is lossless, while reduced precision
tends to discard the least significant bits (LSbits)
and is lossy.
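
The contrast can be shown in a few lines. This sketch assumes byte granularity and 32-bit two's-complement integers for the lossless side. That is loosely in the spirit of significance compression, not the exact encoding from Canal et al. The lossy side truncates an IEEE 754 single-precision mantissa.

```python
import struct


def significant_bytes(x: int) -> int:
    """Low-order bytes needed for 32-bit two's-complement x; the
    remaining high bytes are pure sign extension."""
    assert -(1 << 31) <= x < (1 << 31)
    for n in (1, 2, 3):
        if -(1 << (8 * n - 1)) <= x < (1 << (8 * n - 1)):
            return n
    return 4


def compress(x: int) -> bytes:
    n = significant_bytes(x)
    return (x & ((1 << (8 * n)) - 1)).to_bytes(n, "little")


def decompress(b: bytes) -> int:
    # Sign-extending the stored bytes restores x exactly (lossless).
    return int.from_bytes(b, "little", signed=True)


def reduce_precision(f: float, keep_bits: int) -> float:
    """Zero all but the top keep_bits of a float32's 23-bit mantissa
    (truncation toward zero, lossy)."""
    assert 0 <= keep_bits <= 23
    (bits,) = struct.unpack("<I", struct.pack("<f", f))
    bits &= ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits))
    return out


for x in (5, -1, 200, -70000, 2**31 - 1):
    print(x, len(compress(x)), decompress(compress(x)))
print(reduce_precision(1.1, 8))  # 1.09765625
```

Small integers such as 5 or -1 shrink to one byte and come back unchanged. In contrast, 1.1 kept to 8 mantissa bits becomes 1.09765625, and the discarded LSbits cannot be recovered.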

(Other lossless compression schemes have been
proposed, e.g., "Eliminating Energy of
Same-Content-Cell-Columns of On-Chip SRAM Arrays"
(2011, Bushra Ahsan et al.). For some uses, lossy
compression techniques might be applied; reducing
precision is a relatively simple and effective form
of lossy compression.)

I seem to recall that an earlier AMD processor
used a reduced-precision multiply for the first step
in iterative refinement for division/square root,
though this was (IIRC) for performance rather than
energy efficiency. Series calculations could perhaps
likewise exploit reduced precision for certain
terms (e.g., small later terms, whose errors barely
affect the sum).
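
The reason early refinement steps tolerate low precision is that each Newton-Raphson step roughly squares the relative error. This means a rounding error far smaller than the current approximation error costs nothing. Below is a sketch of a reciprocal refinement that simulates narrow datapaths by rounding each intermediate to a given number of significant bits. The 12/24/53/53-bit schedule is an arbitrary assumption. The linear seed 48/17 - 32/17·d is a common textbook choice for d in [0.5, 1). This is not a model of any AMD implementation.

```python
import math


def round_to_bits(x: float, bits: int) -> float:
    """Round x to `bits` significant bits, simulating a narrower datapath."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def nr_reciprocal(d: float, schedule) -> float:
    """Newton-Raphson refinement of 1/d for d in [0.5, 1).

    schedule[i] is the number of significant bits used for the
    arithmetic of iteration i. Each iteration roughly squares the
    relative error, so early iterations need far fewer bits."""
    assert 0.5 <= d < 1.0
    x = 48 / 17 - 32 / 17 * d  # linear seed, relative error <= 1/17
    for bits in schedule:
        t = round_to_bits(d * x, bits)
        t = round_to_bits(2.0 - t, bits)
        x = round_to_bits(x * t, bits)
    return x


for d in (0.5, 0.7, 0.999):
    reduced = nr_reciprocal(d, [12, 24, 53, 53])
    full = nr_reciprocal(d, [53, 53, 53, 53])
    print(d, abs(reduced * d - 1), abs(full * d - 1))
```

Both schedules end up near double precision. The 12-bit first step is adequate because its output is only accurate to about 8 bits anyway.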

Here is a list of some other papers somewhat
related to this subject that I happened to
encounter (most of which I have not yet read):

"Power Efficient Motion Estimation Using
Multiple Imprecise Metric Computations" (2007,
In Suk Chong and Antonio Ortega)

"Architecture Support for Disciplined Approximate
Programming" (2012, Hadi Esmaeilzadeh et al.)

"Shoestring: Probabilistic Soft Error Reliability
on the Cheap" (2010, Shuguang Feng et al.). From
the abstract: "Shoestring is able to focus its
efforts on protecting statistically-vulnerable
portions of program code."

"Measuring Architectural Vulnerability Factors"
(2003, Shubhendu S. Mukherjee et al.): mainly
interesting for how errors in different
microarchitectural structures have different
visibility in terms of program behavior.

"Software/Hardware Cooperative Approximate
Computation" (2011, Gennady Pekhimenko and
Kun Qian): "The basic idea is to 1) identify
performance-critical events . . . whose
results can be predicted or ignored without
recovery and without degrading a level of
quality required by the user, and 2)
value-predict or ignore such events during
dynamic execution"

"Energy-Precision Tradeoffs in Mobile Graphics
Processing Units" (2008, Jeff Pool et al.)

"Probabilistic Counter Updates for Predictor
Hysteresis and Stratification" (2006, Nicholas
Riley and Craig Zilles)

"EnerJ: Approximate Data Types for Safe and
General Low-Power Computation" (2011, Adrian
Sampson et al.): Michael Engel mentioned the
project behind this.

"Stochastic Computation" (2010, Naresh R.
Shanbhag et al.): "This paper traces the roots
of stochastic computing from the Von Neumann
era into its current form."

"Eliminating Microarchitectural Dependency from
Architectural Vulnerability" (2009, Vilas
Sridharan and David R. Kaeli): looks at the
program-based variability in error visibility.

"The Art of Deception: Adaptive Precision
Reduction for Area Efficient Physics
Acceleration" (2007, Thomas Y. Yeh et al.)

I do not know whether any of the above would be
particularly helpful (I had not realized how
many papers I had downloaded and not read!),
but together with Michael Engel's links they
might provide a starting place for further research.

> Thanks for your thoughts -- I hadn't thought
> about branch predictors.

Approximation works with a variety of predictors
(cache way predictors, prefetch engines, value
predictors, etc.), not just branch predictors,
since a wrong prediction costs performance (and
energy), not correctness.
Renée St. Amant et al.'s "Low-Power, High-
Performance Analog Neural Branch Prediction"
(2008) used analog summation for a perceptron-
based branch predictor.
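
As a rough illustration of why imprecise summation can be tolerable, here is a single-branch perceptron predictor in the style of Jiménez and Lin, using their training threshold θ = ⌊1.93h + 14⌋. Gaussian noise can be added to the dot product. That noise is only a crude stand-in for analog imprecision and not a model of the St. Amant et al. circuit. The history length, 8-bit weights, and the alternating test pattern are assumptions.

```python
import random

WMAX, WMIN = 127, -128  # 8-bit signed weights


def clamp(w: int) -> int:
    return max(WMIN, min(WMAX, w))


class PerceptronPredictor:
    """Single-branch perceptron predictor in the style of Jimenez and Lin.

    `noise` adds Gaussian error to the dot product; this is only a crude
    stand-in for imprecise (e.g., analog) summation, not a circuit model."""

    def __init__(self, history_len: int = 16, noise: float = 0.0, seed: int = 0):
        self.w = [0] * (history_len + 1)  # w[0] is the bias weight
        self.hist = [1] * history_len  # +1 = taken, -1 = not taken; newest first
        self.theta = int(1.93 * history_len + 14)  # training threshold
        self.noise = noise
        self.rng = random.Random(seed)

    def output(self) -> float:
        y = self.w[0] + sum(w * x for w, x in zip(self.w[1:], self.hist))
        return y + self.rng.gauss(0.0, self.noise) if self.noise else float(y)

    def step(self, taken: bool) -> bool:
        """Predict, then train on the actual outcome; returns the prediction."""
        y = self.output()
        pred = y >= 0
        t = 1 if taken else -1
        # Train on a misprediction or when the (possibly noisy) output is weak.
        if pred != taken or abs(y) <= self.theta:
            self.w[0] = clamp(self.w[0] + t)
            for i, x in enumerate(self.hist, start=1):
                self.w[i] = clamp(self.w[i] + t * x)
        self.hist = [t] + self.hist[:-1]
        return pred


def accuracy(noise: float, n: int = 2000) -> float:
    p = PerceptronPredictor(noise=noise)
    outcomes = [i % 2 == 0 for i in range(n)]  # alternating T, N, T, N, ...
    preds = [p.step(o) for o in outcomes]
    tail = n // 2
    return sum(pr == o for pr, o in zip(preds[-tail:], outcomes[-tail:])) / tail


print(accuracy(0.0), accuracy(2.0))
```

Training continues until |y| exceeds θ. As a result, a small summation error can flip the prediction only when the output is weak, and even then the cost is a misprediction, not a wrong result.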
> I see this raises
> an issue with regard to deterministic
> behavior -- could pose problems for device
> test.
>
> "You want to do WHAT?"
Yes, this could be worse than the issues with
asynchronous logic.
(I hope the above was not too long and meandering.)