Software errors and working memory

29 views
Skip to first unread message

Gwern Branwen

unread,
Feb 21, 2012, 4:12:28 PM2/21/12
to N-back
http://dl.dropbox.com/u/5317066/1997-hatton.pdf "Reexamining the Fault
Density–Component Size Connection", Les Hatton

Is this right? I have no idea. But it is a curious collection of
studies and an interesting proposed model:

> For years I subscribed to such a principle: that modularization, or structural decomposition, is a good design concept and therefore always improves systems. This belief is so widespread as to be almost unchallengeable. It is responsible for the important programming language concept of compilation models—which are either separate, with guaranteed interface consistency (such as C++, Ada, and Modula-2), or independent, whereby a system is built in pieces and glued together later (C and Fortran, for example). It is a very attractive concept with strong roots in the “divide and conquer” principle of traditional engineering. However, this conventional wisdom may be wrong. Only those components that fit best into human short-term memory cache seem to use it effectively, thereby producing the lowest fault densities. Bigger and smaller average component sizes appear to degrade reliability.
>
> ...It is easy to get the impression from these case histories that developing software systems with low fault densities is exceedingly difficult. In fact, analysis of the literature reveals graphs such as that dependent faults per KLOC will approach an asymptote as time increases. In reality, only this asymptote makes sense for comparing the reliability of different systems. So, given that the asymptote can never be reached, the faults per KLOC and the rate of change of this value are required to compare such systems effectively.
>
> Of course, real systems are subject to continual noncorrective change, so things become rather more complex. No notion of rate of change of faults per KLOC was available for any of the data in this study, although both mature and immature systems were present, with the same behavior observed. This would suggest that the observed defect behavior is present through the life cycle, supporting even further the conjecture that it is a macroscopic property. If only immature systems had been present in the studies, it could have been argued that smaller components may get exercised more. This does not seem to be the case.
>
> A further related point, also observed in the NAG library study, is that when component fault densities are plotted as a function of size, the usage of each component must be taken into account. The models discussed in this article are essentially asymptotic, and the fault densities they predict are therefore an envelope to which component fault densities will tend only as they are used sufficiently to begin to flush out faults. An unused component has complexity but no faults, by definition. The literature reports apparently near-zero-defect systems that have turned out on closer inspection to have been unused. shown in Figure 2. This data was compiled from NASA Goddard data by the University of Maryland’s Software Engineering Laboratory, as quoted in the December 1991 special edition of Business Week. First of all, in spite of NASA’s enormous resources and talent pool, the average was still five to six faults per KLOC. Other studies have reported similar fault densities.4,8 More telling is the observation that in Figure 2, improvement has been achieved mostly by improving the bad processes, not the good ones. This fact suggests that consistency, a process issue, has improved much more than actual fault density, a product issue. The simple conclusion is that the average across many languages and development efforts for “good” software is around six faults per KLOC, and that with our best techniques, we can achieve 0.5–1 fault per KLOC. Perfection will always elude us, of course, but the intractability of achieving systematically better fault densities than have been achieved so far also suggests that some other limitation may be at work.
>
> THE PROPOSED MODEL
>
> ...Recovery code scrambling is an important factor in my proposed model. The evidence suggests that anything that fits in a short-term or cache memory is easier to understand and less fault-prone; pieces that are too large overflow, involving use of the more error-prone recovery code mechanism used for long-term storage. Thus, if a programmer is working with a component of complexity Ω, and that component fits entirely into the cache or short-term memory, which in turn can be manipulated without recourse to back-up or long-term memory, the incremental increase in bugs or disorder dE due to an incremental increase of complexity of dΩ is simply
>
> dE = (1/Ω) dΩ.
>
> This resembles the argument leading to Boltzmann’s law relating entropy to complexity, where the analogue of equipartition of energy in a physical system is mirrored by the apparently equal distribution of rehearsal activity in the short-term memory. In other words, because no part of the cache is favored and the cache accurately manipulates symbols, the incremental increase in disorder is inversely proportional to the existing complexity, making the ideal case when pieces just fit into cache. It is assumed without loss of generality that both E and Ω are continuously valued variables.
>
> What happens when we encounter complexity greater than Ω′ (the complexity which will just fit into the cache)? The increase in disorder will correspond to the complexity in the (now-full) cache contents, plus a contribution proportional to the number of times the cache memory must be reloaded from the long-term memory. In other words,
>
> dE = (1/2*Ω)' * (1 + Ω/Ω') * dΩ
>
> The factor of 1/2 matches Equation 1 when Ω = Ω′, that is, when the complexity of the program is about to overflow the cache memory. The second term is directly proportional to the cache overflow effect and mimics the scrambling of the recovery codes. Integrating Equations 1 and 2 suggests that
>
> E = log Ω for Ω ≤ Ω′
>
> and
>
> E = 1/2 * (Ω/Ω' + Ω^2/2*Ω'^2) for Ω > Ω'
>
> ...The Ada data and the assembly and macro-assembly data provide strong empirical support for this behavior, with about 200 to 400 lines corresponding to the complexity Ω′ at which cache memory overflows into longterm memory. That such disparate languages can produce approximately the same transition point from logarithmic to quadratic behavior supports the view that Ω is not the underlying algorithmic complexity but the symbolic complexity of the language implementation, given that a line of Ada would be expected to generate five or more lines of assembly. This is directly analogous to the observation that it is fit, rather than the actual information content of the cache that is relevant.9
>
> ...To summarize, if a system is decomposed into pieces much smaller than the short-term memory cache, the cache is used inefficiently because the interface of such a component with its neighbors is not “rehearsed” explicitly into the cache in the same way, and the resulting components tend to exhibit higher defect densities. If components exceed the cache size, they are less comprehensible because the recovery codes connecting comprehension with long-term memory break down. Only those components that match the cache size well use it effectively, thereby producing the lowest fault densities.
>
> ...Suppose that a particular functionality requires 1,000 “lines” to implement, where a “line” is some measure of complexity. The immediate implication of the earlier discussion is that, to be reliable, we should implement it as five 200-line components (each fitting in cache) rather than as 50 20-line components. The former would lead to perhaps 5 log_10(200) = 25 bugs while the latter would lead to 50 × log_10(20) = 150 bugs. This apparently inescapable but unpleasant conclusion runs completely counter to conventional wisdom. ...The additional unreliability caused by splitting up the system might be due to simple interface inconsistencies. The Basili–Perricone study considered this a possible explanation, as did Moller–Paulish. However, it was not a factor in the Hatton–Hopkins study, since the internally reusable components in the NAG library (largely externally used reusable components) had high interface consistency. Furthermore, it is unlikely to explain the Compton–Withrow data because Ada mandates interface consistency in language implementations. (This may be responsible for the difference in small components in Figure 4.)

--
gwern
http://www.gwern.net

Reply all
Reply to author
Forward
0 new messages