Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Demo of ATP stat probs with random variates

10 views
Skip to first unread message

David J. Braden

unread,
Oct 28, 2003, 2:04:35 PM10/28/03
to
MS has made significant strides in improving its stat funtionality with
Version 11; no new functions, but much better delivery on promised ones.

In particular, see
http://support.microsoft.com/default.aspx?kbid=829208&product=xl2003

a new page that, after a brief scan, looks good, and much more
forthcoming than MS has been in the past re its Excel shortcomings.

MS concedes that the random number generator within the ATP is
problematic, even as of Version 11. They also give what, at first
glance, look some nice workarounds. While RANDBETWEEN is an ATP-supplied
worksheet function, it uses the Excel function RAND(), so it has
improved markedly from version 10 to 11 as a result. On the other hand,
the variate generators that give static draws from each of 6
distributions (through the ATP Data Analysis Wizard) rely on the ATP's
own generator, which is so bad that it's interesting.

To see for yourself, use the code below.
Open a VBA module, paste the code in, and run it per instructions. MS
was made aware of this (using the following code) during its version 11
beta, and while not fixing the problem, provides some reasonable
alternatives. I thank the developers for pushing for the improvements
this time around, and look forward to further fixes.

Additionally, folks might benefit from code below to see how to form
histograms.

Beware of line-wrap.

HTH
Dave Braden
MVP-Excel, and no toady.

=====================================
Option Explicit

'AnalyzeThis and AnalyzeThis2 (functionally equivalent)

'Written by David J. Braden, 04 Feb 2003

'These routines help investigate large samples from Microsoft Excel's
' Random Variate Generators, found in its Analysis Toolpak (ATP)
'Output is a histogram of a histogram, so to speak, to determine the
number
' of unique variates, doublets, triplets, etc.

'IMPORTANT!!
' "AnalyzeThis" assumes your version of Excel can handle large arrays.
' I suspect that most installed versions cannot (typical limit is about
4,000 variants)
' If you run into a problem with AnalyzeThis, try "AnalyzeThis2"
instead.
' It is functionally equivalent, though slower, than AnalyzeThis.

'Directions:
' Use the ATP to generate, say, 25000 Uniform variates *in a new ply*.
' The ATP will have the output selected. While it is still selected,
run
' the macro AnalyzeThis (or AnalyzeThis2) to create the diagnostics.

'What the output means:
' The first table shows the values, and number of repeats, for the
sample's
' lowest and highest values.

' To understand the 2nd table: Suppose under "#Occurences" you see the
value 3, and in
' the same row, under "#Data Values", the value 1095. That means that
1,095 values in
' the sample have been repeated 3 times.

' There should be absolutely *no* repeats in anything Excel could
*ever* deal
' with, nor should there be any occurences of 0 and 1 in the Uniform
generator.

'NOTES:
' (1) The ATP won't generate 32k or more variates for any version of
Excel up
' through ver. 10
' (2) These routines will work equally well on any column that contains
at least 2
' non-empty cells
' (3) If a legitimate selection contains blank cells, they will be
included in the stats;
' Excel sorts these to the end of a list, so they will be "the
highest value"
' (4) Error-trapping is minimal. This ain't production code, folks.


'''''Globals ''''''''''''''''''''''''''
Const errSelectionMsg = "Current Selection must be a single column of
cells, " & vbCr & _
"at least two of which are non-empty."
Const errSelection = 1000

''''''''''''''''''''''''''''''''''''''''''''''''
Public Sub AnalyzeThis()
Dim vData As Variant, vSupport() As Variant, lFreq() As Long
Dim vFreqSupport() As Variant, lFreqFreq() As Long
Dim l As Long, lF As Long, r As Range
Dim lCard As Long

On Error GoTo HandleErr
If Not RngOK(Selection) Then Err.Raise errSelection, , errSelectionMsg

Application.ScreenUpdating = False

'avoid existing data
Selection.Offset(0, 1).Resize(1, 3).EntireColumn.Insert

'set up a temporary spot to sort data, read it in; then clear
Set r = Selection.Offset(0, 1)
Selection.Copy r
With r
.Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlNo
vData = Application.Transpose(.Value)
.ClearContents
End With
lCard = UBound(vData)

'form histogram of original data
l = MakeHist(vData, vSupport, lFreq)

'sort lFreq, load it into vData
With r.Resize(l)
.Value = Application.Transpose(lFreq)
.Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlNo
vData = Application.Transpose(.Value)
.ClearContents
End With

'form histogram of histogram's counts
lF = MakeHist(vData, vFreqSupport, lFreqFreq)

'Report results
Set r = r.Resize(1, 3)
r = Array("", "Value", "#Repeats")
r.Offset(1) = Array("Lowest:", vSupport(1), lFreq(1))
r.Offset(2) = Array("Highest:", vSupport(l), lFreq(l))
r.Offset(4) = Array("#Occurences", "#Data Values", "% of Data")
r(6, 1).Resize(lF) = Application.Transpose(vFreqSupport)
r(6, 2).Resize(lF) = Application.Transpose(lFreqFreq)
For l = 1 To lF
vFreqSupport(l) = vFreqSupport(l) * lFreqFreq(l) / lCard
Next
r(6, 3).Resize(lF) = Application.Transpose(vFreqSupport)

Exit Sub

HandleErr:
MsgBox "Error " & Err.Number & vbCr & Err.Description
End Sub

Private Function MakeHist(vData As Variant, vSup() As Variant, lFreq()
As Long)
'Returns length of output arrays
'Creates two arrays containing the support and frequency of vData.
'*** Assumes vData is already sorted

Dim l As Long, lUB As Long, lCount As Long, lOutIndex As Long
Dim vlast As Variant

lUB = UBound(vData)
ReDim vSup(1 To lUB) As Variant: ReDim lFreq(1 To lUB) As Long

vlast = vData(1): lCount = 1: lOutIndex = 1

For l = 2 To lUB
If vData(l) = vlast Then
lCount = lCount + 1
Else
vSup(lOutIndex) = vlast: lFreq(lOutIndex) = lCount
vlast = vData(l): lCount = 1: lOutIndex = lOutIndex + 1
End If
Next
vSup(lOutIndex) = vlast: lFreq(lOutIndex) = lCount
ReDim Preserve vSup(1 To lOutIndex): ReDim Preserve lFreq(1 To
lOutIndex)
MakeHist = lOutIndex
End Function

'**************************************************
'**************************************************

Public Sub AnalyzeThis2()
Dim l As Long, lF As Long, r As Range
Dim lCard As Long, vRept(1 To 3, 1 To 3)

On Error GoTo HandleErr
If Not RngOK(Selection) Then Err.Raise errSelection, , errSelectionMsg

Application.ScreenUpdating = False

lCard = Selection.Cells.Count
'avoid existing data
Selection.Offset(0, 1).Resize(1, 6).EntireColumn.Insert

'sort the data
Set r = Selection.Offset(0, 1)
Selection.Copy r
r.Sort Key1:=r(1, 1), Order1:=xlAscending, Header:=xlNo

'form histogram of original data
MakeHist2 r, r(1, 2), l

' Report initial results
With r.Resize(1, 3)
.Offset(0, 3) = Array("", "Value", "#Repeats")
.Offset(1, 3) = Array("Lowest:", r(1, 2), r(1, 3))
.Offset(2, 3) = Array("Highest:", r(l, 2), r(l, 3))
.Offset(4, 3) = Array("#Occurences", "#Data Values", "% of Data")
End With

'Sort Freq data, and make histogram of it
Set r = r(1, 3).Resize(l)
r.Sort Key1:=r(1, 1), Order1:=xlAscending, Header:=xlNo
MakeHist2 r, r(6, 2), l 'output goes directly into final table

For l = 1 To l ' percentages of data with various degrees of
replication
r(5 + l, 4) = r(5 + l, 2) * r(5 + l, 3) / lCard
Next
Selection(1, 2).Resize(1, 3).EntireColumn.Delete

Exit Sub

HandleErr:
MsgBox "Error " & Err.Number & vbCr & Err.Description
End Sub

Private Sub MakeHist2(rData As Range, rOut As Range, lLen As Long)
'Returns length of output arrays
'Creates two columns as rOut containing the suport and frequency of
rData.
'*** Assumes rData is already sorted

Dim lS As Long, lCount As Long
Dim vlast As Variant, rC As Range

lS = rOut.Row
vlast = rData(1, 1)

For Each rC In rData.Cells
If rC = vlast Then
lCount = lCount + 1
Else
rOut = vlast: rOut(1, 2) = lCount
vlast = rC: lCount = 1
Set rOut = rOut(2)
End If
Next
rOut = vlast: rOut(1, 2) = lCount
lLen = rOut.Row + 1 - lS
End Sub

Private Function RngOK(r As Range) As Boolean
If r.Columns.Count > 1 Then Exit Function
If Application.CountA(r) < 2 Then Exit Function
RngOK = True
End Function

--
(ROT13) qoe...@zicf.bet

David J. Braden

unread,
Oct 28, 2003, 2:32:45 PM10/28/03
to
I forgot to add that in the aforementioned page, MS claims that
"Like Excel 2002 and earlier versions of RAND, the ATP's separate random
number generator is known to perform poorly on standard tests of
randomness and to have a short repetition cycle. ***This has negative
implications only if you require a very lengthy sequence of random
numbers (for example, 1 million).***" (emphasis mine).

Not at all true, as the code I posted will easily show. Not only will
you get, depending on the seed, repetitions in as few as 500 draws, but
the Uniform generator returns both 0 and 1; seriously inconvenient it
you intend to use the data to construct other variates. Of course, if
you consider 500 "very lengthy", then deeper tests surface just how
goofy it is. I suppose you are relatively safe with a single variate,
but even then, it has positive probability of being whacky.

IOW, don't use the ATP at all for variate-generation (except, as of
Version 11, for RANDBETWEEN()). A free alternative that is far more
powerful, and far more sohisticated, is PopTools (compiled only for the
Windows OS). Many (most?) commercial packages for Monte Carlo simulation
are other alternatives.

HTH
Dave Braden
MVP - Excel

In article <dbraden-9CCE63...@msnews.microsoft.com>,


"David J. Braden" <dbr...@real.address.below> wrote:

> MS has made significant strides in improving its stat funtionality with
> Version 11; no new functions, but much better delivery on promised ones.
>
> In particular, see
> http://support.microsoft.com/default.aspx?kbid=829208&product=xl2003
>
> a new page that, after a brief scan, looks good, and much more
> forthcoming than MS has been in the past re its Excel shortcomings.
>
> MS concedes that the random number generator within the ATP is
> problematic, even as of Version 11. They also give what, at first
> glance, look some nice workarounds. While RANDBETWEEN is an ATP-supplied
> worksheet function, it uses the Excel function RAND(), so it has
> improved markedly from version 10 to 11 as a result. On the other hand,
> the variate generators that give static draws from each of 6
> distributions (through the ATP Data Analysis Wizard) rely on the ATP's
> own generator, which is so bad that it's interesting.
>
> To see for yourself, use the code below.
> Open a VBA module, paste the code in, and run it per instructions. MS
> was made aware of this (using the following code) during its version 11
> beta, and while not fixing the problem, provides some reasonable
> alternatives. I thank the developers for pushing for the improvements
> this time around, and look forward to further fixes.
>
> Additionally, folks might benefit from code below to see how to form
> histograms.
>

<<snip>>

--
(ROT13) qoe...@zicf.bet

0 new messages