New issue 14 by kyle.tha...@gmail.com: Means and Standard Deviation should
be from random sample in large data sets
http://code.google.com/p/ifcsoft/issues/detail?id=14
If a data set is very large (say, more than 10,000 points), rather than
going through all points to find the mean and especially the standard
deviation, the program should sample at most 10,000 random points from the
data set to do so.
The mean could still be computed over all points, since it can be
accumulated in the same pass as the min and max, but the standard deviation
greatly slows down the start of an SOM calculation with large data sets (it
is used for the default variance normalization).
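
A rough sketch of the idea, in Python for brevity rather than in the
project's own code. The function name sampled_mean_std, the max_sample
parameter and the rng argument are made up for illustration and do not
exist in ifcsoft. It assumes one dimension's values are held in a list and
uses the sample (n - 1) form of the standard deviation:

```python
import math
import random

MAX_SAMPLE = 10_000  # cap proposed in this issue; hypothetical constant name


def sampled_mean_std(values, max_sample=MAX_SAMPLE, rng=None):
    """Estimate mean and standard deviation from at most max_sample points.

    Small data sets are used in full; larger ones are reduced to a random
    sample drawn without replacement.
    """
    if rng is None:
        rng = random.Random()
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")

    # Only sample when the data set exceeds the cap.
    sample = values if n <= max_sample else rng.sample(values, max_sample)

    m = len(sample)
    mean = sum(sample) / m
    if m < 2:
        return mean, 0.0

    # Two-pass sample variance over the (possibly reduced) point set.
    variance = sum((x - mean) ** 2 for x in sample) / (m - 1)
    return mean, math.sqrt(variance)
```

With a 10,000-point sample, the relative error of the estimated standard
deviation is on the order of 1% for roughly normal data, which is likely
good enough for variance normalization.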