On Tuesday, October 9, 2012 10:06:15 PM UTC-4, Greg Heath wrote:
> Newsgroups: comp.ai.neural-nets
>
> From: TomH488 <tom...@gmail.com>
>
> Date: Fri, 5 Oct 2012 16:01:15 -0700 (PDT)
>
> Local: Fri, Oct 5 2012 7:01 pm
>
> Subject: Input Pre-Processing
>
> >On Oct 5, 7:01 pm, TomH488 <tom...@gmail.com> wrote:
>
> > Greg,
>
> >
>
> > First, I have read your robust list of what to do.
>
>
>
> What list?
>
"Search the CANN Google Group archives
greg heath pre training advice"
It's in there (!) and is perhaps one of the most useful accumulations of information yet.
>
>
> > There are a few procedures that I have questions about:
>
> > __________________
>
> >
>
> > YOUR ITEM #4: Use the data matrix correlation coefficient matrix CC=CORRCOEF(Z) to identify:
>
> > a. undesirable low corr between y and col of x.
>
> > b. undesirable high corr between col of x.
>
> >
>
> > Either one is not too bad a thing to do by hand, but doing both "by eye" is a bit much. I'm working with 186 inputs that I need to cull down.
>
>
>
> I don't know what you mean "by eye"?
>
When you have to make a compromise selection, such as a not-so-bad Cxx yet a not-so-good Cxy, do you keep or cull? Besides, I have no idea how to make that decision quantitatively anyhow - my guess is that it depends on the model, which makes it empirical. Or can the decision be calculated?
>
>
> The documentation explains how to use additional confidence/
>
> significance level outputs to quantify the significance of the 186
>
> input-output linear correlations and the 186*185/2 input-input cross
>
> correlations.
>
First, just so I know we're talking about the same thing, I call
Cxx the input-input cross corr which are probably bad if near 1.0
Cxy the in-out linear corr which are of no value if near 0.0
Are you now referring to "Steps 7 & 8"
7) Test the strength of combined linear I/O correlations by calculating the mean square errors, and
8) If I/O correlations are sufficiently high, the training goal MSE = 0.01*MSE0 is reasonable
I don't exactly understand the notation, but I get the idea we're talking about training the network with a single input column and a simplified y (mean, linear regression line, ???). I really have no idea here, especially how to interpret the results.
Again, NO hablo MATLAB
>
>
> help corrcoef
>
> doc corrcoef
>
>
>
> If input cross correlations are significant you might want to
>
> immediately remove some of the "redundant" variables or transform the
>
> inputs to be uncorrelated.
>
My "independent" culls (based on Cxx only or Cxy only) consist of omitting Cxx above 0.8, and Cxy within .05 of zero. The philosophy is that these are so bad, it doesn't matter how good the other C.. would have been.
Again, what should I do with the ones that are not so bad?
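One way to make the keep-or-cull compromise mechanical is a greedy rule: rank the inputs by |Cxy|, then keep an input only if its |Cxx| with every already-kept input stays under the threshold. Below is a minimal sketch in plain Python (not MATLAB), reusing the 0.8 and 0.05 thresholds mentioned above. The names `pearson` and `cull_inputs` and the toy data are illustrative assumptions, and the greedy rule is only one possible heuristic, not something taken from Greg's list.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sab / sqrt(saa * sbb)

def cull_inputs(columns, y, cxx_max=0.8, cxy_min=0.05):
    """Greedy cull (hypothetical helper).

    columns: dict mapping input name -> list of values
    y:       list of target values
    Drops inputs with |Cxy| < cxy_min, then walks the rest from
    strongest to weakest |Cxy| and keeps an input only if its |Cxx|
    with every input kept so far is <= cxx_max.
    """
    cxy = {name: abs(pearson(col, y)) for name, col in columns.items()}
    ranked = sorted((n for n in columns if cxy[n] >= cxy_min),
                    key=lambda n: cxy[n], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= cxx_max for k in kept):
            kept.append(name)
    return kept

# Toy data: b nearly duplicates a, c is unrelated to y, d is moderately related.
y = [1, 2, 3, 4, 5, 6, 7, 8]
columns = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10, 12, 14, 17],
    "c": [1, -1, -1, 1, 1, -1, -1, 1],
    "d": [1, 3, 2, 5, 1, 6, 2, 7],
}
kept = cull_inputs(columns, y)
print(kept)
```

With a rule like this, a "not so bad" input survives only if nothing more strongly tied to the output already covers it, so the Cxx/Cxy trade is decided by ranking rather than by eye. Whether the survivors are good for a nonlinear net is still an empirical question.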
Would PCA or better yet, Discriminant Analysis (or PLS, Partial Least Squares) simply remove all concern about culling inputs? My experience from structural modal analysis says yes, a simple way to deal with input size reduction.
>
>
> You may also want to check cross correlations of y with 186 quadratic
>
> (xi^2) and some of the 186*185/2 (ugh) interaction (xi*xj) terms.
>
Don't know how to do this. But PLS is sounding like a simple solution for "manual culling based on Cxx & Cxy correlations."
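For what it's worth, the check Greg describes amounts to building the extra columns xi^2 and xi*xj and correlating each with y. Here is a minimal sketch in plain Python under that reading; the function name `nonlinear_term_correlations` and the toy data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sab / sqrt(saa * sbb)

def nonlinear_term_correlations(columns, y):
    """Correlate y with every squared input and every pairwise product.

    columns: dict mapping input name -> list of values (hypothetical layout)
    Returns a dict such as {"x1^2": r, "x1*x2": r, ...}.
    """
    scores = {}
    for name, col in columns.items():
        scores[f"{name}^2"] = pearson([v * v for v in col], y)
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        scores[f"{n1}*{n2}"] = pearson([u * v for u, v in zip(c1, c2)], y)
    return scores

# Toy data: y depends on x1 only through x1^2, so the linear Cxy is zero.
x1 = [-2, -1, 0, 1, 2]
x2 = [1, -1, 1, -1, 1]
y = [4, 1, 0, 1, 4]
scores = nonlinear_term_correlations({"x1": x1, "x2": x2}, y)
for term, r in scores.items():
    print(f"{term:8s} {r:+.3f}")
```

An input that a linear Cxy cull would discard can still matter through its square or an interaction, which is what this check is meant to catch. Subtracting each input's mean before squaring is usually worthwhile, since for strictly positive inputs xi^2 is otherwise almost the same column as xi. With 186 inputs the pairwise loop covers 186*185/2 = 17205 products, so in practice it would be restricted to a shortlist.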
>
>
> > NOTE: I added 43 World Stock Indices and a unit Lag and many have to go.
>
> >
>
> > Sounds like MATLAB has STEPWISE and STEPWISEFIT to automate this. I was thinking of a selection algorithm but really don't know enough of the mathematics to make a trade between a. and b. when it occurs.
>
> > Any suggestions how to wade through this expeditiously?
>
>
>
> No guarantees.
>
>
>
> STEPWISEFIT and STEPWISE(Gui version) are useful for models that are
>
> linear in the coefficients ( LIC: e.g., polynomials). The chosen
>
> inputs are good, but not necessarily optimal for these models, much
>
> less the nonlinear NN models.
>
>
>
> Nevertheless, I usually start by comparing backward and forward
>
> results for a linear in variable (LIV) model and, I might do the same
"backward and forward results" = ?
>
> for a linear in interactions (xi*xj) or a pure quadratic (xi^2) model.
>
>
>
> > _________________________
>
> >
>
> > Apparent popularity of using ACF and PACF to determine LAGS for a Forecasting Prediction.
>
>
>
> ACF = Autocorrelation function?
>
Yes
>
>
> PACF = ??
>
Partial Auto correlation function
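To pin the two terms down: the ACF at lag k is the correlation of the series with itself shifted by k, while the PACF at lag k is what remains of that correlation after the shorter lags 1..k-1 have been accounted for. A minimal sketch in plain Python follows, computing the sample ACF directly and the PACF with the Durbin-Levinson recursion. The function names, the toy series, and the use of the common large-sample bound 1.96/sqrt(n) as the "significant" cutoff are assumptions for illustration.

```python
from math import sqrt

def acf(x, max_lag):
    """Sample autocorrelation r[0..max_lag] (r[0] is always 1)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    prev = []  # AR coefficients phi[k-1][1..k-1] from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

def significant_lags(values, n, z=1.96):
    """Lags >= 1 whose magnitude exceeds the approximate bound z/sqrt(n)."""
    bound = z / sqrt(n)
    return [k for k in range(1, len(values)) if abs(values[k]) > bound]

# Toy series, for illustration only.
x = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9, 8, 11, 10, 12, 11, 14]
r = acf(x, 4)
p = pacf(x, 4)
print("ACF :", [round(v, 3) for v in r])
print("PACF:", [round(v, 3) for v in p])
print("PACF lags above bound:", significant_lags(p, len(x)))
```

The lag-1 PACF always equals the lag-1 ACF; the two only part ways from lag 2 onward.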
>
>
> > One paper said:
>
> >
>
> > 1) use L1...Ln which have significant PACF's, AND
>
> > 2) Use Li of the 4 top ACF's.
>
> >
>
> > That is clear enough for predicting Price from Price input, however, I have some other issues to deal with:
>
>
>
> You are trying to predict price? What do you mean by "price input"?
>
The most simplistic forecasting models put price lags in the input. So I call that "price" input.
> >
>
> > ISSUE ONE:
>
> >
>
> > I have many other non-price inputs for which I include LAGS. Since they are not Price, they should be not-uncorrelated with Output YET STILL not-correlated to each other. So should a candidate LAG be REQUIRED to satisfy both constraints? Or, if that is not possible, should the least offensive ones be used, or should NO lags be used since none were suitable?
>
> >
>
> > Well, that's confusing enough.
>
>
>
> Yes it is. Significant lags can be determined from auto and
>
> crosscorrelation functions.
>
I've never found any methodology of picking lags this way in the FAQs. Did I miss something?
Also, seems like we have 4 correlations in this discussion:
1) Correlation (generic term? needs adjective to be specific?)
2) Cross Correlation (these are the Cxx & Cxy?)
3) Auto Correlation (know this one)
4) Partial Auto Correlation (know this one)
Humm, if Cross Corr can help determine lags, then I don't know this method. The only one I have found is the ACF and PACF method.
Need Example (I work best from Examples - the more graphical the better. I am worst at working with mathematical equations - don't show me the matrix equations for a particular Nnet architecture, draw me the graph!)
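As a rough illustration of picking lags from a cross-correlation function: for a non-price input x and the output y, compute corr(x[t-k], y[t]) for k = 0, 1, 2, ... and look for the lags where the magnitude stands out. Here is a minimal sketch in plain Python with a crude text plot. The function name `lagged_xcorr` and the toy data are illustrative assumptions; this is one reading of Greg's remark, not his procedure.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sab / sqrt(saa * sbb)

def lagged_xcorr(x, y, max_lag):
    """corr(x[t-k], y[t]) for k = 0..max_lag, over the overlapping samples."""
    n = len(y)
    return [pearson(x[:n - k], y[k:]) for k in range(max_lag + 1)]

# Toy data: y is x delayed by two steps, so lag 2 should stand out.
x = [0, 1, 0, 0, 2, 0, 1, 0, 3, 0, 0, 1]
y = [0, 0] + x[:-2]

xc = lagged_xcorr(x, y, 4)
for k, r in enumerate(xc):
    bar = "#" * round(abs(r) * 20)  # crude text plot of |r|
    print(f"lag {k}: {r:+.2f} {bar}")

best_lag = max(range(len(xc)), key=lambda k: abs(xc[k]))
print("strongest lag:", best_lag)
```

The same loop run on a series against itself gives the ACF, which is why the auto and cross versions can be read the same way: tall bars mark candidate lags. Each extra lag shortens the overlap by one sample, so long lags on short series are estimated from very little data.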
>
>
> > ISSUE TWO:
>
> >
>
> > While I am doing forecasting, I'm also doing pattern recognition - my "stock chart." Although these columns look like Price and dozens of lags, it's really a set of Price Pixels.
>
>
>
> You lost me.
>
A "stock Chart" is an "image."
In pure pattern recognition, you have a video camera which takes a picture of a part a robot wants to pick up or a target that needs to be identified before it is shot at.
This is all pixel input. Each pixel is a column.
When you think of culling pixels, they would probably be "peripheral" ones.
What is even more interesting: is there a way to define the input so that it is known which pixels are adjacent? That is, rather than have data from a random list of pixels, to have data from an x,y mesh of pixels. Or does it even matter?
>
>
> > So all this ACF, PACF, CORR stuff, I get the impression it does not have much to do with pixel input.
>
>
>
> I associate pixels with images. You lost me again.
>
A stock chart is an image too. It can also be an (x,y) series handled by 2 columns, which is what a line function breaks down into when the pixels of the chart are pre-processed.
> >
>
> > NOTE: I was thinking if you were trying to predict the trajectory of a single black pixel image, you might look at a particular pixel output being determined by only a local neighborhood of pixels. Perhaps this would be a "PACF" style simplification.
>
> >
>
> > Nevertheless, I'm not sure what to do about my stock chart columns since they really violate the Price LAG method based on PACF and ACF.
>
> > _____________________
>
> >
>
> > I'm really sold on PCA (Principal Component Analysis), but even more so on Discriminant Analysis (also or now called PLS, Partial Least Squares?) due to the Parallel Cigar problem example.
>
> >
>
> > In Modal Analysis of mega-large finite element models, we always did Generalized Dynamic Reduction which found the first n requested Generalized Coordinates upon which we would then extract our eigenvalues.
>
> >
>
> > What could be simpler than reducing the inputs into their Optimum Classifying Coordinates and simply picking the first N terms for column inputs?
>
> >
>
> > Is it true that you could load up the inputs with everything including the kitchen sink and simply let DA extract the Principal CS? If you had poorly separated inputs and poor corr w/outputs, wouldn't those "garbage" columns simply not contribute to the Principal CS determined by DA?
>
> >
>
> > If yes, this would really make things easy.
>
> >
>
> > NOTE: It really makes modal analysis of a simple structure quite easy. However, with non-uniform, complex structures, maybe something like an airframe, you may be interested in eigenvalues which are determined by local structure. What is worse is that when you look at all the modes of the complete structure, these local, "fundamental" modes could be mode number 77, 132, 250, 537, and 1200. Clearly generalized dynamic reduction would require a tremendous number of DOF and would not be as efficient as hand selecting a set of nodes in the local area that would capture the desired mode shapes.
>
> >
>
> > I would guess THAT could be a phenomenon that might appear in neural network input reduction.
>
> >
>
> > What do you think?
>
> > ____________________
>
> >
>
> > Anyhow, I guess I'm "data mining" trying to find anything that is "different" from what I am currently using to give the net some "wisdom."
>
>
>
> You lost me on modal analysis as well as mixing classification with
>
> timeseries prediction.
>
Structural Modal Analysis is a Finite Element Technique where you build the model with nodes based on geometry and elements which connect the nodes, which introduce the mass and stiffness matrices between those nodes. Then you get the Characteristic Equations, extract the eigenvalues, back-substitute, and get the mode shapes.
Yes, the model is part forecasting and part image recognition. Should be 2 net architectures which feed another layer or two for their combination, however, no packages out there had all the options we needed. MemBrain would allow ANY architecture to be built but there were limitations in other areas.
>
>
> Greg.
Well, I think I've gotten through this one.
I really want to learn more about determining the correct Lags and also everything there is to know about input culling.
Thanks Greg!
Tom