As a result of several email requests to correct newbie neural
net MATLAB code, let me give this pre-training advice:
1. Format your data matrix in the standard form Z = [X Y], where the
columns of X are input variables and the columns of Y are output
variables. For the purposes of this post I will assume y = Y is
an r-dimensional column vector, size(Z) = [r c], and
X = [x1,x2,...,xc-1].
2. Use TRANSPOSE and PRESTD to standardize the columns of Z. On
special occasions normalization to the bounded interval [-1,1]
(PREMNMX) is used for some columns. However, this is useful
only if you know that all future data will fall
within the original bounds of the training data.
Transforming to a unipolar interval (e.g., [0,1]) tends to
slow down learning. In addition, stability, accuracy and
precision are inferior to those obtained from bipolar intervals
because the training algorithm matrices are more poorly conditioned.
(See the input scaling discussions and demonstrations in the
comp.ai.neural-nets FAQ.)
For classification into 2 classes, the output targets should be
either 0 or 1. Then, if mean-square-error or cross-entropy is
used as an objective function for learning, y can be interpreted
as a posterior probability for the "1" class, conditional on
the input X.
For classification into more than two classes, Y should have
two or more columns. Therefore it is outside the scope of this
post.
3. Use visual aids to better understand the data
a. Scatter plots of y vs xi (i=1,...c-1)
b. Histograms (HIST,HISTC)
4. Use the correlation coefficient matrix of the data matrix,
CC = CORRCOEF(Z), to identify
a. undesirable low correlations between y and columns of X.
b. undesirable high correlations between columns of X.
5. Theoretically, X has full rank when rank(X) = min(r,c-1).
For practical purposes, however, X should be treated as
rank deficient when the condition number COND(X) exceeds
~ 100. This indicates that input correlations may be high
enough to significantly degrade learning and input
dimensionality reduction is warranted.
6. Input dimensionality reduction can be achieved in many ways.
a. Use CORRCOEF to identify inputs which can be excluded
because they are less linearly correlated to the
output than they are to other inputs.
b. Use STEPWISEFIT or STEPWISE to exclude inputs using
a more structured statistical approach.
c. Use PREPCA to project the input data into the subspace
spanned by the dominant eigenvectors of CC.
d. Use more complicated nonlinear neural network techniques.
7. Test the strength of combined linear I/O correlations by
calculating the mean square errors (MSE0 ~ 1, MSE1 and MSE2)
obtained by
a. A fit to the constant y0 = mean(y) (yielding MSE0 = 1-1/r
for standardized y).
b. A bias-free linear fit y1 = X*W1.
c. A full linear fit y2 = Xa*W2, where Xa is the augmented
input matrix Xa = [ones(r,1) X].
The truncated pseudoinverse PINV(Xa,tol) can be used when
the conditioning is too poor for the backslash operator solution
to be stable.
8. If I/O correlations are sufficiently high, the training goal
MSE = 0.01*MSE0 is reasonable. Therefore, *start* the hunt
with, e.g.,
net = newff(minmax(X'),[H-1 1],{'tansig' 'purelin'});
net.trainParam.goal = 0.01*MSE0;
net.trainParam.show = 10;
while keeping the other parameters at their default values.
9. The number of hidden nodes, H, is typically chosen by
trial and error. Search the c.a.n-n archives in
groups.google.com using "number of hidden".
10. Become familiar with the comp.ai.neural-nets FAQ and archives
in groups.google.com and the beginner's tutorial in Tveter's
Home Page.
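Steps 2, 4, 5 and 7 above can be sketched in base MATLAB as follows. This is only an illustration under stated assumptions: the data are synthetic, MEAN and STD stand in for the toolbox function PRESTD, and the tolerance passed to PINV is an arbitrary choice.

```matlab
% Synthetic stand-in data (assumption): r = 200 cases, 3 inputs, 1 output.
r = 200;
t = linspace(-1, 1, r)';
X = [t, t.^2, sin(5*t)];
y = 2*X(:,1) - X(:,2) + 0.5*X(:,3) + 0.1*cos(40*t);
Z = [X y];                                  % standard form, size(Z) = [r c]

% Step 2: standardize each column of Z to zero mean and unit variance.
mu    = mean(Z, 1);
sigma = std(Z, 0, 1);
Zn    = (Z - repmat(mu, r, 1)) ./ repmat(sigma, r, 1);
Xn    = Zn(:, 1:end-1);
yn    = Zn(:, end);

% Step 4: correlation coefficient matrix of inputs and output.
CC = corrcoef(Zn);

% Step 5: condition number of the standardized inputs.
condX = cond(Xn);

% Step 7: reference MSEs from constant and linear fits.
MSE0 = mean((yn - mean(yn)).^2);            % constant fit, equals 1 - 1/r
W1   = Xn \ yn;                             % bias-free linear fit
MSE1 = mean((yn - Xn*W1).^2);
Xa   = [ones(r,1) Xn];                      % augmented input matrix
W2   = Xa \ yn;                             % full linear fit
MSE2 = mean((yn - Xa*W2).^2);

% Truncated pseudoinverse alternative; tol = 1e-6 is an arbitrary choice.
W2p  = pinv(Xa, 1e-6) * yn;

goal = 0.01*MSE0;                           % starting training goal (step 8)
```

Because the columns are standardized, the bias term in the full fit is essentially zero here, so MSE1 and MSE2 nearly coincide; with unstandardized data they can differ substantially.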
Hope this helps.
Greg
Greg Heath - 03 Jan 2005 06:37 GMT
>5. Theoretically, X has full rank when rank(X) = min(r,c-1).
> For practical purposes, however, X should be treated as
> rank deficient when the condition number COND(X) exceeds
> ~ 100. This indicates that input correlations may be high
> enough to significantly degrade learning and input
> dimensionality reduction is warranted.
The determination of rank and condition number is only helpful
when done after standardization (or at least, after mean removal).
It is probably better to look at the rank and condition
number of CCX = corrcoef(X). If CCX is not of full rank, the
eigenvectors of CCX with nonzero eigenvalues form a
transformation matrix that will project the standardized
vectors into a lower dimensional space where the new
variables are uncorrelated.
When the number of input vectors, r, is greater than
the number of input variables, c-1, the transformation can
be obtained using PREPCA.
Otherwise, make a renamed copy of PREPCA and remove the line
that enforces this restriction.
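The projection described above can be sketched with EIG in base MATLAB. This is an illustration, not PREPCA's actual implementation: the data are synthetic (the third input is an exact linear combination of the first two), and minFrac is a hypothetical threshold playing a role similar to PREPCA's second argument.

```matlab
% Synthetic inputs (assumption): 50 cases, 3 variables, x3 = x1 + x2.
r  = 50;
t  = linspace(0, 1, r)';
x1 = t;
x2 = cos(7*t);
X  = [x1, x2, x1 + x2];

% Standardize the columns (zero mean, unit variance).
Xn = (X - repmat(mean(X,1), r, 1)) ./ repmat(std(X,0,1), r, 1);

% Correlation matrix of the inputs; symmetrize to remove rounding asymmetry.
CCX = corrcoef(Xn);
CCX = (CCX + CCX')/2;
rankCCX = rank(CCX, 1e-8);                  % CCX is rank deficient here

% Eigen-decomposition, sorted by decreasing eigenvalue.
[V, D] = eig(CCX);
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);

% Keep components that explain more than minFrac of the total variance.
minFrac = 0.02;                             % hypothetical threshold
keep = (lambda / sum(lambda)) > minFrac;
T = V(:, keep);                             % transformation matrix
P = Xn * T;                                 % projected, uncorrelated variables

CP = corrcoef(P);                           % off-diagonal entries are ~0
```

Nothing in this sketch requires r to exceed the number of input variables, since EIG is applied to the square matrix CCX.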
>6. Input dimensionality reduction can be achieved in many ways.
> a. Use CORRCOEF to identify inputs which can be excluded
> because they are less linearly correlated to the
> output than they are to other inputs.
> b. Use STEPWISEFIT or STEPWISE to exclude inputs using
> a more structured statistical approach.
I have been working on a problem where these stepwise functions
are not working as well as expected. When I find out the cause,
I will post.
Hope this helps.
Greg
mahdi bazarghan - 03 Jan 2005 10:40 GMT
I am using the MATLAB Neural Network Toolbox for data classification.
To do that, I need to reduce the size of my data: I have 55 patterns,
and each pattern consists of 561 data points.
To reduce the dimension from 561, I am doing the following:
p = load('traindata');
[pn,meanp,stdp]=prestd(p);
[ptrans,transMat]=prepca(pn,0.02);
and in the workspace I get the following:
----------------------------------
name        size
----------------------------------
p           55*561
pn          55*561
meanp       55*1
stdp        55*1
ptrans      3*561
transMat    3*55
----------------------------------
As you can see from this table, the size of 'ptrans', the transformed
data set, is 3*561. So the number of patterns has been reduced, not
the patterns themselves. What I want is to reduce 561, the number
of data points (the dimension of the data) in each pattern.
I think the size of 'ptrans' should be 55*A, where A is
the reduced dimension.
Please let me know where I am going wrong.
Greg Heath - 16 Jan 2005 22:42 GMT
> For classification into 2 classes, the output targets should be
> either 0 or 1. Then, if mean-square-error or cross-entropy is
> used as an objective function for learning, y can be interpreted
> as a posterior probability for the "1" class, conditional on
> the input X.
Unfortunately, MATLAB doesn't seem to support either form of
cross-entropy as an objective function (see the book by Bishop
referenced in the comp.ai.neural-nets FAQ).
> 3. Use visual aids to better understand the data
> a. Scatter plots of y vs xi (i=1,...c-1)
> b. Histograms (HIST,HISTC)
c. Clusters (KMEANS,NEWC)
> 6. Input dimensionality reduction can be achieved in many ways.
> c. Use PREPCA to project the input data into the subspace
> spanned by the dominant eigenvectors of CC.
When the data is transformed, the means are still zero. However,
the variances are no longer unity. Therefore restandardize using
PRESTD again.
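A small check of the claim above, in base MATLAB on synthetic data (an assumption, with EIG standing in for PREPCA and STD for PRESTD): the PCA-transformed variables keep zero mean, their variances equal the eigenvalues of the input correlation matrix rather than 1, and one more scaling restores unit variance.

```matlab
% Synthetic inputs (assumption): 100 cases, 3 correlated variables.
r  = 100;
t  = linspace(-1, 1, r)';
X  = [t, t.^2, sin(5*t)];

% First standardization.
Xn = (X - repmat(mean(X,1), r, 1)) ./ repmat(std(X,0,1), r, 1);

% PCA transformation from the eigenvectors of the correlation matrix.
CCX = corrcoef(Xn);
CCX = (CCX + CCX')/2;                       % remove rounding asymmetry
[V, D] = eig(CCX);
lambda = diag(D);
P = Xn * V;

meanP = mean(P, 1);                         % still ~0
varP  = var(P, 0, 1);                       % equals lambda', not all ones

% Restandardize: the means are already zero, so only rescale.
Pn    = P ./ repmat(std(P,0,1), r, 1);
varPn = var(Pn, 0, 1);                      % all ones
```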
It is also advisable to have the data visualizations in 3 for
a. Original data
b. Standardized data
c. PCA-transformed standardized data
d. Standardized PCA-transformed standardized data
> net = newff(minmax(X'),[H-1 1],{'tansig' 'purelin'})
Whoops!
net = newff(minmax(X'),[H 1],{'tansig' 'purelin'})
for regression and
net = newff(minmax(X'),[H 1],{'tansig' 'logsig'})
for classification.
If either hangs or bombs because your data set is too large
for trainlm (the default), try
net = newff(minmax(X'),[H 1],{'tansig' 'purelin'},'trainscg')
or
net = newff(minmax(X'),[H 1],{'tansig' 'logsig'},'trainscg')
> I have been working on a problem where these stepwise functions
> are not working as well as expected. When I find out the cause,
> I will post.
The current conjecture is that the stepwise functions don't
always work when an exact solution exists and default parameter
settings are used.
Hope this helps.
Greg