Looking to Discuss Implementation Details for Large Self Organizing Maps

cougar_shuttle

Dec 18, 2009, 10:14:39 AM

I'm doing something similar to this
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.2117) and
need advice about learning rates, architecture, how long things should
take, tricks, etc. Does anyone know of a good discussion to read? Want
to start one?
-Joe

Phil Sherrod

Dec 20, 2009, 5:48:22 PM

I understand what a SOM is, but I'm having difficulty understanding its
usefulness. Please explain what you hope to accomplish by creating a SOM.

--
Phil Sherrod
http://www.dtreg.com -- Neural networks, SVM, Decision trees

cougar_shuttle

Dec 21, 2009, 12:42:58 AM

On Dec 20, 4:48 pm, "Phil Sherrod" <PhilSher...@NOSPAMcomcast.net>
wrote:

> I understand what a SOM is, but I'm having difficulty understanding its
> usefulness. Please explain what you hope to accomplish by creating a SOM.
>
> --
> Phil Sherrod
> http://www.dtreg.com -- Neural networks, SVM, Decision trees

I want to be able to categorize thousands of individual paragraphs
into around 30-40 groups. It's not as important exactly how many
groups there effectively are and I'm hoping to learn that from the
SOM. I've chosen 323 keywords and so, am using a 323-vector with
their counts as inputs. I have about 3300 example texts and have been
working with the FANN library but I'm not sure about training
parameters or what size SOM or how many epochs I should expect to see
before the error falls. I've been trying different parameters and
architectures and only seen errors into the high 2's which would not
be too different from random weights.

Phil Sherrod

Dec 21, 2009, 2:37:17 AM

On 20-Dec-2009, cougar_shuttle <joebe...@gmail.com> wrote:

> I want to be able to categorize thousands of individual paragraphs
> into around 30-40 groups. It's not as important exactly how many
> groups there effectively are and I'm hoping to learn that from the
> SOM. I've chosen 323 keywords and so, am using a 323-vector with
> their counts as inputs.

OK, you have 323 keywords and for each paragraph you are generating a vector
with 323 counts of the words in the paragraph.
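
A minimal Python sketch of that input encoding, for concreteness. The
four-word keyword list and the sample paragraph are made up for
illustration (the real list has 323 entries), and the tokenizer
(lowercase, split on anything that is not a letter or apostrophe) is an
assumption, not something stated in the thread.

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Return one count per keyword, in keyword order."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur in the text.
    return [counts[k] for k in keywords]

# Hypothetical keyword list standing in for the real 323 keywords.
KEYWORDS = ["network", "error", "weight", "cluster"]

paragraph = ("The network error fell once each weight settled; "
             "the error stayed low.")
vector = keyword_counts(paragraph, KEYWORDS)
print(vector)  # -> [1, 2, 1, 0]
```

Each paragraph yields one such vector, so 3300 paragraphs give a
3300 x 323 matrix of counts.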

Is there any sort of criteria you are trying to categorize the paragraphs
by? For example, spam/non-spam? In other words, is this supervised
training (with a target variable whose value you're trying to predict) or
unsupervised training where you are just trying to do some grouping
(clustering)?

Traditional SOM converts (maps) a vector of input values into X,Y
coordinates on a two-dimensional plane. It is not clear to me how to go
from that to 30-40 groups.

Let's say you are able to perform the grouping. If you then can say an input
paragraph is closest to group 7 and second closest to group 9, what will you
do with this information?

Have you done any research on cluster analysis?

Have you thought about using Principal Components Analysis (or just
eigenvectors) to transform the 323 variable input into a reduced set of
orthogonal variables? After applying PCA/eigenvector analysis to the 323
variables, you will have a set of 323 components/eigenvectors in decreasing
order of explained variance. You can then use the top 10/20/30/40/etc.
components depending on how much total variance you want to explain. For a
given input vector, you can multiply the mean-adjusted input values by the
eigenvector values and sum them to transform the input into a coordinate on
the dimension represented by the eigenvector. So if you use the top 20
eigenvectors, you would go from 323 dimension space to 20 dimension space.
Maybe you could use the coordinates in this 20-dimension space as the
grouping.
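
The projection described above (subtract the mean, multiply by an
eigenvector, sum) can be sketched in plain Python. To stay free of
dependencies, this sketch finds the leading eigenvectors of the
covariance matrix by power iteration with deflation, which is a swapped-in
technique chosen for brevity; for 323 variables a linear algebra library
would be the practical choice. The function names and the tiny 3-variable
data set are hypothetical.

```python
import random

def mean_center(rows):
    """Return the column means and the mean-adjusted rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return means, [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_components(cov, k, iters=200, seed=0):
    """k leading eigenvalues/eigenvectors of a symmetric matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    work = [row[:] for row in cov]          # deflate a copy, not the input
    d = len(work)
    values, vectors = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(work, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:                 # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(work, v)))
        values.append(lam)
        vectors.append(v)
        # Remove the component just found so the next pass finds the next one.
        for i in range(d):
            for j in range(d):
                work[i][j] -= lam * v[i] * v[j]
    return values, vectors

def project(x, means, vectors):
    """Coordinates of x along each eigenvector (mean-adjusted dot products)."""
    return [sum((xj - mj) * vj for xj, mj, vj in zip(x, means, v))
            for v in vectors]

# Toy data: 3 variables, most of the variance along the first two.
rows = [[1.0, 1.1, 0.0], [2.0, 1.9, 0.1], [3.0, 3.2, 0.0],
        [4.0, 3.8, 0.1], [5.0, 5.1, 0.0], [6.0, 5.9, 0.1]]
means, centered = mean_center(rows)
cov = covariance(centered)
values, vectors = top_components(cov, k=2)
coords = [project(r, means, vectors) for r in rows]
explained = values[0] / sum(cov[i][i] for i in range(len(cov)))
print("share of variance on first component:", round(explained, 3))
```

New input vectors are handled the same way: keep `means` and `vectors`
from the training data and call `project` on each new vector.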

> I have about 3300 example texts and have been
> working with the FANN library but I'm not sure about training
> parameters or what size SOM or how many epochs I should expect to see
> before the error falls. I've been trying different parameters and
> architectures and only seen errors into the high 2's which would not
> be too different from random weights.

I wasn't aware that the FANN library could create SOMs. I thought it just
made multilayer perceptron networks.

--
Phil Sherrod
http://www.dtreg.com -- Neural networks, SVM, Decision trees

cougar_shuttle

Dec 21, 2009, 10:26:14 AM

On Dec 21, 1:37 am, "Phil Sherrod" <PhilSher...@NOSPAMcomcast.net>
wrote:

> On 20-Dec-2009, cougar_shuttle <joebeuck...@gmail.com> wrote:
>
> > I want to be able to categorize thousands of individual paragraphs
> > into around 30-40 groups. It's not as important exactly how many
> > groups there effectively are and I'm hoping to learn that from the
> > SOM. I've chosen 323 keywords and so, am using a 323-vector with
> > their counts as inputs.
>
> OK, you have 323 keywords and for each paragraph you are generating a vector
> with 323 counts of the words in the paragraph.

Exactly.

> Is there any sort of criteria you are trying to categorize the paragraphs
> by? For example, spam/non-spam? In other words, is this supervised
> training (with a target variable whose value you're trying to predict) or
> unsupervised training where you are just trying to do some grouping
> (clustering)?

It's unsupervised. I have an idea about what groups will emerge but
want to get the grouping information from the data. Cluster analysis
may be a good alternative, but is that a technique that can be used in
an automated way on new data?

> Traditional SOM converts (maps) a vector of input values into X,Y
> coordinates on a two-dimensional plane. It is not clear to me how to go
> from that to 30-40 groups.

The example SOMs I've been reading about seemed to form local groups.
I would look at the trained SOM and note output units associated with
each group. If there were well-connected groups, then I could
tolerate some error for items between adjacent groups.

> Let's say you are able to perform the grouping. If you then can say an input
> paragraph is closest to group 7 and second closest to group 9, what will you
> do with this information?

I plan to create a 1 x (Ngroups) SOM (not the traditional 2D SOM) and
use that vector as the input to a backpropagation network that
operates on other data related to the paragraphs.
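
For reference, a minimal sketch of what training a 1 x N SOM involves,
written in plain Python rather than against the FANN API (whose SOM
interface is not shown in this thread). It uses the standard Kohonen
update with a Gaussian neighborhood; the linear decay schedules, the
default values of `lr0` and `sigma0`, and the toy two-blob data are all
illustrative assumptions, not recommendations.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))

def train_som_1d(data, n_units, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1 x n_units SOM; returns the list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = n_units / 2.0              # assumed starting radius
    # Initialise each unit from a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floor 0.5
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Units near the winner on the 1-D chain move the most.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma * sigma))
                w = weights[u]
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights

def quantization_error(weights, data):
    """Mean distance from each input to its best matching unit."""
    total = 0.0
    for x in data:
        u = best_matching_unit(weights, x)
        total += math.sqrt(sum((w - v) ** 2 for w, v in zip(weights[u], x)))
    return total / len(data)

# Toy data: two well-separated 2-D blobs standing in for 323-D count vectors.
gen = random.Random(1)
blob_a = [[gen.gauss(0.0, 0.3), gen.gauss(0.0, 0.3)] for _ in range(20)]
blob_b = [[gen.gauss(10.0, 0.3), gen.gauss(10.0, 0.3)] for _ in range(20)]
data = blob_a + blob_b

weights = train_som_1d(data, n_units=4)
bmu_a = best_matching_unit(weights, [0.0, 0.0])
bmu_b = best_matching_unit(weights, [10.0, 10.0])
qe = quantization_error(weights, data)
print("winning units:", bmu_a, bmu_b, "quantization error:", qe)
```

The index of the winning unit (or a one-of-N vector built from it) is
what would be passed on to the second network.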

> Have you done any research on cluster analysis?
>
> Have you thought about using Principal Components Analysis (or just
> eigenvectors) to transform the 323 variable input into a reduced set of
> orthogonal variables? After applying PCA/eigenvector analysis to the 323
> variables, you will have a set of 323 components/eigenvectors in decreasing
> order of explained variance. You can then use the top 10/20/30/40/etc.
> components depending on how much total variance you want to explain. For a
> given input vector, you can multiply the mean-adjusted input values by the
> eigenvector values and sum them to transform the input into a coordinate on
> the dimension represented by the eigenvector. So if you use the top 20
> eigenvectors, you would go from 323 dimension space to 20 dimension space.
> Maybe you could use the coordinates in this 20-dimension space as the
> grouping.

Thanks - I will be reading about these methods too. It would be great
to create eigenvectors deterministically but would they interpolate
new data well? Honestly I only started with Kohonen maps because I am
working in neural networks. But I would like to learn how to work
with SOM before moving on.

> > I have about 3300 example texts and have been
> > working with the FANN library but I'm not sure about training
> > parameters or what size SOM or how many epochs I should expect to see
> > before the error falls. I've been trying different parameters and
> > architectures and only seen errors into the high 2's which would not
> > be too different from random weights.
>
> I wasn't aware that the FANN library could create SOMs. I thought it just
> made multilayer perceptron networks.

They added SOM and Growing Neural Gas in the 2007 Google Summer of
Code.

Greg Heath

Dec 21, 2009, 7:11:21 PM

On Dec 21, 12:42 am, cougar_shuttle <joebeuck...@gmail.com> wrote:
> I want to be able to categorize thousands of individual paragraphs
> into around 30-40 groups. It's not as important exactly how many
> groups there effectively are and I'm hoping to learn that from the
> SOM. I've chosen 323 keywords and so, am using a 323-vector with
> their counts as inputs. I have about 3300 example texts and have been
> working with the FANN library but I'm not sure about training
> parameters or what size SOM or how many epochs I should expect to see
> before the error falls. I've been trying different parameters and
> architectures and only seen errors into the high 2's which would not
> be too different from random weights.

Kohonen's SOM can be used for clustering and/or visualization
(2-D or 3-D) of the data. He does not recommend it for
classification.

Competitive non-neural statistical alternatives:

1. Clustering: K-means and Leader clustering

2. 2 or 3-D visualization:
a. Use the Sammon Multi-dimensional Scaling (MDS) algorithm
to obtain a nonlinear projection of cluster centers
b. Construct a linear transformation between the cluster centers
and their nonlinear projections.
c. Use the resulting matrix to obtain linear projections of the
cluster members.
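
As an illustration of item 1, a minimal sketch of plain K-means (Lloyd's
algorithm) in Python. Choosing the initial centers at random from the
data, the iteration cap, and the toy points are assumptions made for the
example; Leader clustering and the Sammon projection are not covered.

```python
import random

def nearest(centers, p):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centers[c], p)))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [nearest(centers, p) for p in points]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # an emptied cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Toy data: two obvious groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```

New data can be grouped automatically afterwards by calling
`nearest(centers, x)` on each new vector.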

Hope this helps.

Greg

Ian Parker

Dec 22, 2009, 7:03:45 AM

On Dec 21, 7:37 am, "Phil Sherrod" <PhilSher...@NOSPAMcomcast.net>
wrote:

What you want is LSA. http://en.wikipedia.org/wiki/Latent_semantic_analysis
In general, you do a singular value decomposition (SVD) on your matrix
and then sort the singular values. You will generally end up with FEWER
categories than keywords, because a lot of your keywords will be
synonyms.

- Ian Parker

joaocarlos

Dec 22, 2009, 12:47:36 PM

On Dec 21, 5:11 pm, Greg Heath <he...@alumni.brown.edu> wrote:
> Kohonen's SOM can be used for clustering and/or visualization
> (2-D or 3-D) of the data. He does not recommend it for
> classification.
>
> Competitive non-neural statistical alternatives:
>
> 1. Clustering: K-means and Leader clustering
>
> 2. 2 or 3-D visualization:
> a. Use the Sammon Multi-dimensional Scaling (MDS) algorithm
> to obtain a nonlinear projection of cluster centers
> b. Construct a linear transformation between the cluster centers
> and their nonlinear projections.
> c. Use the resulting matrix to obtain linear projections of the
> cluster members.
>
> Hope this helps.
>
> Greg

If I'm not wrong, Kohonen has made provision for a supervised SOM:

"If the Self-Organizing Map is to be used as a pattern classifier in
which the cells or their responses are grouped into subsets, each of
which corresponds to a discrete class of patterns, then the problem
becomes a decision process and must be handled differently. The
original Map, like any classical Vector Quantization (VQ) method (cf.
Sec. I-D) is mainly intended to approximate input signal values, or
their probability density function, by quantized "codebook" vectors
that are localized in the input space to minimize a quantization error
functional (cf. Sec. III-A below). On the other hand, if the signal
sets are to be classified into a finite number of categories, then
several codebook vectors are usually made to represent each class, and
their identity within the classes is no longer important. In fact,
only decisions made at class borders count. It is then possible, as
shown below, to define effective values for the codebook vectors such
that they directly define near-optimal decision borders between the
classes, even in the sense of classical Bayesian decision theory.
These strategies and learning algorithms were introduced by the
present author [38], [43], [45] and called Learning Vector
Quantization (LVQ)"

http://www.eicstes.org/EICSTES_PDF/PAPERS/The%20Self-Organizing%20Map%20%28Kohonen%29.pdf
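
To make the quoted idea concrete, a minimal sketch of the basic LVQ1
update in Python: the nearest codebook vector moves toward a training
sample when their class labels agree and away from it when they differ.
Unlike the unsupervised setup discussed earlier in the thread, this needs
labeled samples. The learning-rate schedule, the initial codebook, and the
toy data are assumptions for illustration; the refinements in the later
LVQ variants are not shown.

```python
import random

def nearest_index(codebook, x):
    """Index of the codebook entry whose vector is closest to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((w - v) ** 2
                                 for w, v in zip(codebook[i][0], x)))

def train_lvq1(samples, codebook, epochs=30, lr0=0.3, seed=0):
    """samples: list of (vector, label); codebook: list of (vector, label)."""
    rng = random.Random(seed)
    codebook = [[list(vec), label] for vec, label in codebook]  # private copy
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, label in order:
            lr = lr0 * (1.0 - step / total)      # assumed linear decay
            vec, cb_label = codebook[nearest_index(codebook, x)]
            # Attract on a label match, repel on a mismatch.
            sign = 1.0 if cb_label == label else -1.0
            for j in range(len(vec)):
                vec[j] += sign * lr * (x[j] - vec[j])
            step += 1
    return codebook

def classify(codebook, x):
    """Label of the nearest codebook vector."""
    return codebook[nearest_index(codebook, x)][1]

# Toy labeled data: class "a" near the origin, class "b" near (4.5, 4.5).
samples = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
           ((4, 4), "b"), ((4, 5), "b"), ((5, 4), "b"), ((5, 5), "b")]
initial = [((1.5, 1.5), "a"), ((3.5, 3.5), "b")]   # one vector per class

trained = train_lvq1(samples, initial)
print(classify(trained, (0.5, 0.5)), classify(trained, (4.5, 4.5)))  # -> a b
```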

Greg Heath

Dec 23, 2009, 3:12:34 AM

> > Kohonen's SOM can be used for clustering and/or visualization

> http://www.eicstes.org/EICSTES_PDF/PAPERS/The%20Self-Organizing%20Map...-

Correct.

In addition, LVQ didn't work as well as he would have liked, so
he created LVQ2, which is to be used to further train the net.

I have great respect for Kohonen's pioneering work. However,
I have always found other algorithms to work better for clustering
and classification.

So, ... try SOM, LVQ, LVQ2, etc. However, if they don't work
as well as you would like, try an alternative.

Hope this helps.

Greg

cougar_shuttle

Dec 24, 2009, 12:59:44 PM

Thanks for all the great suggestions. I wrote routines for K-means
clustering, which is at least very fast for my application. Here's the
source:

http://www.beigerecords.com/joe/?p=414

Ian Parker

Dec 27, 2009, 10:04:22 AM

This is the basic algorithm. It is indeed very fast. However, we need
to look at a large number of different seeds. This is where the
algorithms described in the literature come in. Mine effectively does
a tournament sort and tries to cluster on all the points that are
furthest away from existing clusters. The literature all does a similar
thing, slightly differently.
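
One simple way to pick seeds that lie far from the clusters already
chosen is farthest-first traversal, sketched below in Python. This is
not the tournament-sort implementation mentioned above, only a sketch of
the general idea under the assumption of squared Euclidean distance; the
function names and the toy points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_seeds(points, k, first=0):
    """Pick k seeds: start at points[first], then repeatedly take the
    point farthest from every seed chosen so far."""
    seeds = [points[first]]
    # dist2[i] = squared distance from points[i] to its nearest seed
    dist2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        i = max(range(len(points)), key=dist2.__getitem__)
        seeds.append(points[i])
        dist2 = [min(d, sq_dist(p, points[i])) for d, p in zip(dist2, points)]
    return seeds

# Toy data: three pairs of nearby points.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
seeds = farthest_first_seeds(points, k=3)
print(seeds)  # -> [(0, 0), (21, 0), (10, 11)]
```

The resulting seeds can be used as the initial centers for K-means.
Because the method always reaches for the most distant point, outliers
tend to be picked as seeds, which is one reason to try several starting
points.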

- Ian Parker