SVD when calculated for a corpus of similar category?

paluri

Nov 15, 2006, 3:29:36 AM

Hello everybody,

I have a question about the SVD. Suppose I compute the SVD for a huge
corpus of similar category and obtain the decomposition U S V^T. With the
singular values in the diagonal matrix S arranged in descending order
along the diagonal, will they be very close to each other? I mean, will
the numerical difference between one singular value and the next on the
diagonal be negligible? I would be thankful for a response.

Peter Spellucci

Nov 15, 2006, 6:05:52 AM

In article <1163579376....@b28g2000cwb.googlegroups.com>,

What, please, is a "huge corpus of similar category"?
Do you mean a set of nearby matrices?
Then the answer is yes:
let sigma(A,i) denote the singular values of A in descending order
and sigma(B,i) those of B. Then
|sigma(A,i)-sigma(B,i)| <= ||A-B|| for all i,
where ||A-B|| = sigma(A-B,1) is the spectral norm.
Hence, if A is near B, all the singular values of A and B can be paired
in this order with this universal error bound
(this follows from the Courant-Fischer minimax characterization of eigenvalues).
hth
peter
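
The bound above can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the matrix sizes, the random seed, and the perturbation scale 1e-3 are arbitrary choices for illustration and do not come from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed (assumption)

# A and a "nearby" matrix B; sizes and perturbation scale are illustrative only.
A = rng.standard_normal((6, 4))
B = A + 1e-3 * rng.standard_normal((6, 4))

# np.linalg.svd returns singular values in descending order.
sigma_A = np.linalg.svd(A, compute_uv=False)
sigma_B = np.linalg.svd(B, compute_uv=False)

# Spectral norm ||A-B||, which equals the largest singular value of A-B.
spectral_norm = np.linalg.norm(A - B, 2)

# Largest difference between paired singular values sigma(A,i), sigma(B,i).
max_gap = np.max(np.abs(sigma_A - sigma_B))
print(max_gap, spectral_norm)
```

The printed gap should not exceed the printed norm, whatever seed is used.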

paluri

Nov 15, 2006, 6:20:49 AM

By "huge corpus of similar category", I mean web pages downloaded from
a single category.
Actually, I am creating a term-by-document matrix (rows indicating the
terms, columns the documents, and each element of the matrix giving
the frequency of that term in the corresponding document) for a certain
number of web pages, and then I will apply the SVD to that
term-by-document matrix in order to calculate the similarity between
the documents or web pages.
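
A rough sketch of the setup just described, assuming NumPy. The three toy documents, the whitespace tokenization, the truncation rank k = 2, and the use of cosine similarity are all assumptions made for illustration; the post does not say how the similarity is computed from the SVD.

```python
import numpy as np

# Hypothetical toy "web pages"; real pages would need proper tokenization.
docs = [
    "svd of a matrix",
    "singular values of a matrix",
    "web pages about cooking",
]

# Term-by-document matrix: rows are terms, columns are documents,
# entries are raw term frequencies.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[row[w], j] += 1

# Thin SVD; s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the k largest singular values (k is an assumed choice) and
# represent each document by its coordinates in that subspace.
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

def cosine(a, b):
    """Cosine similarity between two coordinate vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_01 = cosine(doc_coords[0], doc_coords[1])  # two pages sharing terms
sim_02 = cosine(doc_coords[0], doc_coords[2])  # pages with no shared terms
print(s, sim_01, sim_02)
```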

Now, what I am asking is this: if I create the term-by-document matrix from
pages or documents taken from the same category, i.e. if they are
already similar, then in the SVD of that term-by-document matrix,
will the singular values on the diagonal be very close to each other,
i.e. will the numerical difference between one singular value and the
next on the diagonal be very small?

Peter Spellucci

Nov 15, 2006, 9:59:42 AM

In article <1163589649.8...@f16g2000cwb.googlegroups.com>,

No.
Your matrices will be integer matrices with entries no larger than
the number of occurrences of a term in a document, hence not really large.
The norm of such a matrix will be at most the number of elements times the
largest entry and not smaller than the largest entry. But if two matrices
differ by one in a single entry, some singular values can differ by an
amount of order one, hence not really small.
hth
peter
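
The norm bounds and the effect of changing one entry by one can be tried out on a small example. This is only a sketch, assuming NumPy; the 8x5 size, the count range 0..4, the seed, and the perturbed position (2, 3) are arbitrary. The upper bound used here, sqrt(number of elements) times the largest entry, is a tighter standard bound than the one stated in the post, which therefore holds as well.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed (assumption)

# Hypothetical term-by-document count matrix with small integer entries.
A = rng.integers(0, 5, size=(8, 5)).astype(float)

# B differs from A by one in a single entry, so ||A-B|| = 1.
B = A.copy()
B[2, 3] += 1.0

norm_A = np.linalg.norm(A, 2)          # spectral norm = largest singular value
max_entry = np.abs(A).max()            # lower bound on the norm
upper = np.sqrt(A.size) * max_entry    # upper bound via the Frobenius norm

# Change of each singular value caused by the single-entry perturbation.
shift = np.abs(
    np.linalg.svd(A, compute_uv=False) - np.linalg.svd(B, compute_uv=False)
)
print(max_entry, norm_A, upper)
print(shift)
```

No entry of `shift` can exceed one, since ||A-B|| = 1; how close the largest shift comes to one depends on the matrix.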
