David Webb
Jul 19, 2011, 4:33:32 PM
to s-spac...@googlegroups.com
I have about 2.5 million documents that I can analyze with LSA. I currently have a test sspace that I generated from 10K of those documents.
Is there a magic number of documents to analyze, beyond which the returns diminish? By return, I mean the time it takes to generate, load, or query the sspace file.
Assume that all 2.5 million are similar documents (resumes) and that whatever sample size I choose should provide a good representative sample of the entire set.
Thanks.