Question about dimension word choice implementation in COALS space with no SVD reduction


Marcin Tatjewski

Jun 13, 2014, 8:16:48 PM
to s-spac...@googlegroups.com
Hi,

As I remember from the COALS paper, the authors constrained the full space, i.e. the space with no dimensionality reduction, to 14,000 columns, where the columns were represented by "open-class" words. I see that this constraint holds in the S-Space implementation, and I wonder how it is implemented. Are you able to identify open-class words, or do you just take the most frequent ones? I assume you just take the most frequent, since it would hardly be possible to identify open-class words in all the different languages that can be given as input.

Regards,
Marcin

David Jurgens

Jun 13, 2014, 9:42:58 PM
to s-spac...@googlegroups.com
Generally, there are relatively few closed-class words. Most of the time you can find a list of them for each language you want, so restricting COALS to open-class words simply comes down to excluding the closed-class ones with a word list filter. The algorithm won't do this for you automatically, though; you'll need to pass the list of closed-class words to it as a parameter on the command line (or add it programmatically).

  Thanks,
  David
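
For illustration, the approach described in the reply (drop the closed-class words with a word list, then keep the most frequent remaining words as columns) might be sketched as follows. This is a minimal standalone Python sketch, not S-Space's actual code or API: the function name `select_columns`, its parameters, and the toy data are all hypothetical, and the default of 14,000 columns is taken from the figure quoted in the question.

```python
from collections import Counter


def select_columns(tokens, closed_class_words, max_columns=14000):
    """Pick column words: the most frequent tokens not on the closed-class list.

    Hypothetical helper for illustration only; not part of S-Space.
    """
    closed = {w.lower() for w in closed_class_words}
    # Count only the tokens that survive the word list filter.
    counts = Counter(t.lower() for t in tokens if t.lower() not in closed)
    # Sort by descending frequency, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_columns]]


# Toy corpus and a tiny, made-up closed-class list.
tokens = "the cat sat on the mat and the dog sat on the cat".split()
closed_class = ["the", "on", "and"]

columns = select_columns(tokens, closed_class, max_columns=2)
print(columns)
```

In practice the closed-class list would come from a per-language stop-word file, which is what makes the approach workable across languages without any part-of-speech tagging.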


