Creating document vectors with OOV words throws ArrayIndexOutOfBoundsException


Johann Petrak

Mar 29, 2017, 2:02:41 PM
to S-Space Package Users
Using version 2.0.4, when I try to build a document vector for a document that contains a word which is not in the sspace, I get:
java.lang.ArrayIndexOutOfBoundsException: 4
at edu.ucla.sspace.matrix.SparseHashMatrix.getRowVector(SparseHashMatrix.java:105)
at edu.ucla.sspace.matrix.SparseHashMatrix.getRowVector(SparseHashMatrix.java:42)
at edu.ucla.sspace.common.GenericTermDocumentVectorSpace.getVector(GenericTermDocumentVectorSpace.java:315)
at edu.ucla.sspace.common.DocumentVectorBuilder.buildVector(DocumentVectorBuilder.java:129)
at edu.ucla.sspace.common.DocumentVectorBuilder$buildVector.call(Unknown Source)
       ...
This uses the default tokeniser.
Is there a way to easily tell this method to just ignore OOV words?
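
One workaround I can think of, as a rough sketch rather than a confirmed S-Space feature, is to filter the text down to words the space knows before passing it to DocumentVectorBuilder. The class and method names below are hypothetical and not part of S-Space. The vocabulary set is assumed to come from the space's word list (for example SemanticSpace.getWords(), if that is available in your version), and the whitespace splitting here may not match what the default tokeniser does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of S-Space: drops out-of-vocabulary tokens.
class OovFilter {
    /**
     * Returns the whitespace-separated tokens of text that occur in
     * vocabulary, in their original order, joined by single spaces.
     */
    static String keepKnownWords(String text, Set<String> vocabulary) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Keep a token only if the semantic space has a vector for it.
            if (vocabulary.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }
}
```

The filtered string could then be wrapped in a BufferedReader over a StringReader and handed to buildVector in place of the raw text, so that only in-vocabulary words are ever looked up.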