Hi guys,
I want to run LSA with 15000 dimensions on the ukWaC corpus, but I couldn't get it to run. Even with the -Xmx110g flag, my job still ran out of memory. Java programs throw an OutOfMemoryError (java.lang.OutOfMemoryError) at the JVM level when they run out of heap, rather than hitting an OOM error at the OS level. From this I found that it was not my jar that was running out of memory but the SVD program. It required about 82 GB of memory.
I expected the run to take about 72 hours. When I tested the application, it took much longer (more than 5 days, i.e. 120 hours), and even then it failed with an internal error (NullPointerException).
In the end, I came to the conclusion that the S-Space application might not have been designed to process such a huge input file (~12 GB). The matrix it generates is huge (rows: 4755577, columns: 2727402). It then transposes this matrix and passes it to the SVD stage.
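To give a sense of scale, here is a rough back-of-the-envelope sketch in Python. It assumes 8-byte double-precision values and fully dense storage, which is only my assumption and not necessarily how S-Space or the SVD program stores anything. The function names are made up for illustration.

```python
# Rough size estimate for the matrix and for dense SVD factors.
# Assumption: every value is stored as an 8-byte double, with no sparsity.
BYTES_PER_DOUBLE = 8

ROWS = 4755577   # rows of the generated matrix
COLS = 2727402   # columns of the generated matrix
DIMS = 15000     # requested LSA dimensions


def dense_bytes(rows, cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Bytes needed to hold a rows x cols matrix in dense form."""
    return rows * cols * bytes_per_value


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


def main():
    full = dense_bytes(ROWS, COLS)
    row_factor = dense_bytes(ROWS, DIMS)  # one dense factor, ROWS x DIMS
    col_factor = dense_bytes(COLS, DIMS)  # the other dense factor, COLS x DIMS
    print(f"full dense matrix:    {to_gb(full):,.0f} GB")
    print(f"factor (rows x dims): {to_gb(row_factor):,.0f} GB")
    print(f"factor (cols x dims): {to_gb(col_factor):,.0f} GB")


if __name__ == "__main__":
    main()
```

Under these assumptions the full matrix would be on the order of 100 TB in dense form, and each dense factor with 15000 columns would be hundreds of GB. These are upper bounds for dense storage only; they say nothing about what the SVD program actually allocates.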
Can you suggest an alternative way to handle such a huge input file?
With Best Wishes,
Siamak