StopAnalyzer usage in mahout.SparseVectorsFromBehemoth


madhum...@gmail.com

May 16, 2013, 5:59:31 AM
to digita...@googlegroups.com
How do we use the Lucene StopFilter analyzer to remove stop words from the behemoth sequence file for further processing?

Also, is it possible to specify multiple analyzers as parameters?

DigitalPebble

May 16, 2013, 6:07:53 AM
to digita...@googlegroups.com

> How do we use the Lucene StopFilter analyzer to remove stop words from the behemoth sequence file for further processing?

Use the -a parameter, as shown in https://github.com/DigitalPebble/behemoth/wiki/Mahout-Processing-Example, and specify the class name of the analyzer.


> Also, is it possible to specify multiple analyzers as parameters?

No. The class behaves in the same way as Mahout's original SparseVectorsFromSequenceFiles. What you can do is create a custom Analyzer that combines the functionality you want and add it to the jar.

Julien

--
 
Open Source Solutions for Text Engineering
 
http://digitalpebble.blogspot.com
http://www.digitalpebble.com

madhum...@gmail.com

May 16, 2013, 6:10:50 AM
to digita...@googlegroups.com, jul...@digitalpebble.com
Thank you, Julien. When I use the StopAnalyzer class with the -a option, I get this error:

Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.apache.lucene.analysis.StopAnalyzer.<init>()
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:68)
    at com.digitalpebble.behemoth.mahout.SparseVectorsFromBehemoth.run(SparseVectorsFromBehemoth.java:331)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.digitalpebble.behemoth.mahout.SparseVectorsFromBehemoth.main(SparseVectorsFromBehemoth.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NoSuchMethodException: org.apache.lucene.analysis.StopAnalyzer.<init>()
    at java.lang.Class.getConstructor0(Class.java:2730)
    at java.lang.Class.getConstructor(Class.java:1676)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:62)
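
For readers hitting the same trace: the NoSuchMethodException on `<init>()` raised from Class.getConstructor indicates that the class named with -a has no public no-argument constructor, so it cannot be instantiated reflectively. The sketch below reproduces that failure mode with plain Java and shows one possible workaround, a wrapper class that does expose a no-argument constructor. Note that VersionedAnalyzer, NoArgStopAnalyzer, instantiate and the "assumed-version" value are hypothetical stand-ins for illustration; they are not Lucene, Mahout or Behemoth types.

```java
public class NoArgConstructorDemo {

    // Hypothetical stand-in for an analyzer whose only public constructor
    // takes an argument (this is not the real Lucene StopAnalyzer).
    public static class VersionedAnalyzer {
        public final String version;

        public VersionedAnalyzer(String version) {
            this.version = version;
        }
    }

    // Hypothetical wrapper: a public no-argument constructor that supplies
    // the argument itself, so reflective instantiation can succeed.
    public static class NoArgStopAnalyzer extends VersionedAnalyzer {
        public NoArgStopAnalyzer() {
            super("assumed-version"); // placeholder value, an assumption
        }
    }

    // Mimics the lookup visible in the trace: find a public no-argument
    // constructor and wrap any reflective failure in IllegalStateException.
    public static <T> T instantiate(Class<T> clazz) {
        try {
            return clazz.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // No public no-argument constructor, so this lookup fails.
            instantiate(VersionedAnalyzer.class);
        } catch (IllegalStateException e) {
            System.out.println("Failed as in the trace: " + e.getCause());
        }
        // The wrapper declares a no-argument constructor, so this succeeds.
        NoArgStopAnalyzer analyzer = instantiate(NoArgStopAnalyzer.class);
        System.out.println("Instantiated wrapper with version " + analyzer.version);
    }
}
```

In a real setup the wrapper would be an Analyzer added to the job jar and passed to -a by class name. If the analyzer class in the Lucene version in use is final, the wrapper would have to delegate to it rather than extend it; that detail is an assumption to check against your Lucene release.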

DigitalPebble

May 16, 2013, 6:18:37 AM
to digita...@googlegroups.com
Strange. I won't have time to look into it any time soon. Maybe you could check how Mahout's SparseVectorsFromSequenceFiles handles this?
Thanks

Julien


madhum...@gmail.com

May 16, 2013, 6:22:15 AM
to digita...@googlegroups.com, jul...@digitalpebble.com
Okay, I will look into this and post here if I find anything further.

Thanks,
Madhumita

a...@bnotions.com

Dec 5, 2013, 6:37:33 PM
to digita...@googlegroups.com, jul...@digitalpebble.com
I was wondering whether you had solved this problem. I want to create my own custom analyzer and I'm having a hard time with it.