change documents' label


jiuren

Nov 2, 2010, 7:38:12 AM
to DigitalPebble
Hi all,

I'm doing multilabel classification and have split the task into
several smaller tasks. To build a new corpus, I need to modify the
labels of some documents in an existing corpus. As you know, building
a corpus from raw files is time-consuming, especially for CJK
languages, which need to be segmented first.

But there is no direct API for this. I've read part of the source
code and found a setLabel() method which is not public. Can I just
make the method public and call it from user code? Will this cause
any side effects?

Another question:
If I use multiFieldDocument, can I use some fields only to store
metadata, without affecting the classification? For example, in
Lucene I can mark a field as unindexed.

DigitalPebble

Nov 2, 2010, 12:37:07 PM
to digita...@googlegroups.com
> But there is no direct API for this. I've read part of the source
> code and found a setLabel() method which is not public. Can I just
> make the method public and call it from user code? Will this cause
> any side effects?

I don't expect so. Why don't you try?
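
To illustrate why a public setter should be harmless, here is a minimal sketch. The LabeledDoc class and the relabel helper are hypothetical stand-ins, not the library's actual classes. It assumes the label is stored separately from the tokenized content, so changing it leaves the already-segmented tokens untouched and a new corpus can be derived without re-tokenizing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tokenized document; not the library's real class.
final class LabeledDoc {
    private final List<String> tokens; // already-segmented tokens, expensive to produce
    private String label;

    LabeledDoc(List<String> tokens, String label) {
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
        this.label = label;
    }

    List<String> getTokens() { return tokens; }

    String getLabel() { return label; }

    // The setter made public: it touches only the label field.
    public void setLabel(String label) { this.label = label; }
}

final class Relabeler {
    // Builds a new corpus that reuses the tokens and maps old labels to new ones.
    // Labels absent from the mapping are kept unchanged.
    static List<LabeledDoc> relabel(List<LabeledDoc> corpus, Map<String, String> mapping) {
        List<LabeledDoc> out = new ArrayList<>();
        for (LabeledDoc d : corpus) {
            LabeledDoc copy = new LabeledDoc(d.getTokens(), d.getLabel());
            copy.setLabel(mapping.getOrDefault(d.getLabel(), d.getLabel()));
            out.add(copy);
        }
        return out;
    }
}
```

One thing worth checking in the real code is whether labels are also registered somewhere else (for example in a label index), since a setter alone would not update that.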
 
> Another question:
> If I use multiFieldDocument, can I use some fields only to store
> metadata, without affecting the classification? For example, in
> Lucene I can mark a field as unindexed.

I suppose we could define a special type of field which would be ignored when generating the vector. Feel free to file an issue on GitHub and send a patch.
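
A rough sketch of what such a field type might look like. The DocField class, its indexed flag, and the Vectorizer helper are all assumptions made for illustration, not existing library code. The idea is simply that fields marked as not indexed are skipped when term frequencies are counted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field type; the "indexed" flag is an assumption for illustration.
final class DocField {
    final String name;
    final List<String> tokens;
    final boolean indexed; // false = metadata only, skipped during vectorization

    DocField(String name, List<String> tokens, boolean indexed) {
        this.name = name;
        this.tokens = tokens;
        this.indexed = indexed;
    }
}

final class Vectorizer {
    // Counts term frequencies over indexed fields only.
    static Map<String, Integer> termFrequencies(List<DocField> fields) {
        Map<String, Integer> tf = new HashMap<>();
        for (DocField f : fields) {
            if (!f.indexed) {
                continue; // metadata field: stored with the document but not used as features
            }
            for (String t : f.tokens) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }
}
```

With this shape, a metadata field such as a source name stays attached to the document but contributes nothing to the feature vector.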

J.

--
 
Open Source Solutions for Text Engineering
 
http://digitalpebble.blogspot.com
http://www.digitalpebble.com
