I am new to Behemoth, Solr, and Hadoop. I am using the Behemoth Solr job to index a Behemoth document that was previously processed with the Behemoth GATE job. I want to index all the annotations and features generated by my GATE application as fields in the Solr schema. However, when I run the job, only the text is indexed and the annotations are skipped. How can I index all the annotations?
--
You received this message because you are subscribed to the Google Groups "DigitalPebble" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digitalpebbl...@googlegroups.com.
To post to this group, send an email to digita...@googlegroups.com.
Visit this group at http://groups.google.com/group/digitalpebble?hl=en-GB.
For more options, visit https://groups.google.com/groups/opt_out.
Hello Julien,

Does that mean that we should add lines like the following to behemoth-site.xml?

<property>
  <name>solr.f.cat</name>
  <value>Token.string</value>
</property>

(I know it is not recommended to index the tokens, but I'm just testing and my test document is really short.)

It doesn't work on my side: my documents do get into Solr, but I can't seem to index the annotations.

Also, I don't really understand where the document's id and version come from. Is it even possible to see the content of the document that Behemoth passes to Solr?
Hello Julien,

Thank you for your answer. I tried to simplify my problem, but I realize I chose a bad example: I don't process phone numbers, and I do process unstructured documents.

My GATE application might return several annotations for the same group of words (because I'm using an ontology). For example, I will have an Animal annotation that marks the words "cat", "catfish" and "eider" as Animal(s). Depending on the ontology used, the "cat" annotation will have 2 features (Animal.class=mammal and Animal.class=cat), "catfish" will have 1 feature (Animal.class=fish), and the more specific term "eider" will have 2 features (Animal.class=bird and Animal.class=duck).

I don't want one Solr "document" for each animal; I really want one index entry for each actual document. I'd like to be able to query my Solr index for "bird" and get all the documents containing the term "bird" or any subclass or instance of it (like "duck" or "eider"). Since all the words "bird", "duck" and "eider" appearing in my documents will be tagged as Animal, and there will be an annotation with Animal.class=bird, it is easy to get Solr to return the right documents.

But I get something like this:

<result>
  <doc>
    <str name="id">hdfs://...</str>
    <arr name="animal">
      <str>cat</str>
      <str>cat</str>
      <str>catfish</str>
      <str>eider</str>
      <str>eider</str>
    </arr>
    <arr name="class">
      <str>mammal</str>
      <str>cat</str>
      <str>fish</str>
      <str>bird</str>
      <str>duck</str>
    </arr>
    <arr name="instance">
      <str>http://.../Animal#catfish</str>
      <str>http://.../Animal#eider</str>
      <str>http://.../Animal#eider</str>
    </arr>
  </doc>
  <doc>...</doc>
  <doc>...</doc>
</result>

When I generate a snippet of a document, I want to highlight the terms that caused Solr to return it. For example, if the user searches for "bird" and the first document is returned because it contains "eider", I'd like to highlight "eider" in the snippet, but I don't know how to do that.

Having a correspondence between my Solr "animal" and "class" fields would make it easier to highlight the term "eider": for example, an id attribute that links them, such as <str id="5">eider</str> with the same id on the class "bird".

What do you think?

Thanks!
Jim
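
In the sample response above, the "animal" and "class" arrays happen to have the same length and the same order, so one possible client-side workaround is to pair them by position instead of by an explicit id. The following is only a rough sketch in Python, not something Behemoth or Solr provides: it assumes both multi-valued fields are always returned aligned as in the sample, and the helper names `terms_for_class` and `highlight` are made up for illustration.

```python
import re

# Field values copied from the sample Solr response above.
doc = {
    "id": "hdfs://...",
    "animal": ["cat", "cat", "catfish", "eider", "eider"],
    "class": ["mammal", "cat", "fish", "bird", "duck"],
}


def terms_for_class(doc, wanted_class):
    """Return the distinct terms whose class value at the same position matches.

    Assumption: "animal" and "class" are aligned position by position.
    """
    animals = doc.get("animal", [])
    classes = doc.get("class", [])
    if len(animals) != len(classes):
        raise ValueError("'animal' and 'class' fields are not aligned")
    terms = []
    for term, cls in zip(animals, classes):
        if cls == wanted_class and term not in terms:
            terms.append(term)
    return terms


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap whole-word occurrences of the given terms in pre/post markers."""
    if not terms:
        return text
    # Longest terms first, so a longer term wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, lambda m: pre + m.group(1) + post, text)


# Hypothetical snippet text; the user searched for "bird".
snippet = "An eider flew past the cat."
print(highlight(snippet, terms_for_class(doc, "bird")))
```

If the two fields can ever get out of step (for instance, if one annotation lacks a class feature), this positional pairing breaks, which is why an explicit link between the values would be more robust.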