Sensei term vector options not being respected? (Can only get back term-frequencies)


Jack Dunham

Nov 17, 2013, 10:05:42 PM
to sensei...@googlegroups.com
Hi there -- I'm using sensei 1.6.0 and having some difficulty with term vectors. I want to get back term positions for tokens in a text column, so I've set the option in schema.xml like so:

<column name="taggedText" type="text" index="ANALYZED" store="YES" termvector="WITH_POSITIONS"/>

However, the term vectors I'm getting back from my query are still term-frequency pairs instead of the desired term positions. Is there some other way of specifying the org.apache.lucene.document.Field.TermVector.WITH_POSITIONS option to sensei?


John Wang

Nov 19, 2013, 10:27:22 AM
to sensei...@googlegroups.com
Hi Jack:

    Can you create a ticket at: https://senseidb.atlassian.net/secure/Dashboard.jspa

Thanks

John



Yonghui Zhao

Nov 19, 2013, 11:24:45 AM
to sensei...@googlegroups.com

Hi Jack,

How do you get the term vector result?

Can you give me the exact query you used?

Jack Dunham

Nov 20, 2013, 6:20:37 PM
to sensei...@googlegroups.com
Hi there -- Sorry for the delay in responding (and thanks, btw!). Here's the JSON query I'm using:

{
    "query": {
        "query_string": {
            "query": "taggedText:hello"
        }
    },
    "from": 0,
    "size": 10,
    "explain": false,
    "fetchStored": true,
    "termVectors": ["taggedText"]
}

Jack Dunham

Nov 20, 2013, 6:33:13 PM
to sensei...@googlegroups.com
FYI -- Created issue SENSEI-306

Yonghui Zhao

Nov 29, 2013, 5:36:07 AM
to sensei...@googlegroups.com
Hi Jack,

Thank you for reporting this bug.

Sensei did not return offset and position info. I have now made a fix in sensei 2.0.1-SNAPSHOT and bobo 4.0.1-SNAPSHOT.

Before the fix, sensei only returned the term and frequency, in Map<String, List<TermFrequency>>.

Now sensei will return the term vector in Map<String, List<FieldTerm>>, where FieldTerm contains:

  private String term;
  private Integer freq;
  private List<Integer> positions;
  private List<Integer> startOffsets;
  private List<Integer> endOffsets;



If the index option is termvector="YES", then positions, startOffsets and endOffsets are null.
If the index option is termvector="WITH_POSITIONS", then positions have real values while startOffsets and endOffsets are all -1.
If the index option is termvector="WITH_OFFSETS", then positions are all -1 while startOffsets and endOffsets have real values.
If the index option is termvector="WITH_POSITIONS_OFFSETS", then positions, startOffsets and endOffsets all have real values.
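
On the client side, the four cases above could be told apart roughly as in the sketch below. This is only an illustration: the FieldTerm class here is a hypothetical stand-in that mirrors the five fields listed earlier (the real class may have different constructors and accessors), and hasRealValues and describe are made-up helper names, not part of sensei.

```java
import java.util.Arrays;
import java.util.List;

class TermVectorSketch {

    // Hypothetical stand-in mirroring the FieldTerm fields listed above;
    // the real sensei class may expose them differently.
    static final class FieldTerm {
        final String term;
        final Integer freq;
        final List<Integer> positions;
        final List<Integer> startOffsets;
        final List<Integer> endOffsets;

        FieldTerm(String term, Integer freq, List<Integer> positions,
                  List<Integer> startOffsets, List<Integer> endOffsets) {
            this.term = term;
            this.freq = freq;
            this.positions = positions;
            this.startOffsets = startOffsets;
            this.endOffsets = endOffsets;
        }
    }

    // True only when the list is non-null, non-empty, and holds no -1
    // placeholders (or any other negative or null entries).
    static boolean hasRealValues(List<Integer> values) {
        if (values == null || values.isEmpty()) {
            return false;
        }
        for (Integer v : values) {
            if (v == null || v < 0) {
                return false;
            }
        }
        return true;
    }

    // Infers which termvector option the returned data is consistent with,
    // following the null / -1 conventions described in the post.
    static String describe(FieldTerm ft) {
        boolean pos = hasRealValues(ft.positions);
        boolean off = hasRealValues(ft.startOffsets) && hasRealValues(ft.endOffsets);
        if (pos && off) return "WITH_POSITIONS_OFFSETS";
        if (pos) return "WITH_POSITIONS";
        if (off) return "WITH_OFFSETS";
        return "YES";
    }

    public static void main(String[] args) {
        // Sample data shaped like the WITH_POSITIONS case: real positions, -1 offsets.
        FieldTerm ft = new FieldTerm("hello", 2,
                Arrays.asList(0, 5), Arrays.asList(-1, -1), Arrays.asList(-1, -1));
        System.out.println(ft.term + " -> " + describe(ft));
    }
}
```

Checking for both null and -1 before using positions or offsets keeps the caller independent of which termvector option the column was indexed with.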


Which version are you using now? I can prepare a fixed version for you.