sort on raw docHit/@totalHits instead of score ?

steve.m...@gmail.com

Dec 3, 2009, 12:40:38 PM
to xtf-...@googlegroups.com

Is there a way to sort search results by the raw hit count
instead of the computed score?

All of the other sort fields (including ones I've added)
are meta fields output by the preFilter stylesheet.

//docHit/@totalHits is output by the textEngine, but I can't
see how to access that value on the input.

-- Steve Majewski / UVA Alderman Library

Jamie Orchard-Hays

Dec 10, 2009, 2:19:51 PM
to xtf-...@googlegroups.com
Dan, didn't you do something like that?

dan haig

Dec 10, 2009, 3:12:24 PM
to xtf-...@googlegroups.com
I thought so upon first reading when Steve posted this, Jamie, but upon
more careful reading as I wrote a reply, it turned out he is asking for
something rather different. I could, however, have confirmed that we
were not able to do what he is asking.

Looking at it again, I am having evil flashbacks to what it took for
me to get the search result linking to work across our multi-file
collections and am now starting to twitch and foam slightly at the
mouth.

But basically, if I recall correctly: we were not able to affect the
order of the search results as they get spewed into
default/resultformatter.xsl. You get them sorted by the XTF Ranking
system values, highest to lowest. We wanted them to come out in order
of appearance in the text, but in consultation with you, Jamie, and
Martin, determined this was a Java hack we weren't prepared to
undertake.

.d

steve.m...@gmail.com

Dec 11, 2009, 10:02:56 AM
to xtf-...@googlegroups.com

On Dec 10, 2009, at 3:12 PM, dan haig wrote:

> I thought so upon first reading when Steve posted this, Jamie, but upon
> more careful reading as I wrote a reply, it turned out he is asking for
> something rather different. I could, however, have confirmed that we
> were not able to do what he is asking.
>

After looking deeper at the code, my impression was that it couldn't be
done without diving in and changing the whole scoring method in Lucene.
But I really wanted both options, so that didn't seem worthwhile.

-- Steve.

Martin Haye

Dec 14, 2009, 9:07:42 PM
to xtf-...@googlegroups.com

This sounded like a fun little project so I looked into it today...

Steve is right, this is pretty tricky from the Java side. The main issue is
that the number of hits isn't available at the time sorting decisions are
made, because the text spans haven't been de-duplicated.

I just checked in a change that supports "totalDocs" and "-totalDocs" as
sort fields. As a consequence I added de-duplication during sorting, but
only in the case of sorting by totalDocs. Note that a side effect of
de-duplicating every hit (as opposed to only the top scoring ones) is that
queries will be a bit slower, though for small to medium collections I doubt
the slowdown will be noticeable.
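
As a rough illustration of why de-duplication has to happen before sorting by hit count (this is not the code that was checked in), here is a minimal Java sketch. The `Span` class, the `HitCountSort` class and both method names are hypothetical, and it assumes two spans are duplicates when they share the same document and the same start/end offsets; XTF's actual de-duplication rule may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HitCountSort {
    /** Hypothetical stand-in for one matched text span inside a document. */
    static final class Span {
        final String docKey;
        final int start;
        final int end;

        Span(String docKey, int start, int end) {
            this.docKey = docKey;
            this.start = start;
            this.end = end;
        }
    }

    /** Count spans per document, counting identical (doc, start, end) spans only once. */
    static Map<String, Integer> dedupedHitCounts(List<Span> spans) {
        Set<String> seen = new HashSet<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Span s : spans) {
            String key = s.docKey + ":" + s.start + ":" + s.end;
            if (seen.add(key)) {              // add() returns false for a duplicate span
                counts.merge(s.docKey, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Order document keys by de-duplicated hit count; ties keep first-seen order. */
    static List<String> sortByHitCount(List<Span> spans, boolean descending) {
        Map<String, Integer> counts = dedupedHitCounts(spans);
        List<String> docs = new ArrayList<>(counts.keySet());
        Comparator<String> byCount = Comparator.comparingInt(counts::get);
        docs.sort(descending ? byCount.reversed() : byCount);
        return docs;
    }

    public static void main(String[] args) {
        // Document "a" has three raw spans that are all the same span;
        // document "b" has two distinct spans.
        List<Span> spans = Arrays.asList(
            new Span("a", 0, 2), new Span("a", 0, 2), new Span("a", 0, 2),
            new Span("b", 1, 3), new Span("b", 5, 7));
        System.out.println(sortByHitCount(spans, true)); // prints [b, a]
    }
}
```

Sorting on the raw span counts would put "a" first (3 versus 2), which is why every document's spans need de-duplicating before the comparison, and why the extra work touches every hit instead of only the top-scoring ones.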

I'm considering this to be "experimental", so I'd like feedback from Steve
Majewski and John Bewley. Do you two feel comfortable grabbing XTF from CVS
and building it from source? That would be preferable but if not I can build
an xtf.jar for you to try.

--Martin

steve.m...@gmail.com

Dec 15, 2009, 12:52:47 PM
to xtf-...@googlegroups.com, Martin Haye

On Dec 14, 2009, at 9:07 PM, Martin Haye wrote:

>
> I'm considering this to be "experimental", so I'd like feedback from Steve
> Majewski and John Bewley. Do you two feel comfortable grabbing XTF from CVS
> and building it from source? That would be preferable but if not I can build
> an xtf.jar for you to try.
>

I did a 'cvs update' and 'ant dist' (in the WEB-INF directory) and copied
dist/xtf-20091215.war to my Tomcat webapps directory. [ I didn't rename it
as I wanted to keep the previous xtf webapp and compare. I had downloaded
sources from CVS earlier to get that patch for escaping special chars in
the facets. ]

Initially, after Tomcat unpacked it, I tried linking my data directory to
webapps/xtf-20091215/data and ran the textIndexer, but it exited with the
following message:

./bin/textIndexer -index default

TextIndexer v2.2


Purging Incomplete Documents From Indexes:
Index: [/usr/local/tomcat6/webapps/xtf-20091215/index-new/]
No Incomplete Documents Found.
Done.

Indexing New/Updated Documents:
Index: "default"
Cloning Data Directories.
*** Error: class java.io.IOException
java.io.IOException: External command 'perl' exited with status 18.
Output from stderr:
Died at - line 1.

at org.cdlib.xtf.util.DirSync.flushLinks(DirSync.java:230)
at org.cdlib.xtf.util.DirSync.syncDirs(DirSync.java:80)
at org.cdlib.xtf.textIndexer.TextIndexer.doIndexing(TextIndexer.java:499)
at org.cdlib.xtf.textIndexer.TextIndexer.main(TextIndexer.java:336)

Indexing Process Aborted.



I removed the link to my data files and instead unpacked the sample data
into the webapp directory. This time, textIndexer gets quite a bit further
before also exiting with an error:



[ ... ]

Optimizing Index:
Index: [/usr/local/tomcat6/webapps/xtf-20091215/index-new/] ...
Done.
Done.

Updating Spellcheck Dictionary:
Index: [/usr/local/tomcat6/webapps/xtf-20091215/index-new/] ...
[ 0%] Reading word files.
[ 28%] Processed 54281 words.
[ 32%] Building word map.
[ 98%] Read 88423 pairs.
[ 99%] Writing pair data.
[100%] Done.
Done.
Done.

*** Error: class java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: javax/servlet/ServletException
at org.cdlib.xtf.textIndexer.TextIndexer.doValidation(TextIndexer.java:688)
at org.cdlib.xtf.textIndexer.TextIndexer.main(TextIndexer.java:396)

Indexing Process Aborted.
