Using the Java API from a Python driver, or a how-to for those without Java skills


gg4u

Aug 19, 2015, 5:30:29 PM
to Neo4j
Hi, I am struggling with the problem that a query using:

START

against a full-text index returns results that are not meaningfully sorted.

E.g., using Wikipedia as mock data, the query
'united states'

returns, as its first result:
'List of United States National Historic Landmarks in United States commonwealths and territories, associated states, and foreign states'


I cannot paginate the results, because the first results would be meaningless.
I would have to fetch all results first and then order them (in Python), but that takes too long and is not the way to go.

I also posted a question on SO:
http://stackoverflow.com/questions/31862761/search-queries-in-neo4j-how-to-sort-results-in-neo4j-in-start-query-with-intern

and then found a comment by Michael answering a similar question:
http://stackoverflow.com/questions/26497068/lucene-in-neo4j-has-some-misbehaviours-in-terms-of-reliable-search-querys-comp


#1 can be handled in Neo4j's Java API by using index.query(new QueryContext(query).sort(Sort.RELEVANCE));

So far I have been learning and using Cypher and Python, and I have never used Java.
Could you please suggest a tutorial or how-to for Python or, if Java is the only way, at least point me to the files I should modify, so that START query results come back in a meaningful relevance order that I can paginate?

I haven't found this feature in neo4j-rest-client or, if it is there, it is not clear to me whether Sort.RELEVANCE is covered:
http://neo4j-rest-client.readthedocs.org/en/latest/indices.html

It is an important aspect of (my) application because it is what allows me to start the traversal and any other operation on the graph.
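The client-side workaround described above (fetch every hit, order in Python, then paginate) can be sketched roughly as follows. This is only an illustration of why it is costly: every hit has to be loaded and sorted before the first page can be shown. The scoring function `naive_relevance` and the helper `ranked_page` are hypothetical names, not part of any Neo4j driver, and the score is a simple stand-in rather than Lucene's.

```python
def naive_relevance(title, query):
    """Hypothetical score: matched query terms divided by title length."""
    title_terms = title.lower().split()
    query_terms = query.lower().split()
    if not title_terms or not query_terms:
        return 0.0
    matched = sum(1 for term in query_terms if term in title_terms)
    # Dividing by the title length favours short titles over long lists.
    return matched / len(title_terms)


def ranked_page(titles, query, page, page_size=10):
    """Sort ALL hits client-side, then slice out one 0-based page."""
    ranked = sorted(
        titles, key=lambda t: naive_relevance(t, query), reverse=True
    )
    start = page * page_size
    return ranked[start:start + page_size]


# Example data taken from the post; in practice this list would hold
# every hit returned by the index, which is what makes this slow.
hits = [
    "List of United States National Historic Landmarks in United States "
    "commonwealths and territories, associated states, and foreign states",
    "United States",
    "United States Navy",
]
first_page = ranked_page(hits, "united states", page=0, page_size=2)
```

With this stand-in score the short title "United States" comes first, but the sort still touches the full result set on every request.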


gg4u

Aug 20, 2015, 8:56:11 AM
to Neo4j
I am recreating the Lucene indexes with neo4j-rest-client:



Will Lucene apply sorting by default?
If not, or if the default is not satisfactory, how can I change the sorting?

E.g. the syntax in the Python driver:
i1.add("key", "value", n2)

looks like:
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );

and we are all happy.

But I cannot find any information about sorting:
http://neo4j.com/docs/stable/indexing-lucene-extras.html#indexing-lucene-sort

My goal is to limit the query to a few results that are ALREADY sorted
(rather than sorting all the results that matched a full-text query).
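The goal stated above, fetching only one page of hits, could look something like the sketch below on the Cypher side. It builds a parameterised legacy `START` query with `SKIP` and `LIMIT` using the `{param}` syntax of Neo4j 2.x. The index name `titles`, the key `name` and the helper `paged_index_query` are assumptions made for illustration. Note that paging like this is only useful if the index already returns hits in a meaningful order, which is exactly the open question in this thread.

```python
def paged_index_query(index_name, lucene_query, page, page_size=10):
    """Build a Cypher statement and parameters for one 0-based page.

    The index name cannot be passed as a Cypher parameter, so it is
    validated and interpolated into the statement text.
    """
    if not index_name.isidentifier():
        raise ValueError("unexpected index name: %r" % index_name)
    statement = (
        "START n=node:%s({q}) RETURN n SKIP {skip} LIMIT {limit}"
        % index_name
    )
    params = {
        "q": lucene_query,           # Lucene query string, e.g. name:"..."
        "skip": page * page_size,    # number of hits to skip
        "limit": page_size,          # number of hits to return
    }
    return statement, params


# Hypothetical usage: third page (page=2) of ten hits each.
statement, params = paged_index_query(
    "titles", 'name:"united states"', page=2, page_size=10
)
```

The resulting statement and parameter map can then be sent through whichever driver is in use.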

gg4u

Aug 20, 2015, 10:58:41 AM
to Neo4j
Unfortunately, the results returned by the Lucene full-text index are not yet meaningful: it is not clear which score or sorting rationale is used.

Is it possible to try out the sort() function of the QueryContext class in neo4j-rest-client (or any other Python driver)?

something like:
i1.query("name","united states").sort()

and then paginate results?

Is it possible to expose the score that Lucene assigns to each hit?

gg4u

Aug 20, 2015, 11:34:23 AM
to Neo4j
I found out how to check the score in:


and:


and I've found that ALL THE SCORES ARE THE SAME IN THE LUCENE INDEX.

I am using version 2.2.1.

Any help on this?

At this point I cannot figure out whether it is a bug.

Michael Hunger

Aug 23, 2015, 4:40:35 AM
to ne...@googlegroups.com
Hi,

As explained before, this is due to a performance/size optimization that is currently in use: norms are omitted for Neo4j's Lucene indexes.

We're looking into that; it might be possible to make this configurable for fulltext Lucene indexes.

Michael
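To illustrate what omitting norms can mean for ranking, here is a deliberately simplified model loosely based on Lucene's classic TF-IDF similarity, where the term-frequency factor is sqrt(freq) and the length norm is 1/sqrt(number of terms). It leaves out idf, coord, query normalisation and boosts, and it is not Neo4j's or Lucene's actual code; `simplified_score` is a hypothetical helper. Under these assumptions, a long title that repeats the query terms outranks the short exact title once the length norm is dropped.

```python
import math


def simplified_score(title, query, use_norms):
    """Toy score: sum of sqrt(term frequency), optionally length-normalised."""
    doc_terms = title.lower().replace(",", " ").split()
    score = 0.0
    for term in query.lower().split():
        freq = doc_terms.count(term)
        score += math.sqrt(freq)  # tf factor of the classic similarity
    if use_norms and doc_terms:
        # Length norm: longer fields are penalised.
        return score / math.sqrt(len(doc_terms))
    # Norms omitted: field length has no influence on the score.
    return score


short_title = "United States"
long_title = (
    "List of United States National Historic Landmarks in United States "
    "commonwealths and territories, associated states, and foreign states"
)
query = "united states"

short_no_norms = simplified_score(short_title, query, use_norms=False)
long_no_norms = simplified_score(long_title, query, use_norms=False)
short_norms = simplified_score(short_title, query, use_norms=True)
long_norms = simplified_score(long_title, query, use_norms=True)
```

In this toy model the long title wins without norms and the short title wins with them, which is consistent with the ordering reported at the top of the thread, although the real index may behave differently.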


gg4u

Aug 23, 2015, 8:23:51 AM
to Neo4j
Hi Michael,

> it might be possible to make this configurable for fulltext lucene indexes.

Does that mean it is possible or not?

From the doc:
I understood it is possible, by setting:

order=ordering

where ordering is one of index, relevance or score.


The problem is that all the scores are identical, so the ordering is meaningless: results will be sorted in id order anyway.

From your previous explanation I understood the issue was in the batch importer.
I tried to re-create the index;

I tried to use the constructors from py2neo as well as neo4j-rest-client; I think they use the constructor
QueryContext

In the chapter "Sorting": We sort the results by relevance (score) like this:
hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );

Is this the same as setting order=ordering in the REST API?
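For reference, the REST form of that `order` parameter could be built from Python roughly like this. The endpoint path `/db/data/index/node/<index>` is how I remember the legacy index REST API and should be checked against the documentation for the version in use. The base URL, the index name `movies` and the helper `index_query_url` are assumptions for illustration only. The sketch just builds the URL and does not contact a server.

```python
from urllib.parse import quote, urlencode

ALLOWED_ORDERS = ("index", "relevance", "score")


def index_query_url(base_url, index_name, lucene_query, order="score"):
    """Build a legacy index query URL with an explicit order parameter."""
    if order not in ALLOWED_ORDERS:
        raise ValueError("order must be one of %s" % (ALLOWED_ORDERS,))
    # urlencode percent-encodes the Lucene query (':' and '*' included).
    query_string = urlencode({"query": lucene_query, "order": order})
    return "%s/index/node/%s?%s" % (
        base_url.rstrip("/"),
        quote(index_name, safe=""),
        query_string,
    )


# Hypothetical server and index, mirroring the Java example above.
url = index_query_url(
    "http://localhost:7474/db/data/", "movies", "title:The*", order="score"
)
```

The URL can then be fetched with any HTTP client to compare the returned hits and scores with what the Java `sortByScore()` example produces.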


The author of py2neo also suggested going back to SO and the official Neo4j channels about full-text indexing.


Summarizing:
I am confused because, on the one hand, I understand that scoring on Lucene indexes is not available for performance reasons; on the other hand, it is described in the documentation.

I am using the official Neo4j Python drivers, and py2neo seems to support Lucene indexing with scores as documented in the Neo4j doc:
Although the author suggests going back to the Neo4j mailing list or SO for this aspect.

If Lucene full-text indexes do not work at the moment and are discouraged as legacy, which other solutions could be adopted to run a sorted full-text query, so as to provide paginated results ordered by relevance (and then select an item and run a traversal from it)?

Could you suggest any tool or how to proceed?

A comment: if relevance sorting in full-text indexing really is not possible due to performance issues and Lucene scores are omitted, I think it would be more user-friendly to mark the functionality as experimental in the documentation; otherwise one keeps trying and hitting walls without understanding why the results differ from those described in the documentation.

I would appreciate any constructive suggestion, including anything you may already have looked at. Thank you so much.

Michael Hunger

Aug 23, 2015, 10:29:32 PM
to ne...@googlegroups.com
I'm not very well versed in Lucene. There are a number of queries that return different scores, as the score is a combination of the stored/indexed information and how closely it matches the query.

Currently norms are not used; we have just started to look into it.

Michael

On 23.08.2015 at 14:23, gg4u <luigi...@gmail.com> wrote:

> Hi Michael,
>
> it might be possible to make this configurable for fulltext lucene indexes.
>
> does it means it is possible or not?
>
> from doc :
> I understood it is possible, by setting :
>
> order=ordering
>
> where ordering is one of index, relevance or score.

This only affects whether scores are returned, nothing else.