Geo-spatial queries and relevance

Maxime Beaudoin

Sep 4, 2012, 10:24:05 AM
to rav...@googlegroups.com
I have a query that looks like: "(Name:(term1 term2) OR TermsQuery:(term1 term2)) Radius:500 Latitude:...". The last geo-spatial-specific part is produced by WithinRadiusOf.

That last part is also what causes the relevancy issue: when WithinRadiusOf is used, the scores appear to be flattened.

It's quite easy to reproduce:
  • Boost the Name field at index-time by 100
  • TermsQuery is a 'FullText' field
  • Apply the following where clause:  "(Name:(term1 term2) OR TermsQuery:(term1 term2))"
  • [Test] Relevancy: OK
  • Apply WithinRadiusOf
  • [Test] Relevancy: NOT OK: documents are returned in the order they were inserted in the index

Oren Eini (Ayende Rahien)

Sep 4, 2012, 10:25:41 AM
to rav...@googlegroups.com
Can you create a failing test?

Maxime Beaudoin

Sep 4, 2012, 10:30:19 AM
to rav...@googlegroups.com
I'll post it later tonight.

Maxime Beaudoin

Sep 4, 2012, 10:40:46 PM
to rav...@googlegroups.com
using System.Linq;
using NUnit.Framework;
using Raven.Abstractions.Indexing;
using Raven.Client.Embedded;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Client.Linq.Indexing;

namespace Tests
{
    [TestFixture]
    public class OrenEiniTests
    {
        [Test]
        public void WithingRadiusOf_Should_Not_Break_Relevance()
        {
            using (var store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
            using (var session = store.OpenSession())
            {
                IndexCreation.CreateIndexes(typeof(OrenEiniTests).Assembly, store);

                var place1 = new Place("Université du Québec à Montréal")
                {
                    Id = "places/1",
                    Description = "L'Université du Québec à Montréal (UQAM) est une université francophone, publique et urbaine de Montréal, dans la province du Québec au Canada.",
                    Latitude = 45.50955,
                    Longitude = -73.569131
                };

                var place2 = new Place("UQAM")
                {
                    Id = "places/2",
                    Description = "L'Université du Québec à Montréal (UQAM) est une université francophone, publique et urbaine de Montréal, dans la province du Québec au Canada.",
                    Latitude = 45.50955,
                    Longitude = -73.569131
                };

                session.Store(place1);
                session.Store(place2);

                session.SaveChanges();

                // places/2: perfect match + boost
                var terms = "UQAM";
                RavenQueryStatistics stats;
                var places = session.Advanced.LuceneQuery<PlacesByTermsAndLocation.PlaceQuery, PlacesByTermsAndLocation>()
                    .WaitForNonStaleResults()
                    .Statistics(out stats)
                    .WithinRadiusOf(500, 45.54545, -73.63908)
                    .Where("(Name:(" + terms + ") OR Terms:(" + terms + "))")
                    .Take(10)
                    .SelectFields<Place>().ToList();

                // Bug? This assertion fails while WithinRadiusOf is applied.
                //Assert.That(places[0].Id == "places/2");

                // places/1: perfect match + boost
                terms = "Université Québec Montréal";
                places = session.Advanced.LuceneQuery<PlacesByTermsAndLocation.PlaceQuery, PlacesByTermsAndLocation>()
                    .WaitForNonStaleResults()
                    .Statistics(out stats)
                    .WithinRadiusOf(500, 45.54545, -73.63908)
                    .Where("(Name:(" + terms + ") OR Terms:(" + terms + "))")
                    .Take(10)
                    .SelectFields<Place>().ToList();

                // Passes, but not because of relevance: places/1 always comes first
                // because it was inserted first (no score?).
                Assert.That(places[0].Id == "places/1");
            }
        }

        public class Place
        {
            public Place(string name)
            {
                Name = name;
            }

            public string Id { get; set; }
            public string Name { get; set; }
            public string Description { get; set; }
            public string Address { get; set; }
            public double Latitude { get; set; }
            public double Longitude { get; set; }
        }

        public class PlacesByTermsAndLocation : AbstractIndexCreationTask<Place, PlacesByTermsAndLocation.PlaceQuery>
        {
            public class PlaceQuery
            {
                public string Name { get; set; }
                public string Terms { get; set; }
            }

            public PlacesByTermsAndLocation()
            {
                Map = boards =>
                      from b in boards
                      select new
                      {
                          Name = b.Name.Boost(3),
                          Terms = new
                          {
                              b.Description,
                              b.Address
                          },
                          _ = SpatialIndex.Generate(b.Latitude, b.Longitude)
                      };

                Index(p => p.Name, FieldIndexing.Analyzed);
                Index(p => p.Terms, FieldIndexing.Analyzed);
            }
        }
    }
}

Maxime Beaudoin

Sep 5, 2012, 9:51:03 AM
to rav...@googlegroups.com
Uncomment the first assertion and comment out the WithinRadiusOf calls to get a passing test.

Maxime Beaudoin

Sep 6, 2012, 10:34:24 AM
to rav...@googlegroups.com
@Oren Eini: Is this format acceptable? I was told it probably isn't the usual procedure.

Thanks, Max

Oren Eini (Ayende Rahien)

Sep 7, 2012, 4:55:26 AM
to rav...@googlegroups.com
Reproduced the error. It will take some time to figure out exactly why this is happening.

Maxime Beaudoin

Sep 7, 2012, 10:07:52 AM
to rav...@googlegroups.com
Thanks. I already asked Itamar on Twitter about one more thing: I need support for what's called "relevancy enhancement" in spatial queries, see: http://www.ibm.com/developerworks/java/library/j-spatial/#searching.spatial. If this is not on the schedule, I'll have to work on it sooner or later.

Oren Eini (Ayende Rahien)

Sep 7, 2012, 12:28:24 PM
to rav...@googlegroups.com
This isn't on the schedule right now, no.
Would love to get a pull request for that.

Maxime Beaudoin

Sep 7, 2012, 1:23:56 PM
to rav...@googlegroups.com
I'll see if I can manage that much by myself; I'm kinda new to pull requests and open source. However, there seems to be a bit of confusion: Itamar says it's already possible. We're having a discussion on Twitter:

@synhershko: @maxbeaudoin what exactly? distance being a boost factor? yes, closer points will score higher, and you can combine this with more fields

As far as I can tell, WithinRadiusOf (assuming it doesn't break scores) would filter results, whereas SortByDistance would be an "absolute sort" in which scoring is ignored.

I'm trying to figure this out. I submitted that failing test to you, yet WithinRadiusOf would not do what I need anyway. I don't want to exclude results based on my location, I want to bias results based on my location.

Maxime Beaudoin

Sep 7, 2012, 1:26:08 PM
to rav...@googlegroups.com
I'm confused and out of doc for the new spatial query support :(

Oren Eini (Ayende Rahien)

Sep 8, 2012, 9:34:49 AM
to rav...@googlegroups.com
Out of docs - the new spatial query support is not even two weeks old.
We will document that along with everything else when we release 1.2

Oren Eini (Ayende Rahien)

Sep 8, 2012, 9:35:09 AM
to rav...@googlegroups.com
Also, on Twitter you indicated that you have a solution. Can you share it?

Maxime Beaudoin

Sep 8, 2012, 12:58:37 PM
to rav...@googlegroups.com
Out of docs - sorry about that. I traced my way to knowledge ;).

Okay.

1.2-ready xunit test:  https://gist.github.com/3676730

This test will fail.

To make it pass, you need to understand how scores are calculated and summed.
This table helped me a lot:

key   Doc  sum     name,desc  distance  explanation
se/1  0    11.649  0.495      11.153    ~22km away, farthest
se/2  1    0.752   0.752      0.000     ~0km away, right on
se/3  2    6.329   0.752      5.577     ~11km away, closest

In Lucene.Net Contrib.Spatial...

Calculating the distance in ShapeFieldCacheDistanceValueSource.cs line 58:

public override double DoubleVal(int doc)
{
    var vals = cache.GetShapes(doc);
    if (vals != null)
    {
        double v = enclosingInstance.calculator.Distance(enclosingInstance.from, vals[0]);
        for (int i = 1; i < vals.Count; i++)
        {
            v = Math.Min(v, enclosingInstance.calculator.Distance(enclosingInstance.from, vals[i]));
        }
        return v;
    }
    return Double.NaN; // ?? maybe max?
}

Calculating the score in FunctionQuery.cs in class AllScorer line 165:

public override float Score()
{
    float score = qWeight * vals.FloatVal(doc);

    // Current Lucene priority queues can't handle NaN and -Infinity, so
    // map to -Float.MAX_VALUE. This conditional handles both -infinity
    // and NaN since comparisons with NaN are always false.
    return score > float.NegativeInfinity ? score : -float.MaxValue;
}

The distance score, which is approximately "half the distance in km", is added to the name,desc score, so the farthest document (se/1) ends up with the highest sum.
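
To make the effect concrete, here is a minimal sketch that re-ranks the three rows of the table by their summed score. The class and method names are made up for illustration and are not part of RavenDB or Lucene.Net; only the numbers come from the table.

```csharp
using System;
using System.Linq;

public static class AdditiveScoreDemo
{
    // Values copied from the table: (key, name,desc score, distance score).
    // This type is hypothetical and exists only for this illustration.
    public static readonly (string Key, double Text, double Distance)[] Rows =
    {
        ("se/1", 0.495, 11.153), // ~22 km away
        ("se/2", 0.752, 0.000),  // ~0 km away
        ("se/3", 0.752, 5.577),  // ~11 km away
    };

    // Rank by the sum of both parts, highest first, as in the "sum" column.
    public static string[] RankBySum() =>
        Rows.OrderByDescending(r => r.Text + r.Distance)
            .Select(r => r.Key)
            .ToArray();
}
```

RankBySum() puts se/1 (the farthest document) first and se/2 (the document right at the query point) last, which is the opposite of what a distance boost should do.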

I've looked into Solr in order to understand how to correctly boost by distance.


Basically, score-by-inverse-of-geodist means score by inverse of distance.

This is done using Solr's recip function: http://wiki.apache.org/solr/FunctionQuery#recip 

That function gives complete control over how much the distance contributes to the score.
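
For reference, Solr defines recip(x, m, a, b) as a / (m * x + b). A minimal C# sketch of that formula follows; the class and method names are my own and do not exist in Lucene.Net or RavenDB.

```csharp
using System;

public static class DistanceBoost
{
    // Solr's recip(x, m, a, b) = a / (m * x + b).
    // Hypothetical helper for illustration; x is the distance,
    // and m, a, b are tuning constants chosen by the caller.
    public static double Recip(double x, double m, double a, double b)
    {
        return a / (m * x + b);
    }
}
```

With m = 1 and a = b = 1000, Recip(0, 1, 1000, 1000) is 1 and Recip(1000, 1, 1000, 1000) is 0.5, so the value shrinks as the distance grows. A larger b flattens the curve, a smaller b makes nearby documents stand out more.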

DISCLAIMER: The following is a quick and dirty fix. It should only assist in understanding the issue. IMHO, the fix should be applied somewhere else. Hopefully, you can tell me where!

Calculating the distance in ShapeFieldCacheDistanceValueSource.cs line 58: 

public override double DoubleVal(int doc)
{
    var vals = cache.GetShapes(doc);
    if (vals != null)
    {
        double v = enclosingInstance.calculator.Distance(enclosingInstance.from, vals[0]);
        for (int i = 1; i < vals.Count; i++)
        {
            v = Math.Min(v, enclosingInstance.calculator.Distance(enclosingInstance.from, vals[i]));
        }
        // Solr's 'recip' function where v = distance and v > 0.
        return v > 0 ? 1000/(1*v+1000) : 0;
    }
    // Bug: Double.NaN will break the score once multiplied by weight.
    // Bug?: Shape can be null when docs share identical shapes (points) > cached once, see: GetValues.
    return Double.NaN; // ?? maybe max?
}



Next step?

Max

Maxime Beaudoin

Sep 8, 2012, 1:07:38 PM
to rav...@googlegroups.com
Actually, the next step would be to have control over the parameters of the recip function from the post above. Also: boosting by distance without filtering.

Oren Eini (Ayende Rahien)

Sep 9, 2012, 2:03:43 AM
to rav...@googlegroups.com
Boosting by distance should be how we work by default. 
I think that just making the change you suggested will resolve the issue.

Maxime Beaudoin

Sep 9, 2012, 2:09:18 AM
to rav...@googlegroups.com
The test does pass, but the change was made in the Lucene.Net Spatial.Contrib project :S. I can't tell whether that class is used much or whether the change is going to break anything.

Oren Eini (Ayende Rahien)

Sep 9, 2012, 2:11:55 AM
to rav...@googlegroups.com
That isn't a problem from our end. I'll have a build with this fix soon.

Maxime Beaudoin

Sep 9, 2012, 2:16:51 AM
to rav...@googlegroups.com
Cool! I'll have to start working on the "boost by distance without filtering" then. Good night.

Oren Eini (Ayende Rahien)

Sep 9, 2012, 2:35:54 AM
to rav...@googlegroups.com
That should just work, nothing you need to do.

Maxime Beaudoin

Sep 9, 2012, 12:04:54 PM
to rav...@googlegroups.com
How does that work? I can't use WithinRadiusOf or RelatesToShape because they filter results, right?

Oren Eini (Ayende Rahien)

Sep 9, 2012, 12:31:50 PM
to rav...@googlegroups.com
a) I just pushed an update.
b) I am not sure that I am following. You want to have relevance to a point / shape, but not to filter by that shape?
How do you expect that to work?

Maxime Beaudoin

Sep 9, 2012, 12:39:58 PM
to rav...@googlegroups.com
A) Very nice! Do I just need to wait for the nightly build?
B) Provide the query with a point and boost results by distance, much like Google Maps does. It ranks nearby places higher, but it also lets me find places in Australia if I'm searching from Canada. I would expect a simple client API method such as BoostByDistance(double lat, double lng, double multiplier).
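
As a rough sketch of what "boost without filtering" could mean, assuming a recip-style falloff with constants of 1000 and a distance in km: every document keeps its text score, and proximity only adds a bonus on top. The helper below is hypothetical and is not RavenDB or Lucene.Net API.

```csharp
using System;

public static class BoostWithoutFilter
{
    // Hypothetical helper: combines a text score with a proximity bonus.
    // Nothing is excluded; far-away documents simply get a smaller bonus.
    public static double Combined(double textScore, double distanceKm, double multiplier)
    {
        // Recip-style falloff: 1 at distance 0, approaching 0 far away.
        // The constants 1000 are assumed values for illustration.
        double proximity = 1000.0 / (distanceKm + 1000.0);
        return textScore + multiplier * proximity;
    }
}
```

With a multiplier of 2, a weak text match at the query point scores Combined(1.0, 0, 2) = 3.0, while a strong text match 16,000 km away still scores a little above 5 and stays in the results, ranked by the combined value rather than dropped.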

Oren Eini (Ayende Rahien)

Sep 9, 2012, 1:34:53 PM
to rav...@googlegroups.com
a) It is already out.
b) You could do it starting in the next build:

RavenQueryStatistics stats;
var places = session.Advanced.LuceneQuery<Place, PlacesByTermsAndLocation>()
    .WaitForNonStaleResults()
    .Statistics(out stats)
    .RelatesToShape(Constants.DefaultSpatialFieldName, "Point(45.54545 -73.63908)", SpatialRelation.Nearby)
    .Where("(Name:(" + terms + ") OR Terms:(" + terms + "))")
    .Take(10)
    .ToList();

Maxime Beaudoin

Sep 9, 2012, 4:15:33 PM
to rav...@googlegroups.com
OK, I'll be working on a pull request. I already have one: it replaces the test you integrated for me with a much more complete and sophisticated version. I'll keep working on that file.