Dealing with Int64 and Sorting on Numerics

Adam Tybor

Aug 8, 2010, 5:41:36 PM
to ravendb
I am trying to deal with an index that needs numeric range
querying capabilities, and the value is a long. With the new
NumericField in Lucene 2.9, does it matter if ints and longs are mixed
in the same index? My understanding is that it does, and that when a
document is being indexed, a value that falls below int.MAX_VALUE
produces an int field and not a long field.

The second issue I am running into is sorting on the numeric range
index. If I sort on the field value and not the _Range field, it's a
lexicographical sort, which is not what I want. If I specify the _Range
field as the sort, the results do not come back sorted. I believe this
is because when the SortField is created for the query, it doesn't have
the context that the field type should be SortField.LONG instead of
automatic.
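
To make the difference between the two orderings concrete, here is a minimal sketch in plain C# with no Lucene or Raven involved. The class and method names are made up for illustration; it only shows how the same values order when compared as strings versus as longs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only; not part of Raven or Lucene.
public static class SortOrderDemo
{
    // Orders the values the way a plain string field would:
    // character by character, so "1001668533" comes before "100193850".
    public static List<long> LexicographicOrder(IEnumerable<long> values)
    {
        return values
            .OrderBy(v => v.ToString(CultureInfo.InvariantCulture), StringComparer.Ordinal)
            .ToList();
    }

    // Orders the values by magnitude, which is what a numeric sort should do.
    public static List<long> NumericOrder(IEnumerable<long> values)
    {
        return values.OrderBy(v => v).ToList();
    }
}
```

For the input 100193850, 1001668533, 25, LexicographicOrder returns 1001668533, 100193850, 25, while NumericOrder returns 25, 100193850, 1001668533.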

So should there be a way to include type information in the index
definition to handle these types of things?

I don't mind playing with different alternatives to get a fix in
place. I am just wondering what some approaches could be, so that if I
contribute there is a good chance Oren can pull the changes in.

Adam

Ayende Rahien

Aug 9, 2010, 12:11:01 AM
to rav...@googlegroups.com
This test passes:

public class SortingOnLong : BaseClientTest
{
    [Fact]
    public void CanSortOnLong()
    {
        using (var store = NewDocumentStore())
        {
            // Store three documents; two values exceed int.MaxValue.
            using (var session = store.OpenSession())
            {
                session.Store(new Foo
                {
                    Value = 7147483647
                });

                session.Store(new Foo
                {
                    Value = 25
                });

                session.Store(new Foo
                {
                    Value = 3147483647
                });

                session.SaveChanges();
            }

            store.DatabaseCommands.PutIndex("long",
                new IndexDefinition
                {
                    Map = "from doc in docs select new { doc.Value }"
                });

            // Query the index sorted by Value and expect ascending order.
            using (var session = store.OpenSession())
            {
                var foos = session.LuceneQuery<Foo>("long")
                    .WaitForNonStaleResults()
                    .OrderBy("Value")
                    .ToList();

                Assert.Equal(3, foos.Count);

                Assert.Equal(25, foos[0].Value);
                Assert.Equal(3147483647, foos[1].Value);
                Assert.Equal(7147483647, foos[2].Value);
            }
        }
    }

    public class Foo
    {
        public string Id { get; set; }
        public long Value { get; set; }
    }
}

Adam Tybor

Aug 9, 2010, 2:44:57 AM
to ravendb
I need to do some more homework, but here are the results of a quick
test with my sample index.

var query = session.LuceneQuery<Equipment>("Equipment/ByEquipmentId")
    .WaitForNonStaleResults(TimeSpan.FromMinutes(5))
    .OrderBy("Number")
    .SelectFields<JObject>("Number")
    .Take(400);

var results = query.ToList();

long? previousNumber = null;
long currentNumber = 0;

foreach (var doc in results)
{
    currentNumber = Convert.ToInt64((string)doc["Number"]);
    if (previousNumber.HasValue && previousNumber > currentNumber)
        Console.WriteLine("{0} is not less than {1}", previousNumber, currentNumber);
    previousNumber = currentNumber;
}


Executing query '' on index 'Equipment/ByEquipmentId' in 'http://localhost:8080'
Query returned 400/110000 results
1001668533 is not less than 100193850
1004137010 is not less than 100417948
1004907599 is not less than 100494370
1006552480 is not less than 100655266
1009780851 is not less than 101012970
1010869914 is not less than 101098319
1011009389 is not less than 101112001
1011459554 is not less than 101147620
1012297405 is not less than 101241531
1012864749 is not less than 101304672
1016975239 is not less than 101704038
1017684807 is not less than 10178525
1018479854 is not less than 101856051
1019847996 is not less than 101994306
1021725418 is not less than 102192047
1022402211 is not less than 102254161
1023631993 is not less than 102379439
1027135778 is not less than 10277498
1028913941 is not less than 102902349
1031267611 is not less than 103151640
1034250196 is not less than 103425331
1036155819 is not less than 103634138
1038087628 is not less than 103823248
1041929232 is not less than 104219256
1042266896 is not less than 104250464
1042606488 is not less than 104261224
1042859665 is not less than 104291057
1043245804 is not less than 104364899
1046885822 is not less than 104689804
1046922101 is not less than 104721542
1051652884 is not less than 105178771
1052335496 is not less than 105234184
1055561594 is not less than 10555746
1057067780 is not less than 105717069
1060768064 is not less than 10610358
1062322891 is not less than 106234695
1063049049 is not less than 106318458
Found 110000 results and they are not stale

Ayende Rahien

Aug 9, 2010, 2:48:20 AM
to ravendb
Okay, I see your problem. I think that I sorted this stuff lexically; I'll look further into it.
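
One likely reason the earlier test passed despite a lexical sort is that its three values (25, 3147483647, 7147483647) happen to order the same way as strings and as numbers. The sketch below is plain C# with a hypothetical helper name, not anything from Raven's test suite; it checks whether a data set can tell the two orderings apart at all.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only; not part of Raven's tests.
public static class SortTestData
{
    // Returns true when sorting the values as strings gives a different
    // order than sorting them as numbers, i.e. when the data set is able
    // to expose a lexical sort masquerading as a numeric one.
    public static bool DistinguishesLexicalFromNumeric(params long[] values)
    {
        var numeric = values.OrderBy(v => v);
        var lexical = values.OrderBy(
            v => v.ToString(CultureInfo.InvariantCulture),
            StringComparer.Ordinal);
        return !numeric.SequenceEqual(lexical);
    }
}
```

DistinguishesLexicalFromNumeric(25, 3147483647, 7147483647) returns false, while DistinguishesLexicalFromNumeric(100193850, 1001668533) returns true, so values of differing digit counts that share a leading prefix make better test data for this case.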

Adam Tybor

Aug 9, 2010, 3:06:16 AM
to ravendb
Thanks,
From what I found, I should be able to sort on "Number_Range", which
would do a true numeric sort. However, when creating the SortField for
the Lucene query, you need to pass in SortField.LONG as the type.
Currently Raven doesn't specify a type, so it defaults to auto, which
gives me really weird results because of how the numeric range is
stored in Lucene.

Ayende Rahien

Aug 9, 2010, 4:32:29 PM
to rav...@googlegroups.com
Urgh, that was hard to do.
Basically, we have a new option in IndexDefinition: the sort type for the field.
When you specify that, things work :-)
See the test.

Adam Tybor

Aug 9, 2010, 11:52:19 PM
to ravendb
Awesome, just tested against my 100K docs and it works perfectly!

Adam Tybor

Aug 10, 2010, 12:07:26 AM
to ravendb
It would be nice if, on a range query against a field whose sort option
is numeric, I didn't have to query the _Range field myself, the same
way you handled it for sorting... am I asking for too much?

Ayende Rahien

Aug 10, 2010, 12:13:53 AM
to rav...@googlegroups.com
Please create an issue for this. I am working on the include feature now, so I won't be able to get to it today.