RavenDB 4 - Creating index with large definition fails

Andrej Krivulčík

Oct 13, 2017, 6:59:09 AM
to RavenDB - 2nd generation document database
When trying to create an index with a large definition (seemingly > 64 kB), index creation fails with the following exception (and sometimes it also crashes the server; tested with the latest nightly - Build 40, Version 4.0, SemVer 4.0.0-nightly-20171013-0400, Commit 1749c69):

Raven.Client.Exceptions.RavenException: 'System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at Sparrow.Json.BlittableJsonTextWriter.Flush() in C:\Builds\RavenDB-4.0-Nightly\src\Sparrow\Json\BlittableJsonTextWriter.cs:line 467
   at Sparrow.Json.BlittableJsonTextWriter.Dispose() in C:\Builds\RavenDB-4.0-Nightly\src\Sparrow\Json\BlittableJsonTextWriter.cs:line 642
   at Raven.Server.Documents.Indexes.IndexDefinitionBase`1.Persist(TransactionOperationContext context, StorageEnvironmentOptions options) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\IndexDefinitionBase.cs:line 189
   at Raven.Server.Documents.Indexes.IndexStorage.CreateSchema() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\IndexStorage.cs:line 94
   at Raven.Server.Documents.Indexes.Index.Initialize(StorageEnvironment environment, DocumentDatabase documentDatabase, IndexingConfiguration configuration, PerformanceHintsConfiguration performanceHints) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Index.cs:line 423
   at Raven.Server.Documents.Indexes.Index.Initialize(DocumentDatabase documentDatabase, IndexingConfiguration configuration, PerformanceHintsConfiguration performanceHints) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Index.cs:line 347
   at Raven.Server.Documents.Indexes.Static.MapIndex.CreateNew(IndexDefinition definition, DocumentDatabase documentDatabase) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Static\MapIndex.cs:line 190
   at Raven.Server.Documents.Indexes.IndexStore.HandleStaticIndexChange(String name, IndexDefinition definition) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\IndexStore.cs:line 309
   at Raven.Server.Documents.Indexes.IndexStore.HandleChangesForStaticIndexes(DatabaseRecord record, Int64 index) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\IndexStore.cs:line 208
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Raven.Server.ServerWide.RachisLogIndexNotifications.<WaitForIndexNotification>d__7.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\ServerWide\ClusterStateMachine.cs:line 1227
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Server.Documents.Indexes.IndexStore.<CreateIndex>d__22.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\IndexStore.cs:line 402
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Server.Documents.Handlers.Admin.AdminIndexHandler.<Put>d__0.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Handlers\Admin\AdminIndexHandler.cs:line 32
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Server.Routing.RequestRouter.<HandlePath>d__5.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Routing\RequestRouter.cs:line 107
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Raven.Server.RavenServerStartup.<RequestHandler>d__11.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\RavenServerStartup.cs:line 157'


I just call TestLargeIndex.TestLargeIndexTest(); from Program.cs.

After reducing the size of the index (comment or remove the line https://gist.github.com/krivulcik/6ea7a3251783b8b647d49a512d95c48f#file-testlargeindex-cs-L667 ), the index is created correctly.

Oren Eini (Ayende Rahien)

Oct 15, 2017, 5:56:34 AM
to ravendb
A) http://issues.hibernatingrhinos.com/issue/RavenDB-8983, fixed
B) Why?
C) Thanks for the clear bug report.

Hibernating Rhinos Ltd
Oren Eini | CEO | Mobile: +972-52-548-6969
Office: +972-4-622-7811 | Fax: +972-153-4-622-7811

Andrej Krivulčík

Oct 16, 2017, 2:46:08 AM
to RavenDB - 2nd generation document database
Thanks for handling this so fast.

> Why?
I need to quickly find counts of entities in each "category" (as defined by the index) using various filters. I'll use this data to draw charts etc. I did the categorization in the index as a proof of concept (several attributes) and after scaling up to all the attributes, I encountered the error. Ultimately, I'll do the comparison/categorization beforehand and store the categories inside the entities, simplifying the index.
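The pre-computation mentioned above could look roughly like the following. This is a minimal sketch, not the code from the gist: the `RangeBuckets` class and its abbreviated `UpperBounds` array are hypothetical, and it assumes that a category is the position of the first boundary strictly greater than the value.

```csharp
using System;

public static class RangeBuckets
{
    // Hypothetical, abbreviated list of bucket boundaries, sorted ascending.
    // Each entry is the exclusive upper bound of the bucket with the same index.
    private static readonly double[] UpperBounds =
        { 0, 1, 2, 3, 5, 10, 20, 30, 50, 100, double.MaxValue };

    // Returns the index of the first bound strictly greater than the value,
    // or null when the value is missing.
    public static int? Categorize(double? value)
    {
        if (value == null)
        {
            return null;
        }

        // BinarySearch returns the index of an exact match, or the bitwise
        // complement of the index of the next larger element.
        int idx = Array.BinarySearch(UpperBounds, value.Value);
        int bucket = idx >= 0 ? idx + 1 : ~idx;

        // Assumption: a value equal to the last bound is clamped into the last bucket.
        return Math.Min(bucket, UpperBounds.Length - 1);
    }
}
```

With these bounds, `RangeBuckets.Categorize(7.5)` returns 5 (the first bound greater than 7.5 is 10, at index 5), and a null input stays null. The result would be stored on the entity before saving, so the index only has to map a plain integer field.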

Oren Eini (Ayende Rahien)

Oct 16, 2017, 6:36:01 AM
to ravendb
Is there a reason not to use facets for this? I'm not sure that I understand what you mean by categories?

Oren Eini (Ayende Rahien)

Oct 16, 2017, 6:36:15 AM
to ravendb
Also, note that we have the notion of additional sources, you can push a lot of that code to there.

Andrej Krivulčík

Oct 16, 2017, 7:04:22 AM
to RavenDB - 2nd generation document database
Facets turned out to be quite slow for this use case. This is basically replicating facet ranges, and I will be using facets on top of these "categories" (= ranges). However, querying on a database with hundreds of thousands of records with mostly unique numeric values did not perform very well in 3.5.

I'd appreciate any suggestions on this. If querying ranged facets with filtering should be reasonably fast, I will give it a try on this dataset on 4.0.

Andrej Krivulčík

Oct 16, 2017, 7:04:49 AM
to RavenDB - 2nd generation document database
Yes, thanks for this suggestion, I noticed the capability in other forum threads.

Federico Lois

Oct 17, 2017, 5:22:58 PM
to rav...@googlegroups.com
Andrej, 

If they are not fast enough in 4.0, I would like to have a repro so I can understand whether the behavior is a constraint of the current algorithm design or something we can optimize further. If it is an algorithmic shortcoming, there is nothing we can do within the 4.0 release timeframe, but it will probably give us an incentive to look for a better faceting algorithm (of which there are many). Given that it is a very restricted use case, it might even be doable in a minor release.

Federico

Andrej Krivulčík

Oct 18, 2017, 4:21:55 AM
to RavenDB - 2nd generation document database
The speed difference is not as significant as I anticipated. Also, the duration depends somewhat on the number of records, but the dependence does not seem to be linear.

Here are several results of a bunch of runs of the code below (with 1M documents):

AggregateBy: 797,7943 ms
Facets:      6687,8775 ms
Filtered AggregateBy: 487,4873 ms
Filtered Facets:      1203,4317 ms

AggregateBy: 560,09 ms
Facets:      1341,8023 ms
Filtered AggregateBy: 512,108 ms
Filtered Facets:      1201,4911 ms

AggregateBy: 555,4504 ms
Facets:      1333,8973 ms
Filtered AggregateBy: 497,0776 ms
Filtered Facets:      1202,1661 ms

AggregateBy: 635,5468 ms
Facets:      1615,4753 ms
Filtered AggregateBy: 529,2259 ms
Filtered Facets:      1427,6621 ms

Code, called like this: Task.Run(() => TestFacetPerformance.TestFacetPerformanceTest()).Wait();

using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;
using Raven.Client.Documents.Queries.Facets;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;

namespace RavenDB4RCTests
{
    public class TestFacetPerformance
    {
        public const string NUM_FORMAT = "########0.###";

        public static Dictionary<int, double> Ranges = new Dictionary<int, double>
        {
            { 0, 0 },
            { 1, 1 },
            { 2, 2 },
            { 3, 3 },
            { 4, 5 },
            { 5, 10 },
            { 6, 20 },
            { 7, 30 },
            { 8, 50 },
            { 9, 100 },
            { 10, 200 },
            { 11, 300 },
            { 12, 500 },
            { 13, 1000 },
            { 14, 2000 },
            { 15, 3000 },
            { 16, 5000 },
            { 17, 10000 },
            { 18, 20000 },
            { 19, 30000 },
            { 20, 50000 },
            { 21, 100000 },
            { 22, 200000 },
            { 23, 300000 },
            { 24, 500000 },
            { 25, 1000000 },
            { 26, 2000000 },
            { 27, 3000000 },
            { 28, 5000000 },
            { 29, 10000000 },
            { 30, 20000000 },
            { 31, double.MaxValue },
        };

        public static async Task TestFacetPerformanceTest()
        {
            var documentStore = new DocumentStore
            {
                Urls = new[] { "http://4.live-test.ravendb.net" },
                Database = "TestFacetPerformance"
            };

            documentStore.Initialize();

            new DocIndex().Execute(documentStore);

            await InitFacetDoc(documentStore);

            if (await ShouldInitData(documentStore))
            {
                await InitializeData(documentStore);
            }

            using (var session = documentStore.OpenAsyncSession())
            {
                var sw = new Stopwatch();
                sw.Start();
                var query = session.Query<DocView, DocIndex>()
                    .AggregateBy(x => x.Value1Category)
                    .CountOn(x => x.Id);
                var results = await query.ToListAsync();
                sw.Stop();
                Console.WriteLine("AggregateBy: " + sw.Elapsed.TotalMilliseconds + " ms");
            }

            using (var session = documentStore.OpenAsyncSession())
            {
                var sw = new Stopwatch();
                sw.Start();
                var results = await session.Query<DocView, DocIndex>()
                    .ToFacetsAsync("facet/docs");
                sw.Stop();
                Console.WriteLine("Facets:      " + sw.Elapsed.TotalMilliseconds + " ms");
            }

            using (var session = documentStore.OpenAsyncSession())
            {
                var sw = new Stopwatch();
                sw.Start();
                var query = session.Query<DocView, DocIndex>()
                    .Where(x => x.Value1 > 500000 && x.Value1 < 1000000)
                    .AggregateBy(x => x.Value1Category)
                    .CountOn(x => x.Id);
                var results = await query.ToListAsync();
                sw.Stop();
                Console.WriteLine("Filtered AggregateBy: " + sw.Elapsed.TotalMilliseconds + " ms");
            }

            using (var session = documentStore.OpenAsyncSession())
            {
                var sw = new Stopwatch();
                sw.Start();
                var results = await session.Query<DocView, DocIndex>()
                    .Where(x => x.Value1 > 500000 && x.Value1 < 1000000)
                    .ToFacetsAsync("facet/docs");
                sw.Stop();
                Console.WriteLine("Filtered Facets:      " + sw.Elapsed.TotalMilliseconds + " ms");
            }

            Console.WriteLine("Press Enter.");
            Console.ReadLine();
        }

        private static async Task<bool> ShouldInitData(DocumentStore documentStore)
        {
            using (var session = documentStore.OpenAsyncSession())
            {
                var doc = await session.LoadAsync<Doc>("doc/1");
                return doc == null;
            }
        }

        private static async Task InitializeData(DocumentStore documentStore)
        {
            var start = 0;
            var batches = 1000;
            Console.WriteLine("Generating data.");
            var rng = new Random();
            for (int batchNo = start; batchNo < start + batches; batchNo++)
            {
                Console.WriteLine($"{DateTime.Now.ToLongTimeString()}: Generating batch {batchNo + 1}/{batches}");
                using (var session = documentStore.OpenAsyncSession())
                {
                    for (int i = 1; i <= 1000; i++)
                    {
                        var numVals = Enumerable.Range(1, 5).ToDictionary(x => "Value" + x, _ => rng.NextDouble() < 0.9 ? (rng.NextDouble() * 100000000) : (double?)null);
                        await session.StoreAsync(new Doc
                        {
                            Id = "doc/" + (batchNo * 1000 + i),
                            NumVals = numVals,
                            NumCategories = numVals.ToDictionary(x => x.Key, x => CategorizeValue(x.Value)),
                        });
                    }
                    await session.SaveChangesAsync();
                }
            }
            Console.WriteLine("Data generated.");
        }

        private static int? CategorizeValue(double? value)
        {
            if (value == null)
            {
                return null;
            }

            return Ranges.First(x => value < x.Value).Key;
        }

        private static async Task InitFacetDoc(DocumentStore documentStore)
        {
            using (var session = documentStore.OpenAsyncSession())
            {
                var doc = new FacetSetup
                {
                    Id = "facet/docs",
                    Facets = new List<Facet>
                    {
                        new Facet
                        {
                            Name = "Value1_D_Range",
                            Mode = FacetMode.Ranges,
                            Ranges = new[] { "[NULL TO " + Ranges.First().Value.ToString(NUM_FORMAT, CultureInfo.InvariantCulture) + "]" }
                                .Concat(
                                    Ranges.Values.Take(Ranges.Count - 2)
                                        .Zip(Ranges.Values.Skip(1).Take(Ranges.Count - 1),
                                            (from, to) => $"[{from.ToString(NUM_FORMAT, CultureInfo.InvariantCulture)} TO {to.ToString(NUM_FORMAT, CultureInfo.InvariantCulture)}]")
                                )
                                .Concat(new[] { "[" + Ranges.Take(Ranges.Count - 1).Last().Value.ToString(NUM_FORMAT) + " TO NULL]" })
                                .ToList(),
                        },
                    },
                };
                await session.StoreAsync(doc);
                await session.SaveChangesAsync();
            }
        }

        public class Doc
        {
            public string Id { get; set; }
            public Dictionary<string, double?> NumVals { get; set; }
            public Dictionary<string, int?> NumCategories { get; set; }
        }

        public class DocView
        {
            public string Id { get; set; }
            public double? Value1 { get; set; }
            public int? Value1Category { get; set; }
            public double? Value2 { get; set; }
            public int? Value2Category { get; set; }
            public double? Value3 { get; set; }
            public int? Value3Category { get; set; }
            public double? Value4 { get; set; }
            public int? Value4Category { get; set; }
            public double? Value5 { get; set; }
            public int? Value5Category { get; set; }
        }

        public class DocIndex : AbstractIndexCreationTask<Doc, DocView>
        {
            public DocIndex()
            {
                Map = docs =>
                    from doc in docs
                    select new
                    {
                        doc.Id,
                        Value1 = doc.NumVals["Value1"],
                        Value1Category = doc.NumCategories["Value1"],
                        Value2 = doc.NumVals["Value2"],
                        Value2Category = doc.NumCategories["Value2"],
                        Value3 = doc.NumVals["Value3"],
                        Value3Category = doc.NumCategories["Value3"],
                        Value4 = doc.NumVals["Value4"],
                        Value4Category = doc.NumCategories["Value4"],
                        Value5 = doc.NumVals["Value5"],
                        Value5Category = doc.NumCategories["Value5"],
                    };

                StoreAllFields(FieldStorage.Yes);
            }
        }
    }
}

I'll test it some more; we'll need this functionality for around 10M documents.

Andrej Krivulčík

Oct 18, 2017, 4:24:24 AM
to RavenDB - 2nd generation document database
I just noticed that the test server http://4.live-test.ravendb.net probably ran out of disk space. Not sure if this invalidates the results. The documents are all present in the database but the index is stale (didn't notice that before).

Index Name: DocIndex
Document ID:
Timestamp: 2017-10-18T08:15:14.0369121Z
Action: Critical
Unexpected exception occurred: System.IO.IOException: There is not enough space on the disk
   at Lucene.Net.Index.IndexWriter.HandleMergeException(Exception t, OneMerge merge, IState state)
   at Lucene.Net.Index.IndexWriter.Merge(OneMerge merge, IState state)
   at Lucene.Net.Index.SerialMergeScheduler.Merge(IndexWriter writer, IState state)
   at Lucene.Net.Index.IndexWriter.PrepareCommit(IDictionary`2 commitUserData, IState state)
   at Lucene.Net.Index.IndexWriter.Commit(IDictionary`2 commitUserData, IState state)
   at Raven.Server.Documents.Indexes.Persistence.Lucene.LuceneIndexWriter.Commit(IState state) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Persistence\Lucene\LuceneIndexWriter.cs:line 72
   at Raven.Server.Documents.Indexes.Persistence.Lucene.IndexWriteOperation.Commit(IndexingStatsScope stats) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Persistence\Lucene\IndexWriteOperation.cs:line 93
   at Raven.Server.Documents.Indexes.Index.DoIndexingWork(IndexingStatsScope stats, CancellationToken cancellationToken) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Index.cs:line 1255
   at Raven.Server.Documents.Indexes.Index.ExecuteIndexing() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Documents\Indexes\Index.cs:line 770

Oren Eini (Ayende Rahien)

Oct 18, 2017, 5:36:43 AM
to ravendb
Please note that perf testing should NOT be done on the live test instance.
This is a t2.medium machine that is under constant load. Hardly a proper benchmark machine.

Andrej Krivulčík

Oct 18, 2017, 9:38:50 AM
to RavenDB - 2nd generation document database
Yes, certainly, I wanted to provide a quick way to check the results, without waiting for data to be generated (takes a while).

I ran the benchmarks on a separate server with 5M documents. The results are approximately 500-900 ms for AggregateBy and much more for Facets. How much more varies wildly: the first several runs included 31-second queries, and now they are in the 4-5 second range.
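Since the first runs are so much slower than the later ones, it may help to separate warm-up from steady-state timings. The helper below is only a sketch under that assumption: `QueryTimer` and `MedianMillisecondsAsync` are hypothetical names, not part of the RavenDB client. It runs the action a few times untimed, then reports the median of the timed runs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class QueryTimer
{
    // Runs the action `warmups` times without timing it, then `runs` times
    // with a Stopwatch, and returns the median elapsed time in milliseconds.
    public static async Task<double> MedianMillisecondsAsync(
        Func<Task> action, int warmups = 2, int runs = 5)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        // Warm-up runs absorb one-time costs such as caches being populated.
        for (int i = 0; i < warmups; i++)
        {
            await action();
        }

        var samples = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            await action();
            sw.Stop();
            samples.Add(sw.Elapsed.TotalMilliseconds);
        }

        // The median is less sensitive to a single slow outlier than the mean.
        samples.Sort();
        int mid = samples.Count / 2;
        return samples.Count % 2 == 1
            ? samples[mid]
            : (samples[mid - 1] + samples[mid]) / 2.0;
    }
}
```

Each of the four queries in the test code could be wrapped in a lambda and passed as `action`, so that the AggregateBy and Facets numbers are compared on the same footing.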

Oren Eini (Ayende Rahien)

Oct 19, 2017, 2:11:34 AM
to ravendb
