Index is stale long after last insertion


Dodd

Apr 9, 2012, 6:28:57 AM
to ravendb
Hello,

My index has been stale for a long time since the last insertion. I am
wondering why it takes so long. I've waited for days.

Well, to make it clearer, I made a simple demo to reproduce:

namespace RavenStaleIndexRepro
{
    public class Foo
    {
        public string Bar { get; set; }
    }

    public class SimpleReduceResult
    {
        public string Bar { get; set; }
        public long Count { get; set; }
    }

    public class SimpleMRIndex : AbstractIndexCreationTask<Foo, SimpleReduceResult>
    {
        public SimpleMRIndex()
        {
            Map = foos => from foo in foos
                          select new
                          {
                              Bar = foo.Bar,
                              Count = 1L
                          };

            Reduce = results => from result in results
                                group result by result.Bar into g
                                select new
                                {
                                    Bar = g.Key,
                                    Count = g.Sum(c => c.Count)
                                };
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
            {
                IndexCreation.CreateIndexes(typeof(SimpleMRIndex).Assembly, store);

                for (int i = 0; i < 500000; i++)
                {
                    using (var session = store.OpenSession())
                    {
                        session.Store(new Foo { Bar = "IamBar" });
                        session.SaveChanges();
                    }
                    Console.Write(".");
                }
            }
        }
    }
}
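A sketch for observing the staleness from client code rather than the Management Studio (assuming the 1.0-era client API with `RavenQueryStatistics` and `WaitForNonStaleResults`; names may differ between builds):

```csharp
// Sketch only - observing index staleness from the client.
// Assumes the same store/index types as the repro above.
using (var session = store.OpenSession())
{
    RavenQueryStatistics stats;
    var results = session.Query<SimpleReduceResult, SimpleMRIndex>()
        .Statistics(out stats)
        .Customize(x => x.WaitForNonStaleResults(TimeSpan.FromSeconds(30)))
        .ToList();

    // stats.IsStale reports whether the index was still stale when the
    // query was served (after waiting up to the timeout above).
    Console.WriteLine("Stale: {0}", stats.IsStale);
}
```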

That is it. Quite simple - just insert 500,000 docs. Almost
immediately after the above program is done, the reduced result is
already correct, as shown in the Management Studio:

Projection
IamBar
{
Bar: "IamBar"
Count: "500000"
}

However, the status of the index is 'stale'. And it seems that it
will be stale forever (at least, it stays stale no matter how long I
wait).

Where did I go wrong? Or did I misunderstand something? I am using
Build 888. Thanks a lot.

Oren Eini (Ayende Rahien)

Apr 9, 2012, 6:31:23 AM
to rav...@googlegroups.com
Dodd,
what you are doing is generating a map/reduce index where the reduce has to do a LOT of work. 
It is likely that it is taking a long time to load and work through all 500,000 items that share the same "Bar" value.

Dodd

Apr 9, 2012, 6:36:55 AM
to ravendb
Hi Oren,

Thanks for your reply.

Yes, it does take time to do the reduce. But the reduce is already
done, isn't it? According to the stale result:

Projection
IamBar
{
Bar: "IamBar"
Count: "500000"
}

What else needs to be done? Sorry if I don't quite understand the
Raven reduce internals.


Oren Eini (Ayende Rahien)

Apr 9, 2012, 7:13:39 AM
to rav...@googlegroups.com
Good point, I am not sure, I'll look at that later today or tomorrow.

Dodd

Apr 10, 2012, 1:48:29 AM
to ravendb
Great. Looking forward to your update :)

Btw, extra information: if I insert 150,000 docs rather than 500,000, the
index becomes up-to-date almost immediately. So this stale-for-a-long-
time behavior seems to occur only when the document count exceeds a
certain threshold (probably between 150,000 and 500,000).

Thanks,
Dodd


Dodd

Apr 12, 2012, 4:11:28 AM
to ravendb
Hello, any update on this?

Jian Huang

Apr 14, 2012, 11:01:51 AM
to rav...@googlegroups.com
Hi Oren,

I ran into the same issue in RavenDB build 888.
It makes sense that it would take a long time when the number of documents is huge. However, according to the log file, RavenDB thought there were no tasks to handle, even though the index had not been refreshed and its state was stale.
Is there anything wrong with the task queue when the data volume exceeds some threshold?



Oren Eini (Ayende Rahien)

Apr 14, 2012, 11:03:28 AM
to rav...@googlegroups.com
I am not sure, we are going to go over this in detail next week.

Tobi

Apr 16, 2012, 3:35:01 AM
to rav...@googlegroups.com
On 14.04.2012 17:01, Jian Huang wrote:

> however, according to the log file, RavenDB thought there were no tasks to
> handle, even though the index had not been refreshed and its state was stale.
> Is there anything wrong with the task queue when the data volume exceeds
> some threshold?

Have you seen any errors or warnings in the log?

I can reproduce this state by randomly resetting indexes, and I have already
sent Oren some code to reproduce it. Maybe it's the same issue.

In some cases an OOME was logged and the indexing task just stopped while
the indexes were still stale.

Tobias

Oren Eini (Ayende Rahien)

Apr 16, 2012, 5:21:54 AM
to rav...@googlegroups.com
No, that isn't related to this.
More on that in another email.

Oren Eini (Ayende Rahien)

Apr 16, 2012, 5:24:23 AM
to rav...@googlegroups.com
Okay, here is what is going on.
We are reducing all 500,000 results. That takes about 30 seconds, all told.
The problem is that we don't expect to have so many mapped results sharing the same reduce key, so what is actually happening is that we keep scanning the mapped results for more and more data, and each time we advance only by 512 items.
That means that under your scenario, we are looking at about 1,000 reduces of 500K items, almost all of which is actually NOT relevant.
I'll fix that shortly.
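Back-of-the-envelope, using only the numbers in this thread (the 512-item advance and the single shared reduce key), the wasted work can be sketched as:

```csharp
// Illustrative arithmetic only, based on the numbers Oren gives above:
// the reducer advances 512 mapped results at a time, and each pass
// re-reduces everything sharing the one "IamBar" reduce key.
using System;

class ReduceCostSketch
{
    static void Main()
    {
        const int totalItems = 500000; // mapped results, all with one reduce key
        const int batchSize = 512;     // how far the scan advances per pass

        int passes = (totalItems + batchSize - 1) / batchSize; // ~977 passes
        long itemVisits = (long)passes * totalItems;           // ~488.5 million

        // versus the single 500,000-item reduce that is actually needed
        Console.WriteLine("{0} passes, ~{1:N0} item visits", passes, itemVisits);
    }
}
```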


Oren Eini (Ayende Rahien)

Apr 16, 2012, 6:18:30 AM
to rav...@googlegroups.com
And, now we have it working.
In your scenario, total time to complete the operation of indexing 500,000 results with the same reduce key is....

http://www.surveymonkey.com/s/ZRMHPCK

Dodd

Apr 17, 2012, 4:27:28 AM
to ravendb
Hi Oren,

Thanks a lot for finding out the root cause of the issue. That's
great!

Given that 'now we have it working', do you mean that the fix is
already included in the latest unstable version 909?

Kind regards,
Dodd


Oren Eini (Ayende Rahien)

Apr 17, 2012, 5:31:38 AM
to rav...@googlegroups.com
Yes

Justin A

Apr 18, 2012, 1:41:08 AM
to rav...@googlegroups.com
guess-voted!

Dodd

Apr 21, 2012, 6:58:10 AM
to ravendb
According to our test, the issue has been fixed in build 909. Many
thanks.

Dodd


Wyatt Barnett

Aug 3, 2012, 4:58:40 PM
to rav...@googlegroups.com
I think I'm running into this issue myself on build 960. In our case we are importing a pretty massive dataset -- about 29m docs in all. The first 11.8m or so are every single Census Block in the 2010 census, with some population and housing data.

Since I'm extracting the data out of DBF files, it is pretty slow, at best, to import. The kind of thing you start one morning and pick up a few days later. I kicked off the importer yesterday morning as I was stuck in meetings all day. It chugged along for about 12 hours until RavenDB, as near as I can tell, hung up halfway through processing the blocks for New York -- which has about 350k census blocks.

Anyhow, I figured out things had crashed this morning, figured out where, and started to re-run the import, overwriting New York since I had no idea where it died importing the state. The import side of the game went swimmingly and finished up. But it seems that Raven has choked on something.

First, looking at the logs I don't see any more mapping and reducing happening. The last indication of any of that activity is at 1:35 pm or so:

Debug 8/3/2012 1:35:53 PM No work was found, workerWorkCounter: 0, for: IndexingExecuter, will wait for additional work Raven.Database.Indexing.WorkContext
Debug 8/3/2012 1:35:53 PM No work was found, workerWorkCounter: 0, for: ReducingExecuter, will wait for additional work Raven.Database.Indexing.WorkContext
Debug 8/3/2012 1:34:46 PM Mapped result for index 'CensusBlock/PopulationTotalsByState' doc 'CensusBlockDocuments/370399306012027': '{ ... Raven.Database.Indexing.Index.Indexing
Debug 8/3/2012 1:34:46 PM Mapped result for index 'CensusBlock/PopulationTotalsByState' doc 'CensusBlockDocuments/370399304004060': '{ ... Raven.Database.Indexing.Index.Indexing
Debug 8/3/2012 1:34:46 PM Mapped result for index 'CensusBlock/PopulationTotalsByState' doc 'CensusBlockDocuments/370399304003059': '{ ... Raven.Database.Indexing.Index.Indexing

Indexes haven't updated at all and are still stale, including the default DocumentsByTagName. Said index is still stuck at 6.8m, which was where I started the 2nd round of imports. Raven is still consuming 5gb or so of RAM and 25% or so of CPU. Stats also have not moved -- we have not seen a change in the number of indexing or reducing events.

To help things along, our document looks like:

public class CensusBlockDocument
    {
        public CensusBlockDocument()
        {}
        public CensusBlockDocument(string fipsCode)
        {
            Id = "CensusBlockDocuments/" + fipsCode;
        }
        public string Id { get; set; }
        public string FullFipsCode
        {
            get
            {
                if (Id.Contains("/"))
                {
                    var split = Id.Split('/');
                    if (split.Length > 1)
                    {
                        return split[1];
                    }
                    return split[0];
                }
                return Id;
            }
        }
        public DateTimeOffset Created { get; set; }
        public string StateFipsId { get; set; }
        public string CountyFipsId { get; set; }
        public int HousingUnits { get; set; }
        public int Population { get; set; }
        public double Latitude { get; set; }
        public double Longitude { get; set; }
    }

And the indexes look like:

public class CensusBlock_PopulationTotalsByStateAndCounty : AbstractIndexCreationTask<CensusBlockDocument, CensusBlock_PopulationTotalsByStateAndCounty.Result>
    {
        public class Result
        {
            public string StateFipsId { get; set; }
            public string CountyFipsId { get; set; }
            public int Population { get; set; }
            public int HousingUnits { get; set; }
            public int BlockCount { get; set; }
        }
        public CensusBlock_PopulationTotalsByStateAndCounty()
        {
            Map = cbds => from cbd in cbds
                          select new
                          {
                              StateFipsId = cbd.StateFipsId,
                              CountyFipsId = cbd.CountyFipsId,
                              Population = cbd.Population,
                              HousingUnits = cbd.HousingUnits,
                              BlockCount = 1
                          };
            Reduce = result => from r in result
                               group r by new { r.StateFipsId, r.CountyFipsId }
                                   into g
                                   select new
                                   {
                                       CountyFipsId = g.Key.CountyFipsId,
                                       StateFipsId = g.Key.StateFipsId,
                                       Population = g.Sum(x => (int)x.Population),
                                       HousingUnits = g.Sum(x => (int)x.HousingUnits),
                                       BlockCount = g.Sum(x => (int)x.BlockCount)
                                   };
            Index(x => x.CountyFipsId, FieldIndexing.NotAnalyzed);
            Index(x => x.StateFipsId, FieldIndexing.NotAnalyzed);
            Store(x => x.CountyFipsId, FieldStorage.Yes);
            Store(x => x.StateFipsId, FieldStorage.Yes);
        }
    }
    public class CensusBlock_PopulationTotalsByState : AbstractIndexCreationTask<CensusBlockDocument, CensusBlock_PopulationTotalsByState.Result>
    {
        public class Result
        {
            public string StateFipsId { get; set; }
            public int Population { get; set; }
            public int HousingUnits { get; set; }
            public int BlockCount { get; set; }
        }
        public CensusBlock_PopulationTotalsByState()
        {
            Map = cbds => from cbd in cbds
                          select new
                                     {
                                         StateFipsId = cbd.StateFipsId,
                                         Population = cbd.Population,
                                         HousingUnits = cbd.HousingUnits,
                                         BlockCount = 1
                                     };
            Reduce = result => from r in result
                               group r by new {r.StateFipsId}
                               into g
                               select new
                                          {
                                              StateFipsId = g.Key.StateFipsId,
                                              Population = g.Sum(x => (int) x.Population),
                                              HousingUnits = g.Sum(x => (int) x.HousingUnits),
                                              BlockCount = g.Sum(x => (int) x.BlockCount)
                                          };
        }
    }

Oren Eini (Ayende Rahien)

Aug 4, 2012, 5:06:10 AM
to rav...@googlegroups.com
Hm,
Is it possible that it is busy reducing something?
Can you try to break into the process and see what it is doing?

Wyatt Barnett

Aug 4, 2012, 6:14:55 AM
to rav...@googlegroups.com
Perhaps, but I'm not sure how to do that -- it is running on a
server without any dev tools installed. Please advise.

Furthermore, it's kind of a moot point -- the process crashed at some point
overnight, and it is replaying log files now. The interesting part is that
the logfiles ceased being updated at 1:30pm or so yesterday. There are
also a lot of them -- 236 total, 3.36gb worth.

I'm going to let it do its thing and hopefully it will come up enough
that I can actually do a visual on the box.


Wyatt Barnett

Aug 4, 2012, 9:41:20 AM
to rav...@googlegroups.com
9:37am update: ESENT has replayed its log files, I think. The server is still chugging along at 25% CPU or so. I can't get /stats to work on any tenant database; they all just hang and time out. Server reports:

2012-08-04 09:33:46.4528 Raven.Database.Server.HttpServer Error on request
2012-08-04 09:33:46.4528 Raven.Database.Server.HttpServer Request #  28: GET     - 120,273 ms -            - 500 - /databases/testdb/stats

The default DB does come up and seems to be working just fine.

I'm just going to let it chug for the day and see where it gets me unless there is something you want me to do.



Oren Eini (Ayende Rahien)

Aug 5, 2012, 4:19:55 AM
to rav...@googlegroups.com
In task manager, right click and choose Create Dump.
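If Task Manager isn't convenient, a command-line alternative could look like the following sketch (assumptions: Sysinternals ProcDump is installed on the server, and RavenDB runs as `Raven.Server.exe` -- under IIS the process would be `w3wp.exe` instead):

```shell
rem Sketch only: capture a full memory dump of a hung RavenDB process.
rem -ma writes a full dump; adjust the process name to match your setup.
procdump -ma Raven.Server.exe raven_hang.dmp
```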

Are you running with CircularLogs = off?

Wyatt Barnett

Aug 6, 2012, 9:58:17 AM
to rav...@googlegroups.com
Gotcha. If I can reproduce it, I'll try to take a dump while it is hanging.

Yes, circular logs are off. Here are the relevant app config settings:

<add key="Raven/DataDir" value="F:\RavenDb_Data\RavenDb"/>
<add key="Raven/AnonymousAccess" value="None"/>
<add key="Raven/Authorization/Windows/RequiredGroups" value="[our groups]" />
<add key="Raven/BackgroundTasksPriority" value="AboveNormal" />
<add key="Raven/Esent/MaxVerPages" value="1024" />
<add key="Raven/Esent/DbExtensionSize" value="128" />
<add key="Raven/Esent/LogsPath" value="E:\RavenDb_Logs\RavenDb" />
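For reference, circular logging could be switched on with a setting along these lines (a sketch: the `Raven/Esent/CircularLog` key name is assumed from this era's configuration options, and turning it on trades point-in-time log replay for bounded log growth):

```xml
<!-- Assumed key name; recycles ESENT transaction logs instead of
     letting them accumulate into hundreds of files. -->
<add key="Raven/Esent/CircularLog" value="true" />
```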

I took a look at the logs folder, which was relatively clean -- just
one 16mb log file, not 200+.

At some point it decided to chug through things -- the log file grew from
~300mb to 2+ gb. Now I'm having loads of trouble getting into the
Management Studio -- the error is generally "EsentTempPathInUseException:
Temp path already used by another database instance", even though the
app pool is set to not overlap, per the posted instructions.


Oren Eini (Ayende Rahien)

Aug 6, 2012, 12:03:51 PM
to rav...@googlegroups.com
That sounds really strange. I would really love to be able to reproduce this and see what is causing the issue.

Wyatt Barnett

Aug 6, 2012, 12:29:34 PM
to rav...@googlegroups.com
Yeah, definitely tipping off my spidey sense too. Anyhow, I can share
the code, and the underlying data is all public domain anyway. I could
also get you into the problematic server via screen sharing if that
would help. Please advise.


Oren Eini (Ayende Rahien)

Aug 6, 2012, 12:30:55 PM
to rav...@googlegroups.com
If you can send us the data + code, that would help; I would like to see if I can repro things on my own machine first.

Wyatt Barnett

Aug 6, 2012, 12:42:19 PM
to rav...@googlegroups.com
Sounds good, stay tuned for a few things from YouSendIt, a bit hefty
for typical email...
