StreamResult timeout after about ten minutes


Cathal McHale

Mar 20, 2015, 5:51:04 PM
to rav...@googlegroups.com
Hi ya,
I'm using a stream to GetAllProducts and notice it times out after around ten minutes.

On the client side (IIS WCF web app) I have the doc store configured with:
[sorry, code not in front of me at the minute]
store.JsonRequestFactory.ConfigureRequest += (send, args) => args.Request.Timeout = (various values tried e.g. 1 hour and System.Web....Infinite)
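From memory it was roughly this (a sketch only - the exact value and constant I used are approximations, and the store setup is omitted):

store.JsonRequestFactory.ConfigureRequest += (sender, args) =>
{
    // args.Request is the outgoing WebRequest; Timeout is in milliseconds.
    // I tried a long fixed value; an infinite timeout would be Timeout.Infinite (-1).
    args.Request.Timeout = (int)TimeSpan.FromHours(1).TotalMilliseconds;
    // or: args.Request.Timeout = System.Threading.Timeout.Infinite;
};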

And on the RavenDb side it's build 2700 and it's running in development mode (start.cmd).

I played about with stripping everything out except the stream to see if there was something else at play, but...
foreach(var product in GetAllProducts(session))
{
  // do nothing
}

The above will work fine and execute in about 3 minutes.

var i = 0;
foreach (var product in GetAllProducts(session))
{
  ++i;
  if (i > 1000)
  {
    i = 0;
    Thread.Sleep(1000);
  }
}

The above also worked fine and executed in around 8 minutes. But making it sleep for two secs instead of one gives the timeout that I was seeing originally.
So it's very much a simple timeout issue. And given that the JsonRequestFactory should be serving up a WebRequest with a sufficiently long timeout, I wonder what is causing the timeout?

Note that the client-side WCF service in which this code is running continues to execute, logs the error and cleans up cancellation tokens etc., so I don't think it's the client side timing out...?

The error that it logs states that it's an I/O error ("connection forcibly closed by the remote host"); the stack trace includes the yield return line of the StreamResult code in GetAllProducts, and at a lower level it's a Read on a System.Net.WebRequest object.


Any ideas much appreciated.

Cathal.

Kijana Woodard

Mar 20, 2015, 5:59:05 PM
to rav...@googlegroups.com
Sounds like an IIS / .NET default timeout, but I'm not sure which one.


Oren Eini (Ayende Rahien)

Mar 20, 2015, 6:27:50 PM
to ravendb
You are timing out because you aren't reading from the stream fast enough.
Basically, the timeout is for the remote call.
If you aren't reading fast enough, the local buffers will fill up and the local machine will start dropping packets from the server.
The server will try for a while, but will decide that you are down and close the connection.
You'll read through the local buffers, then time out trying to continue reading from the closed connection.

Hibernating Rhinos Ltd
Oren Eini | CEO | Mobile: +972-52-548-6969
Office: +972-4-622-7811 | Fax: +972-153-4-622-7811

Cathal McHale

Mar 21, 2015, 3:09:18 AM
to rav...@googlegroups.com
Great, thanks. I'll try this on Monday.
So would it seem reasonable to have an async session open too, read the products as quickly as possible, manipulate the prices and then write them back to the DB on the async session? Would this keep the read thread moving fast enough and avoid saturating the local buffers?

Oren Eini (Ayende Rahien)

Mar 21, 2015, 8:30:03 AM
to ravendb
This isn't an issue of a session, it is an issue of a single request.




Cathal McHale

Mar 21, 2015, 9:02:48 AM
to rav...@googlegroups.com
Yes, but from what you've said previously, if I don't keep the read thread moving the server will kill my connection. Fair enough, but I have to do something with the products after reading them. I was doing a Store and a SaveChanges every 1,000 documents. This was enough to slow things down, saturate the stream read buffer and get the connection killed after around ten minutes or so.

So perhaps another async session that I do the store and save changes on would be a simple solution here - it would keep the read thread moving quicker as it wouldn't be waiting for a response from the save changes?

Anyhow, just an idea. I'll give it a go and see if it works out. Thanks for your help.

Oren Eini (Ayende Rahien)

Mar 21, 2015, 9:19:37 AM
to ravendb
What is it that you are actually _doing_?



Tal Weiss

Mar 22, 2015, 5:51:53 AM
to rav...@googlegroups.com
If all you are doing is manipulating data in a collection, you should check out the patch API for that.
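For example, something along these lines (a sketch only - the query and script here are illustrative assumptions, not your actual model):

store.DatabaseCommands.UpdateByIndex(
    "Raven/DocumentsByEntityName",
    new IndexQuery { Query = "Tag:CatalogProducts" },
    new ScriptedPatchRequest
    {
        // Placeholder script - substitute the real repricing rule.
        Script = "this.Price = this.Price * 1.1;"
    },
    false); // allowStale

That runs the change server side, so the documents never have to travel to the client.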




--
Hibernating Rhinos Ltd
Tal Weiss | Core Team Developer | Mobile: +972-54-802-4849
RavenDB - paving the way to "Data Made Simple" - http://ravendb.net/

Cathal McHale

Mar 24, 2015, 5:14:58 AM
to rav...@googlegroups.com
Hi Oren / Tal,
I am indeed just manipulating the collection, but for a large DB. We used to use the patch API, but had trouble with it. We opted for a different approach which, granted, took longer and involved pulling all the docs across to the client to enact the change, but we have found it more reliable, and the logging, continuous feedback and ability to cancel the operation and kick off a new one have been important to us. Furthermore, even the patch required a LoadDocument to load up the manifest to decide what change to enact - we couldn't figure out how to avoid doing this every time within a patch, and this might have added to the problems we were having. (I notice in Oren's eBook for RavenDB 3.0 the mention of a 10k changes limit for patches?)

Let me elaborate a little further:
What I am doing is repricing an entire catalog. I'm testing with 200k docs, but in production it's more like 1.5M. We need to store the price on the document, as opposed to having a read-only property computed in memory, because we want to do a facet search on products that includes a price range. One other complication is that the catalog databases are built from a library database which is rebuilt every night from distributor company feeds - often the overnight changes are small, but once every few weeks there'll be an increment that could roll over as much as 40% of the catalog (deleted products, updated products, new products never seen before).

So the repricing algorithm we have been using (working fine, but slow, hence why I am revisiting it) was to load and enact changes in batches. It resulted in code like:

batchOfMigratedItems.AddRange(
  session.Advanced.LuceneQuery<CatalogProduct>("Raven/DocumentsByEntityName")
    .WhereEquals("Tag", "CatalogProducts")
    .OrderBy("__document_id")
    .Skip(nextBatchStart + nextPageOffset)
    .Take(PAGESIZE)
    .ToArray()
);

Ordering by document id (rather than etag for example) to avoid updated products making their way to the end of the queue and getting picked up again on a later page (and of course other products being missed entirely for the same reason).

So I was revisiting this code to see if there was anything to be gained by using a frozen stream (Note: I'm not concerned about new products being added during the process - we can control this, so considered out of scope for now).
The stream idea was in case the skip and take (deep paging) were having a negative effect. Combining this with async SaveChanges seemed like a reasonable way of getting through all the documents quicker.
However, my test bed of 200k products isn't showing much of an improvement. I will retest with 1.5M and see if it makes a difference there.

For completeness, below is the stream code that I ended up with (it does work - thanks for your feedback), but as I said the data isn't showing much of a performance gain yet.
Based on this new information, can you offer any further advice?

[Note: this code hasn't made it out of test yet and won't be committed unless I can back it up with performance improvement data]

const int batchSize = 500;
const int sessionLimit = 30;
int currentBatchSizeCounter = 0;
int currentSessionLimitCounter = 0;
IAsyncDocumentSession bulkSession = null;
try
{
    using (var stream = _catalogStore.OpenSession())
    {
        bulkSession = _catalogStore.OpenAsyncSession();
        foreach (var product in GetAllProducts(stream))
        {
            ++currentBatchSizeCounter;
            RepriceProduct(product);
            bulkSession.StoreAsync(product, product.Id);
            if (currentBatchSizeCounter > batchSize)
            {
                currentBatchSizeCounter = 0;
                ++currentSessionLimitCounter;
                _logger.Log(LoggingLevel.Info, "Flushing batch of products to DB");
                if (currentSessionLimitCounter >= sessionLimit)
                {
                    currentSessionLimitCounter = 0;
                    bulkSession.SaveChangesAsync();
                    bulkSession.Dispose();
                    bulkSession = _catalogStore.OpenAsyncSession();
                }
                else
                {
                    bulkSession.SaveChangesAsync();
                }
            }
        }

        // Flush last batch
        _logger.Log(LoggingLevel.Info, "Flushing LAST batch of products to DB");
        bulkSession.SaveChangesAsync();
    }
}
finally
{
    if (bulkSession != null)
    {
        bulkSession.Dispose();
    }
}

Oren Eini (Ayende Rahien)

Mar 24, 2015, 10:18:18 AM
to ravendb
Why are you using a session, then?
You want to do a read/write pass over a large number of items.
You are already using streaming, which is great, but you also need to use bulk insert for this.
So you read the values via streaming, make the modifications, and write them back using bulk insert.
That will give you the speed you want.
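Something in this direction (a rough sketch, not code from this thread - RepriceProduct and the "CatalogProducts/" key prefix are placeholders for your own code):

using (var session = _catalogStore.OpenSession())
using (var bulkInsert = _catalogStore.BulkInsert())
using (var enumerator = session.Advanced.Stream<CatalogProduct>("CatalogProducts/"))
{
    // One long-lived read request (the stream) feeding one long-lived write
    // request (the bulk insert) - no per-batch round trips in between.
    while (enumerator.MoveNext())
    {
        var product = enumerator.Current.Document;
        RepriceProduct(product);                          // placeholder for the repricing logic
        bulkInsert.Store(product, enumerator.Current.Key);
    }
}
// Note: to overwrite documents that already exist you also need to opt in
// via BulkInsertOptions - see the follow-up messages below.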

Cathal McHale

Mar 24, 2015, 10:26:13 AM
to rav...@googlegroups.com
I actually tried that in a unit testing scenario to confirm whether "bulk insert" would also work for "update", but I got an error telling me that doc with ID xxx already existed.

We do indeed use bulk insert in other parts of our system where we're batch inserting new entities and the optimizations are clearly beneficial there. But in this case they're all updates.

Oren Eini (Ayende Rahien)

Mar 24, 2015, 10:29:57 AM
to ravendb
You have bulk insert options where you can tell it to allow overwrites.
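Along the lines of (a sketch; the option name matches the checkForUpdates=true flag that shows up in the server console output later in this thread):

using (var bulkInsert = _catalogStore.BulkInsert(options: new BulkInsertOptions
{
    CheckForUpdates = true   // treat existing document IDs as updates instead of failing
}))
{
    // ... Store() calls as before ...
}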

Cathal McHale

Mar 24, 2015, 10:32:27 AM
to rav...@googlegroups.com
Oh cool! Thanks a lot. I'll give this a go and get back :)


Cathal McHale

Mar 26, 2015, 6:42:00 AM
to rav...@googlegroups.com
Oren,

I'm using a stream and a bulk insert now. Testing on the smaller dataset passed fine - 200k docs in around 10 minutes. However, testing it on the large collection (1.5M docs) fails. Twice now I've got the following exception (on two different servers). The latest time, the exception occurred after about 50 minutes of processing:

System.OperationCanceledException was unhandled by user code
  HResult=-2146233029
  Message=The operation was canceled.
  Source=mscorlib
  StackTrace:
       at System.Threading.CancellationToken.ThrowOperationCanceledException()
       at System.Threading.CancellationToken.ThrowIfCancellationRequested()
       at Raven.Client.Document.RemoteBulkInsertOperation.WriteQueueToServer(Stream stream, BulkInsertOptions options, CancellationToken cancellationToken) in c:\Builds\RavenDB-Stable\Raven.Client.Lightweight\Document\RemoteBulkInsertOperation.cs:line 213
       at Raven.Client.Document.RemoteBulkInsertOperation.<>c__DisplayClass1.<StartBulkInsertAsync>b__0() in c:\Builds\RavenDB-Stable\Raven.Client.Lightweight\Document\RemoteBulkInsertOperation.cs:line 134
       at System.Threading.Tasks.Task.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
  InnerException: 


This corresponds to the following method in RemoteBulkInsertOperation.cs. A cancel has been requested, and the code throws an exception when that is the case.
Any idea why the op is being cancelled? Any further info I can gather?

private void WriteQueueToServer(Stream stream, BulkInsertOptions options, CancellationToken cancellationToken)
{
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();

        var batch = new List<RavenJObject>();
        RavenJObject document;
        while (queue.TryTake(out document, 200))
        {
            cancellationToken.ThrowIfCancellationRequested();

            if (document == null) // marker
            {
                FlushBatch(stream, batch);
                return;
            }

            batch.Add(document);

            if (batch.Count >= options.BatchSize)
                break;
        }

        FlushBatch(stream, batch);
    }
}

Oren Eini (Ayende Rahien)

Mar 26, 2015, 7:31:09 AM
to ravendb
You can increase the batch size for the bulk insert in the options.

Cathal McHale

Mar 26, 2015, 10:38:59 AM
to rav...@googlegroups.com
That didn't work, I'm afraid. I tried doubling the batch size to 1024, but got exactly the same error.
Then I tried 8000. This time I got a different error (I think... I didn't have the debugger attached this time, so it could be the same error later in the stack?):

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task)
   at Raven.Client.Document.RemoteBulkInsertOperation.Dispose() in c:\Builds\RavenDB-Stable\Raven.Client.Lightweight\Document\RemoteBulkInsertOperation.cs:line 319

Which would be this method:

public void Dispose()
{
    if (disposed)
        return;

    var disposeAsync = DisposeAsync().ConfigureAwait(false);
    disposeAsync.GetAwaiter().GetResult();
}


Any ideas?

Oren Eini (Ayende Rahien)

Mar 26, 2015, 10:40:05 AM
to ravendb
Not sure; please create a test case that we can look at.

Cathal McHale

Mar 26, 2015, 10:44:17 AM
to rav...@googlegroups.com
OK. One other thing: this is what was spat out on the console immediately after the exception:

Failed to generate log message Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Database prosys_FullTrackingCatalog_catalog_DEV, had no incoming requests idle for 00:40:17.9779806, trying to shut it down
Failed to generate log message Index (zero based) must be greater than or equal to zero and less than the size of the argument list.

Failed to generate log message Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Database prosys_FullTrackingCatalog_catalog_DEV, had no incoming requests idle for 00:41:18.3731221, trying to shut it down
Failed to generate log message Index (zero based) must be greater than or equal to zero and less than the size of the argument list.

Failed to generate log message Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Request #  54: POST    - 2,499,226 ms - prosys_FullTrackingCatalog_catalog_DEV - 200 - /bulkInsert?checkForUpdates=true&operationId=ea411724-978a-4920-94f2-450e86797b3f
        Bulk inserted received 374,161 documents in 00:41:39.2313915, task #: 6

Request #  55: GET     -     0 ms - prosys_FullTrackingCatalog_catalog_DEV - 200 - /changes/config?id=1/1AdF3C&command=unwatch-doc&value=catalogmanifests/1

Cathal McHale

Mar 31, 2015, 7:40:14 AM
to rav...@googlegroups.com
[ Version store out of memory (cleanup already attempted) ]


I tried a few things to pinpoint this issue and here's what I found:

* The original problem, that you guys solved for me, was "Reading from the stream too slowly".
- This was validated by removing all work after the read and simply reading all docs in the collection. This worked fine and took around half an hour to read all docs.

* On your advice I then configured a BulkInsertOperation (rather than the session I had) to allow updates, and increased the batch size to 2000. 512 is the default, and in the test environment the batches being sent appear to be between 800 and 1,500, so specifying 2000 seems reasonable.
- This causes the process to fail, always after around 350k docs have been stored by the BulkInsertOperation.

* I worked on the theory that perhaps I was seeing a different manifestation of the original problem i.e. the BulkInsertOperation was slowing down the read.
- This is not the case. For testing purposes I buffered locally in a ConcurrentQueue to allow the read to progress as fast as possible and had the BulkInsertOperation drain this queue as fast as possible.
- This didn't help and the results were the same - the process fails, always after around 350k docs have been stored by the BulkInsertOperation.

* I then decided to dispose and re-establish the BulkInsertOperation every 100k stores. 
- Interestingly this failed at exactly the same point - after 350k docs - even though that's only 50k into the fourth iteration.

* So then I started looking into the RavenDb code base. The error mentioned that the bulk insert task was cancelled, so I targeted the code where the CancellationTokenSource is cancelled. There is a single place in the code base.
- It didn't tell me much, but it did point out that there was an error notification that I could tap into to try to get some more info. So I added the following (I really have to add the Rx extensions soon):
- (Note: I have indexing toggled to "Paused" for testing purposes.)

IDisposable LogBulkInsertErrors(BulkInsertOperation bulkSession)
{
    return _ravenDb
        .Changes(_catalogStore.GetDbName())
        .ForBulkInsert(bulkSession.OperationId)
        .Subscribe(this);
}

void IObserver<BulkInsertChangeNotification>.OnNext(BulkInsertChangeNotification change)
{
    if (change.Type == DocumentChangeTypes.BulkInsertError)
    {
        _logger.Log(LoggingLevel.Error, "Bulk insert error thrown");
        _logger.Log(LoggingLevel.Error, change.Message);
    }
}


And the error getting logged out is:

ERROR Pic.Integrations.PricingIntegration - Version store out of memory (cleanup already attempted)
2015-03-31 10:17:27,518 [23] ERROR Pic.Integrations.PricingIntegration - Reprice job 3ad8190d-24c5-4274-b5e4-5deff3ec7301. Exception thrown in Catalog Reprice bulk store task. Status = Failed.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task)
   at Raven.Client.Document.RemoteBulkInsertOperation.Dispose() in c:\Builds\RavenDB-Stable\Raven.Client.Lightweight\Document\RemoteBulkInsertOperation.cs:line 319
   at Pic.IntegrationServices.Pricing.CatalogRepriceHandler.<>c__DisplayClass8.<Reprice>b__6() in c:\dev\Interconnect\trunk\src\Pic.IntegrationServices\Pricing\CatalogRepriceHandler.cs:line 154


Any help / ideas greatly appreciated.

Thanks,
Cathal.

Oren Eini (Ayende Rahien)

Mar 31, 2015, 9:25:21 AM
to ravendb
If you increase the value too much, you may be asking the server to save a LOT of data at once.
You can ask it to increase the internal buffers for this using Raven/Esent/MaxVerPages.

What is likely going on is that there is a lot of work happening on the server: both writing documents and map/reduce indexing compete for the transaction buffer size. You need to increase it.
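For reference, that is an appSettings entry in Raven.Server.exe.config (or the web.config of an IIS-hosted instance); the value below is only an example, not a recommendation:

<add key="Raven/Esent/MaxVerPages" value="1024" />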


Cathal McHale

Mar 31, 2015, 11:01:58 AM
to rav...@googlegroups.com
Thanks Oren.

You're exactly right. I tweaked BulkInsert batch size to 900 and increased the MaxVerPages to 1024 and it got through about a million docs (up from 350k) before barfing.
This is a local dev VM with purposely limited resources.

I appreciate now the balance between getting more work done vs. RavenDB, as a server app, having to protect itself from unreasonable demands.

Ultimately we definitely need to look at our modeling of "pricing" being part of our big, clunky Product entity.
But thanks for your help and I believe that I could definitely get this to work such that it:
- is faster than it is currently
- processes all 1.5M docs
- doesn't put an unacceptable demand on the server.

Cheers,
Cathal.

Kamran Ayub

Apr 15, 2015, 12:44:29 PM
to rav...@googlegroups.com
Dang, I'm getting the task canceled error too during batching and this helps--I can probably solve it now. I wish the error was more specific :(