RavenDB stream not returning all results


Matthias De Ridder

Jul 29, 2014, 9:08:56 AM
to rav...@googlegroups.com
What could cause a RavenDB stream not to return all results?

We have a collection of 65000 documents we need to export to a file. The creation of this file is part of a large operation involving the creation of 19 files in total, a lot of queries to RavenDB, and business validation checks. Therefore, we start a new Task to generate the files that can be separated from the main business validation and file generation. The file with the 65000 documents is one of them. The documents are queried using a RavenDB stream. What happens now is that we don't get the 65000 documents back: sometimes 20000, 40000, 63000 ... Random counts, but never the complete number of documents.

When we don't start a new thread to generate the file, the stream returns the 65000 documents. We tried to write a unit test for this, but we couldn't reproduce it in a simple test. As said, this occurs while running a large operation, and we don't know what exactly is causing this behavior.

Chris Marisic

Jul 29, 2014, 9:30:36 AM
to rav...@googlegroups.com
Sessions are not thread-safe; you're likely losing access to the stream by mishandling the session.

Just don't use Raven in a background thread. Structure your code more like this:

Main Thread / Request Thread / Worker Thread {

    // new Task: background stuff 1
    // new Task: background stuff 2

    // RavenDB export (on this thread)

    Task.WaitAll(tasks)
}

Matthias De Ridder

Jul 29, 2014, 9:38:37 AM
to rav...@googlegroups.com
That's not the case in our scenario. We're not sharing sessions amongst different threads. The thread that queries the stream and exports the file starts a new session.

On Tuesday, July 29, 2014 at 15:30:36 UTC+2, Chris Marisic wrote:

Chris Marisic

Jul 29, 2014, 10:15:39 AM
to rav...@googlegroups.com
Something is wrong in your usage. My advice is just ditch threading, you're not as slick as you think you are. If you seriously need that level of concurrency & throughput move to a queuing architecture or message bus, don't just try to wing it with threads. Threads will clobber you every time.

Matthias De Ridder

Jul 29, 2014, 10:39:10 AM
to rav...@googlegroups.com
This doesn't solve our issue. Stream behavior should be the same when using threads or not.

On Tuesday, July 29, 2014 at 16:15:39 UTC+2, Chris Marisic wrote:

Chris Marisic

Jul 29, 2014, 1:44:09 PM
to rav...@googlegroups.com
I'm firmly in the camp that you are causing your own problems. You specifically state:

> When we don't start a new thread to generate the file, the stream returns the 65000 documents.

Try to replicate your behavior using a stripped down model: http://ravendb.net/docs/2.0/samples/raven-tests/createraventests

Oren Eini (Ayende Rahien)

Jul 30, 2014, 4:06:53 AM
to ravendb
Can you answer Kijana's questions regarding deployment?
What does your code look like?



Oren Eini

CEO


Mobile: + 972-52-548-6969

Office:  + 972-4-622-7811

Fax:      + 972-153-4622-7811






Matthias De Ridder

Jul 31, 2014, 2:15:41 AM
to rav...@googlegroups.com
Our code looks like this:

var sendOtherData = Task.Factory
    .StartNew(() => { /* create the file that should contain 65000 elements using a RavenDB stream */ })
    .ContinueWith(_ => { /* create another file */ })
    .ContinueWith(_ => { /* create another file */ })
    .ContinueWith(_ => { /* create another file */ });

try
{
    // Main business logic with validation, which also creates numerous files.
}
finally
{
    // Whatever happens, always wait for sendOtherData to complete.
    sendOtherData.Wait();
}

On Wednesday, July 30, 2014 at 10:06:53 UTC+2, Oren Eini wrote:

Oren Eini (Ayende Rahien)

Jul 31, 2014, 2:26:15 AM
to ravendb
Great, now I need to see the actual code that interacts with RavenDB.

Note that if the main business code does work on the db, you might be running into stale indexes, or just data that was updated after the streaming began.

Matthias De Ridder

Jul 31, 2014, 2:37:01 AM
to rav...@googlegroups.com
The query we use to collect the data from Raven is this:

IEnumerable<T> GetAllOfType<T>()
{
    string startsWith = EntityIdentityPolicy.FindFullTypeTagName(typeof(T)) + "/";

    using (var session = CreateSession())
    {
        using (var enumerator = session.Advanced.Stream<T>(startsWith))
        {
            while (enumerator.MoveNext())
            {
                yield return enumerator.Current.Document;
            }
        }
    }
}
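
A note on how this method behaves: because it uses yield return inside a using block, nothing runs until the caller starts enumerating, and the session stays open until enumeration ends. The sketch below illustrates that with a stand-in class; FakeSession and LazyStreamDemo are hypothetical names for illustration only and are not part of RavenDB.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a session; it only tracks whether it is currently open.
sealed class FakeSession : IDisposable
{
    public static int Open;              // number of sessions currently open
    public FakeSession() { Open++; }
    public void Dispose() { Open--; }
}

static class LazyStreamDemo
{
    // Same shape as GetAllOfType<T>: yield return inside a using block.
    public static IEnumerable<int> GetAll()
    {
        using (new FakeSession())
        {
            for (int i = 0; i < 3; i++)
                yield return i;
        }
    }

    public static void Main()
    {
        IEnumerable<int> docs = GetAll();
        Console.WriteLine(FakeSession.Open);     // 0: the method body has not run yet

        foreach (int doc in docs)
            Console.WriteLine(FakeSession.Open); // 1: the session is open while enumerating

        Console.WriteLine(FakeSession.Open);     // 0: disposed once enumeration ends
    }
}
```

In other words, the session is created and used on whichever thread enumerates the result, not on the thread that called the method.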



On Thursday, July 31, 2014 at 08:26:15 UTC+2, Oren Eini wrote:

Oren Eini (Ayende Rahien)

Jul 31, 2014, 2:42:11 AM
to ravendb
Okay, I'm not going to be able to figure this out if I get the information piecemeal.
Please create a failing test.

Matthias De Ridder

Jul 31, 2014, 3:03:49 AM
to rav...@googlegroups.com
We tried to write a unit test for this, but we couldn't reproduce it in a simple test. As said, this occurs while running a large operation, and we don't know what exactly is causing this behavior.

On Thursday, July 31, 2014 at 08:42:11 UTC+2, Oren Eini wrote:

Oren Eini (Ayende Rahien)

Jul 31, 2014, 3:04:46 AM
to ravendb
Then I'll need to see the full code.

Jared Kells

Jul 31, 2014, 3:08:09 AM
to rav...@googlegroups.com
> Something is wrong in your usage. My advice is just ditch threading, you're not as slick as you think you are.

My advice is that you Chris ditch the attitude.

Matthias De Ridder

Jul 31, 2014, 3:15:52 AM
to rav...@googlegroups.com
To answer Kijana's questions, we are running build 2851, both client and server. The server is hosted in IIS. We execute the method using a console application.

On Thursday, July 31, 2014 at 09:04:46 UTC+2, Oren Eini wrote:

Chris Marisic

Jul 31, 2014, 8:51:49 AM
to rav...@googlegroups.com
It's not an attitude, it's pragmatism. Writing custom threading inside of ASP.NET is almost always wrong. All it does is lead to problems and rarely results in meaningful performance increases. Generally it's premature optimization.

If you doubt me, look at reality here. It's exactly what I said: it's working fine without it. Whatever they're doing is resulting in faults, likely from their system doing something that isn't thread-safe. Writing thread-safe code is absurdly hard and should be avoided unless you literally have zero other options.

Even take RavenDB. Some of the core parts are single threaded! I can't recall if Oren managed to multi-thread some of the writers and other low level pieces or not. There's just so much to threading that it is never a trivial undertaking.

Inside a system there are many solutions better than threading. The place for threading is in the servers themselves, such as IIS, RavenDB, SQL Server, etc. Those need threading. Your web application / enterprise application does not need threading.

Kijana Woodard

Jul 31, 2014, 10:01:27 AM
to rav...@googlegroups.com
"Your web application / enterprise application does not need threading."
More specifically, your web application is already multi-threaded. Each request runs in its own thread. One has to ask whether you now want to use 2 threads to process a request.

"Writing threadsafe code is absurdly hard and should be avoided"
Yup.

"I can't recall if Oren managed to multi-thread some of the writers"
IIRC, the Voron engine for 3.0 uses a single writer. People sometimes mistake multi-threaded for "faster". It's parallel [maybe], which may or may not be "faster".

IIS, generally, doesn't "like" long running requests. I'm wondering if there is a better approach to the entire problem, but that may be beyond the scope of this thread.

If you can't repro this against an in memory server, have you tried running the test against a server hosted in IIS? Instead of using the method from test helpers, point at a server url [and explain the setup so others can repro].



Jared Kells

Jul 31, 2014, 7:27:21 PM
to rav...@googlegroups.com

I'm not disagreeing with you but your attitude is inappropriate. "You're not as slick as you think you are"


Oren Eini (Ayende Rahien)

Aug 1, 2014, 1:47:36 AM
to ravendb
To be rather more exact, the core RavenDB engine used to take a lock on writes (for Esent).
That was annoying to me personally, so we removed that limitation, even though no user ever complained about it.
This decision led to a lot of issues down the line, because we rely on etag ordering, and that was no longer serial. We spent some time fixing this, but eventually we backed out of the change.

Voron is a single writer conceptually, but that isn't how you use it. Voron's use of WriteBatch means that you can do concurrent multiple writes, and submit them all to be transaction merged. In effect, you have concurrency for writes.








Matthias De Ridder

Aug 1, 2014, 9:42:04 AM
to rav...@googlegroups.com
After a lot of debugging and monitoring the responses using Fiddler, we found the reason for our problem. For some reason, the httpRuntime executionTimeout setting in our Raven config had been deleted, so the default timeout of 110 seconds applied. The server generated a timeout exception, which was included in the JSON that was sent back as the response. On the client, there is no notification at all that an error occurred. I’ve included a response where you can see the HTML exception. We think this is a bug in Raven, because the client should throw an exception instead of silently interrupting the stream.
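
For anyone hitting the same thing: in ASP.NET the attribute is executionTimeout on the httpRuntime element, in the web.config of the IIS-hosted server. The value is in seconds, and per the ASP.NET documentation it is only enforced when compilation debug is false. A minimal sketch follows; the 3600 is a hypothetical value chosen for illustration, not the value from our config.

```xml
<configuration>
  <system.web>
    <!-- executionTimeout is in seconds (ASP.NET default: 110).
         3600 is a hypothetical value, not the one used in this thread. -->
    <httpRuntime executionTimeout="3600" />
  </system.web>
</configuration>
```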

On Friday, August 1, 2014 at 07:47:36 UTC+2, Oren Eini wrote:
Response with exception.txt

Chris Marisic

Aug 1, 2014, 10:20:19 AM
to rav...@googlegroups.com
You stated it worked fine without the threading. What is the disconnect between these statements? Why would it time out in only one case, not both?

Are you sure you're handling exceptions properly inside your threads that it actually was raising an exception and you were losing it?
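
To make the question concrete, here is a minimal sketch of how a failure can go unnoticed, assuming a ContinueWith chain shaped like the sendOtherData code posted earlier. ContinuationDemo and failingExport are hypothetical names; the failing step only stands in for the export and says nothing about what the real code does.

```csharp
using System;
using System.Threading.Tasks;

static class ContinuationDemo
{
    public static void Main()
    {
        // Hypothetical first step that fails, standing in for the streaming export.
        Action failingExport = () => { throw new InvalidOperationException("export failed"); };

        Task first = Task.Factory.StartNew(failingExport);
        Task last = first.ContinueWith(_ => { /* create another file */ });

        last.Wait();                          // does not throw: the continuation itself succeeded
        Console.WriteLine(first.IsFaulted);   // True
        Console.WriteLine(last.IsFaulted);    // False

        try
        {
            Task.WaitAll(first, last);        // waiting on every task surfaces the failure
        }
        catch (AggregateException ex)
        {
            Console.WriteLine(ex.Flatten().InnerExceptions[0].Message);   // export failed
        }
    }
}
```

Waiting only on the last continuation reports only that continuation's outcome, so an exception from an earlier step has to be checked separately, for example through IsFaulted on the antecedent or by waiting on each task.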

Oren Eini (Ayende Rahien)

Aug 2, 2014, 6:00:10 AM
to ravendb
We'll make sure to have a good error there: http://issues.hibernatingrhinos.com/issue/RavenDB-2573

Matthias De Ridder

Aug 4, 2014, 2:40:38 AM
to rav...@googlegroups.com
We think it's the load on the server. As said, the main operation is quite heavy with database access and validation. If we don't use threads, the files are exported before the main operation starts.

On Friday, August 1, 2014 at 16:20:19 UTC+2, Chris Marisic wrote: