Question about IDocumentSession.Advanced.Stream performance


Valeriob

Sep 22, 2016, 3:33:30 AM
to RavenDB - 2nd generation document database

Hi,
We have some use cases where we use the IDocumentSession.Advanced.Stream API. It always felt a bit slow, but I could never be sure without measuring it, so I prepared a small test to see the difference.
We stream 20k documents with that API and compare it with streaming the same data from a SQL table with this schema:
CREATE TABLE [dbo].[EntityDb] (
    [Id]       VARCHAR (128) NOT NULL,
    [Document] VARCHAR (MAX) NULL
);

There is a 7x difference in favor of the SQL implementation, and I'm wondering whether I'm doing something wrong or whether there is room for improvement. I tried to analyze that part of the RavenDB client source code, but I could not easily tell what's going on. I'll try harder when I find the time ;D


The attached file contains the code and a dotTrace session. https://1drv.ms/u/s!AqvjZP1RsulxncQX-OnkMX_kY3KIBQ


Thanks
Valerio

Oren Eini (Ayende Rahien)

Sep 22, 2016, 4:40:58 AM
to ravendb
The likely reason is that you are doing things very differently.
You are streaming from an index, which has a cost of O(N log N), vs. reading from a table, which has a cost of O(N).
The rest relates to deserialization, getting the right type, running listeners, etc.


Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Valeriob

Sep 22, 2016, 8:13:58 AM
to RavenDB - 2nd generation document database
I thought the same,
but what made me wonder is that it's slow only via the C# API; if I export from Raven Studio I get results in line with the SQL example.
So maybe there is some inefficiency that can be pruned.

Thanks,
Valerio

Oren Eini (Ayende Rahien)

Sep 22, 2016, 8:14:50 AM
to ravendb
Are you running in debug?

Valeriob

Sep 22, 2016, 8:24:05 AM
to RavenDB - 2nd generation document database



I found no difference when running in release.

Oren Eini (Ayende Rahien)

Sep 22, 2016, 8:50:26 AM
to ravendb
With or without debugger attached?

Valeriob

Sep 22, 2016, 8:59:15 AM
to RavenDB - 2nd generation document database
Without ofc :D

Grisha Kotler

Sep 22, 2016, 10:46:15 AM
to rav...@googlegroups.com
Can you send us the database export?

Hibernating Rhinos Ltd

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647

RavenDB paving the way to "Data Made Simple" http://ravendb.net/


Valeriob

Sep 22, 2016, 10:52:02 AM
to RavenDB - 2nd generation document database
Hi,
If you run the project, it does everything (just configure the RavenDB and SQL Server connection strings).

Valerio



Grisha Kotler

Sep 26, 2016, 2:39:06 PM
to rav...@googlegroups.com
Hi Valerio,

This behavior is mostly due to deserialization.
Using LoadStartingWith provides better performance than streaming from an index.
However, you'll have to do it in batches.
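
A minimal sketch of the batching described above, assuming a page-fetching delegate stands in for the RavenDB call. `BatchedLoad`, `ReadAll`, and `fetchPage` are hypothetical names, not part of the RavenDB client; the loop only illustrates the paging logic.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedLoad
{
    // fetchPage(start, pageSize) returns one page of documents.
    // A page shorter than pageSize is treated as the last one.
    public static IEnumerable<T> ReadAll<T>(Func<int, int, T[]> fetchPage, int pageSize)
    {
        for (int start = 0; ; start += pageSize)
        {
            T[] page = fetchPage(start, pageSize);

            foreach (T document in page)
                yield return document;

            if (page.Length < pageSize)
                yield break; // short (or empty) page: nothing left to read
        }
    }
}
```

With a 3.x session, the delegate could wrap something like `(start, size) => session.Advanced.LoadStartingWith<MyDocument>("mydocuments/", null, start, size)`; the `MyDocument` type and the `mydocuments/` key prefix are assumptions for illustration. Each page is a separate request, so the session's request limit may need attention for large sets.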


Valeriob

Sep 27, 2016, 12:49:00 PM
to RavenDB - 2nd generation document database
Hi Grisha,
I do not have enough knowledge of RavenDB to experiment with the code, but two things stand out from the performance trace: the very high cost of deserializing a string to RavenJObject, and the cost of transforming RavenJObject to <T>. Both are very high compared to straight Newtonsoft deserialization, but I could not yet figure out how much time is spent waiting for I/O (dotTrace says that, in aggregate, 37% of the time is spent in System.IO).

Following the async trail from IDocumentSession.Stream, I ended up at TextReader.ReadAsyncInternal, which (SURPRISE) calls Task.Factory.StartNew :D I was kind of expecting it, since there is no way to read from streams async (see David Fowler's https://github.com/davidfowl/Channels ). I wonder if it would be worth implementing a JsonTextReaderAsync.

        internal virtual Task<int> ReadAsyncInternal(char[] buffer, int index, int count)
        {
            var tuple = new Tuple<TextReader, char[], int, int>(this, buffer, index, count);
            return Task<int>.Factory.StartNew(state =>
            {
                var t = (Tuple<TextReader, char[], int, int>)state;
                return t.Item1.Read(t.Item2, t.Item3, t.Item4);
            },
            tuple, CancellationToken.None, TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);
        }

Just curious: is the protocol used by the RavenDB client going to change in RavenDB 4.0? Do you think the performance of the stream API will improve in RavenDB 4.0? It's quite a common scenario in LOB applications, and a 7x performance impact is quite severe vs. a relational implementation :D

Thanks,
Valerio



Oren Eini (Ayende Rahien)

Sep 27, 2016, 1:13:34 PM
to ravendb
Yes, streaming changed in 4.0, we are much faster.


Valeriob

Sep 28, 2016, 4:50:18 AM
to RavenDB - 2nd generation document database
Awesome ! :D

Valeriob

Sep 29, 2016, 3:19:53 AM
to RavenDB - 2nd generation document database
Hi,
Just because I'm curious, I tried replacing class YieldStreamResults : IAsyncEnumerator<RavenJObject> with class YieldStreamResults2 : IEnumerator<RavenJObject> (quick and dirty, just to see what happens), and performance just doubled :D I may have broken the world, though ^ ^
I think async and Stream really do not get along.

Valerio
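
To get a feel for why switching from an async to a sync enumerator could matter, here is a toy comparison, not the RavenDB code. It assumes that every async step is scheduled on the thread pool, similar to what TextReader.ReadAsyncInternal does in the snippet quoted earlier; `EnumerationCost` and its methods are made-up names, and the real client does far more work per document.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class EnumerationCost
{
    // Synchronous step: a plain method call, no allocation.
    private static bool MoveNext(ref int position, int count)
    {
        position++;
        return position <= count;
    }

    // Asynchronous step: each call allocates a Task and queues a work item
    // on the thread pool, mimicking a sync read wrapped in a task.
    private static Task<bool> MoveNextAsync(int position, int count)
    {
        return Task.Run(() => position <= count);
    }

    public static int CountSync(int count)
    {
        int position = 0, seen = 0;
        while (MoveNext(ref position, count)) seen++;
        return seen;
    }

    public static async Task<int> CountAsync(int count)
    {
        int position = 0, seen = 0;
        while (await MoveNextAsync(++position, count)) seen++;
        return seen;
    }

    // Times both loops over the same number of items.
    public static void Report(int count)
    {
        var watch = Stopwatch.StartNew();
        CountSync(count);
        Console.WriteLine("sync:  {0:F2} ms", watch.Elapsed.TotalMilliseconds);

        watch.Restart();
        CountAsync(count).GetAwaiter().GetResult();
        Console.WriteLine("async: {0:F2} ms", watch.Elapsed.TotalMilliseconds);
    }
}
```

Calling `EnumerationCost.Report(20000)` prints the two timings. The absolute numbers depend on the machine, so none are quoted here.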

Oren Eini (Ayende Rahien)

Sep 29, 2016, 2:18:10 PM
to ravendb
Can you share the change you made?



Valeriob

Sep 30, 2016, 5:34:58 AM
to RavenDB - 2nd generation document database

Oren Eini (Ayende Rahien)

Sep 30, 2016, 11:27:45 AM
to ravendb
Blah,
Looks like a case of this:

I'm looking into reverting the IAsyncEnumerator entirely back to IEnumerator, because of this perf difference.

However, it does mean a breaking change in the API.

That said, it is twice as fast, so it is a great incentive. Thoughts?

Valeriob

Sep 30, 2016, 12:50:24 PM
to RavenDB - 2nd generation document database

Wow, nice!
Thanks for confirming it, I was afraid I was missing something :D Async is really a beast to tame!
Well, whoever uses the sync version won't break. For IAsyncAdvancedSessionOperation we could:
1) break the clients
2) provide a new method with a different return type and an optimized code path, and mark the old methods with an obsolete attribute
3) try to keep the async API on top of the sync implementation and measure the performance impact; if it's not huge, we could live with that.

Valerio
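
Options 2 and 3 from the list above could look roughly like this. Everything here is hypothetical: `IStreamingOperations`, `StreamSync`, and `SyncBackedAsyncEnumerator` are illustrative names and not the RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Option 3: keep an async-shaped enumerator, but drive it with a sync one.
public sealed class SyncBackedAsyncEnumerator<T> : IDisposable
{
    // Already-completed tasks are reused, so MoveNextAsync allocates nothing
    // and awaiting it continues without a thread switch.
    private static readonly Task<bool> HasMore = Task.FromResult(true);
    private static readonly Task<bool> Finished = Task.FromResult(false);

    private readonly IEnumerator<T> inner;

    public SyncBackedAsyncEnumerator(IEnumerator<T> inner)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        this.inner = inner;
    }

    public T Current { get { return inner.Current; } }

    // The underlying MoveNext may block on I/O; that is the trade-off.
    public Task<bool> MoveNextAsync()
    {
        return inner.MoveNext() ? HasMore : Finished;
    }

    public void Dispose() { inner.Dispose(); }
}

// Option 2: add a new sync method and mark the old async one obsolete.
public interface IStreamingOperations<T>
{
    [Obsolete("Use StreamSync, which avoids the per-item async overhead.")]
    Task<SyncBackedAsyncEnumerator<T>> StreamAsync();

    IEnumerator<T> StreamSync();
}
```

The catch with option 3 is that a caller awaiting `MoveNextAsync` can still block inside `MoveNext`, so the impact would need measuring, as suggested in the list.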

Jahmai Lay

Oct 2, 2016, 10:01:44 PM
to RavenDB - 2nd generation document database
As long as async code never causes a block on IO it doesn't matter.
Will it cause a block on IO? :)

Oren Eini (Ayende Rahien)

Oct 3, 2016, 8:35:17 AM
to ravendb
Not sure that I'm following?

The problem is that the async machinery is heavy if you don't actually need to block.


Oren Eini (Ayende Rahien)

Oct 3, 2016, 8:36:37 AM
to ravendb
Yes, I might change it up just for the sync part.
