Extremely slow put when executed on remote server


Asger Hallas

unread,
Sep 14, 2012, 4:18:28 AM9/14/12
to rav...@googlegroups.com
Hi

The last couple of days, we've been trying to track down a performance issue in our application - which only occurs in production. 

Now we have nailed it down to this: whenever we connect to a database on a remote machine (same LAN, same subnet), a PUT takes about ten times as long to execute as when connecting to a database on localhost.
GETs are not affected. I can see in the server log that the time is spent on the server.

It makes no difference whether I'm connecting against IIS or the console server. I'm running build 960, but it makes no difference if I use the newest unstable.

The total request times that I've measured are (making 10 calls using the regular DocumentStore's DatabaseCommands (Put and Get), no request caching, using anonymous access):

Getting from localhost
That took 1 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
That took 0 ms.
Which sums up to 13 ms.

Getting from IIS@Remote
That took 1 ms.
That took 1 ms.
That took 1 ms.
That took 1 ms.
That took 1 ms.
That took 1 ms.
That took 1 ms.
That took 0 ms.
That took 1 ms.
That took 1 ms.
Which sums up to 14 ms.

Putting to localhost
That took 96 ms.
That took 5 ms.
That took 5 ms.
That took 3 ms.
That took 4 ms.
That took 4 ms.
That took 3 ms.
That took 7 ms.
That took 7 ms.
That took 3 ms.
Which sums up to 267 ms.

Putting to IIS@Remote
That took 303 ms.
That took 209 ms.
That took 200 ms.
That took 218 ms.
That took 217 ms.
That took 202 ms.
That took 217 ms.
That took 202 ms.
That took 217 ms.
That took 217 ms.
Which sums up to 2334 ms.

The times are very much the same server-side. The document is 588 bytes.
The test times are consistent both on our development setup and in our production environment.
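For reference, the measurement loop is roughly this (a simplified sketch; the document key and body are placeholders, and the exact client calls are from memory of the 1.x API):

    using System;
    using System.Diagnostics;
    using Raven.Client.Document;
    using Raven.Json.Linq;

    class PutGetTimer
    {
        static void Main()
        {
            using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
            {
                // Roughly 588 bytes of JSON in the real test; a small stand-in here.
                var doc = RavenJObject.Parse(@"{ ""Name"": ""test"" }");

                var total = Stopwatch.StartNew();
                for (var i = 0; i < 10; i++)
                {
                    var sw = Stopwatch.StartNew();
                    // The GET runs use store.DatabaseCommands.Get(key) in the same loop.
                    store.DatabaseCommands.Put("docs/timing-test", null, doc, new RavenJObject());
                    Console.WriteLine("That took {0} ms.", sw.ElapsedMilliseconds);
                }
                Console.WriteLine("Which sums up to {0} ms.", total.ElapsedMilliseconds);
            }
        }
    }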

Does this make sense to anyone?

Regards, Asger

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 4:24:58 AM9/14/12
to rav...@googlegroups.com
We recently had a bug that we fixed with regards to buffering output.
I guess it is possible that we have a similar issue with input, but I don't think this is likely.
Especially since this is inside IIS that you are talking about.
When you are running this over the network using the built-in server, not IIS, what are the results?

Asger Hallas

unread,
Sep 14, 2012, 4:28:18 AM9/14/12
to rav...@googlegroups.com
Those are very much the same as with IIS:

Console@Remote
That took 289 ms.
That took 205 ms.
That took 210 ms.
That took 213 ms.
That took 218 ms.
That took 202 ms.
That took 217 ms.
That took 217 ms.
That took 202 ms.
That took 217 ms.
Which sums up to 2329 ms.

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 4:29:12 AM9/14/12
to rav...@googlegroups.com
Can you adapt this sample program and try it out?

Asger Hallas

unread,
Sep 14, 2012, 4:41:46 AM9/14/12
to rav...@googlegroups.com
I've run it now with 1024 chunks getting these times:

localhost: 0,942s
remote: 1,466s

If it were a network problem, wouldn't it be strange that it only affects PUTs?

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 4:42:35 AM9/14/12
to rav...@googlegroups.com
Just to make sure, you are reading all of the put info in the program, right?

And it might indicate that you have an inspecting firewall in the middle.

Asger Hallas

unread,
Sep 14, 2012, 4:46:30 AM9/14/12
to rav...@googlegroups.com
I'm sorry, I'm not sure I understand what you mean by "reading all the put info in the program"?

Asger Hallas

unread,
Sep 14, 2012, 4:52:38 AM9/14/12
to rav...@googlegroups.com
The complete output from the RavenDB server console is:

Request #  15: PUT     -   207 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  16: PUT     -   202 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  17: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  18: PUT     -   210 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  19: PUT     -   206 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  20: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  21: PUT     -   200 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  22: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  23: PUT     -   207 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  24: PUT     -   209 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57

There is only the Windows firewall, and that is disabled on both machines.


Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 4:53:41 AM9/14/12
to rav...@googlegroups.com
Are you using the program AS IS?

Asger Hallas

unread,
Sep 14, 2012, 4:55:19 AM9/14/12
to rav...@googlegroups.com
Yes, no changes to the sample program from your homepage. Requesting with curl from my machine to the remote machine. The program runs on the remote machine of course.

Asger Hallas

unread,
Sep 14, 2012, 4:57:57 AM9/14/12
to rav...@googlegroups.com
Output is more precisely: 4 mb in 1.489



Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 4:58:30 AM9/14/12
to rav...@googlegroups.com
Okay, change it to be like this:

using System;
using System.Diagnostics;
using System.IO;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        var listener = new HttpListener
        {
            Prefixes = { "http://+:8080/" }
        };
        listener.Start();
        while (true)
        {
            var context = listener.GetContext();
            var sp = Stopwatch.StartNew();

            // Read the whole request body, reply "OK", and report the size and elapsed time.
            var memoryStream = new MemoryStream();
            context.Request.InputStream.CopyTo(memoryStream);
            context.Response.OutputStream.Write(new[] { (byte)'O', (byte)'K' }, 0, 2);
            context.Response.OutputStream.Flush();
            context.Response.Close();
            Console.WriteLine("{0:#,#.##;;0} mb in {1:#,#}",
                Math.Round(((double)memoryStream.Length / 1024) / 1024, 2),
                sp.ElapsedMilliseconds);
        }
    }
}

Asger Hallas

unread,
Sep 14, 2012, 5:07:10 AM9/14/12
to rav...@googlegroups.com
Oh, that was what you meant with adapt :)

Output is: 4,36 mb in 48 

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:08:38 AM9/14/12
to rav...@googlegroups.com
I am not sure that I am following.
What is the speed when you send a 1 KB data to the server locally? What is it remote?

Asger Hallas

unread,
Sep 14, 2012, 5:14:53 AM9/14/12
to rav...@googlegroups.com
With 1KB it says:

local: 0 mb in
remote: 0 mb in

So with a larger file, to get some useful numbers, it is:

local: 4,36 mb in 15
remote: 4,36 mb in 48

Is that what you meant for me to give you?

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:16:47 AM9/14/12
to rav...@googlegroups.com
Yes, that is pretty much what I expected.
There is no cost to actually doing the put at those levels.
Just to be sure, your actual data size is < 1 KB in your tests against raven, right?

Asger Hallas

unread,
Sep 14, 2012, 5:18:09 AM9/14/12
to rav...@googlegroups.com
Yes, 588 bytes to be exact.

Asger Hallas

unread,
Sep 14, 2012, 5:19:27 AM9/14/12
to rav...@googlegroups.com
To be precise, 588 is what raven studio tells me, so I guess it's even smaller on the wire.

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:19:13 AM9/14/12
to rav...@googlegroups.com
Okay, here is my thinking right now.

* It isn't likely to be raven, because it works fine locally.
* It isn't the network, because we tested that.

Therefore, it must be the dog's fault :-)

Honestly, this is puzzling. When you run RavenDB console on the remote, what is the console output?

Asger Hallas

unread,
Sep 14, 2012, 5:22:13 AM9/14/12
to rav...@googlegroups.com
The output is:

Request #  15: PUT     -   207 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  16: PUT     -   202 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  17: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  18: PUT     -   210 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  19: PUT     -   206 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  20: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  21: PUT     -   200 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  22: PUT     -   208 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  23: PUT     -   207 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  24: PUT     -   209 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:23:13 AM9/14/12
to rav...@googlegroups.com
When you run the same test program _on the remote server_, so it is running on the remote server but locally against the server on the same machine.
What are the results?

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:23:29 AM9/14/12
to rav...@googlegroups.com
Also, is the remote server a cloud server? virtual server?

Asger Hallas

unread,
Sep 14, 2012, 5:26:48 AM9/14/12
to rav...@googlegroups.com
The remote here is my colleague's machine in our dev environment, same LAN. In production it is two virtual servers talking to each other. Same issue in both places :)

Numbers coming right up.

Asger Hallas

unread,
Sep 14, 2012, 5:35:14 AM9/14/12
to rav...@googlegroups.com
Here are the results for running the exact same test program locally on the remote server:

Request #  23: GET     -     0 ms - <default>  - 200 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  24: GET     -     0 ms - <default>  - 404 - /docs/Raven/Replication/D
estinations
Request #  25: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  26: PUT     -     2 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  27: PUT     -    24 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  28: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  29: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  30: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  31: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  32: PUT     -     2 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  33: PUT     -     1 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57
Request #  34: PUT     -     0 ms - <default>  - 201 - /docs/1edfdc9f-6987-4c7e-
b259-8f8215f3ec57

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:38:41 AM9/14/12
to rav...@googlegroups.com
Grrr
And this is happening on both IIS and console, right?

The problem is that this makes no sense as far as I am concerned. I know it isn't buffering like before, because then we wouldn't see this problem in IIS, which takes care of that, and the size is too small anyway.

Asger Hallas

unread,
Sep 14, 2012, 5:47:55 AM9/14/12
to rav...@googlegroups.com
Yes, it's the same for both the console db-server and the IIS db-server. We have tested both on our developer machines and in our staging and production environments.

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 5:49:06 AM9/14/12
to rav...@googlegroups.com
Can you try to run a profiler on the app? See if we can narrow this down?

Asger Hallas

unread,
Sep 14, 2012, 5:56:16 AM9/14/12
to rav...@googlegroups.com
One thing we've noticed while working around this issue the last couple of days is that a slightly larger document does not behave the same way.
When we use a larger document, we get low write times in general, but some strange spikes, like this:

Request # 145: PUT     -   219 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 146: PUT     -    17 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 147: PUT     -    22 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 148: PUT     -   217 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 149: PUT     -   212 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 150: PUT     -    23 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 151: PUT     -    15 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 152: PUT     -    22 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 153: PUT     -    20 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23
Request # 154: PUT     -    22 ms - <default>  - 201 - /docs/9ee4e78a-796d-45ea-
b59d-3a8ed6257d23

This is the same test program running against a 186 KB document. This is consistent for each run: there are always 2-4 spikes out of ten, occurring randomly.




Asger Hallas

unread,
Sep 14, 2012, 5:56:47 AM9/14/12
to rav...@googlegroups.com
Sure, a profile of the RavenDB process? Like dotTrace?

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 6:00:48 AM9/14/12
to rav...@googlegroups.com
Okay, I think that I know what is going on.
It probably relates to the actual size of the TCP window.
Can you repeat your basic test (with the sample program) from before, but now with 400 bytes, not 1 KB?
It is possible that the network is actually waiting for more data to arrive before handing the data to Raven.

Asger Hallas

unread,
Sep 14, 2012, 6:09:48 AM9/14/12
to rav...@googlegroups.com
Yes, the outputs are (I've changed it to report bytes and ms):

localhost: 400 bytes in 0 ms
remote: 400 bytes in 0 ms

Oren Eini (Ayende Rahien)

unread,
Sep 14, 2012, 6:15:12 AM9/14/12
to rav...@googlegroups.com
Then I think we will need to profile this deeper.

Asger Hallas

unread,
Sep 14, 2012, 6:18:50 AM9/14/12
to rav...@googlegroups.com
Ok, I'll run a profiler on it and get back to you in half an hour. I need to grab some lunch before they take it away :)

Asger Hallas

unread,
Sep 14, 2012, 6:59:35 AM9/14/12
to rav...@googlegroups.com
Ok, I've made two test runs. The local is to the left, the remote is to the right.

I don't know if it's safe to post profiling dumps here, but I've mailed them to you.
profiler-run.PNG

Asger Hallas

unread,
Sep 14, 2012, 8:17:01 AM9/14/12
to rav...@googlegroups.com
More interesting information:

If I open fiddler on the client while running the test program, the problem goes away :-/

Asger Hallas

unread,
Sep 14, 2012, 11:11:10 AM9/14/12
to rav...@googlegroups.com
Hmm... it could look like delayed ACK. 

I have some further observations:

- The only difference between the content of the local and the remote requests is that the compression ratio is higher on the remote than the local, so the remote requests are actually smaller.
- I created a simple console app to send requests directly to the remote server in the same manner as the Raven client. If I send the request in ASCII, the problem disappears.

//Asger

Oren Eini (Ayende Rahien)

unread,
Sep 15, 2012, 1:24:22 AM9/15/12
to rav...@googlegroups.com
It might be that we are running into an MTU size issue.
When we compress the data it is so small that it fits in a single packet, and the stack may be waiting a while for the next one before releasing it.

Oren Eini (Ayende Rahien)

unread,
Sep 15, 2012, 1:27:23 AM9/15/12
to rav...@googlegroups.com
And I agree, this is something deep in the network stack.
Can you try tracing this with Wireshark?
Can you share the program you use to reproduce this?
Message has been deleted

Asger Hallas

unread,
Sep 15, 2012, 7:14:03 AM9/15/12
to rav...@googlegroups.com
Seems like my message was deleted somehow? I only have my phone here, but I can resend the message later today.

//Asger

Oren Eini (Ayende Rahien)

unread,
Sep 16, 2012, 5:39:31 AM9/16/12
to rav...@googlegroups.com
I didn't get the Wireshark stuff.

On Sat, Sep 15, 2012 at 12:46 PM, Asger Hallas <asger...@gmail.com> wrote:
Yes, I have mailed Wireshark dumps to your mail (it wouldn't let me attach them here). I could not make it capture the looped-back traffic, but I've sent what it looks like when Fiddler is enabled (where the issue is not present) and when sending in ASCII (where the issue is not present).
The utf8 dump is the one with the issue present.

These dumps are made with this program:

    using System.IO;
    using System.Net;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            const string json =
@"{
  ""$id"": ""2"",
  ""$type"": ""Energy10.Core.Domain.Services.Auth.Authentication, Energy10.Core.Domain"",
  ""RememberMe"": true,
  ""DueTime"": ""2012-09-19T13:40:05.4026399"",
  ""UserId"": ""00000000-0000-0000-0000-000000000000"",
  ""Username"": ""a...@version3.dk"",
  ""OnBehalfOfUsername"": null,
  ""OnBehalfOfUserId"": ""00000000-0000-0000-0000-000000000000""
}";
            for (int i = 0; i < 10; i++)
            {
                var local = (HttpWebRequest)WebRequest.Create("http://localhost:8080/docs/1edfdc9f-6987-4c7e-b259-8f8215f3ec57");
                local.Credentials = CredentialCache.DefaultNetworkCredentials;

                var remote = (HttpWebRequest)WebRequest.Create("http://10.10.0.36:8080/docs/1edfdc9f-6987-4c7e-b259-8f8215f3ec57");

                Execute(local, json);
                Execute(remote, json);
            }
        }

        static void Execute(HttpWebRequest request, string json)
        {
            request.Method = "PUT";
            var enc = Encoding.UTF8;
            var contentLength = enc.GetByteCount(json) + enc.GetPreamble().Length;
            request.ContentLength = contentLength;

            using (var dataStream = request.GetRequestStream())
            using (var writer = new StreamWriter(dataStream, enc))
            {
                writer.Write(json);
                writer.Flush();
                dataStream.Flush();
            }
            request.GetResponse().Close();
        }
    }

(note that the earlier tests were made using the DatabaseCommands.Put as stated).

//Asger

Asger Hallas

unread,
Sep 16, 2012, 1:37:45 PM9/16/12
to rav...@googlegroups.com
Did it get through now?
Message has been deleted

Asger Hallas

unread,
Sep 19, 2012, 6:03:12 PM9/19/12
to rav...@googlegroups.com
We've done some testing, and the problem is almost definitely the Nagle algorithm on the client in combination with Delayed ACK on the server.

To be more precise, disabling Nagle allows the ACKs to piggyback on the HTTP responses and thus takes Delayed ACK out of the game.

Disabling Nagle can be done in the client process with: ServicePointManager.UseNagleAlgorithm = false.

Are there any drawbacks to this approach?

Oren Eini (Ayende Rahien)

unread,
Sep 20, 2012, 6:03:35 AM9/20/12
to rav...@googlegroups.com
We reproduced this locally, but didn't have time to figure out what is actually going on.

On Tue, Sep 18, 2012 at 11:32 AM, Asger Hallas <asger...@gmail.com> wrote:
I hope you got what I sent. Any news on this? Or anything else I can do to help?
I have reproduced this against RavenHQ now too, and tried it out connecting from several different machines on different networks. Same problem, though it actually seems to double the wait when https is involved.

Response time on a DatabaseCommand.Put are on average this:

No fiddler: 530ms
Fiddler with no https decryption: 225ms
Fiddler with https decryption: 105ms

//Asger

Oren Eini (Ayende Rahien)

unread,
Sep 20, 2012, 6:05:55 AM9/20/12
to rav...@googlegroups.com
Well, that is interesting.
The drawback is that this may result in a larger number of packets, which is what Nagle is trying to prevent.
You can also do this in a less global manner by using the CustomizeRequest on the JsonRequestFactory on the document store.
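Something along these lines, for example, scoped to just the server's endpoint with the plain ServicePoint API (a sketch with a placeholder URL; I believe the CustomizeRequest hook lets you do the equivalent per request):

    var servicePoint = System.Net.ServicePointManager.FindServicePoint(new System.Uri("http://your-raven-server:8080"));
    servicePoint.UseNagleAlgorithm = false; // affects only connections to this endpoint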

Oren Eini (Ayende Rahien)

unread,
Sep 20, 2012, 6:06:50 AM9/20/12
to rav...@googlegroups.com
I did some more research into that, and I think that we can safely disable this behavior.

Asger Hallas

unread,
Sep 20, 2012, 7:10:26 AM9/20/12
to rav...@googlegroups.com
That's very good to hear. 

I thought it might be valuable to elaborate a little more on our testing. Maybe you know of an even more appropriate solution. I am especially baffled by how the UTF-8 BOM is sent in the HTTP requests.

I've outlined the different situations below.


Current (slow) situation:

send HTTP header
receive 100 Continue, ACK is piggybacked
send the UTF8 BOM from request body (this is sent in separate segment, but why?)
receive ACK after 200ms because of delayed ACK
send rest of request body
receive 1. part of response, ACK is piggybacked
receive 2. part of response (5 bytes = 2 x CRLF and 0)


Without Nagle:

send HTTP header
receive 100 Continue, ACK is piggybacked
send the UTF8 BOM from request body
send rest of request body
receive ACK
receive 1. part of response
receive 2. part of response 


Without Content-Length in HTTP-header:

send HTTP header
receive 100 Continue, ACK is piggybacked
send full request body, including BOM
receive 1. part of response, ACK is piggybacked
receive 2. part of response 


With Fiddler capturing:

send HTTP header
send full request body, including BOM
receive ACK
receive 1. part of response, ACK is piggybacked
receive 2. part of response 


Hope it can be of use.

//Asger

Oren Eini (Ayende Rahien)

unread,
Sep 20, 2012, 7:16:58 AM9/20/12
to rav...@googlegroups.com
We send the BOM because we use UTF-8, but I honestly have no idea why this is sent in a different packet.
For that matter, the whole thing is supposed to be compressed, so I don't see why this would happen in the first place.
I'll add some stuff there to check it further. I would be grateful if you could check it for me in the next build, up in an hour or so.

Asger Hallas

unread,
Sep 20, 2012, 2:54:08 PM9/20/12
to rav...@googlegroups.com
Sure, I'll run it first thing in the morning (UTC+1).

Asger Hallas

unread,
Sep 21, 2012, 6:12:03 AM9/21/12
to rav...@googlegroups.com
We've run the new build through all of our tests now, and this works very well.
Are there any logs or the like you would like to see?

Regarding the gzip/BOM issue: I tested both with and without compression. Without compression it sends the BOM; with compression it sends the gzip header and a small amount of data (which I believe is still the BOM) in a separate packet. Still strange, but at least no longer an issue.



Oren Eini (Ayende Rahien)

unread,
Sep 21, 2012, 6:57:10 AM9/21/12
to rav...@googlegroups.com
Did you test it with the new stuff? 
Right now it is buffered, and the BOM should be sent in the same output.

Asger Hallas

unread,
Sep 21, 2012, 8:28:06 AM9/21/12
to rav...@googlegroups.com
Yes, I tested with build 2086. 

Ok, just checked again, and the BOM packet is no longer there with the new version, when compression is off. 
But when compressed it now sends that other small packet with the GZip signature. I do not know if there's also actual data in it, I just thought it was the BOM in there too (GZip signature of 3 bytes, 12 bytes of what?). But I actually don't have any evidence of that, just got my mind set on it after some hours of testing :)

The trace is like this:

send HTTP header
receive 100 Continue
send the GZip header, 15 bytes
send request body, 209 bytes
send rest of request body, 5 bytes
receive ACK
receive 1. part of response
receive 2. part of response 

There are no headers included in the byte counts.

Oren Eini (Ayende Rahien)

unread,
Sep 21, 2012, 8:51:40 AM9/21/12
to rav...@googlegroups.com
Shouldn't this be small enough that it would fit in a _single_ packet?

I just checked how it works, and it is using a buffered stream to write things, so I don't see how this can happen.

How do you get the trace in this fashion?

Asger Hallas

unread,
Sep 21, 2012, 9:23:06 AM9/21/12
to rav...@googlegroups.com
Yes, it should fit, there's plenty of room left. And when using the buffered stream, it really makes no sense, unless something forces the buffer to flush prematurely.

For testing this, I'm simply issuing a single PUT (with the client) against a server on another machine. I'm tracing using wireshark.

I'll try to reproduce it with a simple console app, to see if that reveals anything.

Oren Eini (Ayende Rahien)

unread,
Sep 21, 2012, 10:04:28 AM9/21/12
to rav...@googlegroups.com
Thanks for that.

Chris Carpenter

unread,
Sep 26, 2012, 1:13:48 PM9/26/12
to rav...@googlegroups.com
I did receive better results when I used this suggestion, but I am still seeing performance gains by running Fiddler at the same time as my test application. It definitely feels like more is going on here than just the compression.

Oren Eini (Ayende Rahien)

unread,
Sep 26, 2012, 1:14:38 PM9/26/12
to rav...@googlegroups.com
Are you running the latest build? Can you post some numbers?

Chris Carpenter

unread,
Sep 26, 2012, 3:52:51 PM9/26/12
to rav...@googlegroups.com
I am using version 920.

I have a console app that does 20k writes in parallel (roughly sketched below). I have 2 AWS instances set up in the same security group. The server is set up with 2 EBS volumes (1 for indexes and 1 for data). Async turned on.

When running the test without fiddler2 running I get ~425 writes per second.
When running the test with fiddler2 running I get ~850 writes per second.
When running the test locally (from the server to itself) I get ~1550 writes per second.
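Roughly, the test app looks like this (a simplified sketch; the document shape, URL, and parallelism here are placeholders for what we actually use):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;
    using Raven.Client.Document;

    class WriteBenchmark
    {
        class TestDoc
        {
            // Several short string fields in the real test; two shown for brevity.
            public string Field1 { get; set; }
            public string Field2 { get; set; }
        }

        static void Main()
        {
            using (var store = new DocumentStore { Url = "http://raven-server:8080" }.Initialize())
            {
                var sw = Stopwatch.StartNew();
                Parallel.For(0, 20000, i =>
                {
                    using (var session = store.OpenSession())
                    {
                        session.Store(new TestDoc { Field1 = "abcd1234", Field2 = "abcd1234" });
                        session.SaveChanges();
                    }
                });
                Console.WriteLine("~{0:0} writes per second", 20000 / sw.Elapsed.TotalSeconds);
            }
        }
    }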

Chris Carpenter

unread,
Sep 26, 2012, 4:01:57 PM9/26/12
to rav...@googlegroups.com
I forgot to include average write times.

When running the test without fiddler ~60ms
When running the test with fiddler ~14ms
When running the test local ~12ms

Also, I have tried 2 drivers to see what the impact would be, and it split the 400 writes in half between the 2 servers.

Asger Hallas

unread,
Sep 26, 2012, 5:04:29 PM9/26/12
to rav...@googlegroups.com

What’s the size of the documents?

What were the average times before you switched off Nagle?

Chris Carpenter

unread,
Sep 26, 2012, 5:09:37 PM9/26/12
to rav...@googlegroups.com
Each document consists of 9 fields containing 8 characters each.

Before turning off Nagle it was around 200 ms to 250 ms, similar to what is shown above.

Asger Hallas

unread,
Sep 26, 2012, 6:10:41 PM9/26/12
to rav...@googlegroups.com

Could you try setting:

 

ServicePointManager.Expect100Continue = false;

 

Note that I don’t know if that’ll have any negative impact on how RavenDb does its negotiation and such, but I doubt it (read: I’m not sure that’s a solution to use in production, but it would be interesting to see if it helps).

I’m looking into how the request is chunked right now, and how Fiddler changes this.

 

//Asger

Oren Eini (Ayende Rahien)

unread,
Sep 27, 2012, 2:09:44 AM9/27/12
to rav...@googlegroups.com
Chris,
In your case, manually turn Nagle off, and that should improve performance.
In recent builds, this is already handled for you.

Chris Carpenter

unread,
Sep 27, 2012, 10:07:26 AM9/27/12
to rav...@googlegroups.com
The values I shared yesterday were with Nagle turned off.

Chris Carpenter

unread,
Sep 27, 2012, 10:13:13 AM9/27/12
to rav...@googlegroups.com
Setting Expect100Continue to false changed the values as follows.

Nagle false, Expect100Continue false

Running the test without fiddler2: ~750 writes per second
Running the test with fiddler2: ~910 writes per second

This shows about a 300 writes-per-second improvement without Fiddler, and a 60 writes-per-second improvement with Fiddler. The avg write times improved as well.

The point of concern with these results is that there is still an improvement when Fiddler is inspecting the packets, which you would think would slow down the process.

Asger Hallas

unread,
Sep 28, 2012, 4:18:25 AM9/28/12
to rav...@googlegroups.com

Yes, I see the same difference with Fiddler on and off. If you inspect the network traffic you will see that Fiddler changes how the HTTP/1.1 chunks are sent on the TCP stream (in fewer packets).

I am – right now – trying to find out why this is. I will report back when I know more.

Asger Hallas

unread,
Oct 1, 2012, 9:07:10 AM10/1/12
to rav...@googlegroups.com
Oren, I have tested this in quite a few ways now, and the problem persists.

The problem can be split into three separate issues:

1. With Nagle enabled on the client, PUTs are slow. This can be fixed easily by disabling it for the ServicePoint or directly on the HTTP request, as you already know.

2. Even with Nagle disabled, Fiddler will make the PUT requests faster. This is due to two things: 
   a) The Expect100Continue header being sent and requiring additional packets, which is very clear in a high-latency environment. The solution here could be to disable Expect100 on the request - possibly only for small documents (like request.ServicePoint.Expect100Continue = false; see the sketch at the end of this message).
   b) Chunks are smaller when sent directly from the .NET client than they are when sent through Fiddler. In particular the GZip header is chunked as a separate packet, again causing additional traffic that can be seen with high latency. A solution here could be to not chunk small documents; the effect is of course much less evident on larger PUTs.

3. Now the worst part is that we experience the exact same thing when doing GETs. This is not as consistent - but I have managed to have a 188KB document consistently take 200ms to load in the studio. In our production environment we see these as sudden spikes (200-800ms) every now and then (more or less every second application request, consisting of roughly 3 GETs and 2 PUTs). It seems that the requests influence each other and randomly wait for ACKs.

We have traced a lot in Wireshark and sometimes it just sends chunks that are too small, causing the Nagle/Delayed ACK behavior again. Unfortunately I was not able to disable Nagle on the server side the way I did for the client, and I was not able to influence the chunk size.

Another thing to notice is that the GET problem also exists for larger documents, and I'm not sure if disabling chunking on the response would help or is viable at all.

In our test console applications we see these spikes too. Sometimes a request takes up to 700ms for one GET; normally it takes 15-20ms.

All these effects are very visible when using e.g. RavenHQ where HTTPS is on.

I really don't know what else we can do about this. We use this in production now and we have several daily complaints about the response time. Is it possible to get you to look into this soon?

Please let us know if there is anything else I can do, or if there are traces or test applications we can send you.
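For completeness, the per-request shape of the client-side mitigations from points 1 and 2a above (a sketch against plain System.Net with a placeholder URL, not the Raven client internals):

    var request = (HttpWebRequest)WebRequest.Create("http://your-raven-server:8080/docs/some-doc-id");
    request.Method = "PUT";
    request.ServicePoint.UseNagleAlgorithm = false;  // point 1: don't let Nagle hold back the small body
    request.ServicePoint.Expect100Continue = false;  // point 2a: skip the 100 Continue round trip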


//Asger

Oren Eini (Ayende Rahien)

unread,
Oct 1, 2012, 10:26:22 AM10/1/12
to rav...@googlegroups.com
inline


On Mon, Oct 1, 2012 at 3:07 PM, Asger Hallas <asger...@gmail.com> wrote:
Oren, I have tested this in quite a few ways now. And the problem persists.

The problem can be split into three separate issues:

1. With Nagle enabled on the client, PUTs are slow. This can be fixed easily by disabling it for the ServicePoint or directly on the HTTP request, as you already know.

2. Even with Nagle disabled, Fiddler will make the PUT requests faster. This is due to two things: 
   a) The Expect100Continue header being sent and requiring additional packets, which is very clear in a high-latency environment. The solution here could be to disable Expect100 on the request - possibly only for small documents (like request.ServicePoint.Expect100Continue = false).


Is there a reason to ever use this?

   b) Chunks are smaller when sent directly from the .NET client than they are when sent through Fiddler. In particular the GZip header is chunked as a separate packet, again causing additional traffic that can be seen with high latency. A solution here could be to not chunk small documents; the effect is of course much less evident on larger PUTs.


This kills me; I don't know why this is happening.
As far as I can see, we are doing just one call to the network, and that is it.
That is why we added the buffered stream.
 
3. Now the worst part is that we experience the exact same thing when doing GETs. This is not as consistent - but I have managed to have a 188KB document consistently take 200ms to load in the studio. In our production environment we see these as sudden spikes (200-800ms) every now and then (more or less every second application request, consisting of roughly 3 GETs and 2 PUTs). It seems that the requests influence each other and randomly wait for ACKs.

 
A repro would be wonderful.
 
We have traced a lot in Wireshark and sometimes it just sends chunks that are too small, causing the Nagle/Delayed ACK behavior again. Unfortunately I was not able to disable Nagle on the server side the way I did for the client, and I was not able to influence the chunk size.


I don't know how to do that on the server side; a quick check showed no obvious way to do this in HttpListener or IIS.
 
Another thing to notice is that the GET problem also exists for larger documents, and I'm not sure if disabling chunking on the response would help or is viable at all.

In our test console applications we see these spikes too. Sometimes a request takes up to 700ms for one GET; normally it takes 15-20ms.

Being able to reproduce and follow this would be wonderful. 
 
All these effects are very visible when using e.g. RavenHQ where HTTPS is on.

I really don't know what else we can do about this. We use this in production now and we have several daily complaints about the response time. Is it possible to get you to look into this soon?

The only problem is that I am currently traveling and don't have two machines to test this on properly.
 
Please let us know, if there is anything else I can do, traces or test applications we can send you.


Let us try to simplify things as far as possible, like we did at the start with the raw HTTP server and raw client.
That will allow us to isolate what is actually going on. 

Asger Hallas

unread,
Oct 2, 2012, 3:26:22 PM10/2/12
to rav...@googlegroups.com

FYI I’m working to get some repros to you for tomorrow.

 

As for Expect100Continue, the only reason to have it turned on – as far as I know – is to prevent users from uploading large files only to be given back a “Not Authorized” or another rejection that could have been caught by sending only the headers. I suppose such large uploads could also be used for DOS attacks. My initial reaction is that it does not really apply to the use cases of RavenDB (though of course, I don’t know them all).

 

 

//Asger


Oren Eini (Ayende Rahien)

unread,
Oct 2, 2012, 3:52:42 PM10/2/12
to rav...@googlegroups.com
Yes, that is the reason, but I think we can skip that, too.
Repros would be good, but I will likely only be able to work on them next week.

Asger Hallas

unread,
Oct 4, 2012, 5:31:33 AM10/4/12
to rav...@googlegroups.com

Ok. We have now made a repro of the GET issue. It is not as evident as what we see in production – I guess the order and size of the requests matter somehow.

 

We have set up two virtual machines (running on two separate physical machines), which you can RDP into to see the problem.

You’re welcome to use the machines for debugging the issue, if you need to, given your travelling.

 

As for Expect100Continue, I too think it’s reasonable to turn off for Raven generally.

 

As for the excessive chunking, I have not made a repro yet, as it is most evident with high latency, and I could not simulate that (without paying money). Still, I think the solution is to turn off chunking for small docs. The buffered stream makes no difference; it’s as if the ConnectStream class in .NET sends off chunks by some other “clever” pattern. I have delved into it, but have not found anything yet.

 

I’ll send you the credentials in a private mail.

 

//Asger

Oren Eini (Ayende Rahien)

unread,
Oct 4, 2012, 5:34:09 AM10/4/12
to rav...@googlegroups.com
Thank you!

Asger Hallas

unread,
Oct 9, 2012, 5:06:50 AM10/9/12
to rav...@googlegroups.com
Nice talking to you.


Let me know, if you need anything else!

Chris Carpenter

unread,
Oct 17, 2012, 5:27:51 PM10/17/12
to rav...@googlegroups.com
Has any progress been made on this issue?

Oren Eini (Ayende Rahien)

unread,
Oct 18, 2012, 2:12:25 AM10/18/12
to rav...@googlegroups.com
We have been working with the network team at MS to try to resolve this.

Kijana Woodard

unread,
Dec 14, 2012, 3:51:12 PM12/14/12
to rav...@googlegroups.com
Any update here? Is there a github issue to follow?

Thanks.

Oren Eini (Ayende Rahien)

unread,
Dec 16, 2012, 12:30:53 PM12/16/12
to rav...@googlegroups.com

Kijana Woodard

unread,
Jan 26, 2013, 7:49:53 PM1/26/13
to rav...@googlegroups.com
Looks like the issue has been marked as slated for release with 2.1.
Did you find something?

BTW, I just tried to repro using two computers on my home LAN, build 2239, and the test app (https://github.com/asgerhallas/RavenDbTestClient).

I consistently get ~25ms from the test app and ~2ms on the remote machine in the raven console.

Oren Eini (Ayende Rahien)

unread,
Jan 27, 2013, 2:28:04 AM1/27/13
to ravendb
We didn't find anything.
And I am pretty sure that we fixed this issue, but the original reporter hasn't confirmed it yet.
I want to do a more thorough investigation to confirm whether it is still there.
Note that we explicitly set nagle = false and expect 100 continue = false for the service point we use.



Asger Hallas

unread,
Jan 28, 2013, 2:58:06 AM1/28/13
to rav...@googlegroups.com
Yes, the PUT side of things is resolved now.

But we still have a request pending about the similar problems when you request large-ish documents. We still have that problem in production right now with build 960, and last time I checked it was not resolved in the newest unstable build. It has been about two months since I last checked, as far as I recall.

Microsoft support was asked about the issue, but - correct me if I'm wrong - they have not yet come back with information about it.

You should be able to see the related problem if you run ShowOccasionalSlowGETsUsingRavenClient() in the test app.

Maybe it's best to have a separate issue on the issue list for the related problem, so there is no confusion about whether the PUT problem has actually been resolved?

Oren Eini (Ayende Rahien)

unread,
Jan 28, 2013, 3:03:52 AM1/28/13
to rav...@googlegroups.com
I agree
Can you do so?

Asger Hallas

unread,
Mar 20, 2013, 7:40:01 AM3/20/13
to rav...@googlegroups.com