Socket exhaustion issue again


Vlad Kosarev

Mar 5, 2015, 9:34:16 AM
to rav...@googlegroups.com
So I've been tracing socket exhaustion issues with MS guys and we found what looks like raven client failing connections without disposing them and trying to create new connections.

I think it's in this code on the client -
while (true)
{
    try
    {
        if (writeCalled == false)
            webRequest.ContentLength = 0;
        return ReadJsonInternal(webRequest.GetResponse);
    }
    catch (WebException e)
    {
        if (++retries >= 3 || disabledAuthRetries)
            throw;

        var httpWebResponse = e.Response as HttpWebResponse;
        if (httpWebResponse == null ||
            (httpWebResponse.StatusCode != HttpStatusCode.Unauthorized &&
             httpWebResponse.StatusCode != HttpStatusCode.Forbidden &&
             httpWebResponse.StatusCode != HttpStatusCode.PreconditionFailed))
            throw;

        if (httpWebResponse.StatusCode == HttpStatusCode.Forbidden)
        {
            HandleForbiddenResponse(httpWebResponse);
            throw;
        }

        if (HandleUnauthorizedResponse(httpWebResponse) == false)
            throw;
    }
}

It looks like raven server doesn't reply to connection request, client fails the connection without disposing it (closing the port) and tries 2 more times (looks like it waits a few seconds between each try).

This way raven client uses up all the ports on the machine and runs out of sockets.

So in a case where raven server is busy the client is spamming it with requests without closing sockets.

This isn't much of an issue when you have lots and lots of sockets, but on something like Azure Websites, where the number of sockets is limited (and the socket timeout is 4 min!), you will hit this wall fairly quickly and your server will effectively be down.

By default web request timeout is very large but it looks like raven lowers it to a few seconds. Is there a raven friendly way to change the timeout or to make it dispose of failed requests before retrying?
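
To make the question concrete, here is a minimal sketch of what "dispose before retrying" could look like. It is not RavenDB's code: `RetryHelper` and `ExecuteWithRetry` are hypothetical names, and the only framework members used are the standard `WebException.Response` and `WebResponse.Close()`. It can only release a connection when the server actually sent a response; if the connect itself never completes, `Response` is null and there is nothing to close.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only (not part of the RavenDB client).
public static class RetryHelper
{
    // Calls `send` up to `maxAttempts` times. After every failed attempt the
    // response attached to the WebException (if any) is closed so that its
    // connection is released before the next attempt starts.
    public static T ExecuteWithRetry<T>(Func<T> send, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return send();
            }
            catch (WebException e)
            {
                // Response is null when no reply was received at all
                // (for example, a connect timeout).
                if (e.Response != null)
                    e.Response.Close();

                // Out of attempts: let the caller see the last failure.
                // Note that the response is already closed at this point.
                if (attempt >= maxAttempts)
                    throw;
            }
        }
    }
}
```

The loop quoted above also inspects the status code before deciding whether to retry; a real version would have to read it before closing the response, which this sketch leaves out.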

Thanks.

Oren Eini (Ayende Rahien)

Mar 5, 2015, 10:44:02 AM
to ravendb
What build are you using? 
This looks like 2.5 code.

This code is also only running in the case of authentication, it isn't relevant for any other issue.

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 



Vlad Kosarev

Mar 5, 2015, 10:59:14 AM
to rav...@googlegroups.com
Yeah, it's 2.5.2956

It might well be authentication. Just like the last time we talked, I suspected auth was the problem when we were running out of sockets.
This is one of those calls from the dump -

0:071> kc15
# Call Site
00 ntdll!ZwWaitForSingleObject
01 mswsock!SockWaitForSingleObject
02 mswsock!SockDoConnectReal
03 mswsock!SockDoConnect
04 mswsock!WSPConnect
05 ws2_32!DPROVIDER::WSPConnect
06 ws2_32!WSAConnect
07 System!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte[], Int32, IntPtr, IntPtr, IntPtr, IntPtr)
08 System!System.Net.Sockets.Socket.DoConnect(System.Net.EndPoint, System.Net.SocketAddress)
09 System!System.Net.ServicePoint.ConnectSocketInternal(Boolean, System.Net.Sockets.Socket, System.Net.Sockets.Socket, System.Net.Sockets.Socket ByRef, System.Net.IPAddress ByRef, ConnectSocketState, System.IAsyncResult, System.Exception ByRef)
0a System!System.Net.ServicePoint.GetConnection(System.Net.PooledStream, System.Object, Boolean, System.Net.IPAddress ByRef, System.Net.Sockets.Socket ByRef, System.Net.Sockets.Socket ByRef)
0b System!System.Net.PooledStream.Activate(System.Object, Boolean, System.Net.GeneralAsyncDelegate)
0c System!System.Net.Connection.CompleteStartConnection(Boolean, System.Net.HttpWebRequest)
0d System!System.Net.Connection.CompleteStartRequest(Boolean, System.Net.HttpWebRequest, System.Net.TriState)
0e System!System.Net.Connection.SubmitRequest(System.Net.HttpWebRequest, Boolean)
0f System!System.Net.ServicePoint.SubmitRequest(System.Net.HttpWebRequest, System.String)
10 System!System.Net.HttpWebRequest.SubmitRequest(System.Net.ServicePoint)
11 System!System.Net.HttpWebRequest.GetResponse()
12 System!System.Net.HttpWebRequest.GetResponse()
13 Raven_Client_Lightweight!Raven.Client.Connection.HttpJsonRequest.ReadJsonInternal
14 Raven_Client_Lightweight!Raven.Client.Connection.HttpJsonRequest.ReadResponseJson()

Oren Eini (Ayende Rahien)

Mar 5, 2015, 11:02:50 AM
to ravendb
When you pipe this through Fiddler, what do you see?

Vlad Kosarev

Mar 5, 2015, 11:03:02 AM
to rav...@googlegroups.com
We are using Windows auth for RavenDB (Windows auth is enabled in IIS), and this is the full Raven initializer -
public static IDocumentStore BuildNewStore(string connectionStringName)
{
    var ds = new DocumentStore
    {
        ConnectionStringName = connectionStringName,
        Conventions = new DocumentConvention
        {
            FailoverBehavior = FailoverBehavior.AllowReadsFromSecondaries
        }
    };
    ds.RegisterListener(new AuditingDocumentStoreListener());
    ds.RegisterListener(new LoggingDocumentConflictListener());
    ds.GetReplicationInformerForDatabase().FailoverStatusChanged += FailoverStatusChanged;
    // this is needed to not create a new TCP connection for every authenticated http request
    // if UnsafeAuthenticatedConnectionSharing is false then every authenticated http request will need to open a new tcp connection (port)
    ds.JsonRequestFactory.ConfigureRequest += (sender, e) => ((HttpWebRequest)e.Request).UnsafeAuthenticatedConnectionSharing = true;
    return ds;
}

Oren Eini (Ayende Rahien)

Mar 5, 2015, 11:04:51 AM
to ravendb
Then this code is not relevant at all.
The while loop is going to be exited immediately because we get 403, not 401.



Vlad Kosarev

Mar 5, 2015, 11:14:04 AM
to rav...@googlegroups.com
Here's an example of behaviour -



The raven server didn’t reply to the connection request. Then 3 seconds later, client tried again, still no response. 6 seconds later, client did the last try and it was finally connected.

Any idea where in code something like that could be happening if it isn't in the code I provided?

Oren Eini (Ayende Rahien)

Mar 5, 2015, 11:18:28 AM
to ravendb
That isn't related to that code at all.
What you are seeing is likely the router dropping connections, and the OS retrying packets until a timeout is reached.
Do you have long idle periods that can cause the router to do so?




Vlad Kosarev

Mar 5, 2015, 11:23:33 AM
to rav...@googlegroups.com
This is on Azure. I have no access to the routers there, but it could absolutely be their issue. They, of course, are saying that it's the database server.

So you think that the hardware is dropping connections for no good reason and that's why the os keeps retrying?

Oren Eini (Ayende Rahien)

Mar 5, 2015, 11:56:37 AM
to ravendb
I believe so.
Note that we _know_ that Azure is doing that.

Vlad Kosarev

Mar 5, 2015, 12:01:58 PM
to rav...@googlegroups.com
I am aware of transient issues and we have that implemented for SQL Azure stuff but I was always under the impression that transient actually means transient and not something that is guaranteed to happen all the time.
Thanks for the link. I will talk to Azure people and see what they say. I suspected load balancer at one point but I could never prove it. I think this might be enough for them to at least investigate it.
I wonder if they have some sort of DDOS protection and it gets triggered on a relatively low load.

Oren Eini (Ayende Rahien)

Mar 5, 2015, 12:04:32 PM
to ravendb
Note that RavenDB does absolutely nothing at the TCP level.
In 2.5, we are using WebRequest and not doing much with it from a TCP standpoint except turning off Nagle.

Vlad Kosarev

Mar 5, 2015, 12:06:38 PM
to rav...@googlegroups.com
Noted. Thanks.

I went through the code and I saw no trickery of any sort so I figured that I missed it.

I will write this up to Azure team and see what they come back with. Thanks again.

Vlad Kosarev

Mar 6, 2015, 10:38:09 AM
to rav...@googlegroups.com
Still trying to figure this out. Have a couple of questions -
The failing connections always come in threes. Is there any other place where 3 retries are made, other than the code I mentioned before?

You mentioned getting a 403 on windows auth. I've never seen 403s, it's always 401s. What did you mean?

Vlad Kosarev

Mar 6, 2015, 10:47:55 AM
to rav...@googlegroups.com
Something I should've done before - test disabling windows auth in iis/raven.
Everything works great when in full anonymous mode. So this is still very much pointing to windows auth issue.

Oren Eini (Ayende Rahien)

Mar 9, 2015, 4:05:53 AM
to ravendb
Vlad,
The way Windows Auth works is something like this:



So the request is rejected, you do another two requests to do the auth, then retry the first request.



Vlad Kosarev

Mar 9, 2015, 1:08:35 PM
to rav...@googlegroups.com
Thanks. On a bit of a tangent -
I tried to set up API keys but had no luck. Is it as easy as the doc says (just add api key in system db/disable windows auth and then change connection string on the client)?

I kept getting 412s. I saw that client requests had the header has-api-key = true (or something along those lines), but database initialization would always fail because of the 412s. The client just keeps retrying and getting 412s, and the database times out initializing.


Oren Eini (Ayende Rahien)

Mar 11, 2015, 7:56:36 AM
to ravendb
412 is how we tell you that the API Key wasn't authenticated properly.
It is like 401, but we can't use that because the network stack is capturing those before we can handle it.




 


Vlad Kosarev

Mar 11, 2015, 8:02:04 AM
to rav...@googlegroups.com
Yeah, I got that, but why would this be happening?
API key is in the connection string, I just copy pasted connection string directly from raven (changing localhost to actual server url).
Looks like client knows it has api key (due to header being set to true) but server just rejects the requests.

Oren Eini (Ayende Rahien)

Mar 11, 2015, 9:04:18 AM
to ravendb
Does this API key have permissions to the database in question?
Can you do a fiddler capture?

Vlad Kosarev

Mar 11, 2015, 10:00:04 AM
to rav...@googlegroups.com
Yes, the API key has permission.
I think I know why it fails: it looks like the API key configuration strips (or forgets to add) the port from the database URL. The DB is not running on port 80, but the OAuth request goes to port 80 and then fails -

HTTP/1.1 502 Fiddler - Connection Failed
Date: Wed, 11 Mar 2015 13:54:03 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Cache-Control: no-cache, must-revalidate
Timestamp: 09:54:03.535

[Fiddler] The connection to 'vm1test' failed. <br />Error: TimedOut (0x274c). <br />System.Net.Sockets.SocketException A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.1:80                                                                                                                                                                                   

Oren Eini (Ayende Rahien)

Mar 11, 2015, 10:22:41 AM
to ravendb
How are you running this?

Vlad Kosarev

Mar 11, 2015, 11:20:44 AM
to rav...@googlegroups.com
Server is running on port 80 locally but firewall is exposing it as port 9090 to outside. So client connects to 9090.

Oren Eini (Ayende Rahien)

Mar 11, 2015, 11:22:00 AM
to ravendb
Okay, then you need to tell RavenDB that.
Set "Raven/Port" to 9090


Vlad Kosarev

Mar 11, 2015, 12:33:29 PM
to rav...@googlegroups.com
Thanks, will try but I just figured out the original issue.

Hitting my head repeatedly on the desk.

This is one of the worst things about not having proper documentation.

Here's what I had in essence -

ds.JsonRequestFactory.ConfigureRequest += (sender, e) => ((HttpWebRequest) e.Request).UnsafeAuthenticatedConnectionSharing = true;
ds.Initialize();

and here is the fix -
ds.Initialize();
ds.JsonRequestFactory.ConfigureRequest += (sender, e) => ((HttpWebRequest) e.Request).UnsafeAuthenticatedConnectionSharing = true;

Once I traced the Raven code it became obvious that UnsafeAuthenticatedConnectionSharing was not getting set, and after a bit I figured out why. Nothing raises an error: you can attach all the event handlers you want, but they are silently lost because Initialize recreates JsonRequestFactory.

Here's the definition in DocumentStore -
private HttpJsonRequestFactory jsonRequestFactory =
#if !SILVERLIGHT && !NETFX_CORE
 new HttpJsonRequestFactory(DefaultNumberOfCachedRequests);
#else
 new HttpJsonRequestFactory();
#endif

So as soon as the store is created, this gets initialized and you can attach handlers to it.

And then you do Initialize and this happens -
#if !SILVERLIGHT && !NETFX_CORE
jsonRequestFactory = new HttpJsonRequestFactory(MaxNumberOfCachedRequests);
#else
jsonRequestFactory = new HttpJsonRequestFactory();
#endif

Brutal.
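
For anyone who wants to see the trap in isolation, here is a minimal, self-contained sketch. `FakeRequestFactory`, `FakeStore` and `SubscriptionOrderDemo` are hypothetical stand-ins, not RavenDB types; they only model the pattern described above, where a field initializer creates the factory and Initialize then replaces it.

```csharp
using System;

// Hypothetical stand-in for HttpJsonRequestFactory.
public sealed class FakeRequestFactory
{
    public event EventHandler ConfigureRequest;

    // Returns true when at least one handler was attached and invoked.
    public bool CreateRequest()
    {
        var handler = ConfigureRequest;
        if (handler == null)
            return false;
        handler(this, EventArgs.Empty);
        return true;
    }
}

// Hypothetical stand-in for DocumentStore.
public sealed class FakeStore
{
    // Created eagerly, so callers can subscribe before Initialize().
    private FakeRequestFactory factory = new FakeRequestFactory();

    public FakeRequestFactory Factory { get { return factory; } }

    public FakeStore Initialize()
    {
        // Replaces the instance; handlers attached to the old one
        // are dropped without any error.
        factory = new FakeRequestFactory();
        return this;
    }
}

public static class SubscriptionOrderDemo
{
    // Subscribes first, then initializes: the handler never runs.
    public static bool SubscribeBeforeInitialize()
    {
        var store = new FakeStore();
        store.Factory.ConfigureRequest += (sender, e) => { };
        store.Initialize();
        return store.Factory.CreateRequest();
    }

    // Initializes first, then subscribes: the handler runs.
    public static bool SubscribeAfterInitialize()
    {
        var store = new FakeStore();
        store.Initialize();
        store.Factory.ConfigureRequest += (sender, e) => { };
        return store.Factory.CreateRequest();
    }
}
```

In this model `SubscribeBeforeInitialize()` returns false and `SubscribeAfterInitialize()` returns true, which mirrors the two orderings shown earlier in this post.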

Oren Eini (Ayende Rahien)

Mar 11, 2015, 2:39:59 PM
to ravendb
Huh? That isn't supposed to work.
The jsonRequestFactory is _supposed_ to be null until the Init call


Vlad Kosarev

Mar 11, 2015, 2:47:08 PM
to rav...@googlegroups.com
As you can see from the snippet that's not the case. It gets initialized by new DocumentStore() and then re-initialized by .Initialize


That initializer probably shouldn't be there.

Oren Eini (Ayende Rahien)

Mar 11, 2015, 3:01:15 PM
to ravendb

Vlad Kosarev

Mar 11, 2015, 3:03:39 PM
to rav...@googlegroups.com
Ah, thanks. This ordeal can finally be over.