--
You received this message because you are subscribed to the Google Groups "Npgsql Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to npgsql-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
>> This sounds like the thread pool is full and the completed callback can't
>> be called. This is a hazard when writing async code. It's possible the
>> new async work will help with this. Particularly if it removes a wait
>> call.
>> But when the thread pool is full of work, there's not much you can do to
>> avoid this type of problem short of holding onto any thread you've got.
>>
>> Josh
>>
>> On Jul 10, 2014, at 9:31 AM, "Miłosz Kubański" <>
    string host = "google.pl";
    int millisecondsTimeout = 1000;
    int asyncFails = 0;
    int syncFails = 0;
    long asyncTime = 0;
    long syncTime = 0;
    bool breakAfterError = false;

    // Begin test
    Console.WriteLine("Async start");
    Monitor.TryEnter(this);
    int loopCount = 500;
    try
    {
        bool hasTimeoutOccured = false;
        Parallel.For(0, loopCount, (x) =>
        {
            Stopwatch sw = new Stopwatch();
            for (int i = 0; i < loopCount && (hasTimeoutOccured == false || breakAfterError == false); i++)
            {
                sw.Restart();
                var result = Dns.BeginGetHostAddresses(host, null, null);
                if (!result.AsyncWaitHandle.WaitOne(millisecondsTimeout, true))
                {
                    ++asyncFails; // NB: counters and times are not synchronized across the parallel workers
                    hasTimeoutOccured = true; // Timeout was used up attempting the DNS lookup
                }
                Dns.EndGetHostAddresses(result);
                sw.Stop();
                asyncTime += sw.ElapsedMilliseconds;
                if (i % 400 == 0) { Console.Write('.'); }
            }
        });
        if (hasTimeoutOccured) { throw new TimeoutException(":/"); }
    }
    catch (Exception ex)
    {
        Console.WriteLine("Async :/");
    }

    Console.WriteLine("Sync start");
    try
    {
        bool hasTimeoutOccured = false;
        Parallel.For(0, loopCount, (x) =>
        {
            Stopwatch sw = new Stopwatch();
            for (int i = 0; i < loopCount && (hasTimeoutOccured == false || breakAfterError == false); i++)
            {
                sw.Restart();
                Dns.GetHostAddresses(host);
                sw.Stop();
                syncTime += sw.ElapsedMilliseconds;
                if (sw.ElapsedMilliseconds > millisecondsTimeout)
                {
                    ++syncFails;
                    hasTimeoutOccured = true; // Timeout was exceeded during the DNS lookup
                }
                if (i % 400 == 0) { Console.Write('.'); }
            }
        });
        if (hasTimeoutOccured) { throw new TimeoutException("Sync"); }
    }
    catch (Exception ex)
    {
        Console.WriteLine(":/");
    }
    Monitor.Exit(this);

    Console.WriteLine("\r\n\r\nAsync Fails: {0},\r\nSync fails: {1}", asyncFails, syncFails);
    if (breakAfterError == false)
    {
        Console.WriteLine("\r\n\r\nAsync Time: \t{0},\r\nSync Time: \t{1}", asyncTime, syncTime);
    }
It's definitely an interesting discussion,
> Shay, my response to the comment that connection startup has other overhead not included in the timeout enforcement would be, that ideally we would include it to improve accuracy.

How? IMO we're getting to the crux of the matter. If I follow your argument and we want to provide a 100% accurate timeout, the *only* thing we can do is something along the following:
    public void Open()
    {
        Task.Run(() =>
        {
            // ... all the actual connection logic
        }).Wait(the_timeout);
    }
In other words, run everything in a different thread and wait on that thread with a timeout. Anything else will leave operations out of the timeout and is inaccurate.

At this point I find myself wrapping all my API methods in this fashion, because Open() isn't really special: other methods do multiple back-and-forths with the server. So we end up with a whole API full of thread-pool wrappers. Aside from the performance impact, the strain on the thread pool and the general yuckiness, any user who needs a timeout of this sort could simply do the task wrapping *outside* npgsql, like this:

    Task.Run(() => { conn.Open(); }).Wait(the_timeout);

In other words, we're providing almost zero added value by internally wrapping all of npgsql's code in Task.Run(): any user who needs it can do it themselves just as well. It's just a bit of syntactic sugar.

It all comes down to what you're saying you're providing: database timeouts, which can be cleanly and accurately implemented, or timeouts on npgsql methods, which make no sense to me...
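For concreteness, a minimal sketch of this user-side wrapping (here `SlowOpen` is a hypothetical stand-in for a `conn.Open()` doing multiple I/O round-trips; note that `Wait` returning false only means the *caller* stopped waiting, while the task keeps running on a pool thread):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TimeoutWrapperSketch
{
    // Hypothetical stand-in for a slow conn.Open().
    static void SlowOpen()
    {
        Thread.Sleep(3000);
    }

    static void Main()
    {
        // Run the whole operation on the thread pool and give up waiting after a timeout.
        var task = Task.Run(() => SlowOpen());
        bool completed = task.Wait(500); // the_timeout

        // Wait() returning false does not cancel the work; the task is still
        // running in the background.
        Console.WriteLine(completed ? "opened in time" : "timed out (task still running)");
        // → prints "timed out (task still running)"
    }
}
```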
> But, as you say, if the server you're connecting to has 15 IP's and that's the cause of timeouts, then that is up to the user to deal with

How would you suggest a user deal with it? Do they need to do a DNS query outside npgsql just in order to know how many IPs are resolved, and then multiply their timeout by that number? Or should they change their DNS setup because of npgsql's timeout mechanism? I think there is simply no way out here; this seems like a classical situation where a minimalist approach of "not handling it" works better than forcing a compromise on the user...
> ALL network dependent API's should offer a timeout IMO, async or not

That's a strange statement; in all my years I've yet to see a library that attempts to do this when *multiple* underlying I/O operations are involved... can you provide an example? You're right that all network-dependent APIs should allow you to set the *socket* timeout, but that timeout is valid only for a *single* socket I/O operation. The moment a library provides a method that does more than one operation, the socket timeout no longer corresponds with the API operation. But I might be misunderstanding what you're suggesting...
> But here's a question. If the Dns.BeginGetHostAddresses() call times out because of a thread pool issue, why wouldn't the Socket.BeginConnect() calls time out for the exact same reason? Moreover, if the thread pool is as starved as it sounds like in this case, how can anything else work reliably?

Socket.BeginConnect() absolutely would show the same phenomenon, if what Josh said is true (and I think it is): it's a matter of trying to do too many operations in parallel, (temporarily) putting enough strain on the thread pool (when it's still small), and missing the timeout. It's a rather extreme scenario, and if you really do have to spin up 500 connections in parallel it's probably fair to ask you to wait a little longer by setting a higher timeout...
Holy wall of text Batman!
On Thu, Sep 18, 2014 at 2:23 PM, Shay Rojansky <ro...@roji.org> wrote:

> It's definitely an interesting discussion,
>> Shay, my response to the comment that connection startup has other overhead not included in the timeout enforcement would be, that ideally we would include it to improve accuracy.

> How? IMO we're getting to the crux of the matter. If I follow your argument and we want to provide a 100% accurate timeout, the *only* thing we can do is something along the following:

I did not mean to imply that 100% accuracy is a reasonable expectation for timeout behavior. Sorry if that's how it came across. But I don't think it's unreasonable to expect a timeout to occur within maybe 10% of the timeout value supplied (10% longer, never shorter).
>     public void Open()
>     {
>         Task.Run(() =>
>         {
>             // ... all the actual connection logic
>         }).Wait(the_timeout);
>     }
> In other words, run everything in a different thread and wait on that thread with a timeout. Anything else will leave operations out of the timeout and is inaccurate. At this point I find myself wrapping all my API methods in this fashion, because Open() isn't really special: other methods do multiple back-and-forths with the server.

Isn't that because timeouts aren't accurate enough? If they all provided +10% accuracy, would you still do this type of wrapper? In a perfect world everything would be 100% accurate. I think it's generally a good idea to keep perfection in mind at all times, as kind of the unobtainable goal that we should always try to move toward, but 100% accuracy in a timeout is very much unnecessary.
The trouble with doing timeouts this way is that the task can continue to run even after you quit waiting. So, if our 15 IP example leads to a fifteen minute run time, and you stop waiting after one minute, the task is going to continue to run for 14 more minutes. That's not good at all!
This is exactly why I implemented statement timeouts using PG's statement_timeout setting. If a query gets stuck in the backend and there's no statement_timeout, we quit waiting, but the connection remains useless until the query finishes. I had tons of trouble with that, and implemented the same timeout mechanism something like 12 years ago in our in-house middleware software (which was old C++ code using ODBC).
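For readers unfamiliar with it, the backend-side timeout Glen describes can be sketched like this (a hypothetical connection string; `SET statement_timeout` is standard PostgreSQL, and the server aborts the statement itself, leaving the connection usable):

```csharp
using System;
using Npgsql; // assumes the Npgsql ADO.NET provider

class StatementTimeoutSketch
{
    static void Main()
    {
        // Hypothetical connection string; adjust for your environment.
        using (var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=test"))
        {
            conn.Open();

            // Ask the *backend* to abort any statement running longer than 5 seconds.
            using (var cmd = new NpgsqlCommand("SET statement_timeout = 5000", conn))
                cmd.ExecuteNonQuery();

            // This sleeps for 60s server-side; PostgreSQL cancels it after ~5s
            // ("canceling statement due to statement timeout", SQLSTATE 57014),
            // and the connection remains usable afterwards.
            using (var cmd = new NpgsqlCommand("SELECT pg_sleep(60)", conn))
            {
                try { cmd.ExecuteNonQuery(); }
                catch (NpgsqlException e) { Console.WriteLine(e.Message); }
            }
        }
    }
}
```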
The above is an extreme example, I grant you.

> So we end up with a whole API full of thread-pool wrappers. Aside from the performance impact, the strain on the thread pool and the general yuckiness, any user who needs a timeout of this sort could simply do the task wrapping *outside* npgsql, like this: Task.Run(() => { conn.Open(); }).Wait(the_timeout); In other words, we're providing almost zero added value by internally wrapping all of npgsql's code in Task.Run(): any user who needs it can do it themselves just as well. It's just a bit of syntactic sugar. It all comes down to what you're saying you're providing: database timeouts, which can be cleanly and accurately implemented, or timeouts on npgsql methods, which make no sense to me...
>> But, as you say, if the server you're connecting to has 15 IP's and that's the cause of timeouts, then that is up to the user to deal with

> How would you suggest a user deal with it? Do they need to do a DNS query outside npgsql just in order to know how many IPs are resolved, and then multiply their timeout by that number? Or should they change their DNS setup because of npgsql's timeout mechanism? I think there is simply no way out here; this seems like a classical situation where a minimalist approach of "not handling it" works better than forcing a compromise on the user...

The guy writing the software to connect to his company's RDBMS is the guy who needs to know about the 15 IP's, so of course he's the one to deal with it. He should be able to expect some level of accuracy from Npgsql's timeout mechanism, and set his timeout values according to the conditions at hand.
>> ALL network dependent API's should offer a timeout IMO, async or not

> That's a strange statement; in all my years I've yet to see a library that attempts to do this when *multiple* underlying I/O operations are involved... can you provide an example? You're right that all network-dependent APIs should allow you to set the *socket* timeout, but that timeout is valid only for a *single* socket I/O operation. The moment a library provides a method that does more than one operation, the socket timeout no longer corresponds with the API operation. But I might be misunderstanding what you're suggesting...

It's just my opinion. No examples needed :)
>> But here's a question. If the Dns.BeginGetHostAddresses() call times out because of a thread pool issue, why wouldn't the Socket.BeginConnect() calls time out for the exact same reason? Moreover, if the thread pool is as starved as it sounds like in this case, how can anything else work reliably?

> Socket.BeginConnect() absolutely would show the same phenomenon, if what Josh said is true (and I think it is): it's a matter of trying to do too many operations in parallel, (temporarily) putting enough strain on the thread pool (when it's still small), and missing the timeout. It's a rather extreme scenario, and if you really do have to spin up 500 connections in parallel it's probably fair to ask you to wait a little longer by setting a higher timeout...

THIS is the crux of the issue. All this talk about how timeouts should work and their accuracy is a side show. It appears that we're discussing the removal of perfectly good code in a strange (and ultimately likely to be ineffectual) attempt to band-aid somebody's system performance issues.
-Glen
On Saturday, September 20, 2014 1:44:14 AM UTC+3, glenebob wrote:

> Holy wall of text Batman!

Didn't mean to be this wordy, I just got a little excited :)
>>     public void Open()
>>     {
>>         Task.Run(() =>
>>         {
>>             // ... all the actual connection logic
>>         }).Wait(the_timeout);
>>     }
>> In other words, run everything in a different thread and wait on that thread with a timeout. Anything else will leave operations out of the timeout and is inaccurate. At this point I find myself wrapping all my API methods in this fashion, because Open() isn't really special: other methods do multiple back-and-forths with the server.

> Isn't that because timeouts aren't accurate enough? If they all provided +10% accuracy, would you still do this type of wrapper? In a perfect world everything would be 100% accurate. I think it's generally a good idea to keep perfection in mind at all times, as kind of the unobtainable goal that we should always try to move toward, but 100% accuracy in a timeout is very much unnecessary.
Well, all the timeouts I know are pretty much 100% accurate: socket timeout, Postgresql backend timeout. The point I'm trying to make here (and below) is that most Npgsql operations are complex, in the sense that they contain multiple operations. Each *single* operation can (usually) provide 100% accuracy (e.g. via a socket timeout). But what does a timeout mean when two or more operations are involved? What can anybody do except wrap the whole thing in a Task?

One additional small argument here is that Postgresql's default behavior is no timeout whatsoever. I personally believe npgsql should be as "pass-through" as possible to Postgresql, which is why I think it would be better if the backend timeout were not on by default (but it's definitely important for it to be supported).
> The trouble with doing timeouts this way is that the task can continue to run even after you quit waiting. So, if our 15 IP example leads to a fifteen minute run time, and you stop waiting after one minute, the task is going to continue to run for 14 more minutes. That's not good at all!

It's not amazing, but it's not a catastrophe either - we're talking about a single thread sitting around waiting on I/O for 14 minutes in an extreme situation...

> This is exactly why I implemented statement timeouts using PG's statement_timeout setting. If a query gets stuck in the backend and there's no statement_timeout, we quit waiting, but the connection remains useless until the query finishes. I had tons of trouble with that, and implemented the same timeout mechanism something like 12 years ago in our in-house middleware software (which was old C++ code using ODBC).

Here I'm in total agreement with you - it's perfectly useful and a great idea to provide backend timeouts - these are clear and effective, and also 100% accurate with respect to what they provide.
> The above is an extreme example, I grant you.

>> So we end up with a whole API full of thread-pool wrappers. Aside from the performance impact, the strain on the thread pool and the general yuckiness, any user who needs a timeout of this sort could simply do the task wrapping *outside* npgsql, like this: Task.Run(() => { conn.Open(); }).Wait(the_timeout); In other words, we're providing almost zero added value by internally wrapping all of npgsql's code in Task.Run(): any user who needs it can do it themselves just as well. It's just a bit of syntactic sugar. It all comes down to what you're saying you're providing: database timeouts, which can be cleanly and accurately implemented, or timeouts on npgsql methods, which make no sense to me...
>> But, as you say, if the server you're connecting to has 15 IP's and that's the cause of timeouts, then that is up to the user to deal with

>> How would you suggest a user deal with it? Do they need to do a DNS query outside npgsql just in order to know how many IPs are resolved, and then multiply their timeout by that number? Or should they change their DNS setup because of npgsql's timeout mechanism? I think there is simply no way out here; this seems like a classical situation where a minimalist approach of "not handling it" works better than forcing a compromise on the user...

> The guy writing the software to connect to his company's RDBMS is the guy who needs to know about the 15 IP's, so of course he's the one to deal with it. He should be able to expect some level of accuracy from Npgsql's timeout mechanism, and set his timeout values according to the conditions at hand.

The assumption about the guy writing the software knowing about the DNS, well... I really don't feel it's our job to speculate on this kind of thing. In large organizations the app programmer, the DB admin and the system/network admin are very different people; environments are complex and so forth. I'm not used to libraries varying their behavior based on such an arbitrary and unrelated factor.
>> ALL network dependent API's should offer a timeout IMO, async or not

>> That's a strange statement; in all my years I've yet to see a library that attempts to do this when *multiple* underlying I/O operations are involved... can you provide an example? You're right that all network-dependent APIs should allow you to set the *socket* timeout, but that timeout is valid only for a *single* socket I/O operation. The moment a library provides a method that does more than one operation, the socket timeout no longer corresponds with the API operation. But I might be misunderstanding what you're suggesting...

> It's just my opinion. No examples needed :)

No problem, but I still don't understand how anyone is supposed to manage timeouts in the case of complex operations...
>> But here's a question. If the Dns.BeginGetHostAddresses() call times out because of a thread pool issue, why wouldn't the Socket.BeginConnect() calls time out for the exact same reason? Moreover, if the thread pool is as starved as it sounds like in this case, how can anything else work reliably?

>> Socket.BeginConnect() absolutely would show the same phenomenon, if what Josh said is true (and I think it is): it's a matter of trying to do too many operations in parallel, (temporarily) putting enough strain on the thread pool (when it's still small), and missing the timeout. It's a rather extreme scenario, and if you really do have to spin up 500 connections in parallel it's probably fair to ask you to wait a little longer by setting a higher timeout...

> THIS is the crux of the issue. All this talk about how timeouts should work and their accuracy is a side show. It appears that we're discussing the removal of perfectly good code in a strange (and ultimately likely to be ineffectual) attempt to band-aid somebody's system performance issues.

Well, I'll try to sum it up to make sure I'm understood... My motivation for removing the timeout isn't at all to band-aid the user's performance issue. IMO the timeout doesn't improve accuracy, but rather hurts npgsql behavior and what users can expect from it. My problem is that the timeout we're providing right now isn't accurate at all. If the DNS record has 3 IPs it's divided by 3. If DNS, startup and authentication are slow it's multiplied by 3. So we're letting the user configure a timeout that, in practice, means almost nothing. What can users rely on here? How would this timeout even be documented?
Reading over what I wrote I think I've come across a bit aggressively, I'm sorry about that (too late/tired to go back and rewrite). If we really disagree on this, maybe we should have a vote between the core devs?
* Rather than dividing the connection timeout between the different IPs we get from DNS, we have the option of doing them in parallel using async IO. In other words, if DNS gives us 4 IPs, we use Socket.ConnectAsync() and wait for the first completion. This is not a flawless idea: it may mean a bit more load when doing 500 in parallel, but it gives you a truly predictable timeout. I'm not 100% sure this is a good idea: the load increases the more IPs there are, but I wanted to put it out there - what do you think?

* Regarding the DNS query itself - searching around yielded no results for sync-with-timeout lookups. I'd say we stay with the current mechanism - I'm not sure I see an actual issue here, as Josh explained in his very first reply (thread pool needs to scale up etc.). What I can do is rewrite the async DNS mechanism to use the more modern async/await idiom (now that we're .NET 4 and above), which is functionally the same but cleaner code.

* I'll also open an issue on extending the connection timeout over the later connection phases, i.e. startup and authentication. That shouldn't be too hard.
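The "race all IPs and keep the first winner" idea from the first bullet might be sketched like this. This is only a sketch under stated assumptions (Task-based wrappers via Task.Factory.FromAsync and .NET 4.5-style Task.WhenAny), not Npgsql's actual connect code; the single Task.Delay gives the one, predictable timeout being discussed:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

static class ParallelConnectSketch
{
    public static async Task<Socket> ConnectFirstAsync(IPAddress[] addresses, int port, TimeSpan timeout)
    {
        // Start one connect attempt per resolved IP, all in parallel.
        var attempts = addresses.Select(async addr =>
        {
            var socket = new Socket(addr.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
            await Task.Factory.FromAsync(socket.BeginConnect, socket.EndConnect, addr, port, null);
            return socket;
        }).ToList();

        var timeoutTask = Task.Delay(timeout); // one timeout covering all attempts
        while (attempts.Count > 0)
        {
            var finished = await Task.WhenAny(attempts.Cast<Task>().Concat(new[] { timeoutTask }));
            if (finished == timeoutTask)
                break;

            var attempt = (Task<Socket>)finished;
            attempts.Remove(attempt);
            if (attempt.Status == TaskStatus.RanToCompletion)
            {
                // Winner found; the "losers" get closed in the background as they complete.
                foreach (var loser in attempts)
                    loser.ContinueWith(t => { if (t.Status == TaskStatus.RanToCompletion) t.Result.Close(); });
                return attempt.Result;
            }
            var ignored = attempt.Exception; // observe the failure; keep waiting on the rest
        }
        throw new TimeoutException("No IP could be connected to within the timeout.");
    }
}
```

Note how this makes the trade-off concrete: one socket per IP is opened up front (the extra load Shay mentions), in exchange for a single timeout that doesn't depend on how many IPs DNS returns.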
> * Rather than dividing the connection timeout between the different IPs we get from DNS, we have the option of doing them in parallel using async IO. In other words, if DNS gives us 4 IPs, we use Socket.ConnectAsync() and wait for the first completion. [...]

Well, you realize, the normal case could result in all connection attempts succeeding, leaving us to have to pick one (random?), and disconnect the rest.
Actually, it's worse than that. I think we'd have to code it such that each task is able to close the socket it just got upon finding it is one of the losers. We could get a connection quickly and start doing queries, while the other tasks were still running in the background cleaning up the mess. "Hello, PG? Yeah, I need to talk to you. Oh, uh, never mind." :)
This sounds like all kinds of bad to me.
On Sat, Sep 20, 2014 at 12:33 PM, Shay Rojansky <ro...@roji.org> wrote:

>> Well, you realize, the normal case could result in all connection attempts succeeding, leaving us to have to pick one (random?), and disconnect the rest.

> That's right, although it seems to make sense to just pick the first one that connects (rather than random). The others get asynchronously cancelled (if not yet connected) or closed (if connected).

>> Actually, it's worse than that. I think we'd have to code it such that each task is able to close the socket it just got upon finding it is one of the losers. We could get a connection quickly and start doing queries, while the other tasks were still running in the background cleaning up the mess. "Hello, PG? Yeah, I need to talk to you. Oh, uh, never mind." :)

> I'm not sure I see the issue... True, we'd have the "I need to talk to you. Oh, uh, never mind" scenario, but is that a problem? I don't see anything wrong in going ahead with queries on the successful connection, while the others get asynchronously dumped in the background (except, again, for slightly more load).

>> This sounds like all kinds of bad to me.

> I'm still not sure why... and it would provide an actual, predictable timeout.

You keep saying that, about the timeout. I think you're reading the connect code wrong or something. It's quite predictable now, from the perspective of the caller.
On Sat, Sep 20, 2014 at 11:09 PM, Glen Parker <glen...@gmail.com> wrote:

> You keep saying that, about the timeout. I think you're reading the connect code wrong or something. It's quite predictable now, from the perspective of the caller.

I'm sorry if I'm misunderstanding it or something, I'm not trying to be annoying. The way it's currently implemented, if the user sets CommandTimeout to 10 seconds and there are 2 IPs, it gets cut down to 5 instead. This is unexpected behavior, and unpredictable if the user isn't aware of DNS specifics. I know I'm repeating myself, but it's hard to understand how this isn't at least recognized as a potential issue.
>> You keep saying that, about the timeout. I think you're reading the connect code wrong or something. It's quite predictable now, from the perspective of the caller.

> I'm sorry if I'm misunderstanding it or something, I'm not trying to be annoying. The way it's currently implemented, if the user sets CommandTimeout to 10 seconds and there are 2 IPs, it gets cut down to 5 instead. This is unexpected behavior, and unpredictable if the user isn't aware of DNS specifics. I know I'm repeating myself, but it's hard to understand how this isn't at least recognized as a potential issue.

You're not being annoying, it's just that I'd rather be on the same page than just drop it. Maybe I'm the one misunderstanding something.

Currently, in your example, the first IP will time out after 5 seconds. Then the second IP will be attempted, also with a 5 second timeout. That's a total of 10 seconds, which is what the caller expects. Now, if the first IP fails for some other reason after only 1 second, the entire remaining time slice (9 seconds) will be given to the second IP, so if it ends up timing out, again we have a total of 10 seconds.

If you're seeing something that causes what I just described to be false, then we do indeed have a bug, and it should be fixed. But I looked at it again this morning, and it looks to me like it's working as expected.
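The allocation Glen describes can be modeled in a few lines (a toy model of the described behavior, not Npgsql's actual connect code): each attempt gets the remaining time divided by the remaining IPs, and time an attempt doesn't use rolls forward.

```csharp
using System;
using System.Collections.Generic;

static class TimeoutSliceSketch
{
    // totalMs: the user's overall timeout.
    // attemptDurationsMs[i]: how long connecting to IP i actually took (or would take).
    // Returns the per-attempt timeout slice each IP receives.
    public static List<int> Slices(int totalMs, int[] attemptDurationsMs)
    {
        var slices = new List<int>();
        int remaining = totalMs;
        for (int i = 0; i < attemptDurationsMs.Length; i++)
        {
            int slice = remaining / (attemptDurationsMs.Length - i);
            slices.Add(slice);
            // The attempt consumes what it actually used, capped at its slice.
            remaining -= Math.Min(attemptDurationsMs[i], slice);
        }
        return slices;
    }
}

// With a 10s timeout and 2 IPs:
//   first IP fails after 1s  -> slices are [5000, 9000] (the unused 4s rolls forward)
//   first IP times out (5s)  -> slices are [5000, 5000]
// Either way the caller waits at most 10s in total, which is Glen's point;
// Shay's point is that no single attempt ever gets the full 10s.
```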
Just a thought: the user could enter an IP address to skip the DNS lookup. Otherwise, if our method of resolving isn't good enough, the user can do the DNS lookup manually and pass the IP address to Npgsql...
--
Hi Miłosz.

Nobody is happy about using async for the DNS lookup; it's being done only because the sync version has no timeout. If you can find a good DNS client implementation that supports a timeout, I think we'll gladly use it.

Having said that, we're not exactly talking about allocating unnecessary threads just for this task; the thread pool is a standard part of .NET and is used in many operations across many applications. It's natural that large load/stress circumstances should require a larger pool. Sure, the context switch incurs a bit more CPU, but we're talking about a single switch that happens when performing a database connection, which is by its nature supposed to be an expensive, lengthy operation (roundtrips, authentication...).

One more workaround to consider is simply to preallocate connections on application startup. If you want to support 100 connections, open them at your leisure on startup and keep them around in Npgsql's pool. Then when requests actually come they will already be open.

Despite all the above: Glen, Josh, with all the complaints about this, maybe we should just add a connstring param called SyncDns, through which the user acknowledges that they may potentially get stuck for more than the timeout? That way, high-load users who don't care about the DNS timeout can be happy.
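The preallocation workaround mentioned above can be sketched as follows. This is a minimal sketch, assuming Npgsql's default connection pooling; the connection string and warm-up count are placeholders.

```csharp
using System.Collections.Generic;
using Npgsql;

class PoolWarmup
{
    static void WarmPool(string connString, int count)
    {
        var conns = new List<NpgsqlConnection>();
        // Open all the physical connections up front, while the app is idle
        // and the thread pool is not under burst pressure...
        for (int i = 0; i < count; i++)
        {
            var conn = new NpgsqlConnection(connString);
            conn.Open();
            conns.Add(conn);
        }
        // ...then close them: with pooling enabled (the default), Close()
        // returns each physical connection to Npgsql's pool rather than
        // tearing it down, so later Open() calls avoid the DNS lookup
        // and handshake entirely.
        foreach (var conn in conns)
            conn.Close();
    }
}
```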
Shay

On Monday, October 13, 2014 9:31:11 AM UTC+2, Miłosz Kubański wrote:

Hi.

I think allocating unnecessary threads is the wrong approach, because you're using far more resources than you need. You'll use much more memory (if I remember correctly, each thread allocates around 1 MB for its stack), and you'll burn too much CPU on context switching. If you run a few applications that use Npgsql, you can simply run out of memory just at application start, and you may crush the system's performance.

If you want to stay with async name resolution, the better option is to create a separate thread pool. I talked about this with Francisco via email and forgot to mention it on this group. IMHO this thread pool should be reserved only for DNS name resolution and shouldn't be shared with, for example, socket reads/writes, because we could have similar problems (more complicated operations would cause the simple operations to starve).

Best Regards,
Miłosz
On Sunday, October 12, 2014 at 11:35:07 AM UTC+2, Shay Rojansky wrote:

Note the virtually identical discussion on GitHub: https://github.com/npgsql/npgsql/issues/376

The workaround mentioned by the user is to either preconfigure the thread pool to allocate more threads, to correspond with the anticipated load burst, or in an extreme case to use a custom pool (haven't investigated this yet).
Hi Miłosz, no problem for the long email, and thanks for the positive feedback.

I ultimately agree with almost everything you say. Your options are currently:
- Pre-open *database* connections to match burst requirements. As you say, this may impose too much load on the database itself.
- Configure the .NET TP for minimum threads (as cbenien proposed in #376). This will make new connections open quickly under burst, since threads will be available for the async DNS. The downside here is thread-related resources, which frankly don't seem like a huge deal to me (but I admit it's a factor). An additional point: since TP threads are a common resource, they may be in use just when you need them, so this makes connection opening less deterministic.
- Increase the Npgsql connection timeout. Not an amazing solution, since under burst, connections will open quite slowly (although without exceptions).
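The second option above (raising the pool's minimum thread count so burst loads don't wait for the pool's gradual thread injection) looks roughly like this. The value 200 is purely illustrative, not a recommendation.

```csharp
using System;
using System.Threading;

class ThreadPoolConfig
{
    static void Main()
    {
        int workerMin, ioMin;
        ThreadPool.GetMinThreads(out workerMin, out ioMin);
        Console.WriteLine("Current minimums: worker=" + workerMin + ", IO=" + ioMin);

        // Ensure the pool will start up to 200 worker threads on demand,
        // skipping the slow ramp-up it normally applies above the minimum.
        // SetMinThreads returns false if it rejects the requested values.
        if (!ThreadPool.SetMinThreads(200, ioMin))
            Console.WriteLine("SetMinThreads rejected the requested values");
    }
}
```

The trade-off is exactly the one debated in this thread: the extra threads cost stack memory and context switches, in exchange for deterministic availability during bursts.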
I'm guessing nobody will be against adding the SyncDns option as well; I'll open an issue for that and wait for Glen's and Josh's feedback.
Emil: passing an IP rather than a hostname circumvents the whole problem, but I agree with Miłosz that it's not feasible in many situations.
Francisco, you wrote:

> Another idea I remember I saw in the async connection discussion on GitHub was to use a timeout-able block of code for the sync DNS code. This way, if the DNS takes too long, the code times out, aborting any sync DNS query, and we continue to honor the user's timeout settings.

The problem with the timeout-able block of code is that... it requires a thread :) If you want to run a code block with a timeout, the way to do it is to send it to the thread pool (or allocate your own new thread), and we're back at the original problem. I agree with everything else.
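To make the point above concrete, here is a sketch of the "timeout-able block" idea, showing why it still consumes a thread: the blocking DNS call has to run somewhere while the caller waits. The hostname and timeout are illustrative.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

class DnsWithTimeout
{
    static IPAddress[] ResolveWithTimeout(string host, TimeSpan timeout)
    {
        // Task.Run dispatches the blocking GetHostAddresses call to a
        // thread pool thread: exactly the resource we were trying to spare.
        Task<IPAddress[]> lookup = Task.Run(() => Dns.GetHostAddresses(host));

        if (!lookup.Wait(timeout))
            throw new TimeoutException("DNS lookup for '" + host + "' timed out");

        // Note: on timeout, the abandoned lookup keeps blocking its pool
        // thread until the OS resolver gives up; the timeout only releases
        // the caller, it doesn't cancel the underlying query.
        return lookup.Result;
    }

    static void Main()
    {
        foreach (var ip in ResolveWithTimeout("localhost", TimeSpan.FromSeconds(5)))
            Console.WriteLine(ip);
    }
}
```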
--
// Time two consecutive synchronous lookups of the same host. The second
// call is typically much faster, since the result (positive or negative)
// is cached by the OS resolver after the first attempt.
Stopwatch sw = new Stopwatch();
sw.Restart();
string hostName = "abracadabra.com";
var s = Dns.GetHostEntry(hostName);
sw.Stop();
var el = sw.ElapsedMilliseconds;

sw.Restart();
s = Dns.GetHostEntry(hostName);
sw.Stop();
var el2 = sw.ElapsedMilliseconds;

Console.WriteLine(el);
Console.WriteLine(el2);
Console.ReadKey();
return;