Hello all,
I recently noticed that Npgsql didn't perform well in a high-demand environment (a site with many users and occasional huge spikes in traffic).
I discovered that the cause of the issue was that the connection pool distributed connectors in no particular order.
Say you have two web requests that are called at about the same time and each request makes 4 DB calls. I'll call the DB calls 1A, 1B, 1C and 1D for the first web request and 2A, 2B, 2C and 2D for the 2nd web request.
Let's say the DB operations try to grab connectors from the pool in this order: 1A, 2A, 1B, 2B, 1C, 2C, 1D, 2D. The connectors might actually end up being handed out in this order instead: 1A, 1B, 1C, 2A, 1D, 2B, 2C, 2D.
In the above example, 1B and 1C got connectors before 2A got one, even though 2A asked first.
In a high-demand environment where many threads (web requests) are making DB calls simultaneously, 2A might end up waiting for a very long time while other threads are handed connectors, even though 2A made its request before those other threads. This often leads to an artificial timeout: the call fails not because the pool is exhausted for that long, but because the request keeps getting skipped.
What's worse is that all successful DB calls made within a web request are wasted if a subsequent DB call times out. The web request only succeeds when all DB calls within that web request are successful.
The fix for this problem is to queue up requests for connectors and serve them one by one, in the order they arrived.
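
To make the queueing idea concrete, here is a minimal sketch in Python. It is not the code from the pull requests and not Npgsql's actual implementation; FairPool, acquire and release are hypothetical names chosen for illustration. Each request that cannot be served immediately gets a ticket in a FIFO queue, and a released connector is handed directly to the oldest ticket.

```python
import threading
from collections import deque


class FairPool:
    """Hypothetical pool that hands out connectors in strict request order."""

    def __init__(self, connectors):
        self._idle = deque(connectors)  # connectors ready to be handed out
        self._waiters = deque()         # pending requests, oldest first
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        with self._lock:
            # Take an idle connector only if nobody is already waiting,
            # so a newcomer can never jump ahead of an older request.
            if self._idle and not self._waiters:
                return self._idle.popleft()
            ticket = {"ready": threading.Event(), "connector": None}
            self._waiters.append(ticket)

        if ticket["ready"].wait(timeout):
            return ticket["connector"]

        with self._lock:
            # A connector may have been handed over just as the wait expired.
            if ticket["connector"] is not None:
                return ticket["connector"]
            self._waiters.remove(ticket)
        raise TimeoutError("no connector available")

    def release(self, connector):
        with self._lock:
            if self._waiters:
                # Hand the connector straight to the oldest waiting request.
                ticket = self._waiters.popleft()
                ticket["connector"] = connector
                ticket["ready"].set()
            else:
                self._idle.append(connector)
```

With a pool like this, if 1A, 2A, 1B and 2B queue up in that order while the only connector is busy, they are served in exactly that order, so 2A can no longer be starved by requests that arrived after it.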
Another issue is that I saw my server application hit a StackOverflowException and go down.
It hasn't happened often (only three times so far), but it's still a cause for concern.
I'm not sure what caused it but I suspect the recursion in GetPooledConnector() might be to blame.
Even if that's not the cause, it's safer to convert the recursion into an iterative loop since all sorts of code depend on this library.
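
As a rough illustration of that conversion, here is a small Python sketch. It does not reproduce the real GetPooledConnector(); the function names, the try_take callback and the make_source helper are all made up for the example. Both functions retry until a connector is available, but the recursive one adds a stack frame per failed attempt while the loop keeps the stack depth constant.

```python
def get_connector_recursive(try_take, attempts_left):
    """Retry by calling itself: one extra stack frame per failed attempt."""
    connector = try_take()
    if connector is not None:
        return connector
    if attempts_left <= 1:
        raise TimeoutError("no connector available")
    return get_connector_recursive(try_take, attempts_left - 1)


def get_connector_iterative(try_take, max_attempts):
    """Same retry policy written as a loop: stack depth stays constant."""
    for _ in range(max_attempts):
        connector = try_take()
        if connector is not None:
            return connector
    raise TimeoutError("no connector available")


def make_source(failures):
    """Hypothetical stand-in for a busy pool: returns None `failures`
    times, then hands out a connector."""
    remaining = [failures]

    def try_take():
        if remaining[0] > 0:
            remaining[0] -= 1
            return None
        return "connector"

    return try_take
```

Under heavy contention the number of failed attempts can get large. In Python the recursive version then fails with a RecursionError once the interpreter's recursion limit is exceeded, while the loop handles the same number of retries without growing the stack. In .NET a stack overflow is worse, because it takes the whole process down.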
I've created Pull Requests 178 and 182 to resolve these issues.
Thanks,
Sunny