Connection Pooling, Multiple Threads and DbLinq issues

77 views
Skip to first unread message

greyman

unread,
Jun 13, 2009, 10:10:02 AM6/13/09
to DbLinq
Hi,

I'm attempting to use DbLinq in an application that runs a 10 to 20
concurrent threads which access the database. I've been tracking down
sporadic exception related to DbLinq operations and from what I've
been able to glean so far it looks like the DatabaseContext class is
caching the database connection object and on each access is checking
whether the connection is open or not. If this is the case it's a
problem when multiple threads are using the same DataContext object
since they'll end up with multiple threads using the connection at the
same time and having a situation where the responses will be
incorrectly parsed.

For connection pooling in ADO.Net a new database connection should be
created each time a databse connection object is needed and the
connection pool takes care of making connection re-use efficient. That
solves the problem for multi-threaded apps as long as they don't try
and pass a database connection object around which they shouldn't.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 13, 2009, 10:06:35 PM6/13/09
to dbl...@googlegroups.com
On Sat, 2009-06-13 at 07:10 -0700, greyman wrote:
> I'm attempting to use DbLinq in an application that runs a 10 to 20
> concurrent threads which access the database. I've been tracking down
> sporadic exception related to DbLinq operations and from what I've
> been able to glean so far it looks like the DatabaseContext class is
> caching the database connection object and on each access is checking
> whether the connection is open or not. If this is the case it's a
> problem when multiple threads are using the same DataContext object
> since they'll end up with multiple threads using the connection at the
> same time and having a situation where the responses will be
> incorrectly parsed.

From:
http://msdn.microsoft.com/en-us/library/system.data.linq.datacontext.aspx

Any public static (Shared in Visual Basic) members of this type
are thread safe. Any instance members are not guaranteed to be
thread safe.

Granted, this is for .NET's DataContext type, but DbLinq aims to
implement the same API, with the same constraints, as .NET's API.

In short, DataContext is NOT thread safe, never has been, likely never
will be, so your scenario is broken by design.

Sorry.

> For connection pooling in ADO.Net a new database connection should be
> created each time a databse connection object is needed and the
> connection pool takes care of making connection re-use efficient. That
> solves the problem for multi-threaded apps as long as they don't try
> and pass a database connection object around which they shouldn't.

This is likely a good idea anyway (and is now at the end of my rather
lengthy TODO list).

That said, following this behavior will not magically make DataContext
thread safe, and thus won't make DataContext suitable for your uses.
(For example, objects returned by queries are "tied" to the DataContext
instance that created them for change tracking purposes, and these
objects are stored in a collection, so your multi-threaded access of the
DataContext may very well corrupt this collection, which will cause no
end of debugging pain....)

- Jon


Andrus Moor

unread,
Jun 14, 2009, 4:26:13 AM6/14/09
to dbl...@googlegroups.com
>> For connection pooling in ADO.Net a new database connection should be
>> created each time a databse connection object is needed and the
>> connection pool takes care of making connection re-use efficient.
>
> This is likely a good idea anyway (and is now at the end of my rather
> lengthy TODO list).

Currently dblinq seems to work in this way: new IDbConnection() reads
connection from static thread-safe ado .net pool (if ado provider implements
pooling).
IDbConnection.Dispose() issued by connection creator (dblinq or user who
passes connection) returns connection to pool.

So I don'nt undrestand what to do?

Andrus.

greyman

unread,
Jun 14, 2009, 8:05:20 AM6/14/09
to DbLinq


On Jun 14, 9:26 am, "Andrus Moor" <kobrule...@hot.ee> wrote:
> Currently dblinq seems to work in this way: new IDbConnection() reads
> connection from static thread-safe ado .net pool (if ado provider implements
> pooling).
> IDbConnection.Dispose() issued by connection creator (dblinq or user who
> passes connection) returns connection to pool.
>
> So I don'nt undrestand what to do?
>

At the moment there looks like each DbLinq DataContext will only ever
create a single IDbConnection (which represents the physical
connection to the database) object. The creation is done in
DatabaseContext.Connect. The single connection object is then passed
to other DbLinq classes to consume. In DatabaseConnection.Dispose the
connection is closed but not disposed of, it would be a problem if it
was disposed of since it is still reference and required by the other
consuming classes.

The correct way to use ADO.Net connection pooling is rather than
keeping a reference to a single connection object and passing it
around the is to create and open a new connection everytime and let
the ADO.Net provider connection pooling take care of reusing
connections. Once the connection is finished with it should be closed
and de-referenced or disposed of. It's something that looks easy to
fix in DbLinq which is good news.

Regards,

Aaron

greyman

unread,
Jun 14, 2009, 8:11:29 AM6/14/09
to DbLinq


On Jun 14, 3:06 am, Jonathan Pryor <jonpr...@vt.edu> wrote:
> From:http://msdn.microsoft.com/en-us/library/system.data.linq.datacontext....
>
>         Any public static (Shared in Visual Basic) members of this type
>         are thread safe. Any instance members are not guaranteed to be
>         thread safe.
>
> Granted, this is for .NET's DataContext type, but DbLinq aims to
> implement the same API, with the same constraints, as .NET's API.
>
> In short, DataContext is NOT thread safe, never has been, likely never
> will be, so your scenario is broken by design.
>

I didn't realise DataContexts were meant to be used in that fashion.
Thanks very much for pointing it out and after a quick refactor of my
code the initial indications are good with no weird errors so far. I
guess the key is to keep the DataContext objects light, as the
Microsoft link you provided points out. Andrus' testing may highlight
an inefficiency with the use of reflection in large assemblies but
that looks like ti could be optimised fairly easily.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 14, 2009, 10:23:10 AM6/14/09
to dbl...@googlegroups.com

This is a known issue. (Known to me as of two days ago. ;-)

The larger problem is that the current mechanism interacts *horribly*
with transactions (i.e. it doesn't work with DataContext.Transaction AT
ALL). The result is that I've had to disable several unit tests which
require use of Transactions (see my prior message about "unit tests
which altered the database"; I'm using Transactions in some tests to
prevent the changes from being persisted).

(The problem is that DataContext.Transaction is currently ignored, and
my feeble attempts so far at making DataContext.SubmitChanges() support
DataContext.Transaction results in "transactions within transactions are
not supported" errors, because in various places DatabaseContext is
being used when executing the SQL commands, and may implicitly create a
new context. This is obviously BAD.)

I think the entire use of DbConnection/etc. needs to be reworked
(possibly by removing DatabaseContext?), but I'm not sure when I'll get
to it. :-/

- Jon


greyman

unread,
Jun 14, 2009, 7:51:10 PM6/14/09
to DbLinq
On Jun 15, 12:23 am, Jonathan Pryor <jonpr...@vt.edu> wrote:
> This is a known issue.  (Known to me as of two days ago. ;-)
>
> The larger problem is that the current mechanism interacts *horribly*
> with transactions (i.e. it doesn't work with DataContext.Transaction AT
> ALL).  The result is that I've had to disable several unit tests which
> require use of Transactions (see my prior message about "unit tests
> which altered the database"; I'm using Transactions in some tests to
> prevent the changes from being persisted).
>
> (The problem is that DataContext.Transaction is currently ignored, and
> my feeble attempts so far at making DataContext.SubmitChanges() support
> DataContext.Transaction results in "transactions within transactions are
> not supported" errors, because in various places DatabaseContext is
> being used when executing the SQL commands, and may implicitly create a
> new context.  This is obviously BAD.)
>
> I think the entire use of DbConnection/etc. needs to be reworked
> (possibly by removing DatabaseContext?), but I'm not sure when I'll get
> to it. :-/

What about leaving a slight modification DatabaseContext where the
Connection property is removed and replaced with a method
GetConnection. Instead of caching the IDbConnection object as is
currently done everytime a parent class wanted to use a database
connection it would call the GetConnection method which would create
and open a new one. For this to work properly and not chew up every
database connection in the pool every call to GetConnection would have
to be wrapped in a using block so that the connection was returned to
the pool after use. The using block wraps do seem to be already used
in a lot of places but not everywhere.

I think the redundant class is DatabaseConnection. If an approach like
the one above is used it would no longer be needed.

I don't mind having a crack at implementing this but would like to get
an indication that the design at least sounds ok. Otherwise there
could be an obvious critical thing I'm missing.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 15, 2009, 3:51:11 PM6/15/09
to dbl...@googlegroups.com
On Sun, 2009-06-14 at 16:51 -0700, greyman wrote:
> What about leaving a slight modification DatabaseContext where the
> Connection property is removed and replaced with a method
> GetConnection. Instead of caching the IDbConnection object as is
> currently done everytime a parent class wanted to use a database
> connection it would call the GetConnection method which would create
> and open a new one. For this to work properly and not chew up every
> database connection in the pool every call to GetConnection would have
> to be wrapped in a using block so that the connection was returned to
> the pool after use. The using block wraps do seem to be already used
> in a lot of places but not everywhere.

I'm still not seeing the benefit here. DataContext.Connection already
exposes the IDbConnection instance to use, and DataContext appears to be
accessible from nearly everywhere of consequence.

DatabaseConnection does have one benefit: it can centralize the logic
for when to close the connection, as required in
http://msdn.microsoft.com/en-us/library/bb292288.aspx:

A DataContext opens and closes a database connection as needed
if you provide a closed connection or a connection string. In
general, you should never have to call Dispose on a DataContext.
If you provide an open connection, the DataContext will not
close it. Therefore, do not instantiate a DataContext with an
open connection unless you have a good reason to do this. In a
System.Transactions transaction, a DataContext will not open or
close a connection to avoid promotion.

So DatabaseConnection handles the implicit opening and closing of the
IDbConnection when it's needed. That seems fairly useful to me.
(Though it should be enhanced to check for presence of a transaction and
not open if a transaction is in process, though this might not be
necessary -- shouldn't the connection already be open for it to be part
of a transaction?)

DatabaseContext, on the other hand, has methods that aren't used used
anywhere (as far as VS is concerned) such as DatabaseContext.Connect(),
or DatabaseContext(DbProviderFactory) (why should we care about
DbProviderFactory?).

It's apparently "utility for the sake of utility" rather than necessity,
and what's necessary is *working* Transaction support (e.g. get
Transactions.TransactionRollbackDelete() and
Insert_Update_Delete.LinqToSqlInsert06() working), along with supporting
the documented DataContext IDbConnection semantics (e.g. open the
connection for as short a period as possible, leave open if the
connection was originally open, wrap DataContext.SubmitChanges() within
a transaction UNLESS DataContext.Transaction isn't null, in which case
we reuse the existing transaction, etc.).

My suspicion is that just getting transactions working properly will
result in either simplifying or removing many of these types, so I would
suggest trying that first. :-)

> I don't mind having a crack at implementing this but would like to get
> an indication that the design at least sounds ok. Otherwise there
> could be an obvious critical thing I'm missing.

I don't see how your approach of using GetConnection() would work,
actually. DbLinq cannot require creation of new instances, as the
IDbConnection instance may be provided to the constructor, and
IDbConnection isn't ICloneable. How could we reliably create new
connections?

The only solution I see, baring glaring bits of idiocy on my part, is to
open and close the IDbConnection on-demand whenever we perform a SQL
command, with the proviso that we don't close the connection if it was
already open when the DataContext was created. DatabaseConnection
should help with this logic.

Thanks,
- Jon


greyman

unread,
Jun 15, 2009, 7:21:56 PM6/15/09
to DbLinq
On Jun 16, 5:51 am, Jonathan Pryor <jonpr...@vt.edu> wrote:
>
> I don't see how your approach of using GetConnection() would work,
> actually.  DbLinq cannot require creation of new instances, as the
> IDbConnection instance may be provided to the constructor, and
> IDbConnection isn't ICloneable.  How could we reliably create new
> connections?
>
> The only solution I see, baring glaring bits of idiocy on my part, is to
> open and close the IDbConnection on-demand whenever we perform a SQL
> command, with the proviso that we don't close the connection if it was
> already open when the DataContext was created.  DatabaseConnection
> should help with this logic.

I see your points.

Some extra conditions I have come across are:

1. DataContext is relying on a single physical database connection and
has no mechanisms to cope if that connection enters an inconsistent
state and cannot be read to or written from,

2. The physical database connection is closed by the server or by a
network condition.

I guess the way these condiions are to be handled is that the
DataContext object will throw an exception and the consuming class
will need to create a new DataContext object and have it create a new
IDbConnection or pass a new IDbConnection to it. The key there is that
DataContext should not attempt to open any IDbConnection more than
once. If there is a need to re-open a connection is probably means
something bad has happened and a new connection should be acquired
from the pool by creating a new IDbConnection object.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 16, 2009, 7:12:42 AM6/16/09
to dbl...@googlegroups.com
On Mon, 2009-06-15 at 16:21 -0700, greyman wrote:
> The key there is that
> DataContext should not attempt to open any IDbConnection more than
> once.

This is also wrong. MSDN explicitly documents[0] that the IDbConnection
will be kept open for as short a period as possible, for each SQL
request sent to the DB. Consequently, assuming that the IDbConnection
is closed during DataContext construction, it will be .Open()ed
AND .Close()d for each SQL request, which could be more than once.

Specifically, at minimum every DataContext.SubmitChanges() call should
result in both opening and closing the IDbConnection (along with every
query traversal, etc.).

- Jon

[0] http://msdn.microsoft.com/en-us/library/bb292288.aspx

greyman

unread,
Jun 16, 2009, 8:34:57 AM6/16/09
to DbLinq
On Jun 16, 12:12 pm, Jonathan Pryor <jonpr...@vt.edu> wrote:
> On Mon, 2009-06-15 at 16:21 -0700, greyman wrote:
> > The key there is that
> > DataContext should not attempt to open any IDbConnection more than
> > once.
>
> This is also wrong.  MSDN explicitly documents[0] that the IDbConnection
> will be kept open for as short a period as possible, for each SQL
> request sent to the DB.  Consequently, assuming that the IDbConnection
> is closed during DataContext construction, it will be .Open()ed
> AND .Close()d for each SQL request, which could be more than once.
>
> Specifically, at minimum every DataContext.SubmitChanges() call should
> result in both opening and closing the IDbConnection (along with every
> query traversal, etc.).
>
>  - Jon
>
> [0]http://msdn.microsoft.com/en-us/library/bb292288.aspx


Fair enough.

I can tell you from experience though that no matter what MSDN says
there are at least some ADO.Net data providers that work a lot better
from a fault tolerance point of view if you create a new IDbConnection
object rather than opening a closed one.

If you close you IDbConnection and then the network connection to the
db is lost or the db just happens to decide to timeout the process for
your connection then calling open on the physical connection will
fail. Creating a new IDbConnection allows the ADO.Net provider
connection pool to allocate a new working connection.

I have tested this with DbLinq by knocking off the thread processing a
MySQL connection. The DataContext does not cope and the application
will need to try and work out what state the DataContext was in and
recover and resubmit. With Postgresql I've seen my physical
connections timeout and the same thing happen.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 16, 2009, 1:47:21 PM6/16/09
to dbl...@googlegroups.com
On Tue, 2009-06-16 at 05:34 -0700, greyman wrote:
> I can tell you from experience though that no matter what MSDN says
> there are at least some ADO.Net data providers that work a lot better
> from a fault tolerance point of view if you create a new IDbConnection
> object rather than opening a closed one.

So in your experience, long lived IDbConnection instances are bad. What
is long lived -- seconds or minutes?

I'll go out on a limb and assume that seconds is not long-lived. :-)

That said, the guideline [0] seems to be that a DataContext should be
created as a method local variable, not as a class field. Thus the
DataContext lifetime should be fairly short (i.e. seconds, not minutes),
removing most issues with reusing IDbConnection instances.

For example, most [1] code doesn't provide an IDbConnection instance,
but instead uses the connection string. This will BY NECESSITY result
in the creation of a *new* IDbConnection instance every time a
DataContext is created. (Using the connection string has added benefits
such as allowing specification of the SQL vendor w/o using DbLinq
extensions such as the DataContext(IDbConnection,IVendor) constructor.)

In short, for apps following existing guidelines (short-lived
DataContexts and connection strings), I don't think reusing the
IDbConnection instance will cause any problems. For apps which do
explicitly provide an IDbConnection, then the onus is on them to
recreate the IDbConnection as appropriate, as DbLinq won't do so (nor
should it be expected to do so, given the design).

- Jon

[0] I'd swear I saw a better guideline elsewhere on MSDN, but this is
the best my Google-fu can currently find:
http://social.msdn.microsoft.com/Forums/en-US/linqprojectgeneral/thread/ea6fe19c-cd21-40d1-b005-750af6632bfe

[1] "most" is, of course, pulled OOMA, but this is true for most example
code I've seen on MSDN.


greyman

unread,
Jun 16, 2009, 8:01:34 PM6/16/09
to DbLinq
I'd agree with you again that if DataContexts are short lived the
benefits of handling its IDbConnection suffering a failure are small.
I first raised this thread before I was aware DataContexts were
supposed to be used for single units of work and I had by DataContext
with it's single IDbConnection trying to survive for my application's
whole lifetime.

I would still argue that it would be better to create and destory
IDbConnection objects in DbLinq's DataConnection class rather than
opening and closing an existing one, excepting where the IDbConnection
was provided to DataContext during instantiation. Yes the MSDN
documentation seems to make a point of saying IDbConnection should be
open and closed as needed within the DataContext lifetime but I wonder
if that statement was written with connection pooling in mind.

The thing I find confusing and what set me down the wrong path with
the usage pattern I employed for DataContext was that it looks like it
should be an application lifetime type object. It's arguable the
identity management and query caching in DbLinq won't be of much
benefit if the DataContext is only used for single units of work.

Regards,

Aaron

Jonathan Pryor

unread,
Jun 16, 2009, 9:01:49 PM6/16/09
to dbl...@googlegroups.com
On Tue, 2009-06-16 at 17:01 -0700, greyman wrote:
> The thing I find confusing and what set me down the wrong path with
> the usage pattern I employed for DataContext was that it looks like it
> should be an application lifetime type object.

I would have agreed with you, prior to reading the documentation, but
then I started looking at the object tracking implementation. In order
to perform updates, DataContext needs to track *every* entity[0] object
it returns, and additionally track property changes on these objects.

If you have DataContext living longer than a method call, and reuse it,
it would not seem that unusual for it to be tracking a *huge* number of
objects. (Specifically, I wouldn't be surprised if it were tracking
millions of objects if it were used with a busy web service within a few
days time, if not sooner.) Needing to track large quantities of objects
will increase pressure on the GC, increase memory use, and slow down the
application (as more OS disk cache is used as memory use increases).

It only gets worse when you realize that there's no mechanism to clear
the tracked object cache, meaning you'd have to create a new DataContext
instance at some point anyway, just to clear out unused entities...

> It's arguable the
> identity management and query caching in DbLinq won't be of much
> benefit if the DataContext is only used for single units of work.

I'm not sure about identity management, but I'm dubious about query
tracking (as I've mentioned numerous times), as the only way to usefully
cache the query is to use a global cache, which could have similar
unbounded growth issues.

On the other hand, CompiledQuery doesn't suffer from these problems, as
it's held by the developer, doesn't refer to a DataContext (it needs to
be provided to the CompiledQuery on each delegate call), and isn't
"invisible global memory use." But this is a discussion for another
time....

- Jon

[0] "entity" defined as "MappingSource.GetModel(Type) returns non-null
for the entity's type."


Giacomo Tesio

unread,
Jun 17, 2009, 3:42:02 AM6/17/09
to dbl...@googlegroups.com


On Wed, Jun 17, 2009 at 2:01 AM, greyman <greym...@gmail.com> wrote:

It's arguable the
identity management and query caching in DbLinq won't be of much
benefit if the DataContext is only used for single units of work.

This is related to how you define a Unit of Work.

You should not need identity map out of a unit of work.

For example, in our web application, we define Unit of Work a single HTTP request (when related to DAL, domain model entities are another thing).


Giacomo
Reply all
Reply to author
Forward
0 new messages