Testing DbLinq

8 views
Skip to first unread message

Jonathan Pryor

unread,
Feb 23, 2009, 12:27:15 AM2/23/09
to DbLinq
First, an aside: is there an IRC channel to discuss DbLinq?

I'm gratified that DbLinq contains unit tests for all the providers.
However, as a developer I dislike any tests that require an actually
running database, particularly in the case of DbLinq which is trying to
write a LINQ provider for 7 different databases. I do not look forward
to installing and configuring 7 different database systems, much less
running tests for all of them...

For a project predicated upon databases, this does seem to be an odd
complaint. ;-)

There is a method to my madness, though: "removing" the database
requirement would make it easier to run unit tests on systems that don't
have any databases (most useful from my perspective, as I want to get
some form of unit tests within Mono for System.Data.Linq). This should
also make life easier for new developers -- less "grunt" work to do (in
the form of setting up a DB server) before working on more interesting
development, etc.

Note: I am NOT advocating that we drop the existing unit tests.

What I am advocating is a layered testing strategy; for DbLinq, there
are logically 3 layers (at least) that can be tested (more or less)
separately:

1. IQueryable -> SQL command (to send to backend provider)
2. IDataReader -> result sets
3. Database -> IDataReader

Strictly speaking, (3) is under the responsibility of the database
provider, not the LINQ provider, so can be ignored (e.g. there's no
reason for DbLinq to test SqlDataReader...).

The existing unit tests would remain as "end-to-end" integration tests,
ensuring that things actually work when the more targeted tests
suggested above work.

How would this work?

I haven't looked at layer (2) yet, but layer (1) can be solved through
string comparisons on two "well known" types: DataContext.Log and
DbCommand.CommandText (as returned from DataContext.GetCommand). For
example, this works quite nicely on current DbLinq w/o requiring any
database support:

DataContext context = new DataContext(new NullConnection());
var foos =
from p in context.GetTable<Person>()
where p.FirstName == "foo"
select p;
var cmd = context.GetCommand(foos);
Assert.AreEqual(ExpectedText, cmd.CommandText);

(NullConnection is a DbConnection subclass that does nothing on required
void-returning methods and throws an exception on all other methods.
It's enough to create a DataContext and elicit "interesting" behavior,
as seen above, without requiring a full-blown database.)

DataContext.Log can be used for checking the SQL generated for e.g.
DataContext.ExecuteCommand() and DataContext.ExecuteQuery() (along with
LINQ expressions that are immediately evaluated, such as .Count()).

A prototype of such an implementation is attached.

Is this testing strategy something the DbLinq team would be interested
in implementing?

Thanks,
- Jon

DataContractTest.cs
MsSqlDataContractTest.cs
NullProvider.cs

Pascal Craponne

unread,
Feb 23, 2009, 4:01:27 AM2/23/09
to dbl...@googlegroups.com
Hi Jon,

the idea is interesting, because I also have problems maintaining a VM with all the databases.
Anyway I see several limitations:
- This will require to rewrite all current tests
- Making any small change (like the internal temporary objects naming) would require to change all tests
- This makes any further optimization a pain.
As an example, I rewrote the SQL generation engine last spring/summer, and even if the generated SQL is different is all ways from previous engines, the existing tests helped me validating the engine. Also (still my own story), some tests were complex, and I had no idea of what it would look like in SQL (I'm not an SQL expert at all, and yes, I rewrote an SQL generation engine :)), so until the engine worked I had no idea of what tests should look like.
- SQL is very different from one vendor to another. A good example is how Skip() and Take() are handled in Oracle, SQL Server, and others.

But, beside all of this, DbLinq needs some new blood :)

Yes, DbLinq tests need to be structured and validated, and I suggest a few preliminary moves:
1. Stefan Klinger (team member) was working on a Linux based VM with all databases. This would allow to validate that all tests still work. Stefan, if you read us, as far as I remember, you were close to finish. Could you distribute your VM? Can someone finish the job if you're too busy?
2. The test databases need to be checked: I suspect that some test fail because the database doesn't contain the right elements.
3. Jon, provide me a google code identifer, so I can add you to contributors (and I would be very pleased to)
4. Then you Jon can safely refactor the tests, after we have agreed on something more flexible (yes, I'm hard on flexibility :))

Regarding the tests, we could instead of a hard SQL string comparison (which will totally break all tests after first change), use regular expressions, so ignore internal names could be safely ignored, just like maybe  tables or columns names (without ignoring their occurrence), whitespaces, comments, and so on? 

I also suggest to keep the current tests, but restructure them in real projects with real directories (currently they are all in the same folder, we could probably create new projets in new directories with links to shared files, instead of having all test projects in one folder).

Last thing: all tests results are usually written at http://linq.to/db/Tests by me, so you can see the state of the art without running the tests.

I probably forget a lot of things I want to say... But welcome Jon, and thanks for your help :)

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org

Jonathan Pryor

unread,
Feb 23, 2009, 8:35:22 AM2/23/09
to dbl...@googlegroups.com
On Mon, 2009-02-23 at 10:01 +0100, Pascal Craponne wrote:
> the idea is interesting, because I also have problems maintaining a VM
> with all the databases.
> Anyway I see several limitations:
> - This will require to rewrite all current tests

Hence my suggestion that this would be /in addition to/, not a
replacement for, the current tests. We want to keep the current tests
as integration tests, to ensure that "end-to-end" functionality works.

> - Making any small change (like the internal temporary objects naming)
> would require to change all tests

This is to be expected, somewhat. I'd consider it to be a feature,
myself, so that the "externally visible" effects of an internal change
are more noticeable.

> - This makes any further optimization a pain.
> As an example, I rewrote the SQL generation engine last spring/summer,
> and even if the generated SQL is different is all ways from previous
> engines, the existing tests helped me validating the engine. Also
> (still my own story), some tests were complex, and I had no idea of
> what it would look like in SQL (I'm not an SQL expert at all, and yes,
> I rewrote an SQL generation engine :)), so until the engine worked I
> had no idea of what tests should look like.

It's not necessary to have the tests written and fully complete before
the functionality is complete. In this case, I'd suggest doing the
refactoring, then writing the tests to ensure that the functionality
doesn't change in the future. (This also removes the "I don't know what
the SQL should look like" issue, as you can just use the actual
resulting SQL in the test, after you've written the functionality.)

> - SQL is very different from one vendor to another. A good example is
> how Skip() and Take() are handled in Oracle, SQL Server, and others.

You'll notice that my approach solved this, by having a DataContractTest
abstract base class (which has all the [Test] methods), and a
MsSqlDataContextTest subclass which overrides DataContextTest methods.
This allows subclasses to provide the actual SQL that's specific to that
provider, without requiring that all databases accept the same SQL.

> Regarding the tests, we could instead of a hard SQL string comparison
> (which will totally break all tests after first change), use regular
> expressions, so ignore internal names could be safely ignored, just
> like maybe tables or columns names (without ignoring their
> occurrence), whitespaces, comments, and so on?

I'd be leery about using regular expressions, and would think that the
current subclass approach would be sufficient.

However, I wonder what sort of "internal names" would be present in the
SQL that is being sent to the database....

> I also suggest to keep the current tests, but restructure them in real
> projects with real directories (currently they are all in the same
> folder, we could probably create new projets in new directories with
> links to shared files, instead of having all test projects in one
> folder).

Please! :-)

- Jon

Pascal Craponne

unread,
Feb 23, 2009, 10:03:16 AM2/23/09
to dbl...@googlegroups.com
OK, I apparently misunderstood a lot of your arguments.

Regarding the "internal names" I'm talking about, these are the aliases given to tables or columns in the request. Some internal names may also be use for special operations, like Skip() and Take() on Oracle, for example. Also, the formatting may change, to be more readable or smaller, depending on what the application wants.
Also, this is not just names, but some comments may be added. I had the concrete case last year where we wanted to identify which linq request was generated a specific SQL expression. This is not possible with MS Linq to SQL, but in DbLinq (extended mode), we may add some event handlers to allow the user to add some information directly in the generated SQL.

So this brings us to regular expressions: I know they are hard to maintain (because hard to read), so we could imagine an intermediate string format that would generate a Regex from it, without the hassle of writing regular expressions directly.

Imagine we coud write SQL check expressions such as: "select {0} from b as {1}", where {0} would be translated to valid identifier regex and spaces to "any number of spaces". So this could be translated to a regex like "select\s*\w*\s*from\s*b\s*as\s*\w*".
We keep our tests clean and only identify what's necessary.

Jon, do you have a Google code ID (a google id, in fact)? I'd like to add to you to contributors, so you could make some tests directly on the SVN (maybe in a branch for a start).

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Jonathan Pryor

unread,
Feb 23, 2009, 3:36:58 PM2/23/09
to dbl...@googlegroups.com
On Mon, 2009-02-23 at 10:01 +0100, Pascal Craponne wrote:
> 3. Jon, provide me a google code identifer, so I can add you to
> contributors (and I would be very pleased to)

My Google Code identifier is: jonmpryor

- Jon


Pascal Craponne

unread,
Feb 23, 2009, 4:51:08 PM2/23/09
to dbl...@googlegroups.com
Welcome to Jonathan Pryor, our new contributor :)

Justin Collum

unread,
Feb 24, 2009, 3:09:42 PM2/24/09
to dbl...@googlegroups.com
I'd like to be added in too. I'm working in SQLite and running into some errors with the generation of queries (joins, booleans). Unless there's someone else working on it? I'm a C# developer, comfortable with unit testing but I don't know the first thing about turning objects into SQL queries. Good for the ol' resume tho, and it'd be nice to help.

Pascal Craponne

unread,
Feb 24, 2009, 3:14:03 PM2/24/09
to dbl...@googlegroups.com
Done. You are now contributor. Also remember that with great power comes great responsibility, so be careful with changes, always check unit tests before and after the changes, if possible on many databases.

I didn't have any news from Stefan who was working on a linux VM with all databases. It is probably not totally finished, so if someone here has enough linux knowledge to finish the job (I have no idea of what databases remain to be installer), I'd appreciate.

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Justin Collum

unread,
Feb 24, 2009, 3:17:57 PM2/24/09
to dbl...@googlegroups.com
Great. Do I get a branch so I don't collide with other devs?

Pascal Craponne

unread,
Feb 24, 2009, 3:40:16 PM2/24/09
to dbl...@googlegroups.com
What do you plan to work on, exactly?

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Justin Collum

unread,
Feb 24, 2009, 4:28:37 PM2/24/09
to dbl...@googlegroups.com
I need to change how booleans are handled in SQLite query generation. I've got a boolean field in my table and data object but the query that's being generated looks like "MyField = True" where it should be "MyField ='Y'". Or am I missing something?

Pascal Craponne

unread,
Feb 24, 2009, 4:39:20 PM2/24/09
to dbl...@googlegroups.com
on some databases, booleans don't even exist, and all of this is just a convention (Oracle for example, where I have been using 0/1, Y/N, T/F in different contexts). It is apparently the same thing for SQLite, so something that works for someone won't work for you, and the opposite is also true.

The problem is really interesting, probably goes far beyond such a fix.

Maybe could such differences be handled as attributes (extensions to DbLinq).
For example:
[Bool(0,1)]
[Bool('N','Y')] 
could be attributes used to make differences for such purely conventional type management.

I don't have Linq to SQL documentation in mind (and probably being too lazy to read it now), but I don't think MS implementation needs to handle such cases.

For guys new to the project, DbLinq follows to goals:
1. Being strictly compatible with Linq to SQL, this is called the strict mode or mono-strict mode.
2. Add extended features, when there are limitations. Such extensions are in xxx.Extended.cs files (since those files are not included in strict builds).

This is detailed at http://linq.to/db/SxSVersions

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Justin Collum

unread,
Feb 24, 2009, 4:49:51 PM2/24/09
to dbl...@googlegroups.com
Seems like there is somewhere where a linq statement gets translated into SQL, but translated differently for every target database type. I just need to find that spot and change how it deals with booleans. Or am I totally missing something?

Pascal Craponne

unread,
Feb 24, 2009, 4:58:06 PM2/24/09
to dbl...@googlegroups.com
No, you got the point, the only thing is that this change should probably be modular (for example related to an attribute, if Linq to SQL doesn't provide such meta info).

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Pascal Craponne

unread,
Feb 24, 2009, 5:02:54 PM2/24/09
to dbl...@googlegroups.com
I forgot to say that the change must be implemented at three places:
1. Read (when SQL column is transformed to CLR data)
2. Insert/Update (CLR -> SQL)
3. Criteria (the "where" you were writing about)

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Justin Collum

unread,
Feb 24, 2009, 5:30:54 PM2/24/09
to dbl...@googlegroups.com
Hmm, I'm debating this now. I thought that SQLite supported a boolean type because SQLite Administrator had that as an option when defining tables. Turns out that booleans don't really exist in SQLite, ref if anyone needs it: http://www.sqlite.org/datatype3.html. So what to do? Should it translate booleans as (MyCol='Y' or MyCol=1 or  MyCol='y') and the inverse?  I guess that'd work. Sqlite administrator is turning a boolean checkbox from its interface into Y/N.

I think this didn't come up yet because there's no boolean fields in northwind that I can see. Probably because of the fact that many databases don't support them. Still, dblinq ought to have some way to translate from the boolean object type in .net to a something in the target db.

Might be related, might not, but dbmetal did interpret the 'Boolean' type in the database as a boolean in .net. The create script from SQLite admin looks like this:

CREATE TABLE [Users] (
[Id] INTEGER  PRIMARY KEY AUTOINCREMENT NOT NULL,
[UserName] varchar(20)  UNIQUE NOT NULL,
[Password] varchAR(20)  NULL,
[FirstName] vaRCHAR(20)  NULL,
[LastName] varchar(20)  NULL,
[Active] BOOLEAN DEFAULT 'Y' NOT NULL
)

It seems likely that SQLite Administrator is doing something weird with booleans. It's making some assumptions, probably generating a varchar(1) instead. I'm not sure what to do with it.

Avery Pennarun

unread,
Feb 24, 2009, 5:59:12 PM2/24/09
to dbl...@googlegroups.com
On Tue, Feb 24, 2009 at 5:30 PM, Justin Collum <jco...@gmail.com> wrote:
> Might be related, might not, but dbmetal did interpret the 'Boolean' type in
> the database as a boolean in .net. The create script from SQLite admin looks
> like this:
>
> CREATE TABLE [Users] (
> [Id] INTEGER  PRIMARY KEY AUTOINCREMENT NOT NULL,
> [UserName] varchar(20)  UNIQUE NOT NULL,
> [Password] varchAR(20)  NULL,
> [FirstName] vaRCHAR(20)  NULL,
> [LastName] varchar(20)  NULL,
> [Active] BOOLEAN DEFAULT 'Y' NOT NULL
> )
>
> It seems likely that SQLite Administrator is doing something weird with
> booleans. It's making some assumptions, probably generating a varchar(1)
> instead. I'm not sure what to do with it.

The important thing about sqlite is it doesn't really have "data
types" exactly. You can name your data types anything you want. You
could call it a "bigfoolean" if you wanted, and it would still work.

Because of that, there's no completely standard convention on what
goes into a boolean type. With that in mind, I'm not really sure how
you'd do it comfortably in Dblinq. Maybe an attribute on the
table/column object that declares what the true/false values of the
field are.

Avery

Justin Collum

unread,
Feb 24, 2009, 7:41:13 PM2/24/09
to dbl...@googlegroups.com
That makes sense to me. I solved my problem by switching those columns over to ints. Just have to modify my linq and dbml and i'm done. I think you could argue that the issue I'm having is a configuration issue. I'm not sure why SQLite Admin lets you define a boolean type if there is no such thing in the list of SQLite types I referenced.

Avery Pennarun

unread,
Feb 24, 2009, 7:43:12 PM2/24/09
to dbl...@googlegroups.com
On Tue, Feb 24, 2009 at 7:41 PM, Justin Collum <jco...@gmail.com> wrote:
> That makes sense to me. I solved my problem by switching those columns over
> to ints. Just have to modify my linq and dbml and i'm done. I think you
> could argue that the issue I'm having is a configuration issue. I'm not sure
> why SQLite Admin lets you define a boolean type if there is no such thing in
> the list of SQLite types I referenced.

Well, people do like their data types :)

Up until sqlite3, there apparently weren't *any* types. Even ints
were just stored as varchar. That doesn't mean you don't want to
think of it as an integer, though.

Have fun,

Avery

Justin Collum

unread,
Feb 24, 2009, 7:48:00 PM2/24/09
to dbl...@googlegroups.com
My vote (which you can take or leave) would be to only generate dbml from dbmetal for datatypes that are strictly supported by sqlite. So if it found a boolean when building the classes it should've skipped that table and said "hey, I don't know what to do with a boolean, you'll have to build that yourself".

Justin Collum

unread,
Feb 24, 2009, 7:48:22 PM2/24/09
to dbl...@googlegroups.com
If the source database is SQLite of course.

Jonathan Pryor

unread,
Feb 25, 2009, 1:19:42 PM2/25/09
to dbl...@googlegroups.com
I'd hate to re-ask, but is there an IRC channel for this project?  If not, perhaps we should start one on freenode or GIMPnet?



On Mon, 2009-02-23 at 10:01 +0100, Pascal Craponne wrote:
I also suggest to keep the current tests, but restructure them in real projects with real directories (currently they are all in the same folder, we could probably create new projets in new directories with links to shared files, instead of having all test projects in one folder).

Which raises the question: how should such a test project split be done?

I don't know if there is a common convention, but I do know Mono's convention is to place unit tests for an assembly into a Test directory within the assembly's directory.  Mapping this to DbLinq, we would thus have the directories:

src/DbLinq [ existing directory ]
src/DbLinq/Test [ from tests/Test_NUnit/Internals, tests/DbLinqTest ]
src/DbLinq.Firebird [ existing directory ]
src/DbLinq.Firebird/Test [ from tests/Test_NUnit/Test_NUnit_Firebird.csproj ]
src/DbLinq.Ingres [ existing directory ]
src/DbLinq.Ingres/Test [ from tests/Test_NUnit/Test_NUnit_Ingres.csproj ]

This would be a "radical" departure from the current setup of keeping the unit tests in a directory structure parallel to that of src, so I was wondering if this would be acceptable, or if you'd instead prefer a "parallel" structure within tests, e.g. have tests/DbLinq.Firebird, tests/DbLinq.Ingres, etc. directories.

- Jon

Pascal Craponne

unread,
Feb 25, 2009, 4:40:49 PM2/25/09
to dbl...@googlegroups.com
Hi Jon,

no, there is no IRC channel.
Regarding the tests, I have no specific idea about how the directories should be organized (and since I'm the only one to answer to you, I must be right :)).
The remaining problem is: where should the common tests (95% of them) be placed?
If you choose a structure under a common directory, as you suggested as second option, the problem is probably easier to solve.

Pascal.

jabber/gtalk: pas...@jabber.fr
msn: pas...@craponne.org



Jonathan Pryor

unread,
Feb 25, 2009, 10:06:26 PM2/25/09
to dbl...@googlegroups.com
On Wed, 2009-02-25 at 22:40 +0100, Pascal Craponne wrote:
no, there is no IRC channel.

Should we start one?  Or is there no interest among the current maintainers?

If there is interest, I would suggest starting a ##dblinq channel on freenode.  This would require registering the channel; directions can be found at: http://freenode.net/group_registration.shtml


Regarding the tests, I have no specific idea about how the directories should be organized (and since I'm the only one to answer to you, I must be right :)).
The remaining problem is: where should the common tests (95% of them) be placed?
If you choose a structure under a common directory, as you suggested as second option, the problem is probably easier to solve.

I don't think the shared files require the second solution. :-)

To prove so, I've created a new branch:

    https://dblinq2007.googlecode.com/svn/branches/tests-without-db/

In this branch, I've committed the previously suggested test split, so that tests are located within a src/Assembly/Test directory.

Shared files are located within src/DbLinq/Test/Providers.

The .csproj, .vbproj, and .sln files have been updated to cope with this change.

That's the good news.

The bad news is that I only have Visual Studio Express to play with, and loading the DbLinq.sln file throws up a ton of error about the project files using features that aren't supported within Express, so I would appreciate it if someone else could take a peek and ensure things are still sane.  (A subset of DbLinq.sln is supported in Express, so I was able to build most of the projects, but I don't know that all of them will build.)

This brings me to a related issue: file duplication in the build process.  At present, all the project files are setup so that they copy dependent assemblies into their output directories (which makes sense, as this allows the assemblies to actually work properly ;-).

Unfortunately, for my DbLinq.sln-under-Express build, this results in 18 copies of DbLinq.dll running around, which seems...excessive.  Similarly, there are 3 copies of DbLinq.SqlServer.dll, 4 of DbLinq.Oracle.dll, 5 of DbLinq.Sqlite.dll....

We could greatly reduce all this duplication by configuring the projects to place their output files into a common directory, e.g. into the top-level lib directory (which also contains Npgsql.dll, etc.), or perhaps into a src/build directory.  (Personally, I'd prefer throwing them into the toplevel `lib' directory, but that's just me...).

Would this be a reasonable change to make?  Or is the current setup preferred?
Thanks,
- Jon

Jonathan Pryor

unread,
Feb 25, 2009, 11:37:17 PM2/25/09
to dbl...@googlegroups.com
On Thu, 2009-02-26 at 03:06 +0000, Jonathan Pryor wrote:
On Wed, 2009-02-25 at 22:40 +0100, Pascal Craponne wrote:
no, there is no IRC channel.

Should we start one?  Or is there no interest among the current maintainers?

If there is interest, I would suggest starting a ##dblinq channel on freenode.  This would require registering the channel; directions can be found at: http://freenode.net/group_registration.shtml

Alternatively, we could just pick an arbitrary channel to hang out on until we see enough need to create a separate channel.  I would suggest ##csharp on freenode, as (1) I'm already there, and (2) they tend to cover all things .NET (and then some), so dblinq would fit right in.

My identifier on IRC is `jonp'.

- Jon

Pascal Craponne

unread,
Feb 26, 2009, 4:04:13 AM2/26/09
to dbl...@googlegroups.com
Dont take it the wrong way, but I don't see the point in creating an IRC channel: what can you expect from it that you don't get from this group/mailing list? (Am I missing something?)

Justin Collum

unread,
Feb 26, 2009, 10:31:09 AM2/26/09
to dbl...@googlegroups.com
I agree. My exposure to IRC was a mildly frustrating experience. I think people check their email way more often than they log on to irc. And it's easier to formulate a thought in email. Where you have paragraphs and formatting.

Adam Tauno Williams

unread,
Feb 26, 2009, 10:43:03 AM2/26/09
to dbl...@googlegroups.com
On Thu, 2009-02-26 at 07:31 -0800, Justin Collum wrote:
> I agree. My exposure to IRC was a mildly frustrating experience. I
> think people check their email way more often than they log on to irc.
> And it's easier to formulate a thought in email. Where you have
> paragraphs and formatting.

Ditto; and putting code snippet's etc... in IRC is pretty wretched.
IMO, nothing beats maillist (hands down). You can sort, archive, post
(with attachments, quotations, etc..) and then the wonderful listservs
archive for everyone else. I've never understood some people aversion's
to lists.

I usually have an IRC client open to three channels (#GRLUG, #gtk, and
#ogo). Other than chatting with some fellow OGo hackers on #ogo the
amount of useful traffic is very close to zero.
--
OpenGroupware developer: awil...@whitemice.org
<http://whitemiceconsulting.blogspot.com/>

Justin Collum

unread,
Feb 26, 2009, 10:48:43 AM2/26/09
to dbl...@googlegroups.com
My experience with irc was a while back when I wanted to build a ship fitting tool for Eve. I'd get on irc, look for someone I needed to talk to. Not on? Wait. Check again later, still not on. Check again later, still not on. Give up, try the next day. Finally after a couple of days I get a hold of the guy, he answers my question and gets right off irc. Then I have another question and the whole process starts over.

Contrast that with: send an email to a group of people, wait for a response.

Option 2 is a much better use of my time.
Reply all
Reply to author
Forward
0 new messages