I believe we have to look at two separate concerns: what do we want, and how should we implement it?
I) What's the best way to handle language specific semantics in the target query language?
This depends a great deal on the target query language. I'll go with SQL here, but I'd like to know how the NH guys will want to handle this for LINQ to HQL, and what other LINQ providers are doing.
For instance, in C# null equals null. So comparing a nullable property in LINQ should result in some special SQL code:
C#: where customer.Name == name
SQL: WHERE (customer.Name = @name) OR (customer.Name IS NULL AND @name IS NULL)
In VB we'd also have to consider that null (Nothing) equals empty strings (""):
VB: Where customer.Name = Name
SQL: WHERE (customer.Name = @name) OR ((customer.Name IS NULL OR customer.Name = '') AND (@name IS NULL OR @name = ''))
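To see what these translations actually return, here is a minimal sketch that runs the plain comparison, the C#-style predicate and the VB-style predicate against an in-memory SQLite database from Python. The table layout, the three sample rows and the helper name matching_ids are assumptions made up for illustration, and SQLite stands in for whatever backend a real provider would target; its NULL comparison rules are the usual three-valued ones.

```python
import sqlite3

# Hypothetical sample data: one ordinary name, one empty string, one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO customer (Id, Name) VALUES (?, ?)",
    [(1, "Alice"), (2, ""), (3, None)],
)

# The three candidate translations of "customer.Name equals name".
PLAIN = "Name = :name"
CSHARP = "(Name = :name) OR (Name IS NULL AND :name IS NULL)"
VB = (
    "(Name = :name) OR ((Name IS NULL OR Name = '') "
    "AND (:name IS NULL OR :name = ''))"
)

def matching_ids(predicate, name):
    """Return the Ids of all rows that satisfy the given WHERE predicate."""
    sql = f"SELECT Id FROM customer WHERE {predicate} ORDER BY Id"
    return [row[0] for row in conn.execute(sql, {"name": name})]

for label, predicate in [("plain", PLAIN), ("C#", CSHARP), ("VB", VB)]:
    for name in (None, "", "Alice"):
        print(label, repr(name), matching_ids(predicate, name))
```

With a NULL parameter the plain comparison matches nothing, the C#-style predicate matches only the NULL row, and the VB-style predicate matches both the NULL row and the empty-string row.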
But, is this really what the VB programmer expects (principle of least surprise)? Or would they rather expect it to translate to a simple customer.Name = @name in SQL (and just deal with the different semantics in the application code)?
The LINQ to SQL examples in your blog post are a good example. Who would have expected these differences in output, depending on whether you write "= Nothing" or "Is Nothing"?!
And, besides expectations, is it really what they _want_? (The null semantics of SQL are there for a reason; they often make more sense in set-based operations.)
We would have to allow opt-out of this translation, so if a programmer actually wants to get customer.Name = @name in SQL, they need to write something like:
Where SqlComparer.Equals (customer.Name, name)
Is this a Good Thing? I'd say it's hard to avoid, but I haven't quite made up my mind yet. Just translating equals to equals does have something going for it. Maybe "mirror the semantics of the source language" is not as good a prime directive as we were thinking? It's not completely achievable anyway.
And then there's case sensitivity. Is there any backend we can think of that lets each comparison expression specify its own case sensitivity? Are there any SQL dialects that support that? (TSQL does not.) Does HQL support it?
Or can we safely ignore it, just as LINQ to SQL does? (Then why the nitpicking about other semantic differences?)
II) What's the best way to handle this technically?
Right now there are only C#, VB and Oxygene/Delphi Prism. No other language supports LINQ. Oxygene has C# semantics as far as I can tell (just downloaded a trial). So it creates method calls to op_Equality or BinaryExpressions of NodeType Equal, just like C#. *)
There is one option that's not on your list, and that's creating Expressions that match VB's semantics with standard operators:
VB: Where customer.Name = Name
C#: where (customer.Name == name) || (customer.Name == null && name == "") || (customer.Name == "" && name == null)
I.e., we create the same Expression for the VB code above as we would for the C# code in the next line. This would be fully transparent for the provider (i.e., any sufficiently complete provider would automatically support VB with correct semantics).
However, the resulting code would potentially be less optimized than a special case implementation by the provider (the null == null semantics would just be added on top). So this would have to be optional too.
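As a quick sanity check that the expanded C# condition above really reproduces VB's rule, the following sketch brute-forces a few representative values. It is written in Python only because None == None is true there, just like null == null in C#. The function names vb_equals and expanded are made up for this check, and VB's default binary (case-sensitive) comparison is assumed.

```python
from itertools import product

def vb_equals(a, b):
    # VB string "=": Nothing is treated like the empty string.
    return (a or "") == (b or "")

def expanded(a, b):
    # Mirrors the expanded C# condition: plain equality plus the two
    # mixed null/empty cases. None == None is already True here.
    return (a == b) or (a is None and b == "") or (a == "" and b is None)

# Representative operands: null, empty, and two distinct non-empty strings.
values = [None, "", "x", "y"]

mismatches = [
    (a, b)
    for a, b in product(values, repeat=2)
    if vb_equals(a, b) != expanded(a, b)
]
print(mismatches)  # an empty list means both definitions agree
```

The two definitions agree on all sixteen combinations, so under these assumptions the three-clause form is enough and no fourth clause for null == null is needed.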
If we don't want to implement this, I'd say we leave it alone for now or do the simplest thing that works just for VB. It's too early to support future languages, and for VB there's not a lot we need to do. That would rule out option #5, and favor #4 over #3.
(Since we don't want to mess with the perfectly fine C# expressions, every provider will have to handle two distinct types of expressions anyway. So we might as well leave it to the provider to parse the Microsoft.VisualBasic.CompilerServices.Operators.CompareString calls, except that they are hidden inside BinaryExpressions, which would make for awkward parsing code. In any case, the provider needs special cases for the code it generates, such as SQL or HQL.)
Only if someone wants to go with option 2 could we make things significantly easier for them (optionally, of course).
And then we would wait until the dust settles and the larger community (hopefully including Microsoft) finally recognizes this problem and agrees on a general solution. When the first language with LINQ support arrives that does not have C# semantics on expressions, something will have to happen. Otherwise, such languages could never be first-class citizens for any LINQ provider that does not explicitly support them.
Oh, and one more thing: The example from Michael looks like a bug in LINQ to SQL:
Dim result = From p In context.Products Where p.Size = Nothing
SELECT COUNT(*) AS [value] FROM [Production].[Product] AS [t0]
WHERE [t0].[Size] = ''
This should be
WHERE [t0].[Size] = '' OR [t0].[Size] IS NULL
Since "= Nothing" in VB matches both null and empty, while "Is Nothing" matches only null.
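The difference between the two WHERE clauses is easy to reproduce. The sketch below uses an in-memory SQLite database from Python with a simplified, made-up Product table holding one ordinary size, one empty string and one NULL; it is not the real [Production].[Product] table, and the helper name count_where is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Product (Size TEXT)")
# Hypothetical rows: a real size, an empty string, and NULL.
db.executemany("INSERT INTO Product (Size) VALUES (?)", [("M",), ("",), (None,)])

def count_where(where):
    """Count the rows that satisfy the given WHERE clause."""
    return db.execute(f"SELECT COUNT(*) FROM Product WHERE {where}").fetchone()[0]

generated = count_where("Size = ''")                  # what LINQ to SQL emits
expected = count_where("Size = '' OR Size IS NULL")   # VB's "= Nothing" meaning
print(generated, expected)
```

The generated clause counts only the empty-string row, while the clause with the extra IS NULL test also counts the NULL row.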
Goes to show how hard it is to get everything right here. One more reason to just assume that VB programmers are better off just switching mentally between VB and query semantics. Sooner or later, relying on equality of null and empty would just bite them in the ass anyway, even if we do get everything right. Hey, no need to separate null and empty in your app? Then don't make your string columns nullable, or disallow empty strings. Mixing null and empty, just assuming that some VB magic takes care of it, that's a disaster waiting to happen.
*) I found one minor deviation: stringVar == objectVar becomes a simple BinaryExpression/Equal in C#, but in Oxygene it becomes (stringVar as object) == objectVar (additional UnaryExpression of NodeType TypeAs). This is semantically identical, but still different. So while we might not need explicit special case handling for Oxygene support, it would need separate testing and maybe a fix or two.
> 1. Just leave it as it is - let the actual LINQ provider sort this out.