[re-linq] VB.NET-specific string comparisons

Skip to first unread message

Fabian Schmied

Oct 16, 2009, 9:32:38 AM10/16/09
to re-motion Developers
Michael Ketting turned me to this blog post:

It's about VB.NET emitting special expression trees for code comparing
strings in LINQ queries; this is because VB.NET handles string
comparisons a little different from other .NET languages. For example,
empty strings and null strings (Nothing in VB.NET) are regarded to be
equal, and one can choose whether to use ordinal compare or
culture-specific compare.

In re-linq, we're currently not special-casing these VB.NET operators,
which means that a LINQ provider will end up with
MethodCallExpressions to
Microsoft.VisualBasic.CompilerServices.Operators.CompareString when
they parse an expression (e.g. where condition) written in VB.NET. A
re-linq client would currently have to recognize this special
MethodCallExpression and handle it accordingly.

Fur the future, we'll want to implement some support for those in
re-linq, recognizing the MethodCallExpressions and canonicalizing them
somehow. We cannot reduce them to standard expression nodes, however,
because those operators have very specific semantics.

I've listed our options together with a more elaborate explanation
here: http://www.re-motion.org/blogs/mix/archive/2009/10/16/vb.net-specific-text-comparison-in-linq-queries.aspx

Here's the list of options from the blog post:

1. Just leave it as it is – let the actual LINQ provider sort this out.
2. Substitute those operators with the standard ones.
3. Define generic text comparison expression node types and replace
the MethodCallExpressions with instances of those types.
4. Define VB-specific expression node types and replace the
MethodCallExpressions with instances of those types.
5. Implement support for language packages, i.e. plug-ins for re-linq
that encapsulate language-subtleties and their processing.

Has anyone got opinions to share?


Wenig, Stefan

Oct 16, 2009, 2:07:02 PM10/16/09
to re-mot...@googlegroups.com

I believe we have to look at two separate concerns: What do we want, and how should we implement it.

I) What's the best way to handle language specific semantics in the target query language?

This depends a great deal on the target query language. I go with SQL here, but I'd like to know how the NH guys will want to handle this for LINQ to HQL, and what other LINQ providers are doing.

For instance, in C# null equals null. So comparing a nullable property in LINQ should result in some special SQL code:

C#: where customer.Name == name
SQL: WHERE (customer.Name = @name) OR (customer.Name IS NULL AND @name IS NULL)

In VB we'd also have to consider that null (Nothing) equals empty strings (""):

VB: Where customer.Name = Name
SQL: WHERE (customer.Name = @name) OR ((customer.Name IS NULL OR customer.Name == '') AND (@name IS NULL OR @name == ''))

But, is this really what the VB programmer expects (principle of least surprise)? Or would they rather expect it to translate to a simple customer.Name = @name in SQL (and just deal with the different semantics in the application code)?

The LINQ to SQL examples in your blog post are a good example. Who would have expected this output differences, depending on whether you write "= Nothing" or "Is Nothing"?!

And, besides expectations, is it really what they _want_? (The null semantics of SQL are there for a reason, they often make more sense in set-based operations.)

We would have to allow opt-out of this translation, so if a programmer actually wants to get customer.Name = @name in SQL, they need to write something like:

Where SqlComparer.Equals (customer.Name, name)

Is this a Good Thing? I'd say it's hard to avoid, but I haven't quite made up my mind yet. Just translating equals to equals does have something going for it. Maybe "mirror the semantic of the source language" is not as good a prime directive as we were thinking? It's not completely achievable anyway.

And then there's case sensitivity. Is there any backend that we can think of that supports each comparison expression to specify case sensitivity? Are there any SQL dialects that support that? (TSQL does not.) Does HQL support it?

Or can we safely ignore it, just as LINQ to SQL does? (Then why the nitpicking about other semantic differences?)

II) What's the best way to handle this technically?

Right now there are only C#, VB and Oxygene/Delphi Prism. No other language supports LINQ. Oxygene has C# semantics as far as I can tell (just downloaded a trial). So it creates method calls to op_Equality or BinaryExpressions of NodeType Equal, just like C#. *)

There is one option that's not on your list, and that's creating Expressions that match VB's semantics with standard operators:

VB: Where customer.Name = Name
C#: where (customer.Name == name) || (customer.Name == null && name == "") || (customer.Name == "" && name == null)

I.e., we create the same Expression for the VB code above as we would for the C# code in the next line. This would be fully transparent for the provider (i.e., any sufficiently complete provider would automatically support VB with correct semantics.)

However, the resulting code would potentially be less optimized than a special case implementation by the provider (the null == null semantics would just be added on top). So this would have to be optional too.

If we don't want to implement this, I'd say we leave it alone for now or do the simplest thing that works just for VB. It's too early to support future languages, and for VB there's not a lot we need to do. That would rule out option #5, and favor #4 over #3.

(Since we don't want to mess with the perfectly fine C# expressions, every provider will have to handle two distinct types of expressions anyway. So we might as well leave it to the provider to parse the Microsoft.VisualBasic.CompilerServices.Operators.CompareString calls, except that they are hidden inside BinaryOperators which would make awkward parsing code. In any case, the provider needs special cases for the generated expressions, such as SQL or HQL.)

Only if someone wants to go with option 2, we could make it significantly easier for them (optionally, of course).

And then we would wait until the dust settles and the larger community (hopefully including Microsoft) finally recognizes this problem and agrees on a general solution. When the first language with LINQ support arrives that does not have C# semantics on expressions, something will have to happen. Or they could never be first class citizens for any LINQ provider that does not explicitly support them.

Oh, and one more thing: The example from Michael looks like a bug in LINQ to SQL:

Dim result = From p In context.Products Where p.Size = Nothing
SELECT COUNT(*) AS [value] FROM [Production].[Product] AS [t0]
WHERE [t0].[Size] = ''

This should be
WHERE [t0].[Size] = '' OR [t0].[Size] IS NULL
Since "= Nothing" in VB matches both null and empty, while "Is Nothing" matches only null.

Goes to show how hard it is to get everything right here. One more reason to just assume that VB programmers are better off just switching mentally between VB and query semantics. Sooner or later, relying on equality of null and empty would just bite them in the ass anyway, even if we do get everything right. Hey, no need to separate null and empty in your app? Then don't make your string columns nullable, or disallow empty strings. Mixing null and empty, just assuming that some VB magic takes care of it, that's a disaster waiting to happen.


*) I found one minor deviation: stringVar == objectVar becomes a simple BinaryExpression/Equal in C#, but in Oxygene it becomes (stringVar as object) == objectVar (additional UnaryExpression of NodeType TypeAs). This is semantically identical, but still different. So while we might not need explicit special case handling for Oxygene support, it would need separate testing and maybe a fix or two.
> 1. Just leave it as it is - let the actual LINQ provider sort this out.


Oct 19, 2009, 8:45:10 AM10/19/09
to re-motion Developers
I think I have a preference for options 3 or 4. Option 1 doesn't
sound great, since pretty much any provider is going to have to
duplicate the same logic. Option 2 is, I think, a non-starter, since
it means that the re-linq model is no longer a hi-fidelity view of the
original linq expression - intent has been lost and it's now
impossible for the provider to do exactly what the user wanted
(granting, of course, that the user may well not have known what they
were asking for!).

Option 5 sounds great, but it also sounds like a lot of work and
smells a little of yagni to me. That leaves 3 or 4, and I don't
really have a strong opinion either way on which one to adopt.

Wenig, Stefan

Oct 19, 2009, 9:41:59 AM10/19/09
to re-mot...@googlegroups.com, st...@srstrong.com
Steve, thanks for your input. Do you have any idea yet what you'd want to generate for LINQ to NH?

Would you try to simulate the null-semantics of C# and VB? (i.e., a = b OR a is null AND b is null)
The null/empty equality of VB?
Any weird thing beyond that? (case sensitivity, cultures ;-))

Will NH want to support Delphi/Oxygene?

BTW, I think that would be good topics to discuss on the NH mailing list.

(I assume the null semantics of HQL are like those in SQL? i.e. the expression (null = null) returns null, not true. Is this correct?)

Reply all
Reply to author
0 new messages