On 08/12/2011, at 2:42 PM, Dan Kubb (dkubb) wrote:
>>> I don't believe it's possible to have a perfect abstraction that prevents you from using SQL while still supporting all the features of SQL.
>> This is false - as a counter-example, CQL already does this.
> Really? That's amazing.
Well, yes. It's just a humanised language wrapped around first-order logic
with bag comprehensions. There's no limit to the complexity of the resolved
expression, so anything that you can define in FOL can be said in CQL.
FOL is equivalent to the relational model in expressive power, and SQL
is less powerful than both.
In some cases it's necessary to break an expression into several parts by
defining derived fact types, but that's because the humanised language
doesn't include the use of parentheses to allow arbitrary and/or
(conjunctive/disjunctive) nesting. But you want to break up such queries
anyhow, because it makes them much more comprehensible.
> When you say "SQL" though, do you mean one of
> the ANSI spec revisions, a subset, a vendor specific flavour, all of
> those, or something else?
ANSI, at least, including all kinds of joins (inner, outer, anti, semi),
grouping, ordering and recursive queries (using CTEs in SQL).
IOW it's relationally complete.
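
To illustrate how semi-joins and anti-joins fall out of first-order logic, here is a minimal Python sketch using set comprehensions. It is not CQL and does not reflect CQL's implementation; the relations and the names semi_join and anti_join are hypothetical, chosen only for this example. A semi-join is an existential quantifier over the other relation, and an anti-join is its negation.

```python
# Hypothetical sample relations (not from any CQL schema).
# employees: (name, department); departments: (department, city)
employees = {("alice", "sales"), ("bob", "ops"), ("carol", "research")}
departments = {("sales", "Sydney"), ("ops", "Perth")}


def semi_join(left, right):
    """Rows of `left` for which there EXISTS a matching row in `right`."""
    return {
        (name, dept)
        for (name, dept) in left
        if any(dept == d for (d, _city) in right)
    }


def anti_join(left, right):
    """Rows of `left` for which there does NOT EXIST a matching row in `right`."""
    return {
        (name, dept)
        for (name, dept) in left
        if not any(dept == d for (d, _city) in right)
    }


if __name__ == "__main__":
    print(sorted(semi_join(employees, departments)))
    print(sorted(anti_join(employees, departments)))
```

Because the inputs and outputs are sets, duplicates cannot arise, which matches the relational (rather than SQL bag) view.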
CQL doesn't support (or need) NULLs, which makes things simpler, because
it avoids all the garbage that SQL introduces to deal with the anomalies
that NULLs cause. I'm sure there are individual functions of
vendor versions that aren't built-in, but that's unavoidable. Functions
and fact type derivations in CQL can be externally defined and exposed
in the language without breaking the model or structure.
CQL doesn't have a built-in row-numbering feature like ROW_NUMBER(),
but that's only technically needed when you have a result set containing
exact duplicates. (If you exclude exact duplicates, you can always
define a total ordering and derive a row-number from that ordering.
A naive implementation would be terribly inefficient, but the question
was about expressive power, not implementation efficiency.)
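
The row-numbering argument can be sketched in a few lines of Python. This is only an illustration of the idea, not how CQL does or would do it: the function name row_numbers is hypothetical, and the total ordering is assumed to be Python's built-in lexicographic ordering on tuples. With duplicates excluded, a row's number is simply the count of rows that are less than or equal to it.

```python
def row_numbers(rows):
    """Derive a row number for each distinct row from a total ordering.

    Deliberately naive: each row is compared against every other row,
    so the cost is quadratic in the number of distinct rows.
    """
    distinct = set(rows)  # exclude exact duplicates
    return {
        row: sum(1 for other in distinct if other <= row)
        for row in distinct
    }


if __name__ == "__main__":
    # Hypothetical result set containing one exact duplicate.
    result = [("b", 2), ("a", 1), ("a", 3), ("a", 1)]
    for row, number in sorted(row_numbers(result).items(), key=lambda kv: kv[1]):
        print(number, row)
```

The definition needs nothing beyond counting and comparison, which is why the missing ROW_NUMBER() is a matter of convenience and efficiency rather than expressive power.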
> I must admit I haven't seen any query abstraction that doesn't begin
> to leak at some point, especially as you start to get near the edge.
Do you consider the need to sometimes define functions (for example,
atan2(x, y)) externally to be leaky? I mean, you could possibly
define even that in FOL, but it wouldn't make sense to. So I don't consider
the need for that type of extension to be leaky.
Do you consider Tutorial D to be leaky?
The presentation of CQL on my website is almost three years out of date,
sorry about that. I'm trying to finish incorporating arithmetic into the
query implementation (big job!) before updating it.
> I have yet to see a case where the abstraction is perfect.
Regarding internal representations, I've been considering whether to translate
all FOL expressions to use a single operator, effectively the Sheffer Stroke
(actually the "mark of distinction" of George Spencer-Brown from "Laws of
Form"), because that guarantees the ability to deterministically reach the
single simplest form when optimising the query. It's a big change from my
current internal representation, however, and I have yet to finish it.
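
As a small illustration of why a single operator can suffice, the Python sketch below rebuilds NOT, AND and OR from NAND (the Sheffer stroke) and checks them against the built-in operators over every truth assignment. It shows only functional completeness for propositional connectives; it does not model Spencer-Brown's notation, quantifiers, or the simplification procedure mentioned above, and all function names are hypothetical.

```python
from itertools import product


def nand(a, b):
    """The Sheffer stroke: true unless both operands are true."""
    return not (a and b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return nand(nand(a, b), nand(a, b))


def or_(a, b):
    return nand(nand(a, a), nand(b, b))


def all_equivalent():
    """Check the NAND-only definitions against Python's own operators."""
    return all(
        not_(a) == (not a)
        and and_(a, b) == (a and b)
        and or_(a, b) == (a or b)
        for a, b in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print(all_equivalent())
```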
The only other existing implementation I know of is in NORMA Pro (the
basic version is at http://sf.net/projects/orm), which uses the role calculus
for its internal representation. That work is much more complete than mine,
and is in use in some incredible ways with clients of LogicBlox Inc, but they
have no query language. Instead, you have to build queries through a set
of complex Windows dialogs, editing the join paths using a difficult hierarchical
Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design
Skype: cjheath, Ph: (+61/0)401-533-540