Muldis D 0.75.0 - concrete syntax and Unicode

Darren Duncan

May 29, 2009, 5:40:29 PM
to recco...@googlegroups.com
P.S. This is a roughly edited/truncated version of a message I sent to 4
individuals a couple of days ago, following up on a proposal that they use my
Muldis D design work for ideas/defaults in the evolving design of their own
less-ambitious SQL parser/generator/translator/ORM/etc projects written in
Perl 5. I'm not ready to make a general announcement yet, but I'm sending a
copy to you now, as a setup for my next post ...
----------

Hello [],

[]

This is a heads-up that I've just released to CPAN a very significant update to
my Muldis D language spec, which, cumulatively with the previous few releases,
formally documents a majority of the concrete syntax(es) of the language, namely
saying how to actually write code rather than just describing its features.

See http://search.cpan.org/dist/Muldis-D/ for the latest, version 0.7[5].0 today.

[]

Now there are still some significant parts of the syntax(es) not yet formalized,
but I'll be getting to those within days, [].

Now, as far as syntax matters go, the most relevant specific files of my spec
are PTMD_STD and HDMD_Perl5_STD. The latter is the closest one to what you
actually are doing, which is trees of Perl 5 arrayrefs/etc, but it is defined
partly in terms of or as a delta from the former, which is plain text like
normal SQL.

Muldis D has 3 formally defined syntaxes which directly correspond to each
other; PTMD_STD is the "main" syntax and is the best to use for
conceptualization, same as conceptualizing in string SQL; HDMD_Perl[5|6]_STD are
"alternate" syntaxes, and are the best to use when you are generating code at
runtime from Perl 5|6 data.

It is important to note that *all* of these are *concrete* syntaxes, intended
for programmers to write by hand, the same way they write the various data
structures given as input to ORMs or the DBI module. As such, you may find them
more compact than, say, an actual internal AST of yours might be, though they
can serve as an actual AST all the same, especially if you're looking for an
interchange format of sorts. They *are* designed to be easy to parse.
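
To illustrate the general idea of a hand-writable tree of plain data doubling
as an AST and an interchange format, here is a minimal sketch in Python. The
node shape ("lit" and "call" lists), the function names and the evaluate()
helper are all made up for this example; this is NOT the HDMD_Perl5_STD
format, whose actual structure is defined in the spec.

```python
import json

# Hypothetical node shapes, invented for illustration only:
#   ["lit", value]                      a literal value
#   ["call", func_name, {arg: node}]    a function call with named arguments

def evaluate(node, funcs):
    """Walk a tree of plain lists/dicts and compute its value."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "call":
        _, name, args = node
        evaluated = {key: evaluate(sub, funcs) for key, sub in args.items()}
        return funcs[name](**evaluated)
    raise ValueError(f"unknown node kind: {kind!r}")

# A tiny made-up function library.
FUNCS = {
    "plus": lambda a, b: a + b,
    "times": lambda a, b: a * b,
}

# 2 + (3 * 4), written by hand as nested data.
tree = ["call", "plus", {
    "a": ["lit", 2],
    "b": ["call", "times", {"a": ["lit", 3], "b": ["lit", 4]}],
}]

result = evaluate(tree, FUNCS)

# Because the tree is only lists, dicts, strings and numbers, it survives a
# trip through a generic serializer unchanged.
round_tripped = json.loads(json.dumps(tree))
```

The point of the sketch is only that the same structure a person can type is
also trivial for a program to walk and to serialize.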

One idea I have is that my syntax could be what you use as a fallback in your
API, when users want to specify something you don't have your own special syntax
for, similarly to how you accept raw SQL snippets now as a fallback.

I remind you that Muldis D is intended to be "SQL TWO-POINT-OH" of sorts, and
any useful detail in SQL should have a reasonably direct analogy in Muldis D,
which should be either superior or at least no worse than the original. If
Muldis D can't do something reasonable that SQL can, just as easily or easier,
then this is a bug to be rectified. And so it should be well suited for the
task of representing any reasonable SQL at all, in a vendor-normalized fashion.

Here is a brief summary of what parts of the concrete syntax(es) are already
formalized, which amounts to most parts of a SELECT query; they are
enumerated in roughly the same order they are documented in the syntax files:

0. Syntax for declaring the language/authority/version/etc that code is written in.

1. Value literals of any scalar or collection type at all, both system and
user-defined, including: booleans, integer and rational numerics, character and
binary strings, instants/datetimes and durations, tuples/rows,
relations/rowsets, sets, maybes, arrays, bags, values of any kind of
user-defined type.

2. Syntax for defining and including named sub-expressions, useful when your
query has repeated portions, so you just have to write one copy of it.

3. The generic, prefix syntax for invoking any named function at all, including
about 90% of the c. 300 system-defined routines, and any user-defined function.
(A function is what you can invoke inside a value expression such as a SELECT,
while a procedure is what you invoke as its own statement.) Practically every
distinct SQL clause or element that can go in a SELECT is represented by a
function in Muldis D, including your SELECT (map, project, rename, extend, etc),
FROM (join, etc), WHERE/HAVING (restrict, semijoin, antijoin, etc), GROUP BY
(group, summary, etc), ORDER BY (rank, limit, array-from, etc),
UNION/INTERSECT/MINUS/etc (union/intersect/difference), your AND/OR/XOR/etc,
your math funcs, compare funcs, string funcs, datetime funcs, etc.

4. Syntax for if-then-else-if-then-else-etc value expressions. This isn't
encoded as a function since it conceptually evaluates its sub-expressions in
order and short-circuits, whereas there's no conceptual order to function
argument evaluation (all functions are pure functions, so this is a feature).

5. Syntax for given-when-then-when-then-etc-default value expressions.

6. A large number of special alternate syntaxes for invoking many
system-defined functions, most of them being infix syntaxes, so you can write
all of your more common operations like you expect to be able to. You can write
all of these common operations with symbolic infix syntax: integer and rational
math, boolean/logical/and/or/etc, comparisons/equals/between/min/max, string
ops, set comparisons, relational join/union/etc.
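
The mapping in item 3 from SQL clauses to pure relational functions can be
sketched roughly as follows. This is Python, not Muldis D; the helper names
borrow the operator names listed above (restrict, project, join) but their
signatures, the frozenset representation of relations, and the sample data are
assumptions made for this example only.

```python
# A relation is modelled as a frozenset of rows; a row is a frozenset of
# (attribute, value) pairs. Both are immutable, so every operator below is a
# pure function that returns a new relation.

def row(**attrs):
    return frozenset(attrs.items())

def relation(*rows):
    return frozenset(rows)

def restrict(rel, pred):
    """Roughly SQL's WHERE: keep the rows for which pred is true."""
    return frozenset(r for r in rel if pred(dict(r)))

def project(rel, names):
    """Roughly SQL's SELECT column list; duplicate rows collapse."""
    return frozenset(
        frozenset((k, v) for k, v in r if k in names) for r in rel
    )

def join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return frozenset(out)

# Made-up sample data.
suppliers = relation(row(sno=1, city="London"), row(sno=2, city="Paris"))
shipments = relation(
    row(sno=1, qty=300), row(sno=1, qty=200), row(sno=2, qty=400)
)

# SELECT DISTINCT city FROM suppliers NATURAL JOIN shipments WHERE qty >= 300
result = project(
    restrict(join(suppliers, shipments), lambda t: t["qty"] >= 300),
    {"city"},
)
```

Since each operator takes relations in and hands a relation back, the clauses
of a SELECT become ordinary nested function calls.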

Here is a brief summary of concrete syntax that is still to be formalized, but
that I've basically figured out already:

1. Special alternate infix syntax for more system-defined functions, such as
attribute/field extraction, renaming and some of the relational ops / parts of
SELECTs.

2. Syntax for inlining defs of routine params aka bind params, and inlining
some other things, so that writing a single-statement query really is just that
statement/expression, with no explicit "function foo ..." wrapper required, same
as with SQL.

3. The generic syntax for invoking any named updater/procedure at all, whether
system or user-defined; the syntax for writing procedure statements. This
includes your INSERT/UPDATE/DELETE and CREATE/ALTER/DROP statements.

4. The syntax for other things you may see in procedures, such as var
declarations and try/catch blocks, the latter doing double-duty as
child-transaction boundary specifiers (though note that many operations are
implicitly atomic anyway).

5. The syntax for defining the framing and parameters etc of both functions and
procedures. (But the body of a function is just a value expression, so that's
mostly done.)

6. The syntax for defining types, and relvars/tables, and state constraints on
both.

7. The syntax for defining the framing of whole packages (think as in Oracle's)
and schemas (basically just namespaces), etc.

8. Special syntax for defining and using sequence generators.

And then these things will be added afterwards but will take more thought first;
for one thing, other parts of the spec than just the concrete syntax need
updating to specify them:

1. Syntax for defining database transition constraints, which could block a
change that sets an otherwise valid database state because it isn't allowed to
be in sequence with the prior database state; for example, a project status code
must transition START->MIDDLE->END without skipping any steps. In a SQL
database this would typically be implemented using a trigger on UPDATE.

2. Syntax for defining automatic or triggered actions that may mutate the
database, such as logging an action that might be implemented in SQL using a
trigger. Until that is done, or as an alternative, just have an invokable
procedure do this which you can invoke when performing the action it would
otherwise respond to.

3. Value literals of system-defined spatial/GIS types.

4. Stuff to do with database users and privileges etc.
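
The transition constraint from item 1 compares the prior database state with
the proposed one, rather than checking either state alone. A minimal Python
sketch of that check follows; the function names, the dict-of-statuses
representation, and the choice to permit an unchanged status are all
assumptions of this example, not anything the spec defines.

```python
# Allowed single-step transitions for the project status code.
ALLOWED = {("START", "MIDDLE"), ("MIDDLE", "END")}

def transition_ok(old_status, new_status):
    """True if a status may change from old to new in one step.
    Assumption: leaving the status unchanged is always permitted."""
    return old_status == new_status or (old_status, new_status) in ALLOWED

def check_update(old_state, new_state):
    """Compare two states, each a dict mapping project id to status.
    Only projects present in both states are checked; newly inserted
    projects are not constrained by this rule."""
    return all(
        transition_ok(old_state[pid], status)
        for pid, status in new_state.items()
        if pid in old_state
    )

one_step = check_update({"p1": "START"}, {"p1": "MIDDLE"})
skipped = check_update({"p1": "START"}, {"p1": "END"})
```

Here one_step passes while skipped fails, even though a status of END is a
perfectly valid state on its own.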

So I'll cap it there. Thank you for your time, and let me know if you [] have
other questions, or have criticism to level, or suggestions for improvement.

-- Darren Duncan

P.S. Particularly as of the latest release 0.7[5].0, the Muldis D concrete
syntaxes do exploit trans-ASCII Unicode characters as symbols, but various steps
have been taken to make this as beneficial and painless as possible. See
Basics.pod for a list of all the ASCII and trans-ASCII characters you might see
in the spec, each displayed literally and annotated with its official Unicode
codepoint number and character name, and with what Muldis D uses it for. I even
went and wrote+bundled an input method rule file you can install, unless you
have a better tool already for writing math/etc symbols (if so, tell me what it
is), so the trans-ASCIIs are easy to type. I know this since I eat my own dog
food. But otherwise, using the trans-ASCII isn't mandatory and plain ASCII
alternative syntaxes are provided too.
