CQL question: How to write a "path_equations" that prevents cycles?

67 views
Skip to first unread message

Benjamin Martin

unread,
Jul 9, 2019, 7:54:18 AM7/9/19
to Categorical Data
Hi,
    I was reading through the "Tutorial" example in CQL and your comments talk about how you can have an infinite instance if "manager" is null. To fix this you mention you could add "manager.manager = manager" to schema's S "path_equations". However, this would mean a manager cannot have a manager that is not themselves. What you really want is to allow for an arbitrary hierarchy of "Employee" (rose tree) where the nodes are managers. The only requirement needed to enforce such a structure is to prevent any cycles, essentially "forall employee. exists m. employee.m = null" where "m" is some number of "manager" (Example, employee.manager.manager.manager = null). Essentially that all manager "paths" end. This is assuming that we can have an employee with a null manager "foreign_keys", but this constraint could also be written as "all paths must end in an Employee who's manager is themselves" (I assume we cannot have an either type for "foreign_keys").

In short, how do we talk about cycles (or really add constraints about cycles) in "foreign_keys" in CQL?

Note: I don't entirely understand why having an employee with manager as null would cause an infinite instance. In other words, why does a self referencing (Employee -> Employee in this case) "foreign_keys" give rise to the possibility of infinite instances?

Note2: Is there a way to require each hierarchy (rose tree of Employee) corresponds to a single department? In other words, that the "Department" of each Employee in a given hierarchy are all the same? (I might be pushing the limits of what you can express in CQL).

Thanks for any help! (CQL is very interesting)

Ryan Wisnesky

unread,
Jul 9, 2019, 3:58:35 PM7/9/19
to categor...@googlegroups.com
Hi Benjamin,

I have good news and bad news. The good news is that you can do "SQL-style nullable foreign keys" using CQL's attributes - see the built-in "OuterJoin" example for details. CQL's foreign keys cannot be null in the sense of SQL; they are so-called labeled nulls with a different semantics. In this semantics, if X exists, and f : X -> Y is a foreign key, then f(X) must exist, leading to chains such as e -> f(e) -> f(f(e)) -> ... that must be 'broken' somehow.

The bad news is that first-order logic, and hence CQL, cannot constrain a binary relation to have no cycles using a finite number of formulae -- because as you observe, with one formula, you can only express 'has no cycles <= N' for some number N. This phenomenon is related to the fact that there is no relational algebra query to transitively close a graph's edge relation.

There's a few different strategies for encoding acyclic graphs in CQL. Is that what you have, or are you trying to control the cycles rather than prohibit them?

Regarding your particular constraint, I think

worksIn = worksIn.manager

would do it.
> --
> You received this message because you are subscribed to the Google Groups "Categorical Data" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to categoricalda...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/categoricaldata/84d4bd6f-4a56-49d3-b982-ffed18466d29%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Benjamin Martin

unread,
Jul 10, 2019, 1:44:59 AM7/10/19
to Categorical Data
Hi Dr. Wisnesky,

Thanks for answering my questions! I'm a bit overly curious about how to formalize and enforce constraints on data because they are one of the harder parts of creating an API.

The bad news is that first-order logic, and hence CQL, cannot constrain a binary relation to have no cycles using a finite number of formulae -- because as you observe, with one formula, you can only express 'has no cycles <= N' for some number N.  This phenomenon is related to the fact that there is no relational algebra query to transitively close a graph's edge relation.  

No wonder I was having trouble writing "all paths must end" as a first-order logic formula. Today I learned about and the need for transitive closure logic.

There's a few different strategies for encoding acyclic graphs in CQL.  Is that what you have, or are you trying to control the cycles rather than prohibit them?

A number of times I'm modelling some data (usually in SQL) I find a relation of foreign keys that form some type of partial order. For the "manager" foreign key I would want to have a strict partial order (DAG). Another example involving multiple foreign keys would be a user "favorite" relation (presumably for blog posts) where "favorite" is an array of foreign keys to other users. This is an irreflexive, transitive partial order where we care about the transitive closure to recommend what posts users might want to read. I'm in very interested in ways I can formalize these types of relations in CQL instead of leaving open the possibility of entering incoherent data, since many of the queries I would write on these type of relations assume the foreign keys do form said type of partial order. Some partial order properties (I think) would be easier to check than others: irreflexivity of the "favorite" relation should be "User.favorite∉User" where User is an "entities" and "favorite : User -> [User]" (This is assuming we can construct a "not in" set relation). Regardless, let me know what types of partial orders I can formalize in CQL.
 
Regarding your particular constraint, I think

   worksIn = worksIn.manager

would do it.

 Note: I think you meant "Employee.worksIn = Employee.manager.worksIn", regardless this does fix my "Note2" curiosity.

Note2: I am thoroughly impressed with CQL thus far, no other "production" product has let me even get close to adding meaningful constraints on data that can be analyzed at compile time.

Ryan Wisnesky

unread,
Jul 10, 2019, 3:03:16 AM7/10/19
to categor...@googlegroups.com
Would a partial order at the data level work? If so check out the built-in PartialOrder example, reproduced below.

schema Rel = literal : empty {
entities
Dom Row
foreign_keys
p1 p2 : Row -> Dom
}

constraints PartialOrder = literal : Rel {
forall d:Dom -> exists r:Row where r.p1 = d r.p2 = d //refl

forall r1 r2:Row where r1.p1 = r2.p2 r1.p2 = r2.p1 -> where r1.p1 = r1.p2 //anti-sym

forall r1 r2:Row where r1.p2 = r2.p1 ->
exists r3:Row where r3.p1 = r1.p1 r3.p2 = r2.p2 //trans
}

instance TwoGenerators = literal : Rel {
generators
0 1 : Dom
}

//R(0,0) R(1,1)
instance FreePartialOrderOnTwoGenerators = chase PartialOrder TwoGenerators
> --
> You received this message because you are subscribed to the Google Groups "Categorical Data" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to categoricalda...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/categoricaldata/1391298c-5a1e-4992-a4c3-10ee8e906d85%40googlegroups.com.

Benjamin Martin

unread,
Jul 11, 2019, 6:49:05 AM7/11/19
to Categorical Data
Hi Dr. Wisnesky,

From what I understand, "constraints" can only be used to check your "instance" after it has been created with "check" or try to repair your instance with "chase". The "check" is a bit like the checks on tables from SQL, but with much more expressive syntax and is applied to a complete instance instead of being checked on each sql insert. Is there a way to export the "constraints" with a schema (possibly as a trigger?) when deploying to SQL or other backends? I get the feeling that "constraints" are designed to be used in CQL's IDE as runtime checks while working with data.

I am quite curious why the "constraint" expressiveness is not allowed in "path_equations". (Would love a reference I can go read) I assume the "path_equations" get sent off to the first order logic solver while the "constraints" are only literally checked with the specified instance.

If I can export the "constraints" into whatever backend I'm using to ensure data integrity at runtime, then this approach works fine.

I did notice there is a way to compose or reuse "constraints" by importing them into a new "constraints" and changing the "constraints" type. That way you can have a generic "schema" with the minimal for a partial order type of foreign key relation, define the type of partial order you want with "constraints" and then import them into your actual "schema" and "constraints" for the particular problem you are dealing with. Just happy that worked like I expected.

Thanks for your response!

Ryan Wisnesky

unread,
Jul 11, 2019, 9:47:23 AM7/11/19
to categor...@googlegroups.com
Hi Benjamin,

It is possible to export constraints as SQL triggers: open the EASIK tool from CQL's tool's menu, click 'from last run of CQL', after running the CQL file, and then click file->export to DBMS. But I would advise against this procedure, as it is brittle, and also I do not believe anyone proved that the non-determinism of the trigger firing process doesn't matter.

The reason CQL separates constraints into a separate kind is twofold:
- a constrained schema need not admit certain queries. You have to use the check_query command on each individual query. In contrast, with plain schemas, queries always exist. I don't have a great reference for this but maybe David can speak up.

command checkQ = check_query Q SrcConstraints TgtConstraints

- for now, checking constraints against queries requires the 3rd-party E prover. Path_equations we handle with CQL's internal prover.

I've attached a couple slides with an example.

ConstraintChecking.pdf
Reply all
Reply to author
Forward
0 new messages