Schema Evolution Example

54 views
Skip to first unread message

Fred Eisele

unread,
Jan 18, 2018, 3:22:13 PM1/18/18
to categoricaldata




Given this schema "S" evolving to "T".
There is a problem mapping objects 1:2 [not allowed].
There is also a problem mapping the purple arrow.

Adding the ‘X’ schema makes it possible to map all objects properly.
But, there is still ambiguity on how to map the purple arrow.


For example the ‘’has” arrow could legitimately be mapped from four different places.
Should it be mapped from all of them?
I am guessing that each of the four candidate arrows means something slightly different.
If so, what would that difference in meaning be?

Should the idx/idy arrows be mapped to the same or distinct idx/idy arrows?

Fred Eisele

unread,
Jan 18, 2018, 3:45:51 PM1/18/18
to categoricaldata
The more I look at this the more ambiguity I see.

I am pretty sure the event_idx/idy and position_idx/idy arrows being fused is probably not what is intended.
I have not introduced any path equivalences which may help resolve some of these questions.
In the actual database from which this example is derived the cardinality from position to event is [0,1]:1
* Could a careful selection of arrows and path equivalence specify the desired cardinality?
* What do each of the choices for the purple arrow practically imply?

Fred Eisele

unread,
Jan 18, 2018, 4:40:23 PM1/18/18
to categoricaldata
 
Let me motivate the example a bit.
A naive approach to database schema evolution [S->T] is to simply map the attributes from one set of tables [in S] to another set [in T].
The foreign-keys are similarly migrated but, even in the simple example provided, it is not clear where the foreign-key should go.
It seems that the evolution can be described fully by indicating which of the four choices are "active".
This seems, at first, sufficient but closer inspection reveals an ambiguity in the identity arrows for the objects in the original schema.



David Spivak

unread,
Jan 18, 2018, 10:30:58 PM1/18/18
to categor...@googlegroups.com
One reason to use AQL is to prototype with the different options to see what each will do to a small amount of data. Would that help?

I had a hard time following the example; it might help me to see it in AQL code if possible.

--
You received this message because you are subscribed to the Google Groups "categoricaldata" group.
To unsubscribe from this group and stop receiving emails from it, send an email to categoricaldata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Fred Eisele

unread,
Jan 19, 2018, 1:02:55 PM1/19/18
to categoricaldata

Here is a simple AQL that conveys the idea.

schema S = literal : sql {
 entities
    a b
 foreign_keys
    has_a
: b -> a
 attributes
    i
: a -> Varchar
    j
: a -> Varchar
    k
: b -> Varchar
    m
: b -> Varchar
}


schema T
= literal : sql {
 entities
    c d
 foreign_keys
    has_c
: d -> c
 attributes
    i
: c -> Varchar
    j
: d -> Varchar
    k
: c -> Varchar
    m
: d -> Varchar
}


schema X
= literal : sql {
 entities
    ac ad bc bd
 foreign_keys
 attributes
    i
: ac -> Varchar
    j
: ad -> Varchar
    k
: bc -> Varchar
    m
: bd -> Varchar
}


mapping F
= literal : X -> S {
 entity ac
-> a
    attributes i
-> i
 entity ad
-> a
    attributes j
-> j
 entity bc
-> b
    attributes k
-> k
 entity bd
-> b
    attributes m
-> m
}


mapping G
= literal : X -> T {
 entity ac
-> c
    attributes i
-> i
 entity ad
-> d
    attributes j
-> j
 entity bc
-> c
    attributes k
-> k
 entity bd
-> d
    attributes m
-> m
}


instance
Js = literal : S {
 generators
    a1 a2
: a
    b1
: b
 equations
    a1
.i = "a1i" a1.j = "a1j"
    b1
.has_a = a1 b1.k = "b1k" b1.m = "b1m"
    a2
.i = "a2i" a2.j = "a2j"
}


instance
Jx = delta F Js


instance
Jt = sigma G Jx


The goal is to have 'Jt' be something like...

instance Jt_goal = literal : T {
 generators
    c1 c2
: c
    d1 d2
: d
 equations
    c1
.i = "a1i" c1.k = "b1k"
    d1
.has_c = c1 d1.j = "a1j" d1.m = "d1m"
 
    c2
.i = "a2i" c2.k = "null"
   
d2.has_c = c2 d2.j = "a2j" d2.m = "null"
}

data.aql

David Spivak

unread,
Jan 19, 2018, 2:22:09 PM1/19/18
to categor...@googlegroups.com
Your X wasn't quite right; you forgot to include the foreign keys, so it generated fresh things you didn't want. Here's the thing that seems at least closer. Is it what you want?

schema S = literal : sql {
 entities 
    a b
 foreign_keys 
    has_a : b -> a
 attributes 
    i : a -> Varchar
    j : a -> Varchar
    k : b -> Varchar
    m : b -> Varchar
}


schema T = literal : sql {
 entities 
    c d
 foreign_keys 
    has_c : d -> c
 attributes 
    i : c -> Varchar
    j : d -> Varchar
    k : c -> Varchar
    m : d -> Varchar
}


schema X = literal : sql {
 entities 
    ac ad bc bd
 foreign_keys 
  a_has_c : ad -> ac
  b_has_c : bd -> bc
  has_a_c : bc -> ac
  has_a_d : bd -> ad
 attributes 
    i : ac -> Varchar
    j : ad -> Varchar
    k : bc -> Varchar
    m : bd -> Varchar
}


mapping F = literal : X -> S {
 entity ac -> a
    attributes i -> i
 entity ad -> a
  foreign_keys a_has_c -> b
    attributes j -> j
 entity bc -> b
  foreign_keys has_a_c -> has_a
    attributes k -> k
 entity bd -> b
  foreign_keys has_a_d -> has_a
              b_has_c -> b
    attributes m -> m
}


mapping G = literal : X -> T {
 entity ac -> c
    attributes i -> i
 entity ad -> d
  foreign_keys a_has_c -> has_c 
    attributes j -> j
 entity bc -> c
  foreign_keys has_a_c -> c
    attributes k -> k
 entity bd -> d
  foreign_keys has_a_d -> c
              b_has_c -> has_c
    attributes m -> m
}


instance Js = literal : S {
 generators 
    a1 a2 : a
    b1 : b
 equations 
    a1.i = "a1i" a1.j = "a1j" 
    b1.has_a = a1 b1.k = "b1k" b1.m = "b1m"
    a2.i = "a2i" a2.j = "a2j" 
}


instance Jx = delta F Js


instance Jt = sigma G Jx


instance Jt_goal = literal : T {
 generators 
    c1 c2 : c
    d1 d2 : d
 equations 
    c1.i = "a1i" c1.k = "b1k" 
    d1.has_c = c1 d1.j = "a1j" d1.m = "d1m"
 
    c2.i = "a2i" // c2.k = "null"  (leaving things blank is what gives nulls)
    d2.has_c = c2 d2.j = "a2j"// d2.m = "null"
}


If it's not what you want, it's probably because you're trying to use a sigma when you should really be using pi. In fact, you really should just be using a query (which is a pi and a delta). Ryan or I can send you the query text if you want.

Fred Eisele

unread,
Jan 19, 2018, 3:15:33 PM1/19/18
to categoricaldata


On Friday, January 19, 2018 at 1:22:09 PM UTC-6, David Spivak wrote:
Your X wasn't quite right; you forgot to include the foreign keys, so it generated fresh things you didn't want. Here's the thing that seems at least closer.

I left them out on purpose, that was the escense of my question.
How should the arrows be mapped?
What should the foreign_keys be and how should they be mapped?

 
Is it what you want?
 
Maybe. 
I will take a look at what you suggest and see if that gives reasonable results.


If it's not what you want, it's probably because you're trying to use a sigma when you should really be using pi. In fact, you really should just be using a query (which is a pi and a delta). Ryan or I can send you the query text if you want.

I spoke with Ryan about this  and he pointed out that the problem can be attacked in several ways the most obvious ways are via a span or a cospan.

Via a span:
This is what I have been trying to do.
In relational terms this is like first normalizing and then denormalizing.

Via a cospan: 
This is the same a query [I think].
In relational terms this is like first denormalizing and then renormalizing.

What are the attendant issues of these two approaches?

I have attached the cospan and query versions.
Note that the 'pi' function does not seem to be present in AQL so I used sigma.
It seems to give reasonable results with the trivial Js.

[I am still trying to wrap my head around adjunctions and what pi and sigma actually do.]

I took a shot at writing the query but it has a defect.

data_cospan.aql
data_query.aql
data_span.aql

Ryan Wisnesky

unread,
Jan 19, 2018, 3:43:48 PM1/19/18
to categor...@googlegroups.com
I'll add a keyword for pi, but I'm pretty sure you can do it like this:

Eval (toQuery F) I

Or possibly using toCoQuery - whichever fails when F is not surjective on attributes.

Fred Eisele

unread,
Jan 19, 2018, 6:42:54 PM1/19/18
to categoricaldata


On Friday, January 19, 2018 at 1:22:09 PM UTC-6, David Spivak wrote:
In fact, you really should just be using a query (which is a pi and a delta). Ryan or I can send you the query text if you want.


Ok,I am trying to get the query to work but once again the foreign_key has me flummoxed.
Once again here are the source and target schema.

schema S = literal : sql {

 entities
 a b
 foreign_keys
 has
: b -> a
 attributes
 i
: a -> Varchar

 j
: a -> Varchar
 k
: b -> Varchar
 m
: b -> Varchar
}


schema T
= literal : sql {
 entities
 c d
 foreign_keys
 has_c
: d ->
c
 has_d
: c -> d
 attributes
 i
: c -> Varchar

 j
: d -> Varchar
 k
: c -> Varchar
 m
: d -> Varchar
}

Here are the cospan suitable mappings:

schema X = literal : sql {

 entities
 n
 foreign_keys


 attributes
 i
: n -> Varchar
 j
: n -> Varchar
 k
: n -> Varchar
 m
: n -> Varchar
}


mapping F
= literal : S -> X {
 entity a
-> n
 attributes
 i
-> i
 j
-> j
 entity b
-> n
 foreign_keys has
-> n
 attributes
 k
-> k
 m
-> m
}


mapping G
= literal : T -> X {
 entity c
-> n
 foreign_keys has_d
-> n
 attributes
 i
-> i
 k
-> k
 entity d
-> n
 foreign_keys has_c
-> n
 attributes
 j
-> j
 m
-> m
}

Here is an instance of the source schema to migrate.

instance Js = literal : S {

 generators
 a1 a2 a3 a4
: a
 b1 b2
: b
 equations
 a1
.i = "a1i" a1.j = "a1j"
 b1
.has = a1 b1.k = "b1k" b1.m = "b1m"

 a2
.i = "a2i" a2.j = "a2j"



 a3
.i = "a3i" a3.j = "a3j"
 b2
.has = a3 b2.k = "b2k" b2.m = "b2m"
 a4
.i = "a4i" a4.j = "a4j"
}


Here is the migration using a cospan:
Note that the PI function effectively gives me an inner-join while SIGMA gives me the intended outer-join.

instance JxPi = eval (toCoQuery F) Js
instance
JxSigma = sigma F Js


instance
Jt = delta G JxSigma


Here is my try at making a query that does the same thing.
Note that this is broken around the foreign_keys.
 {Help!}

query Q = literal : S -> T {
 entity c
-> {
   
from ca:a cb:b
   attributes
     i
-> i(ca)
     k
-> k(cb)
   foreign_keys
     has_d
-> {db -> db}}
 entity d
-> {
   
from da:a db:b
   attributes
     j
-> j(da)
     m
-> m(db)
   foreign_keys
     has_c -> {cb -> da}
 
}
}

Once the query is correct the following evaluates Js using it.

instance JtQuery = eval Q Js



Ryan Wisnesky

unread,
Jan 19, 2018, 6:48:53 PM1/19/18
to categor...@googlegroups.com
I fixed the query one way - not sure if that’s the intended semantics or not.

fred.aql
Screen Shot 2018-01-19 at 6.47.36 PM.png

Ryan Wisnesky

unread,
Jan 19, 2018, 7:10:34 PM1/19/18
to categor...@googlegroups.com
Writing the foreign keys clauses in a query tends to be tough at first but it’s made a lot easier with the concept of a ‘frozen instance’.  The only difference between an AQL instance and the FROM/WHERE part of a query are the words GENERATORS/EQUATIONS.  So when Q : S -> T, we have a ‘frozen' S-Instance Q(t) for every entity t in T.  If f : s -> t is a foreign key in T, then giving the foreign keys clause Q(f) is tantamount to giving a natural transformation Q(t) -> Q(s) [note reversed direction], i.e., it’s the same as an AQL ’transform’ between the frozen instances.  

There’s a lot to unpack in those sentences, but once you understand it you’ll never struggle to write a query again.  Basically, this says how to populate f : s -> t by giving, for each generator in Q(t), an equivalent expression in Q(s); so, when we evaluate Q(s) and need to give a generator in Q(t) (i.e., to populate the value of the f column, which should be a row in t), we can use the expression in Q(s).    

On Jan 19, 2018, at 6:48 PM, Ryan Wisnesky <wisn...@gmail.com> wrote:

I fixed the query one way - not sure if that’s the intended semantics or not.

<fred.aql>


<Screen Shot 2018-01-19 at 6.47.36 PM.png>

--
You received this message because you are subscribed to the Google Groups "categoricaldata" group.
To unsubscribe from this group and stop receiving emails from it, send an email to categoricalda...@googlegroups.com.

Fred Eisele

unread,
Jan 23, 2018, 2:25:54 PM1/23/18
to categoricaldata
Enter code here...

The problem of presenting the schema migration as a query does not seem to be correct.
Given the Source and Target schema:
schema S = literal : sql {
 entities
   a b
 foreign_keys
   has
: b -> a
 attributes
   i
: a -> Varchar
   j
: a -> Varchar
   k
: b -> Varchar
   m
: b -> Varchar
}




schema T = literal : sql {
 entities
   c d
 foreign_keys
   has_c
: d -> c
   has_d
: c -> d
 attributes
   i
: c -> Varchar
   j
: d -> Varchar
   k
: c -> Varchar
   m
: d -> Varchar
}

The cospan can be represented as a pair of mappings and a cospan schema.
With a schema S instance
instance Js = literal : S {
 generators
   a1 a2 a3 a4
: a
   b1 b2
: b
 equations
   a1
.i = "a1i" a1.j = "a1j"
   b1
.has = a1 b1.k = "b1k" b1.m = "b1m"
   a2
.i = "a2i" a2.j = "a2j"


   a3
.i = "a3i" a3.j = "a3j"
   b2
.has = a3 b2.k = "b2k" b2.m = "b2m"
   a4
.i = "a4i" a4.j = "a4j"
}



The query can be computed from the mappings or produced as a literal.
instance JxPi = pi F Js

instance
JtCospan = delta G JxPi

query QSTComputed = [ toCoQuery F ; toQuery G ]
query QST
= literal : S -> T {
  entity c
-> {
 
from B1: b
    attributes
   i
-> B1.has.i
   k
-> B1.k
    foreign_keys
   has_d
-> {
     B2
-> B1
   
}
 
}
  entity d
-> {
 
from
   B2
: b
    attributes
   j
-> B2.has.j
   m
-> B2.m
    foreign_keys
   has_c
-> {
     B1
-> B2
   
}
 
}
}
instance
JtQuery = eval QST Js

These produce similar results in which the "a2" and "a4" data are lost.
This is what would be expected from an inner-join [as per normal queries].


In order to preserve all of the data an outer-join is needed.
Using the "sigma" operation rather than the "pi" operation preserves all of the data.
instance JxSigma = sigma F Js

instance
JtSigma = delta G JxSigma

This is the reason I thought using a span rather than a cospan was necessary in the first place.
Are queries capable of doing the job of instance migration without losing data?


Ryan Wisnesky

unread,
Jan 23, 2018, 2:31:30 PM1/23/18
to categor...@googlegroups.com
Hi Fred,

Can you say which output instance you would like to get?

--
You received this message because you are subscribed to the Google Groups "categoricaldata" group.
To unsubscribe from this group and stop receiving emails from it, send an email to categoricaldata+unsubscribe@googlegroups.com.

Ryan Wisnesky

unread,
Jan 23, 2018, 3:01:59 PM1/23/18
to categor...@googlegroups.com
One important limitation of Delta and pi is that they cannot create 'fresh' nulls.

David Spivak

unread,
Jan 23, 2018, 4:18:49 PM1/23/18
to categor...@googlegroups.com
Fred already sent the instance he would like to get in an email from Jan 19, addressed to categoricaldata.

Ryan Wisnesky

unread,
Jan 23, 2018, 4:21:33 PM1/23/18
to categor...@googlegroups.com
Hm, I must have misunderstood the question, but Fred and I will chat about it tomorrow.

Fred Eisele

unread,
Jan 23, 2018, 5:36:12 PM1/23/18
to categoricaldata

 

On Tuesday, January 23, 2018 at 1:31:30 PM UTC-6, Ryan Wisnesky wrote:
Hi Fred,

Can you say which output instance you would like to get?


 instance JtGoal = literal : T {
  generators
    c1 c2 c3 c4
: c
    d1 d2 d3 d4
: d
  equations
    c1
.i = "a1i" c1.k = "b1k" c1.has_d = d1
    c2
.i = "a2i"              c2.has_d = d2
    c3
.i = "a3i" c3.k = "b2k" c3.has_d = d3
    c4
.i = "a4i"              c4.has_d = d4


    d1
.j = "a1j" d1.m = "b1m" d1.has_c = c1
    d2
.j = "a2j"              d2.has_c = c2
    d3
.j = "a3j" d3.m = "b2m" d3.has_c = c3
    d4
.j = "a4j"              d4.has_c = c4
 
}


Note that this is isomorphic to 

instance JtSigma = delta G sigma F Js


Ryan Wisnesky

unread,
Jan 23, 2018, 6:13:37 PM1/23/18
to categor...@googlegroups.com
Ok, so my confusion here is as follows: if you can compute the instance you want, then what is the question?  Are you trying to obtain it using a different method than you are now, and if so, why?  

David Spivak

unread,
Jan 23, 2018, 6:27:17 PM1/23/18
to categor...@googlegroups.com
I think he didn't compute it: he presented it. He wants AQL to compute it.

Ryan Wisnesky

unread,
Jan 23, 2018, 6:31:08 PM1/23/18
to categor...@googlegroups.com
In the very last sentence the instance desired is described as isomorphic to a Delta/sigma combo.

Is the question to find the two mappings for the Delta sigma that result in that instance?

Fred Eisele

unread,
Jan 24, 2018, 8:00:34 AM1/24/18
to categoricaldata
Yes, I have a set of mapping’s that appears to give me what I expect for this specific case.

1. Do the set of schematic mappings actually work or does it just seem that way?
2. I need to automatically generate mappings from quasi-random permutations of the columns.
3. How do I use the mappings to migrate a set of queries?

Reply all
Reply to author
Forward
0 new messages