DAL auto_import errors


Alan Etkin

Jan 30, 2014, 6:10:16 AM
to web2py-d...@googlegroups.com
http://code.google.com/p/web2py/issues/detail?id=1799&can=5
https://groups.google.com/d/msg/web2py/TnYUXKYfnj0/OlssRRjVRf8J

There is a problem with the auto_import feature in DAL. It fails when the .table file retrieval order does not match the table creation order in the model (e.g. on MySQL or PostgreSQL).

I believe one approach could be having DAL create the tables in two steps (altering the field definitions to add the references in a second pass). Do you think that is an acceptable workaround, or is there a better way of solving the issue?

Paolo Valleri

Jan 30, 2014, 6:57:33 AM
to web2py-d...@googlegroups.com

 Paolo



--
-- mail from:GoogleGroups "web2py-developers" mailing list
make speech: web2py-d...@googlegroups.com
unsubscribe: web2py-develop...@googlegroups.com
details : http://groups.google.com/group/web2py-developers
the project: http://code.google.com/p/web2py/
official : http://www.web2py.com/
---
You received this message because you are subscribed to the Google Groups "web2py-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py-develop...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alan Etkin

Jan 30, 2014, 7:36:53 AM
to web2py-d...@googlegroups.com


 Paolo

And for pathological cases like

table a
    reference b

table b
    reference a

(sometimes called cross-linked tables)

No matter the definition order, this should always raise a backend exception. It is arguably bad design anyway, so it probably should not be supported, if it ever worked with plain db DDL commands.

I do agree with Niphlod about not supporting unordered model definitions, but in this case there is a built-in feature that is not aware of the model.
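The ordering problem can be illustrated outside of DAL with a plain dependency sort (table names here are hypothetical): a topological sort over the reference edges yields a valid creation order for acyclic models, but there is no such order at all for cross-linked tables.

```python
from graphlib import TopologicalSorter, CycleError

def creation_order(refs):
    """refs maps each table to the set of tables it references.
    Returns a valid creation order, or None when the graph is cyclic."""
    try:
        return list(TopologicalSorter(refs).static_order())
    except CycleError:
        return None  # cross-linked tables: no valid creation order exists

# acyclic: employee references department, so department must be created first
assert creation_order({"employee": {"department"}, "department": set()}) == \
    ["department", "employee"]

# cross-linked: a references b and b references a
assert creation_order({"a": {"b"}, "b": {"a"}}) is None
```

This is also why any single-pass creation scheme has to give up on cross-linked tables, while a two-pass scheme (tables first, constraints second) does not.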

Mariano Reingart

Jan 30, 2014, 11:28:39 AM
to web2py-d...@googlegroups.com
AFAIK, "cross-linked tables" and even self-referencing ones are not bad design; in fact, they are sometimes necessary.
If not, please provide a link or a citation from a well-known database design book; I don't remember any.
On the contrary, I do remember papers about Entity-Relationship and relational modelling that say nothing about table definition order but do show self-referencing and reciprocal references, for example one I studied at university:

A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model
Toby J. Teorey , Dongqing Yang , James P. Fry
ACM Computing Surveys, 1986
 
Having to reorder table definitions because of forward references is counterproductive and error-prone at the very least, and it is not possible at all with complex table schemas.

Alan: remember the issues with the ERP app:


Also, you can take a look at how PostgreSQL's SQL dump mechanism works:

create the tables (without primary keys nor foreign keys)
copy data to the tables
add the constraints (primary keys, foreign keys, indexes, etc.)

So my humble +1 to table creation in two steps
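That dump-style sequence can be sketched as a toy DDL generator (purely illustrative, not DAL code; table and constraint names are hypothetical): emit all the CREATEs with plain columns first, and postpone every foreign key to an ALTER phase.

```python
def two_phase_ddl(tables):
    """tables: {table: {field: referenced_table or None}}.
    Emit all CREATE TABLEs with plain columns first, then the FK ALTERs."""
    creates, alters = [], []
    for name, fields in tables.items():
        cols = "".join(', "%s" INTEGER' % f for f in fields)
        creates.append('CREATE TABLE "%s" ("id" SERIAL PRIMARY KEY%s);' % (name, cols))
        for field, ref in fields.items():
            if ref:
                alters.append(
                    'ALTER TABLE "%s" ADD CONSTRAINT %s_%s_fk FOREIGN KEY ("%s") '
                    'REFERENCES "%s" ("id");' % (name, name.lower(), field.lower(),
                                                 field, ref))
    return creates + alters  # every CREATE precedes every ALTER

ddl = two_phase_ddl({
    "DEPARTMENT": {"MANAG_EMP_NO": "EMPLOYEE"},
    "EMPLOYEE": {"DEPT_NO": "DEPARTMENT", "SPOUSE_EMP_NO": "EMPLOYEE"},
})
assert [s.split()[0] for s in ddl] == ["CREATE", "CREATE", "ALTER", "ALTER", "ALTER"]
```

Because no statement in the first phase mentions another table, the definition order stops mattering entirely.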

Best regards,

Massimo Di Pierro

Jan 30, 2014, 11:34:16 AM
to web2py-d...@googlegroups.com
I agree with Mariano on this. The first version of web2py did not allow out of order references. This was added because the case was made that databases support it and people want it.

Massimo

Alan Etkin

Jan 30, 2014, 12:22:54 PM
to web2py-d...@googlegroups.com
AFAIK I'm afraid "cross-linked tables" and even self referencing ones are not a bad design and in fact, it is necessary sometimes.
If not, please provide a link or citation of a well known database design book, I don't remember anyone.

I'm afraid I don't have those bibliographic references, just some dude from Frisco complaining about cross-linked tables back in the early 2000s (on the postgres mailing list)

> ACM Computing Surveys, 1986

Well, that's sort of an incunable of database design, ain't it?
 
> Alan: remember the issues with the ERP app:

https://code.google.com/p/erplibre/source/browse/modules/db_erplibre.py

Right, and IIRC, there was a way of disabling the constraint error by sending a triggers command with executesql (for pg, not sure if possible with other backends).

> So my humble +1 to table creation in two steps

This would complicate migrations, no? For the particular case of auto_import, I suppose it would be enough to store the table references temporarily, create integer fields, execute the db commands, and then add the constraints. Perhaps it is possible to have a common method implementing the two-step table creation for both cases (reading the model files and the .table files).

Michele Comitini

Jan 30, 2014, 1:07:07 PM
to web2py-developers
AFAIK to have cross linked tables in SQL you need to use 2 steps:
1 CREATE TABLE
2 CREATE CONSTRAINT

The same can be done for any linked table. I suppose SQL DDL allows inline shortcuts just for programmer convenience and readability, but creating all constraints after the complete set of tables is in place makes sense.

IIRC DAL already has much of the logic to collect constraint definitions from define_table and create them later, but it is not complete. This could also bring some advantages: constraints could be managed programmatically if exposed in the API.

mic



Mariano Reingart

Jan 30, 2014, 2:18:39 PM
to web2py-d...@googlegroups.com
On Thu, Jan 30, 2014 at 2:22 PM, Alan Etkin <spam...@gmail.com> wrote:
AFAIK I'm afraid "cross-linked tables" and even self referencing ones are not a bad design and in fact, it is necessary sometimes.
If not, please provide a link or citation of a well known database design book, I don't remember anyone.

I'm afraid I don't have those bibliographic references, just some dude from frisco complaining about cross-linked tables back in the early 2001's (at the postgres mailing list)


Sorry, I cannot publish the paper due to copyright restrictions, but the relational schema in question is:

DEPARTMENT(DEPT-NO, DEPT-NAME, ROOM-NO, ... , DIV-NO, MANAG-EMP-NO)
EMPLOYEE(EMP-NO, EMP-NAME, JOB-TITLE, ... , DEPT-NO, SPOUSE-EMP-NO)

Note that there is no way to change the definition order (a department must have a manager employee, and an employee must belong to a department and could have a spouse).
Of course, you could split the tables with intermediary ones (as some ORMs suggest), but that may not be an optimal design from a strict perspective (and you risk losing some implicit constraints in the design).

I cannot find the thread in the postgres mailing list.
I just cited an academic paper published in a peer-reviewed technical journal to illustrate the issue.
 
> ACM Computing Surveys, 1986

Well, that's sort of an incunable of database design, ain't it?
 

No, the relational model dates back to the '60s:

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 

(there is a bibliographical link in the wikipedia page)

Also, you can get a free database book from here (2007 edition):


See "Ejemplo de interrelación recursiva" (example of recursive relationship)

My point is that if the web2py DAL cannot express this kind of relationship, it would be difficult to justify it in the classroom (database courses and the like).
 
> Alan: remember the issues with the ERP app:

https://code.google.com/p/erplibre/source/browse/modules/db_erplibre.py

Right, and IIRC, there was a way of disabling the constraint error by sending a triggers command with executesql (for pg, not sure if possible with other backends).


There are several options at the PostgreSQL level (DISABLE TRIGGER, SET CONSTRAINTS ALL DEFERRED), but that doesn't solve all the issues (you cannot have foreign keys to non-existent tables).
IIRC, we had to reorder the table definitions and comment out some references by hand.
 
> So my humble +1 to table creation in two steps

This would complicate migrations, no? For the particular case of auto_import, I suppose it would be enough to store temporarily table references, create integer fields, execute the db commands and then add the contraints. Perhaps it is possible to have a common method to implement the two step table creation for both cases (reading the model files and table files).

Yes.

For some complex migration scenarios, it should provide some way to import the data before the constraints are added, or the import will fail (for example, creating the field as "integer", importing the data, and then adding the reference constraints).

As MIC says, exposing this API to applications could be a plus for these complex models.

Best regards

Niphlod

Jan 30, 2014, 5:08:16 PM
to
There are subtleties in this matter that are quickly forgotten, but I'll try to summarize here:

- auto_import was not very well thought out. .table files just hold the table name and field definitions: no migrate, rname, sequence_name, common_filter, format, singular, plural, etc. This was discussed at length when the "idea" of having an API to create indexes came up... it conflicts with what a .table file holds now: basically, just Fields without the attributes set on a Table (which in the future indexes API should be "pinned" to the Table). auto_import "shines" in the lack of such data compared to executing the "usual" model. Also, there's no way for auto_import to see what the dependencies are (it goes "straight" in alphabetical order)
- Niphlod's statement regarding out-of-order definitions:
       - table1.table2_id defined before table2.id makes no sense, and never has
       - table1.parent_id self-referencing table1.id makes sense; it's doable in DAL, albeit with limited support ATM when parent_id or id are "rnamed"
- web2py allows creating department.manager_id and manager.department_id, but that state is reachable (and always has been) only through AT LEAST 2 steps:
     1 - a. create the manager table; b. create the department table with a reference to the manager table
     2 - c. alter the manager table, adding the reference to department

A "major issue" with complicated models is (and always has been) that the user reaches them through several rounds of changes, which leave behind a well-defined "route" of CREATE and ALTER DDLs that eventually produced the required data model.
When faced with an empty database, it's really hard (even without strolling down the path of dependency-walking all the out-of-order definitions) to see what the real "route" is for recreating the full data model.

Mariano Reingart

Jan 31, 2014, 4:10:50 AM
to web2py-d...@googlegroups.com
On Thu, Jan 30, 2014 at 7:06 PM, Niphlod <nip...@gmail.com> wrote:
There are subtleties in this matter that are soonely forgotten, but I'll try to summarize here:

- auto_import was not very well thought. .table files just hold table name and fields definitions. No migrate, rname, sequence_name, common_filter, format, singular, plural, etc. This was discussed at length when the "idea" of having an API to create indexes conflicts with what .table file hold now: basically, just Fields without the attributes set to a Table (that in "indexes future API" should be "pinned" to Table). auto_import "shines" in the lack of such data vs executing the "usual" model
- Niphlod's statement regarding out-of-order definitions:
       - table1.table2_id defined before table2.id has no sense, and always has been senseless

Again, I don't think it is senseless, and as far as I tested, it DOES work in SQLite if you use lazy_tables=True:

db.define_table("DEPARTMENT",
    Field("DEPT_NO", "id"),
    Field("DEPT_NAME", "string"),
    Field("ROOM_NO", "integer"),
    Field("DIV_NO", "integer"),
    Field("MANAG_EMP_NO", "reference EMPLOYEE"),
    )
db.define_table("EMPLOYEE",
    Field("EMP_NO", "id"),
    Field("EMP_NAME", "string"),
    Field("JOB_TITLE", "string"),
    Field("DEPT_NO", "reference DEPARTMENT"),
    Field("SPOUSE_EMP_NO", "reference EMPLOYEE"),
    )

generates the SQL schema:

CREATE TABLE "EMPLOYEE"(
    "EMP_NO" INTEGER PRIMARY KEY AUTOINCREMENT,
    "EMP_NAME" CHAR(512),
    "JOB_TITLE" CHAR(512),
    "DEPT_NO" INTEGER REFERENCES "DEPARTMENT" ("DEPT_NO") ON DELETE CASCADE,
    "SPOUSE_EMP_NO" INTEGER REFERENCES "EMPLOYEE" ("EMP_NO") ON DELETE CASCADE
);
CREATE TABLE "DEPARTMENT"(
    "DEPT_NO" INTEGER PRIMARY KEY AUTOINCREMENT,
    "DEPT_NAME" CHAR(512),
    "ROOM_NO" INTEGER,
    "DIV_NO" INTEGER,
    "MANAG_EMP_NO" INTEGER REFERENCES "EMPLOYEE" ("EMP_NO") ON DELETE CASCADE
);

This will fail on PostgreSQL, as the relation DEPARTMENT does not exist when creating EMPLOYEE, but it will work there too if you create the tables using SQL (migrate=False):

CREATE TABLE "EMPLOYEE"(
    "EMP_NO" SERIAL PRIMARY KEY,
    "EMP_NAME" CHAR(512),
    "JOB_TITLE" CHAR(512),
    "DEPT_NO" INTEGER,
    "SPOUSE_EMP_NO" INTEGER 
);
CREATE TABLE "DEPARTMENT"(
    "DEPT_NO" SERIAL PRIMARY KEY,
    "DEPT_NAME" CHAR(512),
    "ROOM_NO" INTEGER,
    "DIV_NO" INTEGER,
    "MANAG_EMP_NO" INTEGER 
);

ALTER TABLE "DEPARTMENT" ADD CONSTRAINT emp_no_fk FOREIGN KEY ("MANAG_EMP_NO") REFERENCES "EMPLOYEE" ("EMP_NO") ON DELETE CASCADE;
ALTER TABLE "EMPLOYEE" ADD CONSTRAINT emp_no_fk FOREIGN KEY ("SPOUSE_EMP_NO") REFERENCES "EMPLOYEE" ("EMP_NO") ON DELETE CASCADE;
ALTER TABLE "EMPLOYEE" ADD CONSTRAINT dep_no_fk FOREIGN KEY ("DEPT_NO") REFERENCES "DEPARTMENT" ("DEPT_NO") ON DELETE CASCADE;

The only difference needed to get a working schema is that the constraints are postponed until all tables are created.
So it could work even on PostgreSQL with some modifications; I don't see the need to remove the whole feature at all.
 
       - table1.parent_id self-referencing table1.id has sense, it's doable in DAL albeit with limited support ATM when parent_id or id are "rnamed"
- web2py's ability to create department.manager_id and manager.department_id is allowed, but is reachable (and always has been) through AT LEAST 2 steps:
     1 - a. create manager table, b. create department table with reference to manager table
     2 - c. alter manager table adding reference to department


But you're creating a new table "manager" that is not present in the logical (relational) nor in the conceptual (Entity-Relationship diagram) model I'm trying to implement.
This new table has some implications: you need to ensure a unique employee per department (row-level "validation" that requires at least more unique constraints), you introduce new "surrogate" keys (as manager is not a true entity in my model, it is a "fake" key, so queries will be more complex and less precise), etc.

A "major issue" with complicated models is (and always has been) that the user reaches it through several layers of rounds, that created a very well defined "route" of create and alter DDLs that "reached" the required datamodel.
When faced with an empty database, it's really hard (even without strolling in the path of dependency-walking all the out-of-order definitions) to see what's the real "route" to take to recreate the full datamodel.


Yes, but only circular references that cannot be nullable are the really hard part to solve (because you cannot insert into one table without inserting a related record in the other, and vice versa).
If constraints are added after table creation, it will work for most cases that allow nulls.
The NOT NULL issue could be fully solved with proper data migration (i.e. a callback for loading initial data from a CSV before applying the constraints, or an API to add the constraints in a second step).
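That flow — create bare tables, load the data, then apply the constraints — can be sketched as follows (a minimal illustration; the migrate helper, the execute callback and all statements here are hypothetical, not DAL code):

```python
def migrate(execute, creates, load_data, alters):
    """Hypothetical three-phase migration: bare CREATE TABLEs first,
    then the caller's data load (e.g. rows copied from CSV files),
    and only then the deferred FK constraints."""
    for ddl in creates:
        execute(ddl)
    load_data()  # runs while no FK constraint can reject circular rows
    for ddl in alters:
        execute(ddl)

log = []
migrate(
    log.append,
    ['CREATE TABLE employee ("id" SERIAL PRIMARY KEY, "spouse_emp_no" INTEGER)'],
    lambda: log.append("-- load employee.csv here --"),
    ['ALTER TABLE employee ADD CONSTRAINT spouse_fk '
     'FOREIGN KEY ("spouse_emp_no") REFERENCES employee ("id")'],
)
assert [s.split()[0] for s in log] == ["CREATE", "--", "ALTER"]
```

Since the data is already in place when the constraints arrive, even mutually referencing NOT NULL columns can be satisfied.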

If this has a chance of being considered ("alter table add constraint" and an API), I could work on these enhancements (not urgently, sorry).

Best regards,

Alan Etkin

Jan 31, 2014, 5:57:45 AM
to

If constraints are added after table creation, it will work for most cases that allow nulls
The not null issue could be fully solved with proper data migration (ie. a callback for loading initial data from a csv before applying the constraints, or an API to add constraints in a second step) 

If this has a chance of being considered ("alter table add constraint" and API), I could work on this enhancements (not urgently, sorry).

There's a problem: legacy apps expecting constraints to be applied on each .define_table call could break. How do we prevent this? I'm afraid we'll have to resort again to a kwarg switch to enable the feature.

EDIT: perhaps with something like DAL(..., lazy_reference=True) and/or db.lazy_reference = True

Alan Etkin

Jan 31, 2014, 10:36:54 AM
to web2py-d...@googlegroups.com
If constraints are added after table creation, it will work for most cases that allow nulls
The not null issue could be fully solved with proper data migration (ie. a callback for loading initial data from a csv before applying the constraints, or an API to add constraints in a second step)

Something else: not sure about the other rdbms, but with pg you can do:

# tables is a list of the table names in the model (besides the default scaffolding app tables)
>>> for table in tables:
...     db.define_table(table)

before the actual table definitions; then one can call define_table(..., redefine=True) and pg will not complain about foreign keys. Do you think this approach is acceptable as a workaround, or do we still need an automatic two-step migration process?

Mariano Reingart

Jan 31, 2014, 11:55:51 AM
to web2py-d...@googlegroups.com
It is a valid workaround, but it is not completely straightforward (you have to define the tables twice) and it will be complicated for keyed tables or ones that use non-standard id primary keys.
Also, it does not solve the circular / not-null reference issues unless you do some specific inserts and updates in between (i.e. it would not be transparent for the user).

About backward compatibility, postponing constraints should not affect prior applications, as the constraints will be enforced anyway; so if there is any inconsistency in the data, it will fail sooner or later.

Best regards,

Niphlod

Jan 31, 2014, 2:54:13 PM
to web2py-d...@googlegroups.com
<rant mode=dba>
@mariano: sqlite doesn't enforce any FK constraints on DDLs... you picked the only relational db that allows you to create such a thing in that order
</rant>
I may have switched employee with manager, but it's the exact same situation. My point was that if you need two tables referencing each other, you can reach that status in web2py and in the relational backend only in two steps.
web2py then works fine until it is faced with a clean db (or with auto_import).

So, back to basics... T-SQL is hard, ugly and not a programming language, but it expects things to be in order.
DAL doesn't do a thing about order, so if you want those features, tables should be defined in order.

To me, allowing the creation of a table whose declared FK **could** turn out not to be real just doesn't sound right.
For db.define_table('hello', Field('world_id', 'reference world')) not to error out, we'd need to defer the exception and then take care of dropping that table if no db.define_table('world', Field('name')) exists in the model.

So, I'd say we **should** enforce the reordering of the models, just for "better readability" and less "error pronability": in that case, db.define_table('hello', Field('world_id', 'reference world')) raises an exception immediately, the creation is rolled back, and everybody is safe.

Nobody said that features should be dropped, but please do remember that DAL has been in this situation since its creation.

Strictly speaking, ATM there's nothing DAL can do, because the model triggers DDLs in a different order than during the original creation of the app (that is, two CREATE DDLs instead of two CREATEs and one ALTER).
We can do dependency-walking and/or postpone FK creation, but I don't think it will be an easy task (although it would definitely be nice): there are still lots of dark corners to look into (mainly because in web2py there's no such thing as a signal stating "this is the point where the model is completely defined").


Mariano Reingart

Jan 31, 2014, 5:02:48 PM
to web2py-d...@googlegroups.com
On Fri, Jan 31, 2014 at 4:54 PM, Niphlod <nip...@gmail.com> wrote:
<rant mode=dba>
@mariano: sqlite doesn't enforce any PK costraints on DDLs... you picked out the only relational db that allows you to create such a thing in that order

Yes, I know, and I've also cited the PostgreSQL example where it doesn't work.
The nice part is that sqlite is mostly used to experiment, so it is a positive side effect ;-)
 
</rant>
I may have switched employee with manager, but it's the exact same situation. My point was that if you need two tables referencing each other, you can reach that status in web2py and in the relational backed only in two steps.
web2py then works fine until it is faced with a clean db (or with auto_import).
 
Yes, after re-reading your mail I figured it out, but my point is still valid: creating new tables because of ordering issues is dangerous.
 
So, back to basics.... T-SQL is hard, ugly and not a programming language, but expects things to be in order.
DAL doesn't do a thing about order, so if you want those features, tables should be defined in order.

Not necessarily; as I pointed out, you can define tables in whatever order you want, and then apply the constraints all together at the end.
 

To me, allowing the creation of a table that **could** not have a real FK as defined just doesn't sound right.

Ok, that is your opinion; I've presented my case with real-world facts, academic papers and database books.
I work with databases with more than 70 tables (the simple ERP I'm trying to port to python/web2py).
OpenERP has more than 500 in the installations I'm working on.
In those cases, I bet it is simply impossible to find a "correct order" to define the tables at all, due to circular and cross relationships.
 
db.define_table('hello', Field('world_id', 'reference world')) not erroring out needs us to: raise the exception, then take care to drop such table if a db.define_table('world', Field('name')) doesn't exist in the model.


Maybe 'reference world' is not the best syntax:
* you're hiding the data type (and you need the other table created to get the correct type)
* it is an internal validation anyway (it is a string! so static analysis tools and IDEs cannot catch it)
* it varies if you use keyed tables
* you cannot set it after the table has been created (unlike most other Field properties)

Something better (explicit, cleaner, etc.) could be:

# CREATE TABLEs:

db.define_table('hello', Field('world_id', 'integer'))
db.define_table('world', Field('hello_id', 'integer'))

# populate the table hello
# populate the table world

# ADD FK constraints:

db.hello.world_id.reference = db.world.id
db.world.hello_id.reference = db.hello.id

That will not be optimal for keyed tables, but it is a starting point.
Also, this could be implemented without breaking backwards compatibility, just adding the FK when reference is set (and it is not a duplication of code, because currently you cannot add constraints).

So, I'd say we **should** enforce the reordering of the models just for "better readability" and less "error pronability": in this case, db.define_table('hello', Field('world_id')) goes into exception, creation is rolled back, and everybody is safe.


It will roll back too if constraints are postponed, and in my proposed example (using db.table.field.reference), it will raise the exception at the exact line where the offending ALTER TABLE ADD CONSTRAINT is executed
 
Nobody said that features should be dropped, but please do remember that DAL is in this situation from its creation.


Not directly, but you are trying to build consensus towards deprecating it, in favour of model ordering.
IMHO it should be enhanced and fixed, without enforcing model order.
 
Strictly ATM, there's nothing DAL can do, because the model triggers DDLs not in the way it has since the original creation of the app (that is, two create DDLs instead of two CREATE and one ALTER)

Yes, as I've said, I could work on a patch to separate table creation and constraints
 
We can do dependency-walking and/or postponing FK creation, but I don't think it will be an easy task (although it will be definitely nice): there are still lots of dark corners to look into (mainly because in web2py there's no such thing as a signal that states "this is the point when the model is completely defined").


I don't think it is so complex; DAL is currently doing something like it for primary keys (see create_sequence_and_triggers; yes, it is not exactly the same, but it shows that sometimes you need several steps to properly define a table field).

The dark corners are not really our problem (the user may not want to set up a proper data migration from a blank database), but this proposal could allow it if they want to do so.
 
Best regards, and sorry if some parts of my mails could be considered rude; that is not my intention, and sometimes writing proper English is complicated, as it is not my first language

Alan Etkin

Feb 1, 2014, 7:47:00 AM
to
# CREATE TABLEs:

db.define_table('hello', Field('world_id', 'integer'))
db.define_table('world', Field('hello_id', 'integer'))

# populate the table hello
# populate the table world

# ADD FK constraints:

db.hello.world_id.reference = db.world.id
db.world.hello_id.reference = db.hello.id


I'd prefer an alternative syntax like this instead:

db = DAL(... lazy_references=True) # store tmp fk by table.field
...
db.apply_references() # or maybe reuse db.commit()

EDIT: apply_references() should be forced before the request returns, or the changes would be lost (perhaps using a flag).

It would also be useful to provide a backend-wide common way of disabling and enabling FK triggers, like

db.ignore_references = <bool>

I think that those enhancements would make data migration much simpler.

Massimo DiPierro

Feb 1, 2014, 10:28:48 AM
to web2py-d...@googlegroups.com
I think this:

db = DAL(... lazy_references=True)

should be the default and should be automatic.


Alan Etkin

Feb 2, 2014, 9:20:01 AM
to web2py-d...@googlegroups.com
db = DAL(... lazy_references=True)

should be the default and should be automatic.

Could anyone review this patch?

https://github.com/spametki/web2py/commit/68788755108efa8eb8248bd4b7d836c8a883dccf

It sets reference laziness by default and stores constraint data for deferred modifications (DAL._pending_fk and DAL._pending_tfk), or uses the current behavior when the option is off.
It implements a BaseAdapter.create_reference internal method so the command formatter can be reused in both cases (lazy or not). The new method should be modified to return table modification commands when it is called after table creation (deferred constraints). A function that applies the pending constraints automatically (using create_reference) is also still missing.
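A minimal sketch of the bookkeeping described above, assuming only the _pending_fk / update_references names from the patch description (everything else here is illustrative, not the actual patch code): define_table swaps "reference X" fields for plain integers and records them, and update_references later emits the deferred ALTERs.

```python
class LazyDAL:
    """Toy model of the deferred-constraint flow (not real DAL code)."""
    def __init__(self):
        self.ddl = []            # every statement "executed", in order
        self._pending_fk = []    # (table, field, referenced_table)

    def define_table(self, table, **fields):
        cols = []
        for fname, ftype in fields.items():
            if ftype.startswith("reference "):
                # record the FK for later and create a plain column now
                self._pending_fk.append((table, fname, ftype.split(" ", 1)[1]))
                ftype = "INTEGER"
            cols.append("%s %s" % (fname, ftype))
        self.ddl.append("CREATE TABLE %s (%s)" % (table, ", ".join(cols)))

    def update_references(self):
        while self._pending_fk:
            table, field, ref = self._pending_fk.pop(0)
            self.ddl.append("ALTER TABLE %s ADD CONSTRAINT %s_%s_fk "
                            "FOREIGN KEY (%s) REFERENCES %s (id)"
                            % (table, table, field, field, ref))

db = LazyDAL()
db.define_table("mythirdtable", mythirdreference="reference mytable")
db.define_table("mytable", myreference="reference myothertable")
db.update_references()
assert db._pending_fk == []
assert db.ddl[-1].startswith("ALTER TABLE mytable")
```

The point of the sketch is the ordering guarantee: no ALTER is issued until every define_table has run, so forward references never hit the backend.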

Alan Etkin

Feb 5, 2014, 9:40:21 AM
to

Could anyone review this patch?

New patch; for now it only works with PostgreSQL 9.1 (only for integer fk references):

https://github.com/spametki/web2py/tree/lazy_references

- Supports any linked-table order with deferred constraints
- Allows switching between the old behavior (the default) and reference laziness
- Implements DAL.update_references (to be called automatically at the end of the model execution)

Issues:
- Cannot be enabled by default since it supports only postgres
- Should be ignored with the sqlite or NoSQL adapters
- Does not implement multiple-column constraints
- Reference default validation requires setting format="" at table definition (I believe this is the default behavior)

EDIT: format="%(<field name>)s"

lazy_references cannot currently default to True, since it will only work in the tested environment. It should also work with other RDBMSs with some changes (possible ALTER TABLE syntax issues).

It was tested with the following model:

    db = DAL('postgres://...', migrate=True, lazy_references=True, pool_size=1)

...

db.define_table("mythirdtable", Field("somethirdtext"), Field("mythirdreference", "reference mytable"), format="%(id)s")
db.define_table("mytable", Field("sometext"), Field("myreference", "reference myothertable"), format="%(id)s")
db.define_table("myothertable", Field("othertext"), Field("myotherreference", "reference mythirdtable"), format="%(id)s")

db.update_references()

Mariano Reingart

Feb 5, 2014, 11:33:49 PM
to web2py-d...@googlegroups.com
Great!

I tested it and it worked OK, but sql.log didn't show the ALTER TABLE ADD CONSTRAINT statements.
(I did a quick review; I'd like to review it more in depth.)

I don't like the assumption about the 'integer' type; that's why I proposed a new field attribute to define references.
This could be achievable if "reference xx" is avoided and the real field type is stated.
Then, setting the field's reference property could call create_reference, so there would be no need for the user to call db.update_references() (db._pending_fk could be avoided entirely if keyed tables are not used).
BTW, the final call to update_references is error-prone: if a novice user forgets it, the database could become inconsistent.
Also, if the user passes db.othertable as the reference type, it could fail, as db.othertable might not exist yet (there is no easy workaround, as the reference needs to know the original field type to define the secondary table).

Compound keys are my main concern, as the patch should detect and group multi-column references.
Currently this does not work with your patch, as you said it isn't implemented, but it was working in trunk for keyed tables.

About your concerns regarding database support, I think that it is a relatively small issue.

As you pointed out, sqlite doesn't require lazy_references to be True at all, as references can be created directly in any order (and ALTER TABLE ADD CONSTRAINT is not supported by design):


Other major RDBMS should support the basic syntax (SQL-92):


Even MS SQL Server (T-SQL) seems to support it too:


Best regards


Alan Etkin
Feb 6, 2014, 6:20:21 AM
to web2py-d...@googlegroups.com
I tested and it worked ok, but sql.log didn't show the ALTER TABLE ADD CONSTRAINTS
(I did a quick review; I'd like to review it in more depth)

TODO (that's because there are missing adapter.log calls in update_references); there are lots of unsolved issues (check the DAL.update_references docstring)
 
I don't like the assumption about the 'integer' type; that's why I proposed a new field attribute to define references.

This should be achieved by adding an adapter type "deferred tfk/fk" or similar. For the case of auto-incremented integer id fields, having a field type defined beforehand would allow migrating those values from another database (a very common use case) before adding the constraints.
 
This could be achievable if "reference xx" is avoided and the real field type is stated.

Replacing "reference ..." with a field type would make reading the model more difficult.
 
Then, setting the field reference property could call create_reference, so there would be no need for the user to call db.update_references() (db._pending_fk could be avoided entirely if keyed tables are not used).

But that is a less user-friendly API:

db.define_table(...
db.define_table(...
# and for each reference
db.tablename.fieldname.reference = ...

I thought it would be better to have a function that collects any constraints defined and creates them in a loop. I think that ideally DAL should support both syntaxes. Some magic would be needed for the reference attribute so that it triggers the constraint addition.

BTW, the last call to update_references is error prone; if a novice user forgets it, the database could become inconsistent.

I just exposed the method because I don't know where it would be called automatically (supposedly before the model files are executed)

EDIT: after any other model code is executed
 
Also, if the user uses db.othertable as the reference type, it could fail because db.othertable may not exist yet

Are you sure? If update_references is called after all other model code, the table objects should already be created.

Compound keys are my main concern, as the implementation should detect and group multi-column references.
Currently this does not work with your patch since, as you said, it isn't implemented, but it was working in trunk for keyed tables.
 
The patch also stores other key types for deferred addition. Currently, I'm trying to figure out how to execute them correctly.

Mariano Reingart
Feb 6, 2014, 10:16:39 PM
to web2py-d...@googlegroups.com
On Thu, Feb 6, 2014 at 8:06 AM, Alan Etkin <spam...@gmail.com> wrote:
 
I tested and it worked ok, but sql.log didn't show the ALTER TABLE ADD CONSTRAINTS
(I did a quick review; I'd like to review it in more depth)
TODO (that's because there are missing adapter.log calls in update_references); there are lots of unsolved issues (check the DAL.update_references docstring)

Good work so far!
 
 
I don't like the assumption about the 'integer' type; that's why I proposed a new field attribute to define references.
This should be achieved by adding an adapter type "deferred tfk/fk" or similar. For the case of auto-incremented integer id fields, having a field type defined beforehand would allow migrating those values from another database (a very common use case) before adding the constraints.

I don't fully understand this.

Auto-incremented integer primary keys are not mandatory, and in fact surrogate keys are not always advisable, as they can cause a lot of secondary side effects (like making data export/import difficult)

Deferrable constraints are another entire topic that would be useful for some specific and complicated use cases, but I'd prefer that they be discussed in another thread/patch in the future, after two-step table and constraint creation is implemented.
 
This could be achievable if "reference xx" is avoided and the real field type is stated.
Replacing "reference ..." with a field type would make reading the model more difficult

"Readability counts" but "Explicit is better than implicit" ;-)

To be clear, this is my proposed syntax:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer", reference="department"),
    Field("sopuse_emp_no", "integer", reference="employee"),
    )

or better:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer"),
    Field("sopuse_emp_no", "integer"),
    )

db.employee.dept_no.reference = db.department.dept_no
db.employee.sopuse_emp_no.reference = db.employee.emp_no

Yes, it could affect readability in larger models (as constraints are decoupled from the table definition), but it makes the program logic cleaner and more explicit (and the constraints end up more cohesive at the end of the model definition, as in PostgreSQL and maybe other database backup dumps).
It is also sometimes more flexible, as it can be better controlled programmatically (i.e., conditionally).

The major drawback I see in this last form is that it will not be compatible with SQLite (as it doesn't support ALTER TABLE ADD CONSTRAINT, but a workaround in that case would be to rename the table, recreate it, and copy the records using INSERT).
The minor drawback is that this syntax is not straightforward for keyed tables, and it will require a db._pending_fk list until all the fields of the compound foreign key are referenced.
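The rename-and-copy workaround mentioned above could look roughly like this in raw sqlite3 (a sketch only; DAL's actual migration code is different, and table names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept_no INTEGER)")
cur.execute("INSERT INTO employee (id, dept_no) VALUES (1, 1)")

# SQLite has no ALTER TABLE ADD CONSTRAINT, so: rename, recreate with
# the FK clause, copy the rows back, and drop the old table.
cur.execute("ALTER TABLE employee RENAME TO employee_old")
cur.execute("CREATE TABLE employee ("
            "id INTEGER PRIMARY KEY, "
            "dept_no INTEGER REFERENCES department(id))")
cur.execute("INSERT INTO employee SELECT id, dept_no FROM employee_old")
cur.execute("DROP TABLE employee_old")
conn.commit()
```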

So, the first form (reference inside define_table but in a separate keyword parameter) would be easier at this stage (no extra logic for SQLite or keyed tables).
 
Then, setting the field reference property could call create_reference, so there would be no need for the user to call db.update_references() (db._pending_fk could be avoided entirely if keyed tables are not used).
But that is a less user-friendly API:

db.define_table(...
db.define_table(...
# and for each reference
db.tablename.fieldname.reference = ...

I thought it would be better to have a function that collects any constraints defined and creates them in a loop. I think that ideally DAL should support both syntaxes. Some magic would be needed for the reference attribute so that it triggers the constraint addition.
 
No magic, just @property (as other special attributes already do)
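A minimal sketch of that idea (class and attribute names here are invented, not DAL's real internals): a setter property can queue or emit the constraint the moment the reference is assigned.

```python
# Hypothetical sketch of a field whose `reference` setter records the
# constraint immediately -- no separate update_references() call needed.
class SketchField:
    def __init__(self, table, name):
        self.table = table
        self.name = name
        self._reference = None
        self.pending_sql = []  # a real adapter would execute or queue this

    @property
    def reference(self):
        return self._reference

    @reference.setter
    def reference(self, target):
        self._reference = target
        self.pending_sql.append(
            "ALTER TABLE %s ADD FOREIGN KEY (%s) REFERENCES %s"
            % (self.table, self.name, target))

f = SketchField("employee", "dept_no")
f.reference = "department (dept_no)"
```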
 
BTW, the last call to update_references is error prone; if a novice user forgets it, the database could become inconsistent.
I just exposed the method because I don't know where it would be called automatically (supposedly before the model files are executed)
 

That's why I'm proposing a new field property:
  •  it doesn't require the user to call update_references
  •  it doesn't need any special hook for web2py to call update_references
  •  it doesn't even need lazy_references=True
It also makes explicit where and when the FK constraint is created

If not, update_references could be called in a hook after the model files are executed, or inside commit, but in case of error it would be more obscure and harder to debug.
 
Also, if the user uses db.othertable as the reference type, it could fail because db.othertable may not exist yet
Are you sure? If update_references is called after any other model code, the table objects should be already created.


I think so; currently, inside define_table you can't reference using the db.tablename syntax if that table doesn't exist (even with lazy_tables=True, as far as I know).
Allowing not-yet-created lazy references could introduce hard-to-debug problems (a misspelled db.tablename or an undefined table would delay the exception, if it is raised at all), maybe worse than using the "tablename" string syntax as now.
Again, explicit is better than implicit ;-)
 
Compound keys are my main concern, as the implementation should detect and group multi-column references.
Currently this does not work with your patch since, as you said, it isn't implemented, but it was working in trunk for keyed tables.

The patch also stores other key types for deferred addition. Currently, I'm trying to figure out how to execute them correctly.


Great! 

Best regards,

Alan Etkin
Feb 7, 2014, 7:25:59 AM
to web2py-d...@googlegroups.com
 
I don't fully understand this.

Auto-incremented integer primary keys are not mandatory, and in fact surrogate keys are not always advisable, as they can cause a lot of secondary side effects (like making data export/import difficult)
 
Unless I'm missing something, that's how RDBMS tables are supposed to work with web2py. The book encourages the use of those keys, and changing it would imply a major incompatibility issue in the core:

http://www.web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#DAL--Table--Field
 
Deferrable constraints are another entire topic that would be useful for some specific and complicated use cases, but I'd prefer that they be discussed in another thread/patch in the future, after two-step table and constraint creation is implemented.

By deferrable I mean delayed constraint addition, i.e., two-step table declaration/migration, not setting DEFERRABLE in the backend.
 
"Readability counts" but "Explicit is better than implicit" ;-)

But "There should be one-- and preferably only one --obvious way to do it."

We have two: field.type="reference <table>" and field.type=table, but now we would have three or four: field.type="reference ...", field.type=table, field.reference="<table>" and field.reference=table.

To be clear, this is my proposed syntax:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer", reference="department"),
    Field("sopuse_emp_no", "integer", reference="employee"),
    )

or better:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer"),
    Field("sopuse_emp_no", "integer"),
    )

db.employee.dept_no.reference = db.department.dept_no
db.employee.sopuse_emp_no.reference = db.employee.emp_no

I don't think it is a good idea to modify the model definition syntax unless it is impossible to accomplish the same effect without adding it. However, I recognize that it would make the definition more accurate compared with the actual backend configuration, since it decouples references from the field type. The current way is simpler though, and follows web2py's style of simplifying the API.
 
Yes, it could affect readability in larger models (as constraints are decoupled from the table definition), but it makes the program logic cleaner and more explicit (and the constraints end up more cohesive at the end of the model definition, as in PostgreSQL and maybe other database backup dumps).

How about this instead? Have "reference ..." assume id (and set the integer backend type), and allow the alternative syntax for other types of constraints:

... Field("account", "string") ...
...
db.<table>.<field>.type="reference account"

The major drawback I see in this last form is that it will not be compatible with SQLite (as it doesn't support ALTER TABLE ADD CONSTRAINT, but a workaround in that case would be to rename the table, recreate it, and copy the records using INSERT)

Why not just support the db.table.field.reference syntax for RDBMSs excluding SQLite? So if the adapter is sqlite, raise a syntax error.
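A guard like the one suggested could be sketched as follows (the engine names and the helper function are hypothetical; a real check would live in the adapter):

```python
# Hypothetical sketch: refuse the reference-property syntax on engines
# that cannot run ALTER TABLE ... ADD CONSTRAINT (notably SQLite).
UNSUPPORTED_ENGINES = {"sqlite", "spatialite"}

def add_reference_sql(dbengine, table, field, rtable, rfield):
    if dbengine in UNSUPPORTED_ENGINES:
        raise SyntaxError(
            "%s does not support ALTER TABLE ADD CONSTRAINT" % dbengine)
    return ("ALTER TABLE %s ADD FOREIGN KEY (%s) REFERENCES %s (%s)"
            % (table, field, rtable, rfield))
```

For PostgreSQL this returns the statement; for sqlite it fails fast instead of silently leaving the schema without the constraint.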

Mariano Reingart
Feb 8, 2014, 10:38:17 AM
to web2py-d...@googlegroups.com
On Fri, Feb 7, 2014 at 9:25 AM, Alan Etkin <spam...@gmail.com> wrote:
 
I don't fully understand this.

Auto-incremented integer primary keys are not mandatory, and in fact surrogate keys are not always advisable, as they can cause a lot of secondary side effects (like making data export/import difficult)
 
Unless I'm missing something, that's how RDBMS tables are supposed to work with web2py. The book encourages the use of those keys, and changing it would imply a major incompatibility issue in the core:

http://www.web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#DAL--Table--Field

Yes, quoting the book:

Optionally you can define a field of type='id' and web2py will use this field as auto-increment id field. This is not recommended except when accessing legacy database tables. With some limitation, you can also use different primary keys and this is discussed in the section on "Legacy databases and keyed tables". 

Surrogate "id" keys are used extensively in other ORMs and frameworks too; it is a well-known convention, but it is not optimal from a strict database design perspective.

My point is that we should not make auto-incremented integer "id" keys mandatory, just as we should not make table definition order mandatory either, or we'll restrict web2py's ability to express more complex models.
 
Deferrable constraints is another entire topic that would be useful for some specific and complicated use cases, but I'll prefer if they could be discussed in other thread/patch in the future, after having two-step table and constraint creation implemented.

With deferrable I mean delayed constraint addition, i.e., two step table declaration/migration, not setting deferrable in the backend.

Ok
  
"Readability counts" but "Explicit is better than implicit" ;-)

But "There should be one-- and preferably only one --obvious way to do it."

"preferably" is the magic word there, "Although practicality beats purity." ;-)
BTW, web2py currently supports both forms, keyword arg (in define_table) and property-like (outside define_table), for most attributes.

 
We have two: field.type="reference <table>" and field.type=table, but now we would have three or four: field.type="reference ...", field.type=table, field.reference="<table>" and field.reference=table.

To be clear, this is my proposed syntax:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer", reference="department"),
    Field("sopuse_emp_no", "integer", reference="employee"),
    )

or better:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer"),
    Field("sopuse_emp_no", "integer"),
    )

db.employee.dept_no.reference = db.department.dept_no
db.employee.sopuse_emp_no.reference = db.employee.emp_no

I don't think it is a good idea to modify the model definition syntax unless it is impossible to accomplish the same effect without adding it. However, I recognize that it would make the definition more accurate compared with the actual backend configuration, since it decouples references from the field type. The current way is simpler though, and follows web2py's style of simplifying the API.
 

As Niphlod said previously, adding keyword parameters to DAL is not simplifying the API (and it could be error prone and requires a hook to execute the ALTER TABLE; see the rest of my previous mail).
I think discussing DAL syntax goes in the right direction and opens the path to reunifying id and keyed tables in the future.
 
Yes, it could affect readability in larger models (as constraints are decoupled from the table definition), but it makes the program logic cleaner and more explicit (and the constraints end up more cohesive at the end of the model definition, as in PostgreSQL and maybe other database backup dumps).

How about this instead? Have "reference ..." assume id (and set the integer backend type), and allow the alternative syntax for other types of constraints:

... Field("account", "string") ...
...
db.<table>.<field>.type="reference account"


If thinking of an alternative syntax, why not just add support for the reference property and correctly separate the FK from the field type?
If not, we're still mixing two concepts, and the .type redefinition looks even worse to me.
 
The major drawback I see in this last form is that it will not be compatible with SQLite (as it doesn't support ALTER TABLE ADD CONSTRAINT, but a workaround in that case would be to rename the table, recreate it, and copy the records using INSERT)

Why not just support the db.table.field.reference syntax for RDBMSs excluding SQLite? So if the adapter is sqlite, raise a syntax error.

Yes, this is a trade-off currently accepted in other libraries, like Alembic:

Don't break our necks over SQLite's inability to ALTER things. SQLite has almost no support for table or column alteration, and this is likely intentional. Alembic's design is kept simple by not contorting its core API around these limitations, understanding that SQLite is simply not intended to support schema changes. While Alembic's architecture can support SQLite's workarounds, ...


I'm not sure we should completely ignore workarounds for SQLite; it is used extensively by newcomers.
But certainly the use cases for separate constraint creation will be targeted at more capable databases like PostgreSQL.

Best regards
