--
-- mail from: Google Groups "web2py-developers" mailing list
post: web2py-d...@googlegroups.com
unsubscribe: web2py-develop...@googlegroups.com
details: http://groups.google.com/group/web2py-developers
the project: http://code.google.com/p/web2py/
official: http://www.web2py.com/
---
You received this message because you are subscribed to the Google Groups "web2py-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py-develop...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
AFAIK, "cross-linked tables" and even self-referencing ones are not bad design; in fact, they are sometimes necessary. If not, please provide a link to or citation of a well-known database design book; I don't recall any.
--
> AFAIK, "cross-linked tables" and even self-referencing ones are not bad design; in fact, they are sometimes necessary. If not, please provide a link to or citation of a well-known database design book; I don't recall any.
I'm afraid I don't have those bibliographic references, just some dude from Frisco complaining about cross-linked tables back in early 2001 (on the postgres mailing list).
> ACM Computing Surveys, 1986
Well, that's sort of an incunable of database design, ain't it?
> Alan: remember the issues with the ERP app: https://code.google.com/p/erplibre/source/browse/modules/db_erplibre.py
Right, and IIRC there was a way of disabling the constraint error by sending a triggers command with executesql (for pg; not sure if it is possible with other backends).
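A minimal sketch of that idea, assuming a PostgreSQL backend: `session_replication_role` is a real PostgreSQL setting that makes the session skip FK enforcement triggers; the helper name and the `db` object's `executesql` usage are just how it might look with DAL, not existing web2py API:

```python
# Sketch: skip FK trigger checks while bulk-loading out-of-order data
# (PostgreSQL only; changing session_replication_role typically needs
# elevated privileges).
def load_without_fk_checks(db, loader):
    db.executesql("SET session_replication_role = replica;")
    try:
        loader(db)  # e.g. import csv rows whose referenced rows don't exist yet
    finally:
        db.executesql("SET session_replication_role = DEFAULT;")
```

Note that this only silences enforcement during the load; the data must still be consistent by the time normal checks resume.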
This would complicate migrations, no? For the particular case of auto_import, I suppose it would be enough to temporarily store the table references, create integer fields, execute the db commands and then add the constraints. Perhaps it is possible to have a common method implementing the two-step table creation for both cases (reading the model files and the .table files).

> So my humble +1 to table creation in two steps
There are subtleties in this matter that are easily forgotten, but I'll try to summarize here:
- auto_import was not very well thought out. .table files just hold the table name and field definitions: no migrate, rname, sequence_name, common_filter, format, singular, plural, etc. This was discussed at length when the "idea" of having an API to create indexes conflicted with what .table files hold now: basically, just Fields without the attributes set on a Table (which in the future "indexes API" should be "pinned" to the Table). auto_import "shines" in the lack of such data vs executing the "usual" model
- Niphlod's statement regarding out-of-order definitions:
- table1.table2_id defined before table2.id makes no sense, and never has
- table1.parent_id self-referencing table1.id makes sense; it's doable in DAL, albeit with limited support ATM when parent_id or id are "rnamed"
- web2py's ability to create department.manager_id and manager.department_id is allowed, but is reachable (and always has been) through AT LEAST 2 steps:
1 - a. create the manager table, b. create the department table with a reference to manager
2 - c. alter the manager table, adding a reference to department
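In raw DDL the two-step route looks roughly like this. A sketch only: the column types and names are illustrative, not what DAL actually emits:

```python
# Steps 1a/1b: both CREATEs, with only the already-satisfiable FK inline.
step_1 = [
    "CREATE TABLE manager (id SERIAL PRIMARY KEY, name VARCHAR(512));",
    "CREATE TABLE department (id SERIAL PRIMARY KEY, "
    "manager_id INTEGER REFERENCES manager (id));",
]
# Step 2c: only now can manager point back at department.
step_2 = ("ALTER TABLE manager ADD COLUMN department_id INTEGER "
          "REFERENCES department (id);")
```

The key point is that the second FK cannot exist at CREATE time; some statement has to run after both tables exist.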
A "major issue" with complicated models is (and always has been) that the user reaches them through several rounds of changes, which creates a very well defined "route" of CREATE and ALTER DDLs that "reached" the required datamodel.
When faced with an empty database, it's really hard (even without strolling in the path of dependency-walking all the out-of-order definitions) to see what's the real "route" to take to recreate the full datamodel.
If constraints are added after table creation, it will work for most cases that allow nulls. The NOT NULL issue could be fully solved with proper data migration (i.e. a callback for loading initial data from a csv before applying the constraints, or an API to add constraints in a second step). If this has a chance of being considered ("alter table add constraint" and the API), I could work on these enhancements (not urgently, sorry).
> If constraints are added after table creation, it will work for most cases that allow nulls. The NOT NULL issue could be fully solved with proper data migration (i.e. a callback for loading initial data from a csv before applying the constraints, or an API to add constraints in a second step).
<rant mode=dba>
@mariano: sqlite doesn't enforce any FK constraints on DDLs... you picked the only relational db that allows you to create such a thing in that order
</rant>
I may have switched employee with manager, but it's the exact same situation. My point was that if you need two tables referencing each other, you can reach that status, both in web2py and in the relational backend, only in two steps.
web2py then works fine until it is faced with a clean db (or with auto_import).
So, back to basics.... T-SQL is hard, ugly and not a programming language, but expects things to be in order.
DAL doesn't do a thing about order, so if you want those features, tables should be defined in order.
To me, allowing the creation of a table that **could** not have a real FK as defined just doesn't sound right.
For db.define_table('hello', Field('world_id', 'reference world')) not to error out, we'd need to: defer raising the exception, then take care to drop that table if a db.define_table('world', Field('name')) doesn't appear in the model.
So, I'd say we **should** enforce the reordering of the models, if only for "better readability" and less "error-proneness": in this case, db.define_table('hello', Field('world_id')) raises an exception, creation is rolled back, and everybody is safe.
Nobody said that features should be dropped, but please do remember that DAL is in this situation from its creation.
Strictly speaking, ATM there's nothing DAL can do, because the model triggers DDLs in a different way than during the original creation of the app (that is, two CREATE DDLs instead of two CREATEs and one ALTER).
We can do dependency-walking and/or postponing FK creation, but I don't think it will be an easy task (although it will be definitely nice): there are still lots of dark corners to look into (mainly because in web2py there's no such thing as a signal that states "this is the point when the model is completely defined").
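For what it's worth, the core of the dependency-walking can be sketched quite compactly. This is a toy, hypothetical helper (not DAL code): topologically sort tables by their references, and defer any FK that would close a cycle to a later ALTER. Table names below echo the three-table example model discussed in this thread:

```python
def creation_order(refs):
    """refs: {table: set of referenced tables}.
    Returns (creation order, FK edges that must become ALTER TABLEs)."""
    # self-references can stay inline, so drop them from the graph
    pending = dict((t, set(r) - set([t])) for t, r in refs.items())
    order, deferred, done = [], [], set()
    while pending:
        ready = sorted(t for t in pending if pending[t] <= done)
        if not ready:
            # cycle: pick one table, defer its unsatisfied FKs to ALTERs
            t = sorted(pending)[0]
            deferred.extend((t, r) for r in sorted(pending[t] - done))
            ready = [t]
        for t in ready:
            order.append(t)
            done.add(t)
            del pending[t]
    return order, deferred
```

The hard part in web2py isn't this walk; as noted above, it's that there is no signal marking the point where the model is completely defined, so there is no obvious place to run it.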
# CREATE TABLEs:
db.define_table('hello', Field('world_id', 'integer'))
db.define_table('world', Field('hello_id', 'integer'))
# populate the table hello
# populate the table world
# ADD FK constraints:
db.hello.world_id.reference = db.world.id
db.world.hello_id.reference = db.hello.id
db = DAL(... lazy_references=True)
> # CREATE TABLEs:
> db.define_table('hello', Field('world_id', 'integer'))
> db.define_table('world', Field('hello_id', 'integer'))
> # populate the table hello
> # populate the table world
> # ADD FK constraints:
> db.hello.world_id.reference = db.world.id
> db.world.hello_id.reference = db.hello.id
I'd prefer an alternative syntax like this instead:
db = DAL(... lazy_references=True) # store tmp fk by table.field
...
db.apply_references() # or maybe reuse db.commit()
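A toy model of what such a collector might look like. The names `_pending_fk` and `apply_references` are the ones floated in this thread; everything else (the class, method signatures, constraint-name scheme) is made up for illustration:

```python
class PendingFKDAL(object):
    """Sketch of lazy_references: record FKs, emit the ALTERs in one pass."""
    def __init__(self):
        self._pending_fk = []   # (table, field, reftable, reffield)
        self.executed = []      # DDL that would go to executesql/sql.log
    def defer_reference(self, table, field, reftable, reffield="id"):
        self._pending_fk.append((table, field, reftable, reffield))
    def apply_references(self):
        for t, f, rt, rf in self._pending_fk:
            self.executed.append(
                "ALTER TABLE %s ADD CONSTRAINT fk_%s_%s "
                "FOREIGN KEY (%s) REFERENCES %s (%s);" % (t, t, f, f, rt, rf))
        del self._pending_fk[:]
```

Reusing db.commit() for this would be risky, since commit can legitimately happen many times before the model is complete; a dedicated call (or an automatic hook) seems safer.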
It would be also useful to provide a backend-wide common way of disabling and enabling fk triggers like
db.ignore_references = <bool>
I think that those enhancements would make data migration much simpler.
db = DAL(... lazy_references=True) should be the default and should be automatic.
Could anyone review this patch?
db = DAL('postgres://...', migrate=True, lazy_references=True, pool_size=1)
...
db.define_table("mythirdtable", Field("somethirdtext"), Field("mythirdreference", "reference mytable"), format="%(id)s")
db.define_table("mytable", Field("sometext"), Field("myreference", "reference myothertable"), format="%(id)s")
db.define_table("myothertable", Field("othertext"), Field("myotherreference", "reference mythirdtable"), format="%(id)s")
db.update_references()
Could anyone review this patch?
New patch; for now it only works with postgresql 9.1 (only for integer fk references):
https://github.com/spametki/web2py/tree/lazy_references
- Supports any linked-table order via deferred constraints
- Allows switching between the old behavior (the default) and reference laziness
- Implements DAL.update_references (to be called automatically at the end of model execution)
Issues:
- Cannot be set by default since it supports only postgres
- Should be ignored if the adapter is sqlite or nosql
- Does not implement multiple-column constraints
- Reference default validation requires setting format="" at table definition (I believe this is the default behavior)
lazy_references cannot be True by default currently, since it will only work in the tested environment. It should also work with other rdbms with some changes (possible alter table syntax issues).
It was tested with the following model:
db = DAL('postgres://...', migrate=True, lazy_references=True, pool_size=1)
...
db.define_table("mythirdtable", Field("somethirdtext"), Field("mythirdreference", "reference mytable"), format="%(id)s")
db.define_table("mytable", Field("sometext"), Field("myreference", "reference myothertable"), format="%(id)s")
db.define_table("myothertable", Field("othertext"), Field("myotherreference", "reference mythirdtable"), format="%(id)s")
db.update_references()
I tested and it worked ok, but sql.log didn't show the ALTER TABLE ADD CONSTRAINT statements. (I did a quick review; I'd like to review it more in depth.)
I don't like the assumption about the 'integer' type; that's why I proposed a new field attribute to define references.
This could be achievable if "reference xx" is avoided and the real field type is stated.
Then, setting the field reference property could call create_reference, so there would be no need for the user to call db.update_references() (db._pending_fk could be avoided entirely if keyed tables are not used).
BTW, the final call to update_references is error prone; if a novice user forgets it, the database could end up inconsistent.
Also, if the user uses db.othertable as the reference type, it could fail, as db.othertable might not exist yet.
Compound keys are my main concern, as it should detect and group multi-column references. Currently that does not work with your patch, as you said it isn't implemented, but it was working in trunk for keyed tables.
> I tested and it worked ok, but sql.log didn't show the ALTER TABLE ADD CONSTRAINT statements. (I did a quick review; I'd like to review it more in depth.)
TODO (there are missing adapter.log calls in update_references); there are lots of unsolved issues (check the DAL.update_references docstring).
> I don't like the assumption about the 'integer' type; that's why I proposed a new field attribute to define references.
This should be achieved by adding an adapter type like "deferred tfk/fk" or similar. For the case of auto-incremented integer id fields, having the field type defined beforehand would allow migrating those values from another database (a very common use case) before adding the constraints.
> This could be achievable if "reference xx" is avoided and the real field type is stated.
Changing "reference ..." to a plain field type would make reading the model more difficult.
> Then, setting the field reference property could call create_reference, so there would be no need for the user to call db.update_references() (db._pending_fk could be avoided entirely if keyed tables are not used).
But that is a less user-friendly api:
db.define_table(...
db.define_table(...
# and for each reference
db.tablename.fieldname.reference = ...
I thought it would be better to have a function that collects any constraints defined and creates them in a loop. I think that ideally DAL should support both syntaxes. There would need to be some magic in the reference attribute so that assigning it triggers the constraint addition.
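The "magic" reference attribute could be a plain property setter. A hypothetical sketch, not DAL code: the class, the "table.field" string convention and the constraint-name scheme are all assumptions for illustration:

```python
class LazyRefField(object):
    """Toy 'magic' reference attribute: assigning field.reference
    immediately produces the ALTER TABLE ADD CONSTRAINT DDL."""
    def __init__(self, table, name):
        self._table = table
        self._name = name
        self._ref = None
        self.ddl = None
    @property
    def reference(self):
        return self._ref
    @reference.setter
    def reference(self, target):
        # target is a "table.field" string in this sketch
        self._ref = target
        rtable, rfield = target.split(".")
        self.ddl = (
            "ALTER TABLE %s ADD CONSTRAINT %s_%s_fk "
            "FOREIGN KEY (%s) REFERENCES %s (%s);"
            % (self._table, self._table, self._name,
               self._name, rtable, rfield))
```

With a setter like this, each assignment either executes its ALTER immediately or queues it, and no final collect-and-apply call is needed.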
> BTW, the final call to update_references is error prone; if a novice user forgets it, the database could end up inconsistent.
I just exposed the method because I don't know where it would be called automatically (supposedly before the model files are executed)
> Also, if the user uses db.othertable as the reference type, it could fail, as db.othertable might not exist yet.
Are you sure? If update_references is called after any other model code, the table objects should be already created.
> Compound keys are my main concern, as it should detect and group multi-column references. Currently that does not work with your patch, as you said it isn't implemented, but it was working in trunk for keyed tables.
The patch also stores other key types for deferred addition. Currently I'm trying to figure out how to execute them correctly.
I don't fully understand this. Integer auto-incremented primary keys are not mandatory and, in fact, surrogate keys are not always recommendable, as they can cause a lot of secondary side effects (like making data export/import difficult).
Deferrable constraints are another entire topic that would be useful for some specific and complicated use cases, but I'd prefer that they be discussed in another thread/patch in the future, after two-step table and constraint creation has been implemented.
"Readability counts" but "Explicit is better than implicit" ;-)
To be clear, this is my proposed syntax:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer", reference="department"),
    Field("spouse_emp_no", "integer", reference="employee"),
)

or better:

db.define_table("employee",
    Field("emp_no", "id"),
    Field("emp_name", "string"),
    Field("job_title", "string"),
    Field("dept_no", "integer"),
    Field("spouse_emp_no", "integer"),
)
db.employee.dept_no.reference = db.department.dept_no
db.employee.spouse_emp_no.reference = db.employee.emp_no
Yes, it could affect readability in larger models (as constraints are decoupled from the table definition), but it makes the program logic cleaner and more explicit (and the constraints end up more cohesive at the end of the model definition, much as in postgresql and maybe other databases' backup dumps).
The major drawback I see in this last form is that it will not be compatible with sqlite (as it doesn't support ALTER TABLE ADD CONSTRAINT), but a workaround in that case would be to rename the table, create it again and copy the records using insert.
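The sqlite workaround mentioned above, sketched as the sequence of statements it would take. This is a hypothetical helper (web2py provides nothing like it); in practice the copy should also run with PRAGMA foreign_keys temporarily off:

```python
def sqlite_add_fk_workaround(table, create_sql_with_fk):
    """SQLite has no ALTER TABLE ADD CONSTRAINT, so rebuild the table:
    rename it aside, re-create it with the FK inline, copy the rows back,
    then drop the old copy."""
    return [
        "ALTER TABLE %s RENAME TO %s_old;" % (table, table),
        create_sql_with_fk,  # full CREATE TABLE with REFERENCES clauses
        "INSERT INTO %s SELECT * FROM %s_old;" % (table, table),
        "DROP TABLE %s_old;" % table,
    ]
```

This is also the table-rebuild procedure SQLite's own documentation recommends for unsupported ALTERs, so it is well-trodden ground, just verbose.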
> I don't fully understand this. Integer auto-incremented primary keys are not mandatory and, in fact, surrogate keys are not always recommendable, as they can cause a lot of secondary side effects (like making data export/import difficult).
Unless I'm missing something, that's how rdbms tables are supposed to work with web2py. The book encourages the use of those keys, and changing that would imply a major compatibility break in the core:
http://www.web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#DAL--Table--Field
Optionally you can define a field of type='id' and web2py will use this field as auto-increment id field. This is not recommended except when accessing legacy database tables. With some limitation, you can also use different primary keys and this is discussed in the section on "Legacy databases and keyed tables".
> Deferrable constraints are another entire topic that would be useful for some specific and complicated use cases, but I'd prefer that they be discussed in another thread/patch in the future, after two-step table and constraint creation has been implemented.
With deferrable I mean delayed constraint addition, i.e. two-step table declaration/migration, not setting DEFERRABLE in the backend.
> "Readability counts" but "Explicit is better than implicit" ;-)
But "There should be one-- and preferably only one --obvious way to do it."
We have two: field.type="reference <table>" and field.type=table; but now we would have three or four: field.type="reference ...", field.type=table, field.reference="<table>" and field.reference=table.
> To be clear, this is my proposed syntax:
>
> db.define_table("employee",
>     Field("emp_no", "id"),
>     Field("emp_name", "string"),
>     Field("job_title", "string"),
>     Field("dept_no", "integer", reference="department"),
>     Field("spouse_emp_no", "integer", reference="employee"),
> )
>
> or better:
>
> db.define_table("employee",
>     Field("emp_no", "id"),
>     Field("emp_name", "string"),
>     Field("job_title", "string"),
>     Field("dept_no", "integer"),
>     Field("spouse_emp_no", "integer"),
> )
> db.employee.dept_no.reference = db.department.dept_no
> db.employee.spouse_emp_no.reference = db.employee.emp_no
I don't think it is a good idea to modify the model definition syntax unless it is impossible to accomplish the same effect without adding to it. However, I recognize that it would make the definition more accurate with respect to the actual backend configuration, since it decouples references from the field type. The current way is simpler though, and follows the web2py style of simplifying the api.
> Yes, it could affect readability in larger models (as constraints are decoupled from the table definition), but it makes the program logic cleaner and more explicit (and the constraints end up more cohesive at the end of the model definition, much as in postgresql and maybe other databases' backup dumps).
How about this instead? For "reference ..." definitions, assume id (and set the integer backend type), and allow the alternative syntax for other types of constraints:
... Field("account", "string") ...
...
db.<table>.<field>.type="reference account"
> The major drawback I see in this last form is that it will not be compatible with sqlite (as it doesn't support ALTER TABLE ADD CONSTRAINT), but a workaround in that case would be to rename the table, create it again and copy the records using insert.
Why not just support the db.table.field.reference syntax for rdbms excluding sqlite? So if adapter is sqlite, raise a syntax error.
Let's not break our necks over SQLite's inability to ALTER things. SQLite has almost no support for table or column alteration, and this is likely intentional. Alembic's design is kept simple by not contorting its core API around these limitations, understanding that SQLite is simply not intended to support schema changes. While Alembic's architecture can support SQLite's workarounds, ...