ORM (?) : A Revisit, NOT a Rebuttal


Arnon Marcus

Apr 27, 2013, 9:18:45 AM4/27/13
to web...@googlegroups.com
I am in the process of researching ways to improve the structure of my web2py-app's code, simplifying usage of certain areas, and enabling RPC-like interface for external programs.

I have been using web2py for over 3 years now, and I love every aspect of it - especially the DAL (!)

However, as the code grew larger, and as hierarchical domain-model-patterns started to emerge, I started to look for alternative ways of accessing and using the portion of the data-model that is strictly hierarchical in nature.

It is a hugely controversial issue with relational data-models that contain hierarchies. I don't intend to open a large discussion about this here. Suffice it to say that even the most die-hard SQL lover would admit its shortcomings when hierarchies are introduced into the data-model. It is an unsolved (probably "unsolvable") problem in data-model theory.

So it is not a matter of looking for the "best" solution, because no such concept can exist - even in theory.

It is a matter of looking for the "most-fitting" set of trade-offs for the problem at hand.

That said, some projects are large and/or varied enough, that they DO include both highly-relational areas, as well as highly-hierarchical areas - within the same data-model (!) 

For such use-cases, a more flexible/hybrid approach is beneficial.
You don't expect to have to choose between relational-models and hierarchical-models - you expect your framework to include and facilitate support for both approaches for the same database.
You would use the relational features of the framework where they are most suited, and the hierarchical features where THEY make better sense.
Ideally, your framework would be built in an integrated-yet-layered design that would make it easy for you to combine both approaches in a synergetic manner.


My research has led me through ZODB and SQLAlchemy, just to get a feel for what an ORM could provide. Aside from reading a lot and watching a lot of lectures about these technologies, as well as general opinions about them, I have also taken the time to really go through tons of threads in this group about these issues, as well as the web2py documentation.

Bottom-line, my current feeling about this issue is that there is still something missing in web2py to facilitate the construction of higher levels of abstraction that are more focused on business-logic than on database-schema. I also feel that there are "dogmatic" sentiments being thrown from both sides of the fence in this flame-fest fiasco.
I think this hurts us - a lot.

I think a more constructive approach would be to acknowledge that there are different use-cases that can benefit from different approaches, and that this leads to opposing opinions regarding the particular trade-offs being sought after.

I think that web2py has taken an approach that is still too narrow-minded when it comes to supporting multiple-approaches, and that a layered-design could be beneficial here.

Case in point, the philosophy and design of SQLAlchemy:

Now, just to be clear, I think that the web2py-DAL's API is much cleaner, simpler, and easier and more fun to use than SQA's API, at least compared to the SQL-Expression layer. But I also think that SQA's is a more flexible approach - it can target a more varied set of use-cases.
Contrary to most of what I've read about SQA in this group, its ORM is NOT mandatory, nor is it a "necessarily" more-restrictive/less-performant way of using the database. I think most criticisms of it I've seen here are ill-informed, and have a somewhat "prima-facie" smell to them. They mainly attack the ORM concept in its Active-Record form, which is NOT what SQA has. They also don't consider the layered architecture of SQA, and compare the DAL with different implementations of ORMs that ARE more restrictive and obtuse, "assuming" that this is probably what SQA does, when in fact it is not. They compare the DAL to SQA's ORM, which is a fundamental mistake (!) The DAL is not comparable to SQA's ORM layer, but to its SQL-Expression layer (!) These APIs are almost identical in what they do - only the interface is different. So I could very well refer to SQA's SQL-Expression layer as SQA's DAL.
SQA's ORM is something different that, to my knowledge, web2py is lacking. It is a layer "on top" of its DAL, "using" its DAL, as well as lower layers, in an integrated fashion.
In short, SQA is NOT a single-layer ORM (!)

Now, what I think is missing is something comparable to SQLA's ORM in web2py - another layer on top of its DAL. Not a set of classes, but a single abstract class meant to be inherited from. There are many operations that can be automated when constructing an ORM layer on top of web2py's DAL, using the DAL's already-existing schema information. These could benefit the construction of ORM classes using the DAL.
Examples for such benefits could be seen in the second part of this talk:
* This is an almost 3-hour talk from PyCon 2013, covering most aspects of SQA that even the most experienced users of it might be unfamiliar with - it is fast-paced and highly condensed, and well-worth the time to sit through.

These automations, I predict, would emerge as recurring patterns for people trying to implement such a thing in web2py. What I am looking for is an integration abstract-layer with tools (in the form of abstract methods) that would facilitate the design and implementation of ORM classes using the DAL.

I don't have a clear conception of what such a thing would look like; I just have a general idea that it would be most beneficial in many use-cases for which the DAL by itself is insufficient.

I feel that the fear of having such a layer "restrict" future development by locking down the business-data-model is unjustified. Refactoring can still occur later on, and should require minimal effort from the developer using web2py - the framework should support and facilitate these kinds of refactoring scenarios, with appropriate auxiliary methods in the form of adapter/proxy design-patterns. If the ORM abstract-class's design is built up-front using such patterns, then this would facilitate refactoring, and thus avoid scaring off developers from using it.
It should also facilitate performance optimizations, by transactionally aggregating queries and updates/inserts automatically.
Optionally, this abstract class would be meant to be multiply-inherited-from, using the mix-in pattern. This way, a developer can have a separate class hierarchy dealing with pure business logic, to keep separation-of-concerns between database usage and domain logic, and also to enable re-use of code for other DALs if needed (e.g., in a python client that talks to web2py via RPC).
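To make the mix-in idea concrete, here is a toy sketch - every name is invented, and a plain dict stands in for the DAL's storage; a real version would delegate to db.define_table / insert / update_record:

```python
# Hypothetical sketch (invented names): persistence concerns live in a
# mixin driven by schema metadata, domain logic lives in its own class.

class PersistedMixin:
    """Maps attributes to a table's fields; storage is a toy dict here."""
    _table = None   # subclasses point this at their schema metadata
    _store = None   # toy in-memory store, keyed by id

    def __init__(self, **fields):
        for name in self._table['fields']:
            setattr(self, name, fields.get(name))
        self.id = None

    def save(self):
        cls = type(self)
        if self.id is None:
            self.id = len(cls._store) + 1
        cls._store[self.id] = {f: getattr(self, f) for f in self._table['fields']}
        return self.id

    @classmethod
    def find(cls, rec_id):
        row = cls._store.get(rec_id)
        if row is None:
            return None
        obj = cls(**row)
        obj.id = rec_id
        return obj


class TaskLogic:
    """Pure domain logic: knows nothing about storage."""
    def is_done(self):
        return self.status == 'done'


class Task(TaskLogic, PersistedMixin):   # the mix-in combination
    _table = {'fields': ['title', 'status']}
    _store = {}


task = Task(title='write spec', status='done')
task_id = task.save()
assert Task.find(task_id).is_done()
```

The point of the split is that TaskLogic could be reused unchanged by a python client talking to web2py over RPC, with a different mixin behind it.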

These are all just ideas, and perhaps I am missing some things - maybe such capabilities already exist in some other form that I am unaware of.
I would appreciate any references to such uses, if they already exist.

(e.g. - I am not too clear about lazy-tables and computed-fields, but I think they are an interesting start in this direction - I just haven't used them yet, as we are still using an older version of web2py that does not include them)

Niphlod

Apr 27, 2013, 11:28:26 AM4/27/13
to web...@googlegroups.com
Whoa, that's definitely a lot to digest, but frankly... again with no practical use-cases to digest, too ^_^...

From the "technical" standpoint, I love (being more a SQL guy than a python one) avoiding having to define a class when I know that what I want is a table.
That's more or less the (well specified) difference between an "expression layer" and an "object mapper".

That comes probably because I already know how to design the tables beforehand, having planned the requirements. Elaborating on this, I see a lot of users who are probably "less expert" on the db side who entangle themselves in a really hard "model mapping" and have, from my POV, crazy "query requirements" to fetch some rowset they need. I hardly ever fall into that situation, so I guess my understanding and "expertise" help me plan a good "environment" to play with DAL beforehand.


That being said, doing a db.define_table() and a class Whatever(Model) aren't that different in terms of what you gain access to...
db.table has 'fields', and every field has its set of attributes (default, requires, update, type, etc.).
For references, you already know that every single column of a row holds a pointer to the referenced row...
As for "mixins", web2py has table inheritance...
I guess all that comes up in terms of differences is merely technicalities... as far as my projects went, I rarely had the need to define "real python classes" to interact with "my mental model"...

Doing Record.find(1) is not that different from db.record(1) as far as I'm concerned, and I have a lot more control over what a hypothetical Record.save() does with record.update_record() as far as db queries are concerned (ok, I know that an "expression layer", having one layer less, helps retain control over what is happening between your models and your db, but that's why I like it :-) ).
From the moment callbacks were put in trunk (and I will be thankful until I die for those :-P) I couldn't find anything that can be done with an ORM that is not easily replicable with DAL.
From DAL to SQLA there are of course many differences, but DAL tries to expose 95% of the features with a homogeneous API. SQLA is more "flexible", but every adapter has its set of features that builds closely on the backend's available features.
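As an illustration of the callback chains mentioned above, here is a stripped-down imitation - the `_before_update`/`_after_update` names mirror the DAL's, but the "table" is just a dict of rows, purely illustrative:

```python
# Toy imitation of DAL-style callback chains (not web2py's actual code):
# hooks appended to _before_update can veto, _after_update hooks observe.

class MiniTable:
    def __init__(self):
        self.rows = {}
        self._before_update = []   # callbacks(row_id, fields); truthy return vetoes
        self._after_update = []    # callbacks(row_id, fields)

    def update(self, row_id, **fields):
        for cb in self._before_update:
            if cb(row_id, fields):          # a truthy return aborts the update
                return False
        self.rows[row_id].update(fields)
        for cb in self._after_update:
            cb(row_id, fields)
        return True


audit = []
orders = MiniTable()
orders.rows[1] = {'status': 'new'}
orders._after_update.append(lambda rid, f: audit.append((rid, dict(f))))

orders.update(1, status='shipped')
assert orders.rows[1]['status'] == 'shipped'
assert audit == [(1, {'status': 'shipped'})]
```

The audit hook is the kind of thing ORM "events" buy you, and it hangs off the table definition rather than off every call site.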

The only thing I miss from SQLA is the wonderful "session". I guess in your post it is described in these lines:


It should also facilitate performance optimizations, by transactionally aggregating queries and updates/inserts automatically.

but dude, believe me, it's very, very, very hard to support "python-side" nested transactions, each with its own commit/rollback. That's why you find that bit of functionality in only 2 libraries in the whole python world: SQLA and Django's ORM (to be fair, Django's covers far fewer cases).
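To illustrate why this is hard, here is a toy savepoint stack over an in-memory dict - nothing like SQLA's real machinery, just the shape of the problem: each nesting level needs its own rollback point:

```python
# Toy sketch of nested python-side transactions: one snapshot per
# nesting level, rollback restores only the innermost open level.
import copy

class ToyStore:
    def __init__(self):
        self.data = {}
        self._savepoints = []

    def begin(self):
        # snapshot the current state for this nesting level
        self._savepoints.append(copy.deepcopy(self.data))

    def commit(self):
        self._savepoints.pop()               # keep changes, drop the savepoint

    def rollback(self):
        self.data = self._savepoints.pop()   # restore this level's snapshot


store = ToyStore()
store.begin()                # outer "transaction"
store.data['a'] = 1
store.begin()                # nested level
store.data['a'] = 2
store.rollback()             # undoes only the inner level
assert store.data['a'] == 1
store.commit()               # outer level keeps its changes
assert store.data == {'a': 1}
```

Real databases do this with SAVEPOINTs; doing it python-side means the library has to track every pending object mutation per level, which is exactly the hard part.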

What do you miss, specifically, from the DAL standpoint?


Massimo Di Pierro

Apr 27, 2013, 11:42:31 AM4/27/13
to web...@googlegroups.com
Without addressing all of the issues I would like to make two comments:

1) In the past we have created an ORM structure on top of DAL to allow use of Django models and SQLAlchemy models with DAL. The code is here:
Nobody paid attention.

2) I prefer the DAL syntax because I often program in this way:

MODELS = [
    ('office', [
        {'name': 'email_address', 'label': 'Office E-Mail Address'},
        {'name': 'office_location', 'label': 'Office Location'},
        {'name': 'office_phone', 'label': 'Office Phone'},
        {'name': 'fax_number', 'label': 'Office Fax Number'}]),
]

def autofield(f):
    ftype = f.get('type', 'string')
    requires = None
    if f.get('required'):
        requires = IS_NOT_EMPTY()
    represent = None
    if ftype == 'text':
        represent = lambda v, r=None: MARKMIN(v)
    return Field(f['name'], ftype, requires=requires,
                 represent=represent, label=f['label'])

for tablename, fields in MODELS:
    fields = [autofield(f) for f in fields] + [auth.signature]
    db.define_table(tablename, *fields)

This allows customization of the app by editing the MODELS table instead of changing the code. ORMs make the code very inflexible in this respect. They are also slower.

Arnon Marcus

Apr 27, 2013, 2:49:16 PM4/27/13
to

What do you miss, specifically, from the DAL standpoint?

Well, frankly, as I said - I LOVE THE DAL (!) :)
I don't "miss" anything from it; in fact, I prefer it to SQLA's SQL-Expression layer - as I said.
But that's not the issue at hand. It's not what I am missing from the DAL - it's what I am missing from web2py. I don't view an ORM as something that the DAL is missing. I think of it as a bonus layer on top of it that web2py is missing. The reason is that I don't expect the DAL to be something that it is not.

As I said, I use the DAL extensively. I would continue to do so even if/when web2py had an ORM layer. I do not intend to use an ORM layer for everything. That would be poor design on my part.

Here is a REALLY great talk, describing the SHORTCOMINGS of some common forms of ORM implementations:

As you will see, the problems are not in the ORM concept as a "philosophy".
The problems arise from inadequate "implementations" of ORMs that "hide" and "abstract" TOO MUCH, and are not built in a layered fashion, so they do not provide the necessary flexibility to define schemas and other aspects of the data-model "manually", to override/define defaults for structural/behavioral patterns, or to configure save/load strategies. These are crucial characteristics of a "truly" useful ORM.
AFAIK, SQLA might be the only one that gets this right, all the way up to the ORM level - at least in python-land.


As for MY use-case, and hence my reason for wanting an ORM layer:
I am architecting a collaborative system that integrates financial-management, project-management, process-management and content-management - all under a single roof, using a single database. It's an ambitious software project, which was quite novel in its scope when it first started.

Why would I want an ORM? Well, it's not that I NEED it - it can very well stay ORM-less. However: once we reached a certain volume of code, and a certain level of complexity in the data-model, it started to become very difficult to maintain and reason about. We have hundreds of tables and many thousands of lines of code, and it lacks structure to some degree. Yes, MVC does help - a lot - but it's not enough. The MVC part is actually pretty straightforward. The problem is the spaghetti that emerged in our custom modules, which our controllers use. You might argue for poor software-design on our part, but I believe it would be more than a little disingenuous to deny the implications of not having an ORM in the models...

Additionally, there are performance issues that a well-designed ORM can solve - though that may have more to do with the implementation of the Unit-Of-Work in SQLA than with an ORM as such. However, I still believe that using an ORM model can help a lot in defining clusters of operations that would then be ordered and aggregated into a transaction. It may provide orders of magnitude better performance in some cases, and may be the only way out of the N+1 problem.

For example, in one of the threads I read, Massimo gave an example of how an ORM can be emulated using the DAL. This example included a db-query inside a nested for-loop...
Now, we recently did a massive profiling of our application, and guess where we found the bottlenecks...
This leads me to the subject of hierarchies: Sometimes, there is no way to model a query that scans a tree of records, without resorting to this kind of awful approach... This is a real problem.
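For the record, this is what that anti-pattern looks like in miniature, next to the single-query alternative - toy data stands in for a DAL select over an adjacency-list table:

```python
# Miniature of the N+1 problem: one "query" per node versus fetching the
# whole adjacency list once and assembling the tree in Python.
# ROWS stands in for a table of (id, parent_id, name).

ROWS = [
    (1, None, 'root'),
    (2, 1, 'child-a'),
    (3, 1, 'child-b'),
    (4, 2, 'grandchild'),
]
query_count = 0

def query_children(parent_id):
    global query_count
    query_count += 1               # each call simulates one db round-trip
    return [r for r in ROWS if r[1] == parent_id]

def walk_naive(parent_id):
    # a query inside the loop, at every level of the recursion
    return {name: walk_naive(rid) for rid, _, name in query_children(parent_id)}

def walk_single_query():
    global query_count
    query_count += 1               # one round-trip for the whole tree
    by_parent = {}
    for rid, pid, name in ROWS:
        by_parent.setdefault(pid, []).append((rid, name))
    def build(pid):
        return {name: build(rid) for rid, name in by_parent.get(pid, [])}
    return build(None)

query_count = 0
naive = walk_naive(None)           # 5 queries for a 4-node tree
naive_queries = query_count

query_count = 0
single = walk_single_query()       # 1 query, same tree
assert naive == single
assert naive_queries == 5 and query_count == 1
```

The naive version issues one query per node, so the round-trip count grows with the tree; the regrouping version pays one query plus cheap in-memory work.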

Now, I am not sure whether SQLA can solve this, and if so, how. But I have a feeling that using an ORM to model a record-tree, and using a transactional aggregation mechanism that introspects this ORM and infers a better aggregate for a transaction, might have huge performance benefits. I saw some examples in the long lecture from PyCon 2013, but I'm not sure it modeled a hierarchy...

Anyways, I can elaborate more on my use-case if you think I should, but I think you get the idea...

Basically, we are talking about a tree of "production-items" that relates to a table of budget-items; each node in the item hierarchy also has its own tree of "tasks", and each task has its own tree of "comments" for collaborative messaging. Additionally, each task may have a tree of file-records representing a CMS of versions of files that are used in the process of executing that task. Then there's a status-matrix representing the task-status pipeline process...
I really don't know how far I should go with this description, but you may get the point - it's a huge tree-of-trees... The most horrible data-model for a relational database, and one that can benefit a lot from an ORM.
I was thinking about moving that data into an ODB such as ZODB, but I still want the data to be stored in the same Postgres DB in the back-end - so THERE I thought I could use an ORM on top of the DAL, as an adapter for a ZODB back-end...
But that was a wild idea - I would much rather solve everything within web2py, and there also, an ORM could really help...




Arnon Marcus

Apr 27, 2013, 3:22:41 PM4/27/13
to web...@googlegroups.com
Hi Massimo, thank you for addressing this thread, and apologies that it became so long and verbose, and that you are tackled with this ORM-stuff "yet again"...
I imagine it might feel tedious by now, considering the amount of discussion that has already been done over this issue in the past.

However, I still think it should be re-visited.

I had already gone through the link you posted before - it had appeared in many past discussions that I read in this group when searching for the "ORM" and "SQLAlchemy" keywords.

I was disappointed to discover that this example is poorly formed. The comparison there (as well as in other places, like this: http://web2py.com/examples/static/sqla2.html) feels somewhere between ill-informed and disingenuous.

It completely overlooks the actual "usage" of the design in SQLA that would be manifested once this structure is defined there. The ORM features are not even mentioned - it only compares the DAL's "syntax" to SQLA's SQL-Expression DDL/DML syntax - there is no reference to the semantics that pertain to the usage of SQLA's ORM afterwards...
The reason, as it seems (and as is evident in the link I gave here), is that there ARE NO comparable semantics in the DAL, or in web2py in general - they simply do not exist there. Now, I don't imply that there was any malicious intent on your part to misdirect the reader - I can't know that - but I AM mentioning the feeling I got when I read it, just so you know - it felt like a bad rep for web2py, and so was disappointing. I would have expected a more revealing description of what web2py is missing in comparison - it might not have given web2py points in feature support, but it would have given it many points in integrity and sincerity. But that may just be my personal interpretation...
I think you should be proud of web2py for its achievements, and be confident enough about its capabilities to admit its limitations front and center.
I think that would help a lot for web2py's rep.

Anyways, I was not looking for an SQLA bridge - as I said, I don't like their SQL-Expression syntax, and would much rather keep using web2py's DAL. Hell, we have thousands of lines of code using it... It would be a nightmare to switch gears at this point.

No. What I AM asking for, is extra tools in-and-around the DAL, to help and facilitate the construction of custom ORM classes around it.

As for your usage of the DAL, that is actually exactly what I had in mind when I was thinking about a back-end for a GUI schema-designer. This is a very declarative style of defining schema metadata, so it could very well be serialized and stored in a declarative format that could then be used in a GUI schema-builder.

It can obviously also be used to separate the concerns and to build ORM classes around. It's the same kind of separation that I would expect between ORM classes and business-logic classes - having a single source of truth that is framework-neutral. Very cool stuff...

Niphlod

Apr 27, 2013, 4:00:53 PM4/27/13
to web...@googlegroups.com
I'll not argue that some really large projects eventually got messy (they all do): my point is that even with an ORM your code wouldn't be as neat as you might think.
Without you providing the API you expect, web2py can't figure out your business requirements, and surely you can't expect anyone to code something on top of your ideas without providing a concrete use-case that definitely benefits from an ORM on top of the DAL.



Additionally, there are performance issues that a well-designed ORM can solve - though that may have more to do with the implementation of the Unit-Of-Work in SQLA than with an ORM as such. However, I still believe that using an ORM model can help a lot in defining clusters of operations that would then be ordered and aggregated into a transaction. It may provide orders of magnitude better performance in some cases, and may be the only way out of the N+1 problem.


The DAL always does everything in one transaction (a db transaction). If you're referring to some custom pattern such as (pseudo-code)

record = Record(1)
record.ordered_items = 1
....
record.ordered_items +=1
record.save()


ending up in an "update record set ordered_items = 2 where id=1"

the same exact thing gets done by
record = db.record(1)
record.ordered_items = 1
.....
record.ordered_items +=1
record.update_record()

by the DAL.

If instead you're talking about DAL "optimizing" whatever you do to the minimum required number of queries... well, that goes very far along the way of AI, e.g.

db(db.record.id == 1).update(ordered_items=1)
....
db(db.record.id == 1).update(ordered_items=2)
....
db(db.record.id == 1).delete()


ending up in "delete from record where id = 1".

DAL issues the queries you instruct in your code: it doesn't force you to use them.
ORMs on the other hand are a little bit more "obscure" in terms of "what will happen when I choose to alter this object".

ORMs can do that kind of "optimization" (up to a certain point) but please note that you're asking the code to be smarter than you.
 
For example, in one of the threads I read, Massimo gave an example of how an ORM can be emulated using the DAL. This example included a db-query inside a nested for-loop...
Now, we recently did a massive profiling of our application, and guess where we found the bottlenecks...
This leads me to the subject of hierarchies: Sometimes, there is no way to model a query that scans a tree of records, without resorting to this kind of awful approach... This is a real problem.

Anyways, I can elaborate more on my use-case if you think I should, but I think you get the idea...

Basically, we are talking about a tree of "production-items" that relates to a table of budget-items; each node in the item hierarchy also has its own tree of "tasks", and each task has its own tree of "comments" for collaborative messaging. Additionally, each task may have a tree of file-records representing a CMS of versions of files that are used in the process of executing that task. Then there's a status-matrix representing the task-status pipeline process...
I really don't know how far I should go with this description, but you may get the point - it's a huge tree-of-trees... The most horrible data-model for a relational database, and one that can benefit a lot from an ORM.
I was thinking about moving that data into an ODB such as ZODB, but I still want the data to be stored in the same Postgres DB in the back-end - so THERE I thought I could use an ORM on top of the DAL, as an adapter for a ZODB back-end...
But that was a wild idea - I would much rather solve everything within web2py, and there also, an ORM could really help...

The fact that your model doesn't fit tabular storage doesn't really matter in "DAL vs ORM" ways of coding such a thing, performance-wise... you have to fetch the data either way, and that's the bottleneck. If your bottleneck is the code that fetches the data, then it will surely take less time with the DAL than with an ORM (given the head-start of having one layer less).

As for choosing the right model to store hierarchies in a database, ink has been wasted on books and blood on posts around the web... but that deserves another thread altogether ^_^

<offtopic>
Please note that I'm not saying that DAL always fits the bill: SQLA is by far the most complete "db integration in python" and, from a lot of perspectives, may be one of the most mature out there even without "restricting" the perspective to python. Those are pieces of code optimized over the years by loads of people.
E.g., if I were to manage a database with only a python module, the major shortcomings of DAL vs SQLA are:
- no indexes in models
- flaky "schema" support
- no full support for non-integer pks
- no full support for multiple-column pks
Now, if you're talking about those requirements, ok, maybe we need more features in DAL... but those don't "count" as requiring an ORM on top of DAL.
As far as frameworks go, a small "dose" of assumptions is meant to be made to keep the code from bloating... "requiring" every table to have an integer pk is somewhat recommended in every SQL book out there.

Pun intended: I'm assuming nobody creating something has a problem with column names without spaces, even if spaces are a possibility in every database engine.
</offtopic>

Arnon Marcus

Apr 27, 2013, 5:24:56 PM4/27/13
to web...@googlegroups.com
Niphlod, you keep referring to this "generalization" called "ORMs"...
I am not talking about poorly implemented ORMs - I am sure there are plenty of those...
I am referring to SQLA's ORM.
Have you watched the video in the last link I posted?
I would really appreciate your take on what's in it - please watch it; here it is again:
(* I predict you will find it illuminating, if not surprising... ;) )

Arnon Marcus

Apr 27, 2013, 7:00:17 PM4/27/13
to web...@googlegroups.com
even with an ORM your code wouldn't be as neat as you might think.
 
Well, SQLA has this cool feature called "relationships", which basically lets you define bidirectional relationships between ORM classes. After you define one, there are events that take care of updating the relationship for added/removed records, in all directions, in all relationships defined:
   * Time-coded link : watch about 7 minutes...
p.s.: This can also facilitate the build-up and maintenance of tree structures.
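A stripped-down imitation of what such a bidirectional relationship does - plain Python, no SQLA; the property setter plays the role SQLA's attribute events play:

```python
# Toy sketch of a bidirectional one-to-many relationship: reassigning
# child.parent keeps BOTH sides of the link consistent automatically.

class Parent:
    def __init__(self, name):
        self.name = name
        self.children = []

class Child:
    def __init__(self, name):
        self.name = name
        self._parent = None

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self, new_parent):
        if self._parent is not None:
            self._parent.children.remove(self)   # detach from the old side
        self._parent = new_parent
        if new_parent is not None:
            new_parent.children.append(self)     # attach to the new side


p1, p2 = Parent('node-a'), Parent('node-b')
c = Child('leaf')
c.parent = p1
assert c in p1.children
c.parent = p2            # reassigning updates BOTH collections
assert c not in p1.children and c in p2.children
```

With self-referential classes the same mechanism maintains a tree: moving a node under a new parent updates both children collections in one assignment.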

But I think the main issue here is different - it may have to do with web2py's execution model - having it "execute" everything for each request - so nothing is saved across requests...
However, "modules" are not automatically reloaded, and it is actually in THIS place where I would expect to put my ORM... This is in fact the main issue that is not considered in web2py...


If instead you're talking about DAL "optimizing" whatever you do to the minimum required number of queries..

Hmmm, well, what would you call SQLA's "eager-loading", then?
    * Time-coded link - watch about 10-15 minutes from that point


ORMs on the other hand are a little bit more "obscure" in terms of "what will happen when I choose to alter this object".

Are they? Really?:
    * Time-coded link : watch about 2 minutes


ORMs can do that kind of "optimization" (up to a certain point) but please note that you're asking the code to be smarter than you.

How about SQLA's lazy-loading?

Niphlod

Apr 28, 2013, 1:44:05 PM4/28/13
to web...@googlegroups.com


On Sunday, April 28, 2013 1:00:17 AM UTC+2, Arnon Marcus wrote:

even with an ORM your code wouldn't be as neat as you might think.
 
Well, SQLA has this cool feature called "relationships", which basically lets you define bidirectional relationships between ORM classes. After you define one, there are events that take care of updating the relationship for added/removed records, in all directions, in all relationships defined:
   * Time-coded link : watch about 7 minutes...
p.s.: This can also facilitate the build-up and maintenance of tree structures.

And where did you see a lack of functionality using DAL callbacks?
 

But I think the main issue here is different, though - it may have to do with web2py's execution-model - having it "execute" everything for each request - so there is nothing being saved across-requests...
However, "modules" are not automatically-reloaded, and it is actually in THIS place where I would expect to put my ORM... This is in fact the main issue that is not considered in web2py...

 
Just because the scaffolding app doesn't have any modules in it? I use modules all the time when I don't want to reload the code at every request; that's why they are available.
 

If instead you're talking about DAL "optimizing" whatever you do to the minimum required number of queries..

Hmmm, well, what would you call SQLA's "eager-loading", then?
    * Time-coded link - watch about 10-15 minutes from that point

 
That's just avoiding recursive queries. Again, SQLA doesn't figure it out for you; you have to instruct it to be "eager" upfront. At that point, you can instruct DAL to be eager too, with joins.
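For comparison, join-based eager loading essentially boils down to one query plus a regrouping step, sketched here with a flat list standing in for a LEFT JOIN result set:

```python
# What join-based eager loading amounts to: one query returning
# (parent, child) pairs, regrouped in Python, instead of one
# child-query per parent. JOINED_ROWS mimics a LEFT JOIN result.

JOINED_ROWS = [            # (parent_id, parent_name, child_name)
    (1, 'order-1', 'item-a'),
    (1, 'order-1', 'item-b'),
    (2, 'order-2', None),  # parent with no children (LEFT JOIN null)
]

def regroup(rows):
    parents = {}
    for pid, pname, cname in rows:
        entry = parents.setdefault(pid, {'name': pname, 'children': []})
        if cname is not None:
            entry['children'].append(cname)
    return parents

grouped = regroup(JOINED_ROWS)
assert grouped[1]['children'] == ['item-a', 'item-b']
assert grouped[2]['children'] == []
```

Whether the join and the regrouping are written by hand (DAL style) or generated from a declared relationship (SQLA style) is exactly the trade-off under discussion.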
 

ORMs on the other hand are a little bit more "obscure" in terms of "what will happen when I choose to alter this object".

Are they? Really?:
    * Time-coded link : watch about 2 minutes

 
Yep, really. Whenever I use db(query).select(), .update(), .delete(), I know that a query is fired (sort of "explicit is better than implicit"). If I use SQLA, I have to remember how the particular model was coded (and optionally do some debugging) to see what's going on under the hood. Please note that this is expected: you want an ORM to add another layer of abstraction on top of database operations, so by default it is more implicit.
 

ORMs can do that kind of "optimization" (up to a certain point) but please note that you're asking the code to be smarter than you.

How about SQLA's lazy-loading?

Same thing as before. You have to tell it upfront, and that's not different from planning what to do with one/two/n DAL queries.

 

Arnon Marcus

Apr 28, 2013, 2:53:16 PM4/28/13
to

And where did you see a lack of functionality using DAL callbacks?
 

I am not that familiar with them, honestly.
As I said - it may seem as though I am saying that web2py currently cannot accomplish this - what it actually means is that "I don't know of ways in which web2py can accomplish this".
I would be more than happy to find out that I am wrong! :)

How can DAL-callbacks be used to emulate this?
Can they auto-infer relationships bidirectionally from the schema, and supply event-listeners for each end of each multi-targeted relationship?
How would I factor such a mechanism into my own classes to link them up?

Just because the scaffolding app doesn't have any modules in it? I use modules all the time when I don't want to reload the code at every request; that's why they are available.

I said that most of our code is in custom-modules, so obviously I know that web2py "supports" this.

What I meant by that is that it seems as though, due to the fact that web2py "executes" the models and so has implicit session-management going on in the DAL, it "assumes" a "data-model session" by default. So, the characteristics and features of the DAL are (obviously) targeted at supporting this mechanism. Other frameworks do not execute the models on each request. You may write a reload(), but that's your business. So, SQLA is designed to support that execution model, which is why it has to provide all these wild connection-sessions, transactions, and table-cache-invalidation mechanisms with expirations, dirty-checking and the like...

Web2py took advantage of its execution model to basically allow itself to completely avoid all of this.
So, if you want to implement a data-model using the DAL "within a non-executed/non-reloaded custom module", then you are basically on your own for implementing all these crazy db-sessions and the like.

This is why it seems so odd for web2py users to grasp this notion of an ORM, as it is basically assuming a totally-different execution model for the db-models, compared to how it is currently built in web2py.

It is like the difference between a stateful system and a stateless one.
SQLA is built for stateful applications.
Web2py's DAL is built for stateless ones.
It's a very different paradigm, with a different way of thinking.
And because of that, it is so weird to try to think about an ORM in web2py, especially if it is going to sit in a "stateful module". 

Perhaps I am missing something here, but this is how I currently understand things.


That's just avoiding recursive queries. Again, SQLA doesn't figure it out for you; you have to instruct it to be "eager" upfront. At that point, you can instruct the DAL to be eager too, with joins.

Obviously, but that's not the point - we are talking about "tools for automating abstractions". He did not write the join himself there...
The eager-loading system figured out by itself which joins it should construct in order to eager-load things - all he did is tell the system to do so for that relationship.

 
Yep, really. Whenever I use db(query).select(), .update(), .delete() I know that a query is fired (sort of "explicit is better than implicit"). If I use SQLA I have to remember how the particular model was coded (and optionally do some debugging) to see what's going on under the hood. Please note that this is expected: you want an ORM to add another layer of abstraction on top of database operations, so by default it is more implicit.

No. It's as if you completely missed his point. SQLA's ORM philosophy is different from others'. Did you watch the lecture I gave you about that?
SQLA is not building your models - you are.
SQLA is not "hiding", it is "automating".
SQLA is not "obtuse", it is "transparent".
SQLA does not lock you into its defaults - you define its defaults up-front.

Same thing as before. You have to tell it upfront, and it's not different from planning what to do with one/two/n DAL queries.

No. It's automatic. You set up the "relationship" object yourself - yes - but IT figures out whether this is a one-to-many relationship, and in which direction, and constructs an optimized query based on which end of the relationship you are querying. As he said, this is non-trivial in ORMs - Django's ORM (as he alluded...) is NOT doing that...
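For what it's worth, the "figuring out" described above is mechanical once you know which table holds the foreign key. A minimal stdlib sketch of the idea (the schema dict and all names are purely illustrative, not any ORM's API):

```python
# Sketch: deciding the direction of a one-to-many relationship from
# the schema alone. The table that holds the foreign key is, by
# definition, the "many" side. Everything here is illustrative.

schema = {
    "author": {"fks": {}},                      # holds no FK: the "one" side
    "book":   {"fks": {"author_id": "author"}}, # holds the FK: the "many" side
}

def infer_direction(schema, table_a, table_b):
    """Return which table is the 'one' side of a one-to-many pair."""
    if table_b in schema[table_a]["fks"].values():
        return f"one {table_b} -> many {table_a}"
    if table_a in schema[table_b]["fks"].values():
        return f"one {table_a} -> many {table_b}"
    return "unrelated"

direction = infer_direction(schema, "book", "author")
print(direction)  # -> one author -> many book
```

A real ORM does much more (self-references, association tables, composite keys), but the core inference is this foreign-key lookup.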

Niphlod

Apr 28, 2013, 3:05:44 PM
to web...@googlegroups.com


On Sunday, April 28, 2013 8:41:46 PM UTC+2, Arnon Marcus wrote:

And where did you see a lack of functionality using DAL callbacks?
 

I am not that familiar with them, honestly.
As I said, it may seem as though I am saying that web2py currently cannot accomplish this; what it actually means is: "I don't know of ways in which web2py can accomplish this."
I would be more than happy to find out that I am wrong! :)

How can DAL-callbacks be used to emulate this?
Can they auto-infer relationships bi-directionally from the schema, and supply event-listeners for each end of each multi-targeted relationship?
How would I factor such a mechanism into my own classes to link them up?

they don't build up magically by default (though the cascade parameter allows the same flexibility as the one present in your backend), but given that all six "important" events are available and fired accordingly, you can hook up whatever business logic you like.
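To make the six events concrete: in the web2py DAL they are the per-table callback lists _before_insert, _after_insert, _before_update, _after_update, _before_delete and _after_delete. Below is a pure-stdlib sketch of that hook pattern; the toy Table class is a stand-in for illustration, not the real DAL:

```python
# Stdlib sketch of the DAL's table-level callback pattern. The real
# web2py Table exposes six hook lists (_before_insert, _after_insert,
# _before_update, _after_update, _before_delete, _after_delete); this
# toy Table implements just the insert pair to show the mechanism.

class Table:
    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.next_id = 1
        self._before_insert = []  # each callback(fields); truthy return aborts
        self._after_insert = []   # each callback(fields, new_id)

    def insert(self, **fields):
        for callback in self._before_insert:
            if callback(fields):  # mirrors the DAL: truthy return cancels the op
                return None
        new_id, self.next_id = self.next_id, self.next_id + 1
        self.rows[new_id] = dict(fields)
        for callback in self._after_insert:
            callback(fields, new_id)
        return new_id

# Business logic hooked on the event: audit every insert automatically.
node = Table("node")
audit = Table("audit")
node._after_insert.append(
    lambda fields, new_id: audit.insert(node_id=new_id, action="created"))

node_id = node.insert(title="root")
print(node_id, audit.rows)  # -> 1 {1: {'node_id': 1, 'action': 'created'}}
```

This is the sense in which relationship-like behavior can be "hooked up": the events fire, and your code decides what a related insert, update, or delete means.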
 

Just because the scaffolding app doesn't have any module in it? I use modules all the time when I don't want to reload the code at every request; that's why they are available.

I said that most of our code is in custom-modules, so obviously I know that web2py "supports" this.

What I meant by that is that it seems as though, due to the fact that web2py "executes" the models and so has implicit session-management going on in the DAL, it "assumes" a "data-model session" by default. So the characteristics and features of the DAL are (obviously) targeted at supporting this mechanism. Other frameworks do not execute the models on each request (you may write a reload(), but that's your business). SQLA is designed to support that execution model, which is why it has to provide all these wild connection-sessions, transactions, and table-cache-invalidation mechanisms with expirations, dirty-checking and the like...

Web2py took advantage of its execution model to basically allow itself to avoid all of this entirely.
So, if you want to implement a data-model using the DAL within a non-executed/non-reloaded custom module, then you are basically on your own for implementing all these crazy db-sessions and the like.

 
This is why this notion of an ORM seems so odd to web2py users: it basically assumes a totally different execution model for the db-models, compared to how it is currently built in web2py.

It is like the difference between a stateful system and a stateless one.
SQLA is built for stateful applications.
Web2py's DAL is built for stateless ones.
It's a very different paradigm, with a different way of thinking.
And because of that, it is so weird to try to think about an ORM in web2py, especially if it is going to sit in a "stateful module". 
 
Perhaps I am missing something here, but this is how I currently understand things.


HTTP is (ATM) a stateless world. Yep, cookies are used to store something from different requests, but that's business logic of your app, not a database concern.
Even with SQLA you DON'T want to "keep alive" a session between multiple requests, as you can very easily end up with locking issues.


That's just avoiding recursive queries. Again, SQLA doesn't figure it out for you; you have to instruct it to be "eager" upfront. At that point, you can instruct the DAL to be eager too, with joins.

Obviously, but that's not the point - we are talking about "tools for automating abstractions". He did not write the join himself there...
The eager-loading system figured out by itself which joins it should construct in order to eager-load things - all he did is tell the system to do so for that relationship.

And what's different from looping over all the referenced tables and joining them? Do you really need a 10-line code snippet to accomplish that in the DAL? This can go in the "please add features to DAL" chapter, as other things mentioned before, but it doesn't really call for an ORM.
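The point above really is about ten lines. A stdlib sketch of eager-loading a one-to-many in a single join, then grouping children under parents in Python — roughly what an ORM's joined eager loading automates (schema and data are illustrative only):

```python
import sqlite3
from collections import defaultdict

# One join, one round-trip, children grouped under parents in Python --
# the same shape of work an ORM's eager loading does for you.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

books_by_author = defaultdict(list)
for name, title in con.execute(
        "SELECT author.name, book.title FROM author "
        "JOIN book ON book.author_id = author.id ORDER BY book.id"):
    books_by_author[name].append(title)

print(dict(books_by_author))  # -> {'Ann': ['A1', 'A2'], 'Bob': ['B1']}
```

Whether that grouping loop belongs in user code or behind a framework helper is exactly the disagreement in this thread.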


 
Yep, really. Whenever I use db(query).select(), .update(), .delete() I know that a query is fired (sort of "explicit is better than implicit"). If I use SQLA I have to remember how the particular model was coded (and optionally do some debugging) to see what's going on under the hood. Please note that this is expected: you want an ORM to add another layer of abstraction on top of database operations, so by default it is more implicit.

No. It's as if you completely missed his point. SQLA's ORM philosophy is different from others'. Did you watch the lecture I gave you about that?
SQLA is not building your models - you are.
SQLA is not "hiding", it is "automating".
SQLA is not "obtuse", it is "transparent".
SQLA does not lock you into its defaults - you define its defaults up-front.

That means exactly what I meant to explain before: you define how it should work beforehand and you need to review the model to understand what is "firing" under the hood.
 

Same thing as before. You have to tell it upfront, and it's not different from planning what to do with one/two/n DAL queries.

No. It's automatic. You set up the "relationship" object yourself - yes - but IT figures out whether this is a one-to-many relationship, and in which direction, and constructs an optimized query based on which end of the relationship you are querying. As he said, this is non-trivial in ORMs - Django's ORM (as he alluded...) is NOT doing that...

So, you added a lot of statements and pieces of code in your model to define the relationship, and SQLA replays it, figuring out "the most optimized way possible".
My point of "I end up with a nice schema to play with" may be throwing me ahead of road-blocks that you faced, but is it really that difficult to code a relationship and figure out how to query it?

Anthony

Apr 28, 2013, 8:59:40 PM
to web...@googlegroups.com
I think you're arguing against a bit of a straw man here. Past resistance to the idea of an ORM has usually been in the context of suggestions for an ORM instead of a DAL, not as an additional abstraction on top of the DAL for particular use cases. As Massimo noted, there have already been some efforts at the latter, though none generated enough interest to persist. If you can identify a common use case where some kind of abstraction on top of the DAL would make development much easier and/or execution more efficient, and you can clearly articulate the nature of that abstraction, I don't think anyone would object. It might help to see something concrete, though. Let's see an example data model, how it would be easily implemented in SQA or the hypothetical web2py ORM, and how that is a clear improvement over how you would otherwise have to implement it using the DAL. I also wonder if it is a full-blown ORM that is needed, or just some additional DAL functionality to more easily accommodate particular types of data models.

Anthony



Anthony

Apr 28, 2013, 9:12:33 PM
to web...@googlegroups.com
Another past effort: https://code.google.com/p/web2pyorm/

Arnon Marcus

Apr 29, 2013, 4:43:46 AM
to web...@googlegroups.com


On Monday, April 29, 2013 3:59:40 AM UTC+3, Anthony wrote:
I think you're arguing against a bit of a straw man here. Past resistance to the idea of an ORM has usually been in the context of suggestions for an ORM instead of a DAL, not as an additional abstraction on top of the DAL for particular use cases. As Massimo noted, there have already been some efforts at the latter, though none generated enough interest to persist. If you can identify a common use case where some kind of abstraction on top of the DAL would make development much easier and/or execution more efficient, and you can clearly articulate the nature of that abstraction, I don't think anyone would object. It might help to see something concrete, though. Let's see an example data model, how it would be easily implemented in SQA or the hypothetical web2py ORM, and how that is a clear improvement over how you would otherwise have to implement it using the DAL

I haven't given a DAL alternative to what I am after, because I am not as familiar with its advanced usage-patterns as some of you may be. I am presenting a set of needs, and asking for a way to meet them.
If you go through the links I have been pin-pointing here, you would find many concrete examples of data-models and how they are used in SQLA's ORM.
You are basically asking me to get much more proficient in the DAL, as well as in SQLA, and to design the solution myself and present it.
This is a valid expectation in a developers-group, not a users-group. The whole point of sharing ideas and expertise is that someone can suggest a conceptual/abstract direction, and this entices someone else who is much more proficient to come up with a concrete solution. We share each other's expertise as well, not just ideas. I am inclined to hope I would not need to become an expert myself in order to propose an idea for improvement. I think it is a legitimate inclination.

Arnon Marcus

Apr 29, 2013, 5:47:41 AM
to web...@googlegroups.com

they don't build up magically by default (though the cascade parameter allows the same flexibility as the one present in your backend), but given that all six "important" events are available and fired accordingly, you can hook up whatever business logic you like.
 

That answer is too vague for me... Can you elaborate?
What cascade parameter? What do you mean by "your backend"? Which "six important" events?
Does this answer my questions?

HTTP is (ATM) a stateless world. Yep, cookies are used to store something from different requests, but that's business logic of your app, not a database concern.
Even with SQLA you DON'T want to "keep alive" a session between multiple requests, as you can very easily end up in lockings.


I was referring to stateless-vs-stateful as a metaphor - an analogy.
The statefulness I was trying to convey, which exists in all other frameworks except web2py, is in the ORM:
SQLA assumes that the model-code is not executed for each request. It is imported once at server-launch-time.
Its ORM provides a set of classes that are instantiated on-the-fly, but not in a session-lexical manner - if a request from a different session asks for a table that has an already-existing ORM-class instance for it in memory, it will be reused. The data it contains, however, will have already been invalidated (dirty-marked) by the last database-connection-session closing. Connection-pooling means that it re-uses connections to the database rather than re-creating them. A database-session close, from the point of view of the ORM, is not really a connection close in the core layer - it is just a release of a connection object back to the connection pool. Then, on the ORM level, a transaction-end is not really a session-end in terms of the lifetime of the class-instance objects of the ORM classes. They persist in memory; they just get invalidated by the transaction-end. Again, it is a very different execution model from web2py - and it has nothing to do with HTTP, keep-alive, and long-lived connections with the client - it has to do with the way the web-framework is implemented:
  SQLA's ORM-class-instances are reused across requests.
  Web2py's DAL instances are not.
If I wanted to emulate a SQLA-ORM in web2py, I would have to put it all in custom modules that would persist class instances in memory, and do a lot of this invalidation/dirty-checking stuff myself. I think a way around this might be to keep the DAL layer separate: instances of ORM classes may persist in memory across requests in a custom module that is not reloaded, but they would be dynamically re-linked to DAL instances that would still be re-generated for each request by executing the models as usual.
This is what I mean by another layer on top of the DAL. How to implement such a thing? This is way out of my league... 
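The re-linking idea described in words above could look roughly like this. Everything here (Node, the module-level _cache, the fake db handles) is hypothetical glue for illustration, not web2py or SQLA API:

```python
# Hypothetical sketch: domain objects cached at module level (so they
# survive across requests in a non-reloaded module), re-bound on every
# request to a fresh per-request database handle.

_cache = {}  # module-level: persists across requests

class Node:
    def __init__(self, row_id):
        self.row_id = row_id
        self.db = None  # re-linked on every request

    def bind(self, db):
        self.db = db
        return self

def get_node(db, row_id):
    node = _cache.get(row_id)
    if node is None:
        node = _cache[row_id] = Node(row_id)
    return node.bind(db)  # same long-lived instance, fresh per-request handle

request1_db, request2_db = object(), object()  # stand-ins for per-request DALs
a = get_node(request1_db, 7)
b = get_node(request2_db, 7)
print(a is b, b.db is request2_db)  # -> True True
```

The hard parts an ORM actually solves (staleness, dirty state, concurrent requests mutating the same cached instance) are deliberately absent here; this only shows the binding shape.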

And what's different than looping over all the referenced tables and joining them ? Do you really need a 10-lines code snippet to accomplish that in DAL ? This can go in the "please add features to DAL" chapter, as other things mentioned before, but don't really count for an ORM.


With respect to my previous section, the auto-generation of query objects should be done "within" the ORM layer, which just "uses" the DAL instances.
The reason for this is that behavioral characteristics of domain-model classes go in the ORM layer - that's what it's for.
  

That means exactly what I meant to explain before: you define how it should work beforehand and you need to review the model to understand what is "firing" under the hood.

The DAL is an abstraction layer. Its purpose is that you should not need to go beneath that layer, unless things go horribly wrong. This means that for 99% of the usage, you shouldn't care what goes on underneath in the low-level SQL. Otherwise, there really is no need for a DAL at all...
The same would go for an ORM layer. It should help you facilitate the mapping of your business-logic classes to database classes, and define automation of database-layer behavioral patterns as a function of business-logic behaviors, "within" your business-logic layer, and not the database layer.
This is, in a nutshell, what I am after, and where I think an ORM layer on top of the DAL could be most beneficial. 

 
So, you added a lot of statements and pieces of code in your model to define the relationship, and SQLA replays it, figuring out "the most optimized way possible".
My point of "I end up with a nice schema to play with" may be throwing me ahead of road-blocks that you faced, but is it really that difficult to code a relationship and figure out how to query it?

I'm not sure I understand what you are saying here, but it seems to pertain to my last comment in the previous section.
 

Niphlod

Apr 29, 2013, 6:47:21 AM
to web...@googlegroups.com


On Monday, April 29, 2013 11:47:41 AM UTC+2, Arnon Marcus wrote:

they don't build up magically by default (though the cascade parameter allows the same flexibility as the one present in your backend), but given that all six "important" events are available and fired accordingly, you can hook up whatever business logic you like.
 

That answer is too vague for me... Can you elaborate?
What cascade parameter? What do you mean by "your backend"? Which "six important" events?
Does this answer my questions?

So, a small recap. Don't take this in the wrong way but:
- I need to watch 2 hours of presentations that you found interesting to be able to reply
- you expect "us" (by us I mean web2py developers) to figure out YOUR business requirement because you liked a presentation about SQLA and you want that feature(s) on top of DAL, without providing your use-case and the possible benefits
- you expect web2py developers to come up with an API that even you can't figure out
- you don't want to lift a finger providing a real use-case scenario because "you're a user, not a developer"
- you won't take a look at the current DAL implementation to see what the proposed solutions to your (ATM imaginary) problems are
If these are the requirements of your discussion-environment, don't be surprised if nobody provides code for you.

Docs are there. Given that you're spending time looking at other python modules/packages, at least read up on what you want to use as a starting point (the DAL) for comparisons.


Arnon Marcus

Apr 29, 2013, 8:09:24 AM
to web...@googlegroups.com
This has turned into a meta-discussion, but I get the criticism (I think it is exaggerated, though...).
Yes, I should read more about new DAL features, and I will. I just assumed that creative usage-patterns would not necessarily be as well documented as the bare features themselves, and that some people here - you, maybe - would be more proficient than me regarding such creative usage-patterns that I might not think of. Just pointing out features of the DAL is insufficient without some toy examples...

Perhaps it is my way of asking for help that you don't like, but you should at least admit that I had to go quite far just to convince you that my proposal is even "potentially" beneficial at all...
We never even got to the point at which you gave me concrete alternative usage-patterns of the DAL - you consistently reject the proposition I present in principle. This is not helping.

As for use-cases, I wanted you to see someone else explain things better than me - so I gave you some links. It is an almost three-hour lecture, so I pin-pointed the specific minutes that I think were relevant, and gave time-coded links. Is that too much to ask?

As for the other 50min lecture, I could narrow it down even more, but the point was that I felt there was a mismatch between how you perceive SQLA's philosophy (based on your prior perception of ORMs in general) and the actual philosophy of SQLA as its creator presents it, and I wanted you to see that.

As for API design, I am a user of web2py. Meaning, I AM a developer, but a developer of applications - not frameworks - there is a crucial distinction to make here. You wouldn't expect users of your framework to suggest concrete-implementations of infrastructure-improvements, would you?

I am not expecting any of you to figure out my business requirements (I have no idea what gave you that impression...). I am expecting a helping hand of "expertise" regarding existing usage-patterns, as well as have an architectural/abstract discussion of a proposed additional-abstraction-layer, that "might" require proficiency in the inner-workings of the DAL, which is beyond my expertise. How would you suggest I should ask for such a thing?



villas

Apr 29, 2013, 8:25:52 AM
to web...@googlegroups.com
I am interested in these discussions but it seems to me that far too much is being written and I apologise if I haven't read all the references which have been proposed.

The fact is that hierarchical data does not fit at all well into an RDBMS.  The main issue is therefore to decide which 'kludge' to use.  The best kludge will vary case by case, so the ORM would have to be incredibly sophisticated to always choose an optimised kludge without the developer specifying exactly what he wants.  I do not know how SQLA achieves this, but I cannot imagine the DAL being developed to cover all cases in the best possible way.

Nevertheless it would indeed be nice to cover at least one common use-case of hierarchical data and I would suggest that someone should propose a specific case.  Here is a simple example:  geographical areas.  I would propose either a nested set (or perhaps simply a materialised path,  which I like very much for its simplicity).  I think such an example would be very useful.
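A minimal sqlite3 sketch of the materialised-path approach suggested above (table, data, and path encoding are all illustrative): each row stores its full ancestor chain as a string, so fetching an entire subtree is one prefix query.

```python
import sqlite3

# Materialized path: each area stores its ancestor chain as a string,
# so "give me the subtree" becomes a single LIKE prefix query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE area (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
con.executemany("INSERT INTO area VALUES (?, ?, ?)", [
    (1, "Europe",  "/1/"),
    (2, "France",  "/1/2/"),
    (3, "Paris",   "/1/2/3/"),
    (4, "Germany", "/1/4/"),
])

def subtree(con, path):
    """All areas at or below `path` (every descendant's path starts with it)."""
    return [name for (name,) in con.execute(
        "SELECT name FROM area WHERE path LIKE ? ORDER BY path", (path + "%",))]

print(subtree(con, "/1/2/"))  # -> ['France', 'Paris']
print(subtree(con, "/1/"))    # -> ['Europe', 'France', 'Paris', 'Germany']
```

The trade-off is the classic one: subtree reads are cheap, but moving a node means rewriting the paths of all its descendants.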

A few simple functions for manipulating this, included in the DAL, would be really great.  I believe it is only with a specific proposal in mind that this discussion will progress.  A theoretical discussion will remain just that, and nothing practical will result. 

There are truckloads of stuff already written on this topic and I cannot imagine that this thread would add anything new to that.

Arnon Marcus

Apr 29, 2013, 9:03:27 AM
to web...@googlegroups.com
This is actually really close to what I am after - thanks.

Alas, it seems DOA... 

I have contacted the creator.

Arnon Marcus

Apr 29, 2013, 9:06:29 AM
to web...@googlegroups.com
Anthony found something close to what I'm after:


But what exactly am I expected to propose that would be concrete?
An API?
Must I be an API coder in order to propose a feature-set?

And about use-cases - I gave mine, as well as links to others. What else should I do?

Anthony

Apr 29, 2013, 9:40:29 AM
to web...@googlegroups.com
I was disappointed to discover that this example is poorly formed. The comparison there (as well as in other places, like this: http://web2py.com/examples/static/sqla2.html) feels somewhere between ill-informed and disingenuous.

The original link states the following:

The web2py database Abstraction layer can read most Django models and SQLAlchemy models...This is experimental and needs more testing. This is not recommended nor suggested but it was developed as a convenience tool to port your existing python apps to web2py.

It was not intended as a comparison between the DAL and SQA nor as a comprehensive representation or implementation of SQA's features, just as a convenience for using existing SQA model code if you happen to have some.

And the link you included above starts with the following:

We translated all the examples from the SQLAlchmey tutorial...This is not a feature comparison but more of a legend for new web2py users with SQLAlchemy experience...

It is literally just showing how to do what is shown in the SQA tutorial using the web2py DAL.

I don't see how either of these posts can be construed as ill-informed or disingenuous -- they are quite clear about their scope and intention.

Anthony

villas

Apr 29, 2013, 10:17:07 AM
to web...@googlegroups.com
Hi Arnon,

Is this thread about...
    1. ...storing and manipulating hierarchical trees using the DAL? 
    If so,  please specify an example of your data?

    2. ...using the DAL with ORM-like syntax?
    If so,  it would be interesting to see an example of the syntax you would like.  Perhaps the example link provided by Anthony might be a starting point for you?

I suppose an ORM fits better with python objects, but the kludge with the RDBMS does not go away - you simply find it further down the road.

Regards, David

Anthony

Apr 29, 2013, 10:19:38 AM
to web...@googlegroups.com
If you go through the links I have been pin-pointing here, you would find many concrete examples of data-models and how they are used in SQLA's ORM.

I have gone through the links (not the full presentations, though), and I don't think I've seen anything that looked obviously easier than the DAL equivalent (though, maybe we could add an abstraction to simplify many-to-many relationships).
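For the many-to-many abstraction floated above, here is the bare pattern such an abstraction would wrap - a link table plus a double join (sqlite3; all table and column names are illustrative):

```python
import sqlite3

# The raw many-to-many pattern: a link table joining the two sides.
# An ORM-ish helper would generate this join behind an attribute access.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tag (post_id INTEGER REFERENCES post(id),
                           tag_id  INTEGER REFERENCES tag(id));
    INSERT INTO post VALUES (1, 'Hello');
    INSERT INTO tag VALUES (1, 'sql'), (2, 'orm');
    INSERT INTO post_tag VALUES (1, 1), (1, 2);
""")

def tags_for(con, post_id):
    """All tag names linked to a post via the post_tag link table."""
    return [name for (name,) in con.execute(
        "SELECT tag.name FROM tag "
        "JOIN post_tag ON post_tag.tag_id = tag.id "
        "WHERE post_tag.post_id = ? ORDER BY tag.id", (post_id,))]

print(tags_for(con, 1))  # -> ['sql', 'orm']
```

An abstraction would hide the link table and the join; the question in this thread is whether hiding it earns its keep.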
 
You are basically asking me to get much more proficient in the DAL, as well as in SQLA, and to design the solution myself and present it.

Not really. It's a bit odd, though, that you would claim with such confidence that web2py needs an SA-like ORM without fully understanding the DAL's capabilities or any concrete notion about how such a hypothetical ORM would improve the development experience. A better starting point might be to say, "Here's something I can do quite easily with SA -- how can I achieve something similar with the DAL"? Then maybe we can discover specific areas where some additional abstraction on top of the DAL might be beneficial.
 
This is a valid expectation in a developers-group, not a users-group. The whole point of sharing ideas and expertise, is that someone can suggest a conceptual/abstract direction, and this entices someone else who is much more proficient, to come-up with a concrete solution.

Yes, but even in the users group, it helps to have concrete domain examples and preferably even code examples. It sounds like you're saying, "Hey, I suspect that an SA-like ORM could make some types of data modeling a lot easier, so you experts please go out and investigate whether that is really the case and in what situations it is the case, and then build a solution that covers those cases." I'm not sure this is a sufficiently well-defined problem to entice anyone to come up with a concrete solution.
 
I am inclined to hope I would not be needing to become an expert myself, in order to propose an idea for improvement.

It doesn't hurt to propose ideas, but frankly, most new features are developed by someone who needs the feature, develops and refines it, and then offers it to the community (at which point it may then undergo further revision/refinement). It's not likely that a volunteer developer will be motivated to spend tens or hundreds of hours of his time because one person on the users list says he needs an ORM. If you want something to happen along these lines, it's likely you'll have to get at least somewhat involved beyond just making some very general suggestions.

Anthony

Anthony

Apr 29, 2013, 12:45:17 PM
to web...@googlegroups.com
But what exactly am I expected to propose that would be concrete?
An API?
Must I be an API coder in order to propose a feature-set?

Sort of. At least in pseudo-code, perhaps you could suggest how you would expect to be able to represent and interact with a particular type of data model.
 

And about use-cases - I gave mine, as well as links to others. What else should I do?

See above. If you think SQLAlchemy is getting it right, maybe show how you would code your model in it. Even better, also show how you do it in the DAL and explain how SA makes it easier (not just defining the model, but selects, inserts, updates, etc.).

Anthony

Anthony

Apr 29, 2013, 10:43:38 AM
to web...@googlegroups.com

they don't build up magically by default (though the cascade parameter allows the same flexibility as the one present in your backend), but given that all six "important" events are available and fired accordingly, you can hook up whatever business logic you like.
 

That answer is too vague for me... Can you elaborate?
What cascade-parameter? What do you mean by "your backend"? What "six important" events?

See "ondelete" here, as well as the section on callbacks.

 
  SQLA's ORM-class-instances are reused across requests.
  Web2py's DAL instances are not.
If I wanted to emulate an SQLA-ORM in web2py, I would have to put it all in custom modules that persist class instances in memory, and do a lot of the invalidation/dirty-checking work myself. I think a way around this might be to keep the DAL layer separate, so that instances of ORM classes may persist in memory across requests in a custom module that is not reloaded, but be dynamically re-linked to DAL instances that are still re-generated for each request by executing the models as usual.

Not quite sure what you mean here. Are you saying in SA, you would keep a database transaction open across multiple HTTP requests? Can you show an example of SA code that would be inconsistent with web2py's execution model?

Anthony

Cliff Kachinske

unread,
Apr 29, 2013, 1:07:45 PM4/29/13
to web...@googlegroups.com
The really great thing about free open source software is that you can fork or write a wrapper for any module.

You don't even have to wait for someone else to agree that your idea is worth spending time on.  

Arnon Marcus

unread,
Apr 29, 2013, 2:59:29 PM4/29/13
to web...@googlegroups.com
<off-topic>
Look, I get that in many open-source communities, feature-requests are generally frowned upon. It's like "who does this guy think he is, telling me what's missing from MY great achievement, huh...?"
There is a lot of ego involved...
Sometimes too much so...

Granted, I don't pay anyone here, so social-legitimacy for requesting "anything" is questionable at-best...
But there is a difference between a "request" and a "demand", and a lot of people seem to conflate the two - especially in western-cultures.
</off-topic>

But do you really think that there is no place "at all" for feature-requests in FOSS communities? We may be "political" and call them "wish-lists" or "suggestions-for-additions" - that may sound better - but it is still practically the same thing...
Someone raises an idea, tries to convince other people that it's cool and shiny, and maybe someone else gets excited about it enough to implement it, and maybe not... But does this make it illegitimate to try?

Derek

unread,
Apr 29, 2013, 3:37:10 PM4/29/13
to web...@googlegroups.com
I think the issue here is that the feature you are proposing doesn't provide any real benefit, not many people are interested, it would take significant development time to implement, and could potentially confuse new users of the software. The thing is, web2py is open source and released under the LGPL version 3 license. So, if you feel like implementing this, fork it and make it your own. As for "someone raises an idea" in OSS - the people proposing new ideas generally code it first and introduce it as a cool hack, which others may then merge.

Anthony

unread,
Apr 29, 2013, 3:42:14 PM4/29/13
to web...@googlegroups.com
On Monday, April 29, 2013 2:59:29 PM UTC-4, Arnon Marcus wrote:
<off-topic>
Look, I get that in many open-source communities, feature-requests are generally frowned upon. It's like "who does this guy think he is, telling me what's missing from MY great achievement, huh...?"
There is a lot of ego involved...
Sometimes too much so...

I'm not sure whom you're addressing here, but I don't think anyone is dismissing your requests because they are perceived as being critical of web2py and therefore ego threatening. I think the only problem so far has been a lack of specificity.
 
Granted, I don't pay anyone here, so social-legitimacy for requesting "anything" is questionable at-best...
But there is a difference between a "request" and a "demand", and a lot of people seem to conflate the two - especially in western-cultures.

Again, not sure to whom that comment is directed, but I haven't seen anyone here misconstrue your requests as demands, nor do I consider that to be a particularly western problem.
 
But do you really think that there is no place "at all" for feature-requests in FOSS communities? We may be "political" and call them "wish-lists" or "suggestions-for-additions" - that may sound better - but it is still practically the same thing...
Someone raises an idea, tries to convince other people that it's cool and shiny, and maybe someone else gets excited about it enough to implement it, and maybe not... But does this make it illegitimate to try?

I think that all sounds fine -- you just have to do some more work to convince folks you have a truly cool and shiny new idea. It's one thing to make a request and hope someone gets excited enough to implement it, but quite another to make a request and then chastise the audience for failing to become excited.

Anthony

Arnon Marcus

unread,
Apr 29, 2013, 9:17:44 PM4/29/13
to web...@googlegroups.com
So, clean slate here - who is excited about SQLA's unit-of-work/session implementation?


* The slides for it are here:

* The second half shows an interactive walk-through of it, using HTML5, which 
you can interact with yourself using this:

How much of that is the DAL doing? How does it map to it?
Would it be correct to say that a db-connection is akin to an SQLA-session?

I have gone through the DAL documentation again, and I've seen glimpses of parts of this, but the whole auto-querying-on-attribute-access - with implicit-transaction-caching - is a really cool feature.
Can I do "db.person[1].name" and have it query the database "if and only if" this value was not already queried in the current transaction? 

I saw that it "auto-commits" at the end of each request/response session, right?
So, this is the DB-transaction view that is "committed", right?
So, if I manually-commit - it automatically starts a new transaction?
If I get a row object, then run the same query again - will I get the same row-object internally? I mean, does the DAL do this cool identity-map thing internally?

I'm thinking about this whole dirty-checking/invalidation thing - it seems crucial for enabling ORM-like access to the records (meaning, auto-query-on-access).

We could emulate that in an abstraction-layer - I think this is what I am after.
Am I being more clear now?
With these features in the DAL, we could pass around the db object from controller actions to custom modules, instantiating their classes with it - which would be the equivalent of passing the "session" object in SQLA.

This way, we can build classes that provide attribute-accessors that proxy DAL Set objects, include implicit caching with a memoizer, and even go further and do lazy-loaders with deferred-query classes.
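As a sketch of what such attribute-accessor proxies with implicit caching could look like - a toy only, using stdlib sqlite3 as a stand-in for the DAL; `lazy_field`, `Person` and `invalidate` are hypothetical names, not web2py API:

```python
import sqlite3

class lazy_field(object):
    """Descriptor: loads a column on first access, then memoizes it
    until the owning record's cache is invalidated (e.g. on commit)."""
    def __init__(self, table, column):
        self.table, self.column = table, column

    def __get__(self, rec, owner):
        if rec is None:
            return self
        cache = rec.__dict__.setdefault('_cache', {})
        if self.column not in cache:  # query if and only if not cached yet
            sql = "SELECT %s FROM %s WHERE id=?" % (self.column, self.table)
            cache[self.column] = rec.db.execute(sql, (rec.id,)).fetchone()[0]
        return cache[self.column]

class Person(object):
    name = lazy_field('person', 'name')
    def __init__(self, db, id):
        self.db, self.id = db, id
    def invalidate(self):  # would be triggered by db.commit()
        self.__dict__.pop('_cache', None)

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO person VALUES (1, 'Alice')")

p = Person(db, 1)
first = p.name    # first access queries the database: 'Alice'
db.execute("UPDATE person SET name='Bob' WHERE id=1")
second = p.name   # cached: still 'Alice', no query issued
p.invalidate()    # simulate end-of-transaction invalidation
third = p.name    # re-queried: 'Bob'
```

The point being: the query fires only when the attribute is not already cached in the current "transaction", exactly the auto-query-on-access behavior described above.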

What do you say? 

Massimo Di Pierro

unread,
Apr 29, 2013, 11:46:36 PM4/29/13
to web...@googlegroups.com
Hello Arnon,

In web2py all DAL operations are wrapped in a transaction which can be committed explicitly via db.commit() or automatically at the end of an http request. It does not suffer from the problems of Active Records mentioned in the slides where each object is committed separately and thus things can get out of sync.

In web2py objects are not proxies, as they are in SQLA. The DAL is simply an abstraction to generate SQL for you. The DAL API (insert/delete/update/select) is mapped into the corresponding SQL statements and executed when called, not deferred. Transactions are handled at the database level. This has pros and cons. The pro is that the developer knows when the DB IO is performed. In the SQLA ORM, because objects are proxies and DB IO is always lazy, you do not know when DB IO is performed. I do not like that. Yes, this is cool from an implementation point of view, but I still need to understand what benefit it provides. It does not seem to provide a performance benefit. It does provide a consistency benefit, but the consistency problem only exists if the object persists beyond the session in which it is defined. In the DAL, records are not objects, they are dictionaries, and they only exist for the life of an HTTP request, therefore it is not clear one gains anything from the extra complexity.
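The "transactions are handled at the database level" point can be illustrated with plain stdlib sqlite3 (an analogy only, not DAL code; two connections stand in for two request cycles): the INSERT is executed immediately when called, but other connections only see it after the explicit commit.

```python
import os, sqlite3, tempfile

# Two connections to the same database file stand in for two
# concurrent request cycles (':memory:' is per-connection, so a
# throwaway temp file is used instead).
path = os.path.join(tempfile.mkdtemp(), 'demo.db')

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
writer.commit()

reader = sqlite3.connect(path)

writer.execute("INSERT INTO item VALUES (1, 'widget')")  # SQL runs right here...
before = reader.execute("SELECT count(*) FROM item").fetchone()[0]  # still 0

writer.commit()  # ...but becomes visible to others only on explicit commit
after = reader.execute("SELECT count(*) FROM item").fetchone()[0]   # now 1
```

The developer always knows when the IO happened (at the execute call) and when it became durable (at the commit) - the database, not the API, keeps the state consistent in between.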

In other words, in my view, the consistency problem is a database problem. ACID and transactions solve it. Some ORMs move the problem to the application level because of the imperfect map between the SQL API and the Object API. This creates a problem, and they have to jump through hoops to solve it. Sessions partially solve the problem the ORM created.

Perhaps I am missing something.

Massimo

Arnon Marcus

unread,
Apr 30, 2013, 6:11:56 AM4/30/13
to web...@googlegroups.com
So, let me see if I get this straight - I'll describe how I understand things, so stop me when I start speaking nonsense:
"
Web2py's DAL is stateless, and hence is not designed for an object-oriented style of data-modeling. Whenever you create an object to represent a record, you are creating a stateful scenario - you now have to manage the internal state of that object, and keep it synchronized with the rest of the transaction. By restricting the DAL from supporting such a pattern, web2py avoids a whole class of problems prevalent in ORMs that are implemented via the Active-Record pattern. Rows are not comparable to ORM result-objects, as they are not supposed to be treated as stateful - they are mere convenience-wrappers over a dictionary, providing attribute-access for getting field results. This goes well with web2py's execution model, which does not intend to be stateful, in the sense that models and controllers are 'executed' on each request, and hence Rows are not meant to be treated as objects that could persist in memory across requests. However, additional convenience functions are present on a Row object for thread-local changes, which do not update back to the current transaction automatically. An explicit update is thereafter expected from the controller code, in order to push the changes to the current database transaction.
Using the DAL, transaction creation is handled implicitly by the database itself (if there is no active transaction in the current db-connection session when the connection is used), but committing a transaction is handled explicitly by the DAL - either automatically at the end of the request, and/or manually by a call on the db-connection object. By designing it this way, web2py leaves the job of transaction management to the database itself, and avoids many issues that might arise if it tried to manage transactions itself. Given all that, the DAL's design is ill-suited for an OOP style of coding data-models, as whenever an object is instantiated it becomes stateful, and hence requires some mechanism to manage its state - even if its lifetime is bound to a single transaction and/or connection session - a mechanism that web2py's DAL does not provide, by design, in order to avoid the slew of issues that could arise from mismanaging these objects' state. The Row object may seem as though it has its own internal state-management functionality, but this is only for short-lived manipulations - for aggregating changes into a single update operation within the transaction, as a performance improvement. Its state-management is not meant to be used like it is used in ORMs.
"

Am I with you so far?
Anything I got wrong?

Michele Comitini

unread,
Apr 30, 2013, 6:50:04 AM4/30/13
to web...@googlegroups.com
I love the DAL, since I find its functional programming approach very lean, without the need to switch back and forth between OO and relational reasoning :-)

Moving from a mapping object (the row) to a complex custom object is much easier than it seems at the Python level.  ORMs, on the other hand, tend to build objects their own style and you have to live with it.  In static languages (Java, C#, C++) the ORM [anti]pattern actually seems a benefit.  In dynamic languages like Python there is no need for an ORM, but there are many, since it is easier and fun to develop them.  Implementing *your* ORM for *your* application can be done in a few lines of code, very readable and easy to maintain.

The following blog entry shares much of my idea of ORM in real applications with large 


My 4 points where I think the DAL can be (and so has to be ;-) ) improved:

1. lazy rows being backed by a sql cursor when requested by the user;
2. since point 1. is not always the solution, row object instance creation must be way faster;
3. a simple way to support db indexes even at a basic level;
4. sql view/readonly tables support (this could be related to cacheable tables).
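Point 1 could be sketched as a generator over a live cursor, using stdlib sqlite3 (the `iterselect` name and shape here are hypothetical, not an existing DAL API):

```python
import sqlite3

def iterselect(db, sql, args=()):
    """Yield one Row-like dict at a time from a live cursor, instead
    of materializing the whole result set up front."""
    cursor = db.execute(sql, args)
    fields = [c[0] for c in cursor.description]
    for raw in cursor:                 # the cursor fetches incrementally
        yield dict(zip(fields, raw))

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO item VALUES (?, ?)",
               [(i, 'item%d' % i) for i in range(1, 1001)])

rows = iterselect(db, "SELECT id, name FROM item ORDER BY id")
first = next(rows)   # only this row has been pulled from the cursor so far
```

With 1000 records in the table, only the rows actually consumed are ever instantiated, which is the memory/speed benefit the point is after.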


mic



2013/4/30 Massimo Di Pierro <massimo....@gmail.com>

Niphlod

unread,
Apr 30, 2013, 8:05:16 AM4/30/13
to web...@googlegroups.com
put them on the roadmap ^_^ as for 3. it's just a matter of finding the right syntax for every db engine. I'm not sure about 4., it's available right now with migrate=False ?

Arnon Marcus

unread,
Apr 30, 2013, 8:17:37 AM4/30/13
to web...@googlegroups.com

The following blog entry shares much of my idea of ORM in real applications with large 


I read this article about a week ago.
It is interesting, but I find 2 fundamental problems with it - it assumes premises that are framed as "either-or"-type scenarios:
 - You "either" have a relational data-model, "or" you don't.
 - You "either" use an abstraction-layer, "or" you don't.
These are idealistic premises that do not map to reality.
In reality, you have a hybrid data-model, with some areas that are more "relational" than others - all co-existing within a single database.
In reality, you almost always end up dealing with a framework using "multiple" abstraction layers, at different areas of your application, depending on how well a given use-case maps to the higher layers of abstraction that the framework provides.

Another MAJOR issue with basically every critique I have read about ORMs is that they almost EXCLUSIVELY target the Active-Record pattern.
The entire first half of the SQLA lecture I posted here talks explicitly about this, and goes into great detail describing the problems inherent in the Active-Record pattern.
Another SQLA lecture I posted here discusses exactly the reality of abstraction-layer usage that I have described, in terms of a hybrid approach.
It also shows how a layered approach has benefits that surpass the shortcomings of having to learn multiple layers (a problem that the article touches upon).
Contrary to ANY ORM out there, SQLA provides BOTH a relational layer AS WELL as an ORM layer on top, both of which use the same SQL-Expression syntax, using Python objects:
- No need for a third language like Java-Hibernate's HQL.
- No need for manual SQL writing.
- SQLA's DAL (the SQL-Expression layer) is EXPLICITLY used throughout the build-up of ORM classes by the developer, and is NOT treated as a "hidden" layer.
- SQLA's ORM is NOT using the Active-Record pattern (so basically ALL the criticisms you generally read about ORMs do NOT apply...)
- SQLA's ORM layer is OPTIONAL(!), and even if you DON'T use it, you STILL DON'T have to write SQL(!)
   You use SQL-Expressions, which are ALMOST IDENTICAL to web2py's DAL (!)

Basically, almost all of that article is completely irrelevant to SQLA, and hence to the model that I am proposing here.
As to OOP, again, it is almost exclusively criticized with respect to the Active-Record pattern (which, again, is irrelevant here...)
SQLA's ORM, and hence what I am proposing, goes in line with the article's proposal near the end, which is:
 "For relational models, go with a more direct SQL layer for using the data, and build business-logic layers on top."
Most of what SQLA's ORM provides is "automation" tools, for helping the build-up of business-logic classes "on top" of a pythonic SQL-abstraction layer. 
 

My 4 points where I think that DAL  can be (and so has to be ;-) ) improved:

1. lazy rows being backed by a sql cursor when requested by the user;
2. since point 1. is not always the solution, row object instance creation must be way faster;
3. a simple way to support db indexes even at a basic level;
4. sql view/readonly tables support (this could be related to cacheable tables).

I would add:
 5. A configurable mechanism based on the "Unit-Of-Work" pattern.  

Anthony

unread,
Apr 30, 2013, 8:42:23 AM4/30/13
to web...@googlegroups.com
This is nice, but you're still talking about very general features. It would be more helpful if you could explain why we need those features. Perhaps you could show in SQLA how you would specify and interact with a particular data model, and then explain why you think that is better/easier than how you would do it using the DAL.

Anthony

Massimo Di Pierro

unread,
Apr 30, 2013, 10:33:38 AM4/30/13
to web...@googlegroups.com
Yes. In other words: we do not try to introduce problems for the purpose of solving them. ;-)
In my opinion, the job of keeping a consistent state is with the database (and its ACID model), not with the API.

Michele Comitini

unread,
Apr 30, 2013, 11:35:09 AM4/30/13
to web...@googlegroups.com



2013/4/30 Niphlod <nip...@gmail.com>

put them on the roadmap ^_^ as for 3. it's just a matter of finding the right syntax for every db engine. I'm not sure about 4., it's available right now with migrate=False ?

4. migrate=False is on the DDL side, and you are right about that, but I'd like it also on the DML part - i.e. in PostgreSQL, doing an INSERT on a VIEW is not allowed.

Niphlod

unread,
Apr 30, 2013, 11:59:57 AM4/30/13
to web...@googlegroups.com
so, just a "read_me_only" kind of check, where if you even try to load a record in an edit form, web2py complains?
Or just complain before any insert() or update() is tried on any "read_me_only" fake table (i.e. do those always in a try:except block, or do an "assert is_not_fake" before "write" operations)?

Not sure, since I never had that requirement (and I don't have python right now), but wouldn't the same thing be accomplished by writable=False on all columns and using validate_and_update(), validate_and_insert() - as you should if you're unsure about the table model, i.e. you don't know whether it's a table or a view?
I'm a little unsure about the performance penalty of checking this kind of thing every time, when it can be prevented just by knowing that that particular collection of entities is not writable ^_^

PS: some backends allow specially coded views to be updateable and deletable......

Michele Comitini

unread,
Apr 30, 2013, 12:37:37 PM4/30/13
to web...@googlegroups.com



2013/4/30 Niphlod <nip...@gmail.com>

so, just a "read_me_only" kind of check, where if you even try to load a record in an edit form, web2py complains?
Or just complain before any insert() or update() is tried on any "read_me_only" fake table (i.e. do those always in a try:except block, or do an "assert is_not_fake" before "write" operations)?

Not sure, since I never had that requirement (and I don't have python right now), but wouldn't the same thing be accomplished by writable=False on all columns and using validate_and_update(), validate_and_insert() - as you should if you're unsure about the table model, i.e. you don't know whether it's a table or a view?
I'm a little unsure about the performance penalty of checking this kind of thing every time, when it can be prevented just by knowing that that particular collection of entities is not writable ^_^


After thinking a bit, maybe the more useful thing is to have a new property on the table to allow simple introspection:

for t in db.tables:
  if db[t].table_type == "view":
    print "%s is not writable!" % t

PS: some backends allow specially coded views to be updateable and deletable.....
PostgreSQL allows you to do the worst things by defining rules... alas, the real world is a nasty place... ;-) 





Arnon Marcus

unread,
Apr 30, 2013, 1:20:08 PM4/30/13
to
On Tuesday, April 30, 2013 3:42:23 PM UTC+3, Anthony wrote:
This is nice, but you're still talking about very general features. It would be more helpful if you could explain why we need those features. Perhaps you could show in SQLA how you would specify and interact with a particular data model, and then explain why you think that is better/easier than how you would do it using the DAL.

Well, I propose adding a set of tools for building business-model classes on top of the DAL.
I basically mean implementing a 'Unit-of-Work' and an 'identity-map', as explained in the link(s) I provided.
Now, what does this mean...

Well, here is what it does NOT mean - it should NOT lead the developer to implement an Active-Record model.
Class-attributes representing records should NOT 'save()' whenever you set them, and should NOT automatically 'load()' whenever you access them.
That said, the developer should also NOT have to manage the 'order-of-operations' by himself - this should be automated (read: 'auto-inferred from the schema').

Instead, when an attribute is set, it should mark itself as 'dirty', and be 'pushed' to the database later.
There should also be a way to tell the class-attributes which other attributes of which other classes would be affected by their being modified (SQLA calls this the 'relationship' object).
Then, there should be a procedure that uses this information, and while keeping track of what's going on inside the transaction, whenever a record is changed somehow, it 'discovers' which other records participating in the current transaction might be affected by that change to that attribute, and invalidates their caches.
The idea is that there is a mechanism that makes sure that if there are pending changes to other attributes that need to be pushed 'before' the attribute being accessed is valid, then the push operations are executed before the attribute refreshes itself. This is done by linking events between attributes and propagating them on attribute access (SQLA calls this a 'cascade').
* The process of marking relationships may be further automated, even beyond what exists in SQLA, by analyzing the db-schema itself, 'inferring' relationships bi-directionally, and wiring all the events automatically at class-instantiation time.
If there are no such pending changes, then the attribute being accessed should be assumed to be valid (so its 'cached' value is returned), unless it was previously invalidated by a transaction-commit. Every transaction-commit should invalidate all existing records that are in memory.

This is basically the 'unit-of-work' pattern (if I understood correctly).
* But you should really watch the lectures I posted - they probably explain this much better than I did...
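A toy sketch of that unit-of-work idea (illustrative only - the real SQLA implementation topologically sorts operations by relationship dependencies; here simple registration order stands in for that, and a list of strings stands in for executed SQL):

```python
class UnitOfWork(object):
    """Toy unit-of-work: attribute changes are recorded as 'dirty',
    not executed; flush() emits them in order; commit() flushes
    pending changes first, then commits."""
    def __init__(self):
        self.dirty = []   # pending (table, id, field, value) changes
        self.log = []     # statements "executed", for illustration

    def register(self, table, id, field, value):
        self.dirty.append((table, id, field, value))

    def flush(self):
        for table, id, field, value in self.dirty:
            self.log.append("UPDATE %s SET %s=%r WHERE id=%s"
                            % (table, field, value, id))
        self.dirty = []

    def commit(self):
        self.flush()               # pending changes pushed before commit
        self.log.append("COMMIT")

uow = UnitOfWork()
uow.register('person', 1, 'name', 'Alice')
uow.register('address', 7, 'person_id', 1)  # dependent change, ordered after
pending = len(uow.dirty)   # 2 changes recorded, nothing executed yet
uow.commit()               # both UPDATEs flushed, then COMMIT
```

Nothing touches the "database" until flush time, and the dependent change is guaranteed to be pushed after the change it depends on - which is the ordering/aggregation benefit being described.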

"What are the benefits?" you may ask.

Well, there are performance improvements with the caching mechanism, and with the aggregation of operations - certain change-operations are only pushed to the transaction-view when they are actually needed, or at the end of the transaction. There may also be consistency benefits, by ensuring a correct order-of-operations.
Lastly, the main benefit is automatic handling of caching and ordering of operations, which the developer no longer needs to take care of himself.
The benefit is not just for simple data-models - on the contrary, the more complex the data-model, the more this becomes beneficial, as the automatic detection of relationship-dependencies and the auto-cascade of operations can both save brain-cycles for the developer (who otherwise has to hold the whole schema in his head and make sure he pushes things in the correct order), and also prevent human errors that can mess up the database in cases where constraints are insufficient and the developer overlooked some relationship-dependency and was using stale data without knowing it.

Then there is the 'identity-map':
This is a complementary feature, needed to maintain consistency across results for the same records taken in different places in the code within the same transaction.
It makes sure that there can be one and only one instance of a record in memory within a single transaction of a single db-session.

"What are the benefits here?" you ask.

First of all, there's a consistency issue that can arise without such a system, if the developer is manually controlling the order-of-operations and makes a mistake.
This is not an issue exclusive to Active-Record patterns - it can also happen with web2py's DAL.
For example, let's take the following code:

def setItemName(id, name):
    row = db.item[id]
    row.update_record(name=name)

def getItemByName(name):
   return db(db.item.name == name).select()

id = 42
...
row = db.item[id]
row.name = 'some name' # or row['name'] = 'some name' or row.update(name='some name')
... 
setItemName(id, 'some other name')
....
getItemByName(row.name)

As you can see, the last function-call would either fail, or silently return something other than what was asked for.
The reason is that there is no singleton of the record that was handed back.
The function that pushed the item's name-change to the database did not affect the row-object representing that same record outside.

The 'identity-map' solves this by storing all rows received in a given transaction in a thread-local global variable, and re-using them based on their identity (table-name + field-name + primary-key). This also has some memory benefits, especially for large queries.
* BTW: SQLA's ORM does this; Django's ORM does not...
This feature can be implemented within the DAL itself, but it does not have to be. As long as the db-connection is paired with a thread-local pool of records, and that pool is invalidated on each commit, things will work fine. But it's an obvious fit for the db-connection object, as it can use the pool as a cache internally, and can invalidate it internally on 'commit()' calls.
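A minimal sketch of such a thread-local pool (illustrative names only, using stdlib sqlite3; not DAL internals, and keyed simply on table + primary key):

```python
import sqlite3

class IdentityMap(object):
    """One in-memory object per (table, primary key), per transaction;
    the pool is cleared whenever the transaction commits."""
    def __init__(self, db):
        self.db, self.pool = db, {}

    def get(self, table, id):
        key = (table, id)
        if key not in self.pool:
            sql = "SELECT name FROM %s WHERE id=?" % table
            name = self.db.execute(sql, (id,)).fetchone()[0]
            self.pool[key] = {'id': id, 'name': name}
        return self.pool[key]   # the same object within a transaction

    def commit(self):
        self.db.commit()
        self.pool.clear()       # invalidate the pool on commit

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO item VALUES (42, 'widget')")

imap = IdentityMap(db)
a = imap.get('item', 42)
b = imap.get('item', 42)
same = a is b            # True: one singleton per record per transaction
a['name'] = 'renamed'
seen_by_b = b['name']    # 'renamed': no stale second copy to diverge
imap.commit()
fresh = imap.get('item', 42) is a   # False: fresh object after commit
```

With this in place, the broken example above cannot diverge: every fetch of record 42 within the transaction hands back the very same object.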

Anthony

unread,
May 6, 2013, 10:47:06 PM5/6/13
to
"What are the benefits?" you may ask.

Well, there are performance improvements with the caching mechanism, and with the aggregation of operations - certain change-operations are only pushed to the transaction-view when they are actually needed, or at the end of the transaction.

I suppose there could be in some cases, but to avoid multiple updates to the same record, you can apply changes to the Row object and then call row.update_record() when finished. On the other hand, there may be cases where you want separate updates in order to trigger separate database triggers or DAL callbacks. Anyway, before investing lots of time building (and subsequently maintaining) a rather sophisticated ORM, it would be nice to know what kind of performance improvements can be expected and in what circumstances. In other words, how often do cases come up where the DAL is much less efficient than SQLA, and how difficult would it be to write equally efficient DAL code in those cases?

Lastly, the main benefit is automatic handling of caching and ordering of operations, which the developer no longer needs to take care of himself.
The benefit is not just for simple data-models - on the contrary, the more complex the data-model, the more this becomes beneficial, as the automatic detection of relationship-dependencies and the auto-cascade of operations can both save brain-cycles for the developer (who otherwise has to hold the whole schema in his head and make sure he pushes things in the correct order), and also prevent human errors that can mess up the database in cases where constraints are insufficient and the developer overlooked some relationship-dependency and was using stale data without knowing it.

Do you have an example of the above (i.e., where use of the DAL is complicated and prone to error but SQLA handles things automatically)?
 
For example, let's take the following code:

def setItemName(id, name):
    row = db.item[id]
    row.update_record(name=name)

def getItemByName(name):
   return db(db.item.name == name).select()

id = 42
...
row = db.item[id]
row.name = 'some name' # or row['name'] = 'some name' or row.update(name='some name')
... 
setItemName(id, 'some other name')
....
getItemByName(row.name)

 As you can see, the last function-call would either fail, or silently return something other than what was asked for.

So, obviously you wouldn't do it that way with the DAL, as that isn't how it works. But how important is it to have this pattern? Once you've finished manipulating the record, you can do row.update_record(), and then any subsequent select will see the changes. It seems like this would be useful only if you need to do a read in between two separate updates to the record and you need the read to see the first update (and for some reason you can't just look at the existing Row object itself). Is this a common enough use case to justify replicating the SQLA ORM on top of the DAL? Do you have a real-world example where this would be a critical feature?
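The pattern described here - aggregate changes on the Row, then push them once with update_record() - can be mimicked with stdlib sqlite3 (`Row`, `fetch_item` and `update_record` below are simplified stand-ins for illustration, not the actual DAL classes):

```python
import sqlite3

class Row(dict):
    """Dict with attribute access; edits stay local until pushed."""
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__

def fetch_item(db, id):
    name = db.execute("SELECT name FROM item WHERE id=?", (id,)).fetchone()[0]
    return Row(id=id, name=name)

def update_record(db, row):
    # one UPDATE pushes all locally aggregated changes at once
    db.execute("UPDATE item SET name=? WHERE id=?", (row.name, row.id))

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO item VALUES (1, 'old')")

row = fetch_item(db, 1)
row.name = 'new'   # local change only; the database still says 'old'
in_db_before = db.execute("SELECT name FROM item WHERE id=1").fetchone()[0]
update_record(db, row)  # explicit push; subsequent selects see the change
in_db_after = db.execute("SELECT name FROM item WHERE id=1").fetchone()[0]
```

That is, as long as you look at the Row object itself until the explicit push, there is no stale-read window to worry about.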

Anthony

Derek

unread,
Apr 30, 2013, 1:57:45 PM4/30/13
to web...@googlegroups.com
Agreed, you'd never do it this way. First of all, you have the ID already, so why would you use 'getItemByName'? Second, getItemByName is flawed: it will return a set, not a single row. Third, if you are selecting anything before you perform an action on it, you should at least verify that you have records to work on. Fourth, if 'setItemName' is an internal function, then you wouldn't want it to call update_record, since you may have other updates that need to happen.

Developer stupidity is a problem, but you aren't going to solve that with code.



Anthony

unread,
Apr 30, 2013, 2:03:01 PM4/30/13
to web...@googlegroups.com
And keep in mind that the ORM alone in SQLA is over 20,000 lines of code (by contrast, the DAL is now around 10,000 loc). Even if we could get away with a modest fraction of that, this would be a significant undertaking. So, we really need compelling evidence that there are common use cases where this would make a notable difference.

Also, some of these features may not require an ORM per se -- it may be possible to add some of this functionality within the context of the DAL.

Anthony

Arnon Marcus

unread,
Apr 30, 2013, 3:00:12 PM4/30/13
to web...@googlegroups.com
I admit it is a very silly example; I didn't give much thought to it - I just looked for something to exemplify a problem that might occur when no identity-mapping exists. The fact that I could not find a better example is not evidence that such scenarios don't (or can't) occur.

There were better examples in one of the talks I posted here...

Arnon Marcus

Apr 30, 2013, 3:27:33 PM4/30/13
to web...@googlegroups.com
Here is a really awesome talk about SQLA's Core/ORM dichotomy, featuring diagrams that give an overview of the architecture of the ORM and of how it implements the Unit-of-Work pattern:


If you take into account what's said in this talk - that basically the ORM and the CORE in SQLA are 2 separate beasts - then it follows what I've been saying here, that the web2py-DAL is equivalent just to the CORE, and has no features that the ORM provides whatsoever.
That leaves you with 20K lines of code that are not represented at all inside the 10K lines of code of web2py's DAL. I think this would be a solid assertion.

That means that it would be a substantial undertaking to replicate the SQLA-ORM over web2py.

But it also means something else - that saying that its benefits are questionable is akin to saying that 20k lines of code of the coolest ORM in existence are a useless piece of software...
Useless in the sense that it takes a lot and gives very little (if anything).
Somehow I doubt that assertion...

Lastly, what this may all mean is that the SQLA-ORM may be re-tooled to be layered on top of web2py's DAL instead of its CORE.
This should be especially easier using what he called in the talk the "classical mapping" pattern, in which your domain-model classes and your CORE/DAL objects are built up completely separately, and a third, separate "mapper" object takes care of "instrumenting" your domain-classes with the CORE/DAL objects, to essentially build up and form the ORM out of merging the two via what he called "formal monkey-patching" (by which I assume he means some sort of dependency-injection).

Given all that, all you would need to do is sub-class the "mapper" and refactor it to instrument domain-classes using web2py's DAL instead.
This may be substantially less of an undertaking, and would basically achieve the same thing. Since these are both open-source projects, both written in pure python, it should be at the very least "possible" to do something like that.

I had seen past experiments doing the opposite - mapping the DAL to layer on top of SQLA models - but nowhere was it suggested to do it the other way around with SQLA's ORM... I think it is an even more logical route to take, as you don't need SQLA's CORE - we already have a full equivalent for that...

Arnon Marcus

Apr 30, 2013, 4:52:07 PM4/30/13
to
Hi Massimo,
You should "really" watch this lecture:

I think you will like it, even if you don't like ORMs.
It shows some architectural diagrams that you might find interesting, for both the Core layer and ORM layer of SQLAlchemy.

I think this would make clear the dichotomy between the two.

As the lecture shows, the Core and the ORM are actually separate code-bases (though the ORM is built on top of Core), so that people can choose not to use the ORM at all if they don't want to, and what they will be left with is basically equivalent to what web2py provides - a DAL.

Since these are separate layers, I think what might be an interesting research-project is not re-inventing the wheel, but instead trying to layer SQLA's ORM on top of web2py's DAL. The past experiments I've seen were trying to do the opposite, but I actually think that this makes more sense.

The modern way of generating an ORM class is called the "declarative-mapping" approach, in which you build the schema right into the python classes ("declaratively"). This would be way too difficult to re-instrument.

However, there still exists the 'older', more "dis-jointed" way of mapping, which they call "classical-mapping"; it maps a completely regular python class to a 'Core' schema-object, and "generates" an ORM-class out of the two using a third object called a "mapper".

It is basically taking a regular python class, and a schema that was built using Core (in an almost identical fashion to how it is done in the web2py-DAL), and using a separate mapper-object to 'instrument' the schema into the class, thus generating an ORM class out of the merge of the two.

So I was thinking that an interesting research-project might be to just try to sub-class the mapper-object and write a web2py-DAL-mapper variation that basically generates the same ORM, using the same kind of instrumentation, but using the web2py-DAL schema-object instead of the SQLA-Core one.
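As a rough illustration of that classical-mapping idea - all names here are hypothetical, and this is neither SQLAlchemy's nor web2py's actual API - a plain domain class, a separate schema object, and a third "mapper" function that instruments the class:

```python
# Hypothetical sketch of "classical mapping": domain class and schema are
# defined separately; a mapper injects the schema into the class.
class Schema:
    def __init__(self, tablename, *fields):
        self.tablename = tablename
        self.fields = fields

def map_class(cls, schema):
    # The "formal monkey-patching" step: instrument the plain class with
    # schema-derived attributes (a crude analogue of a classical mapper).
    cls._schema = schema
    for field in schema.fields:
        setattr(cls, field, None)  # placeholder for column descriptors
    return cls

class User:  # a completely plain domain class, no base class required
    def greet(self):
        return "hello, %s" % self.name

map_class(User, Schema("users", "id", "name"))
u = User()
u.name = "Arnon"
```

The design point is that `User` carries only domain logic; everything schema-related arrives from outside, so in principle the same mapper step could be fed a web2py-DAL table definition instead of a Core one.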

What do you say?
Is that feasible?

Massimo Di Pierro

Apr 30, 2013, 5:06:33 PM4/30/13
to web...@googlegroups.com
I will watch it asap. Anyway, I am not trying to discourage you or oppose it. If you have the resources and want to start building something like this, you have all my support. Some people may love it. What I cannot promise is that it will be included in web2py until I see a real benefit.

Derek

Apr 30, 2013, 5:57:34 PM4/30/13
to web...@googlegroups.com
Have you not tried just importing sqla in your 0.py model, and writing your models and code as you see fit? You can certainly bypass the DAL if you want.

Anthony

Apr 30, 2013, 6:18:50 PM4/30/13
to web...@googlegroups.com
If you take into account what's said in this talk - that basically the ORM and the CORE in SQLA are 2 separate beasts - then it follows what I've been saying here, that the web2py-DAL is equivalent just to the CORE, and has no features that the ORM provides whatsoever.

I'm not sure that follows. The web2py DAL has a lot of features that might otherwise be found in an ORM. I'm not very familiar with SQLA, but I suspect the DAL has some features not present in CORE but similar to functionality included in the ORM.
 
But it also means something else - that saying that its benefits are questionable is akin to saying that 20k lines of code of the coolest ORM in existence are a useless piece of software...

Anything is possible. Feel free to make the case with some real examples.

You might also consider trying SQLA directly in place of the DAL, and maybe using something like WTForms if you need more forms automation.

Anthony

Arnon Marcus

Apr 30, 2013, 6:56:54 PM4/30/13
to web...@googlegroups.com
On Tuesday, April 30, 2013 2:57:34 PM UTC-7, Derek wrote:
Have you not tried just importing sqla in your 0.py model, and writing your models and code as you see fit? You can certainly bypass the DAL if you want.

I know that, but well, you see, there lies the problem - I DON'T want... I love the DAL too much - wouldn't change it for anything! :) 
(not even SQLA-Core...;) )
I even considered using web2py's DAL outside of web2py as well - in all the plug-ins I plan to write for desktop-applications. It's an amazing and elegant piece of software!

Besides, it's not feasible for us anymore anyway; as I said, we already have thousands of lines of code built using the DAL - switching it to something else would be a nightmare, and way too costly. We have more DAL-based code than any other python code - hell, we would switch web-frameworks before we consider replacing the DAL... :)

All I want is a decent ORM on top, to structure all that wonderful DAL code we have... But it has to be stateful to be worth it, so I couldn't write my own, and the SQLA-ORM is the best ORM in python I currently know of...
I am looking at Storm also, but it currently seems to be less modular than SQLA... That's from just a glance, though...

Arnon Marcus

Apr 30, 2013, 7:05:25 PM4/30/13
to web...@googlegroups.com
On Tuesday, April 30, 2013 3:18:50 PM UTC-7, Anthony wrote:
If you take into account what's said in this talk - that basically the ORM and the CORE in SQLA are 2 separate beasts - then it follows what I've been saying here, that the web2py-DAL is equivalent just to the CORE, and has no features that the ORM provides whatsoever.

I'm not sure that follows. The web2py DAL has a lot of features that might otherwise be found in an ORM. I'm not very familiar with SQLA, but I suspect the DAL has some features not present in CORE but similar to functionality included in the ORM.

I wouldn't be so sure about that...
You should really check out the links I've posted in this thread - SQLA-Core is a fully-fledged DAL. Reddit is using it alone, without any of the ORM level...
 
And again, it may be a matter of semantics, but there is no ORM feature in the DAL - I've been going over the documentation and asking lots of questions, talking to Massimo - it's pretty conclusive - web2py's DAL is exclusively stateless...

You might also consider trying SQLA directly in place of the DAL

<shameless copy-paste from my answer to Derek:>

Anthony

Apr 30, 2013, 10:07:10 PM4/30/13
to

If you take into account what's said in this talk - that basically the ORM and the CORE in SQLA are 2 separate beasts - then it follows what I've been saying here, that the web2py-DAL is equivalent just to the CORE, and has no features that the ORM provides whatsoever.

I'm not sure that follows. The web2py DAL has a lot of features that might otherwise be found in an ORM. I'm not very familiar with SQLA, but I suspect the DAL has some features not present in CORE but similar to functionality included in the ORM.

I wouldn't be so sure about that...
You should really check out the links I've posted in this thread - SQLA-Core is a fully-fledged DAL. Reddit is using it alone, without any of the ORM level...

Well, I did say I "suspect", not that I'm sure, but does it have things like migrations, automatic file uploads/retrieval, recursive selects, automatic results serialization into HTML, virtual fields, computed fields, validators, field representations, field labels, field comments, table labels, list:-type fields, JSON fields, export to CSV, smart queries, callbacks, record versioning, common fields, multi-tenancy, common filters, GAE support, and MongoDB support?
 
And again, it may be a matter of semantics, but there is no ORM feature in the DAL - I've been going over the documentation and asking lots of questions, talking to Massimo - it's pretty conclusive - web2py's DAL is exclusively stateless...

I didn't say there were ORM features in the DAL, just that it includes features that you might otherwise expect to find in an ORM (e.g., something like virtual fields). In other words, some of what you get with that 20,000+ lines of ORM code might be functionality that is in fact available in the web2py DAL.

Also, note that it might be the case that some of the features of the SQLA ORM that you find appealing don't necessarily require an ORM pattern per se but could possibly be implemented within the context of a DAL (i.e., just because SQLA does it via an ORM doesn't mean that's the only way).
 
You might also consider trying SQLA directly in place of the DAL

I know that, but well, you see, there lies the problem - I DON'T want... I love the DAL too much - wouldn't change it for anything! :) 
(not even SQLA-Core...;) )

But if you use an ORM built on top of the DAL, you won't be using the DAL API anyway, so what would you miss? Or are you saying you would still want to use the DAL for a significant portion of code and only move to the ORM for select parts of the application?

Anyway, it may nevertheless be a useful exercise to start by using SQLA for some project just to see if it really delivers on the promises you believe it is making.

Anthony

Arnon Marcus

May 1, 2013, 10:15:11 AM5/1/13
to web...@googlegroups.com
 
I didn't say there were ORM features in the DAL, just that it includes features that you might otherwise expect to find in an ORM 

Well, it seems like a semantic issue. DAL and ORM are pretty abstract terms.
Here is how I interpret them:
DAL - A way to construct schemas and queries without writing SQL or DBAPI calls.
ORM - A way to construct domain-models using a DAL in a stateful manner.

The DAL is a way of saying:
"Hey, here's a bunch of objects and methods, please generate SQL out of them, send it to the database for me and give me results"

The ORM is a way of saying:
"Hey, here's a bunch of classes and attributes, please wire them up so their instances would communicate their state to each other, optimizing my transaction-operations for me as I use them"

Generally, as Massimo confirmed, the DAL is purely stateless.
It only returns dictionary-like immediate-results.
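The stateless/stateful contrast can be made concrete with a minimal identity-map sketch (all names hypothetical, nothing SQLA-specific): the session hands back the same live object for the same key, so separate lookups see each other's in-memory changes before anything is flushed to the database.

```python
# Minimal identity-map sketch: the "stateful" half of the contrast above.
class Session:
    def __init__(self, backend):
        self.backend = backend   # stand-in for the database
        self.identity_map = {}   # primary key -> the one live object

    def get(self, key):
        if key not in self.identity_map:
            # First access: "query" the backend and cache the object.
            self.identity_map[key] = dict(self.backend[key])
        return self.identity_map[key]

db = {1: {"id": 1, "name": "widget"}}
session = Session(db)
a = session.get(1)
b = session.get(1)       # no second query: the same object comes back
a["name"] = "gadget"     # b sees the change too, before any flush
```

A stateless DAL, by contrast, would re-query on every select and return independent row copies, so `a` and `b` would be different objects.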


...migrations, automatic file uploads/retrieval, recursive selects, automatic results serialization into HTML, virtual fields, computed fields, validators, field representations, field labels, field comments, table labels, list:-type fields, JSON fields, export to CSV, smart queries, callbacks, record versioning, common fields, multi-tenancy, common filters, GAE support, and MongoDB support?


That's a mouthful...
Let's break it down, shall we?

Multi-Tenancy, Common-Filters, Smart-Queries:
These are SQL-related features - meaning, DAL-features, not ORM ones. 

Common Fields, Automatic-Migrations, CSV/HTML/XML-Exports:
These are schema-related features - meaning, DAL/framework-features, not ORM ones.

Labels, Comments:
These are schema-metadata-related features - meaning, DAL-features, not ORM ones.

GAE/MongoDB: 
Target database-support is a low-level DAL-feature - The DAL may or may not support specific targets - but it's not an ORM feature.

JSON/List fields:
These are database-related features - they are adapters for data-types that may or may not be supported in your target-database.
The DAL may or may not support them, but they are not ORM features either way.

Validators, Upload/Retrieval, Record-Versioning, Callbacks, Record-Representations:
These are neither DAL nor ORM features - they are framework-features. They are input/output adapters.
It is a way of saying: "Hey, when you get results back, run them through these operations".

Virtual/Computed fields:
These are kinda-tricky to classify.

Computed-Fields are for automating input-transformations. They are a way of saying:
"Hey, take these values that I'm already inserting to these other fields, run them through this function, and store the result in that other field."

Virtual-Fields are for automating output-transformations. They are a way of saying:
"Hey, take these values that I'm already getting from these other fields, run them through this function, and produce the results as that other field".

The distinctions between these features and the ORM-equivalent ones are quite subtle and elusive, but profound.

The first difference is of scope - Virtual/Computed-fields can only be applied to other fields of the same Table.
In (some) ORMs they are not, because an ORM class does not necessarily have to be a 1:1 representation of a table.
The whole point of an ORM is to be able to construct "domain-models", not mere class-forms of Table-descriptions.
In the DAL, Virtual/Computed-fields can NOT generate implicit calls to foreign-table-fields.

The second difference is one of statelessness vs. statefulness - the DAL is stateless, so it cannot give values from a previous query.
ORMs are stateful in nature, so:
- For output, Virtual-fields can use values already stored in those other fields' cache, and not even query the database.
- For input, Computed-fields can use values already stored in those other fields' cache, and not even insert them into the database.
The DAL is stateless in nature, so:
- For output, Virtual-fields must query values from those other fields, in order to invoke the functionality of the automated output.
- For input, Computed-fields must insert values into those other fields, in order to invoke the functionality of the automated input.
* There is also cache for the DAL, but it's time-based, and not transaction-based.

As for Lazy-virtual-fields, they are not the sort of laziness that an ORM has - it is deferred execution of the automation that is defined to run on the records after the query has returned. In ORMs, there are deferred queries for laziness.
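The input-vs-output distinction drawn above can be sketched with plain functions over dicts (hypothetical helpers, not web2py's actual `compute`/`Field.Virtual` API): a computed field is applied and stored at insert time, while a virtual field is derived at read time and never stored.

```python
# Computed field: input transform, applied and stored at insert time.
def insert_with_computed(table, row, compute):
    row = dict(row)
    for field, fn in compute.items():
        row[field] = fn(row)       # stored alongside the other fields
    table.append(row)
    return row

# Virtual field: output transform, derived at read time, never stored.
def select_with_virtual(table, virtual):
    out = []
    for stored in table:
        row = dict(stored)
        for field, fn in virtual.items():
            row[field] = fn(row)   # exists only on the returned rows
        out.append(row)
    return out

table = []
insert_with_computed(table, {"net": 100, "tax": 20},
                     {"gross": lambda r: r["net"] + r["tax"]})
rows = select_with_virtual(table, {"label": lambda r: "total: %s" % r["gross"]})
```

After the insert, "gross" is physically in the stored record, while "label" only ever appears on the rows handed back by the select.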

Recursive-Selects:
The documentation on this feature is not clear - it seems that it generates queries on your behalf, but it is only useful for single-record queries, as it's like an Active-Record pattern - it doesn't do any transaction-level operation-optimizations within loops (as there are no deferred queries).
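A toy version of that behaviour (hypothetical helper, not the DAL's actual recursive-select machinery): accessing a reference field triggers one extra single-record query on the foreign table, Active-Record style, with no batching across a loop.

```python
# "persons" is the foreign table; the dog record holds a reference
# field "owner" pointing into it.
persons = {1: {"id": 1, "name": "Alice"}}
dogs = [{"id": 10, "name": "Rex", "owner": 1}]
queries = []  # record every follow-up query issued

def follow(ref_table, ref_id):
    # Each access to a reference field costs one extra single-record query.
    queries.append(ref_id)
    return ref_table[ref_id]

dog = dogs[0]
owner_name = follow(persons, dog["owner"])["name"]
```

Inside a loop over many dogs, this pattern issues one follow-up query per row, which is exactly the lack of transaction-level optimization being described.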


But if you use an ORM built on top of the DAL, you won't be using the DAL API anyway, so what would you miss? Or are you saying you would still want to use the DAL for a significant portion of code and only move to the ORM for select parts of the application?

As many SQLA talks explain, the use of an ORM is rarely a viable "replacement" for a DAL - it is a supplement.
It's not just a matter of domain-modeling being unneeded or unfit for some use-cases - rather, a DAL should be used within domain-model classes to define complex queries.
Good ORMs do not implement an Active-Record pattern (at least not exclusively), so an ORM should not be used as a schema-modeler only (or at all), but as a class for defining higher-level constructs with attributes derived from complex queries.

 
Anyway, it may nevertheless be a useful exercise to start by using SQLA for some project just to see if it really delivers on the promises you believe it is making.

Or read the documentation and some SQLA forums... :) 

Anthony

May 1, 2013, 12:35:44 PM5/1/13
to web...@googlegroups.com

I didn't say there were ORM features in the DAL, just that it includes features that you might otherwise expect to find in an ORM 

Well, it seems like a semantic issue. DAL and ORM are pretty abstract terms.
Here is how I interpret them:
DAL - A way to construct schemas and queries without writing SQL or DBAPI calls.
ORM - A way to construct domain-models using a DAL in a stateful manner.

I don't think you are understanding me, so let me try to be more clear. Let's say an ORM is a particular design pattern for modeling data, and a DAL is a different design pattern for modeling data. Each of those different design patterns can nevertheless be used to implement similar types of features. For example, you might want to query the database and return a results set. This can be done in an ORM, and it can be done in a DAL. The implementation and the syntax will be different in each case, but they are both implementing a common feature.

So, when I say the DAL implements features that might otherwise be found in a typical ORM, I am not saying the DAL implements an ORM design pattern, just that it replicates functionality for which you might otherwise use an ORM. For example, in an ORM, you can define a method in a class that returns a value calculated from the fields of a database record. In the web2py DAL, this same functionality can be achieved using a virtual field or lazy field. I don't know if the SQLA CORE has virtual fields, but if it doesn't, I would suppose it leaves this kind of functionality to the ORM.

The point is, many features found in DAL's and ORM's are not unique or specific to the DAL or ORM design pattern. Each design pattern can be used to implement many common types of functionality (the functionality may not be identical, but at least similar, and used to satisfy the same goals).

The ORM is a way of saying:
"Hey, here's a bunch of classes and attributes, please wire them up so their instances would communicate their state to each other, optimizing my transaction-operations for me as I use them"

That looks like the definition of the SQLA ORM, not ORM's in general.

...migrations, automatic file uploads/retrieval, recursive selects, automatic results serialization into HTML, virtual fields, computed fields, validators, field representations, field labels, field comments, table labels, list:-type fields, JSON fields, export to CSV, smart queries, callbacks, record versioning, common fields, multi-tenancy, common filters, GAE support, and MongoDB support?

That's a mouthful...
Let's break it down, shall we?

No, let's not. My point is not that any of those items properly belong to either a DAL or an ORM, or that they can only be implemented with either a DAL or an ORM design pattern. Rather, you had claimed that the SQLA CORE is equivalent to the web2py DAL and that all 20,000+ lines of SQLA ORM code must therefore be providing unique functionality not available in the DAL (thus implying that the ORM must be useful). I was just suggesting that the DAL might be doing more than the SQLA CORE (at least in some areas), and that the DAL might possibly be offering some features for which you would otherwise need the SQLA ORM.

Regarding all the features you claim are inherently DAL features and not ORM features, I disagree. Any one of those features could rightly be part of either a DAL or an ORM. They are simply features you might want to implement within any data modeling abstraction, whatever the design pattern.
 
The first difference is of scope - Virtual/Computed-fields can only be applied to other fields of the same Table.

No, they can also be applied to the results of joins (not sure if that's typically as easy to do in an ORM) -- see http://web2py.com/books/default/chapter/29/06#Old-style-virtual-fields.
 
In the DAL, Virtual/Computed-fields can NOT generate implicit calls to foreign-table-fields.

Yes, they can with recursive selects.
 
The second difference is one of statelessness vs. statefulness - the DAL is stateless, so it cannot give values from a previous query.
ORMs are stateful in nature, so:
- For output, Virtual-fields can use values already stored in those other fields' cache, and not even query the database.
- For input, Computed-fields can use values already stored in those other fields' cache, and not even insert them into the database.
The DAL is stateless in nature, so:
- For output, Virtual-fields must query values from those other fields, in order to invoke the functionality of the automated output.
- For input, Computed-fields must insert values into those other fields, in order to invoke the functionality of the automated input.
* There is also cache for the DAL, but it's time-based, and not transaction-based.

I'm not quite sure what you mean here. Even in an ORM, in order to calculate the value of a virtual field, you first have to retrieve the input field values from the database; and when creating a computed field, you still ultimately have to write the computed value to the database.

Also, be careful not to confuse ORM's in general with the SQLA ORM in particular -- some properties you claim for ORM's are unique to SQLA, and they are not necessarily properties that can be implemented only within the ORM design pattern (i.e., they are not limited to ORM's).
 
Anyway, it may nevertheless be a useful exercise to start by using SQLA for some project just to see if it really delivers on the promises you believe it is making.

Or read the documentation and some SQLA forums... :) 

Which I'm assuming you have not yet done, as you have yet to show an example of something you can do easily in the SQLA ORM but find difficult or inefficient in the DAL. A real project might help to surface some compelling use cases and provide benchmarks for possible efficiency gains.

Anthony
 

Derek

May 1, 2013, 1:36:35 PM5/1/13
to web...@googlegroups.com
You don't get to define terms any way you see fit. DAL and ORM have specific meanings.

DAL is a TLA (three letter acronym) for "Database Abstraction Layer".
ORM is a TLA for "Object Relational Mapping".

So, what does a DAL do? Wikipedia tells us that it "... is an application programming interface which unifies the communication between a computer application and databases such as SQL Server, DB2, MySQL, PostgreSQL, Oracle or SQLite. Traditionally, all database vendors provide their own interface tailored to their products which leaves it to the application programmer to implement code for all database interfaces he or she would like to support. Database abstraction layers reduce the amount of work by providing a consistent API to the developer and hide the database specifics behind this interface as much as possible. There exist many abstraction layers with different interfaces in numerous programming languages."

What does an ORM do? Wikipedia tells us that it "... is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language."

So, the two terms are not mutually exclusive, but they handle different domains. It may be interesting to have an ORM on top of the DAL, but I personally feel that creating YAORM (Yet Another Object Relational Mapping) is counter-productive, especially when you could bypass the DAL and just use SQLA, which you yourself say is the best ORM there is.

Now, perhaps what may be beneficial is to separate the DAL from the HTML generation and data validation logic. That way, you could plug in SQLA and yet your SMARTGRID and FORMs would work with all the bells and whistles.

Michele Comitini

May 1, 2013, 5:07:43 PM5/1/13
to web...@googlegroups.com
Why not write a driver for SQLA that speaks DAL instead of a sql dialect?



Cliff Kachinske

May 1, 2013, 5:27:02 PM5/1/13
to web...@googlegroups.com
I started doing this stuff hand hacking SELECT, INSERT, DELETE and UPDATE statements and feeding them into MySQL via PHP.  It was tedious, boring and error prone.

This is probably not a typical background, but it is the one I have.

Given my experience, the Web2py DAL feels great.  The whole concept behind it is very straightforward: get a request from the controller, manipulate the data and pass a response back.

Also, working with the DAL is more akin to configuration than it is to programming.  It's a little faster and way less error prone.  

In my brief encounters with ORMs, they have seemed squishy and obtuse; overly complicated.

So I don't see the value of fitting an ORM layer on top of the DAL.  But some might.  I would propose that the best way to get others on board would be to channel the energy being burned on this thread into an implementable design or even a set of specific software requirements or pseudo code.




On Saturday, April 27, 2013 9:18:45 AM UTC-4, Arnon Marcus wrote:

Bottom line, my current feeling about this issue is that there is still something missing in web2py to facilitate the construction of higher levels of abstraction that are more focused on business-logic than on database-schema. I also feel that there are "dogmatic" sentiments being thrown from both sides of the fence in this flame-fest fiasco.
I think this hurts us - a lot.

I think a more constructive approach would be to acknowledge that there are different use-cases that can benefit from different approaches, and that this leads to opposing opinions regarding the trade-offs being sought after.

I think that web2py has taken an approach that is still too narrow-minded when it comes to supporting multiple-approaches, and that a layered-design could be beneficial here.

Case in point, the philosophy and design of SQLAlchemy:

Now, just to be clear, I think that the web2py-DAL's API is much cleaner, simpler, and more easy and fun to use than SQA's API, at least for the SQL-Expression layer. But I also think that SQA's is a more flexible approach - it can target a more varied set of use-cases.

Contrary to most of what I've read about SQA in this group, its ORM is NOT mandatory, nor is it "necessarily" a more restrictive or less performant way of using the database. I think most criticisms of it I've seen here are ill-informed, and have a somewhat "prima-facie" smell to them. They mainly attack the ORM concept in its Active-Record form, which is NOT what SQA has. They also don't consider the layered architecture of SQA, and compare the DAL with different implementations of ORMs that ARE more restrictive and obtuse, "assuming" that this is probably what SQA does, when in fact it is not. They compare the DAL to SQA's ORM, which is a fundamental mistake(!) The DAL is not comparable to SQA's ORM layer, but to its SQL-Expression layer(!) These APIs are almost identical in what they do - only the interface is different. So I could very well refer to SQA's SQL-Expression layer as being SQA's DAL.

SQA's ORM is something different, which to my knowledge web2py is lacking. It is a layer "on-top" of its DAL, while "using" its DAL, as well as lower layers, in an integrated fashion.
In short, SQA is NOT a single-layer ORM (!)

Now, what I think is missing is something comparable to SQA's ORM in web2py - another layer on top of its DAL. Not a set of classes, but a single abstract class meant to be inherited from. There are many operations that can be automated when constructing an ORM layer on top of web2py's DAL, using the DAL's already-existing schema information. These could benefit the construction of ORM classes using the DAL.
Examples for such benefits could be seen in the second part of this talk:
* This is an almost 3-hour talk from PyCon 2013, covering most aspects of SQA that even the most experienced users of it might be unfamiliar with - it is fast-paced and highly condensed, and well-worth the time to sit through.

These automations, I predict, would emerge as recurring patterns for people trying to implement such a thing in web2py, and what I am looking for is an integration abstract-layer with tools (in the form of abstract methods) that would facilitate the design and implementation of ORM classes using the DAL.

I don't have a clear conception of what such a thing would look like; I just have a general idea that it would be most beneficial in many use-cases for which the DAL by itself is insufficient.

I feel that the fear of having such a layer "restrict" future development by locking down the business-data-model is unjustified. Refactoring can still occur later on, and should occur with minimal effort by the developer using web2py - the framework should support and facilitate these kinds of refactoring scenarios, with appropriate auxiliary methods in the form of adapter/proxy design-patterns. If the ORM abstract-class's design is built up-front using such patterns, then this would facilitate refactoring, and thus avoid scaring off developers from using it.
It should also facilitate performance optimizations, by transactionally aggregating queries and updates/inserts automatically.
Optionally, this abstract class would be meant to be multiply-inherited-from, using the mix-in pattern. This way, a developer can have a separate class hierarchy dealing with pure business logic, to keep separation-of-concerns between database usage and domain logic, and also to enable re-use of code for other DALs if needed (i.e., in a Python client that talks to web2py via RPC).

These are all just ideas, and perhaps I am missing some things - maybe such capabilities already exist in some other form that I am unaware of.
I would appreciate any references to such uses, if they already exist.

(e.g. - I am not too clear about lazy-tables and computed-fields, but I think they are an interesting start in this direction - I just haven't used them yet, as we are still using an older version of web2py that does not include them)

Arnon Marcus

unread,
May 1, 2013, 6:02:38 PM5/1/13
to web...@googlegroups.com
I am not re-defining terms - I understand them correctly.
An ORM is a "Mapping" between "Objects", and "Relations".

Here are the parts of the Wikipedia pages that are actually relevant to this discussion (important parts in bold-text):

ORM:
"...Data management tasks in object-oriented (OO) programming are typically implemented by manipulating objects that are almost always non-scalar values. For example, consider an address book entry that represents a single person along with zero or more phone numbers and zero or more addresses. This could be modeled in an object-oriented implementation by a "Person object" with attributes/fields to hold each data item that the entry comprises: the person's name, a list of phone numbers, and a list of addresses. The list of phone numbers would itself contain "PhoneNumber objects" and so on. The address book entry is treated as a single object by the programming language (it can be referenced by a single variable containing a pointer to the object, for instance). Various methods can be associated with the object, such as a method to return the preferred phone number, the home address, and so on."

The first thing to notice here, is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects. There is no feature in web2py that generates such an object.

The second thing to notice here is that the attributes of an ORM object usually contain child-objects (plural) that represent fields from a different table than the parent-object. Again, there is no feature in web2py that can generate such an object. A JOIN operation may return row objects, each of which may contain sub-attributes that hold A SINGLE field-value from a foreign table, but it is a scalar value - NOT another domain-entity object (with its own attributes, etc.), NOR a SEQUENCE of domain-entity objects.

Here are some diagrams that present it really well:

The crucial thing to notice here is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects, which themselves may contain links to other objects/sequences-of-objects, etc.
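The address-book example from the quoted text can be sketched directly: flat relational rows get mapped into an object graph whose attributes are non-scalar - a Person holds a list of PhoneNumber objects, not foreign-key values. This is a toy illustration of the mapping concept, not any particular library's API.

```python
# Flat relational rows, as a DAL select might return them (dicts here):
person_rows = [{"id": 1, "name": "Ada"}]
phone_rows = [
    {"id": 10, "person_id": 1, "number": "555-0001", "preferred": True},
    {"id": 11, "person_id": 1, "number": "555-0002", "preferred": False},
]

class PhoneNumber:
    def __init__(self, row):
        self.number, self.preferred = row["number"], row["preferred"]

class Person:
    def __init__(self, row, phones):
        self.name = row["name"]
        self.phones = phones  # NON-SCALAR attribute: a list of child objects

    def preferred_phone(self):  # a method associated with the object
        return next(p.number for p in self.phones if p.preferred)

# The "mapping" step: relations -> object graph
people = [
    Person(pr, [PhoneNumber(ph) for ph in phone_rows
                if ph["person_id"] == pr["id"]])
    for pr in person_rows
]
print(people[0].preferred_phone())  # → 555-0001
```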


As for DAL, here is the part of the Wikipedia page that is relevant to this discussion:

"...Popular use for database abstraction layers are among object-oriented programming languages, which are similar to API level abstraction layers. In an object oriented language like C++ or Java, a database can be represented through an object, whose methods and members (or the equivalent thereof in other programming languages) represent various functionalities of the database. They also share the same advantages and disadvantages as API level interfaces."

As you can see, even Wikipedia says that there is more to a DAL than just SQL translation.

Here is another usage of the same "Three Letter Acronym" (DAL), that represents how an ORM is layered on-top of a DAL:

Obviously, THIS DAL is not an abstraction layer but an access layer - but it could very well be substituted by an abstraction layer that does the same thing.

Granted, this is a different form of an ORM, directly mapping class attributes to table fields, but in principle it is the same - an ORM is a layer on top of a DAL, that uses a DAL.



villas

unread,
May 1, 2013, 7:02:05 PM5/1/13
to web...@googlegroups.com
I am not as learned in these things but it seems to me that a DAL is a small black box,  whilst an ORM is a larger black box.

If I look inside the DAL 'black box',  I can just about figure out what's going on.  If I look inside an ORM black box,  it is already too complex (for me).

If we add the two black boxes together,  it would only be maintainable by someone of Massimo's skill level (and even he thought it was too complex for his needs,  hence the DAL!).  I cannot imagine that anyone would commit themselves to such a project. 

Perhaps we should try to list some of the benefits?  Otherwise we shall remain theorists in pursuit of a hypothetical abstraction - which never looks good on a CV.

Anthony

unread,
May 6, 2013, 9:08:38 AM5/6/13
to

The first thing to notice here, is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects. There is no feature in web2py that generates such an object.

The second thing to notice here, is that the attributes of an ORM object usually contain child-objects (plural) that represent fields from a different table than the parent-object. Again, there is no feature in web2py that can generate such an object. A JOIN operation may return row objects, each of which may contain sub-attributes that hold A SINGLE field-value from a foreign-table, but it is a scalar-value  - NOT another domain-entity-object (with it's own attributes, etc.), NOR a SEQUENCE of domain-entity objects 

...
 
The crucial thing to notice here is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects, which themselves may contain links to other objects/sequences-of-objects, etc.

Although ORM's may do that, such a feature is not unique to the ORM pattern. In the web2py DAL, for example, in a Row object with a reference to another table, the reference field is actually a DAL.Reference object, not a single scalar value from a database record (it includes the scalar value but also allows access to related records in the referenced table). Similarly, a Row object from a table that is referenced by another table includes an attribute that is a DAL.LazySet object, which allows access to the records in the referencing table that reference the current Row object. The DAL also has list:-type fields, whose values are lists, including lists of DAL.Reference objects in the case of list:reference fields. Row objects can also include custom methods (i.e., "lazy" virtual fields) as well as virtual fields, which can contain complex objects.

Anthony


Arnon Marcus

unread,
May 1, 2013, 8:16:50 PM5/1/13
to


Well, it seems like a semantic issue. DAL and ORM are pretty abstract terms.
Here is how I interpret them:
DAL - A way to construct schemas and queries without writing SQL or DBAPI calls.
ORM - A way to construct domain-models using a DAL in a stateful manner.

I don't think you are understanding me, so let me try to be more clear. Let's say an ORM is a particular design pattern for modeling data, and a DAL is a different design pattern for modeling data. Each of those different design patterns can nevertheless be used to implement similar types of features. For example, you might want to query the database and return a results set. This can be done in an ORM, and it can be done in a DAL. The implementation and the syntax will be different in each case, but they are both implementing a common feature. So, when I say the DAL implements features that might otherwise be found in a typical ORM, I am not saying the DAL implements an ORM design pattern, just that it replicates functionality for which you might otherwise use an ORM.

No, it does not do that. It implements very different functionality, which may have a similar API and use the same terminology - which honestly I find quite confusing, borderline misleading.
I have given a very specific and granular description of the differences in such functionality between an ORM and web2py.
 
For example, in an ORM, you can define a method in a class that returns a value calculated from the fields of a database record. In the web2py DAL, this same functionality can be achieved using a virtual field or lazy field.

There are no lazy-fields in web2py, and I find the terminology misleading - as I said, laziness in the context of database access means a deferred query, NOT a deferred calculation of the results of a query. The difference is too profound to overlook. Deferred calculations of field results are generally useless - web applications are generally far more I/O-bound than CPU-bound - so the benefit of deferring post-query calculations is moot compared to the benefit of deferred queries used in the context of transaction-operation optimizations - which is the context most people would think of whenever they hear the term "lazy" thrown about in a database context.
 
I don't know if the SQLA CORE has virtual fields, but if it doesn't, I would suppose it leaves this kind of functionality to the ORM.

That's irrelevant to the comparison of SQLA-Core vs. web2py-DAL, since I am not suggesting using the SQLA Core and dumping its ORM - quite the opposite - and since virtual fields are actually much more beneficial when used within an ORM layer, as opposed to a DAL one. The only relevance of this point to this discussion is the comparison of the size of the code-bases. I get that this was what you meant.




That looks like the definition of the SQLA ORM, not ORM's in general.

Not quite - look at my comment to Derek down below.

No, let's not. My point is not that any of those items properly belong to either a DAL or an ORM, or that they can only be implemented with either a DAL or an ORM design pattern. Rather, you had claimed that the SQLA CORE is equivalent to the web2py DAL and that all 20,000+ lines of SQLA ORM code must therefore be providing unique functionality not available in the DAL (thus implying that the ORM must be useful). I was just suggesting that the DAL might be doing more than the SQLA CORE (at least in some areas), and that the DAL might possibly be offering some features for which you would otherwise need the SQLA ORM.


You are saying that a lot of web2py's extra features that "complement" the usage of the DAL might not be included in SQLA's Core, but rather may represent a big portion of the 20K lines of code of the ORM - which would then suggest that the features I was excited about may actually represent a much smaller portion of that 20K code-base, which would then suggest that they may be small, and therefore legitimately considered "useless". You could have said so more clearly (like I just did) and prevented the confusion.
Now, if you had watched the lecture I gave Massimo the link to, you would have seen how complex these features might be, so I doubt they are implemented within a small code-base. But if they are, this would degrade your argument that this is such a "substantial investment", as you called it. In either case, I would go for trying to use what they wrote long before I would consider re-inventing it, so that's irrelevant.
 
 
The point is, many features found in DAL's and ORM's are not unique or specific to the DAL or ORM design pattern. Each design pattern can be used to implement many common types of functionality (the functionality may not be identical, but at least similar, and used to satisfy the same goals).

Regarding all the features you claim are inherently DAL features and not ORM features, I disagree. Any one of those features could rightly be part of either a DAL or an ORM. They are simply features you might want to implement within any data modeling abstraction, whatever the design pattern.

What I meant was that the "essence" of an ORM is NOT these features. Any kind of access-layer API may contain some of these features - yes, I agree - but that was not the point. The point was that the MAIN role of an ORM is NOT found in these features - it is in the mapping of relations to domain-model objects - which is not what they are all about (again, check out my comment to Derek below).
So you may say they are not DAL-related, but in that case they would be just as unrelated to an ORM as well. API implementations may vary as to what auxiliary features are present for convenience, but I found it misleading to allude that by the mere inclusion of such features in web2py, it somehow gets closer to providing ORM functionality - this is not the case - as ORM functionality, at its core, is the design-pattern of providing domain-model tools - not the existence of convenience/auxiliary features.
 
The first difference is of scope - Virtual/Computed-fields can only be applied to other fields of the same Table.

No, they can also be applied to the results of joins (not sure if that's typically as easy to do in an ORM) -- see http://web2py.com/books/default/chapter/29/06#Old-style-virtual-fields.
 

I have read this whole DAL section in the book today - I know you can superimpose virtual fields onto row objects that result from a JOIN operation - but that's such an edge-case with such marginal utility, that it can hardly even be considered a "feature"...
I was comparing that to the fact that ORM classes may contain representations of fields from multiple tables from the get-go - so they can be used for "querying" multiple tables - NOT for result-manipulation of JOINs (which has questionable utility at best).

 
In the DAL, Virtual/Computed-fields can NOT generate implicit calls to foreign-table-fields.

Yes, they can with recursive selects.

I touched on recursive-selects further along; they are useless for 2 reasons:

1. They are Active-Record-like implementations - using one on a field within a for-loop is a big no-no for relational-database usage, as it would generate tons of queries. In an ORM, because it is stateful, you could do an eager-load of the foreign table in order to cache it - THEN it would be beneficial, as the foreign-table access would not generate tons of queries.
2. They are only applicable to single-relational fields - which are few and far between in real-life code - for more complex relationships, the backward relational inference breaks down. SQLA solves this with the "relationship" object - basically defining relationships on both ends, so that the ORM object-graph contains a bi-directional reference (the creator had to talk to Guido to find out how to do that without creating a cyclical object-reference that might become a memory leak, but eventually they got it working).

So again, it seems that web2py has features with syntax that makes it "appear as though" it has ORM-useful stuff there - when in fact it doesn't - which I find to be misleading.
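The query-count difference behind point 1 (the classic "N+1" problem) can be shown with a toy simulation: a dict stands in for the database, and query_count just counts round-trips. No real DAL or ORM is involved - this only illustrates the arithmetic of lazy per-row access vs. a single eager preload.

```python
# Toy N+1 simulation: per-row lazy lookups vs one eager preload.
authors = {1: "Ann", 2: "Bob"}  # stand-in foreign table
books = [{"title": f"Book {i}", "author_id": 1 + i % 2} for i in range(6)]
query_count = 0

def fetch_author(author_id):
    """Lazy, Active-Record-style access: each call is one round-trip."""
    global query_count
    query_count += 1
    return authors[author_id]

# Lazy: one query per row in the loop -> N queries
query_count = 0
lazy_names = [fetch_author(b["author_id"]) for b in books]
lazy_queries = query_count

# Eager: load the whole foreign table once, then resolve in memory
query_count = 0
query_count += 1              # the single preload query
cache = dict(authors)
eager_names = [cache[b["author_id"]] for b in books]
eager_queries = query_count

print(lazy_queries, eager_queries)  # → 6 1
```

Same results, 6 round-trips vs 1 - which is the trade-off eager loading in a stateful ORM is meant to exploit (at the cost of possibly loading rows you never use).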
 
The second difference is of statelessness vs. statefulness - the DAL is stateless, so it cannot give values from a previous query.
ORMs are stateful in nature, so:
- For output, virtual fields can use values already stored in those other fields' cache, and not even query the database.
- For input, computed fields can use values already stored in those other fields' cache, and not even insert them into the database.
The DAL is stateless in nature, so:
- For output, virtual fields must query values from those other fields, in order to invoke the functionality of the automated output.
- For input, computed fields must insert values into those other fields, in order to invoke the functionality of the automated input.
* There is also a cache for the DAL, but it's time-based, not transaction-based.
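The statefulness point can be sketched with a minimal identity map - the pattern stateful ORMs (SQLAlchemy's Session among them) use to keep already-loaded records in memory for the life of a transaction. The Session class and its get method here are invented stand-ins, and a dict plays the database; the sketch only shows that repeat access, and any "virtual field" computed over it, costs zero extra round-trips.

```python
# Minimal identity-map sketch: state kept across a transaction means
# repeat access is served from memory, not from the database.
DB = {1: {"id": 1, "first": "Ada", "last": "Lovelace"}}
queries = 0

class Session:
    def __init__(self):
        self.identity_map = {}        # the per-transaction state

    def get(self, pk):
        global queries
        if pk in self.identity_map:   # cached: no database access at all
            return self.identity_map[pk]
        queries += 1                  # cache miss: one round-trip
        obj = dict(DB[pk])
        self.identity_map[pk] = obj
        return obj

def full_name(obj):
    """A 'virtual field' computed purely over cached values."""
    return f"{obj['first']} {obj['last']}"

s = Session()
a = s.get(1)                          # hits the database
b = s.get(1)                          # served from the identity map
print(full_name(b), queries)          # → Ada Lovelace 1
```

A stateless layer, by contrast, would have issued two queries here, and the virtual field would always pay for the fetch of its source fields.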

I'm not quite sure what you mean here. Even in an ORM, in order to calculate the value of a virtual field, you first have to retrieve the input field values from the database; and when creating a computed field, you still ultimately have to write the computed value to the database.

No. ORMs are stateful - if you already have data loaded from a previous query in those source fields, then a virtual field would not have to round-trip to the database at all - it would just use the cached values in the source fields.
As for computed fields: again, the benefit is in storing a computed value, so it will not have to be re-calculated on subsequent accesses. So yes, "eventually" the computed value would be saved to the database, but it doesn't have to occur in the calculation operation - it can just leave the calculated value in memory, for further use in subsequent queries within the same transaction. Additionally, similarly to virtual fields, here again the source fields for the calculation may already exist in memory from previous queries - so the calculation would not require a round-trip to the database.
 

Also, be careful not to confuse ORM's in general with the SQLA ORM in particular -- some properties you claim for ORM's are unique to SQLA, and they are not necessarily properties that can be implemented only within the ORM design pattern (i.e., they are not limited to ORM's).

No - they are not limited to ORMs - but they ARE limited to stateful frameworks, which most web-frameworks are and web2py is not.
Transactionally-scoped persistence may be implemented in a DAL in a stateful web-framework (which, again, web2py isn't) - but the utility would still be limited compared to having that within an ORM domain-class object-graph.

That said, if the DAL were to gain statefulness, many of these features would become immensely more beneficial - almost automatically. It is a matter of statefulness - not of the design-pattern of domain classes.

But still, a stateful domain-class implementation would benefit more from such features than web2py's DAL, and so would still be superior.
 

Which I'm assuming you have not yet done, as you have yet to show an example of something you can do easily in the SQLA ORM but find difficult or inefficient in the DAL. A real project might help to surface some compelling use cases and provide benchmarks for possible efficiency gains.

I think in this very comment, and the one before, as well as my response to Derek about ORMs, and in many lectures I've posted here, there are more than sufficient reasons to see the benefits of using SQLA's ORM on top of web2py's DAL, as opposed to using just the DAL. It is an architectural issue, not a use-case one.
Given all I've already written, insisting on requesting actual use-cases, when the architectural issues are already so glaring, seems like a picky - borderline straw-man - argument.

Web2py is stateless - by design - it was a conscious design choice, weighing the pros/cons and trade-offs - and in the end, adding all these ORM-like features just seems like a futile attempt at squaring the circle, so long as web2py remains stateless.

This is why a stateful layer on top of the stateless DAL makes the most sense.

Arnon Marcus

unread,
May 1, 2013, 8:34:22 PM5/1/13
to web...@googlegroups.com


On Wednesday, May 1, 2013 4:39:49 PM UTC-7, Anthony wrote:

The first thing to notice here, is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects. There is no feature in web2py that generates such an object.

The second thing to notice here, is that the attributes of an ORM object usually contain child-objects (plural) that represent fields from a different table than the parent-object. Again, there is no feature in web2py that can generate such an object. A JOIN operation may return row objects, each of which may contain sub-attributes that hold A SINGLE field-value from a foreign-table, but it is a scalar-value  - NOT another domain-entity-object (with it's own attributes, etc.), NOR a SEQUENCE of domain-entity objects 

...
 
The crucial thing to notice here is that an ORM object-attribute can contain NON-SCALAR values - meaning, a link to a list of other objects, which themselves may contain links to other objects/sequences-of-objects, etc.

Although ORM's may do that, such a feature is not unique to the ORM pattern. In the web2py DAL, for example, in a Row object with a reference to another table, the reference field is actually a DAL.Reference object, not a scalar value (it includes the scalar value but also allows access to related records in the referenced table).

In this case it does not reference a set of DAL fields.
 
Similarly, a Row object from a table that is referenced by another table includes an attribute that is a DAL.LazySet object (also not a scalar), which allows access to the records in the referencing table that reference the current Row object.

I did not know that - what form of laziness are we talking about here? Will it generate a query to fill up the target rows?
In any case, it is still a reference to something that WOULD generate a Rows object - it is not a reference to an already-existing domain-object (which may then have references to other domain-objects, etc. - all already within memory), as in ORMs.
 
The DAL also has list:-type fields, whose values are lists, including lists of DAL.Reference objects in the case of list:reference fields.

That's interesting, but it is not exactly the same - list-fields need to be supported in the database, and in any case, it is not comparable to being linked by relationally-stored primary keys - which is how it would be implemented in an ORM.

 Row objects can also include custom methods (i.e., "lazy" virtual fields) as well as virtual fields, which can contain complex objects.

Relates to the comment I gave you a couple of minutes ago...
These are complementary auxiliary features (which, in the web2py implementation, have questionable real-world utility); while they do go beyond a "simple" value, they are still scalar, as they ultimately result in a reference to a scalar value - not a reference to a sequence of objects.
 

Anthony


Anthony

unread,
May 6, 2013, 9:21:27 AM5/6/13
to
So, when I say the DAL implements features that might otherwise be found in a typical ORM, I am not saying the DAL implements an ORM design pattern, just that it replicates functionality for which you might otherwise use an ORM.

No, it does not do that. It implements very different functionality, that may have a similar API and the same terminology used, which honestly I find quite confusing - border-line misleading.

If by "functionality" you mean doing the exact same thing in the exact same way, then there aren't even two ORM's that can be said to have the same functionality. By "functionality", I'm thinking of things like "querying a database to retrieve some records and converting them to a format that can be used in Python", and "inserting records in a database", and "updating records in a database". These are things you can do with a DAL as well as with an ORM. There are many other such things. They are not implemented in the same fashion, nor are they executed using the same abstractions within the application code, but they achieve similar goals. So, you can use the web2py DAL to do things that you might otherwise do with an ORM.
 
For example, in an ORM, you can define a method in a class that returns a value calculated from the fields of a database record. In the web2py DAL, this same functionality can be achieved using a virtual field or lazy field.

There are no lazy-fields in web2py, and I find the terminology misleading - as I said - lazyness in the context of database-access, is a deferred-query - NOT a deferred-calculation of the results of a query.

In the context of Row fields, the term "lazy" means that the value is "filled in" sometime after creation (typically at access time). Whether that lazy "filling in" involves database access or not depends on the nature of the field. If the field is simply a "virtual" field calculated based on the values of other fields in the Row, then there is no sense in which you would be deferring database access, as the value is not stored in the database. In that case, you are simply deferring calculation. On the other hand, reference fields do allow one to access the referenced record, and the database access in that case is in fact deferred. Likewise, Row objects can include LazySet attributes that defer database access of referencing records. Finally, a web2py virtual field or method field can do whatever you want it to do, including deferred database queries of any sort. So yes, there are lazy fields in web2py, both of the deferred-calculation type and the deferred-database-access type. In any case, they're not officially called "lazy" fields in the API -- that's just a term that is commonly used.
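The "filled in at access time" behavior described here - lazy in the deferred-calculation sense - is easy to sketch with the standard library's functools.cached_property; the Row class below is just a stand-in for illustration, not web2py's Row, and the same descriptor pattern could wrap a deferred database fetch instead of a calculation.

```python
# Sketch of "lazy" as deferred calculation: the value is computed on first
# access, then remembered on the instance. Row here is a stand-in class.
from functools import cached_property

computations = 0

class Row:
    def __init__(self, price, qty):
        self.price, self.qty = price, qty

    @cached_property
    def total(self):              # nothing runs at Row creation time
        global computations
        computations += 1         # count how often the body actually runs
        return self.price * self.qty

r = Row(price=3, qty=4)
assert computations == 0          # creation deferred the work entirely
print(r.total, r.total)           # → 12 12 (computed once, then cached)
```

If the body issued a query instead of a multiplication, the same mechanism would be lazy in the deferred-database-access sense - which is the distinction the paragraph above draws.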
 
Deferred calculations of field-results are generally useless - web-applications are generally I/O-bound much more than CPU-bound - so the benefit of deferring is moot in post-query calculations compared to the benefits of deferred queries that are used within the context of transaction-operation optimizations

Deferring calculations is certainly not useless, and you may even care more about being CPU-bound than I/O-bound if you're using an async web server, but even if we stipulate the above, that still doesn't change the definition of the word "lazy".
 
which is the context most people would think of whenever they hear the term Lazy thrown about in a database context

I don't know -- sounds like an empirical question. At least within the web2py community, though, I think the term is understood.
 
I don't know if the SQLA CORE has virtual fields, but if it doesn't, I would suppose it leaves this kind of functionality to the ORM.

That's irrelevant to the comparison of SQLA-Core vs. web2py-DAL, since I am not suggesting using the SQLA Core and dumping its ORM

I completely agree, which is why that point had nothing to do with a comparison of the DAL to SQLA Core. If you will recall, the point was that the DAL includes some functionality for which you might otherwise use an ORM. If the SQLA Core doesn't have virtual fields, then you need to jump to the ORM for that functionality. Hence, some of the 20,000+ lines of SQLA ORM code are for generating functionality already available via other means in the DAL -- hence, the sheer size of the SQLA ORM does not necessarily imply a high degree of usefulness over and above what you can do with the DAL.

- quite the opposite - and since virtual-fields are actually much more beneficial when used within an ORM layer, as opposed to a DAL one.

I don't see how you can justify that claim.
 
The only relevance of this point to this discussion is the comparison of the size of the code-bases. I get that this was what you meant.

Hmm, then why were you complaining about its irrelevance to a point you know I wasn't making?
 
No, let's not. My point is not that any of those items properly belong to either a DAL or an ORM, or that they can only be implemented with either a DAL or an ORM design pattern. Rather, you had claimed that the SQLA CORE is equivalent to the web2py DAL and that all 20,000+ lines of SQLA ORM code must therefore be providing unique functionality not available in the DAL (thus implying that the ORM must be useful). I was just suggesting that the DAL might be doing more than the SQLA CORE (at least in some areas), and that the DAL might possibly be offering some features for which you would otherwise need the SQLA ORM.

You are saying that a lot of web2py's extra features that extend on top of the DAL might not be included in SQLA's Core, but rather may represent a big portion of the 20K lines of code of the ORM - which would then suggest that the features I was excited about may actually represent a much smaller portion of that 20K code-base, which would then suggest that they may be small, and therefore legitimately considered "useless". You could have said so more clearly (like I just did) and prevented the confusion.

Well, it was a conversation, so it unfolded in parts, but here's what I said. First, in response to you claiming the DAL had no features beyond the SQLA Core:

The web2py DAL has a lot of features that might otherwise be found in an ORM. I'm not very familiar with SQLA, but I suspect the DAL has some features not present in CORE but similar to functionality included in the ORM.

Then in response to your skepticism at the above, I listed a bunch of DAL features that I thought might not be in Core, and finally ended with this:

I didn't say there were ORM features in the DAL, just that it includes features that you might otherwise expect to find in an ORM (e.g., something like virtual fields). In other words, some of what you get with that 20,000+ lines of ORM code might be functionality that is in fact available in the web2py DAL.

Sounds a lot like your summary above, no?
 
Now, if you had watched the lecture I gave Massimo the link to, you would have seen how complex these features might be, so I doubt they are implemented within a small code-base. But if they are, this would degrade your argument that this is such a "substantial investment", as you called it...

You argue: 20,000+ lines of code ==> therefore not useless. I argue: Not all 20,000+ lines of code provide unique functionality ==> therefore possibly useless (useless == not worth the development/maintenance effort for the incremental benefits over what we already have). To do an ORM like SQLA on top of the DAL, you probably would need a substantial code base because you would have to re-implement much of what the DAL already does but with the ORM.

On the other hand, as I have suggested, you may be able to get some of the features you like in the SQLA ORM within the DAL itself (i.e., without building an ORM on top of it). That would probably also be a substantial effort, but obviously less than building a full ORM.
 
What I meant, was that the "essence" of an ORM, is NOT these features. Any kind of Access-Layer-API may contain some of these features, yes - I agree - but that was not the point - the point was that the MAIN role an ORM has, is not found in these features - it is in the mapping of relations to domain-model objects - which is not what they are all about (again, check-out my comment to Derek below).

I agree with you. And yet you now well know that was not my point, so not sure why you keep beating this very dead horse.
 
...I found it misleading to allude that by the mere inclusion of such features in web2py, it somehow gets closer to providing ORM functionality - this is not the case - as ORM functionality, at its core, is the design-pattern of providing domain-model tools - not the existence of convenience/auxiliary features.

It appears you have found a statement I did not make misleading. Nevertheless, I believe the DAL does provide "domain-model tools."
 
I have read this whole DAL section in the book today - I know you can superimpose virtual fields onto Row objects that result from a JOIN operation - but that's such an edge case with such marginal utility that it can hardly even be considered a "feature"...
I was comparing that to the fact that ORM classes may contain representations of fields from multiple tables from the get-go - so they can be used for "querying" multiple tables - NOT for result-manipulation of JOINs (which has questionable utility at best)

It would help if you could show an example of what you are talking about. Show something in SQLA that you find much more difficult in web2py. When would you create a class with fields from multiple tables -- would that be for cases where you have one-to-one relationships?
 
In the DAL, Virtual/Computed-fields can NOT generate implicit calls to foreign-table-fields.

Yes, they can with recursive selects.

I touched on Recursive-selects further along, they are useless for 2 reasons:

1. They are Active-Record-like implementations - using one on a field within a for-loop is a big no-no for DAL usage, as it would generate tons of queries. In an ORM, because it is stateful, you could do an eager-load for the foreign table in order to cache it - THEN it would be useful, as the foreign-table access would not generate tons of queries.

So, you do a query to load the entire foreign table just in case you might need it? In web2py, you would do a join, and only when you need it, which sounds a lot more efficient. In any case, the default behavior of SQLA is actually the same as web2py recursive selects -- lots of individual lazy queries, not a single eager-load query.
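The lazy-vs-eager trade-off both sides are describing here can be seen in a toy model (plain Python, all names hypothetical; not web2py or SQLA code): lazy per-row lookups issue one query per record accessed (the classic N+1 pattern), while an eager join-style fetch brings back the related data in a single query.

```python
# Toy model of lazy (N+1) vs eager (single join) loading.

rows = [{"id": i, "owner_id": i % 2} for i in range(4)]           # child table
owners = {0: {"id": 0, "name": "Ann"}, 1: {"id": 1, "name": "Bob"}}

query_count = 0

def fetch_owner(owner_id):
    """Stands in for one database round-trip (a lazy/recursive select)."""
    global query_count
    query_count += 1
    return owners[owner_id]

# Lazy: one query per row accessed -> the "tons of queries" problem.
lazy_names = [fetch_owner(r["owner_id"])["name"] for r in rows]
lazy_queries = query_count

query_count = 0

def fetch_join():
    """Stands in for one JOIN query returning rows with owners attached."""
    global query_count
    query_count += 1
    return [dict(r, owner=owners[r["owner_id"]]) for r in rows]

# Eager: all related data arrives in a single query.
eager_names = [r["owner"]["name"] for r in fetch_join()]
eager_queries = query_count

print(lazy_queries, eager_queries)   # prints: 4 1
```

Either way the same names come back; the difference is purely in how many round-trips it takes, which is the crux of the disagreement above.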
 
2. It is only applicable to single foreign-key relations - which are few and far between in real-life code - for more complex relationships, the backward-relational inference breaks down. SQLA solves this with the "relationship" object - basically defining relationships on both ends, so that the ORM object-graph contains a bi-directional reference (the creator had to talk to Guido to find out how to do that without creating a cyclical object reference that might become a memory leak, but eventually they got it working).

web2py recursive selects go both ways. 
 
I'm not quite sure what you mean here. Even in an ORM, in order to calculate the value of a virtual field, you first have to retrieve the input field values from the database; and when creating a computed field, you still ultimately have to write the computed value to the database.

No. ORMs are stateful - if you already have data loaded from a previous query in those source fields, then a virtual field would not have to round-trip to the database at all - it would just use the cached values in the source fields.

Same in web2py. You query the db, get a Rows object, and you can keep referencing that Rows object. Yes, if you make changes to some records and then need to query a subset (that you cannot easily extract from the Rows object), then you need to update and re-query. How often do we have this situation, though, and how large is the efficiency gain? Do we build a whole SQLA-like ORM just for that? And do we even need an ORM for that, or can something similar be done with DAL objects?
 
As for computed-fields, again, the benefit is for storing a computed-value, so it will not have to be re-calculated. So yes, "eventually" the computed-value would be saved to the database, but it doesn't have to occur in the calculation-operation - it can just leave the calculated-value in memory, for further use in subsequent queries within the same transaction.

See above.
 
Additionally, similarly to virtual fields, here again the source fields for the calculation may already exist in memory from previous queries - so the calculation would not require a round-trip to the database.

You can update records in a Rows object. 
 
No - they are not limited to ORMs - but they ARE limited to stateful frameworks, which most web-frameworks are and web2py is not.
Transaction-scope persistence may be implemented in a DAL in a stateful web-framework (which, again, web2py isn't) - but the utility would still be limited compared to having it within an ORM domain-class object-graph.

That said, you are right about one thing - if the DAL gained statefulness, many of these features would become immensely more beneficial - almost automatically. It is a matter of statefulness - not of the design pattern of domain classes.

But still, a stateful domain-class implementation would benefit more from such features than web2py's DAL, and so would still be superior.

Maybe. It remains to be seen.
 
I think in this very comment, and the one before, as well as my response to Derek about ORMs, as well as in many lectures I've posted here, there are more than sufficient reasons to see the benefits of using SQLA's ORM as opposed to web2py's DAL. It is an architectural issue, not a use-case one.
Given all I've already written, insisting on requests for actual use-cases, when the architectural issues are obviously already so glaring, seems like a picky - borderline straw-man - argument.

I can't say I follow this line of reasoning. You implement architectures to solve problems -- if you can't articulate the problem you're solving with your architecture, I think you have bigger problems. Are you really saying you can't come up with a single use case where this would be beneficial?

Anthony

Anthony

unread,
May 1, 2013, 10:17:41 PM5/1/13
to web...@googlegroups.com

Although ORM's may do that, such a feature is not unique to the ORM pattern. In the web2py DAL, for example, in a Row object with a reference to another table, the reference field is actually a DAL.Reference object, not a scalar value (it includes the scalar value but also allows access to related records in the referenced table).

In this case it does not reference a set of DAL fields.

I'm not sure what you mean. A reference field references records, not fields.
 
 
Similarly, a Row object from a table that is referenced by another table includes an attribute that is a DAL.LazySet object (also not a scalar), which allows access to the records in the referencing table that reference the current Row object.

I did not know that - what form of laziness are we talking about here? Will it generate a query to fill up the target rows?
In any case, it is still a reference to something that WOULD generate a Rows object - it is not a reference to an already-existing domain object (which may then have references to other domain objects, etc. - all already in memory), as in ORMs

Are you saying that when you select a set of records that include reference fields, the ORM automatically selects all the referenced records (and any records they may reference, and so on) and stores them in memory, even if you have not requested that? That sounds inefficient.
 
 
The DAL also has list:-type fields, whose values are lists, including lists of DAL.Reference objects in the case of list:reference fields.

That's interesting, but that is not exactly the same - list fields need to be supported in the database, but in any case, it is not comparable to being linked to relationally-stored primary keys - which is how it would be implemented in an ORM.

No, list fields do not have to be supported in the database (they are stored as strings) -- they are an abstraction provided by the DAL. list:reference fields do in fact store a list of primary keys (in fact, a list of objects that include the primary keys and know how to retrieve the associated records). web2py also has JSON fields, which I would say does not count as a scalar either.
 

 Row objects can also include custom methods (i.e., "lazy" virtual fields) as well as virtual fields, which can contain complex objects.

Relates to the comment I gave you a couple of minutes ago...
These are complementary auxiliary features (which, in the web2py implementation, have questionable real-world utility) which, while they do go beyond a "simple" value, are still scalar, as they ultimately result in a reference to a scalar value - not a reference to a sequence of objects.

No, you can define a virtual field whose value is any custom complex Python object you like, with its own methods, that may do or return whatever you like. They need not reference or return a scalar value. This is not a mere "auxiliary" feature but something that allows you to replicate functionality you might otherwise find in an ORM class.

Anthony

Philip Kilner

unread,
May 2, 2013, 5:46:15 AM5/2/13
to web...@googlegroups.com, Michele Comitini
Hi,

On 01/05/13 22:07, Michele Comitini wrote:
> Why not write a driver for SQLA that speaks DAL instead of a sql dialect?

+1


--

Regards,

PhilK


'a bell is a cup...until it is struck'

Philip Kilner

unread,
May 2, 2013, 5:49:25 AM5/2/13
to web...@googlegroups.com, Cliff Kachinske
Hi,

On 01/05/13 22:27, Cliff Kachinske wrote:
> I would propose that the best way to get others on board
> would be to channel the energy being burned on this thread into an
> implementable design or even a set of specific software requirements or
> pseudo code.
>

+1

This has been a very interesting thread, but the length (and passion!)
of some of the posts has made it a long hard read, and despite my best
efforts I truly do not understand what problem the DAL fails to solve,
and I don't expect to without some much-less-abstract discussion.

Arnon Marcus

unread,
May 2, 2013, 5:59:36 AM5/2/13
to web...@googlegroups.com


On Thursday, May 2, 2013 5:17:41 AM UTC+3, Anthony wrote:

Although ORM's may do that, such a feature is not unique to the ORM pattern. In the web2py DAL, for example, in a Row object with a reference to another table, the reference field is actually a DAL.Reference object, not a scalar value (it includes the scalar value but also allows access to related records in the referenced table).

In this case it does not reference a set of DAL fields.

I'm not sure what you mean. A reference field references records, not fields.
 

The operative word in my comment is SET, not FIELD. I may have gotten the Field-vs-Record terminology wrong in this sentence, but that is irrelevant to what I was referring to.
My point was about Wikipedia's assertion that an ORM can have references to non-scalar values: in your example, the existence of intermediary objects on the way to the value does not grant the attribute "non-scalar" status - only a reference to a SEQUENCE of objects can do that.
 
 
Similarly, a Row object from a table that is referenced by another table includes an attribute that is a DAL.LazySet object (also not a scalar), which allows access to the records in the referencing table that reference the current Row object.

I did not know that - what form of laziness are we talking about here? Will it generate a query to fill up the target rows?
In any case, it is still a reference to something that WOULD generate a Rows object - it is not a reference to an already-existing domain object (which may then have references to other domain objects, etc. - all already in memory), as in ORMs

Are you saying that when you select a set of records that include reference fields, the ORM automatically selects all the referenced records (and any records they may reference, and so on) and stores them in memory, even if you have not requested that? That sounds inefficient.

No. that is not what I am saying, although that might occur if you configure it to behave like that.
But the flaw in your reasoning is that you are trying to apply a "stateless mind-set" to a "stateful system". I'm not sure why you are doing that.
When you say "automatically selects" you mean "automatically generates queries", because within a stateless system there is no record-cache management, so any access to an attribute IS a "select" from the database. But that is not the case with stateful ORMs - they "may" issue selects "if-and-only-if" the requested value is "invalid" (meaning it was either never queried at all yet, or was invalidated by a previous transaction-commit). Attribute values along the object-graph "may-or-may-not" be valid, and so "may-or-may-not" require a "select". When you use a statefully-automatic system, you deliberately relinquish (at least some) control over such matters, in favor of not having to worry about whether a given attribute-access needs to hit the database or not.
You make an object-attribute access, and the ORM traverses the object-graph linked to that attribute in order to get a value at some point.
The ORM does the traversal for you. If all of the objects and attributes that are traversed are "valid" (a.k.a. "exist and are up-to-date"), then no select is sent to the database.
Again, in an ORM you build your object-graph "up-front", using domain classes. It just gets "inhabited" and "updated" for you as you use it.
So saying that any attribute-access would "necessarily-always" generate a "select" is inaccurate.
Also, it is not necessarily inefficient, as the idea in an ORM is that it figures out the minimal required database-access to get you the data you ask for. It may NOT "query" entire sets of records if you only need "some" of them (here a DAL layer underneath takes care of that, generating optimized queries for you, and/or using queries that you have defined for it to use).
And again, on subsequent access to the same attribute, it traverses the same object-graph linked to it, and this time all the objects/attributes that were previously missing and queried are now cached and valid, so no database access is performed.
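The valid/invalid caching behavior described above can be sketched in a few lines of plain Python (hypothetical names, not actual ORM code): a value is fetched if-and-only-if it is "invalid", i.e. never loaded yet or invalidated by a later event such as a commit; otherwise the cached value is reused with no database access.

```python
# Sketch of stateful, invalidation-aware attribute loading.

selects = []   # log of simulated database round-trips

class LazyRef:
    def __init__(self, loader):
        self._loader = loader    # callable standing in for a DB select
        self._value = None
        self._valid = False

    def get(self):
        if not self._valid:              # query if-and-only-if invalid
            self._value = self._loader()
            self._valid = True
        return self._value

    def invalidate(self):                # e.g. after a transaction commit
        self._valid = False

def load_country():
    selects.append("SELECT * FROM country WHERE id=1")
    return {"id": 1, "name": "France"}

ref = LazyRef(load_country)
ref.get(); ref.get(); ref.get()   # only the first call hits the "database"
assert len(selects) == 1
ref.invalidate()
ref.get()                         # re-queries after invalidation
assert len(selects) == 2
```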

 
 
The DAL also has list:-type fields, whose values are lists, including lists of DAL.Reference objects in the case of list:reference fields.

That's interesting, but that is not exactly the same - list fields need to be supported in the database, but in any case, it is not comparable to being linked to relationally-stored primary keys - which is how it would be implemented in an ORM.

No, list fields do not have to be supported in the database (they are stored as strings) -- they are an abstraction provided by the DAL. list:reference fields do in fact store a list of primary keys (in fact, a list of objects that include the primary keys and know how to retrieve the associated records). web2py also has JSON fields, which I would say does not count as a scalar either.

It is amazing how far the DAL features have gone to mimic an ORM "on the surface", while not being the real thing underneath....
You are basically suggesting "bypassing" the relational functionality of the database and doing it in the DAL, all just to achieve the "appearance" of an ORM...
I don't even know where to place that... It's absurd...
An ORM object-graph is a representation of foreign-key relationships in the database - not a relation of sub-values stored in some "string"-type value...
An ORM object may have a value that is a representation of foreign-table fields that point to it via a foreign key. It is a "real" non-scalar relation in both the database AND the object-attribute - not some kind of framework hack...
 
 

 Row objects can also include custom methods (i.e., "lazy" virtual fields) as well as virtual fields, which can contain complex objects.

Relates to the comment I gave you a couple of minutes ago...
These are complementary-auxiliary features (with in the web2py-implementation case, have questionable real-world-utility) which while they do go beyond a "simple" value, they are still scalar, as they ultimately result in a reference to a scalar-value - not a reference to a sequence of objects.

No, you can define a virtual field whose value is any custom complex Python object you like, with its own methods, that may do or return whatever you like. They need not reference or return a scalar value. This is not a mere "auxiliary" feature but something that allows you to replicate functionality you might otherwise find in an ORM class.

In order to do that, you would have to "replicate" the relations (that already exist in the database) in code, and construct that on every access to that attribute - instead of "using" the relationships that already exist in the database, and having them "automatically" generate an object-graph for you "once", at application-launch time.
Which would you say is more efficient? Which would you say is easier to use? Which would you say is more maintainable?
 

Anthony

Anthony

unread,
May 6, 2013, 9:29:17 AM5/6/13
to

In this case it does not reference a set of DAL fields.

I'm not sure what you mean. A reference field references records, not fields.
 

The operative word in my comment is SET, not FIELD. I may have gotten the Field-vs-Record terminology wrong in this sentence, but that is irrelevant to what I was referring to.

I don't see how that is irrelevant, as it makes your statement incomprehensible. I can't infer what you really meant.
 
My point was about Wikipedia's assertion that an ORM can have references to non-scalar values: in your example, the existence of intermediary objects on the way to the value does not grant the attribute "non-scalar" status - only a reference to a SEQUENCE of objects can do that.

First, in this case I was talking about a reference field, which references a single foreign record -- there is no sequence of objects in this case, not even in an ORM. If you go in the other direction, however, a DAL LazySet does in fact reference a sequence of records. In any case, your reasoning applies just as well to an ORM -- the attributes that are objects are themselves merely intermediate objects on the way to scalar values stored in the database -- so I guess we cannot grant them "non-scalar" status either.
 
But the flaw in your reasoning is that you are trying to apply a "stateless mind-set" to a "stateful system". I'm not sure why you are doing that.

Well, I'm not doing that.
 
When you say "automatically selects" you mean "automatically generates queries", because within a stateless system there is no record-cache management, so any access to an attribute IS a "select" from the database. But that is not the case with stateful ORMs - they "may" issue selects "if-and-only-if" the requested value is "invalid" (meaning it was either never queried at all yet, or was invalidated by a previous transaction-commit).

Yes, but you act as if all the data are always magically there without any database queries. There has to be an initial query (or set of queries) to get the data from your database in order to populate the instance objects to begin with. The SQLA session lasts only for a single web request, so this has to happen at every request anyway. Furthermore, if you need to issue a query that happens to require the same records you already have in the session, it still needs to do a database select again (unless you query specifically by primary key) -- the DAL, on the other hand, can cache queries, even across multiple web requests.

Yes, in web2py, if you need to apply the same recursive select twice, it will hit the database twice (unless you cache or store the result), whereas SQLA will generally do only one select. On the other hand, in web2py, you might simply do a join, in which case, you have all the related data and don't need any subsequent selects.

You make an object-attribute access, and the ORM is traversing the object-graph that is linked to this attribute, in order to get a value at some point.
The ORM is doing the traversal for you.

Note, this is not unique to an ORM and can be done in a DAL as well. The web2py DAL does this, albeit exclusively with lazy loading, though presumably an eager loading option could be implemented as well.
 
So saying that any attribute-access would "necessarily-always" generate a "select" is inaccurate.

Right, but I didn't say that.
 
Also, it is not necessarily inefficient, as the idea in an ORM is that it figures-out the minimal-required database-access in order to get you the data you ask.

It's not quite that simple. You have to make some decisions, and there are tradeoffs. By default, the SQLA ORM does lazy loading of reference records (i.e., it issues a separate query for each reference attribute accessed) -- same as web2py recursive selects. This is more efficient if you only need to do this on one or a few instances. The ORM can also optionally do eager loading, either with a join or a second subquery, both of which have their advantages and disadvantages. In any case, you are not guaranteed to get the minimum-required database access automatically.
 
No, list fields do not have to be supported in the database (they are stored as strings) -- they are an abstraction provided by the DAL. list:reference fields do in fact store a list of primary keys (in fact, a list of objects that include the primary keys and know how to retrieve the associated records). web2py also has JSON fields, which I would say does not count as a scalar either.

It is amazing how far the DAL features have gone to mimic an ORM "on the surface", while not being the real thing underneath....
You are basically suggesting "bypassing" the relational functionality of the database and doing it in the DAL, all just to achieve the "appearance" of an ORM...
I don't even know where to place that... It's absurd...

No, the purpose of list:-type fields is not to mimic ORM behavior, nor are they absurd. If you need to store lists but don't need the benefits of a relational design, they can be simpler and more efficient.
 
An ORM object-graph is a representation of foreign-key relationships in the database - not a relation of sub-values stored in some "string"-type value...
An ORM object may have a value that is a representation of foreign-table fields that point to it via a foreign key. It is a "real" non-scalar relation in both the database AND the object-attribute - not some kind of framework hack...

I wasn't arguing that list:-type fields are web2py's version of the above (which web2py implements as well) -- just pointing out yet another non-scalar abstraction.
 
These are complementary auxiliary features (which, in the web2py implementation, have questionable real-world utility) which, while they do go beyond a "simple" value, are still scalar, as they ultimately result in a reference to a scalar value - not a reference to a sequence of objects.

No, you can define a virtual field whose value is any custom complex Python object you like, with its own methods, that may do or return whatever you like. They need not reference or return a scalar value. This is not a mere "auxiliary" feature but something that allows you to replicate functionality you might otherwise find in an ORM class.

In order to do that, you would have to "replicate" the relations (that already exist in the database) in code, and construct that on every access to that attribute - instead of "using" the relationships that already exist in the database, and having them "automatically" generate an object-graph for you "once", at application-launch time.

Not at all. web2py already provides attributes to handle the relations, as already described. If you believe the web2py Reference and LazySet attributes qualify only as scalars, then so do the SQLA ORM relationship attributes, as they both ultimately retrieve scalar values from the database. I was making a separate point that web2py method attributes can be complex objects, as you seem to think that is important.

Also, note that DAL LazySets allow you to apply additional query filters and to select only specific fields as well as expressions -- not sure that can be done via a SQLA ORM relationship attribute.

For what it's worth, I like some of these features of the SQLA ORM, and I wouldn't mind having them in web2py. One question, though, is whether these features really require an ORM. It seems that a lot of this behavior could be built into DAL Rows and Row objects (e.g., eager loading of relationships, caching of lazily loaded relationships, statefulness). Another question, of course, is whether the development effort would be worth the benefits (which may be modest and limited to special cases), though I suppose this depends on the answer to the first question as well.

Anthony

Arnon Marcus

unread,
May 2, 2013, 12:52:28 PM5/2/13
to web...@googlegroups.com
We can continue bickering about who said what, and in reference to what. Instead, I'll just rephrase the statement:
According to Wikipedia, DAL object-attributes do not reference an object-graph in memory, as ORM ones do.

You may say anything to avoid admitting that, but it would still be the case.

As for LazySet objects, existing in Row objects and pointing to backwards-foreign-key records, I don't remember seeing anything about that in the documentation. Can you post a link?

Anthony

unread,
May 6, 2013, 9:30:49 AM5/6/13
to
According to Wikipedia, DAL object-attributes do not reference an object-graph in memory, as ORM ones do.

You may say anything to avoid admitting that, but it would still be the case.

First, as far as I can tell, Wikipedia does not say that about DAL's (it doesn't say all DAL's do have object attributes that reference an object-graph in memory, but nor does it say they don't or can't in principle have such attributes). Furthermore, it doesn't even appear to say that about ORM's. Second, even if Wikipedia did say that, it would have no bearing on this discussion, as we have not been discussing Wikipedia's definitions of DAL and ORM, but rather the actual web2py DAL and the actual SQLA ORM. web2py does have Reference and LazySet fields that allow you to traverse the relations. Their retrieved values do not stay in memory, but that doesn't mean such persistence could not in principle be implemented. I won't speak for all DAL's, but what you describe could be implemented at least in the web2py DAL.
 

As for LazySet objects, existing in Row objects and pointing to backwards-foreign-key records, I don't remember seeing anything about that in the documentation. Can you post a link?


Anthony 

Arnon Marcus

unread,
May 2, 2013, 2:50:47 PM5/2/13
to web...@googlegroups.com
The reference to Wikipedia was made by Derek, not me. I was just reacting to that reference, to refute his claim that I am re-inventing my own definitions.

You will see in my comment to him where it defines an ORM to be an object-graph.
I interpreted the Wikipedia DAL definition, of scalar-vs-non-scalar, to mean the absence of an object-graph, since a reference of an object-attribute to a sequence is a non-scalar reference - and an object-graph may contain such references, in a way a DAL API cannot, according to the scalarity definition of the DAL. And again, all of this is in the context of my reaction to Derek's claim that the Wikipedia definitions say otherwise.

As to LazySets, nowhere in the documentation is that term mentioned. As for recursive selects, we've been through this already - twice, actually - it is an explicit 'select' statement that generates a query - it is not a reference to a sequence object that exists in memory.
Also, it can't be used for complex relational graphs, only for single foreign-key relations.

Anthony

unread,
May 6, 2013, 9:32:48 AM5/6/13
to
I interpreted the Wikipedia DAL definition, of scalar-vs-non-scalar, to mean the absence of an object-graph, since a reference of an object-attribute to a sequence is a non-scalar reference - and an object-graph may contain such references, in a way a DAL API cannot, according to the scalarity definition of the DAL. And again, all of this is in the context of my reaction to Derek's claim that the Wikipedia definitions say otherwise.

I don't know why this matters to you, but it appears you have not read the Wikipedia definitions. I will let Derek speak for himself.
 

As to LazySets, nowhere in the documentation is that term mentioned.

It doesn't mention the term LazySet, as that is an internal class, but the functionality is documented.
 

As for Recursive-selects, we've been through this already - twice actually - it is an explicit 'select' statement that generates a query - it is not a reference to a sequence object that exists in memory.

Stipulated, but so is a reference attribute in an ORM until you actually access the attribute and run the query. The only difference is that the ORM keeps the value in the object after the initial query. As I have mentioned, there is no reason why this could not be done in the DAL as well. It is not a feature unique to ORM's. Actually, even now you can do:

db.define_table('person', Field('name'))
db.define_table('dog', Field('name'), Field('owner', 'reference person'))

Bob = db(db.person.name == 'Bob').select().first()
Bob.dog = Bob.dog.select() # db query here
print Bob.dog # no db query here

Now Bob.dog is an attribute holding Bob's dog records in memory -- subsequent references will not trigger any db queries. Not as automatic as in SQLA, but it could be made more automatic.
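One plain-Python way the manual caching above could be "made more automatic" is a non-data descriptor (a sketch with hypothetical names, not DAL code): the first attribute access runs the "select" and stores the result on the instance, so later accesses never touch the database.

```python
# Sketch: auto-caching relationship attribute via the descriptor protocol.

queries = []   # log of simulated database round-trips

class cached_relation:
    """Non-data descriptor: first access runs the select, then the
    result shadows the descriptor in the instance __dict__."""
    def __init__(self, select):
        self.select = select
        self.name = select.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self.select(obj)           # runs once per instance
        obj.__dict__[self.name] = value    # subsequent lookups hit the dict
        return value

class Person:
    def __init__(self, pk):
        self.id = pk

    @cached_relation
    def dog(self):
        # Stand-in for a LazySet-style backward select.
        queries.append("SELECT * FROM dog WHERE owner=%s" % self.id)
        return [{"name": "Rex", "owner": self.id}]

bob = Person(1)
bob.dog; bob.dog                 # second access is served from memory
assert len(queries) == 1
```

Because the descriptor defines no `__set__`, the cached value in the instance dictionary takes precedence on every access after the first, which is exactly the "query once, then reuse" behavior under discussion.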
 

Also, it can't be used for complex relational-graphs, only for single-foreign-key relations.

Agreed, though in principle this could be implemented, so again, not an ORM-specific feature.

Keep in mind, I have not claimed that the DAL can do everything the SQLA ORM can do.

Anthony

Arnon Marcus

unread,
May 2, 2013, 5:20:58 PM5/2/13
to web...@googlegroups.com, Cliff Kachinske
I agree.

Here is a more concrete explanation.

Given the following tables:

db.define_table('Continent', Field('Name', 'string'))
db.define_table('Country', Field('Name', 'string'), Field('Continent', db.Continent))
db.define_table('City', Field('Name', 'string'), Field('Country', db.Country))

Using an ORM, you could do something like:
>>> Country(Name='France').City.list
[<City Name:Paris>, <City Name:Nice>]

Using the DAL, the best you might get is:
>>> [city for city in db.City.Country.select() if city.Country.Name == 'France']
[<Row Name:Paris>, <Row Name:Nice>]


In an ORM on-top of the DAL, I would like to be able to do something like:

@ORM(db.City)
class City:
    pass

@ORM(db.Country)
class Country:
    pass

@ORM(db.Continent)
class Continent:
    pass

>>> europe = Continent(Name='Europe')
>>> france = Country(Name='France', Continent=europe)
>>> paris = City(Name='Paris', Country=france)
>>> nice = City(Name='Nice', Country=france)
>>>
>>> europe.Country(Name='France') is france
True
>>> france.City(Name='Paris') is paris
True
>>> europe.Country(Name='France').City.list
[<City Name:Paris>, <City Name:Nice>]
>>> paris.Country is france
True
>>> france.Continent is europe
True


This would be so intuitive and easy to use...

Anthony

unread,
May 2, 2013, 7:14:17 PM5/2/13
to web...@googlegroups.com, Cliff Kachinske
db.define_table('Continent', Field('Name', 'string'))
db.define_table('Country', Field('Name', 'string'), Field('Continent', db.Continent))
db.define_table('City', Field('Name', 'string'), Field('Country', db.Country))

Using an ORM, you could do something like:
>>> Country(Name='France').City.list
[<City Name:Paris>, <City Name:Nice>]

Using the DAL, the best you might get is:
>>> [city for city in db.City.Country.select() if city.Country.Name == ''France']
[<Row Name:Paris>, <Row Name:Nice>]

Actually, you can do this with the DAL:
>>> db.Country(name='France').City.select()

Almost identical to your ORM code (not that I think it needs to be similar looking code to be useful).

>>> europe = Continent(Name='Europe')
>>> france = Country(Name='France', Continent=europe)
>>> paris = City(Name='Paris', Country=france)
>>> nice = City(Name='Nice', Country=france)
>>> europe.Country(Name='France').City.list
[<City Name:Paris>, <City Name:Nice>]

In the DAL, you can do (I think I got this right):

>>> europe = db.Continent.insert(Name='Europe')
>>> france = db.Country.insert(Name='France', Continent=europe)
>>> paris = db.City.insert(Name='Paris', Country=france)
>>> nice = db.City.insert(Name='Nice', Country=france)
>>> db.Country(france).City.select()

[<Row Name:Paris>, <Row Name:Nice>]

Of course, you don't get the "is" equivalencies, but your example doesn't actually require that.

Anthony

Mariano Reingart

May 2, 2013, 7:41:16 PM
to web...@googlegroups.com
On Thu, May 2, 2013 at 6:20 PM, Arnon Marcus <a.m.m...@gmail.com> wrote:
> I agree.
>
> Here is a more concrete explanation.
>
> Given the following tables:
>
> db.define_table('Continent', Field('Name', 'string'))
> db.define_table('Country', Field('Name', 'string'), Field('Continent',
> db.Continent))
> db.define_table('City', Field('Name', 'string'), Field('Country',
> db.Country))
>
> Using an ORM, you could do something like:
>>>> Country(Name='France').City.list
> [<City Name:Paris>, <City Name:Nice>]

Sorry, but what should .list() do?
Is it a query? Is it pre-fetched or cached?
What about if I need the cities of countries where the main language is
not Spanish, and the population is above 1 million?

Please, note that you are mixing a declarative query syntax (DAL),
with an imperative one (ORM)

> Using the DAL, the best you might get is:
>>>> [city for city in db.City.Country.select() if city.Country.Name ==
>>>> 'France']
> [<Row Name:Paris>, <Row Name:Nice>]

I would like an elegant declarative syntax similar to LINQ, but in Python:

[city for db.City.ALL in db.City, db.Country if db.Country.Name ==
'France' and db.Country.id == db.City.country]

Sadly, there is NO possible way to achieve a pythonic list
comprehension syntax that queries the objects on the server side AFAIK.
Some libraries use dis (the Python disassembler) and other internal &
dirty Python hacks to do something similar; please, see:

http://www.aminus.net/geniusql/

http://www.aminus.org/blogs/index.php/2008/04/22/linq-in-python?blog=2

http://www.aminus.net/geniusql/chrome/common/doc/trunk/managing.html

Note that web2py is much closer to the "pythonic expressions", but
without the early / late binding and other issues described there ;-)

>
> In an ORM on-top of the DAL, I would like to be able to do something like:
>
> @ORM(db.City)
> class City:
> pass
>
> @ORM(db.Country)
> class Country:
> pass
>
> @ORM(db.Continent)
> class Continent:
> pass
>
>>>> europe = Continent(Name='Europe')
>>>> france = Country(Name='France', Continent=europe)
>>>> paris = City(Name='Paris', Country=france)
>>>> nice = City(Name='Nice', Country=france)
>>>>
>>>> europe.Country(Name='France') is france
> True

This would require caching or storing the previously queried record in
memory (something like a singleton), or "is" will not work as you are
expecting (it is for checking "identity")...
That could also be achieved by hacking "Row", but you should use == in
Python for this kind of comparison (equality)

>
> This would be so intuitive and easy to use...

For you who are creating it, maybe, but I would not understand it at first
sight, so I prefer the DAL syntax, which is a bit verbose but more
uniform
(at least I know what I'm doing in each step, or can get a SQL book
and see how to "pythonize" the expression).
An ORM is not more intuitive and will have a higher learning curve, and
no formal / logical / math justification theory, BTW.

I think that a more elegant way for web2py would be to research how to
put business rules (and not only "tables") in the models.
This could be done right now to some extent with virtual fields &
predefined queries (we already have represent, field validators, etc).
Implementing "stored procedures" and "triggers" in web2py also could
be interesting (and you could win a lot of speed), for example,
PostgreSQL supports PL/Python for that.

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Arnon Marcus

May 3, 2013, 7:44:20 AM
to web...@googlegroups.com

Sorry, but what should .list() do?
Is it a query? Is it pre-fetched or cached?

The whole point of having an ORM is for you to not have to worry about the answer to that question - it's going to do the optimal thing it can, based on the circumstances in which it is called.
As I described, like 10 times already, ORMs are stateful - they have cache-management built in.
The question of how this cache-management works, and so whether or not you can trust it, is a long and complex one. Suffice it to say that there has been more than a decade of research into this issue, so it is solved.
Just as an example, if the list has objects in it, it will check that what it has is up to date. If it is called within the context of a transaction that is in progress, and it had already done a query for that beforehand within the same transaction, then by ACID's "C for Consistency" rule, the code should assume that its cached data is up to date. If, between the previously-issued query and this call, a transaction was ended and a new one started, then by the same rule the code should not assume that the data is up to date, and should issue a query.
ORMs like SQLA's use a 'Unit-of-Work' pattern, in which these caches are managed for you. Any transaction-commit invalidates the caches, so you can guarantee consistency across transactions.
 
What about if I need the cities of countries where the main language is
not Spanish, and the population is above 1 million?

ORMs allow you to attach filters for that. It really depends on the implementation, but let's think of some options, shall we?

The most straight-forward approach I would suggest, is the layered-approach - meaning, the ORM should use the DAL for that internally.
It should allow you to construct the filters yourself using the DAL, and have mechanisms to facilitate attaching of filters to these attributes.

We can go into a discussion of how it will look like in an ORM in web2py in each of the options - which would be the fun-part, I think - so let's see:

First, let's extend the schema and ORM example-definitions to include that:

db.define_table('Language', Field('Name', 'string'))
db.define_table('Continent', Field('Name', 'string'))
db.define_table('Country', Field('Name', 'string'), Field('Continent', db.Continent))
db.define_table('City', Field('Name', 'string'), Field('Country', db.Country), Field('Language', db.Language), Field('Population', 'integer'))

@ORM(db.Language)
class Language:
    pass

@ORM(db.City)
class City:
    pass

@ORM(db.Country)
class Country:
    pass

@ORM(db.Continent)
class Continent:
    pass


>>> ...
>>> spanish = Language(Name='Spanish')
>>> french = Language(Name='French')
>>> france.City.Language = french
>>> ...


Then, we could devise to do any number of things:

We can build basic stuff like .isGreaterThan() into the ORM classes, and do:

>>> ...
>>> [city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]
[<City Name:Paris>]

It almost reads like plain English... Beautiful (!)

Alternatively, we could have a set of operators that we can give to a filter function:

>>> ...
>>> from ORM.operators import not_, moreThan
>>> france.City.where(
Language=not_(spanish), Population=moreThan(1000000))
[<City Name:Paris>]

Lastly, we could let the developer filter objects using the DAL indirectly:

>>> ...
>>> france.City.FilterBy(
(City.Language != spanish) & (City.Population > 1000000))
[<City Name:Paris>]
 
Please, note that you are mixing a declarative query syntax (DAL),
with an imperative one (ORM)


No, I am LAYERING imperative-syntax "on-top" of declarative one.

I would like an elegant declarative syntax similar to LINQ, but in Python:

[city for db.City.ALL in db.City, db.Country if db.Country.Name ==
'France' and db.Country.id == db.City.country]


My first suggestion was way more readable.
 
Sadly, there is NO possible way to achieve a pythonic list
comprehension syntax that queries the objects on the server side AFAIK

Not sure what you mean...
 

Some libraries use dis (the Python disassembler) and other internal &
dirty Python hacks to do something similar; please, see:

http://www.aminus.net/geniusql/

http://www.aminus.org/blogs/index.php/2008/04/22/linq-in-python?blog=2

http://www.aminus.net/geniusql/chrome/common/doc/trunk/managing.html

Note that web2py is much closer to the "pythonic expressions", but
without the early / late binding and other issues described there ;-)
 
I will look into that.


This would require caching or storing the previously queried record in
memory (something like a singleton), or "is" will not work as you are
expecting (it is for checking "identity")...

Exactly! This is exactly what SQLA is already doing - I explained that before - it uses what it calls 'Identity Mapping' internally:
* Time-coded link - watch the following 4 minutes 
 
That could also be achieved by hacking "Row", but you should use == in
Python for this kind of comparison (equality)


Not in SQLA :)
Web2py could do that too.


For you who are creating it, maybe, but I would not understand it at first
sight, so I prefer the DAL syntax, which is a bit verbose but more
uniform

There is no non-uniformity in SQLA's syntax, as there are two layers that are explicitly different. Web2py could do that too.
 
(at least I know what I'm doing in each step, or can get a SQL book
and see how to "pythonize" the expression)

Same as in SQLA - again, I am suggesting "augmenting" the DAL with another separate ORM layer - not replacing the DAL.
 
An ORM is not more intuitive and will have a higher learning curve, and
no formal / logical / math justification theory, BTW

An ORM is MUCH MORE intuitive, as my examples have demonstrated, and the formal/logical/math justification would still apply within the DAL layer that would still exist underneath.

Arnon Marcus

May 3, 2013, 8:26:13 AM
to web...@googlegroups.com
Just to clarify, again, the following syntax:

[city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]

Is not a pie-in-the-sky dream-API - there are implementation details that are viable for each section.

First, the reason the "is not" would work is that there would exist an implementation of "Identity Mapping" that would take care of having a singleton ORM-object representing each row.

Second, the access to france.City.list is viable, as attribute access is totally customizable in Python, so we can devise anything we want the access to do.
For example, it could return an iterator directly if the "City" attribute of the "france" ORM-object is "valid" (meaning it is cached and not "invalidated" by a previous transaction-commit), or do a "query" right there on the spot and then return an iterator, if the "City" attribute is "invalid" at the time of the "list" attribute access.
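A minimal sketch of that idea (all names here are hypothetical - this is neither web2py nor SQLAlchemy code): the .list access hands back cached children while the cache is valid, and re-runs the loader on the spot once it has been invalidated.

```python
# Hypothetical sketch of the attribute access described above: reading
# .list serves cached children while the cache is valid, and re-queries
# (simulated by a loader callable) after the cache has been invalidated.

class ChildSet:
    def __init__(self, loader):
        self._loader = loader   # callable that would issue the DAL query
        self._cache = None
        self._valid = False

    @property
    def list(self):
        if not self._valid:            # invalidated: query on the spot
            self._cache = self._loader()
            self._valid = True
        return iter(self._cache)       # hand back an iterator either way

    def invalidate(self):              # e.g. called on transaction-commit
        self._valid = False

loads = []
def load_cities():                     # stand-in for the real query
    loads.append(1)
    return ['Paris', 'Nice']

cities = ChildSet(load_cities)
assert list(cities.list) == ['Paris', 'Nice']
assert list(cities.list) == ['Paris', 'Nice'] and len(loads) == 1  # cached
cities.invalidate()
list(cities.list)
assert len(loads) == 2                 # re-queried after invalidation
```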

Lastly, the .isGreaterThan() can easily be implemented in the ORM class that would be generated by the @ORM() class-decorator.
It is used for 3 purposes:
1. Readability
2. Beauty.
3. Conciseness.
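For illustration only (FieldValue is an invented name, not part of any real @ORM decorator), such a helper is trivial to graft onto a value type in Python:

```python
# Hypothetical sketch: a field-value wrapper exposing English-named
# comparison helpers, as an @ORM() decorator could generate per field.

class FieldValue(int):
    def isGreaterThan(self, other):
        return int(self) > other

    def isLessThan(self, other):
        return int(self) < other

population = FieldValue(2240000)   # e.g. what city.Population could return
assert population.isGreaterThan(1000000)
assert not population.isLessThan(1000000)
```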

When comparing this:
city.Language is not spanish and city.Population.isGreaterThan(1000000)
With this:
db.Country.Name == 'France' and db.Country.id == db.City.country

The "Zen of Python" comes to mind:
 "Beautiful is better than ugly."

Anthony

May 3, 2013, 9:31:15 AM
to web...@googlegroups.com
The whole point of having an ORM is for you to not have to worry about the answer to that question - it's going to do the optimal thing it can, based on the circumstances in which it is called.

Has anyone given you any Kool-Aid to drink recently?
 
The question of how this cache-management works, and so whether or not you can trust it, is a long and complex one. Suffice it to say that there has been more than a decade of research into this issue, so it is solved.

Famous last words.
 
What about if I need the cities of countries where the main language is
not Spanish, and the population is above 1 million?

ORMs allow you to attach filters for that. It really depends on the implementation, but let's think of some options, shall we?

db.define_table('Language', Field('Name', 'string'))
db.define_table('Continent', Field('Name', 'string'))
db.define_table('Country', Field('Name', 'string'), Field('Continent', db.Continent))
db.define_table('City', Field('Name', 'string'), Field('Country', db.Country), Field('Language', db.Language), Field('Population', 'integer'))

>>> spanish = Language(Name='Spanish')
>>> french = Language(Name='French')
>>> france.City.Language = french

What does france.City.Language = french do? france.City refers to all cities in France, so does this assign "French" as the language for all cities? Does SQLA employ that syntax?

In web2py, this would be:

french = db.Language.insert(Name='French')
db.Country(france).City.update(Language=french)

Looks a lot like your example (maybe even a bit more explicit about what's going on).

(Note, Mariano's example actually assumed language to be a country-level field.)

>>> [city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]
[<City Name:Paris>]

The problem here is that you are doing all the filtering in Python rather than in the database. Not a big deal in this example, but with a large initial set of records, this is inefficient because you will return many unnecessary records from the database and Python will likely be slower to do the filtering.
 
It almost reads like plain English... Beautiful (!)

There are differing opinions on this. Some prefer symbols like != and > over plain English like "is not" and "isGreaterThan" because it is easier to scan the line and quickly discern the comparisons being made. In particular, I would much prefer both typing and reading ">" rather than ".isGreaterThan()".

>>> france.City.where(
Language=not_(spanish), Population=moreThan(1000000))

>>> france.City.FilterBy(
(City.Language != spanish) & (City.Population > 1000000))

In web2py, you can already do:

db.Country(france).City(
(db.City.Language != spanish) & (db.City.Population > 1000000)).select()

So far, for every example you have shown, web2py has fairly similar syntax. Sure, we can quibble over which is more "beautiful", but I haven't seen anything to justify building a massive ORM. And the downside of adding an ORM (aside from the time cost of development and maintenance) is that you would then have to learn yet another abstraction and syntax.

That could be also archived hacking "Row", but you should use == in 
python for this kind of comparision (equality)

Not is SQLA :) 

Well, if you're actually making such a comparison, you generally would only be interested in "==", not "is" -- it just so happens that "is" would also be True in SQLA because of the identity mapping.

Web2py could do that too.

Sure, but it could do it at the DAL level, without an ORM. So far, though, you haven't made a strong case for the utility of such a feature (I don't doubt that it can be helpful -- the question is how helpful).
 
An ORM is MUCH MORE intuitive, as my examples had demonstrated

Maybe I missed an example, but I don't think I saw any that were more intuitive at all, let alone "MUCH MORE" intuitive.

Anthony

Anthony

May 6, 2013, 9:38:42 AM

When comparing this:
city.Language is not spanish and city.Population.isGreaterThan(1000000)
With this:
db.Country.Name == 'France' and db.Country.id == db.City.country

Well, we should compare your first line to the DAL equivalent:

(db.City.Language != spanish) & (db.City.Population > 1000000)

I actually find that latter easier to process. The parentheses and & make it easier to see there are two separate conditions, and the != and > are easier to pick out and comprehend than "is not" and ".isGreaterThan()". A non-programmer may have an easier time with the more English-like version (assuming they happen to speak English, of course), but I think it's reasonable to expect even novice programmers to understand the basic boolean operators. Whatever your opinion on the "beauty" of one over the other, though, surely this doesn't justify the massive undertaking of building an ORM, particularly since you would still have to know and use the underlying DAL syntax in addition anyway.

It's also worth noting that the SQLA ORM itself uses boolean operators for queries, just like the DAL -- it does not use these English equivalents.

Anthony

Alex

May 3, 2013, 10:01:27 AM
to web...@googlegroups.com


Am Freitag, 3. Mai 2013 13:44:20 UTC+2 schrieb Arnon Marcus:

Sorry, but what should .list() do?
Is it a query? Is it pre-fetched or cached?

The whole point of having an ORM is for you to not have to worry about the answer to that question - it's going to do the optimal thing it can, based on the circumstances in which it is called.


Here's the problem. In a perfect world you don't have to worry about what's going on behind the scenes. In a real application you need to know what exactly the ORM is doing, otherwise you'll run into problems sooner or later.

I had to work with an ORM (Hibernate) for many years. The ORM is very convenient, but the "magic" which is going on in the background will cause you trouble (at least performance-wise). I'm really happy to use web2py and the DAL for my own project; I always know exactly what is going on, and I have never encountered a strange bug where I could not figure out what was happening. Not so with the ORM, where I lost many hours debugging unexpected behavior.

In my opinion it does not make sense to define how references are fetched at the object level (at least that's how it is done in Hibernate; I don't know about SQLA). E.g. when I query a City object, I can't say in advance whether I will always want to access the Country reference as well. In Hibernate, single references have eager loading, so they are always fetched. That's bad, because when you don't need the reference, unnecessary data is fetched (and of course this is recursive, e.g. if Country has a single reference itself, this will also be loaded). With lazy loading you often run into the problem that the session or transaction is not active anymore and the references cannot be accessed (just google for "Lazy Loading Exception"). Therefore we often loaded all those references when the object was queried, to avoid running into a lazy-loading exception later on - which is of course also not very good for performance. With the DAL I say exactly what I need (joining the tables I really need for my use case) and when; that's so much better and easier than depending on some ORM magic.

For me all the added internal complexity of an ORM is not worth the effort. The complexity is just shifted to a different layer. Maybe the videos and presentations of SQLA look nice and promising, but you'll only find out about the disadvantages (which are not mentioned in the videos, of course) once you develop a real-world application.

Alex
 

Arnon Marcus

May 3, 2013, 10:59:05 AM

Has anyone given you any Kool-Aid to drink recently?

I have no idea what you mean by that...

Famous last words.

I have no idea what you mean by that either... 

What does france.City.Language = french do? france.City refers to all cities in France, so does this assign "French" as the language for all cities? Does SQLA employ that syntax?

Oops, caught me there... :)
Obviously I meant:
>>> france.City.Paris.Language = french
The ".Paris" part can be implemented via __getattr__
The ".Language = french" part can be implemented via a property.
It would obviously handle the primary key internally for the relationship - but that's a plumbing-level thing, so it's implicit, as a convenience layer.
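A rough sketch of those two mechanisms together (City, CityCollection and the in-memory storage are invented here purely for illustration; a real implementation would write through to the database):

```python
# Hypothetical sketch of the two mechanisms mentioned: __getattr__ to
# resolve ".Paris" as a child lookup by name, and a property to turn
# ".Language = french" into an update on the underlying row.

class City:
    def __init__(self, name):
        self.name = name
        self._language = None

    @property
    def Language(self):
        return self._language

    @Language.setter
    def Language(self, value):
        # a real ORM would write the foreign key back to the database here
        self._language = value

class CityCollection:
    def __init__(self, cities):
        self._by_name = {c.name: c for c in cities}

    def __getattr__(self, name):       # france.City.Paris -> lookup by name
        try:
            return self._by_name[name]
        except KeyError:
            raise AttributeError(name)

paris = City('Paris')
france_cities = CityCollection([paris, City('Nice')])
france_cities.Paris.Language = 'French'    # attribute write becomes an update
assert paris.Language == 'French'
```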

But I like your idea - it could be implemented also! :)


In web2py, this would be:

french = db.Language.insert(Name='French')
db
.Country(france).City.update(Language=french)

Looks a lot like your example (maybe even a bit more explicit about what's going on).

It is not more explicit - it has the same level of explicitness about what's going on - just less plumbing-level explicitness.

But the important distinction is that there is a hidden fundamental difference - my approach actually creates object references (alongside the database insertion) that can be used further on in the code, at the very least within the same transaction, but even across transactions. Again, this is a domain-model, not just a database-model.
Internally, the values may get invalidated on transaction-commit, but the object references would persist within the current runtime.
Again, you have to break away from the stateless mind-set and appreciate statefulness. This assumes a very different execution-model than what you are used to in web2py. It is something that would happen within a module, not within a controller-action, so it is saved within the module-object across transactions/requests/sessions.
 

(Note, Mariano's example actually assumed language to be a country-level field.)

>>> [city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]
[<City Name:Paris>]

The problem here is that you are doing all the filtering in Python rather than in the database. Not a big deal in this example, but with a large initial set of records, this is inefficient because you will return many unnecessary records from the database and Python will likely be slower to do the filtering.

Well, that depends a lot on the context in which this is executed.
If you have many queries similar to your example before/around this line of code, they may reuse the same objects for other purposes (which is not uncommon).

It may in fact be slower to do it your way in many circumstances, because every specialized query you make is another round-trip to the database, which would be orders of magnitude slower than doing eager loading up-front and filtering in Python.

Also, you need to keep in mind that this assumes a long-lasting set of objects that outlive a single transaction-operation.

Also, bear in mind that I am not suggesting to "replace" the DAL - only to "augment" it with a stateful layer on top.
You could still do it your way whenever it seems more efficient - meaning, when you have a single query to issue for that table within a transaction, and a high degree of complexity/specificity in that query.

The benefits/trade-offs are not absolute/constant - they vary circumstantially.

 
It almost reads like plain English... Beautiful (!)

There are differing opinions on this. Some prefer symbols like != and > over plain English like "is not" and "isGreaterThan" because it is easier to scan the line and quickly discern the comparisons being made. In particular, I would much prefer both typing and reading ">" rather than ".isGreaterThan()".

You are right - there are different opinions, but the "Zen of Python" is conclusive. :)
Also, there are both performance AND memory benefits to using "is not". An object-identity check is much faster than an equality check, and having the same object referenced by different names, instead of copies of it that need to be equality-tested, may save tons of memory.
But if you insist on using an ugly form, then in my example you may still do that - it would work just as well - while keeping the same memory-footprint benefits, just not the performance benefits. :)
 

>>> france.City.where(
Language=not_(spanish), Population=moreThan(1000000))

>>> france.City.FilterBy(
(City.Language != spanish) & (City.Population > 1000000))

In web2py, you can already do:

db.Country(france).City(
(db.City.Language != spanish) & (db.City.Population > 1000000)).select()

So far, for every example you have shown, web2py has fairly similar syntax.

But with radically-different semantics (!!!)
 
Sure, we can quibble over which is more "beautiful", but I haven't seen anything to justify building a massive ORM.

It is not just more beautiful - it is also faster for long-transactions, and takes less memory.
Also, it results in developer code that is much more readable and concise, and hence much more maintainable.
 
And the downside of adding an ORM (aside from the time cost of development and maintenance) is that you would then have to learn yet another abstraction and syntax.

This doesn't deter people from using SQLA - how do you account for that?

Well, if you're actually making such a comparison, you generally would only be interested in "==", not "is" -- it just so happens that "is" would also be True in SQLA because of the identity mapping.


web2py is all for stealing great ideas wherever it finds them! :)
 
Sure, but it could do it at the DAL level, without an ORM. So far, though, you haven't made a strong case for the utility of such a feature (I don't doubt that it can be helpful -- the question is how helpful).

So, a smaller memory-footprint, better-performance, readability, conciseness, beauty and ease-of-maintenance for the developers, are all insufficient for your criterion of utility?
I wonder what is then...


Maybe I missed an example, but I don't think I saw any that were more intuitive at all, let alone "MUCH MORE" intuitive.


Are you seriously suggesting that this:

[city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]

Is not much-more intuitive than this?:

db.Country(france).City(
(db.City.Language != spanish) & (db.City.Population >1000000)).select()

If so, then I have no idea what you're talking about...
What's more, I believe you don't even know what you're talking about either...

Arnon Marcus

May 3, 2013, 11:04:19 AM
to web...@googlegroups.com
I actually find that latter easier to process. The parentheses and & make it easier to see there are two separate conditions, and the != and > are easier to pick out and comprehend than "is not" and ".isGreaterThan()". A non-programmer may have an easier time with the more English-like version (assuming they happen to speak English, of course), but I think it's reasonable to expect even novice programmers to understand the basic boolean operators. Whatever your opinion on the "beauty" of one over the other, though, surely this doesn't justify the massive undertaking of building an ORM, particularly since you would still have to know and use the underlying DAL syntax in addition anyway.

Anthony

Again:
There are both performance AND memory benefits to using "is not". An object-identity check is much faster than an equality check, and having the same object referenced by different names, instead of copies of it that need to be equality-tested, may save tons of memory.
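A small plain-Python illustration of that point (no ORM involved): when an identity map hands out one shared object per row, an identity check suffices and no duplicate copy exists; without it, you fall back to a contents-walking equality check on a second copy.

```python
# Illustration of the claim above: with one shared object per row,
# different names reference the same instance, so identity checks work
# and no duplicate copies need to be equality-compared.

spanish = {'Name': 'Spanish'}          # stand-in for a mapped Language row

city_language = spanish                # identity map hands out the same object
assert city_language is spanish        # O(1) pointer comparison

copy = dict(spanish)                   # without identity mapping: a second copy
assert copy == spanish                 # equality has to walk the contents
assert copy is not spanish             # ...and identity no longer works
```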

Arnon Marcus

May 3, 2013, 11:20:34 AM
to web...@googlegroups.com
I know a lot of people have been burned in production by many ORMs in the past - and those scars are what anyone suggesting an ORM today has to deal with.

SQLA is doing a very good job at that - have you seen the videos I posted here? They talk about Hibernate, as well as the Active-Record pattern. The guy who wrote it has addressed all of the worries you are talking about in those videos.

For example:
Eager- vs. lazy-loading is configurable - both are supported, and you can use each in your code, depending on circumstances.
As I said in the previous message (I updated it since you posted this one), it depends - the trade-offs are circumstantial - that's why you need both approaches, and the ability to configure each object to use one or the other in different circumstances - and that's what SQLA provides.

As for recursive queries, they do not occur with eager loading in SQLA.
SQLA features what it calls a "cascade", which means it "traverses" through the attribute accesses and only fetches what you actually asked for (well, it's not really "traversing" - there are events that are triggered on each object's attribute access...). This applies to both lazy and eager loading configurations. The cascade makes sure that you only get what you explicitly asked for - no more, no less.
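The access-triggered loading being described can be sketched with a plain Python non-data descriptor (illustrative only - lazy and load_country are invented names, and this is not SQLAlchemy's actual event mechanism):

```python
# Hypothetical sketch of lazy loading triggered by attribute access:
# the related row is only fetched the first time the attribute is read,
# then cached on the instance so later reads skip the descriptor.

class lazy:
    def __init__(self, loader):
        self.loader = loader
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.loader(obj)           # fetch only when accessed
        obj.__dict__[self.name] = value    # cache: instance dict now shadows us
        return value

fetches = []
def load_country(city):                    # stand-in for the real query
    fetches.append(city)
    return 'France'

class City:
    Country = lazy(load_country)

paris = City()
assert fetches == []                       # nothing fetched yet
assert paris.Country == 'France'           # first access triggers the fetch
assert paris.Country == 'France' and len(fetches) == 1  # fetched only once
```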

As for "magic", he talks about that also.
SQLA is not hiding its internals like many other ORMs do - it is highly configurable, with sane defaults, so you can go as deep as you like and re-configure things the way you like.

In fact, this is not only "possible" but explicitly "advised" - in some areas it is even "mandatory". The explicit support for configurability comes in the form of tools for automating manual configuration, through meta-classes and mix-ins.

Anthony

May 6, 2013, 9:43:17 AM
Has anyone given you any Kool-Aid to drink recently?

I have no idea what you mean by that...

 

Famous last words.

I have no idea what you mean by that either... 

 
french = db.Language.insert(Name='French')
db.Country(france).City.update(Language=french)

Looks a lot like your example (maybe even a bit more explicit about what's going on).

It is not more explicit - it has the same level of explicitness about what's going on - just less plumbing-level explicitness.

More explicit because it makes it clear we are doing an update.
 
But the important distinction is that there is a hidden fundamental difference - my approach actually creates object references (alongside the database insertion) that can be used further on in the code,

You can save the object and refer to it later in the code in web2py as well.
 
Again, you have to break away from the stateless mind-set and appreciate statefulness.

I understand the distinction. You simply haven't yet demonstrated a compelling use case for the latter. 
 
This assumes a very different execution-model than what you are used to in web2py. It is something that would happen within a module, not within a controller-action, so it is saved within the module-object across transactions/requests/sessions.

No, in web applications, SQLA sessions last only as long as a single web request -- basically the same as in web2py.
 
>>> [city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]
[<City Name:Paris>]

The problem here is that you are doing all the filtering in Python rather than in the database. Not a big deal in this example, but with a large initial set of records, this is inefficient because you will return many unnecessary records from the database and Python will likely be slower to do the filtering.

Well, that depends a lot on the context in which this is executed.
If you have many queries similar to your example, before/around this line of code, that may reuse the same objects for other purposes (which is not uncommon)

It may in fact be slower to do it your way in many circumstances, because every specialized-query you are making is another round-trip to the database, which would be orders-of-magnitude slower than doing an eager-loading up-front, and filtering in python.

Yes, and in that case, you can do something exactly like your code above in web2py (i.e., filtering in Python).
 
Also, you need to keep in mind that this assumes a long-lasting set of objects that outlive a single transaction-operation.

Not typically. In most cases, you will probably have one transaction per request in both SQLA and web2py. Furthermore, even if you have multiple transactions within a request, SQLA will expire the state of any instances whenever a transaction is committed.
 
Also, bear in mind that I am not suggesting to "replace" the DAL - only to "augment" it with a stateful layer on top.

Why not augment it with statefulness at the DAL level? Why do you need a layer on top?
 
The benefits/trade-offs are not absolute/constant - they vary circumstantially.

Yes, it would help to understand the circumstances in which your preferred features offer substantial benefits.
 
It almost reads like plain English... Beautiful (!)

There are differing opinions on this. Some prefer symbols like != and > over plain English like "is not" and "isGreaterThan" because it is easier to scan the line and quickly discern the comparisons being made. In particular, I would much prefer both typing and reading ">" rather than ".isGreaterThan()".

You are right - there are different opinions, but the "Zen of Python" is conclusive. :)

Where in the "Zen of Python" does it say that English words are more beautiful than boolean operators when expressing boolean logic?
 
Also, there are both performance AND memory benefits to using "is not". An object-id check is much faster than an equality check, and having the same object referenced by different names, instead of having copies of it that need to be equality-tested, may save tons of memory.

If you're talking about building queries, your point is moot -- the operations happen in the database, not Python. As for comparisons in Python, in web2py, you wouldn't be testing equality of a whole object/record -- typically it would be a scalar (e.g., the integer ID). And you wouldn't have multiple copies of records in memory either.

>>> france.City.where(
Language=not(spanish), Population=moreThan(1000000))

>>> france.City.FilterBy(
(City.Language != spanish) & (City.Population > 1000000))

In web2py, you can already do:

db.Country(france).City(
(db.City.Language != spanish) & (db.City.Population > 1000000)).select()

So far, for every example you have shown, web2py has fairly similar syntax.

But with radically-different semantics (!!!)

So what?
 
Sure, we can quibble over which is more "beautiful", but I haven't seen anything to justify building a massive ORM.

It is not just more beautiful - it is also faster for long-transactions, and takes less memory.

How so? Do you have benchmarks?
 
Also, it results in developer code that is much more readable and concise, and hence much more maintainable.

You haven't shown any examples that are more readable or concise, and certainly not more maintainable.
 
And the downside of adding an ORM (aside from the time cost of development and maintenance) is that you would then have to learn yet another abstraction and syntax.

This doesn't deter people from using SQLA - how do you account for that?

Not entirely sure, as I have not done a sociological study of SQLA users. Perhaps many folks are using primarily just Core or primarily just the ORM and don't worry much about the layer they don't use. It could also be that there are some features they need in the ORM that aren't available or as easy in the Core. That doesn't mean such features have to be implemented via an ORM, though.
  
Sure, but it could do it at the DAL level, without an ORM. So far, though, you haven't made a strong case for the utility of such a feature (I don't doubt that it can be helpful -- the question is how helpful).

So, a smaller memory-footprint, better-performance, readability, conciseness, beauty and ease-of-maintenance for the developers, are all insufficient for your criterion of utility?

You haven't made any case that there would be a smaller memory footprint or better performance let alone quantify these hypothetical improvements. I have seen no improvements in readability, conciseness, or ease of maintenance either -- at least nothing that would remotely justify a massive development and maintenance effort for the framework and the cognitive overhead of learning another abstraction for its users.

Anyway, if you just want a little syntactic sugar, that could be built into the DAL without requiring an entire new layer on top of it (and you could probably create the sugar yourself quite easily so you could make it exactly to your taste). Check out http://gluonframework.wordpress.com/2010/07/30/web2py-and-metaclasses/.
 
Are you seriously suggesting that this:

[city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]

Is not much-more intuitive than this?:

db.Country(france).City(
(db.City.Language != spanish) & (db.City.Population >1000000)).select()

Those two lines are doing different things. You can also do the first one in web2py:

[city for city in db.Country(france).select() if \
city.Language != spanish and city.Population > 1000000]

What's more, I believe you don't even know what you're talking about either...

Please correct me on anything you believe I have said that is mistaken.

Anthony

Arnon Marcus

unread,
May 3, 2013, 12:12:53 PM5/3/13
to web...@googlegroups.com

Sadly, there is NO possible way to achieve a Pythonic list-comprehension
syntax that queries the objects on the server side, AFAIK.
Some libraries use dis (the Python disassembler) and other internal &
dirty Python hacks to do something similar; please see:

http://www.aminus.net/geniusql/

http://www.aminus.org/blogs/index.php/2008/04/22/linq-in-python?blog=2

http://www.aminus.net/geniusql/chrome/common/doc/trunk/managing.html

Note that web2py is much closer to the "pythonic expressions", but
without the early / late binding and other issues described there ;-)


I didn't know about this, but it's not what I meant.
It might be a cool concept, but I'm not sure how viable it would be in production. I haven't read everything yet, I'll read it some other time.
In my example, there was no "magic" involved, beyond what I've described:

Just to clarify, again, the following syntax:
[city for city in france.City.list if \
city.Language is not spanish and city.Population.isGreaterThan(1000000)]
Is not a pie-in-the-sky dream API - there are viable implementation details for each part of it.
First, the reason the "is not" would work is that there would be an implementation of "Identity Mapping" that takes care of having a singleton ORM-object representing each row.
Second, the access to france.City.list is viable, as attribute access is totally customizable in Python, so we can make the access do anything we want.
For example, it could return an iterator over the cached data if the "City" attribute of the "france" ORM-object is "valid" (meaning it is cached and has not been "invalidated" by a previous transaction commit), or do a "query" right there on the spot and return an iterator over the results if the attribute is "invalid" at the time of the "list" attribute access.
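A minimal, framework-free sketch of these two mechanisms -- an identity map keeping one singleton object per row, and a lazily loaded attribute that is cached until invalidated -- might look like this (a plain dict stands in for the database; all names are illustrative):

```python
# Sketch only: a dict stands in for a database table; names are illustrative.
class IdentityMap:
    """One in-memory object per key -- the basis for 'is'/'is not' checks."""
    def __init__(self, loader):
        self._loader, self._cache = loader, {}

    def get(self, key):
        if key not in self._cache:            # first access: "query" the store
            self._cache[key] = self._loader(key)
        return self._cache[key]               # later accesses: same object

class LazyAttr:
    """Descriptor: loads on first access, then serves the cached value."""
    def __init__(self, loader):
        self._loader = loader

    def __set_name__(self, owner, name):
        self._slot = "_cached_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not hasattr(obj, self._slot):      # cache miss: issue the "query"
            setattr(obj, self._slot, self._loader(obj))
        return getattr(obj, self._slot)       # cache hit: no query at all

CITY_TABLE = {"France": ["Paris", "Nice"]}   # stand-in for a database

class Country:
    cities = LazyAttr(lambda self: list(CITY_TABLE[self.name]))
    def __init__(self, name):
        self.name = name

countries = IdentityMap(Country)
france = countries.get("France")
print(countries.get("France") is france)     # True: a singleton per row
print(france.cities is france.cities)        # True: loaded once, then cached
```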


it was a simple list-comprehension.
It "assumes" the data is there - if it isn't, it goes and fetches it.

Here is a quote from one of the comments in your second link, that I really resonate with:

I think the distinction between:

Customer.select

and 

source.Customers

is quite important as the developer may be thinking quite differently in each. In the first, I am thinking that I'm grabbing objects from some store that is linked with the class Customer via an ORM. In the latter, I may think the same way, but I personally like to think of it as "I've got this object that has many customers and now I can play with them". That collection may be backed in a database, or it may just be a collection. 

None of this is not already accomplished either in Python (generators or list comprehensions) or via the many community modules that address object-relational mapping or extend what itertools already gives you. However, what I can't find in Python is a unified API regardless of what the underlying mechanism of keeping my data is. To me, it doesn't matter if I'm calling into a for loop or generating a monster SQL statement. I'd rather write the same code regardless of which of these is going to happen under the hood. 
 
As for the performance-issue, I've later said this:

Well, that depends a lot on the context in which this is executed.
If you have many queries similar to your example, before/around this line of code, that may reuse the same objects for other purposes (which is not uncommon)
It may in fact be slower to do it your way in many circumstances, because every specialized query you are making is another round-trip to the database, which would be orders of magnitude slower than doing an eager load up-front and filtering in Python.
Also, you need to keep in mind that this is assuming a long-lasting set of objects that outlive a single transaction/operation. 

And this:
 
Eager-vs-Lazy loading is configurable - both are supported, and you can use each in your code, depending on circumstances.

Anthony

unread,
May 3, 2013, 12:14:20 PM5/3/13
to web...@googlegroups.com
OK, then, again:

Anthony

unread,
May 3, 2013, 12:20:33 PM5/3/13
to web...@googlegroups.com
Eager-vs-Lazy loading is configurable - both are supported, and you can use each in your code, depending on circumstances.
As I said in the previous message (I updated it since you posted this one), it depends - the trade-offs are circumstantial - that's why you need both approaches, and the ability to configure each object to use one or the other in different circumstances - and that's what SQLA provides.

In web2py, we have lazy loading for individual references and referencing sets, and when that is inefficient, we would just do an explicit join. I think an eager loading option on the individual references would be a cool addition, though that wouldn't require an ORM layer.
 
As for recursive queries, they do not occur in eager loading in SQLA.
SQLA features what is called a "cascade", which means it "traverses" through the attribute accesses and only fetches what you actually asked for (well, it's not really "traversing" - there are events that are triggered on each object's attribute access). This holds for both lazy and eager loading configurations. The cascade makes sure that you only get what you explicitly asked for, no more, no less.

When you eager load in SQLA, it doesn't know ahead of time what attributes you will access, so it fetches everything (though in just one, or possibly two queries, depending on the eager method used). Lazy loading requires a query for each attribute accessed, just as in web2py.
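For concreteness, the knob being discussed looks roughly like this in SQLAlchemy (a sketch assuming SQLAlchemy 1.4+; the Country/City model is illustrative, not from anyone's actual application):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Country(Base):
    __tablename__ = "country"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # lazy="select" (the default): the cities query fires on first access.
    # lazy="joined" would instead pull the cities in with the initial query.
    cities = relationship("City", lazy="select")

class City(Base):
    __tablename__ = "city"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    country_id = Column(Integer, ForeignKey("country.id"))

engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Country(name="France",
                        cities=[City(name="Paris"), City(name="Nice")]))
    session.commit()
    france = session.query(Country).filter_by(name="France").one()
    # With lazy="select", the SELECT for cities runs here, on first access.
    names = sorted(c.name for c in france.cities)
```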

Anthony

Arnon Marcus

unread,
May 3, 2013, 12:27:45 PM5/3/13
to web...@googlegroups.com

If you're talking about building queries, your point is moot -- the operations happen in the database, not Python.

I don't know what you mean by this...
For complex-queries, I would still be using web2py's DAL layer, even if I had this ORM on top. I would either use them outside the ORM, or insert them into the ORM.

My example was not meant to show that you should do complex queries in Python - that would obviously be absurd, and you don't even need a relational database for that - any NoSQL one would do fine.

I was giving an example to a case in which I have a simple comparison to make, and want to reuse the ORM objects I already have for it.
The DAL is not built for that.
 
As for comparisons in Python, in web2py, you wouldn't be testing equality of a whole object/record -- typically it would be a scalar (e.g., the integer ID).

That would still be much slower than an identity test - especially for large data-sets.
 
And you wouldn't have multiple copies of records in memory either.


How so?

If I do this:

row1 = db.Country[1]
...
row2 = db.Country[1]

Would I then get:

row1 is row2 == True

?

How about:

row1.Name is row2.Name == True

?

I would be surprised to find out this is the case...

Arnon Marcus

unread,
May 3, 2013, 12:52:54 PM5/3/13
to web...@googlegroups.com


In web2py, we have lazy loading for individual references and referencing sets, and when that is inefficient, we would just do an explicit join. I think an eager loading option on the individual references would be a cool addition, though that wouldn't require an ORM layer.

We've been through this already, Anthony. Laziness in web2py has a different meaning - it uses the same word, but it means something completely different.

For example, you statement:

Lazy loading requires a query for each attribute accessed, just as in web2py.

 Is plain false.

There is another mechanism that exists in SQLA that you are not accounting for, and it is statefulness - a caching mechanism using the 'Unit of Work' pattern. I don't know how many times and in how many forms I need to say this.
In SQLA, using a LazyLoading configuration on an attribute means that the first time (within a transaction) your code accesses this attribute, it issues a query to the database.
The "Laziness" here is defined in terms of "when you FIRST access the attribute, and NOT BEFORE" - but as to what happens AFTER that, THEN the caching mechanism kicks in - if the attribute has not been invalidated within the same transaction, then the LazyLoading is NOT EVEN ACTIVATED (!) You get the value IMMEDIATELY from the attribute-cache.
EagerLoading is therefore a way to configure the attribute to be queried from the database UP-FRONT, but even in THAT case, that would only apply ONCE within a transaction. Every subsequent access to that attribute would use the cached value just as well.
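The identity-map behaviour underlying this caching can be shown in a few lines (a sketch assuming SQLAlchemy 1.4+ for Session.get; the model is illustrative):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Country(Base):
    __tablename__ = "country"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Country(id=1, name="France"))
    session.commit()

    a = session.get(Country, 1)  # loads the row
    b = session.get(Country, 1)  # served from the session's identity map
    same_object = a is b         # True: one in-memory object per row
```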

 
As for recursive queries, they do not occur in eager loading in SQLA. When you eager load in SQLA, it doesn't know ahead of time what attributes you will access, so it fetches everything (though in just one, or possibly two queries, depending on the eager method used). 

You are right about that - my mistake - I meant LazyLoading there.
But if I understood correctly, then, since Laziness vs. Eagerness can be configured on a per-attribute basis, you don't necessarily have to load the whole table - you could configure just the fields you want.
If you need a complex query for eager-loading, you can simply execute it at the beginning of your transaction - explicitly.
You don't even have to store the results anywhere yourself - they are automatically filled-in within their respective attribute-caches across the object-graph. Every access to those attributes thereafter, within the same transaction, would just use those values - regardless of the Lazyness/Eagerness configurations.

Anthony

unread,
May 3, 2013, 12:54:03 PM5/3/13
to web...@googlegroups.com

If you're talking about building queries, your point is moot -- the operations happen in the database, not Python.

I don't know what you mean by this...
For complex-queries, I would still be using web2py's DAL layer, even if I had this ORM on top. I would either use them outside the ORM, or insert them into the ORM.

Not sure what you mean here -- you can do simple queries in the db as well as complex filtering in Python -- they are orthogonal considerations.
 
My example was not meant to show that you should do complex queries in Python - that would obviously be absurd, and you don't even need a relational database for that - any NoSQL one would do fine.

Now you've really lost me -- what does any of this have to do with RDBMS vs. NoSQL? And why shouldn't you do complex filtering in Python?
 
I was giving an example to a case in which I have a simple comparison to make, and want to reuse the ORM objects I already have for it.
The DAL is not built for that.

Not sure what you mean. The DAL returns a Rows object. You can filter it with any simple or complex comparison you like. It even has a .find() method to simplify the process:

db.Country(france).select().find(lambda r: r.Language != spanish and r.Population > 1000000)

As for comparisons in Python, in web2py, you wouldn't be testing equality of a whole object/record -- typically it would be a scalar (e.g., the integer ID).

That would still be much slower than an identity test - especially for large data-sets.

OK, please provide some benchmarks. What percentage decrease in CPU usage can we expect if we compare object identities rather than integer equivalencies?
 
And you wouldn't have multiple copies of records in memory either.


How so?

If I do this:

row1 = db.Country[1]
...
row2 = db.Country[1]

And why are you doing that? 

Anthony

Anthony

unread,
May 6, 2013, 9:48:27 AM5/6/13
to

In web2py, we have lazy loading for individual references and referencing sets, and when that is inefficient, we would just do an explicit join. I think an eager loading option on the individual references would be a cool addition, though that wouldn't require an ORM layer.

We've been through this already, Anthony. Laziness in web2py has a different meaning - it uses the same word, but it means something completely different.

I'm not going to repeat what I already said -- please go back and read it earlier in this thread. In my statement above, the term "lazy" means exactly the same thing as it does in SQLA and that you mean when you say it -- the database query is deferred until you access the attribute.
 

Lazy loading requires a query for each attribute accessed, just as in web2py.

 Is plain false.

Or, it's plain true. In SQLA, if you specify lazy loading of relationships (which is the default), the query is deferred until the first time you access the attribute, and there is therefore a query for each attribute accessed. This is in contrast to eager loading, which does a single query to populate all attributes (whether or not they are ever accessed).
 
There is another mechanism that exists in SQLA that you are not accounting for, and it is statefulness - a caching mechanism using the 'Unit of Work' pattern. I don't know how many times and in how many forms I need to say this.

No need, as nothing I have said has contradicted this.

Anthony

Arnon Marcus

unread,
May 3, 2013, 1:15:41 PM5/3/13
to web...@googlegroups.com

Not sure what you mean here -- you can do simple queries in the db as well as complex filtering in Python -- they are orthogonal considerations.

I CAN, but I SHOULDN'T want to...

RDBMSes are built for complex filtering - this is (part of) what SQL is all about - I wouldn't want to dismiss that - it would be a bad choice all-around.

Conversely, simple filtering is way too verbose using the DAL - it's overkill for that, and makes the code much less readable. Again, it's a circumstantial trade-off: for complex filtering, the price of verbosity in using the DAL is worth it, because the performance benefits are just too high.
For simple filtering, well, I'd rather do it in Python and get readability, because the performance benefits are negligible.

 
Now you've really lost me -- what does any of this have to do with RDBMS vs. NoSQL? And why shouldn't you do complex filtering in Python?

See above.
 

Not sure what you mean. The DAL returns a Rows object. You can filter it with any simple or complex comparison you like. It even has a .find() method to simplify the process:

db.Country(france).select().find(lambda r: r.Language != spanish and r.Population > 1000000)

That's actually pretty nice - I didn't know I can do that - but what would the "france" and "spanish" objects be in this case? Ids? 


OK, please provide some benchmarks. What percentage decrease in CPU usage can we expect if we compare object identities rather than integer equivalencies?
 

Really? You think I need to?
Identity checking is built in and operates at the VM level.
Equality checks, even for the simplest of objects, are much more complex internally, and do not operate exclusively at the VM-core level - there are object attributes and value-type checks involved, etc.

Anthony

unread,
May 6, 2013, 9:47:35 AM5/6/13
to

RDBMSes are built for complex filtering - this is (part of) what SQL is all about - I wouldn't want to dismiss that - it would be a bad choice all-around.

A complex filter on a small set of items might be faster in Python than doing another database hit. And a simple filter might belong in the db if it has to go over lots of records. As I said, these are orthogonal considerations.
 
Conversely, simple filtering is way too verbose using the DAL - it's overkill for that, and makes the code much less readable.

Don't know why you think that.
  
For simple filtering, well, I'd rather do it in Python and get readability, because the performance benefits are negligible.

But I thought you were a fan of achieving negligible performance benefits at great cost (see below).
 
Now you've really lost me -- what does any of this have to do with RDBMS vs. NoSQL? And why shouldn't you do complex filtering in Python?

See above.

Still don't know why you would want a NoSQL database or what it has to do with this topic. 
 
db.Country(france).select().find(lambda r: r.Language != spanish and r.Population > 1000000)

That's actually pretty nice - I didn't know I can do that - but what would the "france" and "spanish" objects be in this case? Ids? 

Well, you've sure made a lot of claims about what web2py needs without knowing much about what it already has. Anyway, those are ids. If they were rows, then you could just do france.id and spanish.id.
 
OK, please provide some benchmarks. What percentage decrease in CPU usage can we expect if we compare object identities rather than integer equivalencies?
 

Really? You think I need to?

Yes, I think you need to. If this is only going to save a half a second of CPU time per day, I'm not going to build an ORM to get it. The question isn't how much faster the identity check is (and I don't think it's that much faster) -- the question is how much of your overall application CPU time is spent doing this kind of thing?

Anthony

Arnon Marcus

unread,
May 3, 2013, 1:50:00 PM5/3/13
to web...@googlegroups.com

Or, it's plain true. In SQLA, if you specify lazy loading of relationships (which is the default), the query is deferred until the first time you access the attribute, and there is therefore a query for each attribute accessed. This is in contrast to eager loading, which does a single query to populate all attributes (whether or not they are ever accessed).
 

I was interpreting your statement of "every access" to mean "every time you access the same attribute" and not to mean "every attribute you access".

Anthony, your use of language is sometimes ambiguous - it is not the first time I have misunderstood you - please be more specific next time. 10x :)

As for LazyLoading in web2py:
We discussed this in the context of virtual fields, and we've established that Laziness in that context was NOT a deferred access to the database, but a deferred computation of the results within the run-time heap.

Are you now referring to Laziness in web2py within a different context?
Say, like in:
db.Country(france).City.find(...)
?

Because I don't understand how you could consider that "Lazy" - it is "implicit", yes, but a "Lazy access to the database" can only have meaning within the context of a stateful framework. If every query accesses the database anyway, then where is the laziness in that case?

You mean a LazySet ?
Like in Recursive-selects?

person =  db.person(id)
for thing in person.thing.select(orderby=db.thing.name):
    print person.name, 'owns', thing.name

Well, again, it IS "implicit", but why call it "Lazy"?
The "so-called" LazySet is in "person.thing" ?
If so, then it isn't "Lazy", just "implicit", and that's a bad choice of name - good thing it isn't in the documentation, as it would have generated even more confusion than already exists there...
If not, then where is it? The thing.name? If so, then it may be legitimate to call "person.thing" lazy, but then I wouldn't want to use it.

The way I see it, the only meaning the term "Lazy" can have for accessing the database exists within a stateful framework - namely, "in relation to eager-loading", which can only exist there.


Arnon Marcus

unread,
May 6, 2013, 9:52:38 AM5/6/13
to

A complex filter on a small set of items might be faster in Python than doing another database hit. And a simple filter might belong in the db if it has to go over lots of records. As I said, these are orthogonal considerations.

Perhaps, but again, we are talking about the context of a stateful system - we might already have some data in our object-graph - so it's more complicated than that. If we're talking about the first query of a transaction, we need to think about the context of that whole transaction: will we be using all of the fields in the subsequent attribute accesses? How about the records? Do we need all of them for our access pattern later on? How should we construct our query so it's optimal for reuse of the results in subsequent attribute accesses of that same transaction? Such considerations do not even exist in a stateless system like web2py's DAL - it doesn't have the same kind of reusability of returned data.

For example: if I am writing code for a transaction that would later need to do a simple filter on a large data-set, but I also know I'm going to need "some" of that data, plus some other related data for another attribute access, then I should construct the first query in the transaction to do the complex filtering, even on a large data-set, so the data is cached for me for when I do the simple filtering of it in Python afterwards. As for that extra data reuse, it might not pertain to all the records I got, but it might pertain to more fields than my simple filtering needed.

So in that case, I might do the simple filtering in Python, even if a large data-set is involved, because I am optimizing the number of queries for a wider context.
 
 
Conversely, simple filtering is way too verbose using the DAL - it's overkill for that, and makes the code much less readable.

Don't know why you think that.

Because it is.
 
  
For simple filtering, well, I'd rather do it in Python and get readability, because the performance benefits are negligible.

But I thought you were a fan of achieving negligible performance benefits at great cost (see below).

Now you're being cynical...
 

Still don't know why you would want a NoSQL database or what it has to do with this topic. 

You're barking up the wrong tree - it does not relate to this discussion, and I didn't say I need a NoSQL database - I don't.
I meant it as a hypothetical alternative to an imaginary scenario of me doing ALL the filtering in Python - for THAT I said "well, I might as well use NoSQL", as I would then not reap the benefits of a relational database. It was a statement to emphasize why I wouldn't want to do complex filtering in Python "in general" - obviously there are edge cases, as you alluded, and then there's the additional complexity of decision-making, as I alluded, due to the introduction of stateful caching/reuse of results.
 

Well, you've sure made a lot of claims about what web2py needs without knowing much about what it already has. Anyway, those are ids. If they were rows, then you could just do france.id and spanish.id.

I was simply avoiding making assumptions in that example, as there was no context for these variables in it.
 

Yes, I think you need to. If this is only going to save a half a second of CPU time per day, I'm not going to build an ORM to get it. The question isn't how much faster the identity check is (and I don't think it's that much faster) -- the question is how much of your overall application CPU time is spent doing this kind of thing?


Fine, don't make the "is not" usage a reason for an ORM - you may still benefit from an "Identity Mapper" in an ORM, in terms of memory-efficiency, even if you stick to your ugly "!="s and "=="s....
I wouldn't make my decision of having an "Identity Mapper" only for the usage of "is" and "is not" - in fact, it is rarely used even in SQLA - it was just an example of readability that can be harnessed "in addition to" the memory efficiency that an Identity Mapper provides.

For benchmarks on THAT, you may look for SQLAlchemy vs. Django if you like... I don't really care much for that - I just know it is obviously better...

Philip Kilner

unread,
May 3, 2013, 2:25:23 PM5/3/13
to web...@googlegroups.com
Hi Arnon,

I really don't see that your requirements in this example are not a good
fit for web2py's DAL. I'd really like to understand what, if anything, I'm
missing here.

I'm originally an RDBMS developer, and what I think of as the
set-oriented approach of web2py's DAL is a much better fit for my mental
model of what my application is doing. Conversely, I find many ORM
solutions (and Active Record, come to that) to be an anti-pattern [1].

FWIW, my data is typically consumed by more than one application (e.g.
not just web2py), and I've tended to let web2py create and manage the
schema in its own way, and work in the db with views etc. to address
more complex or performance-critical functions, so perhaps my concerns
are different to yours. However, the examples you give don't seem to
take best advantage of the DAL to my eye.

Anthony provided some examples as to how one might do that, but you have
not responded directly to that post (apologies if I've missed your
response if it was elsewhere in the thread). I'd like to take the
liberty of quoting Anthony's responses and asking for your take on them.

On 02/05/13 22:20, Arnon Marcus wrote:
> Using the DAL, the best you might get is:
> >>> [city for city in db.City.Country.select() if city.Country.Name ==
> 'France']
> [<Row Name:Paris>, <Row Name:Nice>]
>

Anthony's suggestion: -

>>> db.Country(name='France').City.select()

> >>> europe = Continent(Name='Europe')
> >>> france = Country(Name='France', Continent=europe)
> >>> paris = City(Name='Paris', Country=france)
> >>> nice = City(Name='Nice', Country=france)
> >>>
> >>> europe.Country(Name='France') is france
> True
>>>> france.City(Name='Paris') is paris
> True
> >>> europe.Country(Name='France').City.list
> [<City Name:Paris>, <City Name:Nice>]
>

Anthony's suggestion: -

>>> europe = db.Continent.insert(Name='Europe')
>>> france = db.Country.insert(Name='France', Continent=europe)
>>> paris = db.City.insert(Name='Paris', Country=france)
>>> nice = db.City.insert(Name='Nice', Country=france)
>>> db.Country(france).City.select()
[<Row Name:Paris>, <Row Name:Nice>]

> This would be so intuitive and easy to use...
>

Can you expand on this statement and help me understand how this
proposed syntax is any more intuitive or simpler than the web2py
suggestions, which to my eye are clearer by dint of being more explicit?

Am I missing something here?

[1] http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern

--

Regards,

PhilK


e: ph...@xfr.co.uk - m: 07775 796 747

'work as if you lived in the early days of a better nation'
- alasdair gray

Anthony

unread,
May 6, 2013, 9:56:04 AM5/6/13
to

Or, it's plain true. In SQLA, if you specify lazy loading of relationships (which is the default), the query is deferred until the first time you access the attribute, and there is therefore a query for each attribute accessed. This is in contrast to eager loading, which does a single query to populate all attributes (whether or not they are ever accessed).
 

I was interpreting your statement of "every access" to mean "every time you access the same attribute" and not to mean "every attribute you access".

I said, "Lazy loading requires a query for each attribute accessed...". That is a fairly precise statement.
 
Anthony, your use of language is sometimes ambiguous - it is not the first time I have misunderstood you - please be more specific next time. 10x :)

Interesting that you attribute your misunderstandings to my lack of clarity. Is that the only possibility?
 
As for LazyLoading in web2py:
We discussed this in the context of virtual fields, and we've established that Laziness in that context was NOT a deferred access to the database, but a deferred computation of the results within the run-time heap.

No, we didn't establish that at all. Please review the thread.
 
Are you now referring to Laziness in web2py within a different context?

No.
 
Say, like in:
db.Country(france).City.find(...)
?

Because I don't understand how you would consider to see that as "Lazy" -

That is not proper web2py code, so I don't see it as anything.
 
but a "Lazy access to the database" can only have meaning within the context of a stateful framework.

No. In web2py, if a Row object contains a reference attribute, a query is fired when the attribute is accessed, not when the Row object was first created. This is lazy.
 

person =  db.person(id)
for thing in person.thing.select(orderby=db.thing.name):
    print person.name, 'owns', thing.name

Well, again, it IS "implicit", but why call it "Lazy"?

It's actually not even implicit there -- you explicitly call .select(). Anyway, "thing" is an attribute of the "person" object, but there is no database query to retrieve the "things" when the "person" object is first created -- instead, the query for "things" is deferred until actually needed.
 
The "so-called" LazySet is in "person.thing" ?

Yes.
 
If so, then it isn't "Lazy", just "implicit" - and that's a bad choice of name...

Agree to disagree.
 
- good thing it isn't in the documentation, as it would have generated even more confusion than already exists there...

More confusion among whom?
 
The way I see it, the only meaning the term "Lazy" has for accessing the database exists within a stateful framework - that is, "in relation to eager-loading", which can only exist there.

No, web2py could implement eager loading of these Reference and LazySet attributes by doing a single query and filling in all the attributes with the retrieved values. This would be in contrast to the current behavior, which is lazy (i.e., the values are not retrieved when the Row objects are initially created but at some later time upon access/explicit request).

Anthony

Derek

unread,
May 3, 2013, 3:24:22 PM5/3/13
to web...@googlegroups.com
Let's take a look at identity comparison vs integer comparison...
import timeit
setup = """
def isequalItems(itemone, itemtwo):
    return itemone is itemtwo   # identity comparison

def isequalInts(itemone, itemtwo):
    return itemone == itemtwo   # value (equality) comparison

def testOne():
    a = 1
    b = 2
    isequalItems(a,b)

def testTwo():
    a = 1
    b = 2
    isequalInts(a,b)
    
"""
print "isequalitems", timeit.timeit(stmt="testOne()", setup=setup, number=10000000)
print "isequalints", timeit.timeit(stmt="testTwo()", setup=setup, number=10000000)

I get 
isequalitems 2.77487170111
isequalints 2.73482146489

So, integer comparison is faster. This is with Python 2.7.
PyPy 1.9...
isequalitems 0.067024457849
isequalints 0.0263884617855

Integer comparison is still faster.
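One caveat worth adding to this benchmark (my note, not part of the measurement above): `is` on small integers only appears to behave like an equality test because CPython caches small int objects (roughly -5..256). The values below are built at runtime via `int()` to defeat constant folding:

```python
# Small ints hit CPython's cache, so both names point at the same object.
a, b = int("7"), int("7")
print(a is b)  # True

# Larger ints are (in CPython) distinct objects with equal values,
# so `is` is not a safe substitute for `==` in general.
c, d = int("1000"), int("1000")
print(c == d, c is d)
```

So identity comparison is only semantically equivalent to integer equality when the interpreter happens to guarantee one canonical object per value - which is precisely what an identity map would have to provide.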

On Friday, May 3, 2013 10:36:10 AM UTC-7, Anthony wrote:

RDBMS are built for complex filtering - this is (part of) what SQL is all about - I wouldn't want to dismiss that - it would be a bad choice all-around.

A complex filter on a small set of items might be faster in Python than doing another database hit. And a simple filter might belong in the db if it has to go over lots of records. As I said, these are orthogonal considerations.
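A stdlib sketch of the two options (the `city` table and the population numbers are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (name TEXT, population INTEGER);
INSERT INTO city VALUES ('Paris', 2100000), ('Lyon', 510000),
                        ('Nice', 340000);
""")

# Filter in the database: only the matching rows cross the wire,
# which matters when the table is large.
big_db = conn.execute(
    "SELECT name FROM city WHERE population > 1000000").fetchall()

# Filter in Python: fetch the (small) result set once, then any number
# of follow-up filters cost no further round trips to the database.
rows = conn.execute("SELECT name, population FROM city").fetchall()
big_py = [name for name, pop in rows if pop > 1000000]
small_py = [name for name, pop in rows if pop <= 1000000]

print(big_db)  # [('Paris',)]
print(big_py)  # ['Paris']
```

Which side wins depends on the size of the set and the number of filters applied - which is the sense in which the considerations are orthogonal.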
 
Conversely, simple filtering is way too verbose using the DAL - it's overkill for that, and makes the code much less readable.

Don't know why you think that.
  
For simple filtering, well, I'd rather do it in Python and get readability, because the performance benefits are negligible.

But I thought you were a fan of achieving negligible performance benefits at great cost (see below).
 
Now you've really lost me -- what does any of this have to do with RDBMS vs. NoSQL? And why shouldn't you do complex filtering in Python?

See above.

Still don't know why you would want a NoSQL database or what it has to do with this topic. 
 
db.Country(france).select().find(lambda r: r.Language != spanish and r.Population > 1000000)

That's actually pretty nice - I didn't know I could do that - but what would the "france" and "spanish" objects be in this case? Ids?

Well, you've sure made a lot of claims about what web2py needs without knowing much about what it already has. Those are ids. If they were rows, then you would just do france.id and spanish.id.
 
OK, please provide some benchmarks. What percentage decrease in CPU usage can we expect if we compare object identities rather than integer equivalencies?
 

Really? You think I need to?

Yes, I think you need to. If this is only going to save a half a second of CPU time per day, I'm not going to build an ORM to get it. The question isn't how much faster the identity check is (and I don't think it's that much faster) -- the question is how much of your overall application CPU time is spent doing this kind of thing?

Anthony

Anthony

unread,
May 6, 2013, 9:58:59 AM5/6/13
to
A complex filter on a small set of items might be faster in Python than doing another database hit. And a simple filter might belong in the db if it has to go over lots of records. As I said, these are orthogonal considerations.

Perhaps, but again, we are talking about the context of a stateful system - we might already have some data in our object-graph - so it's more complicated than that...

No, your statement had nothing to do with any of this. Anyway, sounds like you now agree with me that it depends on the context. 
 
- If we're talking about the first query of a transaction, we need to think about the context of that whole transaction - will we be using all of the fields in the subsequent attribute-accesses? What about the records - do we need all of them for our access-pattern later on? How should we construct our query so it's optimal for re-use of the results in subsequent attribute-accesses of that same transaction? Such considerations do not even exist in a stateless system like web2py's DAL - it doesn't have the same kind of re-usability of returned data.

I think it would help if you are more precise about what you mean by stateless and stateful (preferably with code examples). I don't see why these considerations would not be applicable to web2py. I also don't know why you say web2py lacks re-usability of returned data. Perhaps you could offer some examples.
 
For simple filtering, well, I'd rather do it in Python and get readability, because the performance benefits are negligible.

But I thought you were a fan of achieving negligible performance benefits at great cost (see below).

Now you're being cynical...

A bit sarcastic, but a serious point -- in one breath you claim to care deeply about what is probably a negligible performance benefit, and in the next you are willing to tolerate some inefficiency. It appears you are being disagreeable for the sake of being disagreeable rather than trying to progress the discussion.
 
I meant it as a hypothetical alternative to an imaginary scenario of me doing ALL the filtering in Python - for THAT I said "well I might as well use NoSQL"

OK, got it.
 
Well, you've sure made a lot of claims about what web2py needs without knowing much about what it already has. Those are ids. If they were rows, then you would just do france.id and spanish.id.

I was simply avoiding making assumptions in that example, as there was no context for these variables in it.

I was referring to the fact that you didn't know about the .find() method. You also didn't seem to know much about recursive selects, virtual and method fields, etc.
 
you may still benefit from an "Identity Mapper" in an ORM, in terms of memory-efficiency

Why do you need an ORM to have an identity mapper? And how much benefit are you expecting here? Do you have an example of where this would create big savings?
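For reference, an identity map is a small pattern that can exist independently of a full ORM. A minimal sketch (hypothetical code, not web2py or SQLAlchemy): keep one canonical object per primary key, so repeated loads return the same object, `is` comparisons become meaningful, and memory is shared:

```python
class IdentityMap:
    """One canonical row object per (table, primary key)."""
    def __init__(self):
        self._cache = {}  # (table, pk) -> row object

    def get_or_create(self, table, pk, loader):
        key = (table, pk)
        if key not in self._cache:
            self._cache[key] = loader(pk)  # hit the backing store once
        return self._cache[key]

loads = []
def load_country(pk):
    loads.append(pk)  # stands in for a real database query
    return {"id": pk, "name": "France"}

imap = IdentityMap()
a = imap.get_or_create("country", 1, load_country)
b = imap.get_or_create("country", 1, load_country)

print(a is b)      # True: both names refer to the one cached object
print(len(loads))  # 1: the second lookup never touched the loader
```

The open question in this thread remains whether the savings from such sharing are large enough in a request-scoped framework to justify the machinery.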
 
even if you stick to your ugly "!="s and "=="s....

Ouch, you better tell Guido to change the "equals" and "not equals" operators in Python.
 
- it was just an example of readability that can be harnessed

"is not" is very readable within English prose, but it is not more readable in code (though it can be fairly readable with proper syntax highlighting).
 
Anthony