Is it a bad idea to avoid using backref and back_populates?


Sven

Nov 8, 2017, 5:05:37 AM
to sqlalchemy
Hello,

I am currently working on my previous post about the OrderedDict and the None values, and I have run into new problems and new questions. I didn't find answers on the Internet, so here I am!

As explained in the official documentation (or in the following topic: https://groups.google.com/forum/#!msg/sqlalchemy/2dVQTvzmi84/8J8lGpLfw4EJ), using backref or back_populates on a relationship "sets up the "synchronization" of the two directions, which consists of an event that performs the complementary set/append operation when an append/set occurs on the other side".

Here is the example given in the topic previously mentioned:

p = Parent('parent1')
c = Child('child1')
c.parent = p
p.children.append(c)


p's children will contain c twice.

Indeed, the "c.parent = p" operation results in the "append" on the other side, and vice versa.

In my case, I'm working on an existing project which contains a lot of classes, and the methods are already designed to add the objects on both sides. It is that way because the program uses Pickle to persist everything (my objective is to replace it with SQLAlchemy). Of course, I could adapt the code, but it would be a lot of work, and I don't like the idea that my structures are automatically synchronized without explicit declarations. Furthermore, what will happen if I decide, one day, to stop using SQLAlchemy? My code would be too dependent on the functionality provided by SQLAlchemy.

So, my questions are:

1) Is it feasible to avoid using backref or back_populates? Would it be a bad idea to work without these features? Will I face, for example, an inconsistent program state?
2) Is it possible to use back_populates and have SQLAlchemy detect that it should not append something which was already inserted?


Thank you very much!

Regards,

Sven


Mike Bayer

Nov 8, 2017, 9:46:08 AM
to sqlal...@googlegroups.com
On Wed, Nov 8, 2017 at 5:05 AM, Sven <sven....@gmail.com> wrote:
> Hello,
>
> I am currently working on my previous post about the OrderedDict and the None
> values, and I have run into new problems and new questions. I didn't find
> answers on the Internet, so here I am!
>
> As explained in the official documentation (or in the following topic:
> https://groups.google.com/forum/#!msg/sqlalchemy/2dVQTvzmi84/8J8lGpLfw4EJ),
> using backref or back_populates on a relationship "sets up the
> "synchronization" of the two directions, which consists of an event that
> performs the complementary set/append operation when an append/set occurs on
> the other side".
>
> Here is the example given in the topic previously mentioned:
>
> p = Parent('parent1')
> c = Child('child1')
> c.parent = p
> p.children.append(c)
>
>
> p's children will contain c twice.
>
> Indeed, the "c.parent = p" operation results in the "append" on the other
> side, and vice versa.

so the super-duper best way to fix the above use case, if possible,
would be to use collection_class=set rather than list. Sets are all
around better for relationship(), and if I was writing SQLAlchemy
today this would have been the default (back in Python 2.3, sets were
kind of an exotic feature .... :) )



>
> In my case, I'm working on an existing project which contains a lot of
> classes, and the methods are already designed to add the objects on both
> sides. It is that way because the program uses Pickle to persist
> everything (my objective is to replace it with SQLAlchemy).

Are you seeing objects that are being unpickled, and then are firing
off SQLAlchemy events such that collections are being populated twice?
That's not supposed to happen. Backrefs don't get in the way of
normal pickling / unpickling of collections and attributes; that's how
major functionality like the whole dogpile.cache example works.

Otherwise, I'm not sure how pickling relates to the above code sample
where ".append()" is being explicitly called.


> Of course, I could
> adapt the code, but it would be a lot of work, and I don't like the idea that
> my structures are automatically synchronized without explicit declarations.

backref/back_populates are explicit declarations :)

> Furthermore, what will happen if I decide, one day, to stop using
> SQLAlchemy? My code would be too dependent on the functionality provided by
> SQLAlchemy.

if you are using the ORM, then you're buying into the class-state
management solution provided by the ORM as well. There are hooks to
entirely replace this state management system with your own system,
but building out on that while trying to make your class models
completely agnostic of the fact that something has given them new
attributes and features would be difficult (see
http://docs.sqlalchemy.org/en/latest/_modules/examples/custom_attributes/custom_management.html
for an illustration of these systems).

The idea that an application model can be developed against an ORM and
then be completely swapped out to not use an ORM, without alterations
to the usage of the model, has long been a popular ideal, but I don't
think it has ever played out in practice. ORMs in all languages apply
a lot of assumptions to a model, and in Python this is even more
prevalent, since Python allows for all the great metaprogramming
features which mean you don't have to write getters/setters across
all your models, among other things.

If you truly are trying to build out an easy stepping stone out of
ORMs later on, there are two design patterns that come to mind. One is
that you don't change your model at all, and you instead create an
entirely separate model that represents your database mapping. A
mediation layer then transfers state between your "default" model and
the ORM-enabled model. FWIW a large swath of OpenStack, which I get
paid to work on, works sort of this way. It's a common pattern that
there is some kind of public-facing "API" model, e.g. the one that
gets converted into JSON and such for services, and an internal
"business" model that knows how to be persisted and is more
application-specific.
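
very roughly, and with made-up names (this isn't from your code or any
particular API, just the shape of the idea):

# sketch: a plain domain class that knows nothing about SQLAlchemy, a
# separate ORM-mapped "record" class, and a tiny mediation layer that
# copies state between the two; all names are hypothetical
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Player(object):
    """Plain 'business' object, no SQLAlchemy involved."""
    def __init__(self, name):
        self.name = name

class PlayerRecord(Base):
    """Persistence-only mapped class."""
    __tablename__ = 'player'
    id = Column(Integer, primary_key=True)
    name = Column(String)

def to_record(player):
    # domain -> persistence
    return PlayerRecord(name=player.name)

def from_record(record):
    # persistence -> domain
    return Player(record.name)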

The other pattern is to not use the ORM and just use Core. This is
also a very popular pattern, and a big reason SQLAlchemy Core exists
is specifically to allow all of SQLAlchemy's great database features
to be available to projects that aren't comfortable wiring the whole
object model to SQLAlchemy, or projects that don't really use an
OO-style object model in the first place.
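
something along these lines, as a minimal sketch with a made-up
"player" table; you work with plain rows rather than mapped objects:

# sketch: Core only, no ORM and no mapped classes
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select)

engine = create_engine("sqlite://")
metadata = MetaData()

player = Table(
    "player", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
metadata.create_all(engine)

conn = engine.connect()
conn.execute(player.insert(), [{"name": "player1"}, {"name": "player2"}])
for row in conn.execute(select([player.c.name])):
    print(row.name)
conn.close()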

>
> So, my questions are :
>
> 1) Is it feasible to avoid using backref or back_populates? Would it be a
> bad idea to work without these features? Will I face, for example, an
> inconsistent program state?

You can go this road for one-to-many and many-to-one although this is
not a well-traveled use case.

For a many-to-many with "secondary", there can still be some conflicts
because the unit of work uses "backref" to look at the fact that the
two sides represent the same collection to resolve duplicate events
that occur on both sides.
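
for reference, by many-to-many with "secondary" I mean the usual
pattern along these lines (names are made up); if both directions are
mapped to the same secondary table without back_populates or backref,
the unit of work has no way to know the two collections are the same
association, which is where those duplicate-event conflicts come from:

# sketch: a plain many-to-many; both relationship() directions share
# the same association table
from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

room_player = Table(
    'room_player', Base.metadata,
    Column('room_id', ForeignKey('room.id'), primary_key=True),
    Column('player_id', ForeignKey('player.id'), primary_key=True),
)

class Room(Base):
    __tablename__ = 'room'
    id = Column(Integer, primary_key=True)
    players = relationship("Player", secondary=room_player,
                           back_populates="rooms")

class Player(Base):
    __tablename__ = 'player'
    id = Column(Integer, primary_key=True)
    rooms = relationship("Room", secondary=room_player,
                         back_populates="players")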


> 2) Is it possible to use back_populates and have SQLAlchemy detect that
> it should not append something which was already inserted?

using collection_class=set is by far the easiest way to do this.



Sven

Nov 8, 2017, 1:09:52 PM
to sqlalchemy
Thank you for your answer. As always, it is very complete and interesting.

"so the super-duper best way to fix for the above use case, if 
possible, would be to use collection_class=set rather than list. 
sets are all around better for relationship() and if I was writing 
SQLAlchemy today this would have been the default (back in python 2.3, 
sets were kind of an exotic feature .... :)  )"

The solution with the set works for the specific code sample I gave, but what about other possible situations (where having a set is not possible)? I think I will meet very different types of situations, and I was more interested in knowing what the right way to proceed would be in general, not just in particular cases.

"Are you seeing objects that are being unpickled, and then are firing 
off SQLAlchemy events such that collections are being populated twice? 
  That's not supposed to happen.   Backrefs don't get in the way of 
normal pickling / unpickling of collections and attributes, that's how 
major functionality like the whole dogpile.cache example work. 

Otherwise, I'm not sure how pickling relates to the above code sample 
where ".append()" is being explicitly called."

Sorry, that was not what I meant. Until now, every object in the program's memory has been stored with Pickle.

Let's take an example:

In my program, each instance of the class "Room" contains the list of players located in that particular room, but each instance of the class "Player" also contains the room where that player is.

Here is some code to illustrate:

class Player():

    def __init__(self, name):
        self.name = name
        self.room = None

    def set_room(self, room):
        self.room = room

    def __repr__(self):
        return self.name


class Room():

    def __init__(self, name):
        self.name = name
        self.players = []

    def add_player(self, player):
        self.players.append(player)

    def __repr__(self):
        return self.name


room1 = Room("room1")

player1 = Player("player1")
player2 = Player("player2")

# The room is stored in the player instances:
player1.set_room(room1)
player2.set_room(room1)

# And the players are also stored in the room instance:
room1.add_player(player1)
room1.add_player(player2)

print(room1.players)
print(player1.room)
print(player2.room)


Result:

>>>
[player1, player2]
room1
room1
>>>

And I have this kind of relationship almost everywhere.

For now, these objects are stored with Pickle. With Pickle, the relationships between classes don't matter: it just stores the objects in a file as they are, and unpickling lets us retrieve everything with just a few lines of code.

But now we are trying to change the save system and use only SQLAlchemy, and I don't know exactly how I am supposed to configure my relationships: if backref or back_populates is used, the statement "player1.set_room(room1)" will populate room1.players and player1.room immediately, and the statement "room1.add_player(player1)" will do the same thing, so I will have duplicates.
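
To make it concrete, here is roughly how I imagine the mapping would look (the table and column names are just my guesses):

# rough guess at a Room/Player mapping with back_populates; all names
# here are hypothetical
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Room(Base):
    __tablename__ = 'room'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    players = relationship("Player", back_populates="room")

class Player(Base):
    __tablename__ = 'player'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    room_id = Column(Integer, ForeignKey('room.id'))
    room = relationship("Room", back_populates="players")

# With this mapping, player1.set_room(room1) already puts player1 into
# room1.players, and room1.add_player(player1) then appends it a
# second time.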

Of course, in this specific example, I could use a set, but it's just a way to illustrate the problem that I will meet multiple times in different situations and with different collections.

In this specific example, I could also just delete one of the statements, but:
- the two statements are sometimes in different classes and different files. Isn't it dirty to decide that one of the statements will be deleted, letting the other do all the work and populate both attributes?
- it could be a lot of work, because for most relationships I will have to determine where these statements are, why and when they are used, and think of a cleaner way to organize my code according to SQLAlchemy principles (a single modification automatically triggers the modification of the linked attribute on the other side). And there are a lot of classes...

I think you are right that it would be too complicated, and a bit pointless, to use SQLAlchemy and still try to keep the model absolutely separate and independent from it.

"You can go this road for one-to-many and many-to-one although this is 
not a well-traveled use case. 

For a many-to-many with "secondary", there can still be some conflicts 
because the unit of work uses "backref" to look at the fact that the 
two sides represent the same collection to resolve duplicate events 
that occur on both sides."

So I suppose it is probably a bad idea to avoid using backref and back_populates, since it is not a well-traveled use case and there can be conflicts in many-to-many relationships. The cleaner way to proceed would then be to adapt all the classes and always keep only one modification statement per bidirectional relationship?

Mike Bayer

Nov 8, 2017, 1:34:57 PM
to sqlal...@googlegroups.com
How many collection types do you have? All of the collections used by relationship() can support deduplication. Using a list class with an idempotent append() method is not a big deal. If Python had better collection classes, like native ordered sets that act like lists, we'd probably not use list at all anymore.
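
roughly something like this (just a sketch; the class name is made up and it's not something SQLAlchemy ships):

# a hypothetical deduplicating list: append() is a no-op when the same
# object (by identity) is already present; it could be plugged in via
# relationship(..., collection_class=UniqueAppendList)
class UniqueAppendList(list):
    def append(self, item):
        if not any(existing is item for existing in self):
            super(UniqueAppendList, self).append(item)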






Sven

Nov 8, 2017, 1:58:19 PM
to sqlalchemy
What do you mean by "deduplication"?

I certainly have just a few exotic collection types. The others are standard (lists, dictionaries, OrderedDict, etc.), but I don't understand why you are asking that :p

Are you asking because you think the solution would be to always use collections that can't contain duplicates (like a set)? So, for example, I could make a list class and just override the append method to avoid duplicates?

Mike Bayer

Nov 8, 2017, 2:48:21 PM
to sqlal...@googlegroups.com
On Wed, Nov 8, 2017 at 1:58 PM, Sven <sven....@gmail.com> wrote:
> What do you mean by "deduplication" ?


it means if you have a collection:

[a, b]

and you try to add "a" again:

collection.append(a)

you get:

[a, b]

that is, "append" is idempotent given a particular object identity.



>
> I certainly have just a few exotic collection types. The others are
> standard (lists, dictionaries, OrderedDict, etc.), but I don't understand
> why you are asking that :p

all of those except "list" feature idempotent additions already.

>
> Are you asking because you think the solution would be to always
> use collections that can't contain duplicates (like a set)? So, for example,
> I could make a list class and just override the append method to avoid
> duplicates?

Yes. Also, with some time to think about my answers, I can give
you a better explanation of why you not only *need* to use backrefs,
but also need deduplicating collections.

your use case is:

child.parent = p
parent.children.append(child)

your issue is that with backrefs, the append() adds the child twice. So
let's assume that doesn't happen. Now let's add the context that your
two operations are far apart. If we perhaps had this:

child.parent = p

# things happen.... such as:

session.commit()

# elsewhere in the code

parent.children.append(child)

Above, we again would have the child in parent.children twice, if
client-side deduplication is not present. This is because the relational
database, unlike a pickle file, *does not have uni-directional
associations*. Unless there is a much more complicated table
structure under the hood, it is not possible to have "child.parent ==
p" and "child not in parent.children". That's why the backref was
created in the first place, so that an object relationship can behave
like it would after it's flushed, without needing to insert the row
and re-select it immediately.

Additionally, it is not possible for parent.children to contain the
same child instance more than once in the database itself, again unless
there is a much more complicated table structure taking place behind the
scenes. If you appended the child five times to parent.children,
persisted it, and then reloaded fresh, then with a standard
one-to-many you only get that child identity once.

Both of these rules have to do with the basic geometry of how row
associations work in relational databases, that is, they are based on
the concept of the foreign key. Except in the case of many-to-many,
there is no separate "collection" table present that would indicate a
unidirectional association or multiple associations of the same
object.

The fact that relational databases don't have unidirectional or
multiple associations, whereas in a "serialized object file" or object
database these are the default, is one reason why it's not practical
to assume the object model can be completely separated from its
persistence style.



Sven

Nov 9, 2017, 3:17:23 AM
to sqlalchemy
Thank you for your answers and your explanations! :-)

It is clear to me now, and I will work on that.