many-to-many relationship : how to update items properly if there are duplicates entries ?


yoch....@gmail.com

unread,
May 22, 2017, 9:59:47 AM
to sqlalchemy
Hi,

I'm trying to create a tag system with a many-to-many relationship approach, and I have problems with the updating phase.

class Post(Base):
    __tablename__ = 'post'

    id = Column(Integer, primary_key=True)
    _tags = relationship('Tag', secondary='post_tags')

    @property
    def tags(self):
        return ';'.join(tag.key for tag in self._tags)

    @tags.setter
    def tags(self, s):
        lst = [Tag(key=tag) for tag in s.split(';')]
        self._tags = lst


class Tag(Base):
    __tablename__ = 'tag'

    key = Column(String(40), primary_key=True)
    count = Column(Integer)


And the `post_tags` table is defined as:

CREATE TABLE `post_tags` (
  `idpost` INT(10) UNSIGNED NOT NULL,
  `tag` VARCHAR(40) NOT NULL,
  PRIMARY KEY (`idpost`, `tag`),
  INDEX `FK_post_tags` (`tag`),
  FOREIGN KEY (`tag`) REFERENCES `tag` (`key`),
  FOREIGN KEY (`idpost`) REFERENCES `post` (`id`) ON DELETE CASCADE
);


The problem is that when adding a tag that already exists in the `tag` table

post.tags = 'a'  # tag 'a' already created

it produces this exception: sqlalchemy.orm.exc.FlushError: New instance <Tag at 0x7f7b56a3a940> with identity key (<class 'test_mm.Tag'>, ('a',)) conflicts with persistent instance <Tag at 0x7f7b567af7f0>


Ideally, I'd like to be able to produce a query like this (MySQL):

INSERT INTO tag (`key`, `count`) VALUES (%s, 1) ON DUPLICATE KEY UPDATE count = count + 1

Instead, the way I found to do it seems much less efficient:

@tags.setter
def tags(self, s):
    sess = object_session(self)
    lst = [sess.merge(Tag(key=tag)) for tag in s.split(';')]
    self._tags = lst

(and the counter is managed by a trigger on `post_tags` INSERT)
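Such a trigger might look like the following sketch. This is illustrative only: it uses SQLite syntax (via the stdlib `sqlite3` module) so it can be tried without a MySQL server; a MySQL version would use `CREATE TRIGGER ... FOR EACH ROW` with essentially the same body.

```python
# Sketch of the counter trigger idea, shown with sqlite3 so it runs
# standalone; the MySQL trigger body would be essentially the same.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (key VARCHAR(40) PRIMARY KEY, count INTEGER DEFAULT 0);
    CREATE TABLE post_tags (
        idpost INTEGER NOT NULL,
        tag VARCHAR(40) NOT NULL REFERENCES tag(key),
        PRIMARY KEY (idpost, tag)
    );

    -- increment the tag usage counter whenever a post/tag link is inserted
    CREATE TRIGGER incr_tag_count AFTER INSERT ON post_tags
    BEGIN
        UPDATE tag SET count = count + 1 WHERE key = NEW.tag;
    END;
""")

conn.execute("INSERT INTO tag (key, count) VALUES ('a', 0)")
conn.execute("INSERT INTO post_tags (idpost, tag) VALUES (1, 'a')")
conn.execute("INSERT INTO post_tags (idpost, tag) VALUES (2, 'a')")

count = conn.execute("SELECT count FROM tag WHERE key = 'a'").fetchone()[0]
print(count)  # 2
```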

Is there another way to do that properly and efficiently?

mike bayer

unread,
May 23, 2017, 12:48:11 PM
to sqlal...@googlegroups.com
we don't have ON DUPLICATE KEY UPDATE for MySQL as of yet; however, there
is a PR that I will attend to at some point for possible 1.2 inclusion.

However, note you can just as well emit the SQL string for ON
DUPLICATE KEY UPDATE yourself if you're only targeting MySQL. Then just
create the object that corresponds to this row, and use
make_transient_to_detached() -> session.add() to make it happen (see
example below).

Theoretically, the functionality of session.merge() could provide an
alternate form that makes use of ON DUPLICATE KEY UPDATE automatically;
however, for the foreseeable future you'd need to roll this yourself
once you make use of the SQL statement.

The traditional way to handle this is largely like what you have, except
that client-side caching is used to make it more efficient. One
example is
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/UniqueObject,
which caches per single object. Other ways include selecting a whole
batch of objects up front, based on keys you know you will be working
with.


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

post_tags = Table(
    "post_tags", Base.metadata,
    Column("postid", ForeignKey('post.id'), primary_key=True),
    Column("tagid", ForeignKey('tag.key'), primary_key=True)
)


class Post(Base):
    __tablename__ = 'post'

    id = Column(Integer, primary_key=True)
    _tags = relationship('Tag', secondary='post_tags')

    @property
    def tags(self):
        return ';'.join(tag.key for tag in self._tags)

    @tags.setter
    def tags(self, s):
        lst = [Tag(key=tag) for tag in s.split(';')]
        self._tags = lst


class Tag(Base):
    __tablename__ = 'tag'

    key = Column(String(40), primary_key=True)
    count = Column(Integer)


e = create_engine("mysql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)
s = Session(e)


def make_tag(name):
    s.execute(
        "INSERT INTO tag (`key`, `count`) VALUES (:name, 1) "
        "ON DUPLICATE KEY UPDATE count = count + 1",
        {"name": name}
    )
    tag_obj = Tag(key=name)
    make_transient_to_detached(tag_obj)
    return s.merge(tag_obj, load=False)


s.add(Post(_tags=[make_tag("tag_a")]))
s.commit()

s.add(Post(_tags=[make_tag("tag_a")]))
s.commit()

assert s.query(Tag).count() == 1




yoch....@gmail.com

unread,
May 23, 2017, 2:41:51 PM
to sqlalchemy
Thank you Mike for this detailed response!

The UniqueObject recipe is very interesting, but not very suitable for my case, because my webservice doesn't use the same session for each Post insertion.

Maybe a session.merge_all() method could help improve performance in such cases by grouping the underlying SQL queries (a single SELECT and a single INSERT) for all instances.
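The grouping idea could be sketched like this. Note that `merge_all_tags` is a hypothetical helper, not an existing SQLAlchemy API; the sketch uses the stdlib `sqlite3` module for brevity, but the shape (one SELECT for existing keys, one bulk INSERT for the missing ones) is the point.

```python
# Sketch of the "one SELECT + one INSERT" batching idea.
# `merge_all_tags` is a hypothetical helper, not a real SQLAlchemy API.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (key VARCHAR(40) PRIMARY KEY, count INTEGER)")
conn.execute("INSERT INTO tag VALUES ('a', 1)")

def merge_all_tags(conn, keys):
    """Ensure a tag row exists for every key, using one SELECT + one INSERT."""
    placeholders = ",".join("?" * len(keys))
    existing = {row[0] for row in conn.execute(
        "SELECT key FROM tag WHERE key IN (%s)" % placeholders, keys)}
    missing = [k for k in keys if k not in existing]
    # one bulk INSERT for all tags that don't exist yet
    conn.executemany("INSERT INTO tag (key, count) VALUES (?, 1)",
                     [(k,) for k in missing])
    return missing

new = merge_all_tags(conn, ["a", "b", "c"])
print(sorted(new))  # ['b', 'c']
print(conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0])  # 3
```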
