PickleType collections of rows?

331 views
Skip to first unread message

Jacob Biesinger

unread,
Aug 27, 2012, 3:50:08 PM8/27/12
to sqlal...@googlegroups.com
Hi all,


I'm converting a library to use SA as the datastore. I like the flexibility of the PickleType column, but it doesn't seem to work well when pickling SA objects (table rows).  Even if I overload setstate and getstate to do a query + session merge when unpickling, there's no referential integrity across that pickle boundary.  That means that I can't query collections of objects.

class Bar(Base):
    id = Column(Integer, primary_key=True)
    __tablename__ = 'bars'
    foo_id = Column(Integer, ForeignKey('foos.id'), primary_key=True)

class Foo(Base):
    __tablename__ = 'foos'
    values = Column(PickleType)
    #values = relationship(Bar)  # list interface (one->many), but can't assign a scalar or use a dictionary
    def __init__(self):
        self.values = [Bar(), Bar()]

        # only allowed with PickleType column
        #self.values = Bar()
        #self.values = {'one' : Bar()}
        #self.values = [ [Bar(), Bar()], [Bar(), Bar()]]

# get all Foo's with a Bar whose id=1
session.query(Foo).filter(Foo.values.any(Bar.id == 1)).all()

One workaround would be to implement my own mutable object type as is done here: https://github.com/ccat/sqlalchemy_examples/blob/master/mutable_example/mutable_alwayUpdate.py#L51 and to have some kind of flattening scheme which traverses the collections and appends them to a simpler one->many relationship.  Perhaps the flattened list might have to be weakrefs to the pickled collection's objects?  Tracking changes and references sounds like no fun.  Any advice?

Michael Bayer

unread,
Aug 27, 2012, 3:54:08 PM8/27/12
to sqlal...@googlegroups.com
The PickleType is really a hacky way around edge cases where you have some arbitrary object you'd just like to shove away.     It's a given that when you use PickleType, you're giving up any relational advantages, including being able to filter/query on them, etc.   

So putting an ORM mapped object in a Pickle is basically a terrible idea.

If you want a collection of scalar values, use traditional mappings and relationship() in combination with association_proxy.  See http://docs.sqlalchemy.org/en/rel_0_7/orm/extensions/associationproxy.html#simplifying-scalar-collections .


Michael Bayer

unread,
Aug 27, 2012, 3:56:05 PM8/27/12
to sqlal...@googlegroups.com
re: "or dictionaries".  Use attribute_mapped_collection:  http://docs.sqlalchemy.org/en/rel_0_7/orm/collections.html#dictionary-collections

"dictionaries plus scalars": combine both attribute_mapped_collection and association_proxy: http://docs.sqlalchemy.org/en/rel_0_7/orm/extensions/associationproxy.html#proxying-to-dictionary-based-collections


Jacob Biesinger

unread,
Aug 27, 2012, 4:16:01 PM8/27/12
to sqlal...@googlegroups.com
Point taken. 

If you want a collection of scalar values, use traditional mappings and relationship() in combination with association_proxy.  See http://docs.sqlalchemy.org/en/rel_0_7/orm/extensions/associationproxy.html#simplifying-scalar-collections .

re: "or dictionaries".  Use attribute_mapped_collection:  http://docs.sqlalchemy.org/en/rel_0_7/orm/collections.html#dictionary-collections

"dictionaries plus scalars": combine both attribute_mapped_collection and association_proxy: http://docs.sqlalchemy.org/en/rel_0_7/orm/extensions/associationproxy.html#proxying-to-dictionary-based-collections

Thanks, I saw this example before but hadn't thought of it in this way.  I suppose the creator here could be a polymorphic vertical column as in http://www.sqlalchemy.org/trac/browser/examples/vertical/dictlike-polymorphic.py, which could act as a scalar, as a list, or as a dictionary depending on an introspected type.  Querying such a collection seems like it would be a pain though. 

 

Jacob Biesinger

unread,
Aug 27, 2012, 4:43:34 PM8/27/12
to sqlal...@googlegroups.com
Is the use case of having nested collections really all that out of mainstream?  It seems like a fairly straightforward requirement, though perhaps with a lot of boilerplate.  A simple list-of-lists seems achievable using something like:

all_lists = Table('all_lists', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', String(32)),  # polymorphic type... one of 'list' or 'scalar'
                    Column('scalar_id', Integer, ForeignKey('scalars.id')),
                    Column('list_id', Integer, ForeignKey('all_lists.id')))

with an appropriate polymorphic identifier.  Am I going down a dead-end here?
 

Michael Bayer

unread,
Aug 27, 2012, 4:57:52 PM8/27/12
to sqlal...@googlegroups.com
On Aug 27, 2012, at 4:43 PM, Jacob Biesinger wrote:


Is the use case of having nested collections really all that out of mainstream?

I would think just about any non-trivial application winds up having "nested collections".    In relational databases this just means you have one-to-many relationships chained along.  relationship() is the SQLAlchemy mechanism to achieve one-to-many.    This is like the most prominent feature in the whole ORM so I'm not sure what's causing it to appear "non mainstream".

If you're getting at the idea of having two or three big tables that generically handle all kinds of lists and dictionaries from all over the rest of the schema, yes, I'd characterize it as common, but not necessarily a best practice in most cases.   It doesn't scale very well as those two or three giant tables grow to be huge, a single point of contention, hard to index and manipulate.   Nevertheless we support this architecture through such notions as the "polymorphic association", which you'll find in the examples with the distribution.   

The real use case for the polymorphic association in the "single collection table" form is when you need to be able to find many types of parent objects together in one query against a selection of rows in the collection table.  It usually requires using a UNION.     This use case in my opinion is extremely rare, which is why in the polymorphic examples you'll find other recipes that encourage each parent class to have its own sub-table for its own collection of things of a certain type.


 It seems like a fairly straightforward requirement, though perhaps with a lot of boilerplate.  A simple list-of-lists seems achievable using something like:

all_lists = Table('all_lists', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', String(32)),  # polymorphic type... one of 'list' or 'scalar'
                    Column('scalar_id', Integer, ForeignKey('scalars.id')),
                    Column('list_id', Integer, ForeignKey('all_lists.id')))

with an appropriate polymorphic identifier.  Am I going down a dead-end here?
 

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Jacob Biesinger

unread,
Aug 27, 2012, 5:16:54 PM8/27/12
to sqlal...@googlegroups.com
On Mon, Aug 27, 2012 at 4:57 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:

On Aug 27, 2012, at 4:43 PM, Jacob Biesinger wrote:


Is the use case of having nested collections really all that out of mainstream?

I would think just about any non-trivial application winds up having "nested collections".    In relational databases this just means you have one-to-many relationships chained along.  relationship() is the SQLAlchemy mechanism to achieve one-to-many.    This is like the most prominent feature in the whole ORM so I'm not sure what's causing it to appear "non mainstream".

If you're getting at the idea of having two or three big tables that generically handle all kinds of lists and dictionaries from all over the rest of the schema, yes, I'd characterize it as common, but not necessarily a best practice in most cases.   It doesn't scale very well as those two or three giant tables grow to be huge, a single point of contention, hard to index and manipulate.   Nevertheless we support this architecture through such notions as the "polymorphic association", which you'll find in the examples with the distribution.   

Yes, this is the pattern I'm trying to describe though I've been thinking of these as "typed collections", smallish tables for handling a particular table's collections needs, which are all of a single type (so rows are either a collection or a FK to ONE other table).  It feels very doable, given the many examples I've seen that don't go quite this far.  If it's a common enough use case, why not have an example on the best way to do this? 

Thanks a ton for your quick responses.  I'll try hacking something together and will post here eventually.

Michael Bayer

unread,
Aug 27, 2012, 6:05:29 PM8/27/12
to sqlal...@googlegroups.com
On Aug 27, 2012, at 5:16 PM, Jacob Biesinger wrote:

On Mon, Aug 27, 2012 at 4:57 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:

On Aug 27, 2012, at 4:43 PM, Jacob Biesinger wrote:


Is the use case of having nested collections really all that out of mainstream?

I would think just about any non-trivial application winds up having "nested collections".    In relational databases this just means you have one-to-many relationships chained along.  relationship() is the SQLAlchemy mechanism to achieve one-to-many.    This is like the most prominent feature in the whole ORM so I'm not sure what's causing it to appear "non mainstream".

If you're getting at the idea of having two or three big tables that generically handle all kinds of lists and dictionaries from all over the rest of the schema, yes, I'd characterize it as common, but not necessarily a best practice in most cases.   It doesn't scale very well as those two or three giant tables grow to be huge, a single point of contention, hard to index and manipulate.   Nevertheless we support this architecture through such notions as the "polymorphic association", which you'll find in the examples with the distribution.   

Yes, this is the pattern I'm trying to describe though I've been thinking of these as "typed collections", smallish tables for handling a particular table's collections needs, which are all of a single type (so rows are either a collection or a FK to ONE other table).  It feels very doable, given the many examples I've seen that don't go quite this far.  If it's a common enough use case, why not have an example on the best way to do this? 

Uh, scroll up, I said, "which you'll find in the **examples** with the distribution".    start reading here: http://docs.sqlalchemy.org/en/rel_0_7/orm/examples.html#examples-generic-associations


Jacob Biesinger

unread,
Aug 28, 2012, 2:43:47 PM8/28/12
to sqlal...@googlegroups.com
Yes, this is the pattern I'm trying to describe though I've been thinking of these as "typed collections", smallish tables for handling a particular table's collections needs, which are all of a single type (so rows are either a collection or a FK to ONE other table).  It feels very doable, given the many examples I've seen that don't go quite this far.  If it's a common enough use case, why not have an example on the best way to do this? 

Uh, scroll up, I said, "which you'll find in the **examples** with the distribution".    start reading here: http://docs.sqlalchemy.org/en/rel_0_7/orm/examples.html#examples-generic-associations


I'm sorry to be a pest here.  I've read through all the examples and have spent several days with the documentation (which is really excellent).  I think what I'm describing is distinct from these patterns.  I'm looking for a polymorphic collection type capable of reflecting on the column (at run-time) and presenting a different ORM interface.  Extending the polymorphic vertical column example at http://www.sqlalchemy.org/trac/browser/examples/vertical shows what I'm trying to do pretty well. Again, please forgive me if I'm just being dense.

# set up declarative tables
from sqlalchemy_examples.vertical import dictlike_polymorphic as dictlike

metadata = MetaData()
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Base.metadata.bind = engine
session = Session(engine)


class AnimalFact(dictlike.PolymorphicVerticalProperty, Base):
    """key/value attribute whose value can be either a string or a list of strings"""
    __tablename__ = 'animalfacts'
    type_map = {str: ('string', 'str_value'),
                list: ('list', 'list_value')}
    id = Column(Integer, primary_key=True)
    animal_id = Column(Integer, ForeignKey('animal.id'), primary_key=True)
    key = Column(String, primary_key=True)
    type = Column(String)
    str_value = Column(String)
    list_value = relationship('StringEntry')


class Animal(Base, dictlike.VerticalPropertyDictMixin):
    __tablename__ = 'animal'
    _property_type = AnimalFact
    _property_mapping = 'facts'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    facts = relationship(AnimalFact, backref='animal',
                          collection_class=attribute_mapped_collection('key'))

    def __init__(self, name):
        self.name = name


class StringEntry(Base):
    __tablename__ = 'myvalue'
    id = Column(Integer, primary_key=True)
    animalfacts_id = Column(Integer, ForeignKey('animalfacts.id'))
    value = Column(String)

    def __init__(self, value):
        self.value = value

Base.metadata.create_all()
 

# create a new animal
a = Animal('aardvark')

# create a new AnimalFact.  This uses the list_value column, which is in turn a one-to-many relationship to StringEntry
a['eyes'] = [StringEntry('left side'), StringEntry('right side')]  # works great

# this should use the str_value column, which is a simple string
a['cute'] = 'sort of'  # fails with message "TypeError: Incompatible collection type: None is not list-like"


Thanks as always for any help you can give!  I'll update the stackoverflow question once we get this squared away.

Michael Bayer

unread,
Aug 28, 2012, 3:49:00 PM8/28/12
to sqlal...@googlegroups.com
Well, you dug into a really esoteric and complex example there.    association_proxy is a much easier way to get around these cases where you want an object to act like a scalar, so here's that, without all that crazy boilerplate of the "vertical" example, which I'd avoid as it is really too complex.   Your example seemed undecided about primary key style so I went with the composite version.    Surrogate + composite can't be mixed in a single table (well it can, but its relationally incorrect.   The key should be the smallest unit that identifies a row - http://en.wikipedia.org/wiki/Unique_key is a good top level read into various subjects regarding this).

from sqlalchemy import Integer, String, Column, create_engine, ForeignKey, ForeignKeyConstraint
from sqlalchemy.orm import relationship, Session
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy

Base = declarative_base()

class AnimalFact(Base):
    """key/value attribute whose value can be either a string or a list of strings"""
    __tablename__ = 'animalfacts'

    # use either surrogate PK id, or the composite animal_id/key - but
    # not both.   id/animal_id/key all together is not a proper key.
    # Personally I'd go for "id" here, but here's the composite version.

    animal_id = Column(Integer, ForeignKey('animal.id'), primary_key=True)
    key = Column(String, primary_key=True)

    # data
    str_value = Column(String)
    _list_value = relationship('StringEntry')

    # proxy list strings
    list_proxy = association_proxy('_list_value', 'value')

    def __init__(self, key, value):
        self.key = key
        self.value = value

    @property
    def value(self):
        if self.str_value is not None:
            return self.str_value
        else:
            return self.list_proxy

    @value.setter
    def value(self, value):
        if isinstance(value, basestring):
            self.str_value = value
        elif isinstance(value, list):
            self.list_proxy = value
        else:
            assert False

class Animal(Base):
    __tablename__ = 'animal'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    _facts = relationship(AnimalFact, backref='animal',
                          collection_class=attribute_mapped_collection('key'))
    facts = association_proxy('_facts', 'value')

    def __init__(self, name):
        self.name = name

    # dictionary interface around "facts".
    # I'd just use "animal.facts" here, but here's how to skip that.
    def __getitem__(self, key):
        return self.facts.__getitem__(key)

    def __setitem__(self, key, value):
        self.facts.__setitem__(key, value)

    def __delitem__(self, key):
        self.facts.__delitem__(key)

    def __contains__(self, key):
        return self.facts.__contains__(key)

    def keys(self):
        return self.facts.keys()


class StringEntry(Base):
    __tablename__ = 'myvalue'
    id = Column(Integer, primary_key=True)
    animal_id = Column(Integer)
    key = Column(Integer)
    value = Column(String)

    # because AnimalFact has a composite PK, we need
    # a composite FK.
    __table_args__ = (ForeignKeyConstraint(
                        ['key', 'animal_id'],
                        ['animalfacts.key', 'animalfacts.animal_id']),
                    )
    def __init__(self, value):
        self.value = value

engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

session = Session(engine)


# create a new animal
a = Animal('aardvark')

a['eyes'] = ['left side', 'right side']

a['cute'] = 'sort of'

session.add(a)
session.commit()
session.close()

for animal in session.query(Animal):
    print animal.name, ",".join(["%s" % animal[key] for key in animal.keys()])




Jacob Biesinger

unread,
Aug 29, 2012, 10:06:01 AM8/29/12
to sqlal...@googlegroups.com
On Tue, Aug 28, 2012 at 3:49 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:
Well, you dug into a really esoteric and complex example there.    association_proxy is a much easier way to get around these cases where you want an object to act like a scalar, so here's that, without all that crazy boilerplate of the "vertical" example, which I'd avoid as it is really too complex.

Thanks a million for cutting through the confusion on this.  Precisely the example I've been looking for-- a property that can behave as scalar or a collection.  If you'd like to update your answer on stackoverflow, I'll happily accept this example code.
 
 Your example seemed undecided about primary key style so I went with the composite version.    Surrogate + composite can't be mixed in a single table (well it can, but its relationally incorrect.   The key should be the smallest unit that identifies a row - http://en.wikipedia.org/wiki/Unique_key is a good top level read into various subjects regarding this).

DB design is fairly new to me-- I've implemented the DB engines, but this is my first attempt at schema design as well as my first foray into SQLAlchemy.  Your examples and the docs are very helpful.  Thanks for the fantastic package!
Reply all
Reply to author
Forward
0 new messages