Grouping related Columns in table definitions and Python object reconstruction

gamcil

unread,

Jul 1, 2019, 5:56:16 AM7/1/19

to sqlalchemy

Hi,

I'm new to SQLAlchemy/ORMs - previously I had just been interacting directly with the builtin SQLite driver and had built up my own mess of a mapping system.

1) I'm starting to convert some classes in a Python package to SQLAlchemy models. . A paired down example of what I'm doing:

class Gene:
    def __init__(self, identifiers, features):
        self.identifiers = identifiers
        self.features = features

gene = Gene(identifiers={'locus': 'GENE_0001', 'protein_id': 'PROT_0001'},
            features={'mRNA': '1..300,400..500', 'CDS': '1..300,400..500'})

In reality, I'm dealing with many attributes that are all related to the Gene and stored in the same table, which is why to simplify the interface of my classes, I grouped related instance attributes in dictionaries. For example, the location of the mRNA feature of this Gene can then be accessed by gene.features['mRNA']. Each value in the dictionary refers to a unique database column.

However, when creating SQLAlchemy models, it's required to explicitly map Columns directly as class attributes, e.g.:

class Gene:
    id = Column(Integer, primary_key=True)
    locus = Column(String)
    protein_id = Column(String)
    mRNA = Column(String)
    CDS = Column(String)

Is there a simple way to provide the dictionary functionality? Something like:

class Gene:
    id = Column(Integer, primary_key=True)
    identifiers = {
        'locus': Column(String),
        'protein': Column(String)
    }
    ...

Where a change in Gene.identifiers['locus'] would then result in an update to the 'locus' column. I saw MutableDict in the documentation but it seems non-trivial to implement for something like this.
Is this even a good idea?

2) Say for the example above I have created a plain Python Location class to represent a location of a feature in self.features:

class Location:
    def __init__(self, intervals, ...):
        self.intervals = intervals
        ...

    @classmethod
    def from_flat(cls, flat):
        # convert flat location to list of tuples
        # e.g. [(1, 300), (400, 500)]

This class has a classmethod that takes the stored flat string (e.g. 1..300,400..500) and converts it to a list of interval tuples, and has some extra logic in it for use in my applications. I understand I can use orm.reconstructor to instantiate these Location objects from the stored attributes upon object loading, i.e.

@reconstructor
def init_on_load(self):
    self.mRNA = Location.from_flat(self.mRNA)
    ...

However, what I don't get is how I should be shuttling changes in these objects back to the database as updates.

Is there some way to map this (instantiate Location on loading; mutate Location; convert Location back to flat format to update the Gene attribute)? Do i need to write setter methods for each that directly set the underlying column attribute to the flat representation of the Location?

Any help is appreciated, cheers

Mike Bayer

unread,

Jul 1, 2019, 11:25:38 AM7/1/19

to noreply-spamdigest via sqlalchemy

I'm going to focus on #1 here, so, this kind of thing is pretty straightforward in Python, throughout SQLAlchemy's examples / extensions there are all kinds of "magic dictionary" kinds of examples going on, and this one is...another one :) To give a high level for this kind of thing, as far as setting up mappings / tables, that can always be done "magically", that is, you can write event hooks to read any kind of configuration you want that will set up the mapper() / Table with whatever columns are needed. For the "get / set" a dictionary part, we always use a Python descriptor for that kind of thing, so in this case I'd likely be building a custom kind of descriptor class that also serves as the information to generate the table definition. Finally, you want the dictionary to be "live", that is, not get /set of the whole dictionary, you want changes in the dictionary to affect the model. So for the last part, yes there needs to be a custom dictionary object of some kind. It would use the same Python techniques as MutableDict but it likely would be specific to the case here, and overall this case is not that "hard". In this case I would create a Python object that is not in fact a "dict", but just an object that has as dict-like interface, e.g. __getitem__(), __setitem__(), keys(), etc., and then it just uses the object that it comes from as the source of data, like:

class GeneView:

def __init__(self, gene, attrs):

self.gene = gene

self.attrs = attrs

def keys(self):

return iter(self.attrs)

def items(self):

return ((attr, getattr(self.parent, attr)) for attr in self.attrs)

def __getitem__(self, key):

return getattr(self.parent, key) # maybe you want to check self.attrs too

def __setitem__(self, key, value):

setattr(self.parent, key, value) # mutability

The above thing comes from a descriptor, a hardcoded version would look like:

class Gene(...):

@property

def identifiers(self):

return GeneView(self, ['locus', 'protein'])

I'd then want to make the above "hardcoded" pattern more generic making a custom descriptor class (https://docs.python.org/3/howto/descriptor.html#descriptor-example) so that the mapping looks like:

class Gene(...):

identifiers = gene_view_attr('locus', 'protein')

to map it, a simple event hook like before_mapper_configured (or declararive's declare_first / declare_last hooks: https://docs.sqlalchemy.org/en/13/orm/extensions/declarative/api.html?highlight=declare_last#declare-last) can be used, in this case it probably doesn't matter where in the mapping phase we are since columns can be added at any time:

from sqlalchemy import event

@event.listens_for(Base, "before_mapper_configured", propagate=True)

def setup_view_attrs(mapper, class_):

for value in class_.__dict__.values():

if isinstance(value, gene_view_attr): # look for our descriptor

for attr in value.attrs:

setattr(class_, attr, Column(attr, String)) # map columns

That's all you need for #1, let me know if you need more detail.

2) Say for the example above I have created a plain Python Location class to represent a location of a feature in self.features:
class Location:
    def __init__(self, intervals, ...):
        self.intervals = intervals
        ...     @classmethod     def from_flat(cls, flat):         # convert flat location to list of tuples         # e.g. [(1, 300), (400, 500)]

This class has a classmethod that takes the stored flat string (e.g. 1..300,400..500) and converts it to a list of interval tuples, and has some extra logic in it for use in my applications. I understand I can use orm.reconstructor to instantiate these Location objects from the stored attributes upon object loading, i.e.
@reconstructor
def init_on_load(self):
    self.mRNA = Location.from_flat(self.mRNA)
    ...

However, what I don't get is how I should be shuttling changes in these objects back to the database as updates.
Is there some way to map this (instantiate Location on loading; mutate Location; convert Location back to flat format to update the Gene attribute)? Do i need to write setter methods for each that directly set the underlying column attribute to the flat representation of the Location?

So I usually use descriptors for this kind of thing too, like above. There is also the possibility of using custom datatypes that deal with the tuples and convert to strings for the database side. If your application never wants to see the "string" form and just wants to see tuples, you can make a TypeDecorator that does this, see the examples at https://docs.sqlalchemy.org/en/13/core/custom_types.html#typedecorator-recipes . Then again if you are going to use the dictionary approach above for these attributes, you can build the coercion between string / tuple into the GeneView class directly. I'd probably do it that way in this case assuming that's the place you want to see the tuples.

Any help is appreciated, cheers

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/f3105bee-b6dc-489c-82e0-c6d36710a462%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gamcil

unread,

Jul 2, 2019, 10:46:14 PM7/2/19

to sqlalchemy

Awesome thanks for the extensive reply.

This is the first time I've played with descriptor classes, so I'm sorry if this is really basic stuff.

I set up my view class:

class View:


    def __init__(self, gene, attrs):
        self.gene = gene
        self.attrs = attrs

    def keys(self):
        return iter(self.attrs)

    def items(self):


        return ((attr, getattr(self.gene, attr)) for attr in self.attrs)

    def __getitem__(self, key):
        return getattr(self.gene, key)

    def __setitem__(self, key, value):
        setattr(self.gene, key, value)

and a descriptor class:

class Descriptor:
    def __init__(self, *attrs):
        self.attrs = attrs

    def __get__(self, instance, owner):
        return View(instance, self.attrs)

    def __set__(self, instance, value):
        self.attrs = value

then the Gene class using __declare_first__ becomes:

class Gene(Base):
    __tablename__ = 'Gene'
    id = Column(Integer, primary_key=True)
    identifiers = Descriptor('locus', 'protein')
    features = Descriptor('gene', 'mRNA', 'CDS')

    def __init__(self, **kwargs):
        for dic in kwargs.values():
            for key, value in dic.items():
                setattr(self, key, value)

    @classmethod
    def __declare_first__(cls):
        for value in list(cls.__dict__.values()).copy():
            if isinstance(value, Descriptor):
                for attr in value.attrs:
                    setattr(cls, key, Column(attr, String))

This seems to give me close to the interface I would like. A couple of questions:

1) So I guess this is still just directly mapping columns to Class attributes, accessable by directly getting an attribute, i.e. gene.features['mRNA'] is equivalent to gene.mRNA. Would you then mask the class attributes by e.g. prepending with underscore?

2) Currently, every time a method is called on the descriptor, a new instance of the View class is returned. Is that by design, or is there a way of persisting the one View per attribute (identifiers, features, ...) for each Gene instance? Is that even an overhead I should be worried about? It seems like the equivalent of creating a new dictionary every time I want to access attributes on the Gene object.

3) When I call Base.metadata.create_all(bind=engine) without first creating a Gene instance, the generated SQL is:

CREATE TABLE "gene" (
    id INTEGER NOT NULL,
    PRIMARY KEY (id)
)

missing all of the other mappings. After calling e.g. gene = Gene(), it's as expected:

CREATE TABLE "gene" (
        id INTEGER NOT NULL,
        locus VARCHAR,
        protein VARCHAR,
        gene VARCHAR,
        "mRNA" VARCHAR,
        "CDS" VARCHAR,
        PRIMARY KEY (id)
)

When I'm first initialising the database, when should I be calling create_all? Do I need to actually create objects before I create the schema and initialise the database?

Thanks again

To unsubscribe from this group and stop receiving emails from it, send an email to sqlal...@googlegroups.com.

Simon King

unread,

Jul 3, 2019, 4:47:08 AM7/3/19

to sqlal...@googlegroups.com

That's entirely up to you. It's not *necessary* (no harm will be done
if you access the data through both mechanisms, since the value is
only stored in a single place), so it's just an aesthetic choice.

>
> 2) Currently, every time a method is called on the descriptor, a new instance of the View class is returned. Is that by design, or is there a way of persisting the one View per attribute (identifiers, features, ...) for each Gene instance? Is that even an overhead I should be worried about? It seems like the equivalent of creating a new dictionary every time I want to access attributes on the Gene object.

This is probably a case of "don't bother optimising until you know
it's a problem", but in this case the optimisation is simple. You
could change your Descriptor.__get__ method to store the View instance
on the object itself, perhaps under a name derived from the list of
attributes. For example:

def __get__(self, instance, owner):
cachename = '_descriptor_' + '_'.join(self.attrs)
result = getattr(instance, cachename, None)
if result is None:
result = View(instance, self.attrs)
setattr(instance, cachename, result)
return result

>
> 3) When I call Base.metadata.create_all(bind=engine) without first creating a Gene instance, the generated SQL is:
> CREATE TABLE "gene" (
> id INTEGER NOT NULL,
> PRIMARY KEY (id)
> )
>
> missing all of the other mappings. After calling e.g. gene = Gene(), it's as expected:
> CREATE TABLE "gene" (
> id INTEGER NOT NULL,
> locus VARCHAR,
> protein VARCHAR,
> gene VARCHAR,
> "mRNA" VARCHAR,
> "CDS" VARCHAR,
> PRIMARY KEY (id)
> )
>

This seems like a bug; I would have expected the __declare_first__
method to be called automatically when you call create_all.

As a workaround, does it make any difference if you add a call to
sqlalchemy.orm.configure_mappers() before your call to create_all?

Simon

Mike Bayer

unread,

Jul 3, 2019, 10:28:42 AM7/3/19

to noreply-spamdigest via sqlalchemy

oh, whoops sorry it does not :) just call configure_mappers() :)

Simon

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAFHwexdP95%2BWy5-8OjqhoQcYcs8jRT9RFp-G-SCY9fwwO5UoHg%40mail.gmail.com.

Reply all

Reply to author

Forward