Primaire keys mandatory sortability

15 views
Skip to first unread message

Nico C.

unread,
Jun 24, 2019, 6:12:45 AM6/24/19
to sqlalchemy
Hi there,

Where I work, in our data model, we'd have several the opportunity to have primary keys defined has a composite of something + an enum value. Though because sqlalchemy requises
that primary keys can be sorted, it is not possible to define such primary keys.

Indeed, in Python, enums are not sortablle. Which is fine by us and, though we could implemented some kind of ordre, to keep the behavior of enum types consistent between those use in a table and those that are not, we'd rater not do that. And yes, some might argue that this is a de-normalization and hence is evil, but we find it handy enough to accept corrupting our souls...

If we do define such primary keys we end-up with the following error:


[2019-01-29T18:56:24.936Z app.py:35986 MainProcess session.py:400 ] ERR APIException during a database transaction: . The database transaction will be roll backed
Traceback (most recent call last):
File "~/.repositories/project/static_api/repository/zone.py", line 175, in delete_zone
zone_to_delete.delete()
File "~/.repositories/project/models/zone.py", line 1851, in delete
super(Zone, self).delete()
File "~/.repositories/project/orm/declarative_base.py", line 325, in delete
return get_session().delete(self, **kwargs)
File "~/.repositories/project/orm/session.py", line 115, in delete
self.flush()
File "~/.repositories/project/orm/utils.py", line 289, in retry_on_deadlock_decorator
return wrapped(*args, **kwargs)
File "~/.repositories/project/orm/session.py", line 188, in flush
return super(EtlSession, self).flush(objects)
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 2139, in flush
self._flush(objects)
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 2259, in _flush
transaction.rollback(_capture_exception=True)
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 2223, in _flush
flush_context.execute()
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/unitofwork.py", line 389, in execute
rec.execute(self)
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/unitofwork.py", line 577, in execute
uow
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 243, in delete_obj
uowtransaction))
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 357, in _organize_states_for_delete
states):
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 1108, in _connections_for_states
for state in _sort_states(states):
File "~/.virtualenvs/project-py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 1130, in _sort_states
sorted(persistent, key=lambda q: q.key[1])
TypeError: unorderable types: _EnumSymbol() < _EnumSymbol()


This is not à New issue, see reference below. It is mentioned there that this may be made optional. So what would be the status of this ? I understand this is not an arbitrary choice (a post mentions preventing deadlocks). What would be the trade offs of such an options ?

https://groups.google.com/forum/#!searchin/sqlalchemy/primary$20key$20sortable%7Csort:date/sqlalchemy/mWbr-Tw4wvU/QXFYH2rRgFsJ

Thanks for enlightening us.

Mike Bayer

unread,
Jun 24, 2019, 11:30:22 AM6/24/19
to sqlal...@googlegroups.com


On Mon, Jun 24, 2019, at 6:12 AM, Nico C. wrote:
Hi there,

Where I work, in our data model, we'd have several the opportunity to have primary keys defined has a composite of something + an enum value. Though because sqlalchemy requises
that primary keys can be sorted, it is not possible to define such primary keys.

Indeed, in Python, enums are not sortablle. Which is fine by us and, though we could implemented some kind of ordre, to keep the behavior of enum types consistent between those use in a table and those that are not, we'd rater not do that. And yes, some might argue that this is a de-normalization and hence is evil, but we find it handy enough to accept  corrupting our souls... 

If you aren't worried about  a lack of purity on the SQL side, then why can't you also just add a __lt__() to your Enum object?    That would allow your program to work right now and this is what you'd need to do for now.


This is not à New issue, see reference below. It is mentioned there that this may be made optional. So what would be the status of this ? I understand this is not an arbitrary choice (a post mentions preventing deadlocks). What would be the trade offs of such an options ?

The official issue for Enum as primary key without the need to apply an __lt__() method to the Enum is at:


Don't worry that it's closed at the moment, feel free to reopen this issue if you are motivated to provide a pull request with tests as it's labeled that motivated volunteers are requested.  

The description of the feature is:

a pull request that includes well written tests may be accepted. PR would add a new method TypeEngine.value_sortkey(value), and then the persistence._sort_states() method would call upon this function to work on top of the values pulled from the "persistent" set, and the Enum datatype would likely allow a new callable to be passed that implements this (or it somehow knows how to build one from a pep-435 style collection).

let me know if this is important enough that you can donate some programming time for it and I will help you every step of the way to implement.




-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Nico C.

unread,
Jun 24, 2019, 1:24:50 PM6/24/19
to sqlalchemy
One of the many reasons not to implement __lt__, is that the types are used with different databases. Those have had very different stories, especially enum labels may not have been added in the same order. Hence, defining and order may lead people to rely on it. But, on either side (postgresql or python) that may yield different behavior depending on the underlying db.

We could sort this out (but may not really be worth the pain, on a system that is in production 24/7). But we also have had bugs in the past, that boiled down to ill defined or non-sensical orders (e.g. inadvertently sorting objects based on their memory addresses [thank Python3 for fixing that]) That can work for suprisingly long periods of time, until it doesn't. So, though it may sound a bit too phisolophical or just silly, I'd rather not implement orders that don't exist. As Python3 chose to do (e.g. objects no longer have an implicit order)

Our use case is more about partitioning tables. The fact that an entity belongs to one partition or another doesn't make the entity greater or lower than any other.

Not being able to add the enum in the primary key often means we have to handle unique constraints manually, add surrogate keys, which are on some large tables uselessly space and time consuming.

I've read the issue. Our code base having been started quite a while back, we're using your enum recipe. We have implemented a few other behavior on top of that, that has delayed our migration to plain Python enums. Not sure if it would be worth doing in that setting or be a valid motivation to do the change. I may have some time, later in the summer (end of July, August). No promises.

Anyhow, thank you for your answer.

Regards,
Nicolas.

Jonathan Vanasco

unread,
Jun 24, 2019, 3:24:43 PM6/24/19
to sqlalchemy
I am not sure if this helps or not, but the Primary key in SqlAlchemy doesn't need to be the same as the Primary key in the database.  You may be able to work around the constraints of your system by divorcing the two concepts from one another (I have, but not in an example I can easily adapt to your case).  

Mike Bayer

unread,
Jun 24, 2019, 3:43:02 PM6/24/19
to sqlal...@googlegroups.com


On Mon, Jun 24, 2019, at 1:24 PM, Nico C. wrote:
One of the many reasons not to implement __lt__, is that the types are used with different databases. Those have had very different stories, especially enum labels may not have been added in the same order. Hence, defining and order may lead people to rely on it. But, on either side (postgresql or python) that may yield different behavior depending on the underlying db.

OK given PostgreSQL's exception case that ENUMs are sorted in creation order, which is not what any other database does, this is a point for purity.    But if you want the ORM to not sort UPDATEs at all, then you're going to open yourself up to deadlocks.     So it has to sort by something and it does not need to match what the database would sort as since that's not the purpose of this particular sort.


We could sort this out (but may not really be worth the pain, on a system that is in production 24/7). But we also have had bugs in the past, that boiled down to ill defined or non-sensical orders (e.g. inadvertently sorting objects based on their memory addresses [thank Python3 for fixing that]) That can work for suprisingly long periods of time, until it doesn't. So, though it may sound a bit too phisolophical or just silly, I'd rather not implement orders that don't exist. As Python3 chose to do (e.g. objects no longer have an implicit order)

there is an ordering that exists though, it's just that due to PostgreSQL it has to pull in a server-side algorithm to do it which I am not proposing we attempt to implement.



Our use case is more about partitioning tables. The fact that an entity belongs to one partition or another doesn't make the entity greater or lower than any other.

Not being able to add the enum in the primary key often means we have to handle unique constraints manually, add surrogate keys, which are on some large tables uselessly space and time consuming.

If you are using an ENUM as a primary key then that suggests there are a limited number of rows here, if they are in fact UPDATED infrequently you can still use these objects as ORM objects, you'd just need to use the bulk update interface until the SQLAlchemy feature can be implemented.


I've read the issue. Our code base having been started quite a while back, we're using your enum recipe. We have implemented a few other behavior on top of that, that has delayed our migration to plain Python enums. Not sure if it would be worth doing in that setting or be a valid motivation to do the change. I may have some time, later in the summer (end of July, August). No promises.

You've reminded me that PG enums sort by insert order and not string name so I continue to favor a feature that detaches the sorting from the Enum value itself.

The feature is implemented in the comment at https://github.com/sqlalchemy/sqlalchemy/issues/4285#issuecomment-505144511 .    It needs one additional test in test/sql/test_types.py -> EnumTest to verify the Enum side of the feature is working.   Please add this test, review the code I've added, make sure the patch works for your use case, and send a PR, thanks.





Anyhow, thank you for your answer.

Regards,
Nicolas. 

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages