Batch patch ORM entities

9 views
Skip to first unread message

Pierre Massé

unread,
Oct 11, 2023, 4:22:38 AM10/11/23
to sqlal...@googlegroups.com
Dear all,

I have a requirement that makes me think that I need to "mass patch" some ORM objects. However, I am open to any suggestions regarding the way to answer my requirements.

I have defined an ORM object which represents a user, holding longitude and latitude (among other attributes). At some point, I want to query many of those users, and send them back holding the geographical distance from a certain point (defined by longitude and latitude) along with their other data.

Computing the distance is computationally heavy, and I noticed that I could greatly improve performance by mass computing those distances, using numpy for example.

My question is: would it be possible to split my flow in 2 : 
- a flow that queries the data that is simply available in the database, as ORM entities
- a flow that queries lon/lat as a numpy array, perform the distance computation
and afterward merge those 2 in the queried ORM entities?

It is important to me that I finally get back a list of ORM entities fully populated, because my whole downstream process is built around this assumption.

Thanks a lot for your insights on the matter!

Regards,

Pierre

PS: giving me a "SQLAlchemy fast distance computation" won't do the trick, because I have other kinds of computations that may not be optimizable this way.

Mike Bayer

unread,
Oct 11, 2023, 9:07:46 AM10/11/23
to noreply-spamdigest via sqlalchemy


On Wed, Oct 11, 2023, at 4:22 AM, Pierre Massé wrote:
Dear all,

I have a requirement that makes me think that I need to "mass patch" some ORM objects. However, I am open to any suggestions regarding the way to answer my requirements.

I have defined an ORM object which represents a user, holding longitude and latitude (among other attributes). At some point, I want to query many of those users, and send them back holding the geographical distance from a certain point (defined by longitude and latitude) along with their other data.

Computing the distance is computationally heavy, and I noticed that I could greatly improve performance by mass computing those distances, using numpy for example.

My question is: would it be possible to split my flow in 2 : 
- a flow that queries the data that is simply available in the database, as ORM entities
- a flow that queries lon/lat as a numpy array, perform the distance computation
and afterward merge those 2 in the queried ORM entities?

This is a straightforward programming task.   Query for the set of objects you want, assemble the appropriate values into a numpy array, do wahtever numpy thing you need, then merge back.    you'd likely want to ensure you can correlate numpy rows back to original objects most likely by keeping a sort order between your result set and your numpy array.

not stated here is if these numpy-calculated values as assembled onto ORM objects are also database-column mapped, it sounds like they are not (otherwise those values would be in the database), so the matrix values can be applied to plain attributes on the objects directly.

now where this may be more challenging, not sure if this is what you're asking, is if you want this to happen implicitly for all queries or something like that.   there's ways to do this depending on the programming patterns you are looking to achieve however I'd certainly start simple with a function like "apply_lat_long_to_list_of_objects(obj)".



It is important to me that I finally get back a list of ORM entities fully populated, because my whole downstream process is built around this assumption.

Thanks a lot for your insights on the matter!

Regards,

Pierre

PS: giving me a "SQLAlchemy fast distance computation" won't do the trick, because I have other kinds of computations that may not be optimizable this way.


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Pierre Massé

unread,
Oct 11, 2023, 11:10:38 AM10/11/23
to sqlal...@googlegroups.com
Thanks a lot for the insight Mike,

My question might then be quite naive: let's say I have a list of ORM entities on one side, and an accordingly sorted numpy array of computed features, how would I merge back attributes on entities?

Let's say I have a list like :
user_list = [User(id=1, dist=None), User(id=2, dist=None)]

and a pandas DataFrame (or numpy array) like:

dist_df = 
id dist
1 123
2 90

How would I correlate those 2 into:
[User(id=1, dist=123), User(id=2, dist=90)]

Would the way to go be a simple for loop? Like:
for user in user_list:
    user.dist = dist_df.loc[user.id, 'dist']

Or is there something included in SQLAlchemy for this kind of task?

Regards,

Pierre

Mike Bayer

unread,
Oct 11, 2023, 11:16:21 AM10/11/23
to noreply-spamdigest via sqlalchemy


On Wed, Oct 11, 2023, at 11:10 AM, Pierre Massé wrote:
Thanks a lot for the insight Mike,

My question might then be quite naive: let's say I have a list of ORM entities on one side, and an accordingly sorted numpy array of computed features, how would I merge back attributes on entities?

Let's say I have a list like :
user_list = [User(id=1, dist=None), User(id=2, dist=None)]

and a pandas DataFrame (or numpy array) like:

dist_df = 
id dist
1 123
2 90

How would I correlate those 2 into:
[User(id=1, dist=123), User(id=2, dist=90)]

Would the way to go be a simple for loop? Like:
for user in user_list:
    user.dist = dist_df.loc[user.id, 'dist']

Or is there something included in SQLAlchemy for this kind of task?

let's assume the two lists are ordered, which I would strongly recommend.  merge them with zip:

for user, dist in zip(user_list, dist_df):
    user.dist = dist



Pierre Massé

unread,
Oct 11, 2023, 11:23:09 AM10/11/23
to sqlal...@googlegroups.com
Thanks a lot, sorry to have been a bother for such a simple question..

Reply all
Reply to author
Forward
0 new messages