jsonb_set in PostgreSQL 9.5+

900 views
Skip to first unread message

Zsolt Ero

unread,
May 15, 2017, 9:32:39 AM5/15/17
to sqlalchemy
In PostgreSQL 9.5+ it is finally possible to modify a single key inside a JSONB column. Usage is something like this:

update maps set screenshots=jsonb_set(screenshots, '{key}', '"value"') where id = 10688

Is it possible to write this query using the ORM somehow? If not, please take it as a feature request.



mike bayer

unread,
May 15, 2017, 9:58:57 AM5/15/17
to sqlal...@googlegroups.com
You can use that function directly:

my_object = session.query(Maps).get(5)

my_object.screenshots = func.jsonb_set(my_object.screenshots, '{key}',
'"value"')

session.flush()


as far as "transparent" ORM use of that, like this:

my_object.screenshots[key] = "value"

right now that is a mutation of the value, and assuming you were using
MutableDict to detect this as an ORM change event, the ORM considers
"screenshots" to be a single value that would be the target of an
UPDATE, meaning the whole JSON dictionary is passed into the UPDATE.
There is no infrastructure for the ORM to automatically turn certain
column updates into finely-detailed SQL function calls. I can imagine
that there might be some event-based way to make this happen
transparently within the flush, however, but I'd need to spend some time
poking around to work out how that might work.


I'm not familiar with what the advantage to jsonb_set() would be and I
can only guess it's some kind of performance advantage. I'd be curious
to see under what scenarios being able to set one element of the JSON
vs. UPDATEing the whole thing is a performance advantage significant
compared to the usual overhead of the ORM flush process; that is,
Postgresql is really fast, and for this optimization to be significant,
you probably need to be calling the Core function directly anyway rather
than going through the whole ORM flush process. But this is all based
on my assumption as to what your goal of using this function is.



>
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Zsolt Ero

unread,
May 15, 2017, 10:19:22 AM5/15/17
to sqlal...@googlegroups.com
Thanks for the answer. My use case is the following:

I have an object (map_obj), which has screenshots in two sizes. I'm
using JSONB columns to store the screenshot filenames.

Now, the two screenshot sizes are generated in parallel. The code is
like the following:

map_obj = query(...by id...)
filename = generate_screenshot(size) # long running screenshot generation

try:
dbsession.refresh(map_obj, ['screenshots'])
map_obj.screenshots = dict(map_obj.screenshots, **{size: filename})
except Exception as e:
logger.warning(...)

It worked well for 99.9% of the cases. The problem is that in the rare
case when both screenshots got rendered within a few milliseconds, one
of the screenshots got lost.

The simple solution was to add lockmode='update' to the refresh, so
this way the refreshes are blocking until the other finishes the
update.

But since this means locking a full row, I was thinking a simple JSONB
insertion would probably be better, since I can avoid locking the row.

Zsolt
> --- You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/hjjIyEC8KHQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.

Zsolt Ero

unread,
May 15, 2017, 10:31:54 AM5/15/17
to sqlal...@googlegroups.com
I'm trying to run your example, but it doesn't work:

from sqlalchemy import func

m = request.dbsession.query(models.Map).get(3)
m.screenshots = func.jsonb_set(m.screenshots, '{key}', '"value"')
request.dbsession.flush()

It ends up in a (psycopg2.ProgrammingError) can't adapt type 'dict'.

Also, from the generated SQL it seems to me that it's also doing the
full JSONB update from client side, not just inserting a key into the
database server side.

UPDATE maps SET screenshots=jsonb_set(%(jsonb_set_1)s,
%(jsonb_set_2)s, %(jsonb_set_3)s) WHERE maps.id = %(maps_id)s
{'maps_id': 3, 'jsonb_set_3': '"value"', 'jsonb_set_2': '{key}',
'jsonb_set_1': {u'small': u'2ad139ee69cdcd9e.jpg', u'full':
u'68b3f51491ff1501.jpg'}}

mike bayer

unread,
May 15, 2017, 10:32:21 AM5/15/17
to sqlal...@googlegroups.com


On 05/15/2017 10:18 AM, Zsolt Ero wrote:
> Thanks for the answer. My use case is the following:
>
> I have an object (map_obj), which has screenshots in two sizes. I'm
> using JSONB columns to store the screenshot filenames.
>
> Now, the two screenshot sizes are generated in parallel. The code is
> like the following:
>
> map_obj = query(...by id...)
> filename = generate_screenshot(size) # long running screenshot generation
>
> try:
> dbsession.refresh(map_obj, ['screenshots'])
> map_obj.screenshots = dict(map_obj.screenshots, **{size: filename})
> except Exception as e:
> logger.warning(...)
>
> It worked well for 99.9% of the cases. The problem is that in the rare
> case when both screenshots got rendered within a few milliseconds, one
> of the screenshots got lost.
>
> The simple solution was to add lockmode='update' to the refresh, so
> this way the refreshes are blocking until the other finishes the
> update.
>
> But since this means locking a full row, I was thinking a simple JSONB
> insertion would probably be better, since I can avoid locking the row.


OK since you're looking to get around a race and do an "atomic" update,
I'd recommend running UPDATE straight, with ORM you can get this with
query.update()

session.query(YourClass).filter(YourClass.id ==
whatever).update({"screenshots": func.jsonb_set(YourClass.screenshots,
"key", "value")}, synchronize_session='fetch')

that will also refetch the current value of the row

mike bayer

unread,
May 15, 2017, 10:33:15 AM5/15/17
to sqlal...@googlegroups.com


On 05/15/2017 10:31 AM, Zsolt Ero wrote:
> I'm trying to run your example, but it doesn't work:
>
> from sqlalchemy import func
>
> m = request.dbsession.query(models.Map).get(3)
> m.screenshots = func.jsonb_set(m.screenshots, '{key}', '"value"')
> request.dbsession.flush()
>
> It ends up in a (psycopg2.ProgrammingError) can't adapt type 'dict'.

jsonb_set(models.Map.screenshots, ...)

because this works against the column, not the value

Zsolt Ero

unread,
May 15, 2017, 10:54:50 AM5/15/17
to sqlal...@googlegroups.com
Thanks, it is all clear now. Just out of interest, what is the point
of synchronize_session='fetch'?

For me all it does is a simple SELECT maps.id AS maps_id FROM maps
WHERE maps.id = %(id_1)s

All I get as a return value is 0: not successful (probably id didn't
exist), while 1: successful. It is the same behaviour both with
'fetch' and False.

Zsolt

Jonathan Vanasco

unread,
May 15, 2017, 11:28:12 AM5/15/17
to sqlalchemy

On Monday, May 15, 2017 at 9:58:57 AM UTC-4, Mike Bayer wrote:
I'd be curious to see under what scenarios being able to set one element of the JSON
vs. UPDATEing the whole thing is a performance advantage significant
compared to the usual overhead of the ORM flush process; that is,
Postgresql is really fast, and for this optimization to be significant,
you probably need to be calling the Core function directly anyway rather
than going through the whole ORM flush process.  

I did a bunch of tests on this a while back, but in regards to the very-similar HTSORE column and some tests on JSONB.

The big takeaways--

* after a certain amount of data is in the column, the most significant issue is bandwidth and timing from the payload transfer.
* there is a decent performance update if you're in a sweet spot where the column payload is TOASTable and that's the only update.  in that instance, postgres just updates the toast table  -- otherwise it does the standard routine of "mark the old row for deletion, copy the row and update it as the new row".  toasting a jsonb column has been tweaked a lot, the last time I checked it had to be "just right" -- big enough to toast, but small enough to fit in a single toast column.  

tldr, it won't noticeably affect performance for most situations.  

mike bayer

unread,
May 15, 2017, 11:29:24 AM5/15/17
to sqlal...@googlegroups.com


On 05/15/2017 10:54 AM, Zsolt Ero wrote:
> Thanks, it is all clear now. Just out of interest, what is the point
> of synchronize_session='fetch'?

that will do a SELECT and get the new value back and update your ORM
object in memory. Set synchronize_session=False if you don't care.

Zsolt Ero

unread,
May 15, 2017, 11:57:37 AM5/15/17
to sqlal...@googlegroups.com
I might not be understanding something, but for me there are two
different concepts here:

map_obj = dbsession.query(Map).get(id_)

is an object in memory, loaded with a long SELECT statement, allowing
us to get and set different attributes and the session / transaction
manager commits the auto-detected changes.

Whereas with

dbsession.query(Map).filter(Map.id == id_).update(
{"screenshots": func.jsonb_set(Map.screenshots, '{size}',
'"filename.jpg"')}, synchronize_session='fetch')

there is no object in memory, what we are writing here is just a nicer
syntax for a one line SQL UPDATE query. Even the triggered SELECT
statement is just querying for a single .id, which we have anyway.

Zsolt

mike bayer

unread,
May 15, 2017, 12:56:57 PM5/15/17
to sqlal...@googlegroups.com


On 05/15/2017 11:56 AM, Zsolt Ero wrote:
> I might not be understanding something, but for me there are two
> different concepts here:
>
> map_obj = dbsession.query(Map).get(id_)
>
> is an object in memory, loaded with a long SELECT statement, allowing
> us to get and set different attributes and the session / transaction
> manager commits the auto-detected changes.
>
> Whereas with
>
> dbsession.query(Map).filter(Map.id == id_).update(
> {"screenshots": func.jsonb_set(Map.screenshots, '{size}',
> '"filename.jpg"')}, synchronize_session='fetch')
>
> there is no object in memory, what we are writing here is just a nicer
> syntax for a one line SQL UPDATE query. Even the triggered SELECT
> statement is just querying for a single .id, which we have anyway.

the purpose of synchronize_session is only if you happened to run *both*
Python statements, so that you have map_obj present as a local variable,
and wish to expire the now stale value of map_obj.screenshots, so that
when you next access it, a SELECT is emitted to get the most recent
value, e.g. the one that's the result of your UPDATE statement.

the "fetch" strategy is actually wasteful here because it runs the
query() as a SELECT in order to locate the primary keys of the objects
that might be locally present, but your query is simple enough that this
is already apparent. the "fetch" strategy currently doesn't even bother
to get the new value right now and just expires, I forgot about this.
this is why you see just the one wasteful SELECT statement. "fetch"
probably should be improved to actually fetch and directly update the
values for the instances it locates, not sure why it wasn't done that
way to start.

the "evaluate" strategy does everything in Python, but also won't work
because it currently expects that the values which were set are also
evaluatable in Python, also should be improved to at least do a simple
"expire" for attributes that can't be evaluated in Python.

in this case the only strategy left is sychronize_session=False.
However if you have map_obj in memory, you'd need to run refresh() on it
to get the new JSON value if you care about it.

Zsolt Ero

unread,
May 16, 2017, 8:03:44 AM5/16/17
to sqlal...@googlegroups.com
Thanks a lot for the explanation, it's all clear now!
Reply all
Reply to author
Forward
0 new messages