Named tuples in SQLAlchemy

1,570 views
Skip to first unread message

Christoph Zwerschke

unread,
Nov 9, 2012, 6:49:19 AM11/9/12
to sqlal...@googlegroups.com
Query results in SQLAlchemy are returned as named tuples, but SQLAlchemy
uses its own flavor of named tuples which is not fully compatible with
collections.namedtuple in the standard lib. For instance, "_fields" is
called "_labels" in SQLAlchemy, and the method "_asdict()" which could
be helpful to convert query results to JSON is missing in SQLAlchemy.
Wouldn't it be better to use the standard collections.namedtuples. I
know it's only available since Py 2.6, but SQLAlchemy will eventually
only work with Py 2.6 anyway, and could just use a fallback
implementation for Py 2.5 for the time being.

-- Christoph

Michael Bayer

unread,
Nov 9, 2012, 5:54:27 PM11/9/12
to sqlal...@googlegroups.com
NamedTuple is a tough one - because with our result sets we need to create a new NamedTuple for every call to execute(), meaning it has to be performant not just on creating new instances of the tuple, but on creating new tuple types as well.

If you look at the source to NamedTuple, it is going through some very elaborate hoops to produce the usage contract it has, including defining a whole new custom "tuple" subclass as a Python string, then exec'ing that code to produce the new class. This is a common technique of producing very well-behaved dynamic classes.

Then, not only does it exec() a whole Python source string, after that it actually does *stack frame* logic to further manipulate the object so that it is pickleable - again this is because it's a dynamically generated type.

NamedTuple starts at http://hg.python.org/cpython/file/f938d478359a/Lib/collections.py#l237.

I use NamedTuple a lot, but to have the generation of new NamedTuples added onto the latency of all Query executions, including stack frame poking as well as parsing and invoking 29 lines of Python code, kind of frightens me.

Our own "keyed tuple" has a different way of working, in that we just have a fixed object, not a custom type. There is no latency for creating new types and its constructor is almost as simple as that of a plain NamedTuple (we just send the "keys" along each time, not a big deal since we only do this in one place). NamedTuple's custom type generation is why it has such involved hoops to jump through.

We can certainly add the methods you're describing to the object. i've added http://www.sqlalchemy.org/trac/ticket/2601 for this. Feel free to add your thoughts there.

it's also possibly worth it to performance check our own util.KeyedTuple against NamedTuple, just to get an idea for if/how much the performance differs. My impression though is that there'd be a significant speed bump, though, and on the other end I often have to help people get Query to speed up as it is.


On Nov 9, 2012, at 6:49 AM, Christoph Zwerschke wrote:

> Query results in SQLAlchemy are returned as named tuples, but SQLAlchemy uses its own flavor of named tuples which is not fully compatible with collections.namedtuple in the standard lib. For instance, "_fields" is called "_labels" in SQLAlchemy, and the method "_asdict()" which could be helpful to convert query results to JSON is missing in SQLAlchemy. Wouldn't it be better to use the standard collections.namedtuples. I know it's only available since Py 2.6, but SQLAlchemy will eventually only work with Py 2.6 anyway, and could just use a fallback implementation for Py 2.5 for the time being.
>
> -- Christoph
>
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>
>

Christoph Zwerschke

unread,
Nov 10, 2012, 6:06:27 AM11/10/12
to sqlal...@googlegroups.com
Am 09.11.2012 23:54, schrieb Michael Bayer:
> NamedTuple is a tough one - because with our result sets we need to
> create a new NamedTuple for every call to execute(), meaning it has
> to be performant not just on creating new instances of the tuple,
> but on creating new tuple types as well.
>
> If you look at the source to NamedTuple, it is going through some
> very elaborate hoops ...

Yes, that's true. Of course this is done for good reasons, namely to
give you all the goodness of zero overhead per instance in terms of
memory and creation time, a telling type name etc. Raymond Hettinger
explains the design principles very well at
http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-fun-with-python-s-newer-tools-4901215
(minutes 11 to 27).

But you're right, it also comes at a cost, namely creation time for the
type itself. A quick test with timeit showed that this time overhead
only amortizes when you create at least about 175 instances. The memory
advantage is of course always there, but it's not interesting for
smaller datasets either. And then it will depend on how large your data
values are compared to the names of the columns. There is also not much
benefit in creating a custom type name for the tuple, since query
results usually don't have an obvious name anyway.

So maybe it's better to keep the current implementation and just make it
a bit more similar to Python's named tuples, e.g. renaming _labels to
_fields and adding _as_dict. By the way, the underscore has been only
added here to minimize the possibility of name clashes with tuple
fields, they shall not indicate that these are private attributes. As
another aside, the _as_dict method should not return a normal dict, but
an OrderedDict which can also be taken from collections nowadays.

-- Christoph
Reply all
Reply to author
Forward
0 new messages