Using namedtuple instead of pure tuples


Adam Kaliński

Feb 22, 2014, 11:24:39 AM
to django-d...@googlegroups.com
Hi, 

I was thinking it would be nice if we made more use of namedtuples. For example, the get_field_by_name method returns a tuple (field_object, model, direct, m2m), which requires the user to dig up what each index means. Using a namedtuple would increase code readability, because data.model is more descriptive than data[1]. The nice thing is that it should be fully backward compatible with existing code, since a namedtuple is just a tuple with named fields ;). I don't know whether namedtuples have any serious drawbacks that would make this change a problem, so please point them out if you know of any :).
get_field_by_name is probably not the only place where they could be used.
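A minimal sketch of the idea, assuming a hypothetical `FieldInfo` namedtuple whose field order follows the (field_object, model, direct, m2m) tuple described above; the string placeholders stand in for real field and model objects. It shows the new attribute access next to the old index and unpacking access, which keeps working because a namedtuple is a tuple subclass.

```python
from collections import namedtuple

# Hypothetical name; the field order matches the 4-tuple that
# get_field_by_name is described as returning.
FieldInfo = namedtuple("FieldInfo", "field_object model direct m2m")

# Placeholder values stand in for real field and model objects.
info = FieldInfo("title_field", "Article", True, False)

# New, self-describing access.
print(info.model)  # Article

# Old-style access keeps working, because FieldInfo subclasses tuple.
print(info[1])  # Article
field_object, model, direct, m2m = info
print(isinstance(info, tuple))                          # True
print(info == ("title_field", "Article", True, False))  # True
```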
What do you guys think? 


Regards, 
Adam Kaliński

Florian Apolloner

Feb 22, 2014, 11:37:18 AM
to django-d...@googlegroups.com
On Saturday, February 22, 2014 5:24:39 PM UTC+1, Adam Kaliński wrote:
> What do you guys think?

Sounds good. You might check whether namedtuple has negative performance effects (I doubt it, but who knows). The reason we haven't added it yet is that namedtuple has only existed since Python 2.6.

cheers,
Florian

Stratos Moros

Feb 22, 2014, 1:01:21 PM
to django-d...@googlegroups.com

Completely unscientific microbenchmarks: (gist)

>>> from timeit import Timer

## creation
# tuple
>>> Timer('(1, 2, 3, 4, 5)').timeit()
0.02694106101989746

# namedtuple with args
>>> Timer('T(1, 2, 3, 4, 5)', setup='''
from collections import namedtuple
T = namedtuple("T", "a b c d e")'''
).timeit()
0.773979902267456

# namedtuple with kwargs
>>> Timer('T(a=1, b=2, c=3, d=4, e=5)', setup='''
from collections import namedtuple
T = namedtuple("T", "a b c d e")'''
).timeit()
1.155663013458252 

## item access
# tuple
>>> Timer('t[0], t[1], t[2], t[3], t[4]', setup='t = (1, 2, 3, 4, 5)').timeit()
0.3240630626678467

# namedtuple with indices
>>> Timer('t[0], t[1], t[2], t[3], t[4]', setup='''
from collections import namedtuple
T = namedtuple("T", "a b c d e")
t = T(1, 2, 3, 4, 5)'''
).timeit()
0.2994410991668701

# namedtuple with attributes
>>> Timer('t.a, t.b, t.c, t.d, t.e', setup='''
from collections import namedtuple
T = namedtuple("T", "a b c d e")
t = T(1, 2, 3, 4, 5)'''
).timeit()
0.9363529682159424

It seems the only significant slowdown is in creating a namedtuple. I imagine the impact would be negligible over a complete request-response cycle, but that would have to be tested.

Accessing a namedtuple's items by attribute is also somewhat slower, but this wouldn't be a problem: existing code that uses indices would keep working with the same performance, and users could decide for themselves how to access a namedtuple's items in new code.
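One way to see why index access costs the same while attribute access is slower, as a CPython-specific sketch: a namedtuple class does not override tuple's `__getitem__`, whereas each field name is a descriptor stored on the class (a `property` in older Pythons, a C-level getter in newer CPython), so `t.a` involves an extra descriptor call. The exact descriptor type is an implementation detail.

```python
from collections import namedtuple

T = namedtuple("T", "a b c d e")
t = T(1, 2, 3, 4, 5)

# Index access uses tuple's own __getitem__, unchanged ...
print(T.__getitem__ is tuple.__getitem__)  # True

# ... while each field name is a descriptor stored on the class,
# so t.a goes through a descriptor lookup and call before the item fetch.
field_a = T.__dict__["a"]
print(hasattr(field_a, "__get__"))            # True
print(field_a.__get__(t, T) == t[0] == t.a)   # True
```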

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/61b1a5f0-1d1c-480d-87cd-639c728327a7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Adam Kaliński

Feb 23, 2014, 8:12:38 AM
to django-d...@googlegroups.com
I did my own tests with similar results:
 
http://nbviewer.ipython.org/gist/adamkal/9171081

It looks like creating a namedtuple is indeed quite time-consuming; it's hard to even compare it with creating a plain tuple here.
The good thing is that backward-compatible (index) access has no overhead.

Adam Kaliński

Feb 23, 2014, 8:18:33 AM
to django-d...@googlegroups.com
Oh! And as far as I understand, they consume the same amount of memory:

import sys
from collections import namedtuple
T = namedtuple('T', 'a b c d e')
assert sys.getsizeof((1,2,3,4,5)) == sys.getsizeof(T(1,2,3,4,5)) 
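A slightly extended version of the check above. `sys.getsizeof` is shallow: it reports only the instance itself, so equal numbers mean a namedtuple instance carries no per-instance overhead compared with a plain tuple. That holds because the generated class sets `__slots__ = ()`, so instances get no `__dict__`. The class object does use memory, but only once per namedtuple type, not per instance.

```python
import sys
from collections import namedtuple

T = namedtuple("T", "a b c d e")

plain = (1, 2, 3, 4, 5)
named = T(1, 2, 3, 4, 5)

# namedtuple classes declare empty __slots__, so instances have no
# per-instance __dict__ and keep the same layout as a plain tuple.
print(T.__slots__)                                   # ()
print(sys.getsizeof(plain) == sys.getsizeof(named))  # True

# getsizeof is shallow: it counts neither the elements nor the class
# object, which is created once per namedtuple type and shared.
```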

Russell Keith-Magee

Feb 23, 2014, 8:57:13 PM
to Django Developers

Hi Stratos,

Thanks for providing those benchmarks -- that's really helpful. 

However, it's all well and good to say "Doing X takes 0.01s" or "X is 50% slower than Y", but if the time taken to do X is incredibly small to start with, a 50% slowdown doesn't really matter that much. The fact that namedtuple is slower than tuple is clear from your data; what isn't clear is how much we should be concerned. Unfortunately, the raw time data doesn't tell us how fast or slow your test machine is.

If you can provide a baseline for comparison, that would be very helpful. In the past when we've done benchmarks, we've used:

 * Invoking a no-op (i.e., pass)
 * Invoking a function that performs a no-op.

This essentially gives us an indication for the order of magnitude of the time values you've provided, and allows us to say "creating a named tuple has the same overhead as a single function call", (or 10 function calls, or whatever is appropriate). This puts the benchmark into concrete terms - nobody would do a code review and say "you need to remove 1 function call for speed purposes", but they might say "these 1000 function calls seem excessive". 
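A sketch of the baseline approach described above, assuming CPython and the standard `timeit` module; `noop`, `measure` and `report` are illustrative names, not existing benchmark helpers. Each statement is timed alongside `pass` and a call to an empty function, and the result is expressed as "how many no-op function calls" it costs. One tuple member is a variable so the compiler cannot build the tuple ahead of time.

```python
from timeit import timeit

# Shared setup: a 5-field namedtuple, an empty function for the
# "function call" baseline, and a variable so tuples aren't pre-built.
SETUP = """
from collections import namedtuple
T = namedtuple("T", "a b c d e")
def noop():
    pass
a = 1
"""


def measure(stmt, number):
    """Total seconds taken to run `stmt` `number` times."""
    return timeit(stmt, setup=SETUP, number=number)


def report(number=1_000_000):
    """Cost of each statement in units of one no-op function call."""
    loop_cost = measure("pass", number)  # cost of the timing loop itself
    call_cost = measure("noop()", number) - loop_cost
    results = {
        "tuple": measure("t = (a, 2, 3, 4, 5)", number),
        "namedtuple": measure("t = T(a, 2, 3, 4, 5)", number),
    }
    return {name: (secs - loop_cost) / call_cost
            for name, secs in results.items()}


print(report())  # numbers vary by machine and Python version
```

Reporting ratios rather than raw seconds is what makes results from different machines comparable, which is the point of the baseline.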

Yours,
Russ Magee %-)

Alex Gaynor

Feb 23, 2014, 9:01:07 PM
to django-d...@googlegroups.com
FWIW, these benchmarks are not measuring accurately. Creating a tuple of the form "(1, 2, 3)" (notably, where all the members are literal constants) actually takes NO time: since it's an immutable container of immutable items, the compiler is able to factor the work out. A benchmark, even one as idiotically simple as:

"a = 1; (1, 2, a)" would more accurately mirror the real world.
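The constant-folding point can be checked directly with the built-in `compile` and the standard `dis` module. This is CPython-specific, and exact opcodes vary between versions: a tuple made only of literal constants ends up as a single constant in the code object, while a tuple with a variable member is assembled at run time by a `BUILD_TUPLE` instruction.

```python
import dis

# All members are literal constants: the compiler folds the tuple into
# a single constant stored in the code object.
folded = compile("t = (1, 2, 3, 4, 5)", "<bench>", "exec")

# One member is a variable: the tuple has to be built at run time.
built = compile("a = 1; t = (1, 2, a)", "<bench>", "exec")

print((1, 2, 3, 4, 5) in folded.co_consts)  # True
print("BUILD_TUPLE" in {i.opname for i in dis.get_instructions(built)})  # True
```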

Alex






--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

Adam Kaliński

Feb 24, 2014, 3:43:43 AM
to django-d...@googlegroups.com
Good tips, thanks! I'll improve the benchmarks and post them as soon as they're ready. Moreover, I was thinking it might be nice to just modify Django to use namedtuples and see how that influences execution time. The problem is that it might be difficult to create reliable tests.
If you think anything else is missing, please let me know.

Adam Kaliński

Mar 26, 2014, 4:12:32 PM
to django-d...@googlegroups.com
Hi, 


Alex was right about using variables, and the new benchmarks have confirmed it.

I also created a draft of how it might look for Options.get_field_by_name (as it returns a 4-tuple):
As I said, this is only a draft. I think it might not be a good idea to change all index lookups to attribute access, but this example shows how the change would affect the code.
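The draft itself isn't reproduced in this thread. As a rough illustration only, here is a hypothetical stand-in for an Options-like class (`FakeOptions`, `FieldInfo` and the sample data are invented for this sketch and are not Django's code) whose `get_field_by_name` returns a namedtuple, with one call site using the old index lookup and another using attribute access.

```python
from collections import namedtuple

# Hypothetical: field order mirrors the 4-tuple get_field_by_name returns.
FieldInfo = namedtuple("FieldInfo", "field_object model direct m2m")


class FakeOptions:
    """Hypothetical stand-in for a model's Options; not Django's class."""

    def __init__(self, model, fields):
        # fields maps a field name to (field_object, direct, m2m)
        self.model = model
        self._fields = fields

    def get_field_by_name(self, name):
        field_object, direct, m2m = self._fields[name]
        return FieldInfo(field_object, self.model, direct, m2m)


opts = FakeOptions("Article", {"tags": ("tags_field", True, True)})

# Old call site: positional lookup.
is_m2m_old = opts.get_field_by_name("tags")[3]

# New call site: the same data by name.
is_m2m_new = opts.get_field_by_name("tags").m2m
print(is_m2m_old, is_m2m_new)  # True True
```

Because both call sites return the same value, index lookups could be migrated gradually rather than all at once.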

What do you think? Is it worth changing? There are other places that could use such a change too, but get_field_by_name seems like a good place to start.