On Sat, Feb 11, 2017 at 08:15:08PM +0000, Adam Johnson wrote:
> I can see the advantage from an operational perspective with files matching
> byte-for-byte. I know many APIs do the same with sorting the keys in their
> JSON output for the same reason.
> I should think the performance impact isn't too great, but it would be nice to
> see some benchmarking to prove it's not disastrous.
I did a small benchmark, and, as expected, the impact is really small.
Without the pull request, generating an Atom feed with 100 medium-sized
entries takes ~ 0.01 s on consumer hardware. With it, the sorting and
creation of OrderedDicts increases the runtime by ~ 27 %.
Note that this benchmark generates the same feed several times in a row
because the absolute runtime is so small. See
https://gist.github.com/gsauthof/e2787adc7e9a46811dcb6aff2054110c for
details.
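For readers who don't want to open the gist, here is a rough sketch of the same kind of measurement. It is not the gist's code. It assumes the stdlib xml.sax.saxutils.XMLGenerator (which Django's SimplerXMLGenerator subclasses) is a fair stand-in, and the class SortingXMLGenerator and function build_feed are hypothetical names. Like the original benchmark, it builds the same small Atom-like feed many times because a single run is too short to time reliably.

```python
import io
import timeit
from collections import OrderedDict
from xml.sax.saxutils import XMLGenerator


class SortingXMLGenerator(XMLGenerator):
    """Hypothetical generator that always emits attributes in sorted order."""

    def startElement(self, name, attrs):
        # Sort every attribute dict so the output is byte-for-byte reproducible.
        super().startElement(name, OrderedDict(sorted(attrs.items())))


def build_feed(generator_cls, n_entries=100):
    """Write a small Atom-like document and return it as a string."""
    out = io.StringIO()
    gen = generator_cls(out, "utf-8")
    gen.startDocument()
    gen.startElement("feed", {"xmlns": "http://www.w3.org/2005/Atom"})
    for i in range(n_entries):
        gen.startElement("entry", {})
        gen.startElement("link", {"rel": "alternate", "href": f"https://example.com/{i}"})
        gen.endElement("link")
        gen.startElement("title", {})
        gen.characters(f"Entry {i}")
        gen.endElement("title")
        gen.endElement("entry")
    gen.endElement("feed")
    gen.endDocument()
    return out.getvalue()


if __name__ == "__main__":
    runs = 200
    for cls in (XMLGenerator, SortingXMLGenerator):
        # Generate the same feed repeatedly; report the mean time per feed.
        total = timeit.timeit(lambda: build_feed(cls), number=runs)
        print(f"{cls.__name__}: {total / runs * 1000:.3f} ms per feed")
```

Absolute numbers will of course depend on the machine and Python version.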
It pays off to sort and create an OrderedDict only if there are
attributes, e.g.:

    import collections

    def startElement(self, name, attrs):
        # Skip the sort and the OrderedDict allocation for attribute-less elements.
        xs = collections.OrderedDict(sorted(attrs.items())) if attrs else attrs
        super().startElement(name, xs)
That version yields an ~ 18 % runtime increase.
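To illustrate what the conditional version buys, here is a small sketch. Again, the stdlib XMLGenerator stands in for Django's generator, and ConditionalSortingXMLGenerator and render are hypothetical names. It checks that two attribute dicts with different insertion orders serialize to identical text, while an empty dict skips the sort entirely.

```python
import io
from collections import OrderedDict
from xml.sax.saxutils import XMLGenerator


class ConditionalSortingXMLGenerator(XMLGenerator):
    """Sketch of the conditional variant: only sort when attributes exist."""

    def startElement(self, name, attrs):
        # Most feed elements (title, entry, ...) carry no attributes, so
        # they pass through without sorting or allocating an OrderedDict.
        xs = OrderedDict(sorted(attrs.items())) if attrs else attrs
        super().startElement(name, xs)


def render(attrs):
    """Serialize a single <link> element with the given attributes."""
    out = io.StringIO()
    gen = ConditionalSortingXMLGenerator(out, "utf-8")
    gen.startElement("link", attrs)
    gen.endElement("link")
    return out.getvalue()


a = render({"rel": "self", "href": "https://example.com/feed/"})
b = render({"href": "https://example.com/feed/", "rel": "self"})
print(a == b)  # True: attribute order no longer depends on insertion order
print(a)       # <link href="https://example.com/feed/" rel="self"></link>
```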
We can get down to a ~ 7 % runtime increase if we replace, in
feedgenerator.py, the attribute dictionaries with a simple wrapper around a
list that provides the items() method, e.g.:

    class Atts:
        """Minimal attribute container: a list of (name, value) pairs."""

        def __init__(self, items):
            self.xs = items

        def items(self):
            return self.xs
And then add the elements with attributes like this:

    handler.addQuickElement("atom:link", None,
                            Atts([("rel", "self"), ("href", self.feed['feed_url'])]))
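The wrapper works because the stdlib XMLGenerator.startElement only iterates attrs.items(), so the list order becomes the output order. The sketch below assumes the generator passes attrs through unchanged (no sorting step); feed_url is a hypothetical placeholder for self.feed['feed_url'].

```python
import io
from xml.sax.saxutils import XMLGenerator


class Atts:
    """List-backed stand-in for an attribute dict (only items() is provided)."""

    def __init__(self, items):
        self.xs = items

    def items(self):
        # XMLGenerator.startElement only calls attrs.items(), so a plain list
        # of (name, value) pairs suffices; its order is the output order.
        return self.xs


out = io.StringIO()
gen = XMLGenerator(out, "utf-8")
feed_url = "https://example.com/feed/"  # hypothetical value for self.feed['feed_url']
gen.startElement("atom:link", Atts([("rel", "self"), ("href", feed_url)]))
gen.endElement("atom:link")
print(out.getvalue())  # <atom:link rel="self" href="https://example.com/feed/"></atom:link>
```

The trade-off is that every call site in feedgenerator.py has to spell out its attributes in a fixed order by hand.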
But IMHO, this would be a premature optimization that just
obfuscates the feedgenerator code.
Even the conditional sorting is hardly necessary outside of
microbenchmarking scenarios.
Best regards
Georg
--
'One must not put a loaded rifle on the stage if no one is
thinking of firing it.' ( Anton Chekhov, 1889 )