I did some performance testing (and profiling) to figure out the best
strategy for loading multiple entities by key name. I thought it would
be nice to share the results, so here they are...
I've put an excerpt of the source used for the test at the following
place:
http://nopaste.info/31b3652941_nl.html
I have compared the following strategies:
- multiple calls to db.get_by_key_name() or datastore.Get()
(respectively _loadDocumentsAsModel and _loadDocumentsAsRaw below)
- single calls to db.get_by_key_name() or datastore.Get() passing
multiple key_names (respectively _loadDocumentsAsModelBatch and
_loadDocumentsAsRawBatch below)
- a modified version of datastore.Get() using my own Entity._FromPB
(_loadDocumentsAsCustomBatch below)
Note that db.get_by_key_name() uses datastore.Get() in its
implementation.
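Since the App Engine SDK isn't available here, the difference between the per-key and batch calling patterns can be sketched in plain Python. Everything below (`fake_get`, `DATA`, `RPC_CALLS`) is a hypothetical stand-in for a datastore round trip, not the actual API; the point is only that the one-by-one pattern issues one RPC per key while the batch pattern issues a single RPC for the whole list.

```python
# Hypothetical stand-ins: DATA plays the datastore, fake_get a batch
# Get RPC, RPC_CALLS counts simulated round trips.
DATA = {"doc%d" % i: {"title": "t%d" % i} for i in range(5)}
RPC_CALLS = 0

def fake_get(key_names):
    """Simulated batch Get: one 'RPC' regardless of how many keys."""
    global RPC_CALLS
    RPC_CALLS += 1
    return [DATA.get(k) for k in key_names]

def load_one_by_one(key_names):
    # Pattern of _loadDocumentsAsModel / _loadDocumentsAsRaw:
    # one RPC per key name.
    return [fake_get([k])[0] for k in key_names]

def load_batch(key_names):
    # Pattern of _loadDocumentsAsModelBatch / _loadDocumentsAsRawBatch:
    # a single RPC carrying every key name.
    return fake_get(list(key_names))

keys = sorted(DATA)
one_by_one = load_one_by_one(keys)  # 5 simulated RPCs
batched = load_batch(keys)          # 1 simulated RPC
assert one_by_one == batched
print(RPC_CALLS)  # prints 6
```

The per-call overhead is what the batch strategies amortize, which matches the timing gap between the first two rows and the batch rows in the results below.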
The model has about 13 properties, among them three string properties
that are usually about 30 characters long and a text property about
200 characters long.
Results:
Load strategy \ number of entities:  300    1000   2000
_loadDocumentsAsModel                2.53s  -      -
_loadDocumentsAsRaw                  2.48s  -      -
_loadDocumentsAsModelBatch           0.73s  2.60s  -
_loadDocumentsAsRawBatch             0.70s  2.47s  -
_loadDocumentsAsCustomBatch          0.42s  1.48s  2.97s
Notes:
- Missing entries are due to exceeding the request deadline
(processing time > 3s).
Conclusion:
- Batch gets, where you pass the whole list of key names, are a must.
- datastore.Get() gives only a small ~5% advantage over
db.get_by_key_name().
- _loadDocumentsAsCustomBatch, using the modified datastore.Get() I
dubbed get_as_pb(), seems worth the pain when a significant number of
entities must be retrieved. It provides a good 30% reduction in CPU
usage compared to datastore.Get(), as it replaces the costly
Entity._FromPB() mapping with a more efficient (and specific) one.
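The custom mapping itself lives in the pasted source, but the idea can be sketched in plain Python. All names here (`raw_records`, `generic_decode`, `specific_decode`) are hypothetical illustrations, not the SDK's API: the generic path walks and materializes every property of each record, while a purpose-built decoder pulls out only the fields the handler actually needs.

```python
# `raw_records` is a hypothetical stand-in for decoded protocol-buffer
# payloads returned by the datastore.
raw_records = [
    {"title": "doc-a", "body": "x" * 200, "author": "alice",
     "rev": 3, "tags": ["a", "b"]},
    {"title": "doc-b", "body": "y" * 200, "author": "bob",
     "rev": 1, "tags": ["c"]},
]

def generic_decode(rec):
    """Generic mapping (the Entity._FromPB role): builds a full
    entity, touching every property of the record."""
    entity = {}
    for name, value in rec.items():
        entity[name] = value  # real code would also dispatch on type
    return entity

def specific_decode(rec):
    """Specific mapping: extracts only the two fields this view
    needs, skipping the rest of the record entirely."""
    return (rec["title"], rec["author"])

docs = [specific_decode(r) for r in raw_records]
print(docs)  # prints [('doc-a', 'alice'), ('doc-b', 'bob')]
```

Skipping the generic property-by-property hydration for a task-specific projection is what (in this interpretation) accounts for the CPU savings reported above.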