Hi Andrew,
> I think it's necessary to do some profiling before trying to optimize.
Yep, totally agree. :)
> The simplest way I've found to profile "smt" calls is to apply the following
> diff to the "bin/smt" script:
>
> [...]
Thanks for that recipe! After sending my email I actually did almost
precisely that. As expected, the costly function turned out to be
DjangoRecordStore.save() in recordstore/django_store/__init__.py (line
172). I also line-profiled that function, and it turns out that 80% of
the time is spent on line 193, inside the following for loop:
    for key in record.output_data:
        db_record.output_data.add(self._get_db_obj('DataKey', key))
To make sure that self._get_db_obj() doesn't add much to the cost, I
also temporarily moved that call onto a line of its own, but it turns
out that nearly all the time is indeed spent in the call to
'db_record.output_data.add()'.
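A minimal sketch of that kind of separation is below. It times the
lookup and the add() step separately inside the loop. FakeStore and
FakeRelation are hypothetical stand-ins so the snippet runs on its own;
they are not the real Sumatra or Django objects, and the actual
profiling was done with a line profiler rather than manual timers.

```python
import time
from collections import defaultdict


class FakeRelation:
    """Hypothetical stand-in for db_record.output_data."""

    def __init__(self):
        self.items = []

    def add(self, obj):
        self.items.append(obj)


class FakeStore:
    """Hypothetical stand-in for the record store; only mimics _get_db_obj()."""

    def _get_db_obj(self, db_class, key):
        return (db_class, key)


def save_output_data(store, relation, keys):
    """Run the loop with the lookup and the add() call timed separately."""
    timings = defaultdict(float)
    for key in keys:
        t0 = time.perf_counter()
        db_obj = store._get_db_obj('DataKey', key)  # step 1: lookup only
        t1 = time.perf_counter()
        relation.add(db_obj)                        # step 2: add only
        t2 = time.perf_counter()
        timings['get_db_obj'] += t1 - t0
        timings['add'] += t2 - t1
    return dict(timings)


relation = FakeRelation()
timings = save_output_data(FakeStore(), relation, ['a.dat', 'b.dat', 'c.dat'])
```

With the real objects substituted in, comparing timings['get_db_obj']
against timings['add'] shows which of the two steps dominates.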
I hope this helps. Any ideas how to optimise it? I think the fact that
writing records to the database takes so long (if there are many
output files) is actually the main reason why I also ran into some of
the concurrency issues with locked databases that were discussed on
the mailing list a while ago (see also this blog post: [1]). Btw, has
Daniel had a chance to submit a patch for postgres support yet? Or are
there any other ways of circumventing the concurrency issues that
could be integrated into Sumatra quickly?
Cheers,
Max
[1] http://wd15.github.io/2013/04/08/configuring-sumatra-for-postgres/