Performance with pynag.Model and filtering


Tommi

Aug 14, 2013, 9:25:46 AM
to pynag-...@googlegroups.com
Hi

I've been working on getting better performance out of filtering using pynag.Model and decided to share my findings.

I created a test case that compares three different ways of running many filter queries:
  1. Running Model.Service.objects.filter(host_name=..., service_description=...)
  2. Running Utils.grep(Model.Service.objects.all, host_name=..., service_description=...)
  3. Assigning services = Model.Service.objects.all once, outside the loop, and then running Utils.grep(services, host_name=..., service_description=...)
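A minimal, self-contained sketch of the difference between methods 2 and 3. It does not use pynag: plain dicts stand in for service objects, and FakeObjects, grep, method_2 and method_3 are hypothetical stand-ins (only exact matching is modeled). Both methods return the same results; the only difference is how often objects.all is evaluated.

```python
# Hypothetical stand-ins, not the pynag API: plain dicts play the role of
# Model.Service objects, and FakeObjects plays the role of Model.Service.objects.
SERVICES = [
    {"host_name": "web1", "service_description": "HTTP"},
    {"host_name": "web1", "service_description": "SSH"},
    {"host_name": "db1", "service_description": "MySQL"},
]


class FakeObjects:
    def __init__(self, services):
        self._services = services
        self.loads = 0  # how many times .all has been evaluated

    @property
    def all(self):
        # Attribute-style access, as in Model.Service.objects.all. In pynag each
        # access may also check the configuration files for changes.
        self.loads += 1
        return list(self._services)


def grep(objects, **criteria):
    """Simplified stand-in for pynag.Utils.grep: exact matches only."""
    return [o for o in objects if all(o.get(k) == v for k, v in criteria.items())]


def method_2(objects, targets):
    # objects.all is re-evaluated on every iteration of the loop
    return [grep(objects.all, host_name=h, service_description=s) for h, s in targets]


def method_3(objects, targets):
    services = objects.all  # evaluated once, outside the loop
    return [grep(services, host_name=h, service_description=s) for h, s in targets]
```

With 3028 targets, method_2 would evaluate objects.all 3028 times and method_3 only once.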

My test loops through all service objects that have both host_name and service_description set and looks each one up with each of the above methods. The test suite ran on a setup with 3028 service objects and gave the following results:

# python Modelperf.py
Testing object filtering
  3028 365.00 seconds
Testing grep objects.all from within loop filtering
  3028 185.46 seconds
Testing grep with objects.all outside loop filtering
  3028 36.11 seconds

As you can see, the last method was about 10 times faster than Model.Service.objects.filter().
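A quick arithmetic check of the reported timings, per lookup and relative to filter(), using only the numbers from the run above:

```python
# Timings from the Modelperf.py run above (seconds for 3028 lookups each).
LOOKUPS = 3028
timings = {
    "objects.filter()": 365.00,
    "grep(objects.all) inside loop": 185.46,
    "grep(services), objects.all outside loop": 36.11,
}

baseline = timings["objects.filter()"]
# Average cost of a single lookup, in milliseconds
per_lookup_ms = {name: s / LOOKUPS * 1000 for name, s in timings.items()}
# How many times faster than objects.filter() each method is
speedups = {name: baseline / s for name, s in timings.items()}

for name in timings:
    print(f"{name:42} {per_lookup_ms[name]:6.1f} ms/lookup  {speedups[name]:5.1f}x")
```

That is roughly 120 ms, 61 ms and 12 ms per lookup, i.e. about 2x and 10x faster than filter().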

You can download the source of Modelperf.py at https://gist.github.com/tomas-edwardsson/6230958

---

Tommi

Páll Valmundsson

Aug 15, 2013, 9:56:05 AM
to pynag-...@googlegroups.com
Hi.

I spent some time digging into this. My findings:
* filter is slower than grep. That was already known; you should not use filter.
* The time difference between the preloaded service list and the one regenerated on every loop iteration comes from pynag checking the filesystem for changes to the configuration files. I saw nothing else.
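To illustrate the second point, here is a hypothetical cache that stats every configuration file whenever the object list is requested. ConfigCache and its attributes are made-up names, and this is an assumption about the general shape of the cost, not pynag's actual code: even when nothing has changed and no reparse happens, every access still pays one stat() per file.

```python
import os
import tempfile


class ConfigCache:
    """Hypothetical sketch (not pynag's implementation) of a parser cache that
    checks config-file modification times on every access."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.stat_calls = 0  # filesystem checks performed so far
        self.parses = 0      # full reparses performed so far
        self._mtimes = None
        self._objects = []

    def _current_mtimes(self):
        mtimes = {}
        for path in self.paths:
            self.stat_calls += 1
            mtimes[path] = os.stat(path).st_mtime
        return mtimes

    @property
    def objects(self):
        mtimes = self._current_mtimes()  # one stat() per file, on every access
        if mtimes != self._mtimes:       # reparse only if something changed
            self._mtimes = mtimes
            self._objects = self._parse()
        return self._objects

    def _parse(self):
        # Toy "parser": every non-empty line is one object
        self.parses += 1
        objects = []
        for path in self.paths:
            with open(path) as fh:
                objects.extend(line.strip() for line in fh if line.strip())
        return objects
```

Reading the list once before the loop reduces this to a single round of stat() calls, however many lookups follow.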

Excluding the filesystem checks, most of the time is spent in the actual filtering. Because grep can return many results, every search has to scan the whole list, and that is simply expensive with a relatively large number of services (my test case was 400 hosts with 10 services each). A lot of time is spent in Model.ObjectDefinition.get, and therefore in Model.ObjectDefinition.__getitem__, because of the lambda used in grep. That is only due to the sheer number of calls, though: my 4000 services called get/__getitem__ about 33 million times. While writing this I realized that .get is called at least once more than necessary, so I wrote a patch. The tests pass, hope they're not faulty :) My rudimentary testing shows a ~15% drop in execution time. Interestingly, the old filter methods only call .get once per iteration :)
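The patch itself isn't shown in the thread, so the following is only a hypothetical illustration of how one redundant lookup inside grep's per-object predicate multiplies the number of .get calls. CountingObject, match_twice, match_once, grep and run are made-up names; CountingObject is a dict subclass standing in for Model.ObjectDefinition.

```python
class CountingObject(dict):
    """Hypothetical stand-in for Model.ObjectDefinition that counts .get() calls."""

    calls = 0

    def get(self, key, default=None):
        type(self).calls += 1
        return super().get(key, default)


def match_twice(obj, key, value):
    # Redundant: one lookup to test for presence, a second one to compare.
    return obj.get(key) is not None and obj.get(key) == value


def match_once(obj, key, value):
    # Look the attribute up once and reuse the result.
    attr = obj.get(key)
    return attr is not None and attr == value


def grep(objects, matcher, **criteria):
    # Every search scans the whole list, since any number of objects may match.
    return [o for o in objects if all(matcher(o, k, v) for k, v in criteria.items())]


def run(matcher, objects, queries):
    """Run one grep per query and return (results, number of .get calls)."""
    CountingObject.calls = 0
    results = [grep(objects, matcher, host_name=h, service_description=s)
               for h, s in queries]
    return results, CountingObject.calls
```

With 10 objects and one query per object, match_once makes 110 .get calls and match_twice 220, with identical results.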

Páll Sigurðsson

Aug 19, 2013, 6:01:28 AM
to pynag-...@googlegroups.com
I would say the performance issue in item 2 is a bug.

It's on the roadmap to throw away the implementation inside filter and make it use grep() internally.

There are a few quirks to consider to maintain full backwards compatibility, but otherwise I think the change is trivial.
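A minimal sketch of what that roadmap item could look like: filter() becomes a thin wrapper around grep() over the full object list. FetcherSketch and grep are hypothetical names, only exact matching is modeled, and none of the backwards-compatibility quirks are handled.

```python
def grep(objects, **criteria):
    """Simplified stand-in for pynag.Utils.grep: exact matches only."""
    return [o for o in objects if all(o.get(k) == v for k, v in criteria.items())]


class FetcherSketch:
    """Hypothetical objects manager; not pynag's actual implementation."""

    def __init__(self, objects):
        self._objects = list(objects)

    @property
    def all(self):
        return list(self._objects)

    def filter(self, **criteria):
        # A single code path: filter() is simply grep() over .all, so any
        # speedup made to grep() benefits filter() callers as well.
        return grep(self.all, **criteria)
```

Keeping filter() as a wrapper means existing callers keep working while sharing the faster grep() code path.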

The benchmarks are awesome; we should run them for every release that changes parsing behavior :)


--
You received this message because you are subscribed to the Google Groups "pynag-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pynag-discus...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
