On Thu, May 10, 2012 at 03:46:48AM -0400, Andrew Cooke wrote:
> the constructor should be *outside* the loop.
> i think you're right - lepl will cache data to improve speed on repeated
> parses and that should be disabled for this library (that's also why repeating
> a previous matcher consumes no more memory).
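To make the caching explanation concrete, here is a toy sketch of the effect. It is not LEPL's actual code: the MemoizingValidator class, its memoize flag, and looks_like_http_url are hypothetical stand-ins, and LEPL's real memoization lives inside its parser machinery. The sketch only assumes that cached results are kept per distinct input, which would explain why distinct URLs grow memory while a repeated URL does not.

```python
class MemoizingValidator(object):
    """Toy validator that remembers the result for every input it has seen."""

    def __init__(self, check, memoize=True):
        self._check = check        # the real (expensive) validation function
        self._memoize = memoize
        self._cache = {}           # input string -> cached result

    def __call__(self, text):
        if not self._memoize:
            return self._check(text)
        if text not in self._cache:
            self._cache[text] = self._check(text)
        return self._cache[text]

    def cache_size(self):
        return len(self._cache)


def looks_like_http_url(text):
    # Stand-in check, far weaker than the real rfc3696 grammar.
    return text.startswith("http://") and len(text) > len("http://")


cached = MemoizingValidator(looks_like_http_url)
for i in range(1000):
    cached("http://example.org/%d" % i)   # 1000 distinct inputs
distinct_entries = cached.cache_size()    # one entry per distinct URL

repeated = MemoizingValidator(looks_like_http_url)
for i in range(1000):
    repeated("http://example.org/0")      # same input every time
repeated_entries = repeated.cache_size()  # a single entry

uncached = MemoizingValidator(looks_like_http_url, memoize=False)
for i in range(1000):
    uncached("http://example.org/%d" % i)
uncached_entries = uncached.cache_size()  # nothing retained
```

With the cache on, the number of retained entries follows the number of distinct inputs. Turning it off trades repeated-parse speed for flat memory use, which is the trade-off the no_memoize() change is meant to make.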
> i'll test and do a new release this weekend, hopefully, but if you want to fix
> things yourself and can access source, modify
>     matcher.config.compile_to_re()
> to be
>     matcher.config.compile_to_re().no_memoize()
> in _matcher_to_validator in lepl.apps.rfc3696
> sorry about that + thanks for the report,
> andrew
> On Wed, May 09, 2012 at 10:53:20PM -0700, Osma Suominen wrote:
> > Hi all
> > I'd like to use the LEPL rfc3696 module for URL/URI validation. But
> > when I added validation into my application (which processes RDF data
> > with many URLs, some of them broken), its memory usage jumped through
> > the roof. It seems to me that LEPL leaks a significant amount of
> > memory when validating URLs.
> > This simple test script, which validates 10000 generated URLs, uses
> > about 500 MB of memory on my system (Ubuntu 12.04 amd64, Python 2.7.3,
> > LEPL 5.1.1 installed via easy_install):
> > #!/usr/bin/env python
> > from lepl.apps.rfc3696 import HttpUrl
> > URLS = 10000
> > print "validating %d URLs" % URLS
> > validator = HttpUrl()
> > for i in xrange(URLS):
> >     url = "http://example.org/%d" % i
> >     validator(url)
> > print "done, press enter"
> > raw_input()
> > If I change the script to validate the same URL over and over, memory
> > usage goes back to normal. So maybe LEPL is storing (fragments of?)
> > the URLs somewhere. In this case I'm only interested in the validation
> > result (True/False), though. I would expect GC to reclaim any memory
> > after validation.
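For anyone who wants to put numbers on this, below is a minimal sketch for recording how far a workload pushes the process's peak memory. It assumes a Unix-like system (the standard resource module is not available on Windows). The helpers peak_rss and measure, and the placeholder example_workload, are names made up for this illustration. The validation loop from the test script would go in place of the placeholder.

```python
import resource


def peak_rss():
    """Peak resident set size of this process so far.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure(workload):
    """Run workload() and return (peak before, peak after)."""
    before = peak_rss()
    workload()
    after = peak_rss()
    return before, after


def example_workload():
    # Placeholder: swap in the validation loop from the test script.
    junk = ["http://example.org/%d" % i for i in range(10000)]
    return len(junk)


before, after = measure(example_workload)
growth = after - before   # never negative: ru_maxrss is a high-water mark
```

Because ru_maxrss only records the high-water mark, this shows how much the peak rose, not whether memory was later released. Running the same workload a second time and checking whether the peak keeps climbing is one rough way to tell retained data from a one-off spike.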
> > I also tried moving the HttpUrl constructor inside the loop. The code
> > became a lot slower, taking minutes instead of seconds to run, but
> > memory usage is still high - in fact, even higher than in the first
> > run (I killed it at 5 minutes and more than 700 MB memory).
> > Am I doing something wrong?
> > Thanks,
> > Osma Suominen