rfc3696 URL validation memory leak?
From: Osma Suominen <osma.suomi...@aalto.fi>
Date: Wed, 9 May 2012 22:53:20 -0700 (PDT)
Local: Thurs, May 10 2012 1:53 am
Subject: rfc3696 URL validation memory leak?
Hi all

I'd like to use the LEPL rfc3696 module for URL/URI validation. But
when I added validation into my application (which processes RDF data
with many URLs, some of them broken), its memory usage jumped through
the roof. It seems to me that LEPL leaks a significant amount of
memory when validating URLs.

This simple test script, which validates 10000 generated URLs, uses
about 500 MB of memory on my system (Ubuntu 12.04 amd64, Python 2.7.3,
LEPL 5.1.1 installed via easy_install):

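#!/usr/bin/env python
# Python 2 script: validate many distinct URLs with one LEPL validator.
from lepl.apps.rfc3696 import HttpUrl

URLS = 10000
print "validating %d URLs" % URLS
validator = HttpUrl()  # build the validator once, outside the loop
for i in xrange(URLS):
    url = "http://example.org/%d" % i  # a different URL on every iteration
    validator(url)  # only the True/False result matters; it is discarded here
print "done, press enter"
raw_input()  # keep the process alive so its memory usage can be inspected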

If I change the script to validate the same URL over and over, memory
usage goes back to normal. So maybe LEPL is storing (fragments of?)
the URLs somewhere. In this case I'm only interested in the validation
result (True/False), though. I would expect GC to reclaim any memory
after validation.

I also tried moving the HttpUrl constructor inside the loop. The code
became a lot slower, taking minutes instead of seconds to run, and
memory usage was still high - in fact, even higher than in the first
run (I killed it at 5 minutes and more than 700 MB of memory).

Am I doing something wrong?

Thanks,
Osma Suominen


 
From: andrew cooke <and...@acooke.org>
Date: Thu, 10 May 2012 03:46:48 -0400
Local: Thurs, May 10 2012 3:46 am
Subject: Re: [LEPL] rfc3696 URL validation memory leak?

the constructor should be *outside* the loop.

i think you're right - lepl caches (memoizes) data to improve speed on
repeated parses, and that should be disabled for this library (that's also why
repeating a previous match consumes no more memory).

i'll test and do a new release this weekend, hopefully, but if you want to fix
things yourself and can access source, modify

      matcher.config.compile_to_re()

to be

      matcher.config.compile_to_re().no_memoize()

in _matcher_to_validator in lepl.apps.rfc3696
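
The effect of switching memoization off can be illustrated with a toy
validator. This is only a sketch of the general mechanism, not LEPL's
implementation: the MemoizingValidator class, its memoize flag and the
placeholder rule in _check are all hypothetical.

```python
class MemoizingValidator(object):
    """Toy validator with an optional per-input result cache (NOT LEPL)."""

    def __init__(self, memoize=True):
        self.memoize = memoize
        self.cache = {}  # maps each input string to its cached result

    def _check(self, url):
        # Placeholder rule standing in for a real URL grammar.
        return url.startswith("http://") and " " not in url

    def __call__(self, url):
        if not self.memoize:
            return self._check(url)  # nothing is stored between calls
        if url not in self.cache:
            self.cache[url] = self._check(url)  # grows with each new input
        return self.cache[url]


def cache_size_after(validator, urls):
    """Validate every URL and report how many results the validator kept."""
    for url in urls:
        validator(url)
    return len(validator.cache)


distinct = ["http://example.org/%d" % i for i in range(10000)]
repeated = ["http://example.org/0"] * 10000

# Distinct inputs: one cache entry per URL, so memory grows with the input.
size_distinct = cache_size_after(MemoizingValidator(), distinct)
# Repeated input: a single entry, so memory stays flat.
size_repeated = cache_size_after(MemoizingValidator(), repeated)
# Memoization off: nothing is retained, whatever the input.
size_disabled = cache_size_after(MemoizingValidator(memoize=False), distinct)
```

With memoization on, the cache holds one entry per distinct URL, which
matches the pattern in the report: many different URLs grow memory, the same
URL repeated does not.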

sorry about that + thanks for the report,
andrew


 
From: andrew cooke <and...@acooke.org>
Date: Sun, 13 May 2012 16:50:30 -0400
Local: Sun, May 13 2012 4:50 pm
Subject: Re: [LEPL] rfc3696 URL validation memory leak?

OK, there's a new release (5.1.2) that disables memoization, making the RFC3696
package much more suitable for long-running processes.

Andrew


 