Re: Regex bug?


Jordan Ritter Jul 13, 2011 12:54 PM
Posted in group: Ruby Enterprise Edition
A friend of mine (Brendan Baldwin) tried to post this in response, but it didn't get through for some reason, so I'm forwarding it for him:


Your Regexp is this: /\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/ 

The problem occurs because you have a variable-length pattern, [^%]+, inside another variable-length pattern, (?: ... )*.
When the pattern fails to match the URL because of a % character that isn't followed by two hex digits, the engine has to backtrack and try another permutation of variable-length matches, and it will keep trying every possible permutation until it either exhausts them all or arrives at a match.
Fortunately, in this case, the solution is quite simple. Remove the + from the [^%] pattern and the engine won't need to backtrack and cycle through all possible match permutations.

/\A(?:%[0-9a-fA-F]{2}|[^%])*\z/ 
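
To make the "permutations" point concrete, here is a rough sketch (not taken from Rack). It assumes a plain backtracking engine, counts how many ways a run of non-% characters can be split up by the nested pattern, and checks that both patterns accept and reject the same short sample strings. The names SLOW, FAST and split_count are made up for illustration, and the samples are kept short on purpose so the original pattern cannot hang.

```ruby
# Original pattern: [^%]+ nested inside (?: ... )* lets the engine split a
# run of non-% characters into chunks in many different ways.
SLOW = /\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/

# Suggested fix: one character per iteration, so each run has only one split.
FAST = /\A(?:%[0-9a-fA-F]{2}|[^%])*\z/

# Number of ways to cut a run of n characters (n >= 1) into non-empty chunks:
# each of the n - 1 gaps between characters is either a cut or not.
# (split_count is a hypothetical helper, not part of Ruby or Rack.)
def split_count(n)
  2**(n - 1)
end

# Short strings only; the last four contain a % that is not followed by two
# hex digits, so neither pattern should match them.
samples = ["", "abc", "a%20b", "%41%42", "100%25", "abc%", "%zz", "a%2", "%"]

samples.each do |s|
  slow = !(s =~ SLOW).nil?
  fast = !(s =~ FAST).nil?
  raise "patterns disagree on #{s.inspect}" unless slow == fast
end

puts "a 14-character run can be split #{split_count(14)} ways"
```

When a string has several such runs, the counts multiply, which would be consistent with a URL of modest length keeping the engine busy for a very long time.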

--Brendan

This is one of a set of long-standing crappy shortcomings of Ruby's regex engines (see previous post), and I would love to see "solving" them rise in priority. But refactoring problematic regexes has generally been enough to avoid the broader issues with the engines. Seems like the fix should get sent along to the Rack folks, if they haven't figured it out already.

cheers,
--jordan


On Jul 12, 2011, at 4:17 PM, John Nunemaker wrote:

A week back I innocently updated Sinatra on an app, which in turn updated Rack. Since then, I've had crazy hung Passenger processes that just gobble up CPU like it is going out of style.

After spending a few days trying everything I knew to fix it, today I got help from a friend (Eric Lindvall) and dug in with strace, rbtrace, gdb, and gdb.rb and found the issue. Rack 1.3.0 tests a regex against a URL and it causes things to hang.

== Version of REE:

ruby 1.8.7 (2011-02-18 patchlevel 334) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2011.02

== Steps to reproduce:

$ irb
ree-1.8.7-2010.02 :001 > str = "http%3A%2F%2Fwww.google.com%2Furl%3Fsa%3Dt%26source%3Dweb%26cd%3D1%26sqi%3D2%26ved%3D0CCkQFjAA%26url%3Dhttp%253A%252F%252Fnd.edu%252F%26rct%3Dj%"
ree-1.8.7-2010.02 :002 > str =~ /\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/
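
A scaled-down variant of the steps above, as a sketch only: it times the same regex against short strings that end in a lone "%", so it should finish quickly rather than hang. The helper name time_failed_match and the run lengths are arbitrary choices for illustration. No timings are shown because they depend on the Ruby version; newer Rubies include regex optimizations that may keep the numbers flat.

```ruby
require 'benchmark'

# The pattern from the report above.
PATTERN = /\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/

# Time one failed match: a run of `run_length` letters followed by a lone
# trailing "%", which is what makes the overall match fail.
# (time_failed_match is a hypothetical helper name.)
def time_failed_match(run_length)
  str = ("a" * run_length) + "%"
  Benchmark.realtime { str =~ PATTERN }
end

# Keep the runs short: on an engine that backtracks through every split,
# each extra character can roughly double the work.
[8, 12, 16].each do |n|
  printf("run of %2d chars: %.6f s\n", n, time_failed_match(n))
end
```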

The regex is used in Rack 1.3.0: https://github.com/rack/rack/blob/1.3.0/lib/rack/backports/uri/common.rb#L61 and when certain URLs hit it, Rack freezes, the Passenger process freezes, and CPU climbs until it is maxed out or you kill it.

I am in no way smart enough to know why it hangs, or how to fix it, but, man, did it kill my last 3 days tracking this down. Hope this helps. If I need to post this somewhere else or any more information is needed, just let me know.

Below are links to some random gists and pasties with gdb stuff that may or may not help:



