[Boost-users] [Regex] Major performance difference between Boost.Regex and Linux regex library


Kieran O'Donohoe

Apr 4, 2010, 7:58:48 AM
to boost...@lists.boost.org
Hi,
I am just starting to use Boost.Regex, porting code from the Linux system regex library, and initial testing of my first change shows a major performance hit: Boost.Regex is about 30 times slower than the system regex library.
 
The Boost.Regex documentation seems to imply that its performance compares favourably with existing libraries, so I expect I am doing something wrong, but I can't see what.
 
My regular expressions are short plain strings such as "Authentication-Info"; they contain no regex syntax.
 
The string passed to regex_search() (I also tried regex_match()) may match the regex exactly, or it may differ in case or be surrounded by white space.
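Because these patterns contain no regex syntax, regex_search() with icase on such a pattern amounts to a case-insensitive substring search. The following is a minimal sketch of that check with no regex engine at all, assuming ASCII input; containsIgnoreCase is a hypothetical helper, not part of Boost.Regex, and is shown only to illustrate what the search is deciding.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Hypothetical helper (not part of Boost.Regex): true if `needle` occurs
// anywhere in `haystack`, comparing ASCII letters case-insensitively.
// For a pattern with no regex syntax this gives the same yes/no answer as
// regex_search() with the icase flag, assuming ASCII input.
bool containsIgnoreCase(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) {
        return true;  // an empty pattern matches any input
    }
    auto equalIgnoreCase = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(),
                       equalIgnoreCase) != haystack.end();
}
```

For example, containsIgnoreCase(" authentication-info  ", "Authentication-Info") is true. A check like this builds no match_results at all; whether it is an acceptable substitute depends on whether the real patterns will ever need regex features.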
 
Boost.Regex is used as follows:
 
// 44 boost::regex objects are heap allocated and stored in a list;
// each one is also associated with its own handler function.
boost::regex* myRegex = new boost::regex("Authentication-Info", boost::regex::icase | boost::regex::nosubs);
...

/* boost::regex_search() is called in a loop over the list of boost::regex objects above.
   Each test iteration uses 9 different values of name, all of which have a match in the list,
   so boost::regex_search() can be called up to 396 times (9 x 44) per test iteration.
   I ran a 1000-iteration test.
*/
// const char* name could be " authentication-info  ", which should match the regex (and does)
if (boost::regex_search(name, *myRegex, boost::match_nosubs)) {
    // on a match, call the associated function
    // and then exit the loop
}
I have also tried flags other than those listed, all with the same result.
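For readers who want to run the loop described above, here is a self-contained sketch of the same structure. It assumes C++11 std::regex in place of boost::regex (std::regex has no match_nosubs match flag, so only the nosubs syntax option is used), and the names Entry, makeEntry and dispatch are hypothetical. Entries are stored by value rather than as heap pointers.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the list in the post: each compiled pattern is
// paired with its own handler. std::regex is used here instead of boost::regex.
struct Entry {
    std::regex pattern;
    std::function<void()> handler;
};

// Compile a literal header name: ECMAScript grammar, case-insensitive,
// no marked sub-expressions.
Entry makeEntry(const std::string& text, std::function<void()> handler) {
    return Entry{std::regex(text, std::regex::ECMAScript | std::regex::icase |
                                      std::regex::nosubs),
                 std::move(handler)};
}

// Try each pattern in turn; call the first matching handler and stop.
// Returns the index of the matching entry, or -1 if nothing matched.
int dispatch(const char* name, const std::vector<Entry>& entries) {
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (std::regex_search(name, entries[i].pattern)) {
            entries[i].handler();
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Timing dispatch() over the 9 test names for 1000 iterations would reproduce the measurement described in the post.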
 
I ran the test through a profiler and see that match_results and sub_match functions are called a very large number of times (over 9 million), which makes me fairly sure this is where the problem lies. I don't need any match_results/sub_match data; I only need to confirm that the string is present. Obviously a match has to be performed, but approximately 51 match_results constructions per search seems excessive.
 
FYI, here is the profiler entry for boost::regex_search(), called 186,000 times, which is expected:

[5]     94.9    0.00    2.46  186000         bool boost::regex_search<char const*, char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >(char const*, char const*, boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, boost::regex_constants::_match_flags) [5]
 
But this entry, where the boost::match_results constructor is called 9,486,000 times, is unexpected:
 
[9]     35.3    0.18    0.73 9486000         boost::match_results<char const*, std::allocator<boost::sub_match<char const*> > >::match_results(std::allocator<boost::sub_match<char const*> > const&) [9]
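As a cross-check of these profiler figures against the setup described earlier (44 patterns, 9 names per iteration, 1000 iterations), the call counts work out as follows:

```latex
\begin{aligned}
\text{max.\ regex\_search calls per iteration} &= 9 \times 44 = 396 \\
\text{observed regex\_search calls per iteration} &= \frac{186\,000}{1000} = 186 \le 396 \\
\text{match\_results constructions per regex\_search call} &= \frac{9\,486\,000}{186\,000} = 51
\end{aligned}
```

So the figure of about 51 constructions per search is exact for this run.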

The same code runs 30 times faster if I replace new boost::regex(...) with a call to regcomp(&m_preg, "Authentication-Info", REG_EXTENDED | REG_ICASE), where m_preg is wrapped in a heap-allocated UDT, and likewise replace boost::regex_search(...) with a call to regexec() with nmatch set to 0.
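For comparison, the POSIX baseline described above can be sketched as follows. PosixPattern is a hypothetical name for the UDT wrapping m_preg; the regcomp() flags and the nmatch == 0 call to regexec() follow the post.

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a POSIX regex_t, modelled on the
// "UDT wrapping m_preg" described in the post.
class PosixPattern {
public:
    explicit PosixPattern(const char* text) {
        // Same flags as in the post: extended syntax, case-insensitive.
        if (regcomp(&m_preg, text, REG_EXTENDED | REG_ICASE) != 0) {
            // After a failed regcomp() the regex_t must not be passed to regfree().
            throw std::runtime_error(std::string("regcomp failed for: ") + text);
        }
    }
    ~PosixPattern() { regfree(&m_preg); }

    // A compiled regex_t must not be copied.
    PosixPattern(const PosixPattern&) = delete;
    PosixPattern& operator=(const PosixPattern&) = delete;

    // regexec() with nmatch == 0 and no pmatch array: only report whether
    // the pattern occurs somewhere in `str`, with no sub-match positions.
    bool search(const char* str) const {
        return regexec(&m_preg, str, 0, nullptr, 0) == 0;
    }

private:
    regex_t m_preg;
};
```

To mirror the post, instances would be created with new PosixPattern("Authentication-Info") and kept in a list alongside their handlers. POSIX also defines a REG_NOSUB compile flag for callers that only need a yes/no answer; the sketch keeps the flags from the post.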
 
 



Roland Bock

Apr 4, 2010, 2:58:16 PM
to boost...@lists.boost.org
Kieran O'Donohoe wrote:
> Hi,
> I am just starting to use Boost.Regex, porting from the Linux system regex library to Boost.Regex, and after doing initial testing on my first change I see a major performance impact, where Boost.Regex is about 30 times slower than the system regex library.
>
> The Boost.Regex documentation seems to imply that its performance has a positive comparison with existing libraries so I expect that I am doing something wrong but can't see it.
> [lots of details]

Hi,

it would be much easier to understand what you are doing if you could send your test program (compiling code, boiled down to the bare minimum).
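A minimal, compiling test program along these lines might be structured like the sketch below. Because it is self-contained, std::regex stands in for boost::regex (testing Boost.Regex itself would mean swapping in boost::regex and linking against Boost); runBenchmark and BenchResult are hypothetical names, and the patterns are assumed to be plain literals so that ECMAScript and POSIX extended syntax treat them identically.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: hit counts let the two engines be checked against
// each other before their timings are compared.
struct BenchResult {
    long stdHits = 0;
    long posixHits = 0;
    double stdSeconds = 0.0;
    double posixSeconds = 0.0;
};

// Hypothetical benchmark: for every name, try each pattern in order and stop at
// the first match, once with std::regex (standing in for boost::regex) and once
// with POSIX regcomp()/regexec().
BenchResult runBenchmark(const std::vector<std::string>& patterns,
                         const std::vector<std::string>& names,
                         int iterations) {
    using Clock = std::chrono::steady_clock;
    BenchResult r;

    std::vector<std::regex> stdPatterns;
    for (const auto& p : patterns) {
        stdPatterns.emplace_back(p, std::regex::ECMAScript | std::regex::icase |
                                        std::regex::nosubs);
    }

    // Sized up front so the compiled regex_t objects are never moved or copied.
    std::vector<regex_t> posixPatterns(patterns.size());
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        if (regcomp(&posixPatterns[i], patterns[i].c_str(), REG_EXTENDED | REG_ICASE) != 0) {
            for (std::size_t j = 0; j < i; ++j) regfree(&posixPatterns[j]);
            throw std::invalid_argument("regcomp failed: " + patterns[i]);
        }
    }

    auto t0 = Clock::now();
    for (int it = 0; it < iterations; ++it)
        for (const auto& n : names)
            for (const auto& re : stdPatterns)
                if (std::regex_search(n, re)) { ++r.stdHits; break; }

    auto t1 = Clock::now();
    for (int it = 0; it < iterations; ++it)
        for (const auto& n : names)
            for (const auto& re : posixPatterns)
                // nmatch == 0: only ask whether the pattern occurs.
                if (regexec(&re, n.c_str(), 0, nullptr, 0) == 0) { ++r.posixHits; break; }
    auto t2 = Clock::now();

    for (auto& re : posixPatterns) regfree(&re);
    r.stdSeconds = std::chrono::duration<double>(t1 - t0).count();
    r.posixSeconds = std::chrono::duration<double>(t2 - t1).count();
    return r;
}
```

Calling runBenchmark() from main() with the 44 real header names, the 9 test values and 1000 iterations would approximate the original setup; equal hit counts confirm that both engines agree before the timings are compared.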

Regards,

Roland
