core dumped at regex_search()

Jivanmukta

May 13, 2022, 5:55:25 AM
I have a problem with the regex_search function from the standard regex library.
I use the gcc compiler on Ubuntu 20.
I wrote a PHP obfuscator in C++. Now I am testing my program with PHP
source code found on the Internet. For example, the PHP project
shopping_portal_0.1 contains a very long line with string data (108599
characters). When I call regex_reach() with the following regex:
(static|private|protected|public)?\s*(readonly)?\s*([_a-zA-Z0-9]+)?\s*(\$([a-zA-Z_][a-zA-Z0-9_]*)\s*=.*,?\s*)+
I get a core dump at this call.
Can you suggest a workaround or solution for my problem?

Juha Nieminen

May 13, 2022, 6:20:47 AM
Are you sure it's regex_search that's the culprit and it's not just a
symptom of a bug in your own code? (Out-of-bounds accesses and similar
errors can have all kinds of weird effects, where the program doesn't
crash or misbehave at the place of the bug, but somewhere else, which
uses data corrupted by the buggy code.)

You can try running your program with valgrind to see if it detects
such errors. You can also try compiling with the gcc compiler flag
"-fsanitize=address" for a similar functionality. Might also specify
"-D_GLIBXCC_DEBUG" for good measure.

If none of that helps, post a minimal complete example that
reproduces the problem.

Juha Nieminen

May 13, 2022, 6:25:48 AM
Juha Nieminen <nos...@thanks.invalid> wrote:
> Might also specify "-D_GLIBXCC_DEBUG" for good measure.

I mean "-D_GLIBCXX_DEBUG"

Jivanmukta

May 13, 2022, 6:47:35 AM
On 13.05.2022 at 11:54, Jivanmukta wrote:
> When I call regex_reach() with the following regex

I meant regex_search() of course.

Jivanmukta

May 13, 2022, 8:41:19 AM
On 13.05.2022 at 11:54, Jivanmukta wrote:
> I get a core dump at this call.


The problem occurs in the regex.h file, at the line
const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
where I get a segmentation fault.

template<typename _Ch_type>
  struct regex_traits
  {
    ...
    /**
     * @brief Translates a character into a case-insensitive equivalent.
     *
     * @param __c A character to the locale-specific character set.
     *
     * @returns the locale-specific lower-case equivalent of __c.
     * @throws std::bad_cast if the imbued locale does not support the ctype
     *         facet.
     */
    char_type
    translate_nocase(char_type __c) const
    {
      typedef std::ctype<char_type> __ctype_type;
      const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
      return __fctyp.tolower(__c);
    }

James Kuyper

May 14, 2022, 1:22:17 AM
It's not obvious what the actual problem is. If you could provide a
simplified version of your program that demonstrates the failure, that
would be a big help.

Here's a systematic process for simplifying your code:

1. Start with a program that is known to demonstrate the problem you've
run into. Create two copies of your program, the saved copy and the
working copy.
2. From the working copy, choose something to remove. It should be as
big as possible, and something that seems unlikely to be relevant to the
problem you're running into. A good starting point would be to remove
everything that's supposed to happen after the point where a problem
occurred. Each time you reach this step, remember to remove a different
something.
3. Remove that part, and test to see if you still see the problem.
4a. If you do see the problem, remove the saved copy, replacing it with
the working copy.
4b. If you don't see the problem, think long and hard about that fact.
You removed something that wasn't supposed to affect the problem - but
it did. That's a clue about something you didn't understand. Oftentimes,
while following this procedure, you will figure out the problem yourself
by careful consideration of those clues.
If you haven't resolved the problem, delete the working copy.
5. Go back to step 2.

This process ends when you figure out the problem, or when you have no
new ideas as to what to remove.

In that latter case, you've got something suitable for presentation to
this newsgroup. Remember, when posting it, to include the following
information:

1. The full text of the program, cut-and-pasted from the actual sources.
Don't type it in - you don't want us to waste our time investigating
your typos.
2. Precisely how you built the code: what platform, what compiler, which
command line options, etc.
3. What you expected to see. Keep in mind that if your expectations are
wrong, it will be hard for people to realize that if you don't tell them
what your expectations are.
4. Cut-and-paste of the text that demonstrates that something other than
what you expected, happened. That might include error messages, a dump
of the output file, or any of several other things. Don't just say "it
didn't work".


Juha Nieminen

May 14, 2022, 3:18:23 AM
Jivanmukta <jivan...@poczta.onet.pl> wrote:
> On 13.05.2022 at 11:54, Jivanmukta wrote:
>> I get a core dump at this call.
>
> The problem occurs in the regex.h file, at the line
> const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
> where I get a segmentation fault.

Did you even read what I wrote?

Jivanmukta

May 18, 2022, 4:20:03 AM
I failed to reproduce the problem in an isolated test program.

#include <string>
#include <iostream>
#include <fstream>
#include <regex>

using namespace std;

int main() {
    ifstream in("test.txt" /* very big text file */, std::ifstream::in);
    string line;
    // Note: operator>> stops at the first whitespace character, so this reads
    // a single token; getline(in, line) would read the whole line instead.
    in >> line;
    string regexp1 =
        "(static|private|protected|public)?\\s*(readonly)?\\s*([_a-zA-Z0-9]+)?\\s*(\\$([a-zA-Z_][a-zA-Z0-9_]*)\\s*=.*,?\\s*)+";
    regex re1(regexp1, std::regex_constants::icase);
    smatch matches;
    if (regex_search(line, matches, re1) && matches.ready()) {
        cout << "match" << endl;
    } else {
        cout << "no match" << endl;
    }
}

Now I am learning valgrind.

Juha Nieminen

May 18, 2022, 4:53:02 AM
Try first compiling with the command-line parameters

-fsanitize=address -D_GLIBCXX_DEBUG

This is easier because you don't need to learn new things. When you run
the program, if the checking code added by those options detects a problem,
it will tell you where it's happening.

(The advantage of using valgrind is that it will detect even more runtime
errors.)

Jivanmukta

May 18, 2022, 5:59:48 AM
A stack-overflow error occurs in different places in my code (during a
sequence of tests).
The output is a long list of frames of the form
#... 0x... in ...
followed by a SUMMARY.
Tail:

#248 0x59a22d in
std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char
const*, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > >, std::__cxx11::regex_traits<char>,
true>::_M_rep_once_more(std::__detail::_Executor<__gnu_cxx::__normal_iterator<char
const*, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char
const*, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > >, std::__cxx11::regex_traits<char>,
true>::_Match_mode, long) /usr/include/c++/9/bits/regex_executor.tcc:184

SUMMARY: AddressSanitizer: stack-overflow
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:433
in __interceptor_strcmp
==18084==ABORTING

Jivanmukta

May 18, 2022, 6:58:35 AM
On 18.05.2022 at 11:59, Jivanmukta wrote:
The problem does not occur and my program runs fine if I set: ulimit -s
unlimited.
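
Where changing the shell limit is not an option, a similar effect can sometimes be had inside the program by running the search on a thread created with a larger stack. What follows is only a sketch, assuming a POSIX system with pthreads. The names `SearchJob`, `run_search`, and `search_on_big_stack` are invented for this illustration, and the stack size to request is a guess that would need tuning for the actual input.

```cpp
#include <pthread.h>

#include <cstddef>
#include <exception>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper type: carries the inputs and the result across threads.
struct SearchJob {
    const std::string* text;
    const std::regex* re;
    bool found;
    std::exception_ptr error;  // set if regex_search throws
};

// Thread entry point. Exceptions must not escape a thread function,
// so they are captured and rethrown later in the calling thread.
extern "C" void* run_search(void* arg) {
    SearchJob* job = static_cast<SearchJob*>(arg);
    try {
        job->found = std::regex_search(*job->text, *job->re);
    } catch (...) {
        job->error = std::current_exception();
    }
    return nullptr;
}

// Runs regex_search on a separate thread whose stack is stack_bytes large.
bool search_on_big_stack(const std::string& text, const std::regex& re,
                         std::size_t stack_bytes) {
    SearchJob job;
    job.text = &text;
    job.re = &re;
    job.found = false;

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        throw std::runtime_error("pthread_attr_init failed");
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        throw std::runtime_error("pthread_attr_setstacksize failed");
    }

    pthread_t tid;
    const int rc = pthread_create(&tid, &attr, run_search, &job);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        throw std::runtime_error("pthread_create failed");

    pthread_join(tid, nullptr);
    if (job.error)
        std::rethrow_exception(job.error);
    return job.found;
}
```

A call might look like `search_on_big_stack(line, re1, 512u * 1024u * 1024u)`. This only raises the ceiling: a long enough line could still exhaust whatever stack size is chosen.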

Juha Nieminen

May 19, 2022, 11:15:21 AM
Jivanmukta <jivan...@poczta.onet.pl> wrote:
>> SUMMARY: AddressSanitizer: stack-overflow
>> ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:433
>> in __interceptor_strcmp
>> ==18084==ABORTING
>>
> The problem does not occur and my program runs fine if I set: ulimit -s
> unlimited.

Without knowing how regex_search() is internally implemented, I suppose it's
possible it's overflowing the stack if, for some reason, it's a recursive
implementation and the input is so large that it causes a recursion that's
too deep.

I would find that a bit strange, though.
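
The mechanism Juha describes can be illustrated with a toy matcher. The stack trace earlier in the thread ends in `_M_rep_once_more` frames from regex_executor.tcc, which seems consistent with a recursive executor, but the code below is not taken from any library. It is a hypothetical matcher for the fixed pattern `x*y`, with `ToyMatcher` and `depth_for_input_length` invented here, meant only to show how recursion depth can grow in step with input length.

```cpp
#include <cstddef>
#include <string>

// Toy backtracking matcher for the fixed pattern "x*y" (whole-string match).
// Illustration only: each consumed 'x' costs one stack frame.
struct ToyMatcher {
    std::size_t max_depth = 0;  // deepest recursion level reached so far

    bool match(const std::string& s, std::size_t pos = 0,
               std::size_t depth = 0) {
        if (depth > max_depth) max_depth = depth;
        if (pos < s.size() && s[pos] == 'x') {
            // Greedy branch: consume one more 'x' through a recursive call.
            if (match(s, pos + 1, depth + 1)) return true;
        }
        // Fallback branch: the repetition stops here, so the final 'y'
        // must be the last character of the input.
        return pos + 1 == s.size() && s[pos] == 'y';
    }
};

// Builds n copies of 'x' followed by 'y' and reports the recursion depth
// the toy matcher needed to match it.
std::size_t depth_for_input_length(std::size_t n) {
    std::string input(n, 'x');
    input += 'y';
    ToyMatcher m;
    m.match(input);
    return m.max_depth;
}
```

In this toy, the depth equals the number of `x` characters consumed. If a real executor behaves similarly for `.*`, a 108599-character line would need on the order of 100,000 nested frames, which would fit with the overflow disappearing once the stack limit is lifted.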

Tony Oliver

May 20, 2022, 10:53:19 AM
It looks like you're encountering Catastrophic Backtracking, caused by
the spacing requirement between the third and fourth sub-expressions
being optional.

I would suggest replacing (at least) the third instance of \s* with \s+

Jivanmukta

May 22, 2022, 2:14:13 AM
On 20.05.2022 at 16:53, Tony Oliver wrote:
I cannot do that because in that case my regex wouldn't match the text:
public $x;
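
One way to evaluate a change like this is to run both patterns over a few short lines before trying the 108599-character one. The sketch below does that. The names `kOriginal`, `kSuggested`, and `line_matches` are invented here, and `kSuggested` is one reading of the proposal (the third `\s*` replaced with `\s+`). As far as can be told from the pattern text, both versions require an `=` after the variable name, so neither would be expected to match `public $x;`, while the `\s+` variant additionally needs whitespace directly before the `$`.

```cpp
#include <regex>
#include <string>

// Pattern as posted at the top of the thread.
const char* const kOriginal =
    "(static|private|protected|public)?\\s*(readonly)?\\s*([_a-zA-Z0-9]+)?"
    "\\s*(\\$([a-zA-Z_][a-zA-Z0-9_]*)\\s*=.*,?\\s*)+";

// Assumed reading of the suggestion: third \s* replaced with \s+.
const char* const kSuggested =
    "(static|private|protected|public)?\\s*(readonly)?\\s*([_a-zA-Z0-9]+)?"
    "\\s+(\\$([a-zA-Z_][a-zA-Z0-9_]*)\\s*=.*,?\\s*)+";

// Returns true if the pattern is found anywhere in the line, using the same
// case-insensitive flag as the test program posted in the thread.
bool line_matches(const std::string& pattern, const std::string& line) {
    const std::regex re(pattern, std::regex_constants::icase);
    return std::regex_search(line, re);
}
```

Calling `line_matches(kOriginal, "public $x;")` and `line_matches(kSuggested, "public $x;")` side by side, and then repeating with `"public $x = 1;"` and `"$x = 1;"`, shows where the two patterns actually differ.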