Issue 12339 in v8: "Regular expression too long" on simple RegExp

alexa… via monorail

Oct 25, 2021, 3:33:09 PM
to v8-re...@googlegroups.com
Status: Untriaged
Owner: ----
Type: Bug

New issue 12339 by alexa...@attio.com: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339

Version: Version 95.0.4638.54 (Official Build) (arm64)
OS: MacOS 11.6
Architecture: ARM (Apple Silicon M1)

What steps will reproduce the problem?
Hard to say precisely - however the following Chrome Extension exhibits the issue when run on any website: https://chrome.google.com/webstore/detail/attio/legacbojjmajoedfolbjlekjjkepadph

What is the expected output?
The code runs as expected


What do you see instead?
The code fails to execute, citing a RegExp error: "Uncaught SyntaxError: Invalid regular expression: /([A-Z])/: Regular expression too large"

This is my first time reporting to V8 so I apologise if I have missed a convention. We've confirmed that this is functioning in previous versions of Chrome and alternative browsers (such as Safari) - so I believe this is the result of the V8 changes that shipped with Chrome 95.

ecmzi… via monorail

Oct 28, 2021, 11:26:44 AM
to v8-re...@googlegroups.com
Updates:
Components: Regexp
Labels: Priority-2

Comment #1 on issue 12339 by ecmzi...@chromium.org: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339#c1

(No comment was entered for this change.)

alexa… via monorail

Oct 28, 2021, 12:09:19 PM
to v8-re...@googlegroups.com

Comment #2 on issue 12339 by alexa...@attio.com: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339#c2

Some additional findings since this report was opened.

It appears the issue is triggered by single JavaScript files above 32MB (unfortunately above the upload limit for this form).

Once a single file exceeds this size uncompressed, it appears that every RegEx declared using inline syntax such as `/[A-Z]/` fails to compile at the V8 level (the JavaScript itself is never executed). Replacing the inline definitions with the RegExp constructor allows the code to run as expected.
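
A minimal sketch of the workaround described above, assuming the failing pattern is the `/([A-Z])/` from the original error message. The `splitCamelCase` helper is hypothetical and exists only to exercise the pattern; it is not taken from the affected extension.

```javascript
// Workaround sketch: build the pattern at runtime instead of using a literal.
// Literal form, which the report says fails inside a script above 32MB:
//   const upper = /([A-Z])/;
const upper = new RegExp("([A-Z])");

// Backslashes must be doubled in the string form:
// the literal /\d+/g becomes the string "\\d+" with flags "g".
const digits = new RegExp("\\d+", "g");

// Hypothetical helper, used only to show the constructor form in context.
function splitCamelCase(name) {
  return name.replace(new RegExp("([A-Z])", "g"), " $1").trim();
}

console.log(upper.test("aB"));
console.log("a1b22".match(digits));
console.log(splitCamelCase("regularExpressionTooLarge"));
```

The constructor form is compiled when the statement runs rather than validated while the script is parsed, which matches the observation that it avoids the error.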

mloug… via monorail

Oct 28, 2021, 5:00:35 PM
to v8-re...@googlegroups.com

Comment #3 on issue 12339 by mloug...@gmail.com: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339#c3

Microsoft Outlook is also hitting this error when our local dev builds result in very large script files.

paolo… via monorail

Oct 28, 2021, 5:15:02 PM
to v8-re...@googlegroups.com

Comment #4 on issue 12339 by paolo...@gmail.com: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339#c4

I have encountered the same problem with a large internal web application and was able to reproduce it with a debug build of version 95.
What I see happening is that in this function:

template <class CharT>
void RegExpParserImpl<CharT>::Advance() {
  if (has_next()) {
    if (GetCurrentStackPosition() < stack_limit_) {
      if (FLAG_correctness_fuzzer_suppressions) {
        FATAL("Aborting on stack overflow");
      }
      ReportError(RegExpError::kStackOverflow);
    } else if (zone()->excess_allocation()) {  // <====== fails
      if (FLAG_correctness_fuzzer_suppressions) {
        FATAL("Aborting on excess zone allocation");
      }
      ReportError(RegExpError::kTooLarge);
    } else {
      current_ = ReadNext<true>();
    }
  } else {
    current_ = kEndMarker;
    // Advance so that position() points to 1-after-the-last-character. This is
    // important so that Reset() to this position works correctly.
    next_pos_ = input_length() + 1;
    has_more_ = false;
  }
}

zone()->excess_allocation() returns true because segment_bytes_allocated_ is > 256MB (== kExcessLimit), so the function reports that the regular expression is too large. The actual problem is that the Zone is full; at that point any regular expression would trigger the error.

This is the status of the Zone:

this 0x000001ec15b91ba0 {allocation_size_=0x000000000fa866b8 segment_bytes_allocated_=0x00000000102c3430 ...} const v8::internal::Zone *
allocation_size_ 0x000000000fa866b8 unsigned __int64
segment_bytes_allocated_ 0x00000000102c3430 unsigned __int64
position_ 0x000001ec35c79f28 unsigned __int64
limit_ 0x000001ec35c7d110 unsigned __int64
...

And this is the call stack when the check fails:

v8.dll!v8::internal::Zone::excess_allocation() Line 144 C++
> v8.dll!v8::internal::`anonymous namespace'::RegExpParserImpl<unsigned char>::Advance() Line 467 C++
v8.dll!v8::internal::`anonymous namespace'::RegExpParserImpl<unsigned char>::RegExpParserImpl(const unsigned char * input, int input_length, v8::base::Flags<v8::internal::RegExpFlag,int> flags, unsigned __int64 stack_limit, v8::internal::Zone * zone, const v8::internal::CombinationAssertScope<v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::SAFEPOINTS_ASSERT,0>,v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::HEAP_ALLOCATION_ASSERT,0>> & no_gc) Line 416 C++
v8.dll!v8::internal::RegExpParser::VerifyRegExpSyntax<unsigned char>(v8::internal::Zone * zone, unsigned __int64 stack_limit, const unsigned char * input, int input_length, v8::base::Flags<v8::internal::RegExpFlag,int> flags, v8::internal::RegExpCompileData * result, const v8::internal::CombinationAssertScope<v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::SAFEPOINTS_ASSERT,0>,v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::HEAP_ALLOCATION_ASSERT,0>> & no_gc) Line 2429 C++
v8.dll!v8::internal::RegExp::VerifySyntax<unsigned char>(v8::internal::Zone * zone, unsigned __int64 stack_limit, const unsigned char * input, int input_length, v8::base::Flags<v8::internal::RegExpFlag,int> flags, v8::internal::RegExpError * regexp_error_out, const v8::internal::CombinationAssertScope<v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::SAFEPOINTS_ASSERT,0>,v8::internal::PerThreadAssertScopeDebugOnly<v8::internal::HEAP_ALLOCATION_ASSERT,0>> & no_gc) Line 117 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ValidateRegExpLiteral(const v8::internal::AstRawString * pattern, v8::base::Flags<v8::internal::RegExpFlag,int> flags, v8::internal::RegExpError * regexp_error) Line 1804 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParseRegExpLiteral() Line 1832 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParsePrimaryExpression() Line 1951 C++
[...]
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParseExpressionOrLabelledStatement(v8::internal::ZoneList<const v8::internal::AstRawString *> * labels, v8::internal::ZoneList<const v8::internal::AstRawString *> * own_labels, v8::internal::AllowLabelledFunctionStatement allow_function) Line 5441 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParseStatement(v8::internal::ZoneList<const v8::internal::AstRawString *> * labels, v8::internal::ZoneList<const v8::internal::AstRawString *> * own_labels, v8::internal::AllowLabelledFunctionStatement allow_function) Line 5284 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParseStatementListItem() Line 5179 C++
v8.dll!v8::internal::ParserBase<v8::internal::Parser>::ParseStatementList(v8::internal::ScopedList<v8::internal::Statement *,void *> * body, v8::internal::Token::Value end_token) Line 5128 C++
v8.dll!v8::internal::Parser::DoParseProgram(v8::internal::Isolate * isolate, v8::internal::ParseInfo * info) Line 633 C++
v8.dll!v8::internal::Parser::ParseOnBackground(v8::internal::ParseInfo * info, int start_position, int end_position, int function_literal_id) Line 3290 C++
v8.dll!v8::internal::BackgroundCompileTask::Run() Line 1612 C++
v8.dll!v8::ScriptCompiler::ScriptStreamingTask::Run() Line 2667 C++
[...]

The Zone used by the RegExpParser is the same Zone used by the ParserBase (see ParserBase<Impl>::ValidateRegExpLiteral).
Some change in version 95 may have caused V8 to allocate more in the Parser zone than previous versions did, so that a sufficiently large app script file now triggers the error.
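
To make the shared-Zone effect concrete, here is a simplified model of the behaviour described above. It is a sketch under stated assumptions, not V8's code: the `Zone` class, `EXCESS_LIMIT` and `verifyRegExpSyntax` are hypothetical stand-ins, and the limit value is chosen only for illustration.

```javascript
// Simplified model only; names and the limit value are assumptions.
const EXCESS_LIMIT = 256 * 1024 * 1024; // illustrative threshold in bytes

class Zone {
  constructor() {
    this.segmentBytesAllocated = 0;
  }
  allocate(bytes) {
    this.segmentBytesAllocated += bytes;
  }
  excessAllocation() {
    return this.segmentBytesAllocated > EXCESS_LIMIT;
  }
}

// Stand-in for the syntax check: like the Advance() excerpt, it looks only
// at the zone's total allocation, never at the size of the pattern itself.
function verifyRegExpSyntax(zone, pattern) {
  if (zone.excessAllocation()) {
    return `Invalid regular expression: /${pattern}/: Regular expression too large`;
  }
  return null; // no error
}

// A zone already filled by parsing a huge script (value from the dump above).
const parserZone = new Zone();
parserZone.allocate(0x102c3430);
console.log(verifyRegExpSyntax(parserZone, "([A-Z])")); // error string
console.log(verifyRegExpSyntax(new Zone(), "([A-Z])")); // null
```

In this model the same trivial pattern passes in a fresh zone and fails in the shared one, which is the mismatch between the error message and the real cause.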

I don't know how to provide a repro for this issue, though, given that I only managed to repro it with an internal website.

jgru… via monorail

Nov 2, 2021, 4:03:19 AM
to v8-re...@googlegroups.com
Updates:
Mergedinto: chromium:1264014
Owner: jgr...@chromium.org
Status: Duplicate

Comment #5 on issue 12339 by jgr...@chromium.org: "Regular expression too long" on simple RegExp
https://bugs.chromium.org/p/v8/issues/detail?id=12339#c5

Thanks for the report and the investigation. I'm currently looking into a fix.