Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
RegExp: Add support for table-based character class (issue 9854020)
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  10 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
erik.co...@gmail.com  
View profile  
 More options Mar 26 2012, 7:12 am
From: erik.co...@gmail.com
Date: Mon, 26 Mar 2012 11:12:35 +0000
Local: Mon, Mar 26 2012 7:12 am
Subject: RegExp: Add support for table-based character class (issue 9854020)
Reviewers: ulan,

Description:
RegExp: Add support for table-based character class
code generation.  This is performance neutral for
all our tests, but a factor 6 faster for the Unicode
based regexp in the new test (and much more compact
code).

Please review this at https://chromiumcodereview.appspot.com/9854020/

SVN Base: http://v8.googlecode.com/svn/branches/bleeding_edge/

Affected files:
   M     src/arm/regexp-macro-assembler-arm.h
   M     src/arm/regexp-macro-assembler-arm.cc
   M     src/bytecodes-irregexp.h
   M     src/ia32/regexp-macro-assembler-ia32.h
   M     src/ia32/regexp-macro-assembler-ia32.cc
   M     src/interpreter-irregexp.cc
   M     src/jsregexp.cc
   M     src/mips/regexp-macro-assembler-mips.h
   M     src/regexp-macro-assembler-irregexp-inl.h
   M     src/regexp-macro-assembler-irregexp.h
   M     src/regexp-macro-assembler-irregexp.cc
   M     src/regexp-macro-assembler-tracer.h
   M     src/regexp-macro-assembler-tracer.cc
   M     src/regexp-macro-assembler.h
   M     src/x64/regexp-macro-assembler-x64.h
   M     src/x64/regexp-macro-assembler-x64.cc
   A     test/mjsunit/unicodelctest-no-optimization.js
   A     test/mjsunit/unicodelctest.js


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
u...@chromium.org  
View profile  
 More options Mar 27 2012, 7:02 am
From: u...@chromium.org
Date: Tue, 27 Mar 2012 11:02:43 +0000
Local: Tues, Mar 27 2012 7:02 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)
First round of comments.

http://codereview.chromium.org/9854020/diff/6001/src/interpreter-irre...
File src/interpreter-irregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/interpreter-irre...
src/interpreter-irregexp.cc:475: byte b = pc[8 + ((current_char & mask)

>> kBitsPerByteLog2)];

I think the compiler will take care of optimizing division
and modulo operations with constant powers of 2.
So there is no need for (>>) and (&).

If you use the "conventional" bit ordering (see the comment in
regexp-macro-assembler-irregexp.cc) then this code can be simplified as
follows:

const int kTableStart = 8;
int index = current_char & mask;
int byte_index = index / kBitsPerByte;
int bit_index = index % kBitsPerByte;
int bit = (pc[kTableStart + byte_index] >> bit_index) & 1;
if (bit != 0) ...

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1540: Label* above,
'above_or_equal' would be more precise name.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler-irregexp-inl.h (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler-irregexp-inl.h:67: if (pc_ + 1 >=
buffer_.length()) {
I think we want to expand only if pc_ == buffer.length().

If pc_ + 1 == buffer_.length() then there is enough place in the buffer
for one byte.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler-irregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler-irregexp.cc:385: byte |= (table->get(i + j)
!= 0) ? 1 : 0;
I think it should be (table->get(i * kBitsPerByte + j)).

Also, I would prefer not reversing the "conventional" bit ordering.
Consider this:

if (table->get(i*kBitsPerByte + j)) {
   byte |= 1 << j;

}

This would also simplify decoding in the interpreter.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler.h (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler.h:126: virtual void
CheckBitInTable(Handle<ByteArray> table, Label* on_bit_set) = 0;
A description comment for this function would nice. It is unclear how
the bit index is computed from the current character.

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
src/x64/regexp-macro-assembler-x64.cc:596: Handle<ByteArray> table,
Can't we use a packed table here too? Like in the interpreter.

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
src/x64/regexp-macro-assembler-x64.cc:600: __ and_(rbx,
Immediate(kTableSize - 1));
kTableMask instead of (kTableSize - 1) would be better.

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
File test/mjsunit/unicodelctest-no-optimization.js (right):

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest-no-optimization.js:33: var re =
/^([a-zªµºß-öø-ÿāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıijĵķ-ĸĺļľŀłńņň-ʼnŋōŏőœŕŗřśŝşšţťŧũūŭů űųŵŷźżž-ƀƃƅƈƌ-ƍƒƕƙ-ƛƞơƣƥƨƪ-ƫƭưƴƶƹ-ƺƽ-ƿdžljnjǎǐǒǔǖǘǚǜ-ǝǟǡǣǥǧǩǫǭǯ-ǰdzǵǹǻǽǿȁȃȅȇȉȋȍ ȏȑȓȕȗșțȝȟȡȣȥȧȩȫȭȯȱȳ-ȹȼȿ-ɀɂɇɉɋɍɏ-ʓʕ-ʯͱͳͷͻ-ͽΐά-ώϐ-ϑϕ-ϗϙϛϝϟϡϣϥϧϩϫϭϯ-ϳϵϸϻ-ϼа-џѡ ѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿҁҋҍҏґғҕҗҙқҝҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎ-ӏӑӓӕӗәӛӝӟӡӣӥӧөӫӭӯӱӳӵӷӹӻӽ ӿԁԃԅԇԉԋԍԏԑԓԕԗԙԛԝԟԡԣա-ևᴀ-ᴫᵢ-ᵷᵹ-ᶚḁḃḅḇḉḋḍḏḑḓḕḗḙḛḝḟḡḣḥḧḩḫḭḯḱḳḵḷḹḻḽḿṁṃṅṇṉṋṍṏṑṓṕṗ ṙṛṝṟṡṣṥṧṩṫṭṯṱṳṵṷṹṻṽṿẁẃẅẇẉẋẍẏẑẓẕ-ẝẟạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữự ỳỵỷỹỻỽỿ-ἇἐ-ἕἠ-ἧἰ-ἷὀ-ὅὐ-ὗὠ-ὧὰ-ώᾀ-ᾇᾐ-ᾗᾠ-ᾧᾰ-ᾴᾶ-ᾷιῂ-ῄῆ-ῇῐ-ΐῖ-ῗῠ-ῧῲ-ῴῶ-ῷⁱⁿℊℎ-ℏℓℯ ℴℹℼ-ℽⅆ-ⅉⅎↄⰰ-ⱞⱡⱥ-ⱦⱨⱪⱬⱱⱳ-ⱴⱶ-ⱼⲁⲃⲅⲇⲉⲋⲍⲏⲑⲓⲕⲗⲙⲛⲝⲟⲡⲣⲥⲧⲩⲫⲭⲯⲱⲳⲵⲷⲹⲻⲽⲿⳁⳃⳅⳇⳉⳋⳍⳏⳑⳓⳕⳗⳙⳛⳝⳟ ⳡⳣ-ⳤⴀ-ⴥꙁꙃꙅꙇꙉꙋꙍꙏꙑꙓꙕꙗꙙꙛꙝꙟꙣꙥꙧꙩꙫꙭꚁꚃꚅꚇꚉꚋꚍꚏꚑꚓꚕꚗꜣꜥꜧꜩꜫꜭꜯ-ꜱꜳꜵꜷꜹꜻꜽꜿꝁꝃꝅꝇꝉꝋꝍꝏꝑꝓꝕꝗꝙꝛꝝꝟꝡꝣ ꝥꝧꝩꝫꝭꝯꝱ-ꝸꝺꝼꝿꞁꞃꞅꞇꞌff-stﬓ-ﬗa-z]|
\ud801[\udc28-\udc4f]|
\ud835[\udc1a-\udc33\udc4e-\udc54\udc56-\udc67\udc82-\udc9b\udcb6-\udcb9\ud cbb\udcbd-\udcc3\udcc5-\udccf\udcea-\udd03\udd1e-\udd37\udd52-\udd6b\udd86- \udd9f\uddba-\uddd3\uddee-\ude07\ude22-\ude3b\ude56-\ude6f\ude8a-\udea5\ude c2-\udeda\udedc-\udee1\udefc-\udf14\udf16-\udf1b\udf36-\udf4e\udf50-\udf55\ udf70-\udf88\udf8a-\udf8f\udfaa-\udfc2\udfc4-\udfc9\udfcb])$/;
Comments from unicodelctest.js apply here too.

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
File test/mjsunit/unicodelctest.js (right):

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest.js:30: var re =
/^([a-zªµºß-öø-ÿāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıijĵķ-ĸĺļľŀłńņň-ʼnŋōŏőœŕŗřśŝşšţťŧũūŭů űųŵŷźżž-ƀƃƅƈƌ-ƍƒƕƙ-ƛƞơƣƥƨƪ-ƫƭưƴƶƹ-ƺƽ-ƿdžljnjǎǐǒǔǖǘǚǜ-ǝǟǡǣǥǧǩǫǭǯ-ǰdzǵǹǻǽǿȁȃȅȇȉȋȍ ȏȑȓȕȗșțȝȟȡȣȥȧȩȫȭȯȱȳ-ȹȼȿ-ɀɂɇɉɋɍɏ-ʓʕ-ʯͱͳͷͻ-ͽΐά-ώϐ-ϑϕ-ϗϙϛϝϟϡϣϥϧϩϫϭϯ-ϳϵϸϻ-ϼа-џѡ ѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿҁҋҍҏґғҕҗҙқҝҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎ-ӏӑӓӕӗәӛӝӟӡӣӥӧөӫӭӯӱӳӵӷӹӻӽ ӿԁԃԅԇԉԋԍԏԑԓԕԗԙԛԝԟԡԣա-ևᴀ-ᴫᵢ-ᵷᵹ-ᶚḁḃḅḇḉḋḍḏḑḓḕḗḙḛḝḟḡḣḥḧḩḫḭḯḱḳḵḷḹḻḽḿṁṃṅṇṉṋṍṏṑṓṕṗ ṙṛṝṟṡṣṥṧṩṫṭṯṱṳṵṷṹṻṽṿẁẃẅẇẉẋẍẏẑẓẕ-ẝẟạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữự ỳỵỷỹỻỽỿ-ἇἐ-ἕἠ-ἧἰ-ἷὀ-ὅὐ-ὗὠ-ὧὰ-ώᾀ-ᾇᾐ-ᾗᾠ-ᾧᾰ-ᾴᾶ-ᾷιῂ-ῄῆ-ῇῐ-ΐῖ-ῗῠ-ῧῲ-ῴῶ-ῷⁱⁿℊℎ-ℏℓℯ ℴℹℼ-ℽⅆ-ⅉⅎↄⰰ-ⱞⱡⱥ-ⱦⱨⱪⱬⱱⱳ-ⱴⱶ-ⱼⲁⲃⲅⲇⲉⲋⲍⲏⲑⲓⲕⲗⲙⲛⲝⲟⲡⲣⲥⲧⲩⲫⲭⲯⲱⲳⲵⲷⲹⲻⲽⲿⳁⳃⳅⳇⳉⳋⳍⳏⳑⳓⳕⳗⳙⳛⳝⳟ ⳡⳣ-ⳤⴀ-ⴥꙁꙃꙅꙇꙉꙋꙍꙏꙑꙓꙕꙗꙙꙛꙝꙟꙣꙥꙧꙩꙫꙭꚁꚃꚅꚇꚉꚋꚍꚏꚑꚓꚕꚗꜣꜥꜧꜩꜫꜭꜯ-ꜱꜳꜵꜷꜹꜻꜽꜿꝁꝃꝅꝇꝉꝋꝍꝏꝑꝓꝕꝗꝙꝛꝝꝟꝡꝣ ꝥꝧꝩꝫꝭꝯꝱ-ꝸꝺꝼꝿꞁꞃꞅꞇꞌff-stﬓ-ﬗa-z]|
\ud801[\udc28-\udc4f]|
\ud835[\udc1a-\udc33\udc4e-\udc54\udc56-\udc67\udc82-\udc9b\udcb6-\udcb9\ud cbb\udcbd-\udcc3\udcc5-\udccf\udcea-\udd03\udd1e-\udd37\udd52-\udd6b\udd86- \udd9f\uddba-\uddd3\uddee-\ude07\ude22-\ude3b\ude56-\ude6f\ude8a-\udea5\ude c2-\udeda\udedc-\udee1\udefc-\udf14\udf16-\udf1b\udf36-\udf4e\udf50-\udf55\ udf70-\udf88\udf8a-\udf8f\udfaa-\udfc2\udfc4-\udfc9\udfcb])$/;
The correctness of this test is far from obvious.

If the 're' and 'get_answer' were generated with a script, would it be
possible to put the script source in comments, so that others could
verify it?

Even better would be to write a JS script that generates and evals this
JS script.

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest.js:43: var s = String.fromCharCode(+0xd800 +
(c >> 10)) + String.fromCharCode(+0xdc00 + (c &0x3ff));
We need a space after &. Consider extracting the expression into an
encoding function, so that we can avoid overflowing 80 characters in the
line.

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
erik.co...@gmail.com  
View profile  
 More options Mar 28 2012, 5:40 am
From: erik.co...@gmail.com
Date: Wed, 28 Mar 2012 09:40:26 +0000
Local: Wed, Mar 28 2012 5:40 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)
New version uploaded that fixes comments and slightly optimizes  
CheckBitInTable
on IA32, x64 and ARM.

http://codereview.chromium.org/9854020/diff/6001/src/interpreter-irre...
File src/interpreter-irregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/interpreter-irre...
src/interpreter-irregexp.cc:475: byte b = pc[8 + ((current_char & mask)

>> kBitsPerByteLog2)];

On 2012/03/27 11:02:43, ulan wrote:

> I think the compiler will take care of optimizing division
> and modulo operations with constant powers of 2.
> So there is no need for (>>) and (&).

No, the compiler can only do this optimization for unsigned variables.
I don't use unsigned much because it causes warnings (comparing signed
with unsigned) and bugs (-1 compares greater than all other numbers).

On Linux (this is compiler dependent) the result of % can also be
negative so it is not as fast as &.

> If you use the "conventional" bit ordering (see the comment in
> regexp-macro-assembler-irregexp.cc) then this code can be simplified

as follows:

> const int kTableStart = 8;
> int index = current_char & mask;
> int byte_index = index / kBitsPerByte;
> int bit_index = index % kBitsPerByte;
> int bit = (pc[kTableStart + byte_index] >> bit_index) & 1;
> if (bit != 0) ...

I switched to the opposite bit ordering.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1540: Label* above,
On 2012/03/27 11:02:43, ulan wrote:

> 'above_or_equal' would be more precise name.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler-irregexp-inl.h (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler-irregexp-inl.h:67: if (pc_ + 1 >=
buffer_.length()) {
On 2012/03/27 11:02:43, ulan wrote:

> I think we want to expand only if pc_ == buffer.length().
> If pc_ + 1 == buffer_.length() then there is enough place in the
buffer for one
> byte.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler-irregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler-irregexp.cc:385: byte |= (table->get(i + j)
!= 0) ? 1 : 0;
On 2012/03/27 11:02:43, ulan wrote:

> I think it should be (table->get(i * kBitsPerByte + j)).
> Also, I would prefer not reversing the "conventional" bit ordering.
Consider
> this:
> if (table->get(i*kBitsPerByte + j)) {
>    byte |= 1 << j;
> }
> This would also simplify decoding in the interpreter.

Done, but I kept the get(i + j) because I think it's clearer and I
didn't include your implicit boolean conversion.

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
File src/regexp-macro-assembler.h (right):

http://codereview.chromium.org/9854020/diff/6001/src/regexp-macro-ass...
src/regexp-macro-assembler.h:126: virtual void
CheckBitInTable(Handle<ByteArray> table, Label* on_bit_set) = 0;
On 2012/03/27 11:02:43, ulan wrote:

> A description comment for this function would nice. It is unclear how
the bit
> index is computed from the current character.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
src/x64/regexp-macro-assembler-x64.cc:596: Handle<ByteArray> table,
On 2012/03/27 11:02:43, ulan wrote:

> Can't we use a packed table here too? Like in the interpreter.

That would be slower.

http://codereview.chromium.org/9854020/diff/6001/src/x64/regexp-macro...
src/x64/regexp-macro-assembler-x64.cc:600: __ and_(rbx,
Immediate(kTableSize - 1));
On 2012/03/27 11:02:43, ulan wrote:

> kTableMask instead of (kTableSize - 1) would be better.

Done.

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
File test/mjsunit/unicodelctest-no-optimization.js (right):

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest-no-optimization.js:33: var re =
/^([a-zªµºß-öø-ÿāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıijĵķ-ĸĺļľŀłńņň-ʼnŋōŏőœŕŗřśŝşšţťŧũūŭů űųŵŷźżž-ƀƃƅƈƌ-ƍƒƕƙ-ƛƞơƣƥƨƪ-ƫƭưƴƶƹ-ƺƽ-ƿdžljnjǎǐǒǔǖǘǚǜ-ǝǟǡǣǥǧǩǫǭǯ-ǰdzǵǹǻǽǿȁȃȅȇȉȋȍ ȏȑȓȕȗșțȝȟȡȣȥȧȩȫȭȯȱȳ-ȹȼȿ-ɀɂɇɉɋɍɏ-ʓʕ-ʯͱͳͷͻ-ͽΐά-ώϐ-ϑϕ-ϗϙϛϝϟϡϣϥϧϩϫϭϯ-ϳϵϸϻ-ϼа-џѡ ѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿҁҋҍҏґғҕҗҙқҝҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎ-ӏӑӓӕӗәӛӝӟӡӣӥӧөӫӭӯӱӳӵӷӹӻӽ ӿԁԃԅԇԉԋԍԏԑԓԕԗԙԛԝԟԡԣա-ևᴀ-ᴫᵢ-ᵷᵹ-ᶚḁḃḅḇḉḋḍḏḑḓḕḗḙḛḝḟḡḣḥḧḩḫḭḯḱḳḵḷḹḻḽḿṁṃṅṇṉṋṍṏṑṓṕṗ ṙṛṝṟṡṣṥṧṩṫṭṯṱṳṵṷṹṻṽṿẁẃẅẇẉẋẍẏẑẓẕ-ẝẟạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữự ỳỵỷỹỻỽỿ-ἇἐ-ἕἠ-ἧἰ-ἷὀ-ὅὐ-ὗὠ-ὧὰ-ώᾀ-ᾇᾐ-ᾗᾠ-ᾧᾰ-ᾴᾶ-ᾷιῂ-ῄῆ-ῇῐ-ΐῖ-ῗῠ-ῧῲ-ῴῶ-ῷⁱⁿℊℎ-ℏℓℯ ℴℹℼ-ℽⅆ-ⅉⅎↄⰰ-ⱞⱡⱥ-ⱦⱨⱪⱬⱱⱳ-ⱴⱶ-ⱼⲁⲃⲅⲇⲉⲋⲍⲏⲑⲓⲕⲗⲙⲛⲝⲟⲡⲣⲥⲧⲩⲫⲭⲯⲱⲳⲵⲷⲹⲻⲽⲿⳁⳃⳅⳇⳉⳋⳍⳏⳑⳓⳕⳗⳙⳛⳝⳟ ⳡⳣ-ⳤⴀ-ⴥꙁꙃꙅꙇꙉꙋꙍꙏꙑꙓꙕꙗꙙꙛꙝꙟꙣꙥꙧꙩꙫꙭꚁꚃꚅꚇꚉꚋꚍꚏꚑꚓꚕꚗꜣꜥꜧꜩꜫꜭꜯ-ꜱꜳꜵꜷꜹꜻꜽꜿꝁꝃꝅꝇꝉꝋꝍꝏꝑꝓꝕꝗꝙꝛꝝꝟꝡꝣ ꝥꝧꝩꝫꝭꝯꝱ-ꝸꝺꝼꝿꞁꞃꞅꞇꞌff-stﬓ-ﬗa-z]|
\ud801[\udc28-\udc4f]|
\ud835[\udc1a-\udc33\udc4e-\udc54\udc56-\udc67\udc82-\udc9b\udcb6-\udcb9\ud cbb\udcbd-\udcc3\udcc5-\udccf\udcea-\udd03\udd1e-\udd37\udd52-\udd6b\udd86- \udd9f\uddba-\uddd3\uddee-\ude07\ude22-\ude3b\ude56-\ude6f\ude8a-\udea5\ude c2-\udeda\udedc-\udee1\udefc-\udf14\udf16-\udf1b\udf36-\udf4e\udf50-\udf55\ udf70-\udf88\udf8a-\udf8f\udfaa-\udfc2\udfc4-\udfc9\udfcb])$/;
On 2012/03/27 11:02:43, ulan wrote:

> Comments from unicodelctest.js apply here too.

See answer there.

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
File test/mjsunit/unicodelctest.js (right):

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest.js:30: var re =
/^([a-zªµºß-öø-ÿāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıijĵķ-ĸĺļľŀłńņň-ʼnŋōŏőœŕŗřśŝşšţťŧũūŭů űųŵŷźżž-ƀƃƅƈƌ-ƍƒƕƙ-ƛƞơƣƥƨƪ-ƫƭưƴƶƹ-ƺƽ-ƿdžljnjǎǐǒǔǖǘǚǜ-ǝǟǡǣǥǧǩǫǭǯ-ǰdzǵǹǻǽǿȁȃȅȇȉȋȍ ȏȑȓȕȗșțȝȟȡȣȥȧȩȫȭȯȱȳ-ȹȼȿ-ɀɂɇɉɋɍɏ-ʓʕ-ʯͱͳͷͻ-ͽΐά-ώϐ-ϑϕ-ϗϙϛϝϟϡϣϥϧϩϫϭϯ-ϳϵϸϻ-ϼа-џѡ ѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿҁҋҍҏґғҕҗҙқҝҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎ-ӏӑӓӕӗәӛӝӟӡӣӥӧөӫӭӯӱӳӵӷӹӻӽ ӿԁԃԅԇԉԋԍԏԑԓԕԗԙԛԝԟԡԣա-ևᴀ-ᴫᵢ-ᵷᵹ-ᶚḁḃḅḇḉḋḍḏḑḓḕḗḙḛḝḟḡḣḥḧḩḫḭḯḱḳḵḷḹḻḽḿṁṃṅṇṉṋṍṏṑṓṕṗ ṙṛṝṟṡṣṥṧṩṫṭṯṱṳṵṷṹṻṽṿẁẃẅẇẉẋẍẏẑẓẕ-ẝẟạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữự ỳỵỷỹỻỽỿ-ἇἐ-ἕἠ-ἧἰ-ἷὀ-ὅὐ-ὗὠ-ὧὰ-ώᾀ-ᾇᾐ-ᾗᾠ-ᾧᾰ-ᾴᾶ-ᾷιῂ-ῄῆ-ῇῐ-ΐῖ-ῗῠ-ῧῲ-ῴῶ-ῷⁱⁿℊℎ-ℏℓℯ ℴℹℼ-ℽⅆ-ⅉⅎↄⰰ-ⱞⱡⱥ-ⱦⱨⱪⱬⱱⱳ-ⱴⱶ-ⱼⲁⲃⲅⲇⲉⲋⲍⲏⲑⲓⲕⲗⲙⲛⲝⲟⲡⲣⲥⲧⲩⲫⲭⲯⲱⲳⲵⲷⲹⲻⲽⲿⳁⳃⳅⳇⳉⳋⳍⳏⳑⳓⳕⳗⳙⳛⳝⳟ ⳡⳣ-ⳤⴀ-ⴥꙁꙃꙅꙇꙉꙋꙍꙏꙑꙓꙕꙗꙙꙛꙝꙟꙣꙥꙧꙩꙫꙭꚁꚃꚅꚇꚉꚋꚍꚏꚑꚓꚕꚗꜣꜥꜧꜩꜫꜭꜯ-ꜱꜳꜵꜷꜹꜻꜽꜿꝁꝃꝅꝇꝉꝋꝍꝏꝑꝓꝕꝗꝙꝛꝝꝟꝡꝣ ꝥꝧꝩꝫꝭꝯꝱ-ꝸꝺꝼꝿꞁꞃꞅꞇꞌff-stﬓ-ﬗa-z]|
\ud801[\udc28-\udc4f]|
\ud835[\udc1a-\udc33\udc4e-\udc54\udc56-\udc67\udc82-\udc9b\udcb6-\udcb9\ud cbb\udcbd-\udcc3\udcc5-\udccf\udcea-\udd03\udd1e-\udd37\udd52-\udd6b\udd86- \udd9f\uddba-\uddd3\uddee-\ude07\ude22-\ude3b\ude56-\ude6f\ude8a-\udea5\ude c2-\udeda\udedc-\udee1\udefc-\udf14\udf16-\udf1b\udf36-\udf4e\udf50-\udf55\ udf70-\udf88\udf8a-\udf8f\udfaa-\udfc2\udfc4-\udfc9\udfcb])$/;
On 2012/03/27 11:02:43, ulan wrote:

> The correctness of this test is far from obvious.
> If the 're' and 'get_answer' were generated with a script, would it be
possible
> to put the script source in comments, so that others could verify it?
> Even better would be to write a JS script that generates and evals
this JS
> script.

The test data was generated with V8 revision 3.7 and a script that is
95% the same as this script.  I can't put the script inside itself as a
comment :-)

http://codereview.chromium.org/9854020/diff/6001/test/mjsunit/unicode...
test/mjsunit/unicodelctest.js:43: var s = String.fromCharCode(+0xd800 +
(c >> 10)) + String.fromCharCode(+0xdc00 + (c &0x3ff));
On 2012/03/27 11:02:43, ulan wrote:

> We need a space after &. Consider extracting the expression into an
encoding
> function, so that we can avoid overflowing 80 characters in the line.

Done.
...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
u...@chromium.org  
View profile  
 More options Mar 28 2012, 9:14 am
From: u...@chromium.org
Date: Wed, 28 Mar 2012 13:14:08 +0000
Local: Wed, Mar 28 2012 9:14 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)
Second round of comments, it is close to looking good.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1555: Label* even_label,
Odd/even doesn't make sense without the ranges2 array. Consider
something like in_range, out_of_range.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1573:
A comment about odd/even/ranges2 would be helpful here.
If I understand correctly there is also an implicit precondition on
values in ranges2:
for all i, j: (ranges2[i] >> kTableSizeBits) == (ranges[j] >>
kTableSizeBits).

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1607: j < ranges2->at(i + 1) - base && j < kSize;
Is (ranges2->at(i + 1) - base) == ranges2->at(i + 1) & kMask?
If yes, I would prefer the kMask version because it is consistent with j
initializer.
If not, I don't understand this part.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1626: // Gets a series of segments representing a
character class.  If the character
Gets a series of segment boundaries ...

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1669: int c = ranges2->at(i);
If you extract the body of the loop into a function, then you could
avoid SearchOrder trickery:
1. Search for a single char range.
2. If found, call the function on this char range.
3. Otherwise, call the function on the first range.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1674: Label rest;
'fall_through' seems to be a better name than 'rest'.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1677: // Cut out the single range by rewriting the
array.
This implicitly creates a new range with endpoints ranges2[i-1] ..
ranges2[i+2]. A comment that it preserves the oddity of the labels would
be helpful.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1684: GenerateBranches(masm,
I think start_index + 1 can be larger that end_index - 1, but
GenerateBranches doesn't handle it.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1735: if (ranges2->at(new_start_index) > border) break;
I would expect to break on >=

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1749: int binary_chop_index = (end_index + start_index)
/ 2;
Why not (end_index + new_start_index) / 2 ?

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1750: if (border - 1 > String::kMaxAsciiCharCode &&
border >= String::kMaxAsciiCharCode

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1752: last - first > kSize * 4 &&
Are 4s above magic?

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1757: while (scan_forward_for_section_border <
end_index) {
This loop construct appears twice. Maybe put it into a separate
function?

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1778: }
Can we extract the code above into a separate function that returns a
new_start_index and a border? There are many edge cases, I would be
happy if those would be unit tested.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1886: ZoneList<int>* ranges2 =
This fits into one line. Can we find a better name than 'ranges2'? How
about 'range_endpoints', 'range_points', 'range_boundaries',
'alternating_ranges'.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1890: if (cc->is_negated()) zeroth_entry_is_failure =
false;
zeroth_entry_is_failure = !cc->is_negated();

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1896: ranges2->Add(0);
I wonder why we need ranges2[0] to be zero? GenerateBranch doens't seem
to rely on ranges2[0] == 0 and uses min_char.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1904: int ranges2_length = ranges2->length() - 1;
In 'ranges2_length' the 'length' suffix is misleading. I think it is not
a length but an index, so it should be called something like
'end_index'.

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
l...@google.com  
View profile  
 More options Mar 28 2012, 10:02 am
From: l...@google.com
Date: Wed, 28 Mar 2012 14:02:10 +0000
Local: Wed, Mar 28 2012 10:02 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)
DBC.
Like change!

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
File src/x64/regexp-macro-assembler-x64.cc (right):

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
src/x64/regexp-macro-assembler-x64.cc:578: Label* on_in_range) {
Maybe special-case if 'from' is zero (or even if 'to' is 0xffff)

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
src/x64/regexp-macro-assembler-x64.cc:579: __ lea(rax,
Operand(current_character(), -from));
lea->leal (saves a byte).

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
src/x64/regexp-macro-assembler-x64.cc:602: __ and_(rbx,
Immediate(kTableMask));
I'd consider swapping these:
   movq(rbx, Immediate(kTableMask));
   and_(rbx, current_character());
(speculating that the constant load could hoist better).

https://chromiumcodereview.appspot.com/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
erik.co...@gmail.com  
View profile  
 More options Mar 29 2012, 10:48 am
From: erik.co...@gmail.com
Date: Thu, 29 Mar 2012 14:48:59 +0000
Local: Thurs, Mar 29 2012 10:48 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1555: Label* even_label,
On 2012/03/28 13:14:08, ulan wrote:

> Odd/even doesn't make sense without the ranges2 array. Consider
something like
> in_range, out_of_range.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1573:
On 2012/03/28 13:14:08, ulan wrote:

> A comment about odd/even/ranges2 would be helpful here.
> If I understand correctly there is also an implicit precondition on
values in
> ranges2:
> for all i, j: (ranges2[i] >> kTableSizeBits) == (ranges[j] >>

kTableSizeBits).

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1607: j < ranges2->at(i + 1) - base && j < kSize;
On 2012/03/28 13:14:08, ulan wrote:

> Is (ranges2->at(i + 1) - base) == ranges2->at(i + 1) & kMask?
> If yes, I would prefer the kMask version because it is consistent with
j
> initializer.
> If not, I don't understand this part.

The answer is yes, and it is changed to use the kMask version.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1626: // Gets a series of segments representing a
character class.  If the character
On 2012/03/28 13:14:08, ulan wrote:

> Gets a series of segment boundaries ...

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1669: int c = ranges2->at(i);
On 2012/03/28 13:14:08, ulan wrote:

> If you extract the body of the loop into a function, then you could
avoid
> SearchOrder trickery:
> 1. Search for a single char range.
> 2. If found, call the function on this char range.
> 3. Otherwise, call the function on the first range.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1674: Label rest;
On 2012/03/28 13:14:08, ulan wrote:

> 'fall_through' seems to be a better name than 'rest'.

Renamed to dummy since it is not actually used.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1677: // Cut out the single range by rewriting the
array.
On 2012/03/28 13:14:08, ulan wrote:

> This implicitly creates a new range with endpoints ranges2[i-1] ..
ranges2[i+2].
> A comment that it preserves the oddity of the labels would be helpful.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1684: GenerateBranches(masm,
On 2012/03/28 13:14:08, ulan wrote:

> I think start_index + 1 can be larger that end_index - 1, but
GenerateBranches
> doesn't handle it.

We do not get into this if() unless they are 6 apart.
Assert added.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1735: if (ranges2->at(new_start_index) > border) break;
On 2012/03/28 13:14:08, ulan wrote:

> I would expect to break on >=

I think this is correct.  If there is a border right on the edge we want
to cut right before the border, but we don't want the new start_index to
be on the border, since that would make a trivial zero-width range at
the start.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1749: int binary_chop_index = (end_index + start_index)
/ 2;
On 2012/03/28 13:14:08, ulan wrote:

> Why not (end_index + new_start_index) / 2 ?

If we are doing binary chop then we would want to do it exactly in the
middle, but new_start_index is one page higher than the middle.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1750: if (border - 1 > String::kMaxAsciiCharCode &&
On 2012/03/28 13:14:08, ulan wrote:

> border >= String::kMaxAsciiCharCode

This line makes a cut at the ASCII non-ASCII boundary because the ASCII
area is so important for performance.  Even a Cyrillic text like the
Hippo article on todays Russian Wikipedia front page is 20% ASCII.  In
order to bypass the binary chop branch tree for ASCII this is the
correct test.  Comment added, though it does not explicitly mention the
Hippo.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1752: last - first > kSize * 4 &&
On 2012/03/28 13:14:08, ulan wrote:

> Are 4s above magic?

Yes.  I changed it to 2, which makes little difference to performance,
but makes more sense (there's no difference between binary chop and just
one at a time when there are only two pages left).

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1757: while (scan_forward_for_section_border <
end_index) {
On 2012/03/28 13:14:08, ulan wrote:

> This loop construct appears twice. Maybe put it into a separate

function?

The two occurrences are slightly different.  A function that has 7 lines
of body and returns two values will have much bigger overhead, and given
that they are not doing exactly the same I can't see a nice way to do
it.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1778: }
On 2012/03/28 13:14:08, ulan wrote:

> Can we extract the code above into a separate function that returns a
> new_start_index and a border? There are many edge cases, I would be
happy if
> those would be unit tested.

Extracted, and more tests added.  It's fuzzing rather than unit, but it
improves the coverage and caught an off-by-one error.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1886: ZoneList<int>* ranges2 =
On 2012/03/28 13:14:08, ulan wrote:

> This fits into one line. Can we find a better name than 'ranges2'? How
about
> 'range_endpoints', 'range_points', 'range_boundaries',

'alternating_ranges'.

range_boundaries, but I changed it to just ranges inside most of the
other functions because anything longer will cause cascading reformats
to stay within 80 columns.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1890: if (cc->is_negated()) zeroth_entry_is_failure =
false;
On 2012/03/28 13:14:08, ulan wrote:

> zeroth_entry_is_failure = !cc->is_negated();

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1896: ranges2->Add(0);
On 2012/03/28 13:14:08, ulan wrote:

> I wonder why we need ranges2[0] to be zero? GenerateBranch doens't
seem to rely
> on ranges2[0] == 0 and uses min_char.

Done.

http://codereview.chromium.org/9854020/diff/6001/src/jsregexp.cc#newc...
src/jsregexp.cc:1904: int ranges2_length = ranges2->length() - 1;
On 2012/03/28 13:14:08, ulan wrote:

> In 'ranges2_length' the 'length' suffix is misleading. I think it is
not a
> length but an index, so it should be called something like

'end_index'.

Done.

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
src/x64/regexp-macro-assembler-x64.cc:578: Label* on_in_range) {
On 2012/03/28 14:02:10, Lasse Reichstein Nielsen wrote:

> Maybe special-case if 'from' is zero (or even if 'to' is 0xffff)

Never happens.

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
src/x64/regexp-macro-assembler-x64.cc:579: __ lea(rax,
Operand(current_character(), -from));
On 2012/03/28 14:02:10, Lasse Reichstein Nielsen wrote:

> lea->leal (saves a byte).

I think you mean on x64.
Fixed.

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
src/x64/regexp-macro-assembler-x64.cc:602: __ and_(rbx,
Immediate(kTableMask));
On 2012/03/28 14:02:10, Lasse Reichstein Nielsen wrote:

> I'd consider swapping these:
>    movq(rbx, Immediate(kTableMask));
>    and_(rbx, current_character());
> (speculating that the constant load could hoist better).

OTOH a mov can be eliminated completely by the register renamer.

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
l...@chromium.org  
View profile  
 More options Mar 29 2012, 11:25 am
From: l...@chromium.org
Date: Thu, 29 Mar 2012 15:25:32 +0000
Local: Thurs, Mar 29 2012 11:25 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
src/x64/regexp-macro-assembler-x64.cc:602: __ and_(rbx,
Immediate(kTableMask));
I don't think the register renamer will do that since the value of
current_character() needs to be retained as well. A move will use a
different physical register.

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
u...@chromium.org  
View profile  
 More options Mar 29 2012, 12:19 pm
From: u...@chromium.org
Date: Thu, 29 Mar 2012 16:19:17 +0000
Local: Thurs, Mar 29 2012 12:19 pm
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)
LGTM!

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc#new...
src/jsregexp.cc:1843: ASSERT_LT(new_end_index, end_index);
Also: start_index <= new_end_index && new_start_index <= end_index.

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc#new...
src/jsregexp.cc:1969: if (range.from() == 0) {
I think we don't need to check for range[i] == 0. GenerateBranches
doesn't count the range [minchar ... range_boundaries[0]], so the oddity
of ranges should be correct without this check.

for (int i = 0; i <= last_valid_range; i++) {
     CharacterRange& range = ranges->at(i);
     range_boundaries->Add(range.from());
     range_boundaries->Add(range.to() + 1);

}

http://codereview.chromium.org/9854020/diff/15001/src/mips/regexp-mac...
File src/mips/regexp-macro-assembler-mips.h (right):

http://codereview.chromium.org/9854020/diff/15001/src/mips/regexp-mac...
src/mips/regexp-macro-assembler-mips.h:84: virtual void
CheckCharacterInRange(uc16 from,
These functions do not have implementation. If it is intentional please
ignore this comment.

http://codereview.chromium.org/9854020/diff/15001/test/mjsunit/unicod...
File test/mjsunit/unicodelctest-no-optimization.js (right):

http://codereview.chromium.org/9854020/diff/15001/test/mjsunit/unicod...
test/mjsunit/unicodelctest-no-optimization.js:110: var re = new
RegExp("[" + cc + "]");
Maybe also test negated character class?

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
erik.co...@gmail.com  
View profile   Translate to Translated (View Original)
 More options Mar 30 2012, 3:46 am
From: erik.co...@gmail.com
Date: Fri, 30 Mar 2012 07:46:28 +0000
Local: Fri, Mar 30 2012 3:46 am
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
File src/x64/regexp-macro-assembler-x64.cc (right):

http://codereview.chromium.org/9854020/diff/11001/src/x64/regexp-macr...
src/x64/regexp-macro-assembler-x64.cc:602: __ and_(rbx,
Immediate(kTableMask));
On 2012/03/29 15:25:32, Lasse Reichstein wrote:

> I don't think the register renamer will do that since the value of
> current_character() needs to be retained as well. A move will use a
different
> physical register.

My understanding is that after renaming you have gone from 2-register
instructions to 3-register instructions so the mov disappears.

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc
File src/jsregexp.cc (right):

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc#new...
src/jsregexp.cc:1843: ASSERT_LT(new_end_index, end_index);
On 2012/03/29 16:19:17, ulan wrote:

> Also: start_index <= new_end_index && new_start_index <= end_index.

Done.

http://codereview.chromium.org/9854020/diff/15001/src/jsregexp.cc#new...
src/jsregexp.cc:1969: if (range.from() == 0) {
On 2012/03/29 16:19:17, ulan wrote:

> I think we don't need to check for range[i] == 0. GenerateBranches
doesn't count
> the range [minchar ... range_boundaries[0]], so the oddity of ranges
should be
> correct without this check.
> for (int i = 0; i <= last_valid_range; i++) {
>      CharacterRange& range = ranges->at(i);
>      range_boundaries->Add(range.from());
>      range_boundaries->Add(range.to() + 1);
> }

There is an explicit assumption in GenerateBranches that min_char is
strictly less than ranges[start_index].  I made that assumption explicit
with an assert and kept this code, which makes the assumption true.

http://codereview.chromium.org/9854020/diff/15001/src/mips/regexp-mac...
File src/mips/regexp-macro-assembler-mips.h (right):

http://codereview.chromium.org/9854020/diff/15001/src/mips/regexp-mac...
src/mips/regexp-macro-assembler-mips.h:84: virtual void
CheckCharacterInRange(uc16 from,
On 2012/03/29 16:19:17, ulan wrote:

> These functions do not have implementation. If it is intentional
please ignore
> this comment.

It is intentional.  I don't have time to fix this right now, so landing
will break MIPS.

http://codereview.chromium.org/9854020/diff/15001/test/mjsunit/unicod...
File test/mjsunit/unicodelctest-no-optimization.js (right):

http://codereview.chromium.org/9854020/diff/15001/test/mjsunit/unicod...
test/mjsunit/unicodelctest-no-optimization.js:110: var re = new
RegExp("[" + cc + "]");
On 2012/03/29 16:19:17, ulan wrote:

> Maybe also test negated character class?

Done.

http://codereview.chromium.org/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
erik.co...@gmail.com  
View profile  
 More options Mar 31 2012, 4:04 pm
From: erik.co...@gmail.com
Date: Sat, 31 Mar 2012 20:04:04 +0000
Local: Sat, Mar 31 2012 4:04 pm
Subject: Re: RegExp: Add support for table-based character class (issue 9854020)

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
File src/x64/regexp-macro-assembler-x64.cc (right):

https://chromiumcodereview.appspot.com/9854020/diff/11001/src/x64/reg...
src/x64/regexp-macro-assembler-x64.cc:602: __ and_(rbx,
Immediate(kTableMask));
On 2012/03/30 07:46:28, Erik Corry wrote:

> On 2012/03/29 15:25:32, Lasse Reichstein wrote:
> > I don't think the register renamer will do that since the value of
> > current_character() needs to be retained as well. A move will use a
different
> > physical register.
> My understanding is that after renaming you have gone from 2-register
> instructions to 3-register instructions so the mov disappears.

But I tested it and your version is marginally faster.

https://chromiumcodereview.appspot.com/9854020/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »