Bug in v8/src/parsing/scanner?


J Decker

Dec 15, 2017, 10:44:17 AM
to v8-u...@googlegroups.com
The code below comes from ScanString().
I noticed that if the scanner encounters a '\' before a character with c0_ > kMaxAscii, HandleLeadSurrogate() (the 0xd800 + 0xdc00 surrogate handling) is no longer called.

I tried to make a test file containing such characters, but node failed to read a UTF-16 encoded file, even with a BOM.

Also, if there is no backslash, the LS and PS characters (0x2028, 0x2029) are not treated as line terminators and would be stored in the string literal; the first loop only checks '\n' and '\r', and the IsLineTerminator test is only in the second loop.

  while (true) {
    if (c0_ > kMaxAscii) {
      HandleLeadSurrogate();
      break;
    }
    if (c0_ == kEndOfInput || c0_ == '\n' || c0_ == '\r') return Token::ILLEGAL;
    if (c0_ == quote) {
      literal.Complete();
      Advance<false, false>();
      return Token::STRING;
    }
    char c = static_cast<char>(c0_);
    if (c == '\\') break;
    Advance<false, false>();
    AddLiteralChar(c);
  }

  while (c0_ != quote && c0_ != kEndOfInput && !IsLineTerminator(c0_)) {
    uc32 c = c0_;
    Advance();
    if (c == '\\') {
      if (c0_ == kEndOfInput || !ScanEscape<false, false>()) {
        return Token::ILLEGAL;
      }
    } else {
      AddLiteralChar(c);
    }
  }
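
For readers unfamiliar with the surrogate handling mentioned above: a UTF-16 lead surrogate (0xD800..0xDBFF) followed by a trail surrogate (0xDC00..0xDFFF) encodes one code point above 0xFFFF. The following is only a minimal standalone sketch of that standard arithmetic. The helper names are made up for illustration, and this is not a copy of V8's HandleLeadSurrogate().

```cpp
#include <cstdint>

// Hypothetical helpers (not V8 code): standard UTF-16 surrogate ranges.
inline bool IsLeadSurrogateUnit(uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool IsTrailSurrogateUnit(uint32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combines a lead/trail pair into a single supplementary code point.
// If the two units do not form a valid pair, the first unit is returned
// unchanged (an assumption made for this sketch; a real scanner may differ).
inline uint32_t CombineSurrogates(uint32_t lead, uint32_t trail) {
  if (!IsLeadSurrogateUnit(lead) || !IsTrailSurrogateUnit(trail)) return lead;
  return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
}
```

For example, the pair 0xD83D, 0xDE00 combines to 0x1F600, and the smallest pair 0xD800, 0xDC00 combines to 0x10000.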


Mathias Bynens

Dec 16, 2017, 5:36:04 PM
to v8-users, Marja Hölttä
On Fri, Dec 15, 2017 at 4:44 PM, J Decker <d3c...@gmail.com> wrote:
> The code below comes from ScanString().
> I noticed that if the scanner encounters a '\' before a character with c0_ > kMaxAscii, HandleLeadSurrogate() (the 0xd800 + 0xdc00 surrogate handling) is no longer called.
>
> I tried to make a test file containing such characters, but node failed to read a UTF-16 encoded file, even with a BOM.

I’m not sure what you mean exactly, but it sounds like you could use `eval()` to test this more directly.
 
> Also, if there is no backslash, the LS and PS characters (0x2028, 0x2029) are not treated as line terminators and would be stored in the string literal (IsLineTerminator test).

This doesn’t seem to be the case:

$ rlwrap v8
V8 version 6.5.20
d8> eval('"a\u2028b"')
undefined:1: SyntaxError: Invalid or unexpected token
"a
^^
SyntaxError: Invalid or unexpected token
    at (d8):1:1



J Decker

Dec 16, 2017, 9:47:14 PM
to v8-u...@googlegroups.com

I see; 0x2028 ends up being > kMaxAscii, which I missed, so the first loop breaks out and the second loop's IsLineTerminator() check catches it. In the other case, Advance() ends up doing the lead-surrogate handling internally. Got it; thanks.
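
To make the control flow concrete, here is a small standalone model of the two loops in the ScanString() excerpt. It is a sketch, not V8 code: every name is hypothetical, kMaxAscii is assumed to be 127, escape handling is reduced to "keep the next character", and the final check after the second loop is an assumption, since the excerpt does not show it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the scanner, NOT V8 code. All names are hypothetical.
enum class Result { kString, kIllegal };

constexpr char32_t kMaxAsciiModel = 127;  // assumed value of kMaxAscii

inline bool IsLineTerminatorModel(char32_t c) {
  return c == U'\n' || c == U'\r' || c == 0x2028 || c == 0x2029;
}

// Scans a string literal; src[0] must be the opening quote.
inline Result ScanStringModel(const std::u32string& src, std::u32string* out) {
  assert(!src.empty());
  const char32_t quote = src[0];
  std::size_t i = 1;

  // Fast path: ASCII only. A non-ASCII character or a backslash leaves the
  // loop WITHOUT consuming the character, so the slow path sees it again.
  while (true) {
    if (i >= src.size()) return Result::kIllegal;
    const char32_t c = src[i];
    if (c > kMaxAsciiModel) break;
    if (c == U'\n' || c == U'\r') return Result::kIllegal;
    if (c == quote) return Result::kString;
    if (c == U'\\') break;
    out->push_back(c);
    ++i;
  }

  // Slow path: full line-terminator check, including 0x2028 and 0x2029.
  while (i < src.size() && src[i] != quote && !IsLineTerminatorModel(src[i])) {
    const char32_t c = src[i++];
    if (c == U'\\') {
      if (i >= src.size()) return Result::kIllegal;
      out->push_back(src[i++]);  // simplified escape: keep the next character
    } else {
      out->push_back(c);
    }
  }

  // Assumed epilogue: anything other than the closing quote is an error.
  return (i < src.size() && src[i] == quote) ? Result::kString : Result::kIllegal;
}
```

In this model, the input `"a<0x2028>b"` leaves the fast path at 0x2028 because it is above 127, and the slow path then stops on it as a line terminator, so the result is kIllegal. That matches the SyntaxError in the d8 session above. Plain ASCII input such as `"abc"` never leaves the fast path.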