RegEx problem solved in VS2017

thomas_li...@hotmail.com

Apr 6, 2018, 8:37:29 AM
to scintilla-interest
Hi.

Document.cxx contains this comment:

template<typename Iterator, typename Regex>
bool MatchOnLines(const Document *doc, const Regex &regexp, const RESearchRange &resr, RESearch &search) {
    bool matched = false;
    std::match_results<Iterator> match;

    // MSVC and libc++ have problems with ^ and $ matching line ends inside a range
    // If they didn't then the line by line iteration could be removed for the forwards
    // case and replaced with the following 4 lines:
    // Iterator uiStart(doc, startPos);
    // Iterator uiEnd(doc, endPos);
    // flagsMatch = MatchFlags(doc, startPos, endPos);
    // matched = std::regex_search(uiStart, uiEnd, match, regexp, flagsMatch);

    // Line by line.
    for (Sci::Line line = resr.lineRangeStart; line != resr.lineRangeBreak; line += resr.increment) {

But I believe this problem has been solved with Visual Studio 2017.

However, the 4 lines in the comment do not seem to be up to date with the current code.

Regards Thomas

thomas_li...@hotmail.com

Apr 6, 2018, 8:57:08 AM
to scintilla-interest
This code seems to work:

template<typename Iterator, typename Regex>
bool MatchOnLines(const Document *doc, const Regex &regexp, const RESearchRange &resr, RESearch &search) {
    std::match_results<Iterator> match;
    Iterator uiStart(doc, resr.startPos);
    Iterator uiEnd(doc, resr.endPos);
    std::regex_constants::match_flag_type flagsMatch = MatchFlags(doc, resr.startPos, resr.endPos);
    bool matched = std::regex_search(uiStart, uiEnd, match, regexp, flagsMatch);
    if (matched) {
        for (size_t co = 0; co < match.size(); co++) {
            search.bopat[co] = match[co].first.Pos();
            search.eopat[co] = match[co].second.PosRoundUp();
            Sci::Position lenMatch = search.eopat[co] - search.bopat[co];
            search.pat[co].resize(lenMatch);
            for (Sci::Position iPos = 0; iPos < lenMatch; iPos++) {
                search.pat[co][iPos] = doc->CharAt(iPos + search.bopat[co]);
            }
        }
    }
    return matched;
}

Neil Hodgson

Apr 6, 2018, 7:04:37 PM
to Scintilla mailing list
Hi Thomas,

> This code seems to work:
> …

That’s certainly a simplification and works well in many cases.

However, it doesn’t work with the <regex> implementation in the standard library for GCC 7.3 as provided by current Fedora Linux 27.

With MSVC, it has problems with files containing CRLF line ends. Search the file “a\r\n” for “$” and it returns 2: between the CR and LF. That, by itself, could be worked around by nudging the match out of the composite line end. But worse, searching “a\r\n” for “a$” doesn’t match.
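The "nudge" mentioned above might look something like the following minimal sketch. It works on a plain std::string rather than a Scintilla Document, and NudgeOutOfCRLF is a hypothetical name used only for illustration. It only repositions a match that lands between the CR and LF; it does nothing for the "a$" case that fails to match at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Scintilla: if pos lies between the '\r'
// and '\n' of a CRLF pair, move it back to just before the '\r'.
// Any other position is returned unchanged.
std::size_t NudgeOutOfCRLF(const std::string &text, std::size_t pos) {
    assert(pos <= text.size());
    if (pos > 0 && pos < text.size() && text[pos - 1] == '\r' && text[pos] == '\n') {
        return pos - 1;
    }
    return pos;
}
```

For the file “a\r\n”, a reported position of 2 would be moved to 1, the end of the visible line.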

It’s possible that the UTF8Iterator class could be enhanced to report “\r\n” as a single “\n” but that appears complex.

Neil

thomas_li...@hotmail.com

Apr 9, 2018, 7:10:40 AM
to scintilla-interest

> However, it doesn’t work with the <regex> implementation in the standard library for GCC 7.3 as provided by current Fedora Linux 27.
>
> With MSVC, it has problems with files containing CRLF line ends. Search the file “a\r\n” for “$” and it returns 2: between the CR and LF. That, by itself, could be worked around by nudging the match out of the composite line end. But worse, searching “a\r\n” for “a$” doesn’t match.
>
> It’s possible that the UTF8Iterator class could be enhanced to report “\r\n” as a single “\n” but that appears complex.
>
> Neil

So problems still exist. The comment just describes the wrong problem.

Thank you. 

Neil Hodgson

Apr 9, 2018, 6:11:38 PM
to Scintilla mailing list
thomas_linder_puls:

> So problems still exist. The comment just describes a wrong problem.

The comment is quite generic so still appears correct to me.
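As an illustration of why line-by-line iteration sidesteps the line end handling of a given <regex> implementation, here is a rough sketch using std::string instead of Scintilla's Document and iterator classes. FindLineByLine is a hypothetical stand-in, not the real MatchOnLines, and it assumes lines end in “\n” or “\r\n” only. Each line is handed to std::regex_search without its line end, so “$” only ever has to match at the end of the searched range.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for line-by-line searching; not Scintilla's code.
// Returns the offset in text of the first match, or std::string::npos.
std::size_t FindLineByLine(const std::string &text, const std::regex &re) {
    std::size_t lineStart = 0;
    while (lineStart <= text.size()) {
        // Locate the end of this line and the start of the next one.
        std::size_t lineEnd = text.find('\n', lineStart);
        const std::size_t next =
            (lineEnd == std::string::npos) ? text.size() + 1 : lineEnd + 1;
        if (lineEnd == std::string::npos) {
            lineEnd = text.size();
        }
        // Exclude the '\r' of a CRLF pair from the searched range.
        if (lineEnd > lineStart && text[lineEnd - 1] == '\r') {
            lineEnd--;
        }
        assert(lineEnd >= lineStart);

        std::smatch match;
        if (std::regex_search(text.begin() + lineStart, text.begin() + lineEnd,
                              match, re)) {
            // position(0) is relative to the start of the searched range.
            return lineStart + static_cast<std::size_t>(match.position(0));
        }
        lineStart = next;
    }
    return std::string::npos;
}
```

With this approach, searching “a\r\n” for “a$” finds the match at offset 0, because the regex only sees “a”. The cost is that a pattern can never match across a line end.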

With good open-source regex libraries available in Boost.Regex and PCRE, I had hoped that C++ <regex> implementations would rapidly improve, but this doesn’t appear to be a focus for vendors yet.

Applications that want better regex support should consider copying the approach of Notepad++ which grafts Boost.Regex into Scintilla.

Neil
