Hi there,
Thanks for your reply.
Short answer: no.
Neil deliberately chose a simple RE engine, to keep the code and binary small (and to avoid
needing an upgrade each time the chosen library changes...).
As said, plugging in a better engine is possible.
> Is there any interest in this as well as search backwards ("Up" option) using regex and
> recursive patterns?
That's funny, I realize I nearly never use backward search, even for simple strings. I
used to use it in the past, but I rarely do today.
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
> I was wondering if there is any interest in improving the regular expression
> support in Scintilla. Currently the syntax supported is very basic indeed.
I believe including regular expression support inside Scintilla was
a mistake. Applications that use Scintilla and expose regular
expression functionality should aim to incorporate another regex
library. A Perl environment is likely to want to use Perl's built-in
regular expressions rather than another library which may be largely
compatible but differs in details. A Lua environment may wish to allow
a choice between Lua patterns and a more standard library.
Any additional regex functionality in Scintilla should minimize the
costs. There should not be any need to download an external library,
since this makes it harder to build Scintilla. Copying a library like
boost::regex into the Scintilla source tree makes the code base larger
and could require Scintilla releases whenever upstream fixes a
significant or security bug.
If each of the 3 platforms included a compatible regex library then
it may be beneficial to use this from Scintilla but the current
situation is that they don't. tr1::regex support appears to be
unfinished in libstdc++ which seems to be moving towards C++0x
std::regex instead. OS X is probably worse, a search just produced the
old BSD libc regex(3) man page.
http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1
As Simon mentions, some regular expression libraries only work well
with continuous buffers and Scintilla uses a split buffer. If
Scintilla were to use such a library, the cost of joining the two
buffers whenever a search was to be performed could cause some
operations to be unreasonably slow.
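
To make the cost concrete, here is a minimal sketch of the problem, not Scintilla's actual code. The name SplitText and its two-string layout are assumptions for illustration only; a real gap buffer keeps one allocation with a movable gap. The point is that per-character access needs no copy, while an engine that demands contiguous memory forces a full copy of the document on every search.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a split (gap) buffer: the text is held as two
// separate pieces, with the gap sitting logically between them.
struct SplitText {
    std::string before;  // text before the gap
    std::string after;   // text after the gap

    std::size_t size() const { return before.size() + after.size(); }

    // Character access without joining: pick the piece by position.
    char at(std::size_t pos) const {
        assert(pos < size());
        return pos < before.size() ? before[pos] : after[pos - before.size()];
    }

    // What a contiguous-only regex engine would require before each search:
    // a copy whose cost grows with the size of the whole document.
    std::string joined() const { return before + after; }
};
```

Calling joined() once per search is harmless on small files, but repeating it for every step of a "replace all" over a large document is the kind of operation that could become unreasonably slow.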
Neil
Hi Ben,
Thanks, backwards regex is not something I've put a lot of effort into - like Philippe it's not
something I generally seem to do! That being said, any help/contribution is always welcome!
Could you give me some examples of where recursive regexes are useful? I wonder also whether Boost::Regex supports them?
If so and the main regex library now supports named captures (my main reason for choosing Xpressive) then it might be worth switching PN from Xpressive to Regex.
On Sat, 12 Mar 2011, Neil Hodgson wrote:
> Ben Hanson:
>
>> I was wondering if there is any interest in improving the regular expression
>> support in Scintilla. Currently the syntax supported is very basic indeed.
>
> I believe including regular expression support inside Scintilla was
> a mistake. Applications that use Scintilla and expose regular
> expression functionality should aim to incorporate another regex
> library. A Perl environment is likely to want to use Perl's built-in
> regular expressions rather than another library which may be largely
> compatible but differs in details. A Lua environment may wish to allow
> a choice between Lua patterns and a more standard library.
I beg to differ. I love having the basic regexp support for simple
scripting purposes. It is so much easier to have that than to use an
external regex lib, something that would bloat my application.
Thank you for including it.
mitchell
>
> Any additional regex functionality in Scintilla should minimize the
> costs. There should not be any need to download an external library,
> since this makes it harder to build Scintilla. Copying a library like
> boost::regex into the Scintilla source tree makes the code base larger
> and could require Scintilla releases whenever upstream fixes a
> significant or security bug.
>
> If each of the 3 platforms included a compatible regex library then
> it may be beneficial to use this from Scintilla but the current
> situation is that they don't. tr1::regex support appears to be
> unfinished in libstdc++ which seems to be moving towards C++0x
> std::regex instead. OS X is probably worse, a search just produced the
> old BSD libc regex(3) man page.
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1
>
> As Simon mentions, some regular expression libraries only work well
> with continuous buffers and Scintilla uses a split buffer. If
> Scintilla were to use such a library, the cost of joining the two
> buffers whenever a search was to be performed could cause some
> operations to be unreasonably slow.
>
> Neil
>
On Sat, 12 Mar 2011, Ben Hanson wrote:
> On Friday, March 11, 2011 11:54:41 PM UTC, Neil Hodgson wrote:
> Ben Hanson:
>
> > I was wondering if there is any interest in improving the regular expression
> > support in Scintilla. Currently the syntax supported is very basic indeed.
>
> I believe including regular expression support inside Scintilla was
> a mistake. Applications that use Scintilla and expose regular
> expression functionality should aim to incorporate another regex
> library. A Perl environment is likely to want to use Perl's built-in
> regular expressions rather than another library which may be largely
> compatible but differs in details. A Lua environment may wish to allow
> a choice between Lua patterns and a more standard library.
>
> I sympathise with this point of view. The problem (as I see it) is that the default engine gets used because it is available
> (Notepad++ for example) and then no more attention is paid to it. I would be very interested to see an editor support Lua pattern
> matching! :-)
Textadept[1] uses Lua patterns[2] instead of regex
[1]: http://caladbolg.net/textadept
[2]: http://caladbolg.net/luadoc/textadept/manual/6_AdeptEditing.html#find_and_replace
mitchell
>
> Any additional regex functionality in Scintilla should minimize the
> costs. There should not be any need to download an external library,
> since this makes it harder to build Scintilla. Copying a library like
> boost::regex into the Scintilla source tree makes the code base larger
> and could require Scintilla releases whenever upstream fixes a
> significant or security bug.
>
> The fact that other regex engines can be plugged in makes replacing the default one a lot less interesting, I agree.
>
> If each of the 3 platforms included a compatible regex library then
> it may be beneficial to use this from Scintilla but the current
> situation is that they don't. tr1::regex support appears to be
> unfinished in libstdc++ which seems to be moving towards C++0x
> std::regex instead. OS X is probably worse, a search just produced the
> old BSD libc regex(3) man page.
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1
>
> C++0x is what interests me the most. I have recently ditched support for VC++ 6 in lexertl, and I'm not looking back...
>
> As Simon mentions, some regular expression libraries only work well
> with continuous buffers and Scintilla uses a split buffer. If
> Scintilla were to use such a library, the cost of joining the two
> buffers whenever a search was to be performed could cause some
> operations to be unreasonably slow.
>
> The easiest (and modern) solution to dealing with split buffers is to code an iterator that is aware of Scintilla's internal
> structure and use that with whatever regex engine you like. Of course that implies a regex engine that copes with iterators
> correctly.
>
> Regards,
>
> Ben
>
I agree. In SciTE, the current regex support covers 99% of my needs for text editing.
And it is getting better with small, useful contributions that don't bloat it.
For heavy-duty text processing, I believe that an external, specialized program will be
more efficient (gap-buffer management can be weak when many small automated changes are
made throughout a big file, but that isn't its purpose either), more flexible and more
powerful.
From sed to awk to a script in your favorite language...
Now, I understand that Neil might be tired of hearing the same complaints. In hindsight,
perhaps it would have been a better idea to just ship a good API for tightly integrating a
regex engine that supports iterators (because of the gap), and to provide the current library
as an optional example (so we can get it quickly in SciTE or other small projects using
Scintilla) while letting other projects use their favorite engine. (Actually, it might
even be in this state already, but well, it is compiled and integrated into Scintilla by
default.)
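
A rough sketch of what "a regex engine supporting iterators" buys you, under stated assumptions: SplitText, SplitIterator and first_match are hypothetical names invented for this example, not Scintilla classes, and std::regex stands in for whichever iterator-friendly engine a project prefers. The iterator hides the gap, so the engine walks both pieces in place and nothing is joined or copied before the search.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical two-piece text store standing in for a gap buffer.
struct SplitText {
    std::string before;  // text before the gap
    std::string after;   // text after the gap
    std::size_t size() const { return before.size() + after.size(); }
    const char& at(std::size_t pos) const {
        assert(pos < size());
        return pos < before.size() ? before[pos] : after[pos - before.size()];
    }
};

// Bidirectional iterator that steps across the split without copying.
class SplitIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    SplitIterator() = default;
    SplitIterator(const SplitText* text, std::size_t pos) : text_(text), pos_(pos) {}

    reference operator*() const { return text_->at(pos_); }
    SplitIterator& operator++() { ++pos_; return *this; }
    SplitIterator operator++(int) { SplitIterator old = *this; ++pos_; return old; }
    SplitIterator& operator--() { --pos_; return *this; }
    SplitIterator operator--(int) { SplitIterator old = *this; --pos_; return old; }
    bool operator==(const SplitIterator& o) const { return text_ == o.text_ && pos_ == o.pos_; }
    bool operator!=(const SplitIterator& o) const { return !(*this == o); }

private:
    const SplitText* text_ = nullptr;
    std::size_t pos_ = 0;
};

// Returns the first match of pattern in text, or an empty string if none.
std::string first_match(const SplitText& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::match_results<SplitIterator> m;
    const SplitIterator first(&text, 0);
    const SplitIterator last(&text, text.size());
    if (!std::regex_search(first, last, m, re)) return std::string();
    return m.str();  // only the matched range is copied into a string
}
```

For example, with SplitText{"foo ba", "r42 baz"} the pattern "bar[0-9]+" matches across the split. Whether a particular engine copes with such iterators correctly, and how fast it is when each dereference has to choose a piece, depends on the engine.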
BTW, I recently discovered that Google released two regex libraries: one lightweight, with
limitations but fast, and another with more features. PCRE is powerful, but with version 7
it starts looking like a parsing library, something I would rather defer to a
full PEG...