this is the first mail where I'd like to go through the remaining features
for Qt 5.0.
I know that this bug is blocked by not having PCRE in 3rdparty/. But how
is the status apart from this? Thiago, simply tell me if you feel
comfortable with the code and I'll do final review and approval. If it
doesn't break BC and is standalone from other code in qtcore, it can go
into master, otherwise I'd prefer it to go into api_changes.
There's another related issue to this bug: What do we do with the old
QRegExp? I've gripped through our code and removing it is a larger
surgical operation.
So I'd propose that we now bite the bullet and leave it in QtCore. Let's
simply mark it as deprecated since 5.0, and live with the additional 100k
in QtCore.
Cheers,
Lars
_______________________________________________
Development mailing list
Devel...@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
The "final" reviews (API, code, tests, docs) are missing, as well as
the big rename to QRegex (or whatever).
I think the code is more or less OK, apart the usual cleanups, but I
defer to your judgement :)
> Thiago, simply tell me if you feel
> comfortable with the code and I'll do final review and approval. If it
> doesn't break BC and is standalone from other code in qtcore, it can go
> into master, otherwise I'd prefer it to go into api_changes.
It doesn't break BC or SC in any classes, it's a pure API addition.
The proposed rename to QRegex of course will break stuff, but as of
now noone is using those methods because they're not available -- the
patches are not merged ;-)
Or did you mean something else?
The bigger issue was related to (re)moving QRegExp: make
QString::indexOf/replace/etc. templated (on the regexp class). I can't
estimate the breakage that would do.
> There's another related issue to this bug: What do we do with the old
> QRegExp? I've gripped through our code and removing it is a larger
> surgical operation.
There are certain places where QRegExp in which the maintainer doesn't
want to depend on PCRE; for instance, qmake. In most of other places,
the replacement is simply a s/QRegExp/QRegularExpression/.
The only points where the API isn't a replecement is where the other
non-Perl QRegExp matchings are used (f.i. wildcard matching; but I saw
that QRegExp has a nice wc2rx internal method to convert wildcards to
a Perl-like regex, that could potentially even be exposed as part of
QRegularExpression API, and that can help the conversion...).
> So I'd propose that we now bite the bullet and leave it in QtCore. Let's
> simply mark it as deprecated since 5.0, and live with the additional 100k
> in QtCore.
Can it eventually become private in 5.0, accessible only through a
public interface in the "qt4support" repository (or whatever is called
the place where QHttp, QFtp etc. will go), and then get (re)moved from
QtCore later without anyone noticing?
Cheers,
--
Giuseppe D'Angelo
That's a missing functionality we'll need to add to QRegularExpression.
QRegExp is today used for globbing in QDir, QDirIterator and QSslSocket.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
I'm comfortable with the code as-is. Peppe has already submitted a couple of
changes on top, which fix issues he's already found. Arguably, they could be
squashed together, but I don't think that matters.
The one remaining issue is the use of mutex-protected data. But I think we've
got the proper fix in already. We'll only know more once people start using it.
> There's another related issue to this bug: What do we do with the old
> QRegExp? I've gripped through our code and removing it is a larger
> surgical operation.
Indeed. I really have no clue how hard this would be.
The uses of QRegExp in qtbase and my proposals would be:
- qmake: lots of use, keep it by copying qregexp.cpp into qmake
- rcc: one single use, very easy to rewrite without regexes
- uic: #includes are unnecessary; remove immediately
- QtCore API:
* QString non-const QRegExp&: use templates to support QRegExp in inline
code (find a way), or simply stop supporting this
* QMetaType & QVariant: let RegExp point to the new class
* everywhere else (including QStringList): replace with the new class
- QtCore internal uses: use the new class or rewrite (if possible, I
recommend a non-regex globbing in QDir and QDirIterator). If the class is
used in bootstrap, then the function in question must be opted out or the
code rewritten, like qdatetime.cpp and this comment in qxmlutils.cpp:
/* Right, we here have a dependency on QRegExp. Writing a manual parser to
* replace that regexp is probably a 70 lines so I prioritize this to when
* the dependency is considered alarming, or when the rest of the bugs
* are fixed. */
- QtXml: used in bootstrap, so must rewrite without regex. Or refactor rcc to
use QXmlStreamReader instead of QDom, then remove
- QtGui API:
* QTextDocument: replace with the new class
* QRegExpValidator: move alongside QRegExp, but add a new class to QtGui
- QtNetwork API:
* QSslSocket (globbing functions for certificates): replace with new enums
- QtScript API: replace with new class (same goes for QJSEngine)
Maybe since QtScript is done, we leave it as is. But QJSEngine should use
the new class.
- document gallery: I don't know
Everywhere else it's either internal use or doesn't have QRegExp in the API.
Then we move QRegExp and QRegExpValidator to a new library, like QtConcurrent.
We also modify QRegExp to add:
QRegExp(const QRegularExpression &); // implicit
operator QRegularExpression() const;
> So I'd propose that we now bite the bullet and leave it in QtCore. Let's
> simply mark it as deprecated since 5.0, and live with the additional 100k
> in QtCore.
That's another option, unless someone manages to pull off what I said above.
I'd still recommend we clean up the API more or less along the lines of what I
said above.
Since I understand that the code would still be available in a separate
library anyway, then there is no need to copy the file, qmake pulls in the
files it needs directly from the current location so it could just
continue to do the same without having to copy the qregexp.cpp
specifically.
Andy
I think my typing didn't follow the speed of my thoughts. So let me elaborate
a bit:
Step 1: qregexp in QtCore, where it is today
We do the cleanups of the APIs as I suggested
Step 2: qregexp.cpp moves to a separate lib, still inside qtbase
We verify the non-qtbase modules that require QRegExp and we decide whether
they'll use QRegExp or they'll move to PCRE. One example of not moving is
QtScript.
Step 3: qregexp.cpp moves out of qtbase
We need to keep a copy inside qmake
In short, the biggest deal is how to properly replace QString methods;
since everywhere else (QVariant, QObject, QStringList,
QSortFilterProxyModel) you want it to be replaced by the new class, or
a new class to be added.
I'm evaluating these approaches:
1) add the operator QRegularExpression() const to QRegExp and remove
the QString overloads taking a QRegExp:
Drawbacks:
* possible subtle bugs introduced by the different QRegExp and
QRegularExpression behaviours/quirks/syntax support/etc.
* decide what to do with the overloads taking a non-const-ref QRegExp
(used to extract captures, matched length, etc.)
* ... fix QRegExp. Right now it is bugged, and things like
QRegExp re("...");
/* do something like string.indexOf(re),
which does not actually *use* re,
but it just converts it to QRegularExpression */
re.matchedLength();
will crash (I have a patch that I think fixes it, but I don't know
what it *exactly* does... who's the QRegExp maintainer?)
2) Go with something like
template <typename RE> inline QT_DEPRECATED QString &replace(const
RE &rx, const QString &after, typename RE::isQRegExp * = 0)
{
// inline implementation, or call a method exported by the
QRegExp library
}
Pro
* Should be 100% compatible
Drawbacks
* str.replace(re, "foo") doesn't compile any more -- it's not found
during template lookup, since no overload is found taking (QRegExp,
const char[3]). That is, more template black magic is required for
each and every parameter but the first, to enable implicit
conversions. Help! :)
Comments?
--
Giuseppe D'Angelo
This would be my preferred way, but if a slight behaviour change exists, then
these two operations would produce different results:
int idx = string.indexOf(rx);
int idx = rx.indexIn(string);
The whole reason why we kept QRegExp in the first place was so that we wouldn't
introduce those subtle issues (and for qmake). So, aside from qmake, if we
managed to convert the QRegExp expressions to PCRE, we could also get rid of
the QRegExp engine altogether.
In which case we could even keep QRegExp as a deprecated class for these
methods:
> * decide what to do with the overloads taking a non-const-ref QRegExp
> (used to extract captures, matched length, etc.)
> * ... fix QRegExp. Right now it is bugged, and things like
>
> QRegExp re("...");
> /* do something like string.indexOf(re),
> which does not actually *use* re,
> but it just converts it to QRegularExpression */
> re.matchedLength();
Just so people understand, this crashes:
QRegExp re;
re.matchedLength();
> will crash (I have a patch that I think fixes it, but I don't know
> what it *exactly* does... who's the QRegExp maintainer?)
I guess that's now you.
> 2) Go with something like
>
> template <typename RE> inline QT_DEPRECATED QString &replace(const
> RE &rx, const QString &after, typename RE::isQRegExp * = 0)
> {
> // inline implementation, or call a method exported by the
> QRegExp library
> }
>
> Pro
> * Should be 100% compatible
> Drawbacks
> * str.replace(re, "foo") doesn't compile any more -- it's not found
> during template lookup, since no overload is found taking (QRegExp,
> const char[3]). That is, more template black magic is required for
> each and every parameter but the first, to enable implicit
> conversions. Help! :)
>
> Comments?
How difficult is it to convert the QRegExp expressions to PCRE?
I can't figure out an example with indexIn, but f.i. with replace:
QString str("aaa");
str.replace(QRegExp("a*(a)*"), "<\\1>")
// str is now "<a><>", instead of the expected "<><>"
This is because of the default pattern syntax (RegExp) which exhibits
this totally odd behaviour (why did it capture only ONE a? It's even
documented that it should have captured all three...). The RegExp2 and
QRegularExpression do "the right thing". By the way, this is also why
the docs say that QRegExp will switch to RegExp2 as default syntax in
Qt 5 (something not already done, actually).
This is more or less my #1 concern of the conversion, because it's
about the engine itself and not the patterns.
From a pure syntax point of view:
- RegExp and RegExp2: they seem to follow the general rule that an
escaped character that doesn't have any special meaning (for QRegExp)
stands for the character itself. That means that f.i. the pattern /\z/
matches (in QRegExp) a literal "z". But that, inside a
QRegularExpression, matches the very end of the string. So the pattern
needs to be analyzed and these escapes fixed.
- Wildcard and WildcardUnix: QRegExp already converts them internally
in Perl-like regular expressions (cf. qt_regexp_toCanonical and wc2rx
in qregexp.cpp), so it should just be a matter of doing the same.
- FixedString: simply needs to be escaped through QRegularExpression::escape.
- W3CXmlSchema11: I don't know yet. From a quick glance over the spec:
* new sequences like \c, \i, \p and their uppercase variants are added;
* it supports subtraction inside character classes, f.i.
[a-z-[aeiou]]. It can be nested.
- Case sensitive: set the right option
- Minimal matching: should be doable by setting the inverted greediness option
> The whole reason why we kept QRegExp in the first place was so that we wouldn't
> introduce those subtle issues (and for qmake). So, aside from qmake, if we
> managed to convert the QRegExp expressions to PCRE, we could also get rid of
> the QRegExp engine altogether.
I'll try to write down a converter and see the results.
>
> In which case we could even keep QRegExp as a deprecated class for these
> methods:
>
>> * decide what to do with the overloads taking a non-const-ref QRegExp
>> (used to extract captures, matched length, etc.)
What about this?
>> what it *exactly* does... who's the QRegExp maintainer?)
>
> I guess that's now you.
Uh oh :-)
>> Comments?
>
> How difficult is it to convert the QRegExp expressions to PCRE?
I tried to elaborate about this before.
The PCRE import landed in master 🙌 :)
I just rebased and repushed all of my QRE commits on api_changes, just
to stay a little bit on the safe side since they're going to touch
QString; you can find them from # 18659 to 18672 (sadly, gerrit
created new changes instead of just "moving" the old ones).
All but 18672 are the old ones just rebased, that instead is new and
adds the (missing) error strings handling.
NB: Since PCRE landed in *master*, there's the need of a merge before
staging these ones, but I didn't do it just yet (I don't even think I
can push merges).
Cheers,
--
Giuseppe D'Angelo
Lars has promised to do that this weekend.
Actually, it would be so much easier for everyone involved if you just
submitted the ones in master, except for the QString methods.
I'm going to do that now.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
>On sábado, 3 de março de 2012 07.13.54, Thiago Macieira wrote:
>> On sábado, 3 de março de 2012 05.17.37, Giuseppe D'Angelo wrote:
>> > NB: Since PCRE landed in *master*, there's the need of a merge before
>> > staging these ones, but I didn't do it just yet (I don't even think I
>> > can push merges).
>>
>> Lars has promised to do that this weekend.
>
>Actually, it would be so much easier for everyone involved if you just
>submitted the ones in master, except for the QString methods.
>
>I'm going to do that now.
Yes, it's fine to submit the new classes to master, as they can't possibly
break existing code.
Cheers,
Lars
Ok, I'll rebase/push on master and leave QString changes out
(targeting api_changes instead).
Is it fine with you if I do it right now?
--
Giuseppe D'Angelo
All done :)
Now I have to re-review everything because you rebased.
Please stop rebasing. Let the first 4 changes be reviewed and try to go in.
Only if the CI system complains should you make changes now.
Lars: it looks good on a cursory glance, pretty much what had been discussed
for months. I'm pretty tired now having just arrived in SF.
Yes, I already stopped rebasing -- I had do it once just to fix merge conflicts.
I'm sorry about the mess, I didn't think that changing branch caused
new changes to be created on gerrit.
2012/3/4 Thiago Macieira <thiago....@intel.com>:
> Please stop rebasing. Let the first 4 changes be reviewed and try to go in.
The QRegularExpression classes have finally landed in master :-)
Some updates on this. After yesterday's discussion it was decided to
keep QRegExp in QtCore as a deprecated class. At the same time all
QRegExp-related classes or overloads get deprecated as well (although
probably not disabled with QT_DEPRECATED_SINCE because very few of
them are actually inline).
The main reason for keeping QRegExp is the possible silent breakage
due to attempting to "translate" regexps or any other behavioural
changes; especially if someone tried to outsmart QRegExp. So unless
something changes again QRegExp is going away in Qt 6.
>> There's another related issue to this bug: What do we do with the old
>> QRegExp? I've gripped through our code and removing it is a larger
>> surgical operation.
>
> Indeed. I really have no clue how hard this would be.
>
> The uses of QRegExp in qtbase and my proposals would be:
>
> - qmake: lots of use, keep it by copying qregexp.cpp into qmake
Not needed anymore since qregexp stays there
> - rcc: one single use, very easy to rewrite without regexes
Not needed anymore since qregexp stays there
> - uic: #includes are unnecessary; remove immediately
Done
> - QtCore API:
> * QString non-const QRegExp&: use templates to support QRegExp in inline
> code (find a way), or simply stop supporting this
Not needed anymore since qregexp stays there
> * QMetaType & QVariant: let RegExp point to the new class
Done (+2 but not merged yet). I added new methods instead:
http://codereview.qt-project.org/19514
> * everywhere else (including QStringList): replace with the new class
Replace or just add overloads? For now I've just started to add overloads:
- QString (+2, not merged yet) http://codereview.qt-project.org/18666
- QObject: http://codereview.qt-project.org/19723
- QStringList: http://codereview.qt-project.org/19765
- QSortFilterProxyModel: it has *four* QRegExp-like setters (!), must
find a way to keep the API sane.
> - QtCore internal uses: use the new class or rewrite (if possible, I
> recommend a non-regex globbing in QDir and QDirIterator).
This is (apparently) opening a Pandora's box, since both
QRegExp::Wildcard and WildcardUnix have very strange behaviours w.r.t.
to escaping, and I'm not too willing to touch them being so critically
tied to pretty much everything.
But this could lead to a discussion about the behaviour of an eventual
QRegularExpression::fromWildcard.
> If the class is
> used in bootstrap, then the function in question must be opted out or the
> code rewritten, like qdatetime.cpp and this comment in qxmlutils.cpp:
> /* Right, we here have a dependency on QRegExp. Writing a manual parser to
> * replace that regexp is probably a 70 lines so I prioritize this to when
> * the dependency is considered alarming, or when the rest of the bugs
> * are fixed. */
> - QtXml: used in bootstrap, so must rewrite without regex.
Keep QtXml as-is?
> Or refactor rcc to
> use QXmlStreamReader instead of QDom, then remove
Done
> - QtGui API:
> * QTextDocument: replace with the new class
Will probably need help with this since I don't know anything about it.
> * QRegExpValidator: move alongside QRegExp, but add a new class to QtGui
WIP (QRegularExpressionValidator).
> - QtNetwork API:
> * QSslSocket (globbing functions for certificates): replace with new enums
WIP; this will be a SIC
> - QtScript API: replace with new class (same goes for QJSEngine)
> Maybe since QtScript is done, we leave it as is. But QJSEngine should use
> the new class.
It's unlikely that I'll touch any of these classes, but they can
switch anytime now...
> We also modify QRegExp to add:
>
> QRegExp(const QRegularExpression &); // implicit
> operator QRegularExpression() const;
At this point I might just work to a toRegularExpression() method for
the QRegExp->QRegularExpression conversion -- I just need to know what
I'm supposed to do with the XSD regexp syntax, i.e. if there's any
further interest in keeping it supported or not (since apparently it
has never worked properly, see my other mail).
Opinions? Comments?
Cheers,
--
Giuseppe D'Angelo
'Cause I can't find any discussion about it (or disclosure of the results of off-list discussions) on the mailing list, since before the feature freeze (when Thiago Macieira suggested to postpone further discussion unit after the freeze).
Regards,
Davet Jacques
No.
> 'Cause I can't find any discussion about it (or disclosure of the results of off-list discussions) on the mailing list, since before the feature freeze (when Thiago Macieira suggested to postpone further discussion unit after the freeze).
After the proper, full API review. So if you want to start it... any
feedback is welcome :-)
--
Giuseppe D'Angelo
I like the name.
And I especially DO NOT like QRegex because it's too close to the existing
class and might lead to confusion.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Yes, I second that.
(That doesn't mean I will oppose a renaming there's an agreement).
Cheers,
--
Giuseppe D'Angelo
>2012/3/31 Thiago Macieira <thiago....@intel.com>:
>> On sábado, 31 de março de 2012 17.00.17, Davet Jacques wrote:
>>> Out of interest: Has there been a final decision to keep the name
>>> QRegularExpression (rather than QRegex)?
>>
>> I like the name.
>>
>> And I especially DO NOT like QRegex because it's too close to the
>>existing
>> class and might lead to confusion.
>
>Yes, I second that.
>
>(That doesn't mean I will oppose a renaming there's an agreement).
We'll keep QRegularExpression. I can't possibly see a better name.
Cheers,
Lars