Jim Quinn has uploaded this change for review.
reduce string copies in prefilter
When prefilter operates on longer regexes with many alterantes, the
merging of a and b into ab results in unnecessary string shuffling. By
moving the larger of the two directly into ab, then inserting the
smaller contents, we can avoid that work.
(If C++17 were enabled for RE2, we could use std::set<>::merge to make this
cheaper, but that's not possible in C++11.)
Change-Id: I02dc4e294fe7e7252d0eb9fd045aa1bf97d0cdc8
---
M re2/prefilter.cc
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/re2/prefilter.cc b/re2/prefilter.cc
index a8af821..f5ed631 100644
--- a/re2/prefilter.cc
+++ b/re2/prefilter.cc
@@ -8,6 +8,7 @@
#include <stdint.h>
#include <string>
#include <vector>
+#include <utility>
#include "util/util.h"
#include "util/logging.h"
@@ -286,13 +287,6 @@
return "";
}
-// Add the strings from src to dst.
-static void CopyIn(const std::set<std::string>& src,
- std::set<std::string>* dst) {
- for (ConstSSIter i = src.begin(); i != src.end(); ++i)
- dst->insert(*i);
-}
-
// Add the cross-product of a and b to dst.
// (For each string i in a and j in b, add i+j.)
static void CrossProduct(const std::set<std::string>& a,
@@ -343,8 +337,13 @@
Info *ab = new Info();
if (a->is_exact_ && b->is_exact_) {
- CopyIn(a->exact_, &ab->exact_);
- CopyIn(b->exact_, &ab->exact_);
+ // Avoid string copies by moving the larger exact_.set into
+ // ab directly, then merge in the smaller set.
+ if (a->exact_.size() < b->exact_.size())
+ std::swap(a, b);
+
+ ab->exact_ = std::move(a->exact_);
+ ab->exact_.insert(b->exact_.begin(), b->exact_.end());
ab->is_exact_ = true;
} else {
// Either a or b has is_exact_ = false. If the other
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Jim Quinn uploaded patch set #2 to this change.
reduce string copies in prefilter
When prefilter operates on longer regexes with many alterantes, the
merging of a and b into ab results in unnecessary string shuffling. By
moving the larger of the two directly into ab, then inserting the
smaller contents, we can avoid that work.
(If C++17 were enabled for RE2, we could use std::set<>::merge to make this cheaper, but that's not possible in C++11.)
Change-Id: I02dc4e294fe7e7252d0eb9fd045aa1bf97d0cdc8
---
M re2/prefilter.cc
1 file changed, 24 insertions(+), 9 deletions(-)
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Jim Quinn uploaded patch set #3 to this change.
reduce string copies in prefilter
When prefilter operates on longer regexes with many alternates, the
merging of a and b into ab results in unnecessary string shuffling. By
moving the larger of the two directly into ab, then inserting the
smaller contents, we can avoid that work.
(If C++17 were enabled for RE2, we could use std::set<>::merge to make this cheaper, but that's not possible in C++11.)
Change-Id: I02dc4e294fe7e7252d0eb9fd045aa1bf97d0cdc8
---
M re2/prefilter.cc
1 file changed, 24 insertions(+), 9 deletions(-)
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Jim Quinn.
1 comment:
File re2/prefilter.cc:
Patch Set #1, Line 345: ab->exact_
There is a subtle difference in logic between the previous code and this version.
In the earlier version, content of a->exact_ and b->exact_ was copied (set-unioned) into ab->exact_.
In this version, ab->exact_ = a->exact_ UNION b->exact_.
The difference being that any pre-existing values in ab->exact_ (that may be defined by the constructor of Info()) will get lost in the second version.
Now looking at the Info class, it doesn't seem possible right now, but we don't know how this file is touched or evolves in the future.
I was thinking of the following two options:
1. simply do ab->exact_.insert(a) and ab->exact_.insert(b), keeping the current behavior intact.
2. overload the constructor of Info to take two sets, whose union it stores in exact_. Then, we simply call Info *ab = Info(a->exact_, b->exact_).
Doing so will ensure anyone in future changing the constructor to set any pre-existing value in exact_ does not need to worry about the code here.
2 can follow 1 in a separate CL as well. 😊
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
File re2/prefilter.cc:
Patch Set #1, Line 345: ab->exact_
There is a subtle difference in logic between the previous code and this version. […]
Spoke too soon.
Reading more about it, 1 doesn't help at all, since the contents of a set are const.
Consider doing 2 to provide a safer interface (unless re2 maintainers are okay with this design as well).
C++17 merge would have been super useful indeed. Can't we do something like:
if C++17 do X else do Y? :P
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Akshay Jaggi.
1 comment:
File re2/prefilter.cc:
Patch Set #1, Line 345: ab->exact_
There is a subtle difference in logic between the previous code and this version. […]
I'm not the steward obviously, but I'm going to advise against 2.
`Alt` is a static factory "constructor" for this class, and documented as such on
line 209. There are 13 other functions that do similar work, and make similar assumptions about the state of a new Info().
If we had a magic wand, I agree that traditional constructors may be useful, however, we get nearly all the performance improvements with these few lines.
I think what you're recommending would be a cleaner C++ implementation. However, I'd expect the entirety of this class's factory functions to be moved over, to keep it consistent with other functions like `Quest`. I'm not seeing the ROI on that.
---
C++17 would be useful. If I were the owner, I probably wouldn't want this to be the first place that someone introduces a `#ifdef cpp17`, but if that mechanism already exists elsewhere, I'd be happy to update it.
In practice? It's probably not a huge win in our production workload. In tests on my end, I was seeing calls to `Alt` like:
```
|a.exact| = 1, |b.exact| = 0. Copy 1 strings into ab. Copy 0 string into ab.
Destroy |a|, destroy |b| (2 string destructors called)
... <15k iterations later>
|a.exact| = 15k. |b.exact| = 1. Copy 15k strings into ab. Copy 1 string into ab.
Destroy |a|, destroy |b|, (15001 string desctructors called)
```
With this version:
```
|a.exact| = 15k, |b.exact| = 1. Move 15k in O(1), copy 1 string, destroy 1 string.
```
We're *still* destroying 15k strings when we could just be potentially saving them. That being said, the move saves the lion share of the work. If cpp17 is a no-go, I'm not hung up on it.
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Jim Quinn.
1 comment:
File re2/prefilter.cc:
Patch Set #1, Line 345: ab->exact_
I'm not the steward obviously, but I'm going to advise against 2. […]
Ah! That makes sense.
Thanks for the detailed description.
I saw a few compiler based #ifdefs in re2, but non specifically for C++17.
Keeping this as resolved.
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Jim Quinn.
Patch set 5:Code-Review +2
Paul Wankadia submitted this change.
reduce string copies in prefilter
When prefilter operates on longer regexes with many alternates, the
merging of a and b into ab results in unnecessary string shuffling. By
moving the larger of the two directly into ab, then inserting the
smaller contents, we can avoid that work.
(If C++17 were enabled for RE2, we could use std::set<>::merge to make this cheaper, but that's not possible in C++11.)
Change-Id: I02dc4e294fe7e7252d0eb9fd045aa1bf97d0cdc8
Reviewed-on: https://code-review.googlesource.com/c/re2/+/60490
Reviewed-by: Paul Wankadia <jun...@google.com>
---
M re2/prefilter.cc
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/re2/prefilter.cc b/re2/prefilter.cc
index a8af821..979c404 100644
--- a/re2/prefilter.cc
+++ b/re2/prefilter.cc
@@ -7,6 +7,7 @@
#include <stddef.h>
#include <stdint.h>
#include <string>
+#include <utility>
#include <vector>
#include "util/util.h"
@@ -286,13 +287,6 @@
return "";
}
-// Add the strings from src to dst.
-static void CopyIn(const std::set<std::string>& src,
- std::set<std::string>* dst) {
- for (ConstSSIter i = src.begin(); i != src.end(); ++i)
- dst->insert(*i);
-}
-
// Add the cross-product of a and b to dst.
// (For each string i in a and j in b, add i+j.)
static void CrossProduct(const std::set<std::string>& a,
@@ -343,8 +337,14 @@
Info *ab = new Info();
if (a->is_exact_ && b->is_exact_) {
- CopyIn(a->exact_, &ab->exact_);
- CopyIn(b->exact_, &ab->exact_);
+ // Avoid string copies by moving the larger exact_ set into
+ // ab directly, then merge in the smaller set.
+ if (a->exact_.size() < b->exact_.size()) {
+ using std::swap;
+ swap(a, b);
+ }
+ ab->exact_ = std::move(a->exact_);
+ ab->exact_.insert(b->exact_.begin(), b->exact_.end());
ab->is_exact_ = true;
} else {
// Either a or b has is_exact_ = false. If the other
To view, visit change 60490. To unsubscribe, or for help writing mail filters, visit settings.