Using vector based sets/maps to improve performance of HQP

643 views
Skip to first unread message

Денис

unread,
Aug 23, 2016, 6:55:31 PM8/23/16
to Chromium-dev
Hi.

Carefully using sorted vector based sets/maps (Standardisation paper http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0038r0.html) I've managed to get more then 10 times speed up of this particular function, which lead to 3-5 times speed up of whole suggest.

Have you tried it? Would you be interested in such optimisation?

In order to do it, we need:
  1.  sorted vector based set/map. I used my own implementation, there are ones in boost::container, eastl, locky, folly. Mine has the advantage of being third party independent, but it's not as tested as boost or eastl. 
  2.  suggest benchmarks on a big profile. I can modify the benchmark i have, but I need a profile that is ok to commit in chromium repo.

P.S. We can get some of the performance benefits by manually using vectors, but
  • It's quite hard
  • Code requires much more context, because reader have to understand that particular vector is sorted and doesn't contain duplicates.
  • Typedefs replacements + profiling to optimise insertions gave me 3-7 better speed up, than manual vector replacements.

Peter Kasting

unread,
Aug 23, 2016, 7:04:53 PM8/23/16
to dyar...@yandex-team.ru, Chromium-dev, Mark Pearson
On Tue, Aug 23, 2016 at 3:55 PM, Денис <dyar...@yandex-team.ru> wrote:
Carefully using sorted vector based sets/maps (Standardisation paper http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0038r0.html) I've managed to get more then 10 times speed up of this particular function, which lead to 3-5 times speed up of whole suggest.

Have you tried it? Would you be interested in such optimisation?

I haven't used vector-based sets, but something which provides this kind of perf boost would certainly be interesting.

In order to do it, we need:
  1.  sorted vector based set/map. I used my own implementation, there are ones in boost::container, eastl, locky, folly. Mine has the advantage of being third party independent, but it's not as tested as boost or eastl.
The Boost flat containers are on the list of approved libraries in the Google style guide, so we would probably be willing to take them in Chrome, if there were a compelling reason.
  1.  suggest benchmarks on a big profile. I can modify the benchmark i have, but I need a profile that is ok to commit in chromium repo.
I don't know how we normally go about adding perf bechmarks for nontrivial profiles as part of the checked-in tree.  Let's worry first about the implementation of this and second about getting checked-in benchmarks.

I would suggest uploading your current CL to somewhere so it's visible as a reference, then sending email to c...@chromium.org asking for their thoughts on adding your containers, pulling in Boost's flat set, or neither.  I'd reference your CL if they wish to look and point out that the Boost container is on the whitelist at http://google.github.io/styleguide/cppguide.html#Boost .

Once you have a direction from them we can figure out next steps.

PK

Brett Wilson

unread,
Aug 23, 2016, 7:15:58 PM8/23/16
to dyar...@yandex-team.ru, Chromium-dev
Thanks for looking into this kind of thing.

Can you quantify more the conditions under which you see a speedup? In other words, do we expect this is for vectors below a certain size, or for certain workloads? Also, I'm curious for more information about the times involved. If this makes something that is already fast faster, then it's less interesting. To me, a lot of whether or not this is a good idea or not is whether we can write clear advice for when to use it and when not to use it that any programmer can understand.

I often write code where I know searching a vector would be faster than using a set or a map, but I've concluded it's almost always better to just use the standard thing that has the semantic meaning you want and not worry about it. There is also a danger that adding many custom contains makes the codebase overall more difficult to understand. As an example, I added base/containers/small_map.h which does brute-force searching for small containers, but because of code bloat (probably doesn't apply to your proposal) and the fact that the operations one would expect for small maps are small to begin with, performance doesn't really end up mattering. In retrospect, I'm not sure me adding this concept was a good idea.

Brett

On Tue, Aug 23, 2016 at 3:55 PM, Денис <dyar...@yandex-team.ru> wrote:

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Scott Hess

unread,
Aug 23, 2016, 7:37:42 PM8/23/16
to dyar...@yandex-team.ru, Chromium-dev
If you have a reproducible case involving large profiles, I do think
it would be worthwhile to log a bug with repro instructions, perhaps
in the form of a unit test which generates synthetic input data.

-scott


On Tue, Aug 23, 2016 at 3:55 PM, Денис <dyar...@yandex-team.ru> wrote:

Денис

unread,
Aug 23, 2016, 7:43:50 PM8/23/16
to Chromium-dev, dyar...@yandex-team.ru


Thanks for looking into this kind of thing.

Can you quantify more the conditions under which you see a speedup? In other words, do we expect this is for vectors below a certain size, or for certain workloads? Also, I'm curious for more information about the times involved. If this makes something that is already fast faster, then it's less interesting. To me, a lot of whether or not this is a good idea or not is whether we can write clear advice for when to use it and when not to use it that any programmer can understand.

HistoryItemsForTerms becomes very visible for popular inputs. I've tested on urls for build system, so there are lots of them and they are similar.
I would say you need about a few thousands of entries for this case. 

I often write code where I know searching a vector would be faster than using a set or a map, but I've concluded it's almost always better to just use the standard thing that has the semantic meaning you want and not worry about it. There is also a danger that adding many custom contains makes the codebase overall more difficult to understand. As an example, I added base/containers/small_map.h which does brute-force searching for small containers, but because of code bloat (probably doesn't apply to your proposal) and the fact that the operations one would expect for small maps are small to begin with, performance doesn't really end up mattering. In retrospect, I'm not sure me adding this concept was a good idea.

 I worked on chromium fork so numbers might not be the same for chromium, but:
 
Typing often request in omnibox:
                                                       before                          after
CharTypedToRepaintLatency -  35.5 +- 29ms             16.8 +- 11ms
ProviderTime2.HistoryQuick   -  22.8 +- 28ms              6 +- 7ms
HistoryIDSetFromWords         -  18.9 +- 21ms              1.4 +- 2ms

Deleting long url that has many similar entries.
                                                   before                        after
CharDeletedToRepaintLatency   -  51.9 +- 12.7ms            12.6 +- 9.3ms (chromium doesn't have this histogram, but it's similar to CharTypedToRepaintLatency)
ProviderTime2.HistoryQuick   -  45.6 +- 51ms             7.8 +- 9.5ms
HistoryIDSetFromWords        -   38.4 +- 40ms             2.3 +- 2.9ms

I've tested on a real but big working profile for popular requests.

Brett Wilson

unread,
Aug 23, 2016, 7:51:43 PM8/23/16
to dyar...@yandex-team.ru, Chromium-dev
There are interesting number, it seems like we should make some improvement here.

In order to add a generic container we should be able to say in a general case what it would be used for and how much it would help. With what you've posted, I can't tell that case apart from the URL index code doing silly things and needing to be fixed.

Brett

--

Денис

unread,
Aug 23, 2016, 7:53:46 PM8/23/16
to Chromium-dev, dyar...@yandex-team.ru
If you have a reproducible case involving large profiles, I do think
it would be worthwhile to log a bug with repro instructions, perhaps
in the form of a unit test which generates synthetic input data.

-scott

I'm not sure, that I understood you entirely but it's quite hard  to create a realistic synthetic user data. I have already seen troubles
with hot spots that are not real hot spots and, which is even more dangerous, some troubling places are not seen on synthetic data.
For this particular case, here https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_private_data.cc?type=cs&q=url_index_pr&sq=package:chromium&l=651
it's absolutely crucial to insert in a vector first, and then do sort and unique to create a flat_set which wasn't seen on reasonably good synthetic data.

Денис

unread,
Aug 23, 2016, 8:08:27 PM8/23/16
to Chromium-dev, dyar...@yandex-team.ru

In order to add a generic container we should be able to say in a general case what it would be used for and how much it would help.
According to some research, flat containers are generally preferable to node based like std::set/std::map. For example Chandler Carruth touches the subject: https://youtu.be/fHNmRkzxHWs There is an exception - a case when one does lots of insertions and deletions in a mix with searches, in which case careful measurements are required.

 
With what you've posted, I can't tell that case apart from the URL index code doing silly things and needing to be fixed.
 
It does lots of set operations (union, difference, intesections and so on). The elements in sets are longs. If one uses vectors of longs instead of node based sets, speed increases dramatically.

Jeffrey Yasskin

unread,
Aug 23, 2016, 11:12:37 PM8/23/16
to dyar...@yandex-team.ru, Chromium-dev
On Tue, Aug 23, 2016 at 5:08 PM, Денис <dyar...@yandex-team.ru> wrote:

In order to add a generic container we should be able to say in a general case what it would be used for and how much it would help.
According to some research, flat containers are generally preferable to node based like std::set/std::map. For example Chandler Carruth touches the subject: https://youtu.be/fHNmRkzxHWs There is an exception - a case when one does lots of insertions and deletions in a mix with searches, in which case careful measurements are required.

IIUC, Blink has a non-node-based hashed container which might be even better than a flat_map: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/HashSet.h. Have you tried that sort of container too, or just sorted vectors?

I could imagine sorted vectors being better if all the operations are pure unions, intersections, and differences. I'm having trouble navigating HistoryItemsForTerms well enough to tell whether that's the case. Where are the sets there?

Thanks,
Jeffrey

Денис

unread,
Aug 24, 2016, 2:04:57 AM8/24/16
to Chromium-dev, dyar...@yandex-team.ru

IIUC, Blink has a non-node-based hashed container which might be even better than a flat_map: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/wtf/HashSet.h. Have you tried that sort of container too, or just sorted vectors?

No, I haven't - it's quite hard to efficiently rewrite set operations for hash sets. Most of the sets are quite small, I don't know if hashing will help. Never the less, it's interesting to try.
 
I could imagine sorted vectors being better if all the operations are pure unions, intersections, and differences. I'm having trouble navigating HistoryItemsForTerms well enough to tell whether that's the case. Where are the sets there?


Actually vectors are better in copying as well as iterating and not just in set operations. Smth like remove_if is also faster than erasing from map by condition.

For example: replacing caching with saving vectors here: https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_private_data.h?sq=package:chromium&l=183
gives 3-4 ms

Some of the set operations
https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_private_data.cc?q=url_index_pr&sq=package:chromium&l=544






Денис

unread,
Sep 2, 2016, 11:18:50 AM9/2/16
to Chromium-dev
First step: created a cl with performance test: https://codereview.chromium.org/2300323003/
It's a good benchmark to have even if flat_sets/maps would not be accepted.

Andrew Grieve

unread,
Sep 3, 2016, 8:49:30 PM9/3/16
to dyar...@yandex-team.ru, Chromium-dev
As a small aside - Thanks for pointing out that CppCon video! Quite enlightening.

Денис

unread,
Sep 12, 2016, 8:46:27 AM9/12/16
to Chromium-dev
It seems like accepting performance test in chromium is gonna take some time. But since I've already got it, I ran some measurements.

Legend:
I've measured 3 cases: current situation, using std::set/std::map; replacing typedefs to flat version (requires careful handling of iterators); optimising insertions here https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_private_data.cc?q=url_index_pri&sq=package:chromium&l=651 The last optimization doesn't give that much of a boost on current test but I've seen data where it's crucial. If chromium is going to accept these changes, I'm planning to use such an optimization wherever I can, so we don't get sudden regression. History for each test is same 10'000 generated urls - it's a lot for one url but can happen in real data. For each code modifications I did 6 texts - 3 inputing url, 3 - backspacing it.

Summary: pure typedefs replacement can give 4 - 16 times improvement, with small tweaking - 6 - 30 times. I think - flat_sets/maps  are very useful.

Measurements:
https://docs.google.com/spreadsheets/d/1xDK_74OH37q1HUoqYf5F9zh0uchwCOQuVreS9yqfaZU/edit?usp=sharing

пятница, 2 сентября 2016 г., 18:18:50 UTC+3 пользователь Денис написал:

Peter Kasting

unread,
Sep 13, 2016, 12:47:12 AM9/13/16
to dyar...@yandex-team.ru, Chromium-dev
On Mon, Sep 12, 2016 at 5:46 AM, Денис <dyar...@yandex-team.ru> wrote:
Legend:
I've measured 3 cases: current situation, using std::set/std::map; replacing typedefs to flat version (requires careful handling of iterators); optimising insertions here https://cs.chromium.org/chromium/src/components/omnibox/browser/url_index_private_data.cc?q=url_index_pri&sq=package:chromium&l=651 The last optimization doesn't give that much of a boost on current test but I've seen data where it's crucial. If chromium is going to accept these changes, I'm planning to use such an optimization wherever I can, so we don't get sudden regression. History for each test is same 10'000 generated urls - it's a lot for one url but can happen in real data. For each code modifications I did 6 texts - 3 inputing url, 3 - backspacing it.

Summary: pure typedefs replacement can give 4 - 16 times improvement, with small tweaking - 6 - 30 times. I think - flat_sets/maps  are very useful.

Is the "necessary optimizations" part dependent on flat maps/sets?  Or is it fixing dumb code today, that we can/should just fix anyway?

In your original email, you mentioned "manual vector replacements" and some perf comparisons.  Do you still have those changes to comparatively benchmark with this same framework?

PK

Денис

unread,
Sep 13, 2016, 8:32:20 AM9/13/16
to Chromium-dev, dyar...@yandex-team.ru
Ok, so for the sake of the sake of discussion I've created CLs and updated the measurements:

updated measurements: https://docs.google.com/spreadsheets/d/1xDK_74OH37q1HUoqYf5F9zh0uchwCOQuVreS9yqfaZU/edit?usp=sharing

manually optimising with vectors:

using flat_sets/maps:

Measurements summary:
My version of doing things manually is 2 times slower than with flat maps and way harder to support. It's slower is because I've decided not to optimise anything that is a) hard b) gives less than a 1 ms improvement. Some of the things can be reverted without harm to the performance but I felt like my way is more right.



вторник, 13 сентября 2016 г., 7:47:12 UTC+3 пользователь Peter Kasting написал:

Scott Hess

unread,
Sep 21, 2016, 5:02:18 PM9/21/16
to dyar...@yandex-team.ru, Chromium-dev
I've been looking at the url_index_private_data code (the interior of
HQP), and I'm wondering if this kind of flat-container solution is
worthwhile in the long term. Right now, the code has a lot of
maps-of-sets, and it looks like some of the inner loop does things
like "Get a bunch of sets, then intersect them" to implement what
seems to be a prefix match. It seems kind of antagonistic to the
cache. I think it's plausible that your problem profiles are
happening when the data gets too big and starts thrashing the cache,
and your flat-container solution pulls it back into cache-sized. I'm
still trying to quantify the problem, but I'm wondering if it would be
worthwhile to instead build something a little tighter in the first
place.

ALL OF THAT SAID, note these typedefs:
typedef size_t WordID;
typedef std::set<WordID> WordIDSet;
typedef std::map<base::char16, WordIDSet> CharWordIDMap;
typedef int64_t URLID;
typedef history::URLID HistoryID;
typedef std::set<HistoryID> HistoryIDSet;
typedef std::vector<HistoryID> HistoryIDVector;
typedef std::map<WordID, HistoryIDSet> WordIDHistoryMap;
typedef std::map<HistoryID, WordIDSet> HistoryIDWordMap;
At cache load time, the 99th percentile for word count is <10k, and if
you have more than 2B history entries none of this stuff is going to
work well. Unfortunately, due to padding it's unlikely that simply
changing the typedefs to 32-bit values would work to reduce memory
usage (and URLID may need a mapping layer). Padding also makes it
unlikely that a straight-forward conversion to sorted vectors would
help, either. But it seems pretty likely that alternative
more-efficient structures for the wordid->historyid and
historyid->wordid would be pretty reasonable.

[I'm also thinking those maps could probably be unified into a single
structure requiring perhaps 1/2 or 2/3 of the space.]

-scott

Peter Kasting

unread,
Sep 21, 2016, 6:23:32 PM9/21/16
to Scott Hess, dyar...@yandex-team.ru, Chromium-dev
On Wed, Sep 21, 2016 at 2:01 PM, Scott Hess <sh...@chromium.org> wrote:
I've been looking at the url_index_private_data code (the interior of
HQP), and I'm wondering if this kind of flat-container solution is
worthwhile in the long term.  Right now, the code has a lot of
maps-of-sets, and it looks like some of the inner loop does things
like "Get a bunch of sets, then intersect them" to implement what
seems to be a prefix match.  It seems kind of antagonistic to the
cache.

My general reaction:

* I would love for someone to take a hard look at how to do this in the most-optimal way.  This is an important piece of UI performance for the most commonly-used control in Chrome, so it's an area where some dedicated effort has meaningful impact.

* I am reluctant to direct much stop-energy at solutions that make this noticeably better but may be less-than-perfect unless we have a firm plan + engineering commitment to do the better thing.  The proposal here is a significant win, so it's worth getting it in there unless someone is going to do something even cooler within a couple of months.

* Flat maps/sets are a good idea anyway.  In general, the majority of places in Chrome that use maps/sets today should probably just be defaulting to flat maps/sets.  I think it's worth having these containers available and encouraging people to use them.  So I don't see this as being about pulling in an entirely new, weird data structure that no one else should care about, just for a place that might be able to do something even better in theory.

So while I very much encourage you to keep quantifying the problem, and if you want to jump in and work on the engineering of how to get this to the best place, I would personally choose to proceed with trying to add the relevant data structures to the tree, make the existing HQP code use them, and possibly update our style guides to suggest the use of these types.  (I think there was a relevant internal G+ post about this last recently, but G+ search sucks.)

PK

Yuri Wiitala

unread,
Sep 22, 2016, 12:48:34 AM9/22/16
to Peter Kasting, Scott Hess, dyar...@yandex-team.ru, Chromium-dev
On Wed, Sep 21, 2016 at 3:22 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Wed, Sep 21, 2016 at 2:01 PM, Scott Hess <sh...@chromium.org> wrote:
* Flat maps/sets are a good idea anyway.  In general, the majority of places in Chrome that use maps/sets today should probably just be defaulting to flat maps/sets.

So while I very much encourage you to keep quantifying the problem, and if you want to jump in and work on the engineering of how to get this to the best place, I would personally choose to proceed with trying to add the relevant data structures to the tree, make the existing HQP code use them, and possibly update our style guides to suggest the use of these types.  (I think there was a relevant internal G+ post about this last recently, but G+ search sucks.)

I posted an e-mail to projec...@chromium.org last night along these lines. tl;dr: I'm working on a research project to evaluate whether we should replace some of our STL containers (especially the associative containers) with data structures that are much more CPU, power, and memory efficient. Existing research in this area suggests BTrees are an excellent candidate here.

Денис

unread,
Sep 22, 2016, 5:39:38 AM9/22/16
to Chromium-dev, dyar...@yandex-team.ru
I think that your work is absolutely necessary, regardless wether we use flat_* or std:: sets/maps it's a messy piece of code that hides performance in which live a lot of performance on the table because of extremely complex data structures. It would be great to discover better algorithms. 

However, I think that vector based containers are an useful tool that we should use. I'm positive that whatever solution you'd find it will require some mapping and it would be better done with flat maps than std ones. Anyways, there are other places in code that would greatly benefit from flat containers like MRU caches, for example.

To sum up: Finding better algorithms is an important and hard work. We should totally do it, especially in hot spots. However flat containers is an easily applied and reusable tool to improve performance that we should have.

четверг, 22 сентября 2016 г., 0:02:18 UTC+3 пользователь Scott Hess написал:

Денис

unread,
Sep 22, 2016, 5:46:57 AM9/22/16
to Chromium-dev, pkas...@chromium.org, sh...@chromium.org, dyar...@yandex-team.ru
Do you have a particular implementation of btree based maps/sets that I could reasonably easy try? I would like to compare measurements.

Anyways, I suspect that btrees are better for big maps/sets, while flat are better for small ones.

In the particular case of HPQ:

- The items are longs (sorting and copying is super fast)
- Most of insertions go all at once (we can insert everything and then sort everything)
- Sets/Maps are not extremely big (at some point I've seen smth like few thousand elements).

To sum up:
I think that both b-trees and vectors are valid variations of a fast set/map that can be faster in different cases. I think we should measure for important cases.

четверг, 22 сентября 2016 г., 7:48:34 UTC+3 пользователь Yuri Wiitala написал:

dan...@chromium.org

unread,
Sep 22, 2016, 4:34:59 PM9/22/16
to dyar...@yandex-team.ru, Chromium-dev, Peter Kasting, Scott Hess
On Thu, Sep 22, 2016 at 2:46 AM, Денис <dyar...@yandex-team.ru> wrote:
Do you have a particular implementation of btree based maps/sets that I could reasonably easy try? I would like to compare measurements.

I have been pointed to https://code.google.com/archive/p/cpp-btree/ previously it may be of interest. I'd love to here of others as well.

Денис

unread,
Nov 4, 2016, 5:04:34 AM11/4/16
to Chromium-dev
Quick update:

Patch with perf test was merged. It's still not enabled in regular runs of perftests but this is a job for somebody who knows how to do that.
I'm currently working on a document that discusses adding flat sets/maps for chromium, my hope is that I'm gonna publish it by the end of next week.

среда, 24 августа 2016 г., 1:55:31 UTC+3 пользователь Денис написал:

Andrew Grieve

unread,
Nov 8, 2016, 10:23:36 AM11/8/16
to dyar...@yandex-team.ru, Chromium-dev
Thanks for the update! Looking forward to your doc :)

--

Денис

unread,
Nov 27, 2016, 10:32:31 AM11/27/16
to Chromium-dev
Hi. I'm still working on that doc and reasonably close to finishing it. Once again my time estimations were wrong.

From interesting: turns out that I didn't implement a few methods in the best possible way, if done right - special tricks with bulk insertions become unnecessary.


среда, 24 августа 2016 г., 1:55:31 UTC+3 пользователь Денис написал:
Hi.

Денис

unread,
Nov 29, 2016, 11:49:42 AM11/29/16
to Chromium-dev
Hi again.

I finally finished the proposal to add flat containers to Chromium to the stage where we can discuss it.
Lots of measurements for different libraries on windows and mac!

Questions, comments, tomatoes - all welcome.

Please take a look, I really hope we can come up with a decision reasonably soon.

Best,
Denis.


среда, 24 августа 2016 г., 1:55:31 UTC+3 пользователь Денис написал:
Hi.

Peter Kasting

unread,
Nov 29, 2016, 2:34:21 PM11/29/16
to dyar...@yandex-team.ru, Chromium-dev
On Tue, Nov 29, 2016 at 8:49 AM, Денис <dyar...@yandex-team.ru> wrote:
Hi again.

I finally finished the proposal to add flat containers to Chromium to the stage where we can discuss it.
Lots of measurements for different libraries on windows and mac!

Questions, comments, tomatoes - all welcome.

This is very interesting.  I am encouraged that the patch to add a prototype makes no changes to the HQP's code other than typedefs and one small iterator fix that we should just make anyway (commented on CL).  The earlier prototypes requiring some changes to access patterns to get optimal performance were more worrisome.

I found it noteworthy that a B-tree performed 50% worse in your tests than the flat container.  Yuri may have thoughts there.

Some things not necessarily clear in the doc:
* What's the confidence level/error bars for the measurements?
* How representative of typical user workloads is this particular benchmark?  Having 10K similar URLs as the benchmark does seems a bit artificial to me, but perhaps the effects are similar in more "typical" cases, just less pronounced?
* You commented that significant work would still be needed to bring the prototype container implementations up to production quality.  Broadly, what kind of work is still needed?

Even without answers to the first two questions, this seems compelling enough to me to justify moving forward.  The answer to the third question would help inform the route; the document clearly prefers Boost, but I'm leery about the large number of files to be pulled in, and would tend to prefer our own implementation if it can be done without extreme effort.

PK

Денис

unread,
Nov 29, 2016, 4:00:25 PM11/29/16
to Chromium-dev, dyar...@yandex-team.ru
This is very interesting.  I am encouraged that the patch to add a prototype makes no changes to the HQP's code other than typedefs and one small iterator fix that we should just make anyway (commented on CL).  The earlier prototypes requiring some changes to access patterns to get optimal performance were more worrisome.

I found it noteworthy that a B-tree performed 50% worse in your tests than the flat container.  Yuri may have thoughts there.

Well, most of work we do is on the vectors of integers. We can copy integers very fast. I would assume that better cache locality of vectors is more important than better insertion speed.

I'll think of a good way to integrate the answers in the doc, but quickly answer here. 

Some things not necessarily clear in the doc:
* What's the confidence level/error bars for the measurements?

Not precisely know how to answer. I've run perftests we've added several times on my mac and windows machine. Mac results are more or less stable, results are quite consistant. On windows results vary from run to run quite a bit and occasionally have 1-2 outliers, but tendencies are represented correctly. Just in case you are already worried - at no point for the worst outlier no tested implementation was slower that their standard analogs for the same input.
 
* How representative of typical user workloads is this particular benchmark?  Having 10K similar URLs as the benchmark does seems a bit artificial to me, but perhaps the effects are similar in more "typical" cases, just less pronounced?

When we chose the use case for testing, we discussed them as a realistic cases. They originate from data collected on a real but quite a big profile. If user does not have many similar urls to the one he or she types, sets/maps are not a problem (all manipulations take less than a millisecond). However - if the user is trying to delete a part of some often used url - it can be painfully slow due to the sets/maps. Consider this - I've measured on a very fast machines and one keyboard stroke took more than 50ms only in history!
If you want - you can time: HistoryItemsForTerms on some real data - this is the function we mostly optimise.
I might have some extra data but I'll have to check about NDA, but it supports the idea that this optimisation makes sense.
 
* You commented that significant work would still be needed to bring the prototype container implementations up to production quality.  Broadly, what kind of work is still needed?

First and the most important - tests. flat_set/flat_map have lot's of methods, each can do all kinds of different staff and have many corner cases.
On top of that something like a good implementation of void insert(first, last) has to have a few versions for different types of iterators (this type of staff can probably be added later, but boost has that and it works). Whatever I've tried so far was just oh so much copy and paste - this is no good, we would have to do smth like boost does for it's maps.

Second: design of a few things: like separate key/value storage, what do we want to pass as a template parameter : container vs allocator, special iterators etc, etc (this is something I would love to do, but this is still quite some time).

It would be nice to do benchmarks for different type of things, how well do they perform in different conditions. Maybe a small buffer optimisation?

Some of the things I've mentioned are not on the first order of business but good tests are most certainly are.

On top of that I'm sure that I won't do this at work, I will have to work on this mostly in my spare time, and it's never fast. I would say, optimistically, a few weeks it would take to commit tests for set (I think I would wanna do them in a separate patch and test std::set). Then simple version of flat_set is not that hard - algorithms on a vector are not very complicated . As far as I can tell flat map is a flat set + a parallel vector, but I would have to write an iterator. I can't tell right now how much of flat set's tests could be reused for flat maps (different interface, pair<Key&, Value&> all those things), but we would definitely need more, map specific. I think we would get multiset/multimap mostly for free (many versions utilise one base class with unique/non_unique methods).

After that we can think about more serious tweaking performance tweaking.

So - 4-5 complicated patches over next few months would be my optimistic estimate.

Or we can pull in some boost.

Alex Clarke

unread,
Nov 29, 2016, 4:32:53 PM11/29/16
to dyar...@yandex-team.ru, Chromium-dev
I didn't end up landing it but fairly recently I looked into flat sets and maps, and I think they are worth considering.  See https://codereview.chromium.org/2396533004/
I wrote a few benchmarks to try and explore the performance (note the graphs are log scale) and for small data stets they seemed to be pretty fast, but folks where concerned about the linear worst case performance for insert and erase.

I didn't try a comparison with boot's containers but that would be interesting to see since the boost flat_map looks like it's using a much more complicated data structure.



--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Денис

unread,
Nov 29, 2016, 5:44:10 PM11/29/16
to Chromium-dev, dyar...@yandex-team.ru
среда, 30 ноября 2016 г., 0:32:53 UTC+3 пользователь Alex Clarke написал:
I didn't end up landing it but fairly recently I looked into flat sets and maps, and I think they are worth considering.  See https://codereview.chromium.org/2396533004/
I wrote a few benchmarks to try and explore the performance (note the graphs are log scale) and for small data stets they seemed to be pretty fast, but folks where concerned about the linear worst case performance for insert and erase.

I will definetly take a closer look tommorow.

About linear worst case: it's not entirely correct to compare flat_set/flat_map to the std::set/std::map in terms of insertions erasures one to one.
It has an assumption to it that you constantly modify your map by one element. This is typically not true (especially in HQP).
You can do an erase(remove) idiom and it will do all your erasures at one linear time. You can do sort + insert, if you insert multiple elements at once.

Anyways, I don't think that it's fare to judge flat containers from insertion/erasure speed (at least on the big sizes) the same way as judging lists for their iteration time. This is not what they are made for. 
 

I didn't try a comparison with boot's containers but that would be interesting to see since the boost flat_map looks like it's using a much more complicated data structure.

It's not - it's just a sorted vector that has some staff over it. Actually your ring buffer solution us much more unusual, I would say)
Is there a good reason why you deviated from tipical interfaces so much?

 

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Денис

unread,
Nov 30, 2016, 1:41:56 AM11/30/16
to Chromium-dev, dyar...@yandex-team.ru
* You commented that significant work would still be needed to bring the prototype container implementations up to production quality.  Broadly, what kind of work is still needed?

First and the most important - tests. flat_set/flat_map have lot's of methods, each can do all kinds of different staff and have many corner cases.
On top of that something like a good implementation of void insert(first, last) has to have a few versions for different types of iterators (this type of staff can probably be added later, but boost has that and it works). Whatever I've tried so far was just oh so much copy and paste - this is no good, we would have to do smth like boost does for it's maps.

Second: design of a few things: like separate key/value storage, what do we want to pass as a template parameter : container vs allocator, special iterators etc, etc (this is something I would love to do, but this is still quite some time).

It would be nice to do benchmarks for different type of things, how well do they perform in different conditions. Maybe a small buffer optimisation?

Some of the things I've mentioned are not on the first order of business but good tests are most certainly are.

On top of that I'm sure that I won't do this at work, I will have to work on this mostly in my spare time, and it's never fast. I would say, optimistically, a few weeks it would take to commit tests for set (I think I would wanna do them in a separate patch and test std::set). Then simple version of flat_set is not that hard - algorithms on a vector are not very complicated . As far as I can tell flat map is a flat set + a parallel vector, but I would have to write an iterator. I can't tell right now how much of flat set's tests could be reused for flat maps (different interface, pair<Key&, Value&> all those things), but we would definitely need more, map specific. I think we would get multiset/multimap mostly for free (many versions utilise one base class with unique/non_unique methods).

After that we can think about more serious tweaking performance tweaking.

So - 4-5 complicated patches over next few months would be my optimistic estimate.

Or we can pull in some boost.

Maybe I was a bit pessimistic yesterday. If you think that rolling out a new implementation is a preferred thing to do, I am more than willing to do it - this is quite unusual and interesting task. The question is - how quickly I'll be able to push patches through the review process. And yeah, this is couple of thousands lines of code that have to be thorrowly reviewed - but it's doable.

Vladimir Levin

unread,
Nov 30, 2016, 3:08:16 PM11/30/16
to dyar...@yandex-team.ru, Chromium-dev
I did some profiling on the different ways we can search things, so I thought I'd just put it here just as FYI.

I used a "branch-free" algorithm as described in https://arxiv.org/ftp/arxiv/papers/1509/1509.05053.pdf :

int branch_free(int x, const std::vector<int>& v) {
  const int* base = v.data();
  int n = v.size();
  while (n > 1) {
    const int half = n / 2;
    base = (base[half] < x) ? &base[half] : base;
    n -= half;
  }
  return (*base < x) + base - v.data();
}

which on clang that we use in chromium does generate cmovl instructions instead of jumps:

...
  21: 39 39                 cmp    %edi,(%rcx)
  23: 48 0f 4c d1           cmovl  %rcx,%rdx
  27: 29 f0                 sub    %esi,%eax
...

(this isn't the case with my system clang++ 3.4 with -O2, which generates regular jmp; however, with -Os, it does generate cmovl)

The test was to construct a sorted vector of N ints and then do repeated (2 million) searches for random items (with about 50% miss rate). I ran this all on my z620.

bsearch here is the function above, bsearch+prefetch is the same function with 

      __builtin_prefetch(base + half/2, 0, 0); 
      __builtin_prefetch(base + half + half/2, 0, 0);

lines before the ternary operator. std::lower_bound is just that, a std::lower_bound call to find the element. Finally, std::set is using a constructed set and doing repeated ::count(x) calls.

vector size 1000
bsearch time             55.231ms
bsearch+prefetch time    62.709ms
std::lower_bound time   128.743ms
std::set time           175.023ms

vector size 10000
bsearch time             59.111ms
bsearch+prefetch time    69.513ms
std::lower_bound time   150.696ms
std::set time           304.605ms

vector size 100000
bsearch time             98.685ms
bsearch+prefetch time   110.035ms
std::lower_bound time   204.633ms
std::set time           656.352ms

vector size 1000000
bsearch time            190.722ms
bsearch+prefetch time   161.899ms
std::lower_bound time   277.903ms
std::set time          1787.064ms

vector size 10000000
bsearch time            872.206ms
bsearch+prefetch time   450.749ms
std::lower_bound time   582.058ms
std::set time          3189.397ms

vector size 100000000
bsearch time           1403.466ms
bsearch+prefetch time   887.095ms
std::lower_bound time  1031.391ms
std::set time           ...

vector size 1000000000
bsearch time           2219.824ms
bsearch+prefetch time  1263.672ms
std::lower_bound time  1823.186ms
std::set time           ...

It seems that for smaller data sizes (smaller here meaning probably the largest data sizes we might encounter in chrome), bsearch performs the best by a significant margin. As the data becomes bigger, prefetch starts to help. std::lower bound is worse for small data, but becomes better than bsearch for large data. Finally, std::set is just slower all around (... means i stopped the test before it could finish, so it would likely be >10s).

This doesn't actually address intersection calls, or any other operations other than search, but I imagine the results would be similar if the operations to do are the same.

With that, I think that we could certainly do better than associative containers, if the access pattern is mostly searches. This is because for "chrome large" data, the inserts and construction would probably be acceptable with a sorted vector.

So, I think we definitely should create a base:: container that implements these operations, since the wins are clear. In fact, it seems that most regular maps can be replaced by this, maybe. As a disclaimer, I'd like to reiterate that this is on z620; I haven't measured android devices' performance (yet?). I also haven't compared it to boost flat containers or blink flat containers. 

Vlad


--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Денис

unread,
Nov 30, 2016, 3:28:11 PM11/30/16
to Chromium-dev, dyar...@yandex-team.ru
Awesome research! If we would have a decent tested container, we could try out different implementations for different sizes, iterator_categories etc.

Can I upload the code of your profiling somewhere?

среда, 30 ноября 2016 г., 23:08:16 UTC+3 пользователь Vladimir Levin написал:
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Vladimir Levin

unread,
Nov 30, 2016, 3:59:14 PM11/30/16
to dyar...@yandex-team.ru, Chromium-dev
On Wed, Nov 30, 2016 at 12:28 PM, Денис <dyar...@yandex-team.ru> wrote:
Awesome research! If we would have a decent tested container, we could try out different implementations for different sizes, iterator_categories etc.

Can I upload the code of your profiling somewhere?

The code is kind of one off with all kinds of one letter variables and commented code :) I'll clean it up and send it in a follow-up email. 
 
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Денис

unread,
Nov 30, 2016, 4:05:15 PM11/30/16
to Chromium-dev, dyar...@yandex-team.ru
 
* How representative of typical user workloads is this particular benchmark?  Having 10K similar URLs as the benchmark does seems a bit artificial to me, but perhaps the effects are similar in more "typical" cases, just less pronounced?

Found some real data for you - my machine) It's yandex browser, but in terms of performance here we don't differ much.

HistoryIDSetFromWordsForked doesn't differ much in terms of performance from chrome's version in terms of performance. As you can see, from time to time there are some pretty awful numbers.

Histogram: Omnibox.HistoryIDSetFromWordsForked recorded 950 samples, average = 1.6 (flags = 0x1)
0 ------------------------------------------------------------------------O (388 = 40.8%)
1 ------------------------------------------O (225 = 23.7%) {40.8%}
2 --------------------------O (140 = 14.7%) {64.5%}
3 ---------------O (83 = 8.7%) {79.3%}
4 --------O (41 = 4.3%) {88.0%}
5 -----O (28 = 2.9%) {92.3%}
6 --O (12 = 1.3%) {95.3%}
7 -O (8 = 0.8%) {96.5%}
8 -O (9 = 0.9%) {97.4%}
10 -O (7 = 0.7%) {98.3%}
12 O (4 = 0.4%) {99.1%}
14 O (1 = 0.1%) {99.5%}
17 O (2 = 0.2%) {99.6%}
20 O (2 = 0.2%) {99.8%}
24 ... 

The hole HQP can take up to 37 - 40 seconds - so the sets/maps take about half of the worst case. Once again, my mac is super fast - 2.5 GHz Intel Core i7 processor and 16GB RAM 

0   ---O                                                                      (1 = 0.7%)
1 ...
5 ---------------------------O (10 = 6.7%) {0.7%}
6 -------------------------------------------O (16 = 10.7%) {7.4%}
7 ------------------------------------------------------------------------O (27 = 18.1%) {18.1%}
8 ---------------------------------------------O (17 = 11.4%) {36.2%}
9 ------------------------------------------------O (18 = 12.1%) {47.7%}
10 --------------------------------O (12 = 8.1%) {59.7%}
11 -----------O (4 = 2.7%) {67.8%}
12 -------------O (10 = 6.7%) {70.5%}
14 -----------O (8 = 5.4%) {77.2%}
16 -------O (5 = 3.4%) {82.6%}
18 -----------O (8 = 5.4%) {85.9%}
20 ----O (5 = 3.4%) {91.3%}
23 -----O (6 = 4.0%) {94.6%}
26 -O (1 = 0.7%) {98.7%}
29 ...
37 -O (1 = 0.7%) {99.3%}
42 ...

Денис

unread,
Nov 30, 2016, 4:07:56 PM11/30/16
to Chromium-dev, dyar...@yandex-team.ru
Thanks a lot!

среда, 30 ноября 2016 г., 23:59:14 UTC+3 пользователь Vladimir Levin написал:


To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Peter Kasting

unread,
Nov 30, 2016, 4:35:16 PM11/30/16
to dyar...@yandex-team.ru, Chromium-dev
On Wed, Nov 30, 2016 at 1:05 PM, Денис <dyar...@yandex-team.ru> wrote:
 
* How representative of typical user workloads is this particular benchmark?  Having 10K similar URLs as the benchmark does seems a bit artificial to me, but perhaps the effects are similar in more "typical" cases, just less pronounced?

Found some real data for you - my machine) It's yandex browser, but in terms of performance here we don't differ much.

HistoryIDSetFromWordsForked doesn't differ much in terms of performance from chrome's version in terms of performance. As you can see, from time to time there are some pretty awful numbers.

You will need to remind me what those numbers mean (this particular code is not actually mine).

PK 

dan...@chromium.org

unread,
Nov 30, 2016, 4:40:09 PM11/30/16
to Vladimir Levin, dyar...@yandex-team.ru, Chromium-dev
It sounds to me like making a nice interface and using std::vector, std::lower_bound, std::set_intersection, etc as the implementation of it would be an easy first step to some big performance wins, and we could do guided profiling and optimization to improve its performance (such as the search implementation below) afterward. This would immediately allow us to start replacing sets and maps in chromium and see wins, right? An interesting part up-front is defining a good API that will not end up letting developers re-sort the vector between inserts accidentally.

Peter Kasting

unread,
Nov 30, 2016, 4:47:31 PM11/30/16
to Dana Jansens, Vladimir Levin, dyar...@yandex-team.ru, Chromium-dev
On Wed, Nov 30, 2016 at 1:39 PM, <dan...@chromium.org> wrote:
It sounds to me like making a nice interface and using std::vector, std::lower_bound, std::set_intersection, etc as the implementation of it would be an easy first step to some big performance wins, and we could do guided profiling and optimization to improve its performance (such as the search implementation below) afterward. This would immediately allow us to start replacing sets and maps in chromium and see wins, right? An interesting part up-front is defining a good API that will not end up letting developers re-sort the vector between inserts accidentally.

The relevant CL for this (which indeed uses vector<> and lower_bound()) is https://codereview.chromium.org/2333253002/ .  (You can ignore the changes outside of base/).

The critical bits are:
* Do we want these containers in Chromium at all?  (It seems like sentiment is positive)
* Do we want to build them ourselves or import e.g. Boost?  (The doc recommends Boost, but I think the former is better)
* If we build them ourselves, who can be the relevant reviewer(s) to help get them landed?  (Seems like expertise in the STL set/map guarantees and algorithms would help make sure we don't miss anything subtle)

PK

Денис

unread,
Nov 30, 2016, 4:52:52 PM11/30/16
to Chromium-dev, dyar...@yandex-team.ru
I'm not that much of an expert either. So take this with a grain of salt.

Here is the cl, with the histogram, very similar to the first one: https://codereview.chromium.org/2545603002/

Well, left column is milliseconds. Right number, as far as I know, is how much we've landed into a particular bucket - so in 8% of all cases sets/maps took more than 5ms, and in 2% more than 10 ms, and sometimes (0.2% of cases) took even more than 20 ms. Considering how often HQP runs, this is a lot.

And, as I mentioned in one of my fist comments - this number (on previously bad inputs) goes down to 0.5ms. (check this table from my earlier comments https://docs.google.com/spreadsheets/d/1xDK_74OH37q1HUoqYf5F9zh0uchwCOQuVreS9yqfaZU/edit?usp=sharing, it still talks about hot spot optimisation, which was a mistake, but the measurements are sill correct).


четверг, 1 декабря 2016 г., 0:35:16 UTC+3 пользователь Peter Kasting написал:

Peter Kasting

unread,
Nov 30, 2016, 5:31:57 PM11/30/16
to dyar...@yandex-team.ru, Chromium-dev
On Wed, Nov 30, 2016 at 1:52 PM, Денис <dyar...@yandex-team.ru> wrote:
And, as I mentioned in one of my fist comments - this number (on previously bad inputs) goes down to 0.5ms.

I think the key question I had was whether, in fact, this number would go down in the same way the perf benchmark did, or whether the numbers in the field are large for reasons that differ from (and thus would not be affected the same way as) the perf benchmark.

That said, as noted before, I'm sufficiently convinced that this is worth doing.

PK

Денис

unread,
Nov 30, 2016, 5:33:19 PM11/30/16
to Chromium-dev
I would be cautious about replacing all over the place, for big mutating sets/maps at least. Or for something that doesn't have a move constructor yet (we probably should measure sets of gurl at some point).

I would say we should gain some experience (with benchmarks =) ) before doing any massive changes)

However, I think that flat containers could be a good default choice)

Денис

unread,
Nov 30, 2016, 5:53:34 PM11/30/16
to Chromium-dev
I understand your conserns.

I would mention again, that this is not a very sintetic benchmark's data - this a real person's history, we run an interactive test, and type symbols into omnibox.

Sure, I chose the most popular url in history and the person has a big profile and not the most common worload in the world etc, but I would bet that we'll see at least some improvements.

Alex Clarke

unread,
Dec 1, 2016, 6:26:59 AM12/1/16
to dyar...@yandex-team.ru, Chromium-dev
On 29 November 2016 at 22:44, Денис <dyar...@yandex-team.ru> wrote:
среда, 30 ноября 2016 г., 0:32:53 UTC+3 пользователь Alex Clarke написал:
I didn't end up landing it but fairly recently I looked into flat sets and maps, and I think they are worth considering.  See https://codereview.chromium.org/2396533004/
I wrote a few benchmarks to try and explore the performance (note the graphs are log scale) and for small data stets they seemed to be pretty fast, but folks where concerned about the linear worst case performance for insert and erase.

I will definetly take a closer look tommorow.

About linear worst case: it's not entirely correct to compare flat_set/flat_map to the std::set/std::map in terms of insertions erasures one to one.
It has an assumption to it that you constantly modify your map by one element. This is typically not true (especially in HQP).
You can do an erase(remove) idiom and it will do all your erasures at one linear time. You can do sort + insert, if you insert multiple elements at once.

Anyways, I don't think that it's fare to judge flat containers from insertion/erasure speed (at least on the big sizes) the same way as judging lists for their iteration time. This is not what they are made for. 
 

I didn't try a comparison with boot's containers but that would be interesting to see since the boost flat_map looks like it's using a much more complicated data structure.

It's not - it's just a sorted vector that has some staff over it. Actually your ring buffer solution us much more unusual, I would say)
Is there a good reason why you deviated from tipical interfaces so much?

Do you mean why wasn't it more like an stl container?  Mostly because I didn't get round to that :)  

My motivation was optimizing some code in the blink scheduler, we ended up using an IntrusiveHeap which was a fair bit faster than std::set in micro benchmarks.  If a flat set/map turned out to be faster I'd happily migrate over to that. 
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Денис

unread,
Dec 1, 2016, 12:38:35 PM12/1/16
to Chromium-dev, dyar...@yandex-team.ru
It will be worth measuring, but seems like a more specialised structure with less requirements (like intrusive heap) should be better)

четверг, 1 декабря 2016 г., 14:26:59 UTC+3 пользователь Alex Clarke написал:


To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Vladimir Levin

unread,
Dec 1, 2016, 7:35:49 PM12/1/16
to dyar...@yandex-team.ru, Chromium-dev
On Wed, Nov 30, 2016 at 12:57 PM, Vladimir Levin <vmp...@google.com> wrote:


On Wed, Nov 30, 2016 at 12:28 PM, Денис <dyar...@yandex-team.ru> wrote:
Awesome research! If we would have a decent tested container, we could try out different implementations for different sizes, iterator_categories etc.

Can I upload the code of your profiling somewhere?

The code is kind of one off with all kinds of one letter variables and commented code :) I'll clean it up and send it in a follow-up email. 

Денис

unread,
Dec 2, 2016, 1:50:46 AM12/2/16
to Chromium-dev, dyar...@yandex-team.ru
Appreciate it.

пятница, 2 декабря 2016 г., 3:35:49 UTC+3 пользователь Vladimir Levin написал:

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.


Денис

unread,
Dec 2, 2016, 10:44:12 AM12/2/16
to Chromium-dev, dan...@chromium.org, vmp...@google.com, dyar...@yandex-team.ru
On Peter Kasting wrote:
The critical bits are:
* Do we want these containers in Chromium at all?  (It seems like sentiment is positive)
* Do we want to build them ourselves or import e.g. Boost?  (The doc recommends Boost, but I think the former is better)

Ok, I start writing tests)
What do you think are the chances that my effort goes to waste?

Peter Kasting

unread,
Dec 2, 2016, 1:53:18 PM12/2/16
to dyar...@yandex-team.ru, Chromium-dev, Dana Jansens, Vladimir Levin
To minimize any possibility of wasted effort, I would aim first to get the API itself into the best state, and start the review process then.  In parallel with the review of that, you can flesh out missing tests.  If there are major concerns about overall direction, they'll have to do with the API and overall approach, and can be raised as quickly as possible.

PK

Денис

unread,
Dec 2, 2016, 2:40:13 PM12/2/16
to Chromium-dev, dyar...@yandex-team.ru, dan...@chromium.org, vmp...@google.com
Oh.

I can see how there can be some problems with map interface but isn't C++17 std::set interface (except for the node parts, it's not applicable) is good enough?

It's not like there are missing tests, we have to do tests for:
 - different types
 - different comparators
 - different iterator categories

All of these in various combinations. I was planning to generate them with templates.

пятница, 2 декабря 2016 г., 21:53:18 UTC+3 пользователь Peter Kasting написал:

Peter Kasting

unread,
Dec 2, 2016, 3:02:45 PM12/2/16
to dyar...@yandex-team.ru, Chromium-dev, Dana Jansens, Vladimir Levin
On Fri, Dec 2, 2016 at 11:40 AM, Денис <dyar...@yandex-team.ru> wrote:
I can see how there can be some problems with map interface but isn't C++17 std::set interface (except for the node parts, it's not applicable) is good enough?

I was merely going off your comment earlier about "design of a few things: like separate key/value storage, what do we want to pass as a template parameter : container vs allocator, special iterators etc", and danakj's comment about "An interesting part up-front is defining a good API that will not end up letting developers re-sort the vector between inserts accidentally".

It's quite possible that these APIs, especially the simpler set interface, are already sufficiently good to start the formal review process, in which case I'd say go for it immediately.

PK

Денис

unread,
Dec 2, 2016, 3:09:48 PM12/2/16
to Chromium-dev, dyar...@yandex-team.ru, dan...@chromium.org, vmp...@google.com
Most of these things are for maps. Container vs Allocators are the only thing concerning sets, but I still can't test it without some working implementation of flat_sets.

I can write an api, if you think this is a good idea but an api without tests can be buggy: for example I tried to implement "void insert(It, Sent)" like in the proposal and found out that then it conflicts with overload iterator "insert (const_iterator, const value_type&)".

However, if you think that this is the way to go - ok.

пятница, 2 декабря 2016 г., 23:02:45 UTC+3 пользователь Peter Kasting написал:

Peter Kasting

unread,
Dec 2, 2016, 3:20:52 PM12/2/16
to dyar...@yandex-team.ru, Chromium-dev, Dana Jansens, Vladimir Levin
On Fri, Dec 2, 2016 at 12:09 PM, Денис <dyar...@yandex-team.ru> wrote:
Most of these things are for maps. Container vs Allocators are the only thing concerning sets, but I still can't test it without some working implementation of flat_sets.

I can write an api, if you think this is a good idea but an api without tests can be buggy: for example I tried to implement "void insert(It, Sent)" like in the proposal and found out that then it conflicts with overload iterator "insert (const_iterator, const value_type&)".

However, if you think that this is the way to go - ok.

I would try to match the C++11 APIs for map/set for now, and not add anything fancy.  We can do that later if need be.

I think maybe my advice was unclear.  I wasn't trying to say that based on what you have now, your APIs still need work.  I was trying to save you time by saying "as soon as your API is good enough to land something useful in the tree, start the review process, and add more over time".  It seemed like you were convinced that there would be months of effort ahead and didn't want it wasted, and I wanted to make sure you actually got a working class on a path to land before doing any of that effort.

PK

Денис

unread,
Dec 2, 2016, 3:24:32 PM12/2/16
to Chromium-dev, dyar...@yandex-team.ru, dan...@chromium.org, vmp...@google.com
Oh, I got you. Start the review process early, maybe with just a few methods. I can do that.

пятница, 2 декабря 2016 г., 23:20:52 UTC+3 пользователь Peter Kasting написал:

Montrice Tonui

unread,
Dec 2, 2016, 3:29:00 PM12/2/16
to Chromium-dev
unsubscribing form receiving email.

Yuri Wiitala

unread,
Dec 3, 2016, 6:27:26 PM12/3/16
to Peter Kasting, dyar...@yandex-team.ru, Chromium-dev
On Tue, Nov 29, 2016 at 11:33 AM, Peter Kasting <pkas...@chromium.org> wrote:

On Tue, Nov 29, 2016 at 8:49 AM, Денис <dyar...@yandex-team.ru> wrote:
I finally finished the proposal to add flat containers to Chromium to the stage where we can discuss it.
Lots of measurements for different libraries on windows and mac!

Very interesting! Thanks for doing this. :)

I took a look at the code review link. I'm not familiar at all with the HQP code or design, so I'm probably misinterpreting what we are measuring and which map containers were being replaced for these tests. Perhaps you could share a code review link that has the exact patch you were using for testing the flat_set/map replacement, and also point us to the exact benchmark functions used to get the data for each graph? For me, it's otherwise hard to reason about the access patterns on the data structures.

BTW--At line 173-175 in components/omnibox/browser/history_quick_provider_performance_unittest.cc in this CL, you should use base::TimeTicks instead of base::Time. The latter is not a high-resolution monotonic clock.
 
I found it noteworthy that a B-tree performed 50% worse in your tests than the flat container.  Yuri may have thoughts there.

I think this result makes sense if the vast majority of operations in this code are look-ups. To do each look-up, the B-Tree must scan through a few layers of nodes to find each one value (of 10,000); whereas the flat_map is just searching an ordered array. BTW--I'm not sure there exists a data structure that could perform look-ups better than a sorted array (assuming randomly-distributed queries and elements).

Also, I've found in some of my own research that the Google B-tree might not be an optimal general-purpose B-tree solution: For one, it stores all keys on internal nodes and all values on leaf nodes; but I've seen other B-trees that always keep key-value pairs stored together (and can exist on internal nodes). IIRC, the Google B-tree also targets node sizes to 128 bytes; but on 64-bit x86, 256 bytes seems to be the performance sweet spot.

My prior statements in other e-mails on B-tree performance were based on general-purpose access patterns, and whether B-trees would make a good replacement for std::set/map in-general. Like all data structures, if your know your access pattern ahead of time, that will (and should) influence your choice. For one-time bulk-insert, followed by almost always doing look-ups, a flat map will obviously perform much better.


Денис

unread,
Dec 4, 2016, 3:17:07 AM12/4/16
to Chromium-dev, pkas...@chromium.org, dyar...@yandex-team.ru
воскресенье, 4 декабря 2016 г., 2:27:26 UTC+3 пользователь Yuri Wiitala написал:
On Tue, Nov 29, 2016 at 11:33 AM, Peter Kasting <pkas...@chromium.org> wrote:
On Tue, Nov 29, 2016 at 8:49 AM, Денис <dyar...@yandex-team.ru> wrote:
I finally finished the proposal to add flat containers to Chromium to the stage where we can discuss it.
Lots of measurements for different libraries on windows and mac!

Very interesting! Thanks for doing this. :)

I took a look at the code review link. I'm not familiar at all with the HQP code or design, so I'm probably misinterpreting what we are measuring and which map containers were being replaced for these tests. Perhaps you could share a code review link that has the exact patch you were using for testing the flat_set/map replacement, and also point us to the exact benchmark functions used to get the data for each graph? For me, it's otherwise hard to reason about the access patterns on the data structures.

BTW--At line 173-175 in components/omnibox/browser/history_quick_provider_performance_unittest.cc in this CL, you should use base::TimeTicks instead of base::Time. The latter is not a high-resolution monotonic clock.
 
I found it noteworthy that a B-tree performed 50% worse in your tests than the flat container.  Yuri may have thoughts there.

I think this result makes sense if the vast majority of operations in this code are look-ups. To do each look-up, the B-Tree must scan through a few layers of nodes to find each one value (of 10,000); whereas the flat_map is just searching an ordered array. BTW--I'm not sure there exists a data structure that could perform look-ups better than a sorted array (assuming randomly-distributed queries and elements).

The main factors are speed of iterations, set operations and copying. These is where flat containers shine.

Patch for testing flat containers: https://codereview.chromium.org/2333253002/

For graphs I used results of HQPPerfTestOnePopularURL.Typing.

dan...@chromium.org

unread,
Dec 5, 2016, 6:43:00 PM12/5/16
to Peter Kasting, dyar...@yandex-team.ru, Chromium-dev, Vladimir Levin
On Fri, Dec 2, 2016 at 12:20 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Dec 2, 2016 at 12:09 PM, Денис <dyar...@yandex-team.ru> wrote:
Most of these things are for maps. Container vs Allocators are the only thing concerning sets, but I still can't test it without some working implementation of flat_sets.

I can write an api, if you think this is a good idea but an api without tests can be buggy: for example I tried to implement "void insert(It, Sent)" like in the proposal and found out that then it conflicts with overload iterator "insert (const_iterator, const value_type&)".

However, if you think that this is the way to go - ok.

I would try to match the C++11 APIs for map/set for now, and not add anything fancy.  We can do that later if need be.

I'm not sure that's the right approach. There are proposals for flat_set in STL linked to in your document which both seem to be better starting points than the worst example of std::set.

The 2nd proposal exposes the underlying iterators and container to provide the maximum flexibility of the flat_set adapter. I'm not convinced that we need that here, tho allowing specifying a different underlying container does seem very interesting, as possibly very useful for blink which may want to use WTF::Vector for instance to allow garbage collected objects to be stored.

I'd like to see a design proposal that considers and explains its choices with respect to the boost API, these proposals linked from your explainer document, and takes into consideration Oilpan and Mojo.

Peter Kasting

unread,
Dec 5, 2016, 8:24:39 PM12/5/16
to Dana Jansens, dyar...@yandex-team.ru, Chromium-dev, Vladimir Levin
On Mon, Dec 5, 2016 at 3:41 PM, <dan...@chromium.org> wrote:
On Fri, Dec 2, 2016 at 12:20 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Dec 2, 2016 at 12:09 PM, Денис <dyar...@yandex-team.ru> wrote:
Most of these things are for maps. Container vs Allocators are the only thing concerning sets, but I still can't test it without some working implementation of flat_sets.

I can write an api, if you think this is a good idea but an api without tests can be buggy: for example I tried to implement "void insert(It, Sent)" like in the proposal and found out that then it conflicts with overload iterator "insert (const_iterator, const value_type&)".

However, if you think that this is the way to go - ok.

I would try to match the C++11 APIs for map/set for now, and not add anything fancy.  We can do that later if need be.

I'm not sure that's the right approach. There are proposals for flat_set in STL linked to in your document which both seem to be better starting points than the worst example of std::set.

The 2nd proposal exposes the underlying iterators and container to provide the maximum flexibility of the flat_set adapter. I'm not convinced that we need that here, tho allowing specifying a different underlying container does seem very interesting, as possibly very useful for blink which may want to use WTF::Vector for instance to allow garbage collected objects to be stored.

I'd like to see a design proposal that considers and explains its choices with respect to the boost API, these proposals linked from your explainer document, and takes into consideration Oilpan and Mojo.

My concern here is that we set the bar so high to get anything into the codebase that we don't actually ship major, meaningful improvements to users in a timely way.

The improvements proposed to the omnibox here will have significant user impact.  They have already been delayed a couple of months in the process of trying to get more data to support this change than I thought we needed.  Since this is effectively volunteer-time work from an external contributor, I'm concerned about adding Yet More Months onto that process.  Can we create something that's a workable improvement now, and leave "make the perfect ultimate API that solves Oilpan and Mojo and Chromium and considers C++11 and boost and all the other external implementations" as a followup?

This has other advantages as well.  Using C++11 APIs lets these be a drop-in replacement for map/set today, maximizing the ease of porting existing Chromium code to them (which then increases the surface area that will be improvable if we later improve the API/implementation).  It also provides the greatest chance for migration to a future std::flat_map/set.

Are there ways in which using the C++11 APIs _prevents_ us from adding more power and customizability later?  If such capabilities are purely "added value", then staging their addition also makes the initial API more reviewable.

PK

dan...@chromium.org

unread,
Dec 5, 2016, 9:54:32 PM12/5/16
to Peter Kasting, dyar...@yandex-team.ru, Chromium-dev, Vladimir Levin
On Mon, Dec 5, 2016 at 5:23 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Dec 5, 2016 at 3:41 PM, <dan...@chromium.org> wrote:
On Fri, Dec 2, 2016 at 12:20 PM, Peter Kasting <pkas...@chromium.org> wrote:
On Fri, Dec 2, 2016 at 12:09 PM, Денис <dyar...@yandex-team.ru> wrote:
Most of these things are for maps. Container vs Allocators are the only thing concerning sets, but I still can't test it without some working implementation of flat_sets.

I can write an api, if you think this is a good idea but an api without tests can be buggy: for example I tried to implement "void insert(It, Sent)" like in the proposal and found out that then it conflicts with overload iterator "insert (const_iterator, const value_type&)".

However, if you think that this is the way to go - ok.

I would try to match the C++11 APIs for map/set for now, and not add anything fancy.  We can do that later if need be.

I'm not sure that's the right approach. There are proposals for flat_set in STL linked to in your document which both seem to be better starting points than the worst example of std::set.

The 2nd proposal exposes the underlying iterators and container to provide the maximum flexibility of the flat_set adapter. I'm not convinced that we need that here, tho allowing specifying a different underlying container does seem very interesting, as possibly very useful for blink which may want to use WTF::Vector for instance to allow garbage collected objects to be stored.

I'd like to see a design proposal that considers and explains its choices with respect to the boost API, these proposals linked from your explainer document, and takes into consideration Oilpan and Mojo.

My concern here is that we set the bar so high to get anything into the codebase that we don't actually ship major, meaningful improvements to users in a timely way.

That's a good concern to voice, thanks.

The improvements proposed to the omnibox here will have significant user impact.  They have already been delayed a couple of months in the process of trying to get more data to support this change than I thought we needed.  Since this is effectively volunteer-time work from an external contributor, I'm concerned about adding Yet More Months onto that process.  Can we create something that's a workable improvement now, and leave "make the perfect ultimate API that solves Oilpan and Mojo and Chromium and considers C++11 and boost and all the other external implementations" as a followup?

Certainly. I think if we want to lower the bar we should just write something close to where it will be used for the Omnibox and make it very specific to that, not a general container in base as a first step.

This has other advantages as well.  Using C++11 APIs lets these be a drop-in replacement for map/set today, maximizing the ease of porting existing Chromium code to them (which then increases the surface area that will be improvable if we later improve the API/implementation).  It also provides the greatest chance for migration to a future std::flat_map/set.

The proposals for flat_set/map APIs are what I suggested working from, because as you say it would be nice to use the standard stuff if we can once it exists. Using std::set/map as a starting point is subtlely different. For instance the bulk insert for set::set is

template< class InputIt >
void insert( InputIt first, InputIt last );

Whereas for the first flat_set proposal it is

template <class InputIterator, class Sentinel>
void insert(InputIterator first, Sentinel last);

And boost has two

template<typename InputIterator>
void insert(InputIterator, InputIterator);

template<typename InputIterator>
void insert(ordered_unique_range_t, InputIterator, InputIterator);

dyaroshev did a good job comparing these two in boost and found the latter to not perform well. I'm not honestly sure what the flat_set proposal's function signature means and why it differs. I'd like API considerations like these to be documented and explained.

The second flat_set proposal has other things such as underlaying_iterator and container() which greatly change how it can be used (and its type signature), and allow bulk insert directly into the underlying vector and then std::sort on it for example.

So we must trade-off being a drop-in replacement for nodebased std::set vs matching one of 3 potential future worlds, we probably can't do both (though boost's is compatible).
 
Are there ways in which using the C++11 APIs _prevents_ us from adding more power and customizability later?  If such capabilities are purely "added value", then staging their addition also makes the initial API more reviewable.

While most of the API looks the same and non-controversial (like clear() will always be clear()), it's at the performance points where flat_set differs from set that each proposed solution differs subtlely, and none match exactly std::set. I only looked at insert() while writing this so I can't be sure what other differences exist. This is why I'd like us to understand more clearly and document our decisions before committing to something adhoc. The cost of incremental changes in base/ is extremely large, so investing some time up front is worth it.

Here's a possible path forward. What do you think?

1. Take the prototype written by dyaroshev, and specialize it for omnibox.
 - Drop methods that omnibox doesn't need. This lowers the test coverage requirements.
 - Can we add tests based on what boost tests?
 - Change APIs if they can produce much better performance. With fewer methods this evaluation is easier.
 - We can do this iteratively as needed.
2. Survey the differences in API between STL, the proposed flat_set, boost's, and what we end up from step 1. Explain the differences and choose the best for each variation.
3. Generalize the class with templates, extend/change the API based on step 2, and move it to base.
4. Use it more widely.


- Dana

Peter Kasting

unread,
Dec 5, 2016, 10:27:39 PM12/5/16
to Dana Jansens, dyar...@yandex-team.ru, Chromium-dev, Vladimir Levin
Thanks for the detailed reply.  I had read the whole paper but had not recalled that the proposed future flat_set/map API didn't match today's set/map API.  That is a crucial detail.  :/

On Mon, Dec 5, 2016 at 6:53 PM, <dan...@chromium.org> wrote:
Here's a possible path forward. What do you think?

1. Take the prototype written by dyaroshev, and specialize it for omnibox.
 - Drop methods that omnibox doesn't need. This lowers the test coverage requirements.
 - Can we add tests based on what boost tests?
 - Change APIs if they can produce much better performance. With fewer methods this evaluation is easier.
 - We can do this iteratively as needed.
2. Survey the differences in API between STL, the proposed flat_set, boost's, and what we end up from step 1. Explain the differences and choose the best for each variation.
3. Generalize the class with templates, extend/change the API based on step 2, and move it to base.
4. Use it more widely.

The general pattern here seems OK, but I would avoid dropping methods the omnibox doesn't need in step 1.  Instead, I'd change this to "make sure these methods are well-tested before moving this class to base/".  I just figure it will be more work to drop these and re-add them later than to simply check them in as-is.

PK

Денис

unread,
Dec 6, 2016, 5:02:47 AM12/6/16
to Chromium-dev, pkas...@chromium.org, dyar...@yandex-team.ru, vmp...@google.com


вторник, 6 декабря 2016 г., 5:54:32 UTC+3 пользователь danakj написал:
I can address that particular consern.
The idea behind

template <class InputIterator, class Sentinel>
void insert(InputIterator first, Sentinel last);

has to do with Ranges TS, where they allow two different types for begin and end iterators.
I've tried to do that, and I've got into trouble with

void insert(const_iterator pos, const value_type& x)

Insert in the case of my_set.insert(my_set.begin(), 3); becomes ambiguous, it can probably be dealt with some enable_if tricks but since nobody uses this interface right now, I think it could be added later.

About std::set vs flat_set - flat_set has to support as much std::set api as possible, this is what every implementation is doing and this is a good starting point.
We can do lot's of more cool things with underlying type and with ordered_unique_t, like for example

template <Vector>
void insert(std::remove_reference<Vector>::type&& vec) { // 
rvalue, not forwrading (not sure that it's spelled like this) 
  std::stable_sort(vec.begin(), vec.end()); // use vec as a buffer.
  // merge implementation, that takes sizes into account
}

but all of these could be done as pure extensions. So - my thinking is - let's land something usable in base, with the api that's very likely to stay there (based on the first proposal, it seems to be the closest one to the future standard) and start getting benefits now. Then we have to list interesting for us cases - benchmark them and see what gives us benefits. For example - I'm really looking forward to checking what we can win with get_temporary_buffer() and bulk insert for different types.

- Denis

Денис

unread,
Dec 6, 2016, 3:27:01 PM12/6/16
to Chromium-dev
I've created a bug https://bugs.chromium.org/p/chromium/issues/detail?id=671759 and corresponding cl https://codereview.chromium.org/2552343003/

Obviously cl has a long way to go yet but we can start the review process.

Денис

unread,
Dec 6, 2016, 5:01:49 PM12/6/16
to Chromium-dev, dan...@chromium.org, dyar...@yandex-team.ru, vmp...@google.com


вторник, 6 декабря 2016 г., 6:27:39 UTC+3 пользователь Peter Kasting написал:
Thanks for the detailed reply.  I had read the whole paper but had not recalled that the proposed future flat_set/map API didn't match today's set/map API.  That is a crucial detail.

It does in all the important places, there are some extra methods from C++17 etc. I replied about that particular difference.
The second proposal seems to be further from what's actaully going to land. However we can add the ability to castomise the underlying type later and we can even try to more interesting things, I just think that experimenting should be done with benchmarks for the important cases (I have some suggestions about what those might be).

However, none of what's said prevents us from landing flat_set<Key, Compare> with common api and implementation similar to popular open source solutions.

Denis.

Денис

unread,
Dec 27, 2016, 8:58:34 AM12/27/16
to Chromium-dev
Hi again.

I would really like to make some progress with flat sets cl: https://codereview.chromium.org/2552343003/
If some base owners are willing to review, it would be great.

Thanks.
Denis.

Avi Drissman

unread,
Dec 27, 2016, 9:05:54 AM12/27/16
to dyar...@yandex-team.ru, Chromium-dev
Many Chromium team members are off for the holidays until the beginning of January. My suggestion would be to try again then.

Avi

--

Денис

unread,
Feb 8, 2017, 6:23:06 AM2/8/17
to Chromium-dev
It was hard but first version of flat_set has landed!
I'm gonna take a small vacation but if someone wants to try it out somewhere - please share here and I'm happy to review if you want my opinion.

вторник, 6 декабря 2016 г., 23:27:01 UTC+3 пользователь Денис написал:

Dmitry Skiba

unread,
Feb 8, 2017, 11:45:47 AM2/8/17
to dyar...@yandex-team.ru, Chromium-dev
Congrats! Is there a CL to use them in HQP? I'm curious from a memory usage standpoint, because maps/sets have overhead close to 50%.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Daniel Cheng

unread,
Feb 8, 2017, 12:12:01 PM2/8/17
to dsk...@google.com, dyar...@yandex-team.ru, Chromium-dev
What's HQP?

Daniel

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Ryan Sleevi

unread,
Feb 8, 2017, 12:13:29 PM2/8/17
to Daniel Cheng, Dmitry Skiba, dyar...@yandex-team.ru, Chromium-dev

What's HQP?

Daniel


To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Brett Wilson

unread,
Feb 8, 2017, 12:31:25 PM2/8/17
to dyar...@yandex-team.ru, Chromium-dev
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

Brett



On Wed, Feb 8, 2017 at 3:23 AM, Денис <dyar...@yandex-team.ru> wrote:

--

Vladimir Levin

unread,
Feb 8, 2017, 2:29:59 PM2/8/17
to Brett Wilson, dyar...@yandex-team.ru, Chromium-dev
On Wed, Feb 8, 2017 at 9:30 AM, Brett Wilson <bre...@chromium.org> wrote:
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

+1 for flat_map as well. I think it would be very useful and should be pretty straightforward to implement with the flat_set as an example.
 

dan...@chromium.org

unread,
Feb 8, 2017, 3:24:22 PM2/8/17
to Vladimir Levin, Brett Wilson, dyar...@yandex-team.ru, Chromium-dev
On Wed, Feb 8, 2017 at 2:28 PM, Vladimir Levin <vmp...@chromium.org> wrote:


On Wed, Feb 8, 2017 at 9:30 AM, Brett Wilson <bre...@chromium.org> wrote:
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

+1 for flat_map as well. I think it would be very useful and should be pretty straightforward to implement with the flat_set as an example.

I think it's a good idea but it'd be nice to get a better handle on the perf for flat_set and where it is good/bad to use. The guidance in the .h should be fairly conservative right now. And it doesn't support any kind of bulk insertion yet. So there's a few avenues that we could go next.

Also it's worth considering how we can evolve dealing with keys for both flat_set and flat_map. Specifically if you have std::string keys, it'd be nice if find("literal") didn't have to construct a std::string. Also it'd be nice to be able to use unique_ptr as a key and be able to find() them based on the value inside. I'm not sure if we can do those things and not break the container's API compatibility with STL.

Daniel Cheng

unread,
Feb 8, 2017, 5:00:29 PM2/8/17
to dan...@chromium.org, Vladimir Levin, Brett Wilson, dyar...@yandex-team.ru, Chromium-dev
On Wed, Feb 8, 2017 at 12:23 PM <dan...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 2:28 PM, Vladimir Levin <vmp...@chromium.org> wrote:


On Wed, Feb 8, 2017 at 9:30 AM, Brett Wilson <bre...@chromium.org> wrote:
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

+1 for flat_map as well. I think it would be very useful and should be pretty straightforward to implement with the flat_set as an example.

I think it's a good idea but it'd be nice to get a better handle on the perf for flat_set and where it is good/bad to use. The guidance in the .h should be fairly conservative right now. And it doesn't support any kind of bulk insertion yet. So there's a few avenues that we could go next.

Also it's worth considering how we can evolve dealing with keys for both flat_set and flat_map. Specifically if you have std::string keys, it'd be nice if find("literal") didn't have to construct a std::string. Also it'd be nice to be able to use unique_ptr as a key and be able to find() them based on the value inside. I'm not sure if we can do those things and not break the container's API compatibility with STL.

Another problem with strings comes up in interactions with StringPiece:
- It's nice to use StringPiece, especially if the arguments are often string literals. This avoids binary bloat, since there won't be N conversions to a temporary std::string; there's just the one conversion when actually invoking find()/insert()/emplace().
- However, if the supplied argument is already a std::string, this results in an extra std::string copy.
- The "solution" is to overload each method on const std::string& and const char*. Ugh!

It's things like this that make me wonder if we're not just better off with a StringSet/StringMap container for string keys: then code can just use StringPiece throughout and convert it to a std::string only when it needs to store it.

For the unique_ptr case... we could implement C++14 transparent comparator support in flat_set/flat_map to help with, but it'd be a bit sad to do that without getting transparent comparator support in our other containers.

Daniel
 
 
 
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Jeremy Roman

unread,
Feb 8, 2017, 5:08:16 PM2/8/17
to Daniel Cheng, Dana Jansens, Vladimir Levin, Brett Wilson, dyar...@yandex-team.ru, Chromium-dev
On Wed, Feb 8, 2017 at 4:59 PM, Daniel Cheng <dch...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 12:23 PM <dan...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 2:28 PM, Vladimir Levin <vmp...@chromium.org> wrote:


On Wed, Feb 8, 2017 at 9:30 AM, Brett Wilson <bre...@chromium.org> wrote:
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

+1 for flat_map as well. I think it would be very useful and should be pretty straightforward to implement with the flat_set as an example.

I think it's a good idea but it'd be nice to get a better handle on the perf for flat_set and where it is good/bad to use. The guidance in the .h should be fairly conservative right now. And it doesn't support any kind of bulk insertion yet. So there's a few avenues that we could go next.

Also it's worth considering how we can evolve dealing with keys for both flat_set and flat_map. Specifically if you have std::string keys, it'd be nice if find("literal") didn't have to construct a std::string. Also it'd be nice to be able to use unique_ptr as a key and be able to find() them based on the value inside. I'm not sure if we can do those things and not break the container's API compatibility with STL.

Another problem with strings comes up in interactions with StringPiece:
- It's nice to use StringPiece, especially if the arguments are often string literals. This avoids binary bloat, since there won't be N conversions to a temporary std::string; there's just the one conversion when actually invoking find()/insert()/emplace().
- However, if the supplied argument is already a std::string, this results in an extra std::string copy.
- The "solution" is to overload each method on const std::string& and const char*. Ugh!

It's things like this that make me wonder if we're not just better off with a StringSet/StringMap container for string keys: then code can just use StringPiece throughout and convert it to a std::string only when it needs to store it.

I'm not sure I understand which part of this isn't resolved by C++14-style heterogeneous lookup (in std::set/map, at least, and as you point out we can do the same for our own containers).

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Daniel Cheng

unread,
Feb 8, 2017, 5:25:41 PM2/8/17
to Jeremy Roman, Dana Jansens, Vladimir Levin, Brett Wilson, dyar...@yandex-team.ru, Chromium-dev
On Wed, Feb 8, 2017 at 2:07 PM Jeremy Roman <jbr...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 4:59 PM, Daniel Cheng <dch...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 12:23 PM <dan...@chromium.org> wrote:
On Wed, Feb 8, 2017 at 2:28 PM, Vladimir Levin <vmp...@chromium.org> wrote:


On Wed, Feb 8, 2017 at 9:30 AM, Brett Wilson <bre...@chromium.org> wrote:
For the curious, this is in:

Is a flat_map planned also? If so, can we replace small_map with it? I think the performance properties will overlap enough such that having the super-small small_map optimization is no longer necessary, especially given the code bloat associated with small_map.

+1 for flat_map as well. I think it would be very useful and should be pretty straightforward to implement with the flat_set as an example.

I think it's a good idea but it'd be nice to get a better handle on the perf for flat_set and where it is good/bad to use. The guidance in the .h should be fairly conservative right now. And it doesn't support any kind of bulk insertion yet. So there's a few avenues that we could go next.

Also it's worth considering how we can evolve dealing with keys for both flat_set and flat_map. Specifically if you have std::string keys, it'd be nice if find("literal") didn't have to construct a std::string. Also it'd be nice to be able to use unique_ptr as a key and be able to find() them based on the value inside. I'm not sure if we can do those things and not break the container's API compatibility with STL.

Another problem with strings comes up in interactions with StringPiece:
- It's nice to use StringPiece, especially if the arguments are often string literals. This avoids binary bloat, since there won't be N conversions to a temporary std::string; there's just the one conversion when actually invoking find()/insert()/emplace().
- However, if the supplied argument is already a std::string, this results in an extra std::string copy.
- The "solution" is to overload each method on const std::string& and const char*. Ugh!

It's things like this that make me wonder if we're not just better off with a StringSet/StringMap container for string keys: then code can just use StringPiece throughout and convert it to a std::string only when it needs to store it.

I'm not sure I understand which part of this isn't resolved by C++14-style heterogeneous lookup (in std::set/map, at least, and as you point out we can do the same for our own containers).

AFAIK, it doesn't help for unordered_map / unordered_set. In addition, while you can't forget the comparator for unique_ptr keys, you can for std::string keys. If you do, you'll silently get worse code.

Daniel
 

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Денис

unread,
Feb 14, 2017, 3:13:32 PM2/14/17
to Chromium-dev
Hi, sorry that I could not response for a while.

Unfortunately, I've been reassigned to a different task that occupies almost all of my time and I won't be able to spent any significant effort on this project any time soon.
Feel free to use my results, I'm happy to review and argue if you want me to.

On specific problems:

Doable in reasonable time:

In any related cl somewhere there please fix this comment): https://cs.chromium.org/chromium/src/base/containers/flat_set.h?l=111
It's complexity requirements for stable_sort and we use sort))

HQP:
CL that tested prototype won't work due to the fact that we dropped void insert(first, last) - it has to be fixed or just recreated. Prefer collecting in a buffer and than constructing from it with flat_set(first, last) this is nlog vs n^2 when inserting one by one. I would also suggest to revise calls to insert with one element - that code is often can be replaced with range construction.

flat_map/flat_multiset/flat_multimap:
I've looked into the possibility to store keys/values separately - in short, I don't believe this can be done in a reasonalbe way. We would like for flat_map to work with algorithms because we really want remove, unique, set operations etc to work. The only solution I know is Eric Niebler's which is very complex and requires reimplementing the algorithms. + for (auto x : my_map) means auto& and  for(auto& x : my_map) doesn't work.
So, I think implementing these containers in a similar to flat_set fashion is the way to go. I've seen a few implementations for std::set/std::map/flat_set/flat_map - most use common base class - I think this could be a way to go. For flat_set I took tests from C++, I suggest to do the same here.

improving the interface: (crbug.com/682254)
  • a better interface to collect everything and then insert it in a set. Right now we have to collect evertything in a vector and then do
    our_set = {std::make_move_iterator(vec.begin()) std::make_move_iterator(vec.end())} which is too complicated and sub optimal. There are already a few uses of flat_set in code (which is awesome), and people insert elements one by one (which is quadratic) I suspect specifically because of this problem. I see two possible solutions: expose the underlying_type in the interface and add smth like flat_set(underlying_type x) : impl(std::move(x)). Exposing underlying type in the interface might be good for other reasons too - we have a few uses allocators in code. Another possible solution is  we can add delayed_insert(elem), delayed_insert(first, last), commit_delayed_insertions (I don't think that querying if set is in the sorted state is a good idea, we would have to many unnessary dchecks). This has some of the same questions with insert(first, last)  (see below). Maybe we ought to have some form of a builder. However this things are complimentary and if we do not know how to do the second one, we should at least expose the underlying type.
  • Already mentioned problem of find("name") is actually somewhat worse - for example std::set<std::unique_ptr<T>> is unusable for most practical purposes. Some of it was solved in C++14 with std::less<> and is_transparent traits - this would be a good addition to flat_set. libc++ has decent tests for this, see comments in flat_set_unittests.
  • Another problem with std::set<std::unique_ptr<T>> is already solved: we didn't make keys const, so this: unique_ptr<T> = std::move(*my_set_iterator); my_set.erase(my_set_iterator); compiles. Maybe we can do some form of C++17 std::set::extract, but it's not clear to me what a good version of that would be. 
benchmarks: (crbug.com/682215)
While landing flat_set I did one - I've tested insert(begin()) for set vs flat_set -> under a 100 32bit integers flat_set seemed to be a win (I followed all of my suggestions))). Watch out for two things: compiler can optimise all of your code away (a tool to look at assembler), flat_sets can be worse on older machines - be sure to check on those.
I would suggest following benchmarks:
  • constructor(first, last)
  • copying.
  • inserting increasing/decreasing/uniformaly distributed/normally distributed elements.
  • lookup speed.
  • set operations
  • merging many sets - I think there might be a good algorithm to do this but so far putting everything in a vector and sorting beats me to the pulp.
Requires some difficult research:

insert(first, last):(crbug.com/682249)
This is hard and requires benchmarks: essentially I haven't seen this done properly anywhere. The issue of stability comes up: I would suggest that new elements can not override existing elements - this is consistent with the constructor. While benchmarking watch out for relative sizes of ranges and see if knowledge of a set being sorted is helpful.
Watch out: std::inplace_merge is not stable in libc++, std::get_temporary_buffer() very well might be just new, consider allocating some space on the stack if you need it.

small buffer optimisation: (crbug.com/682240)
I would suggest either writing or taking existent small vector(llvm). This could be really good for flat_set<flat_set<int>> - we would get almost all of our elements in a contiguous memory location.

Once again: it kills me but I have to step back for quite a while, I'm extremely sorry. Happy to review and argue if somebody picks up on this and would want my input.

Денис

unread,
Feb 14, 2017, 6:02:49 PM2/14/17
to Chromium-dev
Typos:


For flat_set I took tests from C++, I suggest to do the same here.
- libc++.

 The issue of stability comes up: I would suggest that new elements can not override existing elements - this is consistent with the constructor.

New elements can not override existing elements but to not guarantie anything about order of new elements. Even the restriction of stability to previous one may cost us something, but it should be a really good performance win to violate that. 

Peter Kasting

unread,
Mar 8, 2017, 4:29:34 AM3/8/17
to dyar...@yandex-team.ru, Chromium-dev
On Tue, Feb 14, 2017 at 12:13 PM, Денис <dyar...@yandex-team.ru> wrote:
flat_map/flat_multiset/flat_multimap:
I think implementing these containers in a similar to flat_set fashion is the way to go. I've seen a few implementations for std::set/std::map/flat_set/flat_map - most use common base class - I think this could be a way to go. For flat_set I took tests from C++, I suggest to do the same here.

Is it correct to assume that since most of the HQP types are maps rather than sets, the majority of the hoped-for perf wins in the HQP won't come until we add (and use) flat_map?

If yes, and this were next on your list, do you know when you'd be able to get back to this?  I'm keen to see this happen, but I know your time and energy is limited.

PK 

Денис

unread,
Mar 8, 2017, 12:36:38 PM3/8/17
to Chromium-dev, dyar...@yandex-team.ru
1) brettw is working on flat_maps here: https://codereview.chromium.org/2715433007/
2) actually the big problem for HQP is sets, most maps don't affect it much, except for one, that has already been addressed: https://codereview.chromium.org/1482443002 - flat_map didn't win over unordered when I tried it. So - just switching to sets is good enough I think, at least for now.

среда, 8 марта 2017 г., 12:29:34 UTC+3 пользователь Peter Kasting написал:

Peter Kasting

unread,
Mar 8, 2017, 5:08:17 PM3/8/17
to dyar...@yandex-team.ru, Chromium-dev
On Wed, Mar 8, 2017 at 9:36 AM, Денис <dyar...@yandex-team.ru> wrote:
1) brettw is working on flat_maps here: https://codereview.chromium.org/2715433007/

Oh, awesome!

2) actually the big problem for HQP is sets, most maps don't affect it much, except for one, that has already been addressed: https://codereview.chromium.org/1482443002 - flat_map didn't win over unordered when I tried it. So - just switching to sets is good enough I think, at least for now.

OK.

For clarity, once we have flat_map, should we try to use it for HQP maps in most places anyway for small perf wins and lower memory use?  Or would it be actively harmful or utterly meaningless churn?

PK

Денис

unread,
Mar 8, 2017, 5:41:21 PM3/8/17
to Chromium-dev, dyar...@yandex-team.ru


четверг, 9 марта 2017 г., 1:08:17 UTC+3 пользователь Peter Kasting написал:

Switching that unordered_map to flat can be actively harmful. Other maps don't seem to affect performance that much - so they probably can be switched.

Денис

unread,
Mar 8, 2017, 5:42:11 PM3/8/17
to Chromium-dev, dyar...@yandex-team.ru


четверг, 9 марта 2017 г., 1:08:17 UTC+3 пользователь Peter Kasting написал:
Reply all
Reply to author
Forward
0 new messages