Support for Writable Memory Mapped Files


Brian White

Oct 16, 2015, 2:28:26 PM
to chromium-dev
I have a use-case (UMA metrics) where we're considering writing the values to file-backed memory so that metrics gathered during activities like shutdown and install can be persistently stored and forwarded next time Chrome is running normally.

Would it be reasonable to extend MemoryMappedFile to allow read/write access?

The reason for using an mmap'd file is speed.  UMA histograms have to be extremely low overhead and typically boil down to a few (predictable) jumps, a couple pointer dereferences, and an incremented atomic integer.  Having direct memory access to these is important and letting the OS flush out dirty pages at its convenience means less impact on Chrome itself.
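To make that concrete, the fast path would look something like the sketch below (POSIX-level, with a purely illustrative bucket layout; this is not a proposed MemoryMappedFile API):

  // Sketch of the fast path: map a file read/write and bump counters in it.
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <atomic>
  #include <cstdint>

  std::atomic<int32_t>* MapHistogramCounters(const char* path, size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
      return nullptr;
    if (ftruncate(fd, bytes) != 0) {   // Size the backing file.
      close(fd);
      return nullptr;
    }
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         // The mapping stays valid after close().
    if (mem == MAP_FAILED)
      return nullptr;
    return static_cast<std::atomic<int32_t>*>(mem);
  }

  // Recording a sample is then a relaxed atomic increment on mapped memory;
  // the OS writes the dirty page back whenever it chooses.
  inline void AddSample(std::atomic<int32_t>* buckets, size_t bucket) {
    buckets[bucket].fetch_add(1, std::memory_order_relaxed);
  }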

  Brian
  bcw...@google.com
-----------------------------------------------------------------------------------------
Treat someone as they are and they will remain that way.
Treat someone as they can be and they will become that way.

Lei Zhang

Oct 16, 2015, 4:11:13 PM
to Brian White, chromium-dev
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Brian White

Oct 16, 2015, 5:04:22 PM
to Lei Zhang, chromium-dev
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.
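For the cases where we do want to force a write-back at a time of our choosing (e.g. a clean exit), an explicit flush is cheap. A sketch:

  #include <sys/mman.h>

  // Force write-back of the mapped region at a point of our choosing
  // (e.g. clean shutdown); otherwise the kernel flushes at its leisure.
  bool FlushMappedRegion(void* base, size_t length) {
    return msync(base, length, MS_ASYNC) == 0;  // MS_SYNC to block until done.
  }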

Windows is a bit different.  I found this in a document from Microsoft:

The mapped page writer thread writes dirty pages from mapped files out to their backing store at timed intervals. In Windows Vista and later Windows releases, the mapped page writer sweeps through the dirty page list at regular intervals and flushes all the pages out to their backing store. If the number of free pages is low, the system accelerates the sweep by using a shorter interval. In earlier Windows releases, the mapped page writer flushed everything at absolute 5-minute intervals. Windows Vista writes dirty pages sooner than earlier Windows releases and the write operations typically involve less data.

It doesn't sound frequent enough to be a wear problem but it's a good point.  I'd expect that it's a problem that MS would have considered.

-- Brian

 


Lei Zhang

Oct 16, 2015, 5:10:50 PM
to Brian White, chromium-dev
So how often are you going to be writing out to disk? For preferences
as an example, Chromium keeps the data in memory as a DictionaryValue
(I think), and serialize to JSON when it writes it out to disk
periodically.

Brian White

Oct 16, 2015, 5:23:00 PM
to Lei Zhang, chromium-dev
So how often are you going to be writing out to disk? For preferences
as an example, Chromium keeps the data in memory as a DictionaryValue
(I think), and serialize to JSON when it writes it out to disk
periodically.

Chrome likely won't, at least not until it exits.  Writes to disk will only happen when the OS deems it beneficial, likely every couple minutes under Windows according to the document I quoted.  Total data amount will likely be a couple pages -- I haven't gotten that far in my investigation.

Matthew Dempsky

Oct 16, 2015, 5:32:52 PM
to bcw...@google.com, Lei Zhang, chromium-dev
The OS is also allowed to write the pages to disk out of order, and if the machine crashes you might have persisted an inconsistent view of memory.  Does that matter for your use case and/or do you have plans for how to address it?


Lei Zhang

Oct 16, 2015, 5:45:00 PM
to Brian White, chromium-dev, Matthew Dempsky
base::ImportantFileWriter may be what you want...
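Roughly like this (a sketch only; the exact signatures vary between revisions, and the helper and its arguments here are made up):

  #include <string>

  #include "base/files/file_path.h"
  #include "base/files/important_file_writer.h"

  // Hypothetical helper: serialize the in-memory histogram snapshot and
  // write it atomically (write-to-temp-then-rename), so a crash mid-write
  // can't leave a half-written file behind.
  void PersistMetricsSnapshot(const base::FilePath& path,
                              const std::string& serialized_histograms) {
    if (!base::ImportantFileWriter::WriteFileAtomically(path,
                                                        serialized_histograms)) {
      // Metrics are best-effort; dropping one snapshot is acceptable.
    }
  }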

Brian White

Oct 16, 2015, 10:20:32 PM
to Matthew Dempsky, Lei Zhang, chromium-dev
The OS is also allowed to write the pages to disk out of order, and if the machine crashes you might have persisted an inconsistent view of memory.  Does that matter for your use case and/or do you have plans for how to address it?

OS crashes, reset buttons, and power failures are a potential problem.  I was thinking of making the allocator keep all of an individual histogram's data within the same page so there won't be inconsistencies that way.  Generally they're only a few hundred bytes each.  In the worst case, we get a bunch of garbage data shipped up to UMA, but I don't think that should happen, or at least it should be rare enough that it gets lost in the noise.
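The placement rule would be something like the sketch below (assuming 4K pages and per-histogram data no larger than one page):

  #include <cstddef>

  // If an individual histogram's counters never straddle a page boundary,
  // a partial write-back can only lose whole histograms rather than tear
  // one apart.
  constexpr size_t kPageSize = 4096;

  size_t PlaceWithinPage(size_t current_offset, size_t bytes) {
    size_t end_of_page = (current_offset / kPageSize + 1) * kPageSize;
    if (current_offset + bytes > end_of_page)
      current_offset = end_of_page;  // Skip ahead rather than straddle.
    return current_offset;           // Allocation occupies [offset, offset + bytes).
  }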

None of this is essential data.  If an occasional few get lost, it's not a problem so long as it works the majority of the time.

Daniel Bratell

Oct 19, 2015, 10:58:47 AM
to Lei Zhang, 'Brian White' via Chromium-dev, bcw...@google.com
On Fri, 16 Oct 2015 23:03:08 +0200, 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:

I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

The other aspect is also important, could these prevent a disk from going to sleep that would otherwise have gone to sleep?

/Daniel

--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Brian White

Oct 19, 2015, 11:43:11 AM
to Daniel Bratell, Alexei Svitkine, Lei Zhang, 'Brian White' via Chromium-dev
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

The other aspect is also important, could these prevent a disk from going to sleep that would otherwise have gone to sleep?

That's a good question.  When Chrome is in active use, there are always writes to the profile directory so no impact there.

Alexei, does Chrome have UMA Histograms that get updated when Chrome isn't actually in use?  If so, we might have to change them or differentiate some histograms as being RAM-only.

Torne (Richard Coles)

Oct 19, 2015, 11:43:11 AM
to bcw...@google.com, Lei Zhang, chromium-dev
On Fri, 16 Oct 2015 at 22:03 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

It doesn't *guarantee* it unless msync(), but there is a kernel thread which goes around cleaning dirty pages to keep them to a manageable number, and so you can't really make any assumptions about causing minimal wear; it is perfectly legitimate for the kernel to instantly flush every single individual write you make to disk.

Brian White

Oct 19, 2015, 11:46:12 AM
to Torne (Richard Coles), Lei Zhang, chromium-dev
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

It doesn't *guarantee* it unless msync(), but there is a kernel thread which goes around cleaning dirty pages to keep them to a manageable number, and so you can't really make any assumptions about causing minimal wear; it is perfectly legitimate for the kernel to instantly flush every single individual write you make to disk.

Right.  But as you said, it is only trying to keep it to a manageable number.  This won't be the first time mmap'd files are used this way.  The OS certainly has SSDs in mind when determining write policies.

Alexei Svitkine

Oct 19, 2015, 11:47:37 AM
to Torne (Richard Coles), Brian White, Lei Zhang, chromium-dev
There are probably histograms logged from background posted tasks. Histogram use is extensive throughout Chrome, so I wouldn't be surprised if there are things like that.

I think there's definitely a lot of "gotchas" with such an approach and some of them we'll just have to dig into once we have a working prototype of this. Then we can figure out what the actual impact is instead of speculating about it.

Brian White

Oct 19, 2015, 11:54:51 AM
to Alexei Svitkine, Torne (Richard Coles), Lei Zhang, chromium-dev
There are probably histograms logged from background posted tasks. Histogram use is extensive throughout Chrome, so I wouldn't be surprised if there are things like that.

That's pretty much what I expected.
 

I think there's definitely a lot of "gotchas" with such an approach and some of them we'll just have to dig into once we have a working prototype of this. Then we can figure out what the actual impact is instead of speculating about it.

Of course.  It helps make good initial design choices, though.

Torne (Richard Coles)

Oct 19, 2015, 12:09:16 PM
to Brian White, Lei Zhang, chromium-dev
On Mon, 19 Oct 2015 at 16:45 Brian White <bcw...@google.com> wrote:
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

It doesn't *guarantee* it unless msync(), but there is a kernel thread which goes around cleaning dirty pages to keep them to a manageable number, and so you can't really make any assumptions about causing minimal wear; it is perfectly legitimate for the kernel to instantly flush every single individual write you make to disk.

Right.  But as you said, it is only trying to keep it to a manageable number.  This won't be the first time mmap'd files are used this way.  The OS certainly has SSDs in mind when determining write policies.

Sadly, I think you're being very optimistic there.  The kernel flush daemon pretty much only cares about the numbers of memory pages in different states (in total, across the whole system) and will do whatever it needs to do to keep those numbers in ratios it likes, with somewhere between "little" and "absolutely no" consideration of which block device even backs any given page, let alone whether it's an SSD. :/

Brian White

Oct 19, 2015, 1:02:43 PM
to Torne (Richard Coles), Lei Zhang, chromium-dev
I presume the data will be changing frequently in memory. How often
are you going to be writing it out to disk? Are you going to wear out
some SSDs? ;-)

Linux, at least, doesn't guarantee any write-back of dirty pages until the file becomes unmapped unless an explicit msync() call is made.

It doesn't *guarantee* it unless msync(), but there is a kernel thread which goes around cleaning dirty pages to keep them to a manageable number, and so you can't really make any assumptions about causing minimal wear; it is perfectly legitimate for the kernel to instantly flush every single individual write you make to disk.

Right.  But as you said, it is only trying to keep it to a manageable number.  This won't be the first time mmap'd files are used this way.  The OS certainly has SSDs in mind when determining write policies.

Sadly I think you're being very optimistic there and the kernel flush daemon pretty much only cares about numbers of memory pages in different states (in total, across the whole system) and will do whatever it needs to do to make those numbers be in ratios it likes, with somewhere between "little" and "absolutely no" consideration of which block device even backs any given page, let alone whether it's an SSD. :/

Perhaps, but it seems to me that Chrome wouldn't be the first program to use mem-mapped files as persistent RAM, so it'll have been a problem encountered before.

We may have to keep some of the more frequently updated histograms in volatile RAM instead if it is a problem.  Is it okay, then, if I start work on supporting writable mmap'd files and worry about effective use of it once we have real data?

  Brian

Torne (Richard Coles)

Oct 19, 2015, 1:30:05 PM
to Brian White, Lei Zhang, chromium-dev
Personally I'm inclined to believe that concerns about SSD wear rate aren't that important in real world usage :)

I was just noting that you can't really make any such assumption about the page cache. Generally when programs use mem-mapped files as persistent memory they do it because they *do* want it to be written to disk at a reasonable rate (e.g. to preserve state of files currently being edited and similar), and they just don't want to have to deal with the queueing/blocking themselves. Probably few of those programs care about whether they are wearing out someone's SSD.

Scott Hess

Oct 19, 2015, 1:56:51 PM
to Torne (Richard Coles), Brian White, Lei Zhang, chromium-dev
I think the question of "How will use of memory-mapped files affect SSDs" is a bit like philosophers debating how many teeth a horse has.  It doesn't really matter what assumptions other programmers make about how writes to mapped memory are reflected to disk; what matters is whether the OS schedules these disk writes in a way which is materially different from regular writes, given that our supported operating systems all use unified buffer caches.  Beyond that, our other disk-writing subsystems don't show much evidence of concern for SSD wear ...

One way to minimize this class of concern would be to have these be a new kind of histogram, then be selective about what gets to live in that bin, rather than worrying about problems which might be caused by high-volume histograms using memory-mapped files.

Another alternative would be to have them in shared memory, set up so that one of the sharing processes notes the crash and writes things to disk on the way down.  I'd suggest a process specifically dedicated to this, but we already have way too many processes, and if such a process wasn't active we might find that it's generally unresponsive at the point where it is most needed.

-scott


Lei Zhang

Oct 19, 2015, 2:03:50 PM
to Torne (Richard Coles), Brian White, chromium-dev
Yes, the wink when I brought up SSDs means it's not a serious concern.
I'm still trying to understand how frequently data needs to be written
out to disk.

On Mon, Oct 19, 2015 at 10:28 AM, Torne (Richard Coles)
<to...@chromium.org> wrote:

Brian White

Oct 19, 2015, 2:16:07 PM
to Lei Zhang, Torne (Richard Coles), chromium-dev
One way to minimize this class of concern would be to have these be a new kind of histogram, then be selective about what gets to live in that bin, rather than worrying about problems which might be caused by high-volume histograms using memory-mapped files.

Yes.  It would be easiest to simply replace the existing module but that may not be practical.
 

Yes, the wink when I brought up SSDs means it's not a serious concern.
I'm still trying to understand how frequently data needs to be written
out to disk.

It doesn't need to be written to disk at all.  It needs to be persistent across task restarts and be accessible by multiple processes (i.e. browser, renderer, setup, etc.).  It's beneficial to be written out to disk in order to survive system restarts.

Lei Zhang

Oct 19, 2015, 2:19:52 PM
to Brian White, Torne (Richard Coles), chromium-dev
Does using ImportantFileWriter to periodically write it out work for you then?

Brian White

Oct 19, 2015, 4:07:42 PM
to Lei Zhang, Torne (Richard Coles), chromium-dev
Does using ImportantFileWriter to periodically write it out work for you then?

You mean, create a shared memory segment and then call this during shutdown?

Problems I can think of off-hand:
  • It would have to be instantiated in other programs, too, such as setup.exe.  You could then have a race condition.
  • It would also delay the shutdown of Chrome rather than being async by the OS.
  • It would have to write the entire memory instead of just the changed pages (though it'll be a small amount of data).
  • It won't survive Chrome crashing, which is important because those histograms may be relevant to the crash.
Most of those would be mitigated by having a completely independent process doing the periodic writes or only writing if it detects it has the only handle to the shared memory.  Initialization would have to be careful of race conditions reading all the file contents back into memory.

It seems better to start with the mem-mapped file and switch to more complicated schemes if/when it is determined to be necessary.

-- Brian

Lei Zhang

Oct 20, 2015, 4:20:45 AM
to Brian White, Torne (Richard Coles), chromium-dev
On Mon, Oct 19, 2015 at 1:06 PM, Brian White <bcw...@google.com> wrote:
>> Does using ImportantFileWriter to periodically write it out work for you
>> then?
>
>
> You mean, create a shared memory segment and then call this during shutdown?

I was thinking, if there's just the browser process, or only a single
process that's writing out this data, then create whatever you like in
memory, as long as you provide a way to serialize it on shutdown, or
periodically at whatever pace you feel is comfortable and is
reasonable for performance/power/etc.

> Problems I can think of off-hand:
>
> It would have to be instantiated in other programs, too, such as setup.exe.
> You could then have a race condition.

Hrm, I missed your earlier comment about "be accessible by multiple
processes (i.e. browser, rendered, setup, etc.)" What does
"accessible" mean exactly? Is everyone a reader? Is everyone a writer?
Are there multiple writers? Don't forget most processes are generally
sandboxed and don't have direct access to disk. The question started
out as mmaping files, but this requirement makes the issue more
complicated. Care to list all the requirements?

Egor Pasko

Oct 20, 2015, 7:27:35 AM
to bcw...@google.com, chromium-dev
Important questions with file backed data:
1. Generally: what is your access pattern?
2. Particularly: how many 4K pages, how frequently the data is supposed to be read / updated?
3. How much consistency / availability is necessary? Do we wipe off all the data if it's inconsistent? How expensive is the consistency check? (inconsistency may come from crashes, memory reordering, less likely: bit flips in RAM or on the block device)

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.


Brian White

Oct 20, 2015, 8:22:05 AM
to Lei Zhang, Torne (Richard Coles), chromium-dev
> Problems I can think of off-hand:
>
> It would have to be instantiated in other programs, too, such as setup.exe.
> You could then have a race condition.

Hrm, I missed your earlier comment about "be accessible by multiple
processes (i.e. browser, rendered, setup, etc.)" What does
"accessible" mean exactly? Is everyone a reader? Is everyone a writer?
Are there multiple writers? Don't forget most processes are generally
sandboxed and don't have direct access to disk. The question started
out as mmaping files, but this requirement makes the issue more
complicated. Care to list all the requirements?

Sure.  This thread was just about a part of a design discussed with my colleagues but if we're going to a full design review, here's the (start of a) design doc:


I was about to start a second thread asking if there was existing code to do malloc/free equivalents within a designated memory block.  ???
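Roughly this kind of thing, i.e. a small carve-out allocator over a caller-supplied block (a toy sketch, not an existing Chromium API; the names are made up):

  #include <cstddef>
  #include <cstdint>

  // Toy carve-out allocator over a fixed, caller-provided memory block.
  // No free(): allocations live as long as the block, which fits histogram
  // objects that are never destroyed.
  class BlockAllocator {
   public:
    BlockAllocator(void* base, size_t size)
        : base_(static_cast<uint8_t*>(base)), size_(size), used_(0) {}

    void* Allocate(size_t bytes, size_t align = 8) {
      size_t offset = (used_ + align - 1) & ~(align - 1);
      if (offset + bytes > size_)
        return nullptr;              // Block exhausted.
      used_ = offset + bytes;
      return base_ + offset;
    }

   private:
    uint8_t* base_;
    size_t size_;
    size_t used_;
  };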

If the renderer can't access a file even to memory-map it at start-up, then that effectively eliminates memory-mapped files as the persistent storage mechanism.  The (first) alternative would be a simple shared-memory block, unbacked by disk.  This would cover all the cases except keeping stats across logout/shutdown... but that can be added at a later time.


Important questions with file backed data:
1. Generally: what is your access pattern?
2. Particularly: how many 4K pages, how frequently the data is supposed to be read / updated?
3. How much consistency / availability is necessary? Do we wipe off all the data if it's inconsistent? How expensive is the consistency check? (inconsistency may come from crashes, memory reordering, less likely: bit flips in RAM or on the block device)

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.

All memory is backed by disk these days so you always have that risk, even in the current implementation.  Frequently accessed histograms would be unlikely to become unpinned.  In fact, since they'd all reside within the same few pages, even the infrequently accessed ones wouldn't become unpinned.

It's also possible that the situation would improve because, today, a histogram could be allocated as part of a page containing only infrequently accessed memory that easily gets swapped out.

However, all of the benefits can be achieved with any shared memory scheme.  It doesn't have to be a memory-mapped file; that just seemed easiest, at least to start.

Primiano Tucci

Oct 20, 2015, 10:10:08 AM
to bcw...@google.com, Lei Zhang, Torne (Richard Coles), chromium-dev
Don't want to jam the thread too much; there are already a lot of things in play here.
Just my $0.02: do we have precedent in Chrome for writing files via MAP_SHARED mmaps, or would this be the first time?

What happens if you happen to have a corrupt filesystem? On Linux/Android:
write() to corrupt fs -> write fails, process stays alive
*deref_pointer to a MAP_SHARED mmap -> crash with SIGBUS

In other words, there is a possibility that this will increase the crash rate in the presence of a corrupted fs. Is that OK? Are we already in that state due to other subsystems?
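To make the contrast concrete (a sketch):

  #include <cstdint>
  #include <cstdio>
  #include <unistd.h>

  // With write(), an I/O error surfaces as a return value the caller can check:
  void RecordViaWrite(int fd, const void* data, size_t len) {
    if (write(fd, data, len) < 0)
      perror("write failed; process keeps running");
  }

  // With a MAP_SHARED mapping, the same failure surfaces while paging data
  // in or out, so a plain store can raise SIGBUS and kill the process:
  void RecordViaMmap(volatile int32_t* mapped_counter) {
    *mapped_counter += 1;  // May fault with SIGBUS if the backing file is bad.
  }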


Egor Pasko

Oct 21, 2015, 10:49:09 AM
to Brian White, Lei Zhang, Torne (Richard Coles), chromium-dev
On Tue, Oct 20, 2015 at 2:20 PM, 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:
> Problems I can think of off-hand:
>
> It would have to be instantiated in other programs, too, such as setup.exe.
> You could then have a race condition.

Hrm, I missed your earlier comment about "be accessible by multiple
processes (i.e. browser, rendered, setup, etc.)" What does
"accessible" mean exactly? Is everyone a reader? Is everyone a writer?
Are there multiple writers? Don't forget most processes are generally
sandboxed and don't have direct access to disk. The question started
out as mmaping files, but this requirement makes the issue more
complicated. Care to list all the requirements?

Sure.  This thread was just about a part of a design discussed with my colleagues but if we're going to a full design review, here's the (start of a) design doc:


I was about to start a second thread asking if there was existing code to do malloc/free equivalents within a designated memory block.  ???

If the renderer can't access a file even to memory-map it at start-up, then that effectively eliminates memory-mapped files as the persistent storage mechanism.  The (first) alternative would be a simple shared-memory block, unbacked by disk.  This would cover all the cases except keeping stats across logout/shutdown... but that can be added at a later time.


Important questions with file backed data:
1. Generally: what is your access pattern?
2. Particularly: how many 4K pages, how frequently the data is supposed to be read / updated?
3. How much consistency / availability is necessary? Do we wipe off all the data if it's inconsistent? How expensive is the consistency check? (inconsistency may come from crashes, memory reordering, less likely: bit flips in RAM or on the block device)

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.

All memory is backed by disk these days

I could not find it easily in the code. Any pointers?
Actually wow, sounds like lots of sources for random jank everywhere, I am scared now.
 
so you always have that risk, even in the current implementation.  Frequently accessed histograms would be unlikely to become unpinned.  In fact, since they'd all reside within the same few pages, even the infrequently accessed ones wouldn't become unpinned.

It's also possible that the situation would improve because, today, a histogram could be allocated as part of a page containing only infrequently accessed memory that easily gets swapped out.

However, all of the benefits can be achieved with any shared memory scheme.  It doesn't have to be a memory-mapped file; that just seemed easiest, at least to start.


Torne (Richard Coles)

Oct 21, 2015, 10:55:27 AM
to Egor Pasko, Brian White, Lei Zhang, chromium-dev
On Wed, 21 Oct 2015 at 15:47 Egor Pasko <pa...@chromium.org> wrote:
On Tue, Oct 20, 2015 at 2:20 PM, 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:
> Problems I can think of off-hand:
>
> It would have to be instantiated in other programs, too, such as setup.exe.
> You could then have a race condition.

Hrm, I missed your earlier comment about "be accessible by multiple
processes (i.e. browser, rendered, setup, etc.)" What does
"accessible" mean exactly? Is everyone a reader? Is everyone a writer?
Are there multiple writers? Don't forget most processes are generally
sandboxed and don't have direct access to disk. The question started
out as mmaping files, but this requirement makes the issue more
complicated. Care to list all the requirements?

Sure.  This thread was just about a part of a design discussed with my colleagues but if we're going to a full design review, here's the (start of a) design doc:


I was about to start a second thread asking if there was existing code to do malloc/free equivalents within a designated memory block.  ???

If the renderer can't access a file even to memory-map it at start-up, then that effectively eliminates memory-mapped files as the persistent storage mechanism.  The (first) alternative would be a simple shared-memory block, unbacked by disk.  This would cover all the cases except keeping stats across logout/shutdown... but that can be added at a later time.


Important questions with file backed data:
1. Generally: what is your access pattern?
2. Particularly: how many 4K pages, how frequently the data is supposed to be read / updated?
3. How much consistency / availability is necessary? Do we wipe off all the data if it's inconsistent? How expensive is the consistency check? (inconsistency may come from crashes, memory reordering, less likely: bit flips in RAM or on the block device)

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.

All memory is backed by disk these days

I could not find it easily in the code. Any pointers?

Modern operating systems assume all memory can be paged, either using the file it's mapped from if it's mapped from a file, or using swap if it's not, except in very special scenarios where you've demanded things be locked in RAM (typically for crypto use cases). There's nothing to find in our code, the kernel is doing it. :)
 
Actually wow, sounds like lots of sources for random jank everywhere, I am scared now.

Yes, this has been the case since people started doing paging in kernels. You have little control over when you will block on disk IO.

Brian White

Oct 21, 2015, 11:22:09 AM
to Torne (Richard Coles), Egor Pasko, Lei Zhang, chromium-dev

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.

All memory is backed by disk these days

I could not find it easily in the code. Any pointers?

Modern operating systems assume all memory can be paged, either using the file it's mapped from if it's mapped from a file, or using swap if it's not, except in very special scenarios where you've demanded things be locked in RAM (typically for crypto use cases). There's nothing to find in our code, the kernel is doing it. :)
 
Actually wow, sounds like lots of sources for random jank everywhere, I am scared now.

Yes, this has been the case since people started doing paging in kernels. You have little control over when you will block on disk IO.

... and is the reason why production machines do not have any default swap.  Jobs that want swap (not recommended for anything user-facing) need to create their own swap-file and activate that.

   Brian

Egor Pasko

Oct 21, 2015, 11:32:45 AM
to Brian White, Torne (Richard Coles), Lei Zhang, chromium-dev
On Wed, Oct 21, 2015 at 5:20 PM, Brian White <bcw...@google.com> wrote:

Trying to answer (1): histograms are used on all threads (UI and IO threads included), and it's critical to never delay recording them. With file backed memory we don't have any guarantees on data availability. On a busy system it may take a 100ms to fetch the page from disk just to record 8 bytes of your histogram. I would consider these long tail delays highly undesirable.

All memory is backed by disk these days

I could not find it easily in the code. Any pointers?

Modern operating systems assume all memory can be paged, either using the file it's mapped from if it's mapped from a file, or using swap if it's not, except in very special scenarios where you've demanded things be locked in RAM (typically for crypto use cases). There's nothing to find in our code, the kernel is doing it. :)
 
Actually wow, sounds like lots of sources for random jank everywhere, I am scared now.

Yes, this has been the case since people started doing paging in kernels. You have little control over when you will block on disk IO.

... and is the reason why production machines do not have any default swap.  Jobs that want swap (not recommended for anything user-facing) need to create their own swap-file and activate that.

Oh, you guys are talking about swap? Okay, I know what swap is.

But then there is a very particular tiny niche case of Android, where (on devices without the #$%@ zram hack) there is no swap. Is it part of the plan to introduce more jank on these types of devices?

Torne (Richard Coles)

Oct 21, 2015, 11:33:39 AM
to Egor Pasko, Brian White, Lei Zhang, chromium-dev
Paging does not require swap. Not having swap just means that memory *not* backed by a file is stuck in physical ram - memory that's backed by a file (e.g. all your actual executable code) is still paged.

Egor Pasko

Oct 21, 2015, 11:40:09 AM
to Torne (Richard Coles), Brian White, Lei Zhang, chromium-dev
On Wed, Oct 21, 2015 at 5:32 PM, Torne (Richard Coles) <to...@chromium.org> wrote:
Paging does not require swap. Not having swap just means that memory *not* backed by a file is stuck in physical ram - memory that's backed by a file (e.g. all your actual executable code) is still paged.

Sorry, I still cannot figure out how memory paging is supposed to explain Brian's assertion: "All memory is backed by disk these days".

Torne (Richard Coles)

Oct 21, 2015, 11:40:49 AM
to Egor Pasko, Brian White, Lei Zhang, chromium-dev
On pretty much every system Chromium supports other than the very specific class of Android devices that don't have zram, all memory is backed by disk these days.

On that specific class of Android devices, only a substantial chunk of chromium's memory is backed by disk and the rest isn't.

Egor Pasko

Oct 21, 2015, 11:45:34 AM
to Torne (Richard Coles), Brian White, Lei Zhang, chromium-dev
On Wed, Oct 21, 2015 at 5:39 PM, Torne (Richard Coles) <to...@chromium.org> wrote:
On pretty much every system Chromium supports other than the very specific class of Android devices that don't have zram, all memory is backed by disk these days.

On that specific class of Android devices, only a substantial chunk of chromium's memory is backed by disk and the rest isn't.

Great! So I am very happy with the reduced jank these devices have, and now the proposal is to have more jank in order for UMA histogram data to be persistent? I don't think it is a good tradeoff.

Alexei Svitkine

Oct 21, 2015, 11:50:19 AM
to Egor Pasko, Torne (Richard Coles), Brian White, Lei Zhang, chromium-dev
I don't think we'll be converting the system wholesale to the new approach without understanding the trade-offs (i.e. actually measuring what the impact is). I think initially, we just want a prototype of this system so we can evaluate how it behaves in practice. It might be that it will only make sense for certain processes (e.g. child processes, setup.exe) and/or certain platforms. Again, hard to evaluate this without actually having a prototype implementation and seeing the impact in practice.


Brian White

Oct 21, 2015, 11:54:05 AM
to Egor Pasko, Torne (Richard Coles), Lei Zhang, chromium-dev
On pretty much every system Chromium supports other than the very specific class of Android devices that don't have zram, all memory is backed by disk these days.

On that specific class of Android devices, only a substantial chunk of chromium's memory is backed by disk and the rest isn't.

Great! So I am very happy with less jank these devices have, and now the proposal is to have more jank in order for UMA histogram data to be persistent? I don't think it is a good tradeoff.

Definitely don't want more.  The shared memory segment wouldn't be backed by disk any more than current memory is.  It's possible that grouping the histograms closely in RAM could mean less paging since there's no chance one could be placed in a page with otherwise infrequently accessed data.

It was my hope that a memory-mapped file wouldn't flush pages out to disk on any sort of periodic basis.  Thus, it would behave the same as existing RAM.  That may not be the case, however, so even though the writes to disk would be async by the OS it would still lead to more disk activity, with all the pitfalls thereof.  It may be best to have the shared memory be ram-only with an optional "other" process that persists it to disk should it be appropriate (i.e. all other processes have exited).

Torne (Richard Coles)

Oct 21, 2015, 12:46:35 PM
to Brian White, Egor Pasko, Lei Zhang, chromium-dev
Egor's point is that on Android devices (at least ones without zram) the current memory is never backed by disk, and so making it a memory-mapped file *does* make it backed by disk more than it is currently, even if that's not the case on other platforms.

Bruce

Oct 22, 2015, 11:50:47 AM
to Chromium-dev, bcw...@google.com, pa...@chromium.org, the...@chromium.org
It is true that putting data in a memory mapped file instead of in allocated memory means that it is more likely to be discarded by the OS and then require reloading from disk.

However, if the memory is frequently touched then that won't happen - LRU should prevent it. And, I would expect that the 'priority' of the memory would be comparable to the 'priority' of the code (which is also backed by a file on disk). So, the odds of this file-backed UMA memory getting discarded because it is on disk should be similar to the odds of the code which updates it getting discarded to disk. So, while the risk of a disk access is increased, I don't think it is meaningfully increased.

BTW, there was mention of ImportantFileWriter as a way of maintaining data integrity. This should be reserved for when it is provably needed. I just removed some usage of this on Windows because the overhead (flushing the disk caches after each of many files was written) was exorbitant and the flushes weren't actually needed.



Brian White

Oct 22, 2015, 2:16:37 PM
to Bruce, Chromium-dev, Egor Pasko, Lei Zhang
It is true that putting data in a memory mapped file instead of in allocated memory means that it is more likely to be discarded by the OS and then require reloading from disk.

However, if the memory is frequently touched then that won't happen - LRU should prevent it. And, I would expect that the 'priority' of the memory would be comparable to the 'priority' of the code (which is also backed by a file on disk). So, the odds of this file-backed UMA memory getting discarded because it is on disk should be similar to the odds of the code which updates it getting discarded to disk. So, while the risk of a disk access is increased, I don't think it is meaningfully increased.

I agree with that.  I think the concern was that the OS would do more writes for file-backed memory than it would with "virtual" memory, since there is some expectation that the contents of the file will reflect written data.  Windows and Linux both seem to have some sort of background thread that "periodically" flushes dirty pages even if it does not discard them.

Additionally, systems without disk-backed memory (e.g. Android) will now be doing page flushes when they wouldn't in a simple shared-memory-segment situation.  I hear ChromeOS doesn't have VM either.  Is that the case?

-- Brian


 


Scott Hess

Oct 22, 2015, 2:30:38 PM
to Brian White, Bruce, Chromium-dev, Egor Pasko, Lei Zhang
On Thu, Oct 22, 2015 at 11:15 AM, 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:
It is true that putting data in a memory mapped file instead of in allocated memory means that it is more likely to be discarded by the OS and then require reloading from disk.

However, if the memory is frequently touched then that won't happen - LRU should prevent it. And, I would expect that the 'priority' of the memory would be comparable to the 'priority' of the code (which is also backed by a file on disk). So, the odds of this file-backed UMA memory getting discarded because it is on disk should be similar to the odds of the code which updates it getting discarded to disk. So, while the risk of a disk access is increased, I don't think it is meaningfully increased.

I agree with that.  I think the concern was that the OS would do more writes for file-backed memory that it would with "virtual" memory since there is some expectation that the contents of the file will reflect written data.  Windows and Linux both seem to have some sort of background thread that "periodically" flushes dirty pages even if it does not discard them.

Additionally, systems without disk-backed memory (e.g. android) will now being doing page flushes when they wouldn't in a simple shared-memory-segment situation.  I hear ChromeOS doesn't have VM either.  Is that the case?

I think ChromeOS has zram these days.

My impression is that the entire point of your change is that the OS will be writing things to disk that it otherwise wouldn't have.  To me that feels like a good argument to be selective as to which histograms get this treatment.  Using shared memory with a dedicated logger would definitely help here - you'd have no impact except in the case when something unexpected happened.  It would lose data around OS crashes, though.

Another alternative might be to restrict which executions of Chrome get this treatment.  Like if 1% (or .01% on stable) were memory-mapped, that might provide enough data to diagnose shutdown problems.

Another alternative might be to restrict broad rollout to a specific platform where the downsides are more quantifiable.  It would probably be important to land and test the code on all platforms to make sure you don't design yourself into a corner, because once you're persisting data to disk you may find yourself stuck with some decisions.  Maybe you'll find that it doesn't provide enough data to be actionable.

-scott

Chris Hamilton

Oct 22, 2015, 2:37:01 PM
to sh...@chromium.org, Brian White, Bruce, Chromium-dev, Egor Pasko, Lei Zhang
As I understand it the main point of the intended refactor is to tighten up the holes in UMA. So that when a process dies any pending data doesn't necessarily die with it.

Keeping these file-backed means we offload this work to the OS and that even the browser process can happily die and still leave behind a usable log of UMA data. However, that may actually be overkill. The main problem is dying renderers, and we don't need their shared memory segments to be file backed as long as the browser also maps them. Given the general concerns about the potential cost of the file I/O I think just keeping these as simple shared memory segments should suffice, and the decision can always be revisited later with actual performance data in hand.

Chris

Matthew Dempsky

Oct 22, 2015, 2:43:31 PM
to bruce...@chromium.org, Chromium-dev, Brian White, pa...@chromium.org, Lei Zhang
On Thu, Oct 22, 2015 at 8:50 AM, Bruce <bruce...@chromium.org> wrote:
It is true that putting data in a memory mapped file instead of in allocated memory means that it is more likely to be discarded by the OS and then require reloading from disk.

On POSIX systems, Chromium can use mlock() to force the memory pages to stay resident if this proves to be a concern.  I would imagine Windows has an analogous function.
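Something like this (a sketch; subject to RLIMIT_MEMLOCK, and VirtualLock would be the rough Windows analogue):

  #include <sys/mman.h>

  // Pin the mapped histogram region in RAM so its pages can't be evicted.
  bool PinRegion(void* base, size_t length) {
    return mlock(base, length) == 0;
  }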

Torne (Richard Coles)

Oct 23, 2015, 7:19:00 AM
to mdem...@chromium.org, bruce...@chromium.org, Chromium-dev, Brian White, pa...@chromium.org, Lei Zhang
I am not 100% certain but I don't think that mlock() prevents dirty pages mapped from files being written back to disk. It only prevents them being evicted from memory (which prevents anonymous pages from being written to swap, because that only happens during eviction).

Also, there's a configurable limit on how much memory an unprivileged process can mlock() at once (to prevent denial of service).


Brian White

Oct 23, 2015, 9:47:50 AM
to Chris Hamilton, Scott Hess, Bruce, Chromium-dev, Egor Pasko, Lei Zhang
As I understand it the main point of the intended refactor is to tighten up the holes in UMA. So that when a process dies any pending data doesn't necessarily die with it.

Keeping these file-backed means we offload this work to the OS and that even the browser process can happily die and still leave behind a usable log of UMA data. However, that may actually be overkill. The main problem is dying renderers, and we don't need their shared memory segments to be file backed as long as the browser also maps them. Given the general concerns about the potential cost of the file I/O I think just keeping these as simple shared memory segments should suffice, and the decision can always be revisited later with actual performance data in hand.

Right.  So the current plan is to make histograms work when stored within any arbitrary memory segment.  Code can then choose a process-local segment, shared-memory segment, file-backed segment, etc. as appropriate for the immediate use.

Renderers will likely use (un-backed) shared memory accessible by the Browser.  The Browser and Installer will likely use some sort of file-backed memory (either file-memory-mapped or local-memory-with-disk-write-on-exit).

It'll take some discussion (and testing) to make those final decisions as to which but it should be easy to do (and change) once the rest of it is working.

-- Brian

 


Egor Pasko

Oct 23, 2015, 10:02:20 AM
to Torne (Richard Coles), mdem...@chromium.org, bruce...@chromium.org, Chromium-dev, Brian White, Lei Zhang
On Fri, Oct 23, 2015 at 1:17 PM, Torne (Richard Coles) <to...@chromium.org> wrote:
I am not 100% certain but I don't think that mlock() prevents dirty pages mapped from files being written back to disk. It only prevents them being evicted from memory (which prevents anonymous pages from being written to swap, because that only happens during eviction).

I have the same impression wrt mlock(); also, it is not an API 'officially supported' by Android, so there is a danger that some OEMs may break/misconfigure it.

On Thu, Oct 22, 2015 at 5:50 PM, Bruce <bruce...@chromium.org> wrote:
It is true that putting data in a memory mapped file instead of in allocated memory means that it is more likely to be discarded by the OS and then require reloading from disk.

there is a notable exception on Linux: files on tmpfs (not on Android though)

However, if the memory is frequently touched then that won't happen - LRU should prevent it. And, I would expect that the 'priority' of the memory would be comparable to the 'priority' of the code (which is also backed by a file on disk). So, the odds of this file-backed UMA memory getting discarded because it is on disk should be similar to the odds of the code which updates it getting discarded to disk. So, while the risk of a disk access is increased, I don't think it is meaningfully increased.

That's a good argument, thanks, I agree these few file-backed memory pages for UMA would probably be used often and won't get evicted more often than corresponding code.

To summarize my remaining concerns:

1. extra disk I/O at unpredictable times (not much, and we have piled up a lot of unnecessary disk activity already; not adding disk I/O when possible is good practice; benchmarking is hard here)

2. value of having this persistent data is not clear: on crashes/OOM/whatnot the data will likely be inconsistent (checksumming individual histograms is possible, but may get histograms disagreeing with one another, for example)

3. with shared memory across processes, cross-process locking is unreliable (see the doc for details)

Ideas on simpler systems that would provide more persistence than we have now:
A. have everything as it is now, reuse existing breakpad mechanisms to transfer and persist data on crashes
B. have a mmap-ed file per process, specifically for histograms (files would be opened by the browser process, and FDs passed to renderers/GPU) - no system-dependent shmem headaches
C. arrange each histogram to know its offset into the 'persistent region' at compile time - then we can even use shared memory without an allocator and without locking - each histogram updates atomically anyway, we just need to reserve a few bits for checksums

I would prefer the option A, if any.

Chris Hamilton

Oct 23, 2015, 10:30:40 AM
to pa...@google.com, Torne (Richard Coles), mdem...@chromium.org, bruce...@chromium.org, Chromium-dev, Brian White, Lei Zhang
1. extra disk I/O at unpredictable times (not much, and we piled up a lot of unnecessary disk activity already, not adding disk I/O when possible is a good practice, benchmarking is hard here)

Keeping these in memory only (not file-backed) makes this a moot point. But I agree that extra disk I/O is a concern if the mechanism ends up being file-backed.

2. value of having this persistent data is not clear: on crashes/OOM/whatnot the data will likely be inconsistent (checksuming individual histograms is possible, but may get histograms disagreeing with one another, for example)

Large value; this has been a top feature request for the Chrome Metrics folks for quite a while. Consistency isn't a problem, as atomic writes are already used for updating entries in a histogram bucket. The worst-case scenario is that samples in progress don't make it into a histogram, which is better than losing everything.

For new histograms that are in the process of being created, care has to be taken to use "online" data structures, which are always navigable at any point during writes, even ones in progress. The worst-case scenario is a histogram that is in mid-setup, but you simply navigate past it and ignore it. Not a problem, since no data would yet have been stored in it.

Checksums are also useful and already used in UMA, so we'd be preserving the status quo here. (Makes sure histogram metadata is valid.)
 
3. with shared memory across processes, cross-process locking is unreliable (see the doc for details)

Locks shouldn't be needed. Updates are atomic writes, and only one process (the renderer) is actually updating the data. The other process is only hanging on to the shared memory to read it, and since the structure is always consistent it can be read and parsed at any time, always yielding usable data.
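
Concretely, an update is just an atomic increment of a counter living in the shared region, roughly like this (a sketch with illustrative types, not the actual base/metrics classes; it assumes std::atomic<uint32_t> is lock-free, which it is on the platforms in question):

#include <atomic>
#include <cstdint>

// Hypothetical layout of one histogram's sample counts inside the
// shared region. Only the writing process increments; a reader may
// load at any time and always sees a consistent (if slightly stale)
// value -- no torn writes, no locks.
struct HistogramSamples {
  std::atomic<uint32_t> bucket_counts[64];
};

void AddSample(HistogramSamples* samples, int bucket) {
  // Relaxed ordering is enough for a monotonically increasing counter.
  samples->bucket_counts[bucket].fetch_add(1, std::memory_order_relaxed);
}

uint32_t ReadBucket(const HistogramSamples* samples, int bucket) {
  return samples->bucket_counts[bucket].load(std::memory_order_relaxed);
}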
 
Ideas on simpler systems that would provide more persistency than we have now:
A. have everything as it is now, reuse existing breakpad mechanisms to transfer and persist data on crashes

There are also other reasons for the UMA revamp. It is desirable to have a reusable UMA metric-writing component that can be easily used by other processes and software in the Chrome ecosystem (installer, foil, asan, browser process watcher, etc). These want to be able to leave UMA data in a convenient format for the browser process to pick up and deliver to the server. The persistent layout we've been talking about doubles as a file format.
 
B. have a mmap-ed file per process, specifically for histograms (files should be open by browser process, and FDs passed to renderers/GPU) - no system-dependent shmem headaches

We've talked about this, but the file-backed nature may be an issue, as you and others have pointed out. We may still go this way for some processes/histograms. (Obviously not without first quantifying the impact.)
 
C. arrange for each histogram to know its offset into the 'persistent region' at compile time - then we can even use shared memory without an allocator and without locking - each histogram updates atomically anyway; we just need to reserve a few bits for checksums

We don't know all the histograms that will be generated at compile time, as many are generated with suffixes that are dynamic. In theory we know *all* of these from the histograms.xml, but allocating sufficient space for *all* histograms would be wasteful, as many won't be used. And that also kind of defeats the purpose of "sparse" histograms.
 
I would prefer option A, if any.

--

Scott Hess

unread,
Oct 23, 2015, 10:47:05 AM10/23/15
to Chris Hamilton, Egor Pasko, Torne (Richard Coles), mdem...@chromium.org, bruce...@chromium.org, Chromium-dev, Brian White, Lei Zhang
On Fri, Oct 23, 2015 at 7:29 AM, Chris Hamilton <chr...@chromium.org> wrote:
For new histograms that are in the process of being created care has to be taken to use "online" data structures, which are always navigable at any point during writes, even in progress. Worst case scenario is a histogram is in mid setup, but you simply navigate past it and ignore it. Not a problem since no data would yet have been stored in it.

Note that there is no way to portably extend a file-backed shared-memory map. On POSIX you have to ftruncate() to add size and then re-map the file. I would guess that re-mapping the file is a non-starter because there's no place to synchronize all of the possible writers (in case the segment moves), and requesting a new disjoint segment on demand is also a non-starter because you can't put that kind of synchronous bubble in the call. So I guess maybe keep an on-deck segment and post an async request for a new segment when you start using it? And aggressively chunk growth.
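
Something along these lines in the POSIX case (a bare-bones sketch, no error handling; the part that is missing is exactly the synchronization needed before anyone re-maps):

#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>

// Grows a file-backed mapping from |old_size| to |new_size| bytes.
// The new mapping may land at a different address, which is why every
// process holding the old pointer must be told to re-map before it
// touches the region again.
void* GrowMapping(int fd, void* old_map, size_t old_size, size_t new_size) {
  if (ftruncate(fd, static_cast<off_t>(new_size)) != 0)
    return nullptr;
  munmap(old_map, old_size);
  return mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}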

-scott
 

Egor Pasko

unread,
Oct 23, 2015, 1:09:00 PM10/23/15
to Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, bruce...@chromium.org, Chromium-dev, Brian White, Lei Zhang
You are saying "Locks shouldn't be needed" above, and here you mention dynamic allocation of space for histograms from separate processes.

Assuming that we can solve the problem that shess@ mentioned with growing the area (which looks scary), I would be interested to see a lock-free algorithm for dynamically allocating 4-byte chunks in a shared region, using string names as a parameter, making sure that the same name from all processes points to the same chunk. Are you proposing to create another ConcurrentHashMap in the lifetime of this universe, or am I missing something? :)

Brian White

unread,
Oct 23, 2015, 1:20:52 PM10/23/15
to Egor Pasko, Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
You are saying "Locks shouldn't be needed" above, and here you mention dynamic allocation of space for histograms from separate processes.

Assuming that we can solve the problem that shess@ mentioned with growing the area (which looks scary), I would be interested to see a lock-free algorithm for dynamically allocating 4-byte chunks in a shared region, using string names as a parameter, making sure that the same name from all processes points to the same chunk. Are you proposing to create another ConcurrentHashMap in the lifetime of this universe, or am I missing something? :)

The allocation module can be lockless because it is only "alloc"; we don't even need "free". Also, a histogram doesn't need to be globally unique (though making it so saves space) because multiple ones with the same name will be merged during upload. We can rely on eventual consistency of the data.

Growing the space would only be possible for memory-mapped disk files and then only on those systems that support it.  In other cases where shared memory is used, it'll require multiple shared-memory segments with a new one created and passed around as the previous one fills up.

  Brian

Chris Hamilton

unread,
Oct 23, 2015, 2:16:28 PM10/23/15
to Brian White, Egor Pasko, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
(And saying "completely lockless" is a bit of a lie, as you effectively use a spinlock when performing an allocation, via NoBarrier_CompareAndSwap. Note that this is when carving memory out of an already existing shared memory segment. Acquiring the segment to begin with, or a new one when the current one is full, can also be done with eventual consistency: the renderer process creates a new segment, starts using it immediately, and asynchronously informs the browser of its existence.)

Any "indexing" structure on top of the shared memory can be built as usual, allowing for existing histograms to be found and reused rather than multiply defined in a single process. Such a structure would need locks, but this is no different than what already exists in base/metrics. The whole point is to have the metadata (description, bucket layouts) and statistics be in a shared memory segment in a "self documenting" format that can always be read. No need for the reader to be aware of any indexing which is used for efficiency during recording.

Egor Pasko

unread,
Oct 23, 2015, 2:26:11 PM10/23/15
to Chris Hamilton, Brian White, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
On Fri, Oct 23, 2015 at 8:15 PM, Chris Hamilton <chr...@chromium.org> wrote:
(And saying completely lockless is a bit of lie, as you effectively use a spinlock in performing an allocation, via NoBarrier_CompareAndSwap. Note that this is when carving memory out of an already existing shared memory segment. Acquiring the segment, or a new one when one is full, to begin with can also be done with eventual consistency: the renderer process creates a new segment, starts using it immediately, and asynchronously informs the browser of its existence.)

Any "indexing" structure on top of the shared memory can be built as usual, allowing for existing histograms to be found and reused rather than multiply defined in a single process. Such a structure would need locks, but this is no different than what already exists in base/metrics.

It is different because this time the locks are in the shared memory, so we need to be careful. A renderer may lock and crash leaving the browser deadlocked forever. I haven't found a reliable mechanism on Android to avoid this situation. Hence the questions. Also, I commented in Brian's doc about it, see some interesting discussions/links there.

--
Egor Pasko

Chris Hamilton

unread,
Oct 23, 2015, 2:34:17 PM10/23/15
to Egor Pasko, Brian White, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
The locks are only required in the single process doing the writing: the renderer. The CompareAndSwap serves both as a lock for multiple writers and as a way to ensure the data structure is always consistent. (And it's not actually used as a spin-lock, so no deadlock occurs. It just guards against concurrent writes.) It can be read without locks, which is all the browser will ever need to do.

The questions are more than welcome, as we've put some thought into this, but there's surely things we haven't considered. Many have already been pointed out :)

Egor Pasko

unread,
Oct 23, 2015, 4:49:53 PM10/23/15
to Chris Hamilton, Brian White, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
On Fri, Oct 23, 2015 at 8:33 PM, Chris Hamilton <chr...@chromium.org> wrote:
The locks are only required in the single process doing the writing: the renderer.

First reaction: Is this some sort of a single-renderer browser? :)

OK, I did not realize that you were talking about every renderer having its own dedicated piece of memory shared with the browser, where each piece may even have some duplication that would be deduped at upload time. These are not large pieces of memory to duplicate 10-fold, so maybe it's OK.
 
The CompareAndSwap serves both as a lock for multiple writers, and a way to ensure the data structure is always consistent. (And it's not actually to be used as a spin-lock, so no dead lock occurs. It just guards against concurrent writes.) It can be read without locks, which is all the browser will ever need to do.
 
OK, I think it *might* be possible to create a lock-free hashtable-like structure without remove() that is not too much slower than the current 2 atomic memory references (IIRC). Some of its properties (such as memory wastefulness) are not clear to me, which makes me slightly uncomfortable at the moment. Also, this relies on base::subtle, which is .. well .. subtle. Building on top of that requires some care and knowledgeable people around.

Also, there are some terminological problems in this thread, probably because we are throwing around half-baked ideas and nobody has designed the concurrency model yet. Let's take another approach: please take the time to write your detailed proposal. In the concurrency model, please define the use of low-level OS primitives (starting with Android, please).

Then let me know when it's done, I'll look for subtleties and maybe will ask for clarifications. This should save us tons of time compared to numerous code rewrites and chasing hard-to-repro concurrency problems. Thanks.

The questions are more than welcome, as we've put some thought into this, but there's surely things we haven't considered. Many have already been pointed out :)

Sure, I am just hoping we all agree that these conversations save us tons of time.
 

Lei Zhang

unread,
Oct 23, 2015, 5:17:49 PM10/23/15
to Egor Pasko, Chris Hamilton, Brian White, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev
On Fri, Oct 23, 2015 at 1:48 PM, Egor Pasko <pa...@chromium.org> wrote:
> On Fri, Oct 23, 2015 at 8:33 PM, Chris Hamilton <chr...@chromium.org>
> wrote:
>>
>> The locks are only required in the single process doing the writing: the
>> renderer.
>
>
> First reaction: Is this some sort of a single-renderer browser? :)
>
> OK, I did not realize that you were talking about every renderer having
> their own dedicated piece of memory shared with the browser, and even each
> piece may have some duplication, which would be deduped at upload time. This
> is not large pieces of memory to duplicate 10 fold, so maybe it's ok.

FWIW, the current renderer process limit is ~80 for systems with lots
of RAM, so the max combined process limit is probably ~90, assuming we
don't have bugs where utility processes get spawned out of control.

Brian White

unread,
Oct 23, 2015, 5:25:44 PM10/23/15
to Egor Pasko, Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
On Fri, Oct 23, 2015 at 8:33 PM, Chris Hamilton <chr...@chromium.org> wrote:
The locks are only required in the single process doing the writing: the renderer.

First reaction: Is this some sort of a single-renderer browser? :)

OK, I did not realize that you were talking about every renderer having their own dedicated piece of memory shared with the browser, and even each piece may have some duplication, which would be deduped at upload time. This is not large pieces of memory to duplicate 10 fold, so maybe it's ok.

It may or may not share the same histogram space with all renderers; that hasn't been decided.  If it does, it's possible (an extremely narrow race condition) that multiple renderers would create the same histogram at the same moment and start writing to their own copies.  Shortly thereafter, however, they would synchronize and both start updating the same one.  The other would be left for dead, but any data stored in it before unification would be merged during upload, after which it would effectively be dead space.


 
The CompareAndSwap serves both as a lock for multiple writers, and a way to ensure the data structure is always consistent. (And it's not actually to be used as a spin-lock, so no dead lock occurs. It just guards against concurrent writes.) It can be read without locks, which is all the browser will ever need to do.
 
OK, I think it *might* be possible to create a lockfree hashtable-like structure without remove() that is not too much slower than the current 2 atomic memory references (iirc). Some of its properties (such as memory-wastefulness) are not clear to me, which makes me slightly uncomfortable at the moment. Also the matter relies on base::subtle, which is .. well .. subtle. Building on top of that requires some care and knowledgeable people around.

No shared hash-table.  In the shared memory there is just a vector of allocations.  When a process detects that the vector has grown, it analyzes the new blocks and adds the histograms it finds to its local knowledge (a std::map).
 

Also there are some terminological problems in this thread, probably because we are throwing half-baked ideas and nobody designed the concurrency model yet. Let's take another approach: please take time to write your detailed proposal. In the concurrency model please define the use of lowlevel OS primitives (starting from Android please).

I have added "Allocation" plans to my document, including pseudo-code.  The only low-level primitive is CAS.
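
The gist, in rough C++ rather than the doc's pseudo-code (illustrative names only, not the final design):

#include <atomic>
#include <cstdint>

// Hypothetical header at the start of the shared segment. |used| is
// initialized to sizeof(SegmentHeader) when the segment is created and
// only ever grows; nothing is freed, so there is no free-list and no lock.
struct SegmentHeader {
  std::atomic<uint32_t> used;  // Bytes handed out so far.
  uint32_t size;               // Total size of the segment.
};

// Returns the offset of a newly reserved block, or 0 when the segment
// is full. Offsets (not pointers) are stored so every process can
// resolve them against its own mapping of the segment.
uint32_t Allocate(SegmentHeader* header, uint32_t bytes) {
  uint32_t old_used = header->used.load(std::memory_order_relaxed);
  for (;;) {
    uint32_t new_used = old_used + bytes;
    if (new_used > header->size)
      return 0;  // Full; the caller asks for a new segment.
    // compare_exchange_weak reloads |old_used| on failure, so the loop
    // simply retries against the freshly observed value.
    if (header->used.compare_exchange_weak(old_used, new_used,
                                           std::memory_order_acq_rel))
      return old_used;
  }
}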

Matthew Dempsky

unread,
Oct 23, 2015, 6:36:15 PM10/23/15
to Torne (Richard Coles), bruce...@chromium.org, Chromium-dev, Brian White, pa...@chromium.org, Lei Zhang
Right, but I was responding to Bruce's concern about needing to reload memory from disk.

I'm convinced the extra disk I/O from occasional page cleaning is a real concern.

Lastly, on Linux desktop, the default RLIMIT_MEMLOCK is unlimited.  Even if it's lower, 1) I don't expect we should need to mlock() very much memory, and 2) if it fails, it's not a correctness issue, just a speed issue.  We can advise users to raise their RLIMIT_MEMLOCK settings for Chrome.
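
(The pinning itself would be a best-effort one-liner; a sketch, with failure deliberately tolerated:)

#include <sys/mman.h>

#include <cstddef>

// Best-effort pinning of the metrics region. If RLIMIT_MEMLOCK is too
// low the call fails, which is acceptable: updates still work, they may
// just occasionally take a page fault.
bool TryPinRegion(void* addr, size_t length) {
  return mlock(addr, length) == 0;
}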

Matthew Dempsky

unread,
Oct 23, 2015, 6:37:13 PM10/23/15
to Torne (Richard Coles), bruce...@chromium.org, Chromium-dev, Brian White, Egor Pasko, Lei Zhang
On Fri, Oct 23, 2015 at 3:34 PM, Matthew Dempsky <mdem...@chromium.org> wrote:
I'm convinced the extra disk I/O from occasional page cleaning is a real concern.

Sigh, that should be I'm *NOT* convinced.

Lei Zhang

unread,
Oct 23, 2015, 6:43:33 PM10/23/15
to Matthew Dempsky, Torne (Richard Coles), Bruce Dawson, Chromium-dev, Brian White, Egor Pasko
The limits may vary per Linux distro. e.g. Fedora has a soft fd limit
of 1024. Recently I raised it to the hard limit on startup so users
can have 100 tabs open. It'll be good to check what the limits are in
various distros if this is an issue.

On Fri, Oct 23, 2015 at 3:34 PM, Matthew Dempsky <mdem...@chromium.org> wrote:

Egor Pasko

unread,
Oct 23, 2015, 9:23:59 PM10/23/15
to Brian White, Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang

On Fri, Oct 23, 2015 at 11:24 PM, Brian White <bcw...@google.com> wrote:
On Fri, Oct 23, 2015 at 8:33 PM, Chris Hamilton <chr...@chromium.org> wrote:
The locks are only required in the single process doing the writing: the renderer.

First reaction: Is this some sort of a single-renderer browser? :)

OK, I did not realize that you were talking about every renderer having their own dedicated piece of memory shared with the browser, and even each piece may have some duplication, which would be deduped at upload time. This is not large pieces of memory to duplicate 10 fold, so maybe it's ok.

It may or may not share the same histogram space with all renderers; it hasn't been decided.  If so, it's possible (extremely narrow race condition) that multiple

There must be no race condition. Narrowness is irrelevant.

Brian White

unread,
Oct 24, 2015, 10:33:03 AM10/24/15
to Egor Pasko, Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
OK, I did not realize that you were talking about every renderer having their own dedicated piece of memory shared with the browser, and even each piece may have some duplication, which would be deduped at upload time. This is not large pieces of memory to duplicate 10 fold, so maybe it's ok.

It may or may not share the same histogram space with all renderers; it hasn't been decided.  If so, it's possible (extremely narrow race condition) that multiple

There must be no race condition. Narrowness is irrelevant.

Sorry, but that's not the case.  Race conditions are fine as long as there are no adverse effects, and since we're only concerned with eventual consistency, either outcome of the race is acceptable.  One is preferred but not required (as explained in the rest of the paragraph that you omitted from your quote).

  Brian

Alexander Potapenko

unread,
Oct 24, 2015, 1:46:19 PM10/24/15
to Brian White, Richard Coles, Chris Hamilton, Egor Pasko, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce

sent from phone


On Oct 24, 2015 4:32 PM, "'Brian White' via Chromium-dev" <chromi...@chromium.org> wrote:
>>>>
>>>> OK, I did not realize that you were talking about every renderer having their own dedicated piece of memory shared with the browser, and even each piece may have some duplication, which would be deduped at upload time. This is not large pieces of memory to duplicate 10 fold, so maybe it's ok.
>>>
>>>
>>> It may or may not share the same histogram space with all renderers; it hasn't been decided.  If so, it's possible (extremely narrow race condition) that multiple
>>
>>
>> There must be no race condition. Narrowness is irrelevant.
>
>
> Sorry, but that's not the case.  Race conditions are fine as long as there are no adverse effects and since we're only concerned with eventual consistency, either outcome of the race is acceptable.  One is preferred but not required (as explained in the rest of the paragraph that you omitted from your quote).

I hope there is a misunderstanding caused by different definitions of the term "race condition". Egor is referring to the definition given by the C++ Standard, and such races indeed should not exist in our codebase (not sure this thread is a good place for further discussion of this topic). If I'm understanding correctly, you are referring to a situation with a non-deterministic result, but with synchronized memory accesses, which is ok.


>
>   Brian
>   bcw...@google.com
> -----------------------------------------------------------------------------------------
> Treat someone as they are and they will remain that way.
> Treat someone as they can be and they will become that way.
>

Brian White

unread,
Oct 25, 2015, 10:11:38 AM10/25/15
to Alexander Potapenko, Richard Coles, Chris Hamilton, Egor Pasko, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce

> Sorry, but that's not the case.  Race conditions are fine as long as there are no adverse effects and since we're only concerned with eventual consistency, either outcome of the race is acceptable.  One is preferred but not required (as explained in the rest of the paragraph that you omitted from your quote).

 

I hope there is a misunderstanding caused by different definitions of the term "race condition". Egor is referring to the definition given by the C++ Standard, and such races indeed should not exist in our codebase (not sure this thread is a good place for further discussion of this topic). If I'm understanding correctly, you are referring to a situation with a non-deterministic result, but with synchronized memory accesses, which is ok.

Makes sense.  My use of the term follows...
 

A race condition is a special condition that may occur inside a critical section. A critical section is a section of code that is executed by multiple threads and where the sequence of execution for the threads makes a difference in the result of the concurrent execution of the critical section.

There's nothing in that definition that says that different results are not acceptable, which is consistent with the description I included with my original use of the term.

Egor Pasko

unread,
Oct 26, 2015, 8:13:45 AM10/26/15
to Alexander Potapenko, Brian White, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce
On Sat, Oct 24, 2015 at 7:45 PM, Alexander Potapenko <gli...@google.com> wrote:

sent from phone
On Oct 24, 2015 4:32 PM, "'Brian White' via Chromium-dev" <chromi...@chromium.org> wrote:
>>>>
>>>> OK, I did not realize that you were talking about every renderer having their own dedicated piece of memory shared with the browser, and even each piece may have some duplication, which would be deduped at upload time. This is not large pieces of memory to duplicate 10 fold, so maybe it's ok.
>>>
>>>
>>> It may or may not share the same histogram space with all renderers; it hasn't been decided.  If so, it's possible (extremely narrow race condition) that multiple
>>
>>
>> There must be no race condition. Narrowness is irrelevant.
>
>
> Sorry, but that's not the case.  Race conditions are fine as long as there are no adverse effects and since we're only concerned with eventual consistency, either outcome of the race is acceptable.  One is preferred but not required (as explained in the rest of the paragraph that you omitted from your quote).
I hope there is a misunderstanding caused by different definitions of the term "race condition". Egor is referring to the definition given by the C++ Standard, and such races indeed should not exist in our codebase (not sure this thread is a good place for further discussion of this topic). If I'm understanding correctly, you are referring to a situation with a non-deterministic result, but with synchronized memory accesses, which is ok.

Yes, that's what I mean, thanks, Alexander. Out of curiosity, does TSan support searching for potential race conditions in shared memory used by multiple processes? If not, it could be another reason not to use tricky lock-free stuff in shared memory.

--
Egor Pasko

Egor Pasko

unread,
Oct 26, 2015, 10:44:20 AM10/26/15
to Brian White, Chris Hamilton, Torne (Richard Coles), Matthew Dempsky, Bruce, Chromium-dev, Lei Zhang
On Fri, Oct 23, 2015 at 11:24 PM, Brian White <bcw...@google.com> wrote:
Also there are some terminological problems in this thread, probably because we are throwing half-baked ideas and nobody designed the concurrency model yet. Let's take another approach: please take time to write your detailed proposal. In the concurrency model please define the use of lowlevel OS primitives (starting from Android please).

I have added "Allocation" plans to my document, including pseudo-code.  The only low-level primitive is CAS.

Thanks, that is useful. I found no race conditions; it should probably work (modulo our inability to grow a shared memory segment), but it will leak memory after each renderer dies. Did you mean some tricky platform-dependent tossing of shared memory between renderers to make sure not too much memory is wasted? Otherwise, creating one shmem region per renderer seems simpler.

Then after that .. not using shmem is almost the same .. and even simpler (sorry, had to say that).

--
Egor Pasko

Brian White

unread,
Oct 26, 2015, 10:50:59 AM10/26/15
to Egor Pasko, Alexander Potapenko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce

> Sorry, but that's not the case.  Race conditions are fine as long as there are no adverse effects and since we're only concerned with eventual consistency, either outcome of the race is acceptable.  One is preferred but not required (as explained in the rest of the paragraph that you omitted from your quote).
I hope there is a misunderstanding caused by different definitions of the term "race condition". Egor is referring to the definition given by the C++ Standard, and such races indeed should not exist in our codebase (not sure this thread is a good place for further discussion of this topic). If I'm understanding correctly, you are referring to a situation with a non-deterministic result, but with synchronized memory accesses, which is ok.

Yes, that's what I mean, thanks, Alexander. Out of curiosity, does TSan support searching for potential race conditions in shared memory used by multiple processes? If not, it could be another reason not to use tricky lockfree stuff in shared memory.

Are you referring to this?  If not, would you provide a reference, please?

  Brian

Alexander Potapenko

unread,
Oct 26, 2015, 11:28:49 AM10/26/15
to Egor Pasko, Brian White, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Dmitriy Vyukov
On Mon, Oct 26, 2015 at 4:27 PM, Alexander Potapenko <gli...@google.com> wrote:
> +dvyukov
> I don't think this is gonna work out of the box. IIUC TSan doesn't
> know that a certain memory range is shared and doesn't share the
> corresponding shadow memory range as well.
> Adding Dima to comment on this.
>> --
>> Egor Pasko
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>



--
Alexander Potapenko
Software Engineer
Google Moscow

Alexander Potapenko

unread,
Oct 26, 2015, 11:28:49 AM10/26/15
to Egor Pasko, Brian White, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce
+dvyukov

On Mon, Oct 26, 2015 at 1:11 PM, Egor Pasko <pa...@google.com> wrote:
>
>

Egor Pasko

unread,
Oct 27, 2015, 4:36:17 AM10/27/15
to Brian White, chromium-dev
On Mon, Oct 26, 2015 at 3:53 PM, Brian White <bcw...@google.com> wrote:
Also there are some terminological problems in this thread, probably because we are throwing half-baked ideas and nobody designed the concurrency model yet. Let's take another approach: please take time to write your detailed proposal. In the concurrency model please define the use of lowlevel OS primitives (starting from Android please).

I have added "Allocation" plans to my document, including pseudo-code.  The only low-level primitive is CAS.

Thanks, that is useful. I found no race conditions, it should probably work (modulo our inability to grow shared memory segment), but will leak memory after each renderer dies. Did you mean some tricky platform-dependent tossing shared memory between renderers to make sure not too much memory is wasted? Otherwise, creating one shmem region per renderer seems simpler.

If the segments are shared across all renderers, then they'll use the same set of histograms without allocating new ones.  If they're separate, then the shared memory segment will be discarded when a renderer dies.
 
Ah, I see, we can occasionally leak if a histogram is allocated by two processes within a short timeframe, and we think that this situation will not be common. Sgtm.
 
Then after that .. not using shmem is almost the same .. and even simpler (sorry, had to say that).

Persisting the data to a file has its own set of problems, but it can be accomplished with a single block-write of the memory segment (no longer shared but just allocated as a single block from the heap).  At least the main histogram code remains unchanged regardless of the storage mechanism.

One can merge it on the browser-side before writing to the file. That's very similar to how breakpad persists crash data. For simplicity (and performance) I would prefer to extend these mechanisms.

Dmitriy Vyukov

unread,
Oct 27, 2015, 9:03:59 AM10/27/15
to Alexander Potapenko, Egor Pasko, Brian White, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce
TSan does not catch races between different processes. But it can catch races on shared memory if both racy accesses happen in the same process, so you would need a test that emulates the several processes as multiple threads.

Brian White

unread,
Mar 7, 2016, 11:19:24 AM3/7/16
to Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
And...  We're Back!

If you haven't been following the development of persistent metrics, an allocator for memory segments has been created, files are used to pass them from setup.exe to Chrome, and shared memory is used to pass them between Renderer and Browser.  Plus numerous other changes to the metrics system to deal with all this.

There are still cases, however, where a read/write memory-mapped file seems the best choice.  Case in point:

SyzyASAN runs inside Chrome Canary to do analysis.  It has its own "base" and thus a set of histograms independent of the main Chrome process.  The team would like to export those metrics and have them reported to UMA.

Using a shared memory segment (as renderer/browser) won't work because the data needs to live across process restarts for reporting during the next run.

Dumping everything to a file at exit (as setup does) doesn't work well because those metrics really should be reported periodically.  In addition, trying to dump them at exit may not work given the relationship between SyzyASAN and the browser, and even if it does, it could run into complications with trying to overwrite files in use by the other process.

This is a really good case for a shared read/write memory-mapped file between these two systems.  It affects only the Canary build and then only the 5% of runs that get SyzyASAN enabled.

So...  To support this use-case...  Is there any objection to me adding read/write support to the MemoryMappedFile class?

Primiano Tucci

unread,
Mar 7, 2016, 11:34:44 AM3/7/16
to bcw...@google.com, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
Who is the writer going to be? The child process?
IIRC, on Linux, if you have two processes which mmap the same fd and one of them calls ftruncate(fd, 0), the other one will crash (with a SIGBUS, IIRC) while trying to access the mmap'd region (if it didn't page-fault before), because the underlying backing file has shrunk.
If you translate this into Chrome terms, the question becomes: can a malicious child process ftruncate() the shared fd? If the answer is yes, it means a malicious renderer can crash the browser, which sounds bad.
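
In code, the failure mode is roughly this (Linux semantics; a self-contained sketch, error handling omitted):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>

// The child shrinks the file that the parent has mapped; the parent's
// next access to the mapping then faults with SIGBUS.
int main() {
  int fd = open("/tmp/sigbus_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
  ftruncate(fd, 4096);
  char* map = static_cast<char*>(
      mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

  if (fork() == 0) {
    ftruncate(fd, 0);  // The "malicious child" shrinks the backing file.
    _exit(0);
  }
  wait(nullptr);

  map[0] = 1;  // The page is no longer backed by the file: SIGBUS here.
  printf("not reached on Linux\n");
  return 0;
}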


Scott Hess

unread,
Mar 7, 2016, 11:46:19 AM3/7/16
to Primiano Tucci, Brian White, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
ftruncate() is blocked by the renderer sandbox on OSX and Linux.  So it's plausible that this has already been prevented (perhaps for the reason described).

-scott

Primiano Tucci

unread,
Mar 7, 2016, 11:49:03 AM3/7/16
to Scott Hess, Brian White, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
Good point, but not on Android [1]. Right now we don't have BPF at all on most devices, and even when we get there, this won't be prevented.

Chris Hamilton

unread,
Mar 7, 2016, 11:50:28 AM3/7/16
to Scott Hess, Primiano Tucci, Brian White, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil

In this use case (SyzyAsan) the reader and writer are in the same process, but in different modules. However, in the general case this could be cross process.

Brian White

unread,
Mar 7, 2016, 12:13:23 PM3/7/16
to Primiano Tucci, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
Who is the writer going to be? The child process?
IIRC, on Linux, if you have two processes which mmap the same fd and one of them calls ftruncate(fd, 0), the other one will crash (with a SIGBUS, IIRC) while trying to access the mmap'd region (if it didn't page-fault before), because the underlying backing file has shrunk.
If you translate this into Chrome terms, the question becomes: can a malicious child process ftruncate() the shared fd? If the answer is yes, it means a malicious renderer can crash the browser, which sounds bad.

Good to know.  However, this won't be used for renderer/browser communication.

-- Brian

Peter Kasting

unread,
Mar 7, 2016, 5:22:58 PM3/7/16
to Brian White, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
On Mon, Mar 7, 2016 at 8:17 AM, 'Brian White' via Chromium-dev <chromi...@chromium.org> wrote:
So...  To support this use-case...  Is there any objection to me adding read/write support to the MemoryMappedFile class?

At the risk of saying something dumb, I'm going to ask a question without having read the preceding part of this thread:

Assuming earlier messages have expressed reservations towards the generalized use of these capabilities, how will you add this in a way that prevents other authors in the future from doing (whatever people were worried about)?  Just add comments saying "please don't use this unless X"?  Or maybe go further and have a subclass of MemoryMappedFile, in some relevant SyzyASAN-related directory, which adds write support, so that only this use case can see and use it?

PK

Brian White

unread,
Mar 7, 2016, 5:31:51 PM3/7/16
to Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
At the risk of saying something dumb, I'm going to ask a question without having read the preceding part of this thread:

Assuming earlier messages have expressed reservations towards the generalized use of these capabilities, how will you add this in a way that prevents other authors in the future from doing (whatever people were worried about)?  Just add comments saying "please don't use this unless X"?  Or maybe go further and have a subclass of MemoryMappedFile, in some relevant SyzyASAN-related directory, which adds write support, so that only this use case can see and use it?

Memory-mapped files, whether they be read-only or read-write, have the problem that they effectively allow file I/O to be done on any thread.  Adding read/write support doesn't change anything in that regard.

The concerns previously expressed were about using such a thing for metrics collection, something that happens all the time and in potentially time-critical places in the code, and whose data currently exists only in RAM, so making it file-backed would have been something new.

A read/write memory-mapped file may actually be easier on the process than doing direct I/O because the system is free to flush that page whenever it sees fit.  Of course, it can also buffer explicit I/O, so who really knows...

The bottom line is that there's no way of knowing if use of such is an improvement or a detriment; it has to be determined on a case-by-case basis.

Primiano Tucci

unread,
Mar 8, 2016, 5:00:17 AM3/8/16
to bcw...@google.com, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
how will you add this in a way that prevents other authors in the future from doing (whatever people were worried about)?  Just add comments saying "please don't use this unless X"? 

Precisely.
My worry is that introducing easy RW mmap'd files in base would make it easier to accidentally buy into a class of very subtle bugs. Specifically, I worry that people will:
 - start making RW files fly across the IPC boundary to co-share RW-MemoryMapped files with the browser, opening the door to crash-on-truncate errors like the one explained above.
 - start blanket-replacing file I/O with rw-mmaps, which looks easier and leaner to use but: (i) introduces jank on arbitrary threads, out of the control of today's AllowedIO checks; (ii) increases the risk of crashes on corrupted filesystems. read()/write() on a corrupted fs returns an error; mmap() on a corrupted fs causes the entire process to crash (IIRC shess@ has seen this happen for real when switching sqlite from raw I/O to mmap).


 Adding read/write support doesn't change anything in that regard.

You are right; concretely, you can already achieve all this today without any base support, just by doing mmap yourself. Or you can hit some of the problems above with the already existing read-only base::MemoryMappedFile.

What I am getting at is not that mmap is bad and we should never use it. But mmap is a pretty dangerous gun, and we shouldn't encourage people to use it without knowing the risks.
My $0.02 is that exposing writable base::MemoryMappedFiles can make it too appealing to use from the author side and induce a false sense of safety on the reviewer side.

Scott Hess

unread,
Mar 8, 2016, 7:38:44 AM3/8/16
to Primiano Tucci, Brian White, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
On Tue, Mar 8, 2016 at 1:59 AM, Primiano Tucci <prim...@chromium.org> wrote:
 - start blanket-replacing file I/O with rw-mmaps, which looks easier and leaner to use but: (i) introduces jank on arbitrary threads, out of the control of today's AllowedIO checks; (ii) increases the risk of crashes on corrupted filesystems. read()/write() on a corrupted fs returns an error; mmap() on a corrupted fs causes the entire process to crash (IIRC shess@ has seen this happen for real when switching sqlite from raw I/O to mmap)

I think a significant fraction of those crashes are due to profiles on removable or network storage.  That usage is not at all supported by Chromium, in the sense that the previous errors also weren't generally being handled.  The corrupted-fs case is really also outside of Chromium's ability to deal with successfully (Chromium cannot fix it, and via swap it could affect literally any of Chromium's memory).

-scott

Brian White

unread,
Mar 8, 2016, 9:00:52 AM3/8/16
to Primiano Tucci, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
how will you add this in a way that prevents other authors in the future from doing (whatever people were worried about)?  Just add comments saying "please don't use this unless X"? 

Precisely.
My worry is that introducing easy RW mmap'd files in base would make it easier to accidentally buy into a class of very subtle bugs. Specifically, I worry that people will:
 - start making RW files fly across the IPC boundary to co-share RW-MemoryMapped files with the browser, opening the door to crash-on-truncate errors like the one explained above.
 - start blanket-replacing file I/O with rw-mmaps, which looks easier and leaner to use but: (i) introduces jank on arbitrary threads, out of the control of today's AllowedIO checks; (ii) increases the risk of crashes on corrupted filesystems. read()/write() on a corrupted fs returns an error; mmap() on a corrupted fs causes the entire process to crash (IIRC shess@ has seen this happen for real when switching sqlite from raw I/O to mmap)

There must already exist a thousand ways inside Chrome to introduce jank and other subtle usability issues.  This one, at least, is fairly obvious.  It's also limited, because it's not like the system is going to allocate objects in that memory that would then get used arbitrarily.  A developer has to code for it specifically, knowing that it's file-backed memory, and that can be checked in code review.  Impact will be localized; these issues from writable memory won't "leak" into other areas of the system.

Also, it is reads (the existing functionality) that are the issue, because of page faults.  The writing thread won't be impacted because the system flushes dirty pages to disk on its own time.

Primiano Tucci

unread,
Mar 8, 2016, 9:59:30 AM3/8/16
to Brian White, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
On Tue, Mar 8, 2016 at 2:00 PM Brian White <bcw...@google.com> wrote:
There must already exist a thousand ways inside Chrome to introduce jank 
and other subtle usability issues. 
This is typically not a good reason for adding a new one; there are entire teams working on multi-year projects to reduce jank.
 
  This one, at least, is fairly obvious.  It's also limited because it's not like the system is going to allocate objects in that memory that would then get used arbitrarily.
With the exception of:
 - vm readahead [1], which will pull in pages you never requested (on top of the standard I/O read-ahead mechanisms of the block layer), which might be beneficial or a waste of (clean) memory depending on your memory access patterns (in which case you are advised to use madvise(), but now how do you factor that in?).
 - writeback, which might cause higher peaks than direct write()s, as dirty memory gets stashed until you hit the expiration timeout, exceed the dirty_ratio [2] threshold, or reach a memory-pressure situation.

Also, it is reads (the existing functionality) that are the issue because of page-faults.  The writing thread won't be impacted because the system flushes those to disk on its own time.
Until you do the write that exceeds the dirty_ratio/dirty_bytes threshold, at which point you have just caused your thread to block on the entire writeback [3]; [4] mentions "For less than 1s think time (ext3/4 may block the dirtier for up to 800ms from time to time on 1-HDD)".

Yet again, I am not saying we shouldn't use mmap'd memory. There have been cases, like the aforementioned shess@ one (crbug.com/555578), where this has been hugely beneficial.
But, as you can see from this thread, there are a lot of catches involved, and IMHO it is very easy to do more harm than good. That is the only reason I'd like to see this as a feature that is really discouraged unless you have strong arguments for it.
At the very least, this seems to me more of a ::subtle API.

That's my $0.02, but I'd like to hear other folks opinions at this point. I'm definitely too much attached to the perf topic and far from being unbiased.


Brian White

unread,
Mar 8, 2016, 11:38:46 AM3/8/16
to Primiano Tucci, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
On Tue, Mar 8, 2016 at 2:00 PM Brian White <bcw...@google.com> wrote:
There must already exist a thousand ways inside Chrome to introduce jank 
and other subtle usability issues. 
This is typically not a good reason for adding a new one. there are entire teams which work on multi year projects to reduce jank.

Of course.  But the fact that something could be (mis-)used to cause jank isn't by itself a reason not to add it.

 
  This one, at least, is fairly obvious.  It's also limited because it's not like the system is going to allocate objects in that memory that would then get used arbitrarily.
With the exception of:
 - vm readahead [1], that will pull in pages you never requested (on top of the standard I/O read-ahead mechanisms of the block layer), which might be beneficial or a waste of (clean) memory depending on your memory access patterns (in which case you are advised to use madvise(), but now how do you factor that in?).

Also true for read-only, which already exists.  Madvise() support is possible; my initial use can easily specify which part of the file it expects to access.
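
Something like this (a sketch; the hint is purely advisory, so a failure can simply be ignored):

#include <sys/mman.h>

#include <cstddef>

// Hint which part of the file-backed mapping we expect to touch so the
// kernel can pre-fault those pages and skip read-ahead elsewhere.
// |start| must be page-aligned, which an mmap() result always is.
void AdviseExpectedAccess(void* start, size_t length) {
  madvise(start, length, MADV_WILLNEED);
}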

 
- writeback, which might cause higher peaks than direct write()s, as dirty memory gets stashed until you hit the expiration timeout, exceed the dirty_ratio [2] threshold, or reach a memory-pressure situation.

Would it be significantly different from the write-back disk cache?  And if you're hitting memory pressure, you might get swapping of process RAM anyway.  Plus, if using file I/O, you have both the in-RAM copy and the disk buffers, so there is twice the memory pressure from that particular allocation.

 
Also, it is reads (the existing functionality) that are the issue because of page-faults.  The writing thread won't be impacted because the system flushes those to disk on its own time.
Until you do the write that exceeds the dirty_ratio/dirty_bytes threshold, at which point you have just caused your thread to block on the entire writeback [3]; [4] mentions "For less than 1s think time (ext3/4 may block the dirtier for up to 800ms from time to time on 1-HDD)".

Is the code saying the minimum time between flushes is 1s (800ms)?  Meaning that if a process were to write the last X bytes about 700ms after the last flush, the next flush would be 300ms (100ms) later?
 

Yet again, I am not saying we shouldn't use mmaped memory. There have been cases, like the aforementioned shess@ one (crbug.com/555578) where this is has been hugely beneficial.
But, as you see from this thread, there are a lot of catches associated, and IMHO very easy to do more harm than good. This is my only reason why I'd like that to see that as a feature which is really discouraged unless you really have strong arguments for it.
At very least, this seems to me more a ::subtle API.

Using subtle:: will have to be someone's call other than mine, but it seems to me that it should have been done for the existing MemoryMappedFile if it was to be done at all.

Using the object is already protected by checks for being on an I/O thread, so subtle:: doesn't gain much there.  Once mapped, using the memory will lose any subtle:: designation anyway.


These are all good points and the possible side-effects are well-considered.  Any new method of accessing slow devices needs to be considered and the potential dangers made clear to any who would use it.

Brian White

unread,
Mar 10, 2016, 12:38:24 PM3/10/16
to Primiano Tucci, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
Does the sound of crickets mean that it's okay for me to add this capability for the intended purpose (i.e. asan testing on Canary)?

With copious comments warning of the potential pitfalls of careless use of file-backed memory?

-- Brian

--

Alexander Potapenko

unread,
Mar 14, 2016, 11:19:39 AM3/14/16
to Brian White, Primiano Tucci, Peter Kasting, Dmitriy Vyukov, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Lei Zhang, Bruce, Georges Khalil
Since this is an additional way to pass data between processes, the
security team might want to take a look at the design.
(No other drawbacks come to my mind)
--

Lei Zhang

unread,
Mar 30, 2016, 8:29:03 PM3/30/16
to Brian White, Peter Kasting, Dmitriy Vyukov, Alexander Potapenko, Egor Pasko, Richard Coles, Chris Hamilton, Matthew Dempsky, Chromium-dev, Bruce, Georges Khalil
On Mon, Mar 7, 2016 at 2:30 PM, Brian White <bcw...@google.com> wrote:
>> At the risk of saying something dumb, I'm going to ask a question without
>> having read the preceding part of this thread:
>>
>> Assuming earlier messages have expressed reservations towards the
>> generalized use of these capabilities, how will you add this in a way that
>> prevents other authors in the future from doing (whatever people were
>> worried about)? Just add comments saying "please don't use this unless X"?
>> Or maybe go further and have a subclass of MemoryMappedFile, in some
>> relevant SyzyASAN-related directory, which adds write support, so that only
>> this use case can see and use it?
>
>
> Memory-mapped files, whether they be read-only or read-write, have the
> problem that they effectively allow file I/O to be done on any thread.
> Adding read/write support doesn't change anything in that regard.

Many uses of base::MemoryMappedFile() follow the pattern of calling
Initialize(), and then immediately processing the mapped data.
Initialize() DCHECKs to make sure it is on a thread that allows file
I/O. Though there's no code mechanism to prevent calling Initialize()
on one thread and then passing the mapped data pointer to another.
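
i.e., roughly this pattern (from memory, so treat the exact signatures as approximate):

#include <cstddef>
#include <cstdint>

#include "base/files/file_path.h"
#include "base/files/memory_mapped_file.h"

// Typical read-only usage: map, consume, let the destructor unmap.
// Initialize() asserts that the current thread may do I/O, but nothing
// stops the mapped pointer from being handed to another thread later.
bool ProcessFile(const base::FilePath& path) {
  base::MemoryMappedFile file;
  if (!file.Initialize(path))
    return false;
  const uint8_t* bytes = file.data();
  size_t size = file.length();
  // ... parse |bytes| / |size| here; any access may page-fault ...
  return bytes != nullptr && size > 0;
}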

I also think you should just use mmap() in your own code for now,
figure out if it is right for your use case, and what special tweaks
you may need. When the dust has settled, we can separately work out if
that code belongs in / fits well with base::MemoryMappedFile. FWIW,
there already exists a few other places that do writable memory maps.