
Is it OK to make allocations that intentionally aren't freed? (was: Re: Is Big5 form submission fast enough?)


Henri Sivonen

May 19, 2017, 1:45:41 AM
to dev-platform
On Tue, May 16, 2017 at 7:03 AM, Tim Guan-tin Chien
<timd...@mozilla.com> wrote:
> According to Alexa top 100 Taiwan sites and quick spot checks, I can only
> see the following two sites encoded in Big5:
>
> http://www.ruten.com.tw/
> https://www.momoshop.com.tw/
>
> Both are shopping sites (eBay-like and Amazon-like) so you get the idea how
> forms are used there.

Thank you. It seems to me that encoder performance doesn't really
matter for sites like these, since the number of characters one would
enter in the search field at a time is very small.

> Mike reminded me to check the Tax filing website: http://www.tax.nat.gov.tw/
> Yes, it's unfortunately also in Big5.

I guess I'm not going to try filing taxes there for testing. :-)

- -

One option I've been thinking about is computing an encode
acceleration table for JIS X 0208 on the first attempt to encode a CJK
Unified Ideograph in any of Shift_JIS, EUC-JP or ISO-2022-JP, for GBK
on the first attempt to encode a CJK Unified Ideograph in either GBK
or gb18030, and for Big5 on the first attempt to encode a CJK Unified
Ideograph in Big5.

Each of the three tables would then remain allocated through to the
termination of the process.

This would have the advantage of not bloating our binary footprint
with data that can be computed from other data in the binary while
still making legacy Chinese and Japanese encode fast without a setup
cost for each encoder instance.

The downsides would be that the memory for the tables wouldn't be
reclaimed if the tables aren't needed anymore (the browser can't
predict the future) and executions where any of the tables has been
created wouldn't be valgrind-clean. Also, in the multi-process world,
the tables would be recomputed per-process. OTOH, if we shut down
renderer processes from time to time, it would work as a coarse
mechanism to reclaim the memory in case Japanese or Chinese legacy
encode is a relatively isolated event in the user's browsing pattern.

Creating a mechanism for the encoding library to become aware of
application shutdown just in order to be valgrind-clean would be
messy, though. (Currently, we have shutdown bugs where uconv gets used
after we've told it that it can shut down. I'd really want to avoid
re-introducing that class of bugs with encoding_rs.)

Is it OK to create allocations that are intentionally never freed
(i.e. process termination is what "frees" them)? Is valgrind's message
suppression mechanism granular enough to suppress three allocations
from a particular Rust crate statically linked into libxul?
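For concreteness, the pattern in question might be sketched in Rust like this (the table contents and the `build_big5_table` helper are made up for illustration, and `std::sync::OnceLock` stands in for whatever lazy-init mechanism the crate would actually use): the computed table is handed to `Box::leak`, so it becomes a `'static` allocation that only process termination "frees".

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for deriving the encode-acceleration table
// from decode-oriented data already present in the binary.
fn build_big5_table() -> Vec<u32> {
    (0..512).map(|i| i * 2).collect()
}

// Created on first use, intentionally never freed: process
// termination is what reclaims it.
static BIG5_TABLE: OnceLock<&'static [u32]> = OnceLock::new();

fn big5_table() -> &'static [u32] {
    BIG5_TABLE.get_or_init(|| Box::leak(build_big5_table().into_boxed_slice()))
}
```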

--
Henri Sivonen
hsiv...@hsivonen.fi
https://hsivonen.fi/

Kris Maglione

May 19, 2017, 2:58:55 PM
to Henri Sivonen, dev-platform
On Fri, May 19, 2017 at 08:44:58AM +0300, Henri Sivonen wrote:
>The downsides would be that the memory for the tables wouldn't be
>reclaimed if the tables aren't needed anymore (the browser can't
>predict the future) and executions where any of the tables has been
>created wouldn't be valgrind-clean.

If we do this, it would be nice to flush the tables when we get
a memory-pressure event, which should at least mitigate some of
the effects for users on memory-constrained systems.

And is there a reason ClearOnShutdown couldn't be used to deal
with valgrind issues?

That said, can we try to get some telemetry on how often we'd
need to build these tables, and how likely they are to be needed
again in the same process, before we make a decision?

Jeff Muizelaar

May 19, 2017, 3:09:42 PM
to Henri Sivonen, dev-platform
We use functions like cairo_debug_reset_static_data() on shutdown to
handle cases like this.

-Jeff
> The downsides would be that the memory for the tables wouldn't be
> reclaimed if the tables aren't needed anymore (the browser can't
> predict the future) and executions where any of the tables has been
> created wouldn't be valgrind-clean. Also, in the multi-process world,
> the tables would be recomputed per-process. OTOH, if we shut down
> renderer processes from time to time, it would work as a coarse
> mechanism to reclaim the memory in case Japanese or Chinese legacy
> encode is a relatively isolated event in the user's browsing pattern.
>
> Creating a mechanism for the encoding library to become aware of
> application shutdown just in order to be valgrind-clean would be
> messy, though. (Currently, we have shutdown bugs where uconv gets used
> after we've told it that it can shut down. I'd really want to avoid
> re-introducing that class of bugs with encoding_rs.)
>
> Is it OK to create allocations that are intentionally never freed
> (i.e. process termination is what "frees" them)? Is valgrind's message
> suppression mechanism granular enough to suppress three allocations
> from a particular Rust crate statically linked into libxul?
>
> --
> Henri Sivonen
> hsiv...@hsivonen.fi
> https://hsivonen.fi/
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform

Jet Villegas

May 19, 2017, 3:22:38 PM
to Henri Sivonen, dev-platform
Might be good to serialize to/from disk after the first run, so only
the first process pays the compute cost?

Eric Rahm

May 19, 2017, 3:38:36 PM
to Kris Maglione, Henri Sivonen, dev-platform
I second Kris' concern re: memory, particularly given this is in multiple
processes. I'd suggest something more along the lines of a timeout; AFAICT
'memory-pressure' isn't actually fired in low-memory situations (it's still
useful, and registering for it and handling it would make sense for
anything that's caching large amounts of data). I'd certainly want to see
measurements on what the actual overhead is.

In general I'm not super concerned about leaking at shutdown, but of course,
as you noted, Valgrind, LSan, etc. are, and it makes life harder for our
leak-checking frameworks. Kris' and Jeff's suggestions seem reasonable.

I'd be less concerned about overhead if we had a good way of sharing these
static tables across processes (ICU seems like a good candidate as well).

-e

On Fri, May 19, 2017 at 11:58 AM, Kris Maglione <kmag...@mozilla.com>
wrote:

> On Fri, May 19, 2017 at 08:44:58AM +0300, Henri Sivonen wrote:
>
>> The downsides would be that the memory for the tables wouldn't be
>> reclaimed if the tables aren't needed anymore (the browser can't
>> predict the future) and executions where any of the tables has been
>> created wouldn't be valgrind-clean.
>>
>
> If we do this, it would be nice to flush the tables when we get a
> memory-pressure event, which should at least mitigate some of the effects
> for users on memory-constrained systems.
>
> And is there a reason ClearOnShutdown couldn't be used to deal with
> valgrind issues?
>
> That said, can we try to get some telemetry on how often we'd need to
> build these tables, and how likely they are to be needed again in the same
> process, before we make a decision?
>

Nicholas Nethercote

May 19, 2017, 10:38:36 PM
to Jeff Muizelaar, Henri Sivonen, dev-platform
There's also a pre-processor constant that we define in Valgrind/ASAN/etc.
builds that you can check in order to free more stuff than you otherwise
would. But I can't for the life of me remember what it's called :(

Nick

On Sat, May 20, 2017 at 5:09 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
wrote:

> We use functions like cairo_debug_reset_static_data() on shutdown to
> handle cases like this.
>
> -Jeff
>
> On Fri, May 19, 2017 at 1:44 AM, Henri Sivonen <hsiv...@hsivonen.fi>
> > The downsides would be that the memory for the tables wouldn't be
> > reclaimed if the tables aren't needed anymore (the browser can't
> > predict the future) and executions where any of the tables has been
> > created wouldn't be valgrind-clean. Also, in the multi-process world,
> > the tables would be recomputed per-process. OTOH, if we shut down
> > renderer processes from time to time, it would work as a coarse
> > mechanism to reclaim the memory in case Japanese or Chinese legacy
> > encode is a relatively isolated event in the user's browsing pattern.
> >
> > Creating a mechanism for the encoding library to become aware of
> > application shutdown just in order to be valgrind-clean would be
> > messy, though. (Currently, we have shutdown bugs where uconv gets used
> > after we've told it that it can shut down. I'd really want to avoid
> > re-introducing that class of bugs with encoding_rs.)
> >
> > Is it OK to create allocations that are intentionally never freed
> > (i.e. process termination is what "frees" them)? Is valgrind's message
> > suppression mechanism granular enough to suppress three allocations
> > from a particular Rust crate statically linked into libxul?
> >
> > --
> > Henri Sivonen
> > hsiv...@hsivonen.fi
> > https://hsivonen.fi/

Botond Ballo

May 19, 2017, 10:49:08 PM
to Nicholas Nethercote, Jeff Muizelaar, dev-platform, Henri Sivonen
On Fri, May 19, 2017 at 10:38 PM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
> There's also a pre-processor constant that we define in Valgrind/ASAN/etc.
> builds that you can check in order to free more stuff than you otherwise
> would. But I can't for the life of me remember what it's called :(

It looks like some code checks for MOZ_ASAN and MOZ_VALGRIND.

Cheers,
Botond

Henri Sivonen

May 21, 2017, 8:47:04 AM
to dev-platform
Thanks.

On Fri, May 19, 2017 at 10:09 PM, Jeff Muizelaar <jmuiz...@mozilla.com> wrote:
> We use functions like cairo_debug_reset_static_data() on shutdown to
> handle cases like this.

The comment there is encouraging, since it suggests that Cairo doesn't
attempt to deal with cairo_debug_reset_static_data() getting called
too early.

On Fri, May 19, 2017 at 9:58 PM, Kris Maglione <kmag...@mozilla.com> wrote:
> On Fri, May 19, 2017 at 08:44:58AM +0300, Henri Sivonen wrote:
>>
>> The downsides would be that the memory for the tables wouldn't be
>> reclaimed if the tables aren't needed anymore (the browser can't
>> predict the future) and executions where any of the tables has been
>> created wouldn't be valgrind-clean.
>
>
> If we do this, it would be nice to flush the tables when we get a
> memory-pressure event, which should at least mitigate some of the effects
> for users on memory-constrained systems.

How large would the tables have to be for it to be worthwhile, in your
estimate, to engineer a mechanism for dynamically dropping them (for
non-valgrind reasons)?

If there is only a one-way transition (first a table doesn't exist and
after some point in time it exists and will continue to exist), there
can be an atomic pointer to the table and no mutex involved when the
pointer is read as non-null. That is, only threads that see the
pointer as null would then obtain a mutex to make sure that only one
thread creates the table.
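That one-way scheme might look like this in Rust (types and table contents are hypothetical): readers that observe a non-null pointer never touch the mutex; only threads that see null serialize on the lock, and exactly one of them publishes the table.

```rust
use std::ptr;
use std::sync::Mutex;
use std::sync::atomic::{AtomicPtr, Ordering};

static TABLE: AtomicPtr<[u16; 256]> = AtomicPtr::new(ptr::null_mut());
static INIT_LOCK: Mutex<()> = Mutex::new(());

fn table() -> &'static [u16; 256] {
    // Fast path: the transition is one-way, so a non-null pointer
    // stays valid forever and no lock is needed per character.
    let p = TABLE.load(Ordering::Acquire);
    if !p.is_null() {
        return unsafe { &*p };
    }
    // Slow path: only threads that saw null take the mutex, and only
    // one of them actually builds and publishes the table.
    let _guard = INIT_LOCK.lock().unwrap();
    let p = TABLE.load(Ordering::Acquire);
    if !p.is_null() {
        return unsafe { &*p };
    }
    let raw = Box::into_raw(Box::new([0u16; 256])); // hypothetical contents
    TABLE.store(raw, Ordering::Release);
    unsafe { &*raw }
}
```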

If the tables can go away, every use of a table becomes a critical
section. Entering the critical section on a per-character basis would
probably be a bad idea, and hoisting the mutex acquisition to cover a
larger swath of work means that the fact that the table is dynamically
created leaks from behind some small
lookup abstraction. That's doable, of course, but I'd really like to
avoid over-designing this, when there's a good chance that users
wouldn't even notice if GBK and Shift_JIS got the same slowdown as
Big5 got in Firefox 43.

I guess instead of looking at the relative slowness and pondering
acceleration tables, I should measure how much Chinese or Japanese
text a Raspberry Pi 3 (the underpowered ARM device I have access to
and that has predictable-enough scheduling to be benchmarkable in a
usefully repeatable way unlike Android devices) can legacy-encode in a
tenth of a second or 1/24th of a second without an acceleration table.
(I posit that with the network roundtrip happening afterwards, no one
is going to care if the form encode step in the legacy case takes up
to one movie frame duration. Possibly, the "don't care" allowance is
much larger.)

> And is there a reason ClearOnShutdown couldn't be used to deal with valgrind
> issues?

I could live with having a valgrind/ASAN-only clean-up method that
would be UB to call too early if I can genuinely not be on the hook
for someone calling it too early. I don't want to deal with what we
have now: First we tell our encoding framework to shut down but then
we still occasionally do stuff like parse URLs afterwards.

> That said, can we try to get some telemetry on how often we'd need to build
> these tables, and how likely they are to be needed again in the same
> process, before we make a decision?

I'd really like to proceed with work sooner than it takes to do the
whole round-trip of setting up telemetry and getting results. What
kind of thresholds would we be looking for to make decisions?

On Fri, May 19, 2017 at 10:38 PM, Eric Rahm <er...@mozilla.com> wrote:
> I'd be less concerned about overhead if we had a good way of sharing these
> static tables across processes

It seems sad to add process-awareness complexity to a library that
otherwise doesn't need to know about processes when the necessity of
fast legacy encode is itself doubtful.

> (ICU seems like a good candidate as well).

What do you mean?

On Fri, May 19, 2017 at 10:22 PM, Jet Villegas <jvil...@mozilla.com> wrote:
> Might be good to serialize to/from disk after the first run, so only
> the first process pays the compute cost?

Building the acceleration table would be more a matter of writing
memory in a non-sequential order than of doing serious math.
Reading from disk would mostly have the benefit of making the memory
writes sequential. I'd expect the general overhead of disk access to be
worse than a recompute. In any case, I don't want encoding_rs to have
to know about file system locations or file IO error situations.

Henri Sivonen

May 21, 2017, 10:22:34 AM
to dev-platform
On Sun, May 21, 2017 at 3:46 PM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:
> I guess instead of looking at the relative slowness and pondering
> acceleration tables, I should measure how much Chinese or Japanese
> text a Raspberry Pi 3 (the underpowered ARM device I have access to
> and that has predictable-enough scheduling to be benchmarkable in a
> usefully repeatable way unlike Android devices) can legacy-encode in a
> tenth of a second or 1/24th of a second without an acceleration table.
> (I posit that with the network roundtrip happening afterwards, no one
> is going to care if the form encode step in the legacy case takes up
> to one movie frame duration. Possibly, the "don't care" allowance is
> much larger.)

Here are numbers from ARMv7 code running on RPi3:

UTF-16 to Shift_JIS: 626000 characters per second or the
human-readable non-markup text of a Wikipedia article in 1/60th of a
second.

UTF-16 to GB18030 (same as GBK for the dominant parts): 206000
characters per second or the human-readable non-markup text of a
Wikipedia article in 1/15th of a second

UTF-16 to Big5: 258000 characters per second or the human-readable
non-markup text of a Wikipedia article in 1/20th of a second

Considering that usually a user submits considerably less than a
Wikipedia article's worth of text in a form at a time, I think we can
conclude that as far as user perception of form submission goes, it's
OK to ship Japanese and Chinese legacy encoders that do linear search
over decode-optimized data (no encode-specific data structures at all)
and are extremely slow *relative* (by a factor of over 200!) to UTF-16
to UTF-8 encode.

The test data I used was:
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_tw.txt
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_cn.txt
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/ja.txt

So it's human-authored text, but my understanding is that the
Simplified Chinese version has been machine-mapped from the
Traditional Chinese version, so it's possible that some slowness of
the Simplified Chinese case is attributable to the conversion from
Traditional Chinese exercising less common characters than if it had
been human-authored directly as Simplified Chinese.

Japanese is not fully ideographic and the kana mapping is a matter of
a range check plus offset, which is why the Shift_JIS case is so much
faster.
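As an illustration of "range check plus offset" (this sketch covers only the contiguous hiragana run; the real encoder handles katakana and everything else too): Shift_JIS places ぁ (U+3041) through ん (U+3093) contiguously at 0x829F through 0x82F1 under lead byte 0x82, so the mapping is arithmetic rather than a table lookup.

```rust
// Hiragana in Shift_JIS is a single contiguous run under lead byte
// 0x82, so encoding it needs only a range check and an offset.
fn hiragana_to_shift_jis(c: char) -> Option<[u8; 2]> {
    let cp = c as u32;
    if (0x3041..=0x3093).contains(&cp) {
        Some([0x82, 0x9F + (cp - 0x3041) as u8])
    } else {
        None // ideographs fall through to the slow table search
    }
}
```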

Andrew McCreight

May 21, 2017, 8:06:31 PM
to dev-platform
On Fri, May 19, 2017 at 7:38 PM, Nicholas Nethercote <n.neth...@gmail.com
> wrote:

> There's also a pre-processor constant that we define in Valgrind/ASAN/etc.
> builds that you can check in order to free more stuff than you otherwise
> would. But I can't for the life of me remember what it's called :(
>

NS_FREE_PERMANENT_DATA.

Please don't just cobble together your own, as this will work in all of the
various configurations we care about for leak checking, present and future,
without any further work.
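On the Rust side, the analogue would be a cleanup entry point in the spirit of cairo_debug_reset_static_data(), compiled in only when NS_FREE_PERMANENT_DATA is in effect (the cargo feature name below is hypothetical, and the gate is shown as a comment so the sketch stands alone):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

static TABLE: AtomicPtr<[u32; 128]> = AtomicPtr::new(ptr::null_mut());

fn table() -> &'static [u32; 128] {
    let p = TABLE.load(Ordering::Acquire);
    if !p.is_null() {
        return unsafe { &*p };
    }
    let raw = Box::into_raw(Box::new([0u32; 128])); // hypothetical contents
    match TABLE.compare_exchange(ptr::null_mut(), raw, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => unsafe { &*raw },
        Err(winner) => {
            // Another thread published first; discard our copy.
            drop(unsafe { Box::from_raw(raw) });
            unsafe { &*winner }
        }
    }
}

// In Gecko this would be gated, e.g.:
//     #[cfg(feature = "leak-checking")]
// and called once at shutdown. Safe only if no encoder is running
// concurrently, which is the caller's responsibility at that point.
pub fn debug_reset_static_data() {
    let p = TABLE.swap(ptr::null_mut(), Ordering::AcqRel);
    if !p.is_null() {
        drop(unsafe { Box::from_raw(p) });
    }
}
```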


Nick
>
> On Sat, May 20, 2017 at 5:09 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
> wrote:
>
> > We use functions like cairo_debug_reset_static_data() on shutdown to
> > handle cases like this.
> >
> > > The downsides would be that the memory for the tables wouldn't be
> > > reclaimed if the tables aren't needed anymore (the browser can't
> > > predict the future) and executions where any of the tables has been
> > > created wouldn't be valgrind-clean. Also, in the multi-process world,
> > > the tables would be recomputed per-process. OTOH, if we shut down
> > > renderer processes from time to time, it would work as a coarse
> > > mechanism to reclaim the memory in case Japanese or Chinese legacy
> > > encode is a relatively isolated event in the user's browsing pattern.
> > >
> > > Creating a mechanism for the encoding library to become aware of
> > > application shutdown just in order to be valgrind-clean would be
> > > messy, though. (Currently, we have shutdown bugs where uconv gets used
> > > after we've told it that it can shut down. I'd really want to avoid
> > > re-introducing that class of bugs with encoding_rs.)
> > >
> > > Is it OK to create allocations that are intentionally never freed
> > > (i.e. process termination is what "frees" them)? Is valgrind's message
> > > suppression mechanism granular enough to suppress three allocations
> > > from a particular Rust crate statically linked into libxul?
> > >
> > > --
> > > Henri Sivonen
> > > hsiv...@hsivonen.fi
> > > https://hsivonen.fi/

Nicholas Nethercote

May 22, 2017, 12:58:10 AM
to Andrew McCreight, dev-platform
On Mon, May 22, 2017 at 10:06 AM, Andrew McCreight <amccr...@mozilla.com>
wrote:

>
> NS_FREE_PERMANENT_DATA.
>

That's it! (Thank you mccr8.)

Please use that one. Look at the existing uses for ideas.

Nick

Karl Tomlinson

May 22, 2017, 7:45:47 PM
to
Andrew McCreight writes:

> On Fri, May 19, 2017 at 7:38 PM, Nicholas Nethercote <n.neth...@gmail.com
>> wrote:
>
>> There's also a pre-processor constant that we define in Valgrind/ASAN/etc.
>> builds that you can check in order to free more stuff than you otherwise
>> would. But I can't for the life of me remember what it's called :(
>>
>
> NS_FREE_PERMANENT_DATA.

I understand the need to clean up objects for
NS_BUILD_REFCNT_LOGGING, but how important is this for objects not
tracked with NS_BUILD_REFCNT_LOGGING, but only MOZ_VALGRIND and/or
MOZ_ASAN?

I thought I'd seen enough unfreed allocations that can be
referenced from a root in static storage that I assumed
valgrind and ASAN ignore these allocations in their leak reports.

Is the inclusion of MOZ_VALGRIND and MOZ_ASAN in the value of
NS_FREE_PERMANENT_DATA merely to make leak detection easier for
valgrind and ASAN, or will such allocations be reported as leaks
by these tools?

Andrew McCreight

May 22, 2017, 9:24:15 PM
to Karl Tomlinson, dev-platform
On Mon, May 22, 2017 at 4:45 PM, Karl Tomlinson <moz...@karlt.net> wrote:

> Andrew McCreight writes:
>
> > On Fri, May 19, 2017 at 7:38 PM, Nicholas Nethercote <
> n.neth...@gmail.com
> >> wrote:
> >
> >> There's also a pre-processor constant that we define in
> Valgrind/ASAN/etc.
> >> builds that you can check in order to free more stuff than you otherwise
> >> would. But I can't for the life of me remember what it's called :(
> >>
> >
> > NS_FREE_PERMANENT_DATA.
>
> I understand the need to clean up objects for
> NS_BUILD_REFCNT_LOGGING, but how important is this for objects not
> tracked with NS_BUILD_REFCNT_LOGGING, but only MOZ_VALGRIND and/or
> MOZ_ASAN?
>
> I thought I'd seen enough unfreed allocations that can be
> referenced from a root in static storage, that I assumed that
> valgrind and ASAN ignored these allocations in their leak report.
>

That is correct for LSan. I don't know how Valgrind reports leaks.

Is the inclusion of MOZ_VALGRIND and MOZ_ASAN in the value of
> NS_FREE_PERMANENT_DATA merely to make leak detection easier for
> valgrind and ASAN, or will such allocations be reported as leaks
> by these tools?
>

It will let you clean up things that are not referred to by static storage,
and change behavior in other ways that are needed for leak checking. For
instance, we do our shutdown cycle collections when it is defined, and it
disables the timeout for shutting down a child process.