
Intent to implement and ship: navigator.hardwareConcurrency


Rik Cabanier

May 12, 2014, 8:03:21 PM
to dev-pl...@lists.mozilla.org
Primary eng emails
caba...@adobe.com, bug...@eligrey.com

*Proposal*
http://wiki.whatwg.org/wiki/NavigatorCores

*Summary*
Expose a property on navigator called hardwareConcurrency that returns the
number of logical cores on a machine.
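
For illustration, a minimal sketch of how a page might consume the property
(the fallback value of 2 and the "worker.js" script are assumptions for the
example, not part of the proposal):

// Use the reported count if present; otherwise fall back to a guess.
var cores = navigator.hardwareConcurrency || 2;
var workers = [];
for (var i = 0; i < cores; i++) {
  workers.push(new Worker('worker.js'));
}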

*Motivation*
All native platforms expose this property. It's reasonable to expose the
same capabilities that native applications get, so web applications can be
developed with equivalent features and performance.

*Mozilla bug*
https://bugzilla.mozilla.org/show_bug.cgi?id=1008453
The patch is currently not behind a runtime flag, but I could add it if
requested.

*Concerns*
The original proposal required that a platform return the exact number
of logical CPU cores. To mitigate the fingerprinting concern, the proposal
was updated so a user agent can "lie" about this.
In the case of WebKit, it will return a maximum of 8 logical cores so that
high-value machines can't be discovered. (Note that it's already possible
to do a rough estimate of the number of cores.)
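
For context, here is a rough sketch of the kind of timing-based estimate
alluded to above (this is only an illustration of the idea, not Eli Grey's
core-estimator; it assumes that once the cores are saturated, a batch of
workers takes noticeably longer to finish):

// Time how long n workers, each running the same fixed busy loop, take to finish.
function timeWorkers(n, callback) {
  var src = 'onmessage = function () {' +
            '  for (var i = 0; i < 1e8; i++) {}' +
            '  postMessage("done");' +
            '};';
  var url = URL.createObjectURL(new Blob([src]));
  var workers = [], finished = 0, start = Date.now();
  for (var i = 0; i < n; i++) {
    var w = new Worker(url);
    w.onmessage = function () {
      if (++finished === n) {
        workers.forEach(function (x) { x.terminate(); });
        URL.revokeObjectURL(url);
        callback(Date.now() - start);
      }
    };
    workers.push(w);
    w.postMessage(null);
  }
}

// Probe 2, 4, 8, ... workers; once a batch takes noticeably longer than a
// single worker did, the previous batch size is the rough core estimate.
function estimateCores(maxCores, callback) {
  timeWorkers(1, function (baseline) {
    (function probe(n) {
      if (n > maxCores) return callback(maxCores);
      timeWorkers(n, function (t) {
        if (t > baseline * 1.5) return callback(n / 2);
        probe(n * 2);
      });
    })(2);
  });
}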

*Compatibility Risk*
Blink: approved for implementation and shipping [1]
WebKit: almost approved for implementation [2]
Internet Explorer: No public signals
Web developers: Positive

1:
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/xwl0ab20hVc
2: https://bugs.webkit.org/show_bug.cgi?id=132588
https://lists.webkit.org/pipermail/webkit-dev/2014-May/026511.html

Joshua Cranmer 🐧

May 13, 2014, 1:15:02 AM
to
On 5/12/2014 7:03 PM, Rik Cabanier wrote:
> *Concerns*
> The original proposal required that a platform must return the exact number
> of logical CPU cores. To mitigate the fingerprinting concern, the proposal
> was updated so a user agent can "lie" about this.
> In the case of WebKit, it will return a maximum of 8 logical cores so high
> value machines can't be discovered. (Note that it's already possible to do
> a rough estimate of the number of cores)

The discussion on the WHATWG mailing list covered a lot more than the
fingerprinting concern. Namely:
1. The user may not want to let web applications hog all of the cores on
a machine, and exposing this kind of metric makes it easier for
(good-faith) applications to inadvertently do this.
2. It's not clear that this feature is necessary to build high-quality
threading workload applications. In fact, it's possible that this
technique makes it easier to build inferior applications, relying on a
potentially inferior metric. (Note, for example, the disagreement on
figuring out what you should use for make -j if you have N cores).

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Rik Cabanier

May 13, 2014, 2:37:41 AM
to Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Mon, May 12, 2014 at 10:15 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:

> On 5/12/2014 7:03 PM, Rik Cabanier wrote:
>
>> *Concerns*
>>
>> The original proposal required that a platform must return the exact
>> number
>> of logical CPU cores. To mitigate the fingerprinting concern, the proposal
>> was updated so a user agent can "lie" about this.
>> In the case of WebKit, it will return a maximum of 8 logical cores so high
>> value machines can't be discovered. (Note that it's already possible to do
>> a rough estimate of the number of cores)
>>
>
> The discussion on the WHATWG mailing list covered a lot more than the
> fingerprinting concern. Namely:
> 1. The user may not want to let web applications hog all of the cores on a
> machine, and exposing this kind of metric makes it easier for (good-faith)
> applications to inadvertently do this.
>

Web applications can already do this today. There's nothing stopping them
from estimating the number of CPUs and trying to use them all.
Worse, I think they will likely optimize for popular platforms, which will
either overtax or underutilize less popular ones.


> 2. It's not clear that this feature is necessary to build high-quality
> threading workload applications. In fact, it's possible that this technique
> makes it easier to build inferior applications, relying on a potentially
> inferior metric. (Note, for example, the disagreement on figuring out what
> you should use for make -j if you have N cores).


Everyone is in agreement that that is a hard problem to fix and that there
is no clear answer.
Whatever solution is picked (maybe something like Grand Central Dispatch or
Intel TBB), most solutions will still want to know how many cores are available.
Looking at the native platform (and Adobe's applications), many query the
operating system for this information to balance the workload. I don't see
why this would be different for the web platform.

Rik Cabanier

May 13, 2014, 11:04:50 AM
to Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Mon, May 12, 2014 at 10:15 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:

FYI people brought up the same arguments on the WebKit bug [1] and Filip
did a great job explaining why this attribute is needed.

1: https://bugs.webkit.org/show_bug.cgi?id=132588

Ehsan Akhgari

May 13, 2014, 11:20:00 AM
to Rik Cabanier, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 2:37 AM, Rik Cabanier <caba...@gmail.com> wrote:

> On Mon, May 12, 2014 at 10:15 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com
> >wrote:
>
> > On 5/12/2014 7:03 PM, Rik Cabanier wrote:
> >
> >> *Concerns*
> >>
> >> The original proposal required that a platform must return the exact
> >> number
> >> of logical CPU cores. To mitigate the fingerprinting concern, the
> proposal
> >> was updated so a user agent can "lie" about this.
> >> In the case of WebKit, it will return a maximum of 8 logical cores so
> high
> >> value machines can't be discovered. (Note that it's already possible to
> do
> >> a rough estimate of the number of cores)
> >>
> >
> > The discussion on the WHATWG mailing list covered a lot more than the
> > fingerprinting concern. Namely:
> > 1. The user may not want to let web applications hog all of the cores on
> a
> > machine, and exposing this kind of metric makes it easier for
> (good-faith)
> > applications to inadvertently do this.
> >
>
> Web applications can already do this today. There's nothing stopping them
> from figuring out the CPU's and trying to use them all.
> Worse, I think they will likely optimize for popular platforms which either
> overtax or underutilize non-popular ones.
>

Can you please provide some examples of actual web applications that do
this, and what they're exactly trying to do with the number once they
estimate one? (Eli's timing attack demos don't count. ;-)


> > 2. It's not clear that this feature is necessary to build high-quality
> > threading workload applications. In fact, it's possible that this
> technique
> > makes it easier to build inferior applications, relying on a potentially
> > inferior metric. (Note, for example, the disagreement on figuring out
> what
> > you should use for make -j if you have N cores).
>
>
> Everyone is in agreement that that is a hard problem to fix and that there
> is no clear answer.
> Whatever solution is picked (maybe like Grand Central or Intel TBB), most
> solutions will still want to know how many cores are available.
> Looking at the native platform (and Adobe's applications), many query the
> operating system for this information to balance the workload. I don't see
> why this would be different for the web platform.
>

I don't think that the value exposed by the native platforms is
particularly useful. Really, if the use case is to adapt the number of
workers to a number that will allow you to run them all concurrently, that
is not the same number as traditionally reported by the native platforms.
If you try Eli's test case in Firefox under different workloads (for
example, while building Firefox, doing a disk-intensive operation, etc.),
the utter inaccuracy of the results is proof of the ineffectiveness of
this number, in my opinion.

Also, I worry that this API is too focused on the past/present. For
example, I don't think anyone sufficiently addressed Boris' concern on the
whatwg thread about AMP vs SMP systems. This proposal also assumes that
the UA itself is mostly content with using a single core, which is true
for the current browser engines, but we're working on changing that
assumption in Servo. It also doesn't take into account the possibility of
several of these web applications running at the same time.

Until these issues are addressed, I do not think we should implement or
ship this feature.

Cheers,
Ehsan

Rik Cabanier

May 13, 2014, 12:25:35 PM
to Ehsan Akhgari, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 8:20 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> On Tue, May 13, 2014 at 2:37 AM, Rik Cabanier <caba...@gmail.com> wrote:
>
>> On Mon, May 12, 2014 at 10:15 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com
>> >wrote:
>>
>> > On 5/12/2014 7:03 PM, Rik Cabanier wrote:
>> >
>> >> *Concerns*
>> >>
>> >> The original proposal required that a platform must return the exact
>> >> number
>> >> of logical CPU cores. To mitigate the fingerprinting concern, the
>> proposal
>> >> was updated so a user agent can "lie" about this.
>> >> In the case of WebKit, it will return a maximum of 8 logical cores so
>> high
>> >> value machines can't be discovered. (Note that it's already possible
>> to do
>> >> a rough estimate of the number of cores)
>> >>
>> >
>> > The discussion on the WHATWG mailing list covered a lot more than the
>> > fingerprinting concern. Namely:
>> > 1. The user may not want to let web applications hog all of the cores
>> on a
>> > machine, and exposing this kind of metric makes it easier for
>> (good-faith)
>> > applications to inadvertently do this.
>> >
>>
>> Web applications can already do this today. There's nothing stopping them
>> from figuring out the CPU's and trying to use them all.
>> Worse, I think they will likely optimize for popular platforms which
>> either
>> overtax or underutilize non-popular ones.
>>
>
> Can you please provide some examples of actual web applications that do
> this, and what they're exactly trying to do with the number once they
> estimate one? (Eli's timing attack demos don't count. ;-)
>

Eli's listed some examples:
http://wiki.whatwg.org/wiki/NavigatorCores#Example_use_cases
I don't have any other cases where this is done. Maybe PDF.js would be
interested. They use workers to render pages and decompress images so I
could see how this is useful to them.


> > 2. It's not clear that this feature is necessary to build high-quality
>> > threading workload applications. In fact, it's possible that this
>> technique
>> > makes it easier to build inferior applications, relying on a potentially
>> > inferior metric. (Note, for example, the disagreement on figuring out
>> what
>> > you should use for make -j if you have N cores).
>>
>>
>> Everyone is in agreement that that is a hard problem to fix and that there
>> is no clear answer.
>> Whatever solution is picked (maybe like Grand Central or Intel TBB), most
>> solutions will still want to know how many cores are available.
>> Looking at the native platform (and Adobe's applications), many query the
>> operating system for this information to balance the workload. I don't see
>> why this would be different for the web platform.
>>
>
> I don't think that the value exposed by the native platforms is
> particularly useful. Really if the use case is to try to adapt the number
> of workers to a number that will allow you to run them all concurrently,
> that is not the same number as reported traditionally by the native
> platforms.
>

Why not? How is the web platform different?


> If you try Eli's test case in Firefox under different workloads (for
> example, while building Firefox, doing a disk intensive operation, etc.),
> the utter inaccuracy of the results is proof in the ineffectiveness of this
> number in my opinion.
>

As Eli mentioned, you can run the algorithm for longer and get a more
accurate result. Again, if the native platform didn't support this, doing
the same estimation in C++ would give the same result.


> Also, I worry that this API is too focused on the past/present. For
> example, I don't think anyone sufficiently addressed Boris' concern on the
> whatwg thread about AMP vs SMP systems.
>

Can you provide a link to that? Are there systems that expose this to the
user? (AFAIK slow cores are substituted with fast ones on the fly.)


> This proposal also assumes that the UA itself is mostly contempt with
> using a single core, which is true for the current browser engines, but
> we're working on changing that assumption in Servo. It also doesn't take
> the possibility of several ones of these web application running at the
> same time.
>

How is this different from the native platform?


> Until these issues are addressed, I do not think we should implement or
> ship this feature.
>

FWIW these issues were already discussed in the WebKit bug.
I find it odd that we don't want to give authors access to such a basic
feature. Not everything needs to be solved by a complex framework.

Tom Schuster

May 13, 2014, 12:55:28 PM
to Rik Cabanier, Joshua Cranmer 🐧, Ehsan Akhgari, dev-pl...@lists.mozilla.org
I recently saw this bug about implementing navigator.getFeature; wouldn't
it make sense for this to be exposed like hardware.memory, but as
hardware.cores?



Eli Grey

May 13, 2014, 1:35:22 PM
to Ehsan Akhgari, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 11:20 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Can you please provide some examples of actual web applications that do
> this, and what they're exactly trying to do with the number once they
> estimate one? (Eli's timing attack demos don't count. ;-)

One example of a website in the wild that is currently using
navigator.hardwareConcurrency with my polyfill is
http://danielsadventure.info/html5fractal/

On Tue, May 13, 2014 at 11:20 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> I don't think that the value exposed by the native platforms is
> particularly useful. Really if the use case is to try to adapt the number
> of workers to a number that will allow you to run them all concurrently,
> that is not the same number as reported traditionally by the native
> platforms


Can you back that up with a real-world example of a desktop application
that behaves that way?

Every highly parallel desktop application that I have (HandBrake, xz,
Photoshop, GIMP, Blender (CPU-based render modes)) uses all available
CPU cores and keeps the same threadpool size throughout the application's
life. Can you provide a single example of one desktop application
that resizes its threadpool based on load, as opposed to allowing the
OS scheduler to do its job? The use case of
navigator.hardwareConcurrency is not to "adapt the number of workers
to a number that will allow you to run them all concurrently". The use
case is sizing a threadpool so that an application can perform
parallel tasks with as many system CPU resources as it can get.
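
As a concrete sketch of that pattern (where "task-worker.js", the task
format, and the fallback of 4 are assumptions for the example): size the
pool once from the reported count, keep it alive, and feed queued tasks to
whichever worker is idle.

var poolSize = navigator.hardwareConcurrency || 4;
var queue = [], idle = [];

for (var i = 0; i < poolSize; i++) {
  var w = new Worker('task-worker.js');
  w.onmessage = function (e) {
    // e.data is the finished task's result; give this worker the next
    // queued task, or park it as idle if the queue is empty.
    var next = queue.shift();
    if (next) e.target.postMessage(next);
    else idle.push(e.target);
  };
  idle.push(w);
}

function submit(task) {
  var w = idle.pop();
  if (w) w.postMessage(task);
  else queue.push(task);
}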

You state that "this API is too focused on the past/present". I may be
compressing some data with xz while also compiling Firefox. If both of
these applications use 12 threads on my 12-thread Intel CPU, the OS
scheduler balances the loads so that they both finish as fast as
possible. If I use only 1 thread for compression while compiling
Firefox, Firefox may finish compiling faster, but my compression will
undoubtedly take longer.


Ehsan Akhgari

May 13, 2014, 1:43:16 PM
to Rik Cabanier, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On 2014-05-13, 9:25 AM, Rik Cabanier wrote:
> Web applications can already do this today. There's nothing
> stopping them
> from figuring out the CPU's and trying to use them all.
> Worse, I think they will likely optimize for popular platforms
> which either
> overtax or underutilize non-popular ones.
>
>
> Can you please provide some examples of actual web applications that
> do this, and what they're exactly trying to do with the number once
> they estimate one? (Eli's timing attack demos don't count. ;-)
>
>
That is a list of use cases which could use better ways of supporting a
worker pool that actually scales to how many cores you have available at
any given point in time. That is *not* what
navigator.hardwareConcurrency gives you, so I don't find those examples
very convincing.

(Note that I would be very eager to discuss a proposal that actually
tries to solve that problem.)

> I don't have any other cases where this is done.

That really makes me question the "positive feedback from web
developers" cited in the original post on this thread. Can you please
point us to places where that feedback is documented?

> Maybe PDF.js would be
> interested. They use workers to render pages and decompress images so I
> could see how this is useful to them.

I'm not aware of that use case for pdf.js.

> Everyone is in agreement that that is a hard problem to fix and
> that there
> is no clear answer.
> Whatever solution is picked (maybe like Grand Central or Intel
> TBB), most
> solutions will still want to know how many cores are available.
> Looking at the native platform (and Adobe's applications), many
> query the
> operating system for this information to balance the workload. I
> don't see
> why this would be different for the web platform.
>
>
> I don't think that the value exposed by the native platforms is
> particularly useful. Really if the use case is to try to adapt the
> number of workers to a number that will allow you to run them all
> concurrently, that is not the same number as reported traditionally
> by the native platforms.
>
>
> Why not? How is the web platform different?

Here's why I find the native platform parity argument unconvincing here.
This is not the only primitive that native platforms expose to make it
possible for you to write apps that scale to the number of available
cores. For example, OS X provides GCD. Windows provides at least two
threadpool APIs. Not sure if Linux directly addresses this problem
right now.

Another very important distinction between the Web platform and native
platforms which is relevant here is the amount of abstraction that each
platform provides on top of the hardware. Native platforms provide a much
lower level of abstraction, and as a result, on such platforms you can at
the very least control how many threads your own application spawns and
keeps active. We don't even have this level of control on the Web platform
(applications are typically not even aware that they have multiple copies
running in different tabs, for example).

Also, please note that there are use cases on native platforms which
don't really exist on the Web. For example, on a desktop OS you might
want to write a "system info" application which actually wants to list
information about the hardware installed on the system.

> If you try Eli's test case in Firefox under different workloads (for
> example, while building Firefox, doing a disk intensive operation,
> etc.), the utter inaccuracy of the results is proof in the
> ineffectiveness of this number in my opinion.
>
>
> As Eli mentioned, you can run the algorithm for longer and get a more
> accurate result.

I tried <http://wg.oftn.org/projects/customized-core-estimator/demo/>
which is supposed to give you a more accurate estimate. Have you tried
that page when the system is under load in Firefox?

> Again, if the native platform didn't support this,
> doing this in C++ would result in the same.

Yes, exactly. Which is why I don't really buy the argument that we
should do this because native platforms do this.

> Also, I worry that this API is too focused on the past/present. For
> example, I don't think anyone sufficiently addressed Boris' concern
> on the whatwg thread about AMP vs SMP systems.
>
>
> Can you provide a link to that?

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2014-May/296737.html

> Are there systems that expose this to
> the user? (AFAIK slow cores are substituted with fast ones on the fly.)

I'm not sure about the details of how these cores are controlled,
whether the control happens in hardware or in the OS, etc. This is one
aspect of this problem which needs more research before we can decide to
implement and ship this, IMO.

> This proposal also assumes that the UA itself is mostly contempt
> with using a single core, which is true for the current browser
> engines, but we're working on changing that assumption in Servo. It
> also doesn't take the possibility of several ones of these web
> application running at the same time.
>
>
> How is this different from the native platform?

On the first point, I hope the difference is obvious. Native apps don't
typically run in a VM which provides highly sophisticated functionality
for them. They also give you direct control over how many threads your
"application" (which typically maps to an OS-level process) spawns and
when, what their priorities and affinities are, etc. With that in mind, I
think implementing this API as-is in Gecko would be lying to the user
(because we run some threads with higher priority than worker threads,
for example our chrome workers, the MediaStreamGraph thread, etc.), and
it would actually be harmful in Servo, where the UA tries to get its
hands on as many cores as it can to do things such as running script,
layout, etc.

On the second point, please see the paragraph above where I discuss that.

> Until these issues are addressed, I do not think we should implement
> or ship this feature.
>
>
> FWIW these issues were already discussed in the WebKit bug.

The issues that I bring up here are the ones that I think either have not
been brought up before or have not been sufficiently addressed, so I'd
appreciate it if you could try to address them. It could be that I'm
wrong/misinformed, and I would appreciate it if you would call me out on
those points.

> I find it odd that we don't want to give authors access to such a basic
> feature. Not everything needs to be solved by a complex framework.

You're asserting that navigator.hardwareConcurrency gives you a basic
way of solving the use case of scaling computation over a number of
worker threads. I am rejecting that assertion here. I am not arguing
that we should not try to fix this problem, I'm just not convinced that
the current API brings us any closer to solving it.

Cheers,
Ehsan

Ehsan Akhgari

May 13, 2014, 1:43:45 PM
to Tom Schuster, Rik Cabanier, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On 2014-05-13, 9:55 AM, Tom Schuster wrote:
> I recently saw this bug about implementing navigator.getFeature,
> wouldn't it make sense for this to be like hardware.memory, but
> hardware.cores?

No, because that would have all of the same issues as the current API.

Cheers,
Ehsan

Rik Cabanier

May 13, 2014, 1:44:35 PM
to Tom Schuster, Joshua Cranmer 🐧, Ehsan Akhgari, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 9:55 AM, Tom Schuster <t...@schuster.me> wrote:

> I recently saw this bug about implementing navigator.getFeature, wouldn't
> it make sense for this to be like hardware.memory, but hardware.cores?
>

Is this a feature that is adopted across browsers?

Interesting that Firefox exposes this. Was there a discussion thread? It
seems that property would face the same (or even stronger) objections as
navigator.hardwareConcurrency.



Ehsan Akhgari

May 13, 2014, 1:54:03 PM
to Eli Grey, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On 2014-05-13, 10:35 AM, Eli Grey wrote:
> On Tue, May 13, 2014 at 11:20 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> Can you please provide some examples of actual web applications that do
>> this, and what they're exactly trying to do with the number once they
>> estimate one? (Eli's timing attack demos don't count. ;-)
>
> One example of a website in the wild that is currently using
> navigator.hardwareConcurrency with my polyfill is
> http://danielsadventure.info/html5fractal/

Thanks, so we have one example. But we need more. :-)

> On Tue, May 13, 2014 at 11:20 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> I don't think that the value exposed by the native platforms is
>> particularly useful. Really if the use case is to try to adapt the number
>> of workers to a number that will allow you to run them all concurrently,
>> that is not the same number as reported traditionally by the native
>> platforms
>
>
> Can you back that up with a real-world example desktop application
> that behaves as such?

Please see my reply to Rik where I explain my concern in more detail.
As examples of desktop applications that behave as such, see the apps
that use GCD on OS X and the workerpool APIs on Windows.

> Every highly parallel desktop application that I have (HandBrake, xz,
> Photoshop, GIMP, Blender (CPU-based render modes)) use all available
> CPU cores and keep the same threadpool size throughout the application
> life. Can you provide a single example of a one desktop application
> that resizes its threadpool based on load, as opposed to allowing the
> OS scheduler to do its job?

Please let's not go down the route of discussing anecdotes here.

> The use case of
> navigator.hardwareConcurrency is not to "adapt the number of workers
> to a number that will allow you to run them all concurrently". The use
> case is sizing a threadpool so that that an application can perform
> parallel tasks with as many system CPU resources as it can get.

OS-level threads are not free: they have a context-switch cost, they
consume virtual address space, etc. navigator.hardwareConcurrency lets
you size a threadpool with more threads than the number of tasks your
application can actually get to run in parallel, so it doesn't really
address the use case you quote above.

> You state that "this API is too focused on the past/present". I may be
> compressing some data with xz while also compiling Firefox. If both of
> these applications use 12 threads on my 12-thread Intel CPU, the OS
> scheduler balances the loads so that they both finish as fast as
> possible.

That is *only* true if all of those threads run at the same priority.
If you're building Firefox with a really low nice value, xz will not get
_any_ cores until Firefox's build is finished (or yields some threads
when it gets bound on I/O, etc.)

> If I use only 1 thread for compression while compiling
> Firefox, Firefox may finish compiling faster, but my compression will
> undoubtedly take longer.

Like I said, you're ignoring the fact that OS level threads can have
different priorities. And that is what we utilize in Firefox today, so
it's not just a theoretical concern.

Cheers,
Ehsan


Eli Grey

May 13, 2014, 1:54:22 PM
to Ehsan Akhgari, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 1:43 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> supporting a worker pool that actually scales to how many cores you have available

1) What is an "available core" to you? An available core to me is a
core that I can use to compute. A core under load (even 100% load) is
still a core I can use to compute.
2) Web workers were intentionally made to be memory-heavy, long-lived,
reusable interfaces. The startup and unload overhead is massive if you
actually want to dynamically resize your threadpool. Ask the people
who put Web Workers in the HTML5 spec or try benchmarking it (rapid
threadpool resizing) yourself--they are not meant to be lightweight.

Ehsan Akhgari

May 13, 2014, 1:55:26 PM
to Rik Cabanier, Tom Schuster, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On 2014-05-13, 10:44 AM, Rik Cabanier wrote:
>
>
>
> On Tue, May 13, 2014 at 9:55 AM, Tom Schuster <t...@schuster.me
> <mailto:t...@schuster.me>> wrote:
>
> I recently saw this bug about implementing navigator.getFeature,
> wouldn't it make sense for this to be like hardware.memory, but
> hardware.cores?
>
>
> Is this a feature that is adopted across browsers?

No, and it's not exposed to the Web.

> Interesting that Firefox exposes this. Was there a discussion thread? It
> seems that property would face the same (or even stronger) objections
> than navigator.hardwareConcurrency

Please let's focus on discussing hardwareConcurrency. Comparing these
two APIs is comparing apples and oranges.

Ehsan Akhgari

May 13, 2014, 1:57:48 PM
to Eli Grey, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On 2014-05-13, 10:54 AM, Eli Grey wrote:
> On Tue, May 13, 2014 at 1:43 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> supporting a worker pool that actually scales to how many cores you have available
>
> 1) What is an "available core" to you? An available core to me is a
> core that I can use to compute. A core under load (even 100% load) is
> still a core I can use to compute.

No, you're wrong. An available core is a core on which your application
can run computations. If other code is already keeping it busy at a
higher priority, it's unavailable by definition.

> 2) Web workers were intentionally made to be memory-heavy, long-lived,
> reusable interfaces. The startup and unload overhead is massive if you
> actually want to dynamically resize your threadpool. Ask the people
> who put Web Workers in the HTML5 spec or try benchmarking it (rapid
> threadpool resizing) yourself--they are not meant to be lightweight.

How does this support your argument exactly?

Joshua Cranmer 🐧

May 13, 2014, 1:58:49 PM
to
On 5/13/2014 12:35 PM, Eli Grey wrote:
> Can you back that up with a real-world example desktop application
> that behaves as such?

The OpenMP framework?

Benoit Jacob

May 13, 2014, 2:11:58 PM
to Joshua Cranmer 🐧, dev-platform
Also note that even some popular desktop APIs that in practice expose the
"hardware" thread count choose not to call it that. For example, Qt
calls it the "ideal" thread count.
http://qt-project.org/doc/qt-4.8/qthread.html#idealThreadCount

IMO this suggests that we're not the only ones feeling uncomfortable about
committing to "hardware thread count" as being forever a well-defined and
useful thing to expose to applications.

Benoit

Eli Grey

May 13, 2014, 2:14:35 PM
to Ehsan Akhgari, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 1:57 PM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:

> No, you're wrong. An available core is a core which your application can
> use to run computations on. If another code is already keeping it busy
> with a higher priority, it's unavailable by definition.
>

Run this code <https://gist.github.com/eligrey/9a48b71b2f5da67b834b> in
your browser. All cores are at 100% CPU usage, so clearly by your
definition all cores are now "unavailable". How are you able to interact
with your OS? It must be some kind of black magic... or maybe it's because
your OS scheduler knows how to prioritize threads properly so that you can
multitask under load.

On Tue, May 13, 2014 at 1:57 PM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:

> How does this support your argument exactly?


It has nothing to do with my argument, it has to do with yours. You are
suggesting that people should dynamically resize their threadpools. I'm
bringing up the fact that web workers were *designed* to not be used in
this manner in the first place.

Kip Gilbert

May 13, 2014, 2:19:32 PM
to dev-pl...@lists.mozilla.org
Just wish to throw in my 2c...

Many game engines will query the core count to determine if they should
follow a simple (one main thread, one render thread, one audio thread,
one streamer thread) or more parallel (multiple render threads, multiple
audio threads, gameplay/physics/ai broken up into separate workers)
approach. If there are sufficient cores, this is necessary to get the
greatest possible framerate (keep the GPU fed), best quality audio (i.e.
more channels, longer reverb), and things such as secondary animations
that would not be enabled otherwise.
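
To illustrate the kind of branch being described (the threshold of 4 and
the configuration fields below are made-up examples, not taken from any
real engine):

var cores = navigator.hardwareConcurrency || 2; // fallback is a guess
var engineConfig = cores >= 4
  ? { renderThreads: 2, audioThreads: 2, workerTasks: ['physics', 'ai'] }
  : { renderThreads: 1, audioThreads: 1, workerTasks: [] };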

Even when not enabling all features and quality levels, the overhead of
fencing, double buffering, etc. should be avoided on systems with fewer
cores.

I also see that there are reasons why this may not be good for the
web. NUMA (Non-Uniform Memory Architecture) and Hyper-Threading
attributes also need to be taken into account to effectively optimize
for core count. This seems out of place given the level of abstraction
web developers expect. I can also imagine a very short-term future
where "CPU core count" will be an outdated concept.

Cheers,
- Kearwood "Kip" Gilbert

Ehsan Akhgari

May 13, 2014, 2:26:19 PM
to Eli Grey, Joshua Cranmer 🐧, Rik Cabanier, dev-pl...@lists.mozilla.org
On 2014-05-13, 11:14 AM, Eli Grey wrote:
> On Tue, May 13, 2014 at 1:57 PM, Ehsan Akhgari <ehsan....@gmail.com
> <mailto:ehsan....@gmail.com>> wrote:
>
> No, you're wrong. An available core is a core which your
> application can use to run computations on. If another code is
> already keeping it busy with a higher priority, it's unavailable by
> definition.
>
>
> Run this code <https://gist.github.com/eligrey/9a48b71b2f5da67b834b> in
> your browser. All cores are at 100% CPU usage, so clearly by your
> definition all cores are now "unavailable".

They are unavailable to *all* threads running on your system with a
lower priority. (Note that Gecko runs Web Workers with a low priority
already, so that they won't affect any of your normal apps, including
Firefox's UI.)

> How are you able to interact
> with your OS? It must be some kind of black magic... or maybe it's
> because your OS scheduler knows how to prioritize threads properly so
> that you can multitask under load.

There is no magic involved here.

> bringing up the fact that web workers were /designed/ to not be used in
> this manner in the first place.

OK, so you're asserting that it's impossible to implement a resizing
worker pool on top of Web Workers. I think you're wrong, but I'll grant
you this assumption. ;-) Just wanted to make it clear that doing that
won't bring us closer to a conclusion in this thread.

Cheers,
Ehsan

Rik Cabanier

May 13, 2014, 5:42:19 PM
to Ehsan Akhgari, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 10:43 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> On 2014-05-13, 9:25 AM, Rik Cabanier wrote:
>
>> Web applications can already do this today. There's nothing
>> stopping them
>> from figuring out the CPU's and trying to use them all.
>> Worse, I think they will likely optimize for popular platforms
>> which either
>> overtax or underutilize non-popular ones.
>>
>>
>> Can you please provide some examples of actual web applications that
>> do this, and what they're exactly trying to do with the number once
>> they estimate one? (Eli's timing attack demos don't count. ;-)
>>
>>
>> Eli's listed some examples:
>> http://wiki.whatwg.org/wiki/NavigatorCores#Example_use_cases
>>
>
> That is a list of use cases which could use better ways of supporting a
> worker pool that actually scales to how many cores you have available at
> any given point in time. That is *not* what navigator.hardwareConcurrency
> gives you, so I don't find those examples very convincing.
>

That is not the point of this attribute. It's just a hint for the author so
he can tune his application accordingly.
Maybe the application is tuned to use fewer cores, or maybe more. It all
depends...


> (Note that I would be very eager to discuss a proposal that actually tries
> to solve that problem.)


You should do that! People have brought this up in the past but no progress
has been made in the last 2 years.
However, if this simple attribute is able to stir people's emotions, can
you imagine what would happen if you propose something complex? :-)


> I don't have any other cases where this is done.
>>
>
> That really makes me question the "positive feedback from web developers"
> cited in the original post on this thread. Can you please point us to
> places where that feedback is documented?


That was from the email to blink-dev where Adam Barth stated this.
I'll ask him where this came from.

I looked at other interpreted languages and they all seem to give you
access to the CPU count. Then I searched on GitHub to see the popularity:
Python:

multiprocessing.cpu_count()

11,295 results

https://github.com/search?q=multiprocessing.cpu_count%28%29+extension%3Apy&type=Code&ref=advsearch&l=

Perl:

use Sys::Info;
use Sys::Info::Constants qw( :device_cpu );
my $info = Sys::Info->new;
my $cpu = $info->device( CPU => %options );

7 results
https://github.com/search?q=device_cpu+extension%3Apl&type=Code&ref=searchresults

Java:

Runtime.getRuntime().availableProcessors()

23,967 results

https://github.com/search?q=availableProcessors%28%29+extension%3Ajava&type=Code&ref=searchresults

Ruby:

Facter.processorcount

115 results

https://github.com/search?q=processorcount+extension%3Arb&type=Code&ref=searchresults

C#:

Environment.ProcessorCount

5,315 results
https://github.com/search?q=Environment.ProcessorCount&type=Code&ref=searchresults

I also searched for JavaScript files that contain "cpu" and "core":

21,487 results

https://github.com/search?q=core+cpu+extension%3Ajs&type=Code&ref=searchresults

The results are mixed. Some projects seem to hard code CPU cores while
others are not about workers at all.
A search for "worker" and "cpu" gets more consistent results:

2,812 results

https://github.com/search?q=worker+cpu+extension%3Ajs&type=Code&ref=searchresults

node.js is also exposing it:

require('os').cpus()

4,851 results

https://github.com/search?q=require%28%27os%27%29.cpus%28%29+extension%3Ajs&type=Code&ref=searchresults


> Maybe PDF.js would be
>
>> interested. They use workers to render pages and decompress images so I
>> could see how this is useful to them.
>
> I'm not aware of that use case for pdf.js.


I'm sure someone on this list is currently working on pdf.js. Maybe they
can chime in?


> Everyone is in agreement that that is a hard problem to fix and
>> that there
>> is no clear answer.
>> Whatever solution is picked (maybe like Grand Central or Intel
>> TBB), most
>> solutions will still want to know how many cores are available.
>> Looking at the native platform (and Adobe's applications), many
>> query the
>> operating system for this information to balance the workload. I
>> don't see
>> why this would be different for the web platform.
>>
>>
>> I don't think that the value exposed by the native platforms is
>> particularly useful. Really if the use case is to try to adapt the
>> number of workers to a number that will allow you to run them all
>> concurrently, that is not the same number as reported traditionally
>> by the native platforms.
>>
>>
>> Why not? How is the web platform different?
>>
>
> Here's why I find the native platform parity argument unconvincing here.
> This is not the only primitive that native platforms expose to make it
> possible for you to write apps that scale to the number of available cores.
> For example, OS X provides GCD. Windows provides at least two threadpool
> APIs. Not sure if Linux directly addresses this problem right now.
>

I'm not familiar with the success of those frameworks. Asking around at
Adobe, I haven't found anyone who has used them so far.
Tuning the application depending on the number of CPUs is done quite often.


> Another very important distinction between the Web platform and native
> platforms which is relevant here is the amount of abstraction that each
> platform provides on top of hardware. Native platforms provide a much
> lower level of abstraction, and as a result, on such platforms at the very
> least you can control how many threads your own application spawns and
> keeps active. We don't even have this level of control on the Web platform
> (applications are typically even unaware that you have multiple copies
> running in different tabs for example.)
>

I'm unsure how tabs are different from different processes.
As an author, I would certainly want my web workers to run in parallel. Why
else would I use workers to do number crunching?
Again, this is a problem that already exists and we're not trying to solve
it here.


> Also, please note that there are use cases on native platforms which don't
> really exist on the Web. For example, on a desktop OS you might want to
> write a "system info" application which actually wants to list information
> about the hardware installed on the system.
>
>
> If you try Eli's test case in Firefox under different workloads (for
>> example, while building Firefox, doing a disk intensive operation,
>> etc.), the utter inaccuracy of the results is proof in the
>> ineffectiveness of this number in my opinion.
>>
>>
>> As Eli mentioned, you can run the algorithm for longer and get a more
>> accurate result.
>>
>
> I tried <http://wg.oftn.org/projects/customized-core-estimator/demo/>
> which is supposed to give you a more accurate estimate. Have you tried
> that page when the system is under load in Firefox?
>
>
> > Again, if the native platform didn't support this,
>
>> doing this in C++ would result in the same.
>>
>
> Yes, exactly. Which is why I don't really buy the argument that we should
> do this because native platforms do this.


I don't follow. Yes, the algorithm is imprecise and it would be just as
imprecise in C++.
There is no difference in behavior between the web platform and native.


> Also, I worry that this API is too focused on the past/present. For
>> example, I don't think anyone sufficiently addressed Boris' concern
>> on the whatwg thread about AMP vs SMP systems.
>>
>>
>> Can you provide a link to that?
>>
>
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2014-May/296737.html
>
>
> > Are there systems that expose this to
>
>> the user? (AFAIK slow cores are substituted with fast ones on the fly.)
>>
>
> I'm not sure about the details of how these cores are controlled, whether
> the control happens in hardware or in the OS, etc. This is one aspect of
> this problem which needs more research before we can decide to implement
> and ship this, IMO.


Does Firefox behave differently on such systems? (Is it even supported on
these systems?)
If so, how are workers scheduled? In the end, even if the cores are
heterogeneous, knowing the number of them will keep them ALL busy (which
means more work is getting done).


> This proposal also assumes that the UA itself is mostly contempt
>> with using a single core, which is true for the current browser
>> engines, but we're working on changing that assumption in Servo. It
>> also doesn't take the possibility of several ones of these web
>> application running at the same time.
>>
>>
>> How is this different from the native platform?
>>
>
> On the first point, I hope the difference is obvious. Native apps don't
> typically run in a VM which provides highly sophisticated functionality for
> them.


See my long list of interpreted languages earlier in this email.
There are lots of VMs that support this and a lot of people are using it.


> And also they give you direct control over how many threads your
> "application" (which typically maps to an OS level process) spawns and
> when, what their priorities and affinities are, etc. I think with that in
> mind, implementing this API as is in Gecko will be lying to the user
> (because we run some threads with higher priority than worker threads, for
> example our chrome workers, the MediaStreamGraph thread, etc.) and it would
> actually be harmful in Servo where the UA tries to get its hands on as many
> cores as it can do to things such as running script, layout, etc.
>

Why would that be? Are you burning more CPU resources in Servo to do the
same thing? If so, that sounds like a problem.
If not, the use case to scale your workload to more CPU cores is even
better, as similar tasks will finish faster.
For instance, take a system with 8 idle cores and a 64-second task divided
8 ways (8 seconds of work per core):

UA overhead on 1 thread: 2s + 8s of parallel work -> 10s total

UA overhead spread over 2 threads: 1s + 8s of parallel work -> 9s total



> On the second point, please see the paragraph above where I discuss that.
>
>
> Until these issues are addressed, I do not think we should implement
>> or ship this feature.
>>
>>
>> FWIW these issues were already discussed in the WebKit bug.
>>
>
> The issues that I bring up here are the ones that I think have not either
> been brought up before or have not been sufficiently addressed, so I'd
> appreciate if you could try to address them sufficiently. It could be that
> I'm wrong/misinformed and I would appreciate if you would call me out on
> those points.
>
>
> I find it odd that we don't want to give authors access to such a basic
>> feature. Not everything needs to be solved by a complex framework.
>>
>
> You're asserting that navigator.hardwareConcurrency gives you a basic way
> of solving the use case of scaling computation over a number of worker
> threads. I am rejecting that assertion here. I am not arguing that we
> should not try to fix this problem, I'm just not convinced that the current
> API brings us any closer to solving it.
>

I'm not asserting anything. I want to give authors an hint that they can
make a semi-informed decision to balance their workload.
Even if there's a more general solution later on to solve that particular
problem, it will sometimes still be valuable to know the layout of the
system so you can best divide up the work.

Boris Zbarsky

unread,
May 13, 2014, 5:59:47 PM5/13/14
to
On 5/13/14, 2:42 PM, Rik Cabanier wrote:
> Why would that be? Are you burning more CPU resources in servo to do the
> same thing?

In some cases, possibly yes.

> If so, that sounds like a problem.

It depends on what your goals are. Any sort of speculation, prefetch or
prerender is burning more CPU resources to in the end do the same thing.
But it may provide responsiveness benefits that are worth the extra
CPU cycles.

Current browsers don't do those things very much in the grand scheme of
things because they're hard to do without janking the UI. Servo should
not have that problem, so it may well do things like speculatively
starting layout in the background when a script changes styles, for
example, and throwing the speculation away if more style changes happen
before the layout is done.

-Boris

Ehsan Akhgari

unread,
May 13, 2014, 6:16:44 PM5/13/14
to Rik Cabanier, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On 2014-05-13, 2:42 PM, Rik Cabanier wrote:
>
>
>
> On Tue, May 13, 2014 at 10:43 AM, Ehsan Akhgari <ehsan....@gmail.com
> <mailto:ehsan....@gmail.com>> wrote:
>
> On 2014-05-13, 9:25 AM, Rik Cabanier wrote:
>
> Web applications can already do this today. There's nothing
> stopping them
> from figuring out the CPU's and trying to use them all.
> Worse, I think they will likely optimize for popular
> platforms
> which either
> overtax or underutilize non-popular ones.
>
>
> Can you please provide some examples of actual web
> applications that
> do this, and what they're exactly trying to do with the
> number once
> they estimate one? (Eli's timing attack demos don't count. ;-)
>
>
> Eli's listed some examples:
> http://wiki.whatwg.org/wiki/NavigatorCores#Example_use_cases
>
>
> That is a list of use cases which could use better ways of
> supporting a worker pool that actually scales to how many cores you
> have available at any given point in time. That is *not* what
> navigator.hardwareConcurrency gives you, so I don't find those
> examples very convincing.
>
>
> That is not the point of this attribute. It's just a hint for the author
> so he can tune his application accordingly.
> Maybe the application is tuned to use fewer cores, or maybe more. It all
> depends...

The problem is that the API doesn't really make it obvious that you're
not supposed to take the value that the getter returns and just spawn N
workers. IOW, the API encourages the wrong behavior by design.
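
For illustration, the naive pattern being described here would look roughly
like this (doWork.js is a placeholder worker script, and the fallback value
is arbitrary):

  // Naive use of the hint: treat it as "the number of workers to spawn".
  const n = navigator.hardwareConcurrency || 2;  // fall back if unsupported
  const workers = [];
  for (let i = 0; i < n; i++) {
    workers.push(new Worker("doWork.js"));       // doWork.js is hypothetical
  }
  // ...then hand each worker a slice of the computation...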

> (Note that I would be very eager to discuss a proposal that actually
> tries to solve that problem.)
>
>
> You should do that! People have brought this up in the past but no
> progress has been made in the last 2 years.
> However, if this simple attribute is able to stir people's emotions, can
> you imagine what would happen if you propose something complex? :-)

Sorry, but I have a long list of things on my todo list, and honestly
this one is not nearly close to the top of the list, because I'm not
aware of people asking for this feature very often. I'm sure there are
some people who would like it, but there are many problems that we are
trying to solve here, and this one doesn't look very high priority.

> I don't have any other cases where this is done.
>
>
> That really makes me question the "positive feedback from web
> developers" cited in the original post on this thread. Can you
> please point us to places where that feedback is documented?
>
>
> That was from the email to blink-dev where Adam Barth stated this.
> I'll ask him where this came from.

Thanks!
I don't view platform parity as a checklist of features, so I really
have no interest in "checking this checkbox" just so that the Web
platform can be listed in these kinds of lists. Honestly a list of
github hits without more information on what this value is actually used
for etc. is not really that helpful. We're not taking a vote of
popularity here. ;-)
But do you have arguments on the specific problems I brought up which
make this a bad idea? "Others do this" is just not going to convince me
here.

> Another very important distinction between the Web platform and
> native platforms which is relevant here is the amount of abstraction
> that each platform provides on top of hardware. Native platforms
> provide a much lower level of abstraction, and as a result, on such
> platforms at the very least you can control how many threads your
> own application spawns and keeps active. We don't even have this
> level of control on the Web platform (applications are typically
> even unaware that you have multiple copies running in different tabs
> for example.)
>
>
> I'm unsure how tabs are different from different processes.
> As an author, I would certainly want my web workers to run in parallel.
> Why else would I use workers to do number crunching?
> Again, this is a problem that already exists and we're not trying to
> solve it here.

What _is_ the problem that you're trying to solve here then? I thought
that this API is supposed to give you a number of workers that the
application should start so that it can keep all of the cores busy?

> Also, please note that there are use cases on native platforms which
> don't really exist on the Web. For example, on a desktop OS you
> might want to write a "system info" application which actually wants
> to list information about the hardware installed on the system.
>
>
> If you try Eli's test case in Firefox under different
> workloads (for
> example, while building Firefox, doing a disk intensive
> operation,
> etc.), the utter inaccuracy of the results is proof in the
> ineffectiveness of this number in my opinion.
>
>
> As Eli mentioned, you can run the algorithm for longer and get a
> more
> accurate result.
>
>
> I tried
> <http://wg.oftn.org/projects/customized-core-estimator/demo/> which
> is supposed to give you a more accurate estimate. Have you tried
> that page when the system is under load in Firefox?

So did you try this? :-)

> > Again, if the native platform didn't support this,
>
> doing this in C++ would result in the same.
>
>
> Yes, exactly. Which is why I don't really buy the argument that we
> should do this because native platforms do this.
>
>
> I don't follow. Yes, the algorithm is imprecise and it would be just as
> imprecise in C++.
> There is no difference in behavior between the web platform and native.

My point is, I think you should have some evidence indicating why this
is a good idea. So far I think the only argument has been the fact that
this is exposed by other platforms.

> Also, I worry that this API is too focused on the
> past/present. For
> example, I don't think anyone sufficiently addressed Boris'
> concern
> on the whatwg thread about AMP vs SMP systems.
>
>
> Can you provide a link to that?
>
>
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2014-May/296737.html
>
>
> > Are there systems that expose this to
>
> the user? (AFAIK slow cores are substituted with fast ones on
> the fly.)
>
>
> I'm not sure about the details of how these cores are controlled,
> whether the control happens in hardware or in the OS, etc. This is
> one aspect of this problem which needs more research before we can
> decide to implement and ship this, IMO.
>
>
> Does Firefox behave different on such systems? (Is it even supported on
> these systems?)
> If so, how are workers scheduled? In the end, even if the cores are
> heterogeneous, knowing the number of them will keep them ALL busy (which
> means more work is getting done)

I don't know the answer to any of these questions. I was hoping that
you would do the research here. :-)

> This proposal also assumes that the UA itself is mostly
> contempt
> with using a single core, which is true for the current browser
> engines, but we're working on changing that assumption in
> Servo. It
> also doesn't take the possibility of several ones of these web
> application running at the same time.
>
>
> How is this different from the native platform?
>
>
> On the first point, I hope the difference is obvious. Native apps
> don't typically run in a VM which provides highly sophisticated
> functionality for them.
>
>
> See my long list of interpreted languages earlier in this email.
> There are lots of VM's that support this and a lot of people are using it.
>
> And also they give you direct control over how many threads your
> "application" (which typically maps to an OS level process) spawns
> and when, what their priorities and affinities are, etc. I think
> with that in mind, implementing this API as is in Gecko will be
> lying to the user (because we run some threads with higher priority
> than worker threads, for example our chrome workers, the
> MediaStreamGraph thread, etc.) and it would actually be harmful in
> Servo where the UA tries to get its hands on as many cores as it can
> do to things such as running script, layout, etc.
>
>
> Why would that be? Are you burning more CPU resources in servo to do the
> same thing? If so, that sounds like a problem.
> If not, the use case to scale your workload to more CPU cores is even
> better as similar tasks will end faster.
> For instance, if we have a system with 8 idle cores and we divide up a
> 64 second task

What Boris said.
I disagree. Let me try to rephrase the issue with this. The number of
available cores is not a constant number equal to the number of logical
cores exposed to us by the OS. This number varies depending on
everything else which is going on in the system, including the things
that the UA has control over and the things that it does not. I hope
the reason for my opposition is clear so far.

Cheers,
Ehsan

Rik Cabanier

unread,
May 13, 2014, 6:17:36 PM5/13/14
to Boris Zbarsky, dev-pl...@lists.mozilla.org
I agree that this isn't a problem. Sorry if I sounded critical.

Xidorn Quan

unread,
May 13, 2014, 7:23:25 PM5/13/14
to Rik Cabanier, Boris Zbarsky, dev-pl...@lists.mozilla.org
As the main usage of this number is to maintain a fixed thread pool, I feel
it might be better to have a higher-level API, such as a worker pool.

I do agree that a thread pool is very useful, but exposing the number of
cores directly seems not to be the best solution. We could have a better
abstraction, and let UAs dynamically control the pool to get better
throughput.
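
As a minimal sketch of the author-side version of that abstraction (the
class and its helpers are made up for illustration; a UA-managed pool could
size itself dynamically instead of taking a fixed size up front):

  // Jobs are queued and fed to a fixed set of workers; the pool size is
  // the only knob, which is what a UA could manage on the author's behalf.
  class WorkerPool {
    constructor(script, size) {
      this.queue = [];
      this.idle = Array.from({ length: size }, () => new Worker(script));
    }
    run(job) {
      return new Promise(resolve => {
        this.queue.push({ job, resolve });
        this._pump();
      });
    }
    _pump() {
      while (this.idle.length && this.queue.length) {
        const worker = this.idle.pop();
        const { job, resolve } = this.queue.shift();
        worker.onmessage = (e) => {
          this.idle.push(worker);   // worker is free again
          resolve(e.data);
          this._pump();
        };
        worker.postMessage(job);
      }
    }
  }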


On Wed, May 14, 2014 at 8:17 AM, Rik Cabanier <caba...@gmail.com> wrote:

> On Tue, May 13, 2014 at 2:59 PM, Boris Zbarsky <bzba...@mit.edu> wrote:
>
> I agree that this isn't a problem. Sorry if I sounded critical.

Rik Cabanier

unread,
May 14, 2014, 12:01:13 AM5/14/14
to Ehsan Akhgari, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Tue, May 13, 2014 at 3:16 PM, Ehsan Akhgari <ehsan....@gmail.com>wrote:

>
>> ...
>>
>>
>> That is not the point of this attribute. It's just a hint for the author
>> so he can tune his application accordingly.
>> Maybe the application is tuned to use fewer cores, or maybe more. It all
>> depends...
>>
>
> The problem is that the API doesn't really make it obvious that you're not
> supposed to take the value that the getter returns and just spawn N
> workers. IOW, the API encourages the wrong behavior by design.


That is simply untrue.
For the sake of argument, let's say you are right. How are things worse
than before?


> (Note that I would be very eager to discuss a proposal that actually
>> tries to solve that problem.)
>>
>>
>> You should do that! People have brought this up in the past but no
>> progress has been made in the last 2 years.
>> However, if this simple attribute is able to stir people's emotions, can
>> you imagine what would happen if you propose something complex? :-)
>>
>
> Sorry, but I have a long list of things on my todo list, and honestly this
> one is not nearly close to the top of the list, because I'm not aware of
> people asking for this feature very often. I'm sure there are some people
> who would like it, but there are many problems that we are trying to solve
> here, and this one doesn't look very high priority.


That's fine, but we're coming right back to the start: there is no way for
authors to make an informed decision today.
The "let's build something complex that solves everything" proposal won't
be done for a long time. Meanwhile apps can make responsive UIs and fluid
games.


> I don't have any other cases where this is done.
>>
>>
>> That really makes me question the "positive feedback from web
>> developers" cited in the original post on this thread. Can you
>> please point us to places where that feedback is documented?
>>
>> ...
>> Python:
>>
>> multiprocessing.cpu_count()
>>
>> 11,295 results
>>
>> https://github.com/search?q=multiprocessing.cpu_count%28%
>> 29+extension%3Apy&type=Code&ref=advsearch&l=
>>
>> ...
>> Java:
>>
>> Runtime.getRuntime().availableProcessors()
>>
>> 23,967 results
>>
>> https://github.com/search?q=availableProcessors%28%29+
>> extension%3Ajava&type=Code&ref=searchresults
>>
>> ...
>>
>> node.js is also exposing it:
>>
>> require('os').cpus()
>>
>> 4,851 results
>>
>> https://github.com/search?q=require%28%27os%27%29.cpus%28%
>> 29+extension%3Ajs&type=Code&ref=searchresults
>>
>
> I don't view platform parity as a checklist of features, so I really have
> no interest in "checking this checkbox" just so that the Web platform can
> be listed in these kinds of lists. Honestly a list of github hits without
> more information on what this value is actually used for etc. is not really
> that helpful. We're not taking a vote of popularity here. ;-)


Wait, you stated:

Native apps don't typically run in a VM which provides highly sophisticated
functionality for them.

and

That really makes me question the "positive feedback from web developers"
cited in the original post on this thread.


There were 24,000 hits for Java, which runs on the web and in a VM, but now
you say that it's not a vote of popularity?


>
> ...
>> Why not? How is the web platform different?
>>
>>
>> Here's why I find the native platform parity argument unconvincing
>> here. This is not the only primitive that native platforms expose
>> to make it possible for you to write apps that scale to the number
>> of available cores. For example, OS X provides GCD. Windows
>> provides at least two threadpool APIs. Not sure if Linux directly
>> addresses this problem right now.
>>
>>
>> I'm not familiar with the success of those frameworks. Asking around at
>> Adobe, so far I haven't found anyone that has used them.
>> Tuning the application depending on the number of CPU's is done quite
>> often.
>>
>
> But do you have arguments on the specific problems I brought up which make
> this a bad idea?


Can you restate the actual problem? I reread your message but didn't find
anything that indicates this is a bad idea.


> "Others do this" is just not going to convince me here.


What would convince you? Is the fact that every other framework provides
this and people use it not a strong indication?
It's not possible for me to find exact JavaScript examples that use this
feature since it doesn't exist.

...
>>
>> I'm unsure how tabs are different from different processes.
>> As an author, I would certainly want my web workers to run in parallel.
>> Why else would I use workers to do number crunching?
>> Again, this is a problem that already exists and we're not trying to
>> solve it here.
>>
>
> What _is_ the problem that you're trying to solve here then? I thought
> that this API is supposed to give you a number of workers that the
> application should start so that it can keep all of the cores busy?
>

Make it possible for authors to make a semi-informed decision on how to
divide the work among workers.
In a good number of cases the pool will be smaller than the number of cores
(e.g. a game), or it might be bigger (see the WebKit bug that goes over
this).
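
As a minimal sketch of that kind of tuning (the ceiling and fallback are
illustrative, app-specific numbers):

  const MAX_USEFUL_WORKERS = 4;                    // app-specific ceiling
  const hint = navigator.hardwareConcurrency || 2; // fall back if unsupported
  // Reserve the main thread and cap the pool at what the app benefits from.
  const poolSize = Math.max(1, Math.min(hint - 1, MAX_USEFUL_WORKERS));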

>
> Also, please note that there are use cases on native platforms which
>> don't really exist on the Web. For example, on a desktop OS you
>> might want to write a "system info" application which actually wants
>> to list information about the hardware installed on the system.
>>
>
I don't think that's all that important.


> If you try Eli's test case in Firefox under different
>> workloads (for
>> example, while building Firefox, doing a disk intensive
>> operation,
>> etc.), the utter inaccuracy of the results is proof in the
>> ineffectiveness of this number in my opinion.
>>
>>
>> As Eli mentioned, you can run the algorithm for longer and get a
>> more
>> accurate result.
>>
>>
>> I tried
>> <http://wg.oftn.org/projects/customized-core-estimator/demo/> which
>> is supposed to give you a more accurate estimate. Have you tried
>> that page when the system is under load in Firefox?
>>
>
> So did you try this? :-)


I did. As expected, it drops off as the load increases. I don't see what
this proves, except that the polyfill as posted is unreliable.


> > Again, if the native platform didn't support this,
>>
>> doing this in C++ would result in the same.
>>
>>
>> Yes, exactly. Which is why I don't really buy the argument that we
>> should do this because native platforms do this.
>>
>>
>> I don't follow. Yes, the algorithm is imprecise and it would be just as
>> imprecise in C++.
>> There is no difference in behavior between the web platform and native.
>>
>
> My point is, I think you should have some evidence indicating why this is
> a good idea. So far I think the only argument has been the fact that this
> is exposed by other platforms.
>

And used successfully on other platforms.
Note that it is exposed on PNaCl in Chrome as well.


> ...
>>
>>
>> Does Firefox behave different on such systems? (Is it even supported on
>> these systems?)
>> If so, how are workers scheduled? In the end, even if the cores are
>> heterogeneous, knowing the number of them will keep them ALL busy (which
>> means more work is getting done)
>>
>
> I don't know the answer to any of these questions. I was hoping that you
> would do the research here. :-)


I did a little bit of research. As usual, Wikipedia is the easiest to read:
http://en.wikipedia.org/wiki/Big.LITTLE There are many other papers [1] with
more information.

In "in-kernel switcher" mode, the little CPUs are taken offline when the
big ones spool up. So, in this case the number of cores is half the number
of physical CPUs.
In "heterogeneous multi-processing" mode, the big CPUs will help out when
the system load increases. In this case, the number of cores is equal to
the number of CPUs.


> This proposal also assumes that the UA itself is mostly
>> contempt
>> with using a single core, which is true for the current
>> browser
>> engines, but we're working on changing that assumption in
>> Servo. It
>> also doesn't take the possibility of several ones of these
>> web
>> application running at the same time.
>>
>>
>> How is this different from the native platform?
>>
>>
>> On the first point, I hope the difference is obvious. Native apps
>> don't typically run in a VM which provides highly sophisticated
>> functionality for them.
>>
>>
>> ...
>
> Why would that be? Are you burning more CPU resources in servo to do the
>> same thing? If so, that sounds like a problem.
>> If not, the use case to scale your workload to more CPU cores is even
>> better as similar tasks will end faster.
>> For instance, if we have a system with 8 idle cores and we divide up a
>> 64 second task
>>
>
> What Boris said.


He didn't refute that knowing the number of cores would still help.
No, you failed to show why this does not apply to the web platform and
JavaScript in particular.
Your arguments apply equally to PNaCL, Java, native applications and all
the other examples listed above, yet they all provide this functionality
and people are using it to build successful applications.

1:
http://www.samsung.com/global/business/semiconductor/minisite/Exynos/blog_Heterogeneous_Multi_Processing_Solution_of_Exynos_5_Octa_with_ARM_bigLITTLE_Technology.html

Ehsan Akhgari

unread,
May 14, 2014, 2:39:43 PM5/14/14
to Rik Cabanier, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On 2014-05-13, 9:01 PM, Rik Cabanier wrote:
>
>
>
> On Tue, May 13, 2014 at 3:16 PM, Ehsan Akhgari <ehsan....@gmail.com
> <mailto:ehsan....@gmail.com>> wrote:
>
>
> ...
>
>
> That is not the point of this attribute. It's just a hint for
> the author
> so he can tune his application accordingly.
> Maybe the application is tuned to use fewer cores, or maybe
> more. It all
> depends...
>
>
> The problem is that the API doesn't really make it obvious that
> you're not supposed to take the value that the getter returns and
> just spawn N workers. IOW, the API encourages the wrong behavior by
> design.
>
>
> That is simply untrue.

I'm assuming that the goal of this API is to allow authors to spawn as
many workers as possible so that they can exhaust all of the cores in
the interest of finishing their computation faster. I have provided
reasons why any higher-priority thread on the system that is busy doing
work will make this number an over-approximation, I have given you two
examples of higher-priority threads that we're currently shipping in
Firefox (Chrome Workers and the MediaStreamGraph thread), and I have
provided experimental evidence that Eli's test case, which tries to
exhaust as many cores as it can, fails to predict the number of cores in
these situations. If you don't find any of this convincing, I'd
respectfully ask us to agree to disagree on this point.

> For the sake of argument, let's say you are right. How are things worse
> than before?

I don't think we should necessarily try to find a solution that is just
not worse than the status quo, I'm more interested in us implementing a
good solution here (and yes, I'm aware that there is no concrete
proposal out there that is better at this point.)

> (Note that I would be very eager to discuss a proposal that
> actually
> tries to solve that problem.)
>
>
> You should do that! People have brought this up in the past but no
> progress has been made in the last 2 years.
> However, if this simple attribute is able to stir people's
> emotions, can
> you imagine what would happen if you propose something complex? :-)
>
>
> Sorry, but I have a long list of things on my todo list, and
> honestly this one is not nearly close to the top of the list,
> because I'm not aware of people asking for this feature very often.
> I'm sure there are some people who would like it, but there are
> many problems that we are trying to solve here, and this one doesn't
> look very high priority.
>
>
> That's fine but we're coming right back to the start: there is no way
> for informed authors to make a decision today.

Yes, absolutely.

> The "let's build something complex that solves everything" proposal
> won't be done in a long time. Meanwhile apps can make responsive UI's
> and fluid games.

That's I think one fundamental issue we're disagreeing on. I think that
apps can build responsive UIs and fluid games without this today on the Web.

> I don't have any other cases where this is done.
>
>
> That really makes me question the "positive feedback from web
> developers" cited in the original post on this thread. Can you
> please point us to places where that feedback is documented?
>
> ...
> Python:
>
> multiprocessing.cpu_count()
>
> 11,295 results
>
> https://github.com/search?q=multiprocessing.cpu_count%28%29+extension%3Apy&type=Code&ref=advsearch&l=
>
> ...
> Java:
>
> Runtime.getRuntime().availableProcessors()
>
> 23,967 results
>
> https://github.com/search?q=availableProcessors%28%29+extension%3Ajava&type=Code&ref=searchresults
>
> ...
>
> node.js is also exposing it:
>
> require('os').cpus()
>
> 4,851 results
>
> https://github.com/search?q=require%28%27os%27%29.cpus%28%29+extension%3Ajs&type=Code&ref=searchresults
>
>
> I don't view platform parity as a checklist of features, so I really
> have no interest in "checking this checkbox" just so that the Web
> platform can be listed in these kinds of lists. Honestly a list of
> github hits without more information on what this value is actually
> used for etc. is not really that helpful. We're not taking a vote
> of popularity here. ;-)
>
>
> Wait, you stated:
>
> Native apps don't typically run in a VM which provides highly
> sophisticated functionality for them.
>
> and
>
> That really makes me question the "positive feedback from
> web developers" cited in the original post on this thread.
>
> There were 24,000 hits for java which is on the web and a VM but now you
> say that it's not a vote of popularity?

We may have a different terminology here, but to me, "positive feedback
from web developers" should indicate a large amount of demand from the
web developer community for us to solve this problem at this point, and
also a strong positive signal from them on this specific solution with
the flaws that I have described above in mind. That simply doesn't map
to searching for API names on non-Web technologies on github. :-)

Also, FTR, I strongly disagree that we should implement all popular Java
APIs just because there is a way to run Java code on the web. ;-)

> ...
> Why not? How is the web platform different?
>
>
> Here's why I find the native platform parity argument
> unconvincing
> here. This is not the only primitive that native platforms
> expose
> to make it possible for you to write apps that scale to the
> number
> of available cores. For example, OS X provides GCD. Windows
> provides at least two threadpool APIs. Not sure if Linux
> directly
> addresses this problem right now.
>
>
> I'm not familiar with the success of those frameworks. Asking
> around at
> Adobe, so far I haven't found anyone that has used them.
> Tuning the application depending on the number of CPU's is done
> quite often.
>
>
> But do you have arguments on the specific problems I brought up
> which make this a bad idea?
>
>
> Can you restate the actual problem? I reread your message but didn't
> find anything that indicates this is a bad idea.

See above where I re-described why this is not a good technical solution
to achieve the goal of the API.

Also, as I've mentioned several times, this API basically ignores the
fact that there are AMP systems shipping *today* and does not take into
account the fact that future Web engines may try to use as many cores as
they can at a higher priority (Servo being one example.)

> "Others do this" is just not going to convince me here.
>
> What would convince you? The fact that every other framework provides
> this and people use it, is not a strong indication?
> It's not possible for me to find exact javascript examples that use this
> feature since it doesn't exist.

I'm obviously not asking you to create evidence of usage of an API which
no engine has shipped yet. You originally cited strong positive
feedback from web developers on this and given the fact that I have not
seen that myself I would like to know more about where those requests
are coming from. At the lack of that, what would convince me would be
good answers to all of the points that I've brought up several times in
this thread (which I have summarized above.)

Please note that _if_ this were the single most requested features that
actually blocked people from building apps for the Web, I might have
been inclined to go on with a bad solution rather than no solution at
all. And if you provide evidence of that, I'm willing to reconsider my
position.

> ...
>
> I'm unsure how tabs are different from different processes.
> As an author, I would certainly want my web workers to run in
> parallel.
> Why else would I use workers to do number crunching?
> Again, this is a problem that already exists and we're not trying to
> solve it here.
>
>
> What _is_ the problem that you're trying to solve here then? I
> thought that this API is supposed to give you a number of workers
> that the application should start so that it can keep all of the
> cores busy?
>
>
> Make it possible for authors to make a semi-informed decision on how to
> divide the work among workers.

That can already be done using the timing attacks, at the cost of wasting
some CPU time. The question is, should we do that right now?

> In a good number of cases the pool will be smaller than the number of
> cores (ie a game), or it might be bigger (see the webkit bug that goes
> over this).

Which part of the WebKit bug are you mentioning exactly? The only
mention of "games" on the bug is
https://bugs.webkit.org/show_bug.cgi?id=132588#c10 which seems to argue
against your position. (It's not very easy to follow the discussion in
that bug...)

> Also, please note that there are use cases on native
> platforms which
> don't really exist on the Web. For example, on a desktop
> OS you
> might want to write a "system info" application which
> actually wants
> to list information about the hardware installed on the system.
>
>
> I don't think that's all that important.

Well, you seem to imply that the reason why those platforms expose the
number of cores is to support the use case under discussion, and I'm
challenging that assumption.

> If you try Eli's test case in Firefox under different
> workloads (for
> example, while building Firefox, doing a disk
> intensive
> operation,
> etc.), the utter inaccuracy of the results is
> proof in the
> ineffectiveness of this number in my opinion.
>
>
> As Eli mentioned, you can run the algorithm for longer
> and get a
> more
> accurate result.
>
>
> I tried
>
> <http://wg.oftn.org/projects/customized-core-estimator/demo/>
> which
> is supposed to give you a more accurate estimate. Have you
> tried
> that page when the system is under load in Firefox?
>
>
> So did you try this? :-)
>
>
> I did. As expected, it drops off as the load increases. I don't see what
> this proves except that the polyfill is unreliable as it posted.

It's an argument that the information, if exposed from the UA, will be
*just* as unreliable.

> > Again, if the native platform didn't support this,
>
> doing this in C++ would result in the same.
>
>
> Yes, exactly. Which is why I don't really buy the argument
> that we
> should do this because native platforms do this.
>
>
> I don't follow. Yes, the algorithm is imprecise and it would be
> just as
> imprecise in C++.
> There is no difference in behavior between the web platform and
> native.
>
>
> My point is, I think you should have some evidence indicating why
> this is a good idea. So far I think the only argument has been the
> fact that this is exposed by other platforms.
>
>
> And used successfully on other platforms.
> Note that it is exposed on PNaCl in Chrome as well

So? PNaCl is a Chrome-specific technology, so it's not any more relevant
to this discussion than Python, Perl, Java, etc. are.

> Does Firefox behave different on such systems? (Is it even
> supported on
> these systems?)
> If so, how are workers scheduled? In the end, even if the cores are
> heterogeneous, knowing the number of them will keep them ALL
> busy (which
> means more work is getting done)
>
>
> I don't know the answer to any of these questions. I was hoping
> that you would do the research here. :-)
>
>
> I did a little bit of research. As usual, wikipedia is the easiest to
> read: http://en.wikipedia.org/wiki/Big.LITTLE There are many other
> papers [1] for more information.
>
> In "In-kernel switcher" mode, the little CPU's are taken offline when
> the big one spool up. So, in this case the number of cores is half the
> physical CPU's.
> In "Heterogeneous multi-processing", the big CPU's will help out when
> the system load increases. In this case, the number of cores is equal to
> the number of CPU's.

So which number is the one that the OS exposes to us in each case? And
is that number constant no matter how many actual hardware cores are
active at any given point in time?
I'm trying to do that here. :-)
That is not a fair summary of everything I have said here so far.
Please see the first paragraph of my response here where I summarize why
I think this doesn't help the use case that it's trying to solve.
You're of course welcome to disagree, but that doesn't mean that I've
necessarily failed to show my side of the argument.

> Your arguments apply equally to PNaCL, Java, native applications and all
> the other examples listed above

Yes they do!

> yet they all provide this functionality
> and people are using it to build successful applications.

1. PNaCl/Java/native platforms doing something doesn't make it right.
2. There is a reason why people have built more sophisticated solutions
to solve this problem (GCD/Windows threadpools, etc.) So let's not just
close our eyes on those solutions and pretend that the number of cores
is the only solution out there to address this use case in native platforms.

Cheers,
Ehsan

Rik Cabanier

unread,
May 15, 2014, 4:26:28 AM5/15/14
to Ehsan Akhgari, Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
On Wed, May 14, 2014 at 11:39 AM, Ehsan Akhgari <ehsan....@gmail.com>wrote:

> On 2014-05-13, 9:01 PM, Rik Cabanier wrote:
>
>> ...
>>
>> The problem is that the API doesn't really make it obvious that
>> you're not supposed to take the value that the getter returns and
>> just spawn N workers. IOW, the API encourages the wrong behavior by
>> design.
>>
>>
>> That is simply untrue.
>>
>
> I'm assuming that the goal of this API is to allow authors to spawn as
> many workers as possible so that they can exhaust all of the cores in the
> interest of finishing their computation faster.


That is one way of using it but not the only one.
For instance, let's say that I'm writing a cooperative game. I might
want to put all my network logic in a worker and want to make sure that
worker is scheduled. This worker consumes little (if any) cpu, but I want
it to be responsive.
NumCores = 1 -> do everything in the main thread and try to make sure the
network code executes
NumCores = 2 -> spin up a worker for the network code. Everything else in
the main thread
NumCores = 3 -> spin up a worker for the network code + another one for
physics and image decompression. Everything else in the main thread
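
A minimal sketch of that decision (network.js and physics.js are
placeholder worker scripts):

  const cores = navigator.hardwareConcurrency || 1;
  let networkWorker = null, physicsWorker = null;
  if (cores >= 2) {
    networkWorker = new Worker("network.js");   // keep networking responsive
  }
  if (cores >= 3) {
    physicsWorker = new Worker("physics.js");   // physics + image decompression
  }
  // With cores === 1, everything stays on the main thread.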


> I have provided reasons why any thread which is running at a higher
> priority on the system busy doing work is going to make this number an over
> approximation, I have given you two examples of higher priority threads
> that we're currently shipping in Firefox (Chrome Workers and the
> MediaStreamGraph thread)


You're arguing against basic multithreading functionality. I'm unsure how
ANY thread framework in a browser could fix this since there might be other
higher priority tasks in the system.
For your example of Chrome Workers and MediaStreamGraph, I assume those
don't run at a constant 100% so a webapp that grabs all cores will still
get more work done.


> and have provided you with experimental evidence of running Eli's test
> cases trying to exhaust as many cores as it can fails to predict the number
> of cores in these situations.


Eli's code is an approximation. It doesn't prove anything.
I don't understand your point here.


> If you don't find any of this convincing, I'd respectfully ask us to
> agree to disagree on this point.


OK.


> For the sake of argument, let's say you are right. How are things worse
>> than before?
>>
>
> I don't think we should necessarily try to find a solution that is just
> not worse than the status quo, I'm more interested in us implementing a
> good solution here (and yes, I'm aware that there is no concrete proposal
> out there that is better at this point.)


So, worst case, there's no harm.
Best case, we have a more responsive application.

...
>>
>> That's fine but we're coming right back to the start: there is no way
>> for informed authors to make a decision today.
>>
>
> Yes, absolutely.
>
>
> The "let's build something complex that solves everything" proposal
>> won't be done in a long time. Meanwhile apps can make responsive UI's
>> and fluid games.
>>
>
> That's I think one fundamental issue we're disagreeing on. I think that
> apps can build responsive UIs and fluid games without this today on the Web.
>

Sure. You can build apps that don't tax the system or that are specifically
tailored to work well on a popular system.


> There were 24,000 hits for java which is on the web and a VM but now you
>> say that it's not a vote of popularity?
>>
>
> We may have a different terminology here, but to me, "positive feedback
> from web developers" should indicate a large amount of demand from the web
> developer community for us to solve this problem at this point, and also a
> strong positive signal from them on this specific solution with the flaws
> that I have described above in mind. That simply doesn't map to searching
> for API names on non-Web technologies on github. :-)
>

This was not a simple search. Please look over the examples, especially
the node.js ones, and see how it's being used.
This is what we're trying to achieve with this attribute.


> Also, FTR, I strongly disagree that we should implement all popular Java
> APIs just because there is a way to run Java code on the web. ;-)

...
>>
>> Can you restate the actual problem? I reread your message but didn't
>> find anything that indicates this is a bad idea.
>>
>
> See above where I re-described why this is not a good technical solution
> to achieve the goal of the API.
>
> Also, as I've mentioned several times, this API basically ignores the fact
> that there are AMP systems shipping *today* and dies not take the fact that
> future Web engines may try to use as many cores as they can at a higher
> priority (Servo being one example.)


OK. They're free to do so. This is not a problem (see previous messages).
It seems like you're arguing against basic multithreading again.


> "Others do this" is just not going to convince me here.
>>
>> What would convince you? The fact that every other framework provides
>> this and people use it, is not a strong indication?
>> It's not possible for me to find exact javascript examples that use this
>> feature since it doesn't exist.
>>
>
> I'm obviously not asking you to create evidence of usage of an API which
> no engine has shipped yet. You originally cited strong positive feedback
> from web developers on this and given the fact that I have not seen that
> myself I would like to know more about where those requests are coming
> from. At the lack of that, what would convince me would be good answers to
> all of the points that I've brought up several times in this thread (which
> I have summarized above.)
>
> Please note that _if_ this were the single most requested features that
> actually blocked people from building apps for the Web, I might have been
> inclined to go on with a bad solution rather than no solution at all. And
> if you provide evidence of that, I'm willing to reconsider my position.


It's not blocking people from building apps. It's blocking them from being
able to squeeze performance out of their browsers. This is not a problem
for native applications.


> ...
>>
>> Make it possible for authors to make a semi-informed decision on how to
>> divide the work among workers.
>>
>
> That can already be done using the timing attacks at the waste of some CPU
> time.


It's imprecise and wasteful. A simple attribute check is all this should
take.


> The question is, whether we should do that right now?
>
>
> In a good number of cases the pool will be smaller than the number of
>> cores (ie a game), or it might be bigger (see the webkit bug that goes
>> over this).
>>
>
> Which part of the WebKit bug are you mentioning exactly? The only mention
> of "games" on the bug is https://bugs.webkit.org/show_
> bug.cgi?id=132588#c10 which seems to argue against your position. (It's
> not very easy to follow the discussion in that bug...)


It's in Filip's message, where he points out that some algorithms run
better if you double the number of threads per core.


> Also, please note that there are use cases on native
>> platforms which
>> don't really exist on the Web. For example, on a desktop
>> OS you
>> might want to write a "system info" application which
>> actually wants
>> to list information about the hardware installed on the
>> system.
>>
>>
>> I don't think that's all that important.
>>
>
> Well, you seem to imply that the reason why those platforms expose the
> number of cores is to support the use case under the discussion, and I'm
> challenging that assumption.
>

Sorry, I don't understand your response.
What I meant to say was that using this API to create a "system info"
application is not that important. The vast majority of users don't care
about or even know how many cores their system has.

...
>>
>> I did. As expected, it drops off as the load increases. I don't see what
>> this proves except that the polyfill is unreliable as it posted.
>>
>
> It's an argument that the information, if exposed from the UA, will be
> *just* as unreliable.


You're arguing against basic multithreading again.


> ...
>> My point is, I think you should have some evidence indicating why
>> this is a good idea. So far I think the only argument has been the
>> fact that this is exposed by other platforms.
>>
>>
>> And used successfully on other platforms.
>> Note that it is exposed on PNaCl in Chrome as well
>>
>
> So? PNaCl is a Chrome specific technology so it's not any more relevant
> to this discussion that Python, Perl, Java, etc. is.


They are all relevant as a counter to your statement of:

"native apps don't typically run in a VM which provides highly
sophisticated functionality for them"



> ...
>>
>> I did a little bit of research. As usual, wikipedia is the easiest to
>> read: http://en.wikipedia.org/wiki/Big.LITTLE There are many other
>> papers [1] for more information.
>>
>> In "In-kernel switcher" mode, the little CPU's are taken offline when
>> the big one spool up. So, in this case the number of cores is half the
>> physical CPU's.
>> In "Heterogeneous multi-processing", the big CPU's will help out when
>> the system load increases. In this case, the number of cores is equal to
>> the number of CPU's.
>>
>
> So which number is the one that the OS exposes to us in each case?


See the diagrams on pages 4 and 5 of the Samsung paper [1]:
half the cores for the "in-kernel switcher", all the cores for
"heterogeneous multi-processing".


> And is that number constant no matter how many actual hardware cores are
> active at any given point in time?


I believe so.


> ...
>> What Boris said.
>>
>>
>> He didn't refute that knowing the number of cores would still help.
>>
>
> I'm trying to do that here. :-)
>
> ...
>> I disagree. Let me try to rephrase the issue with this. The number
>> of available cores is not a constant number equal to the number of
>> logical cores exposed to us by the OS. This number varies depending
>> on everything else which is going on in the system, including the
>> things that the UA has control over and the things that it does not.
>> I hope the reason for my opposition is clear so far.
>>
>> No, you failed to show why this does not apply to the web platform and
>> JavaScript in particular.
>>
>
> That is not a fair summary of everything I have said here so far. Please
> see the first paragraph of my response here where I summarize why I think
> this doesn't help the use case that it's trying to solve. You're of course
> welcome to disagree, but that doesn't mean that I've necessarily failed to
> show my side of the argument.
>
> Your arguments apply equally to PNaCL, Java, native applications and all
>> the other examples listed above
>>
>
> Yes they do!
>
> > yet they all provide this functionality
>
>> and people are using it to build successful applications.
>>
>
> 1. PNaCl/Java/native platforms doing something doesn't make it right.
>

If other frameworks can use it to make better applications with no bad side
effects, yes, it does make it right.


> 2. There is a reason why people have built more sophisticated solutions to
> solve this problem (GCD/Windows threadpools, etc.) So let's not just close
> our eyes on those solutions and pretend that the number of cores is the
> only solution out there to address this use case in native platforms.
>

I've always said that we can add that later. Threadpools/GCD/TBB serve
specific use cases and are not a solution for everything.
Filip brought up that web workers are not compatible with a GCD-like
solution, so that's something that needs to be solved as well.

1:
http://www.samsung.com/global/business/semiconductor/minisite/Exynos/data/Heterogeneous_Multi_Processing_Solution_of_Exynos_5_Octa_with_ARM_bigLITTLE_Technology.pdf

Ben Kelly

unread,
May 15, 2014, 12:40:31 PM5/15/14
to Rik Cabanier, Joshua Cranmer 🐧, Ehsan Akhgari, dev-pl...@lists.mozilla.org
On May 15, 2014, at 1:26 AM, Rik Cabanier <caba...@gmail.com> wrote:
> On Wed, May 14, 2014 at 11:39 AM, Ehsan Akhgari <ehsan....@gmail.com>wrote:
>> ...
>>>
>>> Make it possible for authors to make a semi-informed decision on how to
>>> divide the work among workers.
>>>
>>
>> That can already be done using the timing attacks at the waste of some CPU
>> time.
>
>
> It's imprecise and wasteful. A simple attribute check is all this should
> take.

If we want to support games on mobile platforms like Firefox OS, then this seems like a pretty important point.

Do we really want apps on buri (or tarako) wasting CPU, memory, and power to determine that they should not spin up web workers?

Ben

lrb...@gmail.com

unread,
May 16, 2014, 2:03:15 PM5/16/14
to
Do you think it would be feasible for the browser to fire events every time the number of cores available for a job changes? That might make it possible to build an efficient event-based worker pool.

In the meantime, there are developers out there who are downloading micro-benchmarks on every client to stress-test the browser and determine the number of physical cores. This is nonsense, we can all agree, but unless you give them a short-term alternative, they'll keep doing exactly that. And "native" will keep looking a lot more usable than the web.
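
Roughly, such a micro-benchmark does something like the following (a sketch of the general idea, not any particular estimator; timings and the scaling threshold are illustrative):

  // Spin workers for a fixed time; if adding one stops increasing the total
  // iteration count, assume the cores are saturated.
  function spawnSpinner() {
    const src = `onmessage = (e) => {
      const end = Date.now() + e.data;
      let n = 0;
      while (Date.now() < end) n++;
      postMessage(n);
    };`;
    return new Worker(URL.createObjectURL(new Blob([src], { type: "text/javascript" })));
  }

  function totalIterations(workerCount, ms) {
    const runs = Array.from({ length: workerCount }, () => {
      const w = spawnSpinner();
      return new Promise(resolve => {
        w.onmessage = (e) => { w.terminate(); resolve(e.data); };
        w.postMessage(ms);
      });
    });
    return Promise.all(runs).then(counts => counts.reduce((a, b) => a + b, 0));
  }

  async function estimateCores(maxTested = 16, ms = 200) {
    let previous = await totalIterations(1, ms);
    for (let k = 2; k <= maxTested; k++) {
      const total = await totalIterations(k, ms);
      if (total < previous * 1.3) return k - 1;  // throughput stopped scaling
      previous = total;
    }
    return maxTested;
  }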

lrb...@gmail.com

unread,
May 16, 2014, 2:26:35 PM5/16/14
to
Here's the naive worker pool implementation I was thinking about. It requires that the browser fire an event every time a core becomes available (only in an active tab, of course), and provide a property that tells whether or not a core is available at a given time:

// A handler that runs when a job is added to the queue or when a core
// becomes available. isJobInTheQueue, isCoreAvailable, noWorkerAvailable,
// pool and queue are placeholders for state the app/browser would expose.
function jobHandler() {
  if ( isJobInTheQueue && isCoreAvailable ) {
    if ( noWorkerAvailable ) {
      pool.spawnWorker();
    }
    pool.distribute( queue.pullJob() );
  }
}

Rik Cabanier

unread,
May 16, 2014, 2:38:41 PM5/16/14
to lrb...@gmail.com, dev-pl...@lists.mozilla.org
On Fri, May 16, 2014 at 11:03 AM, <lrb...@gmail.com> wrote:

> Do you think it would be feasible that the browser fires events every time
> the number of cores available for a job changes? That might allow to build
> an efficient event-based worker pool.
>

I think this will be very noisy and might cause a lot of confusion.
Also I'm unsure how we could even implement this since the operating
systems don't give us such information.


> In the meantime, there are developers out there who are downloading
> micro-benchmarks on every client to stress-test the browser and determine
> the number of physical core. This is nonsense, we can all agree, but unless
> you give them a short-term alternative, they'll keep doing exactly that.
> And "native" will keep looking a lot more usable than the web.


I agree.
Do you have pointers to where people are describing this?

lrb...@gmail.com

unread,
May 16, 2014, 3:04:41 PM5/16/14