User Mode Scheduling

code...@gmail.com

Jun 14, 2015, 4:10:25 AM
to li...@googlegroups.com
At NodeConf 2015, I brought up the idea of using User Mode Scheduling (UMS) in libuv.

I have built a few things with UMS and integrated it with V8.

I wanted to get some feedback on this before I get started. Maybe tell me some things to avoid, or best practices for this project.

Feel free to contact me directly to discuss this, and I will be uploading some code that uses UMS and V8 to github soon.

My GitHub username is codepilot, the same as my Gmail username.

The projects that use UMS are v8green, and anything with UMS in its name.

Thanks for the opportunity to do this, and I am excited to get feedback!

Ben Noordhuis

Jun 14, 2015, 6:08:46 AM
to li...@googlegroups.com
Perhaps you could start with an outline of your exact goals and how
you plan to accomplish them?

Also, what is your definition of user-mode scheduling? Do you mean
green threading with makecontext/swapcontext, this set of patches for
Linux[0], or something else?

[0] http://www.linuxplumbersconf.org/2013/ocw/sessions/1653
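
For readers unfamiliar with the first option mentioned above, here is a minimal sketch of cooperative green threading with makecontext/swapcontext. It is only an illustration of the general technique on a POSIX system; the names `task_body` and `run_green_thread` are made up for this example and do not come from libuv or from any of the projects discussed in this thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;      /* context of the "scheduler" (the caller) */
static ucontext_t task_ctx;           /* context of the single green thread */
static char task_stack[STACK_SIZE];   /* stack the green thread runs on */
static int steps_run;

/* Body of the green thread: do one step, then yield back voluntarily. */
static void task_body(void) {
    for (int i = 0; i < 3; i++) {
        steps_run++;
        printf("task: step %d\n", i);
        swapcontext(&task_ctx, &scheduler_ctx);   /* cooperative yield */
    }
}

/* Resume the green thread until it has run three steps.
 * Returns how many times the scheduler switched into it. */
static int run_green_thread(void) {
    int resumes = 0;
    int rc = getcontext(&task_ctx);
    assert(rc == 0);
    (void)rc;

    steps_run = 0;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &scheduler_ctx;   /* where to continue if task_body returns */
    makecontext(&task_ctx, task_body, 0);

    while (steps_run < 3) {
        resumes++;
        swapcontext(&scheduler_ctx, &task_ctx);   /* switch into the task */
    }
    return resumes;
}
```

Calling `run_green_thread()` from `main` switches into the task three times. Note that nothing here involves the kernel: if `task_body` made a blocking system call, the whole OS thread would block, which is the gap that kernel-assisted schemes are meant to close.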

code...@gmail.com

Jun 16, 2015, 5:34:03 AM
to li...@googlegroups.com
I am referring to the User Mode Scheduling available in Windows 7 and later. See https://msdn.microsoft.com/en-us/library/windows/desktop/dd627187%28v=vs.85%29.aspx

I uploaded some UMS examples to my GitHub account. These are old experiments, and I will be updating them a bit to extract useful performance measurements.

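See
https://github.com/codepilot/v8green
   Doesn't currently compile; I am changing it to use node 0.12.4.

https://github.com/codepilot/UMS
   Super simple example.

https://github.com/codepilot/UMS1
   Simple webserver, made to compare UMS with the new thread pool API.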

Ben Noordhuis

Jun 16, 2015, 10:41:36 AM
to li...@googlegroups.com
On Tue, Jun 16, 2015 at 11:30 AM, <code...@gmail.com> wrote:
> I am referring to Windows 7 and beyond User Mode Scheduling. see
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd627187%28v=vs.85%29.aspx
>
> I uploaded some examples of UMS on my GitHub area, these are old
> experiments, and I will be updating them a bit to extract useful performance
> measurements.
>
> See
> https://github.com/codepilot/v8green
> Doesn't currently compile, I am changing it to use node 0.12.4.
>
> https://github.com/codepilot/UMS
> Super simple example.
>
> https://github.com/codepilot/UMS1
> Simple webserver, made to compare UMS with the New Threadpool API

How and why would you integrate that with libuv? It's not entirely
clear to me what you're trying to accomplish.

Saúl Ibarra Corretgé

Jun 16, 2015, 12:22:30 PM
to li...@googlegroups.com
My guess would be as a replacement for the thread-pool? Then, if that is
the case, your proposal for a customizable threadpool would help here, I
guess.

--
Saúl Ibarra Corretgé
bettercallsaghul.com


code...@gmail.com

Jun 16, 2015, 5:13:55 PM
to li...@googlegroups.com
Good guess, Saúl. I want to provide some alternative implementations for the thread pool that is currently used in libuv on Windows.

UMS, from everything I have read and experimented with, is just plain awesome.

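See Microsoft's sales pitch for it in a blog post:

http://blogs.msdn.com/b/nativeconcurrency/archive/2009/02/02/dave-probert-goes-deep-on-win7-user-mode-scheduled-threads.aspx

and in a Channel 9 video:

https://channel9.msdn.com/Shows/Going+Deep/Dave-Probert-Inside-Windows-7-User-Mode-Scheduler-UMS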

But it only exists on x64 Windows 7 and x64 Server 2008 R2, or newer.

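So I also want to experiment with and provide the old thread pool API (available for x86 and x64 on XP/2003) and the new thread pool API (available for x86 and x64 on Vista/2008). See the MSDN link:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx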

I can see that part of the old thread pool API is being used, namely RegisterWaitForSingleObject and UnregisterWaitEx, but I am interested in evaluating the entire package of the old and new thread pool APIs.



Hopefully, with these new options, multicore performance scaling can be improved.

I have found through experimentation and reading that UMS has special benefits that cannot be found in any other way.

Also, from experimentation I have found that the new thread pool api can sometimes show an improvement over direct usage of a pool of threads.

I forked libuv with the intention of adding these experiments to the benchmarks, and if we find that there are some improvements with the additional options, then I hope we add them to the master tree.

See https://github.com/codepilot/libuv


Can someone let me know if XP and Vista are still important targets for libuv? Or should I be targeting Windows 7 and newer?

Saúl Ibarra Corretgé

Jun 17, 2015, 4:42:00 AM
to li...@googlegroups.com
On 06/16/2015 11:13 PM, code...@gmail.com wrote:
> Good guess Saúl, I want to provide some alternative implementations for
> the thread-pool that is currently used in libuv on windows.
>
> UMS from everything I have read and experimented with is just plain awesome.
>
> See Microsoft's sales pitch for it in a blog
>
> http://blogs.msdn.com/b/nativeconcurrency/archive/2009/02/02/dave-probert-goes-deep-on-win7-user-mode-scheduled-threads.aspx
>
> and a channel 9 video
>
> https://channel9.msdn.com/Shows/Going+Deep/Dave-Probert-Inside-Windows-7-User-Mode-Scheduler-UMS
>
> But, it only exists for x64 Windows 7 and x64 Server 2008 r2, or newer.
>
> So, I also want to experiment with and provide the OLD Thread Pool
> API(available for x86+x64 xp/2003) and the NEW Thread Pool API(available
> for x86+x64 vista/2008), see msdn link
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx
>
> I can see that part of the OLD thread pool api is being used,
> namely RegisterWaitForSingleObject and UnregisterWaitEx, but I am
> interested in checking the entire package of the OLD and NEW thread pool
> apis,
>

Our current threadpool implementation does not use those APIs. It's a
user-space threadpool built by a Dutch Systems Artisan, Ben. Those APIs
are used as well, but for reading from named pipes, which we cannot use
IOCP on (IIRC).

>
>
> Hopefully, with these new options, multicore performance scaling can be
> improved.
>

What multicore problem are you trying to solve here? Each libuv event
loop runs in a single thread and that will remain as is for the
foreseeable future. Only filesystem and DNS operations use the threadpool.
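
To make that model concrete, here is a rough sketch of a user-space thread pool feeding a single-threaded loop. It is an illustration written with POSIX threads under simplifying assumptions, not libuv's actual implementation: every name in it (`work_item`, `run_loop`, `square`, `add_to_sum`) is hypothetical, and a condition variable stands in for whatever wakeup mechanism a real event loop would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4
#define MAX_ITEMS 16

/* Hypothetical work item: work() may block, done() is the completion callback. */
typedef struct work_item {
    void (*work)(struct work_item *);  /* runs on a pool thread */
    void (*done)(struct work_item *);  /* runs on the loop thread */
    int input, result;
    struct work_item *next;
} work_item;

typedef struct { work_item *head, *tail; } queue;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t done_ready = PTHREAD_COND_INITIALIZER;
static queue pending, finished;  /* both protected by lock */
static int shutting_down;        /* protected by lock */
static int sum;                  /* touched only by the loop thread */

static void push(queue *q, work_item *w) {
    w->next = NULL;
    if (q->tail) q->tail->next = w; else q->head = w;
    q->tail = w;
}

static work_item *pop(queue *q) {
    work_item *w = q->head;
    if (w) {
        q->head = w->next;
        if (!q->head) q->tail = NULL;
    }
    return w;
}

/* Pool thread: take pending work, run it unlocked, hand it back as finished. */
static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        work_item *w;
        while (!(w = pop(&pending)) && !shutting_down)
            pthread_cond_wait(&work_ready, &lock);
        if (!w) break;                 /* shutting down and nothing left */
        pthread_mutex_unlock(&lock);
        w->work(w);                    /* the potentially blocking part */
        pthread_mutex_lock(&lock);
        push(&finished, w);
        pthread_cond_signal(&done_ready);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void square(work_item *w) { w->result = w->input * w->input; }
static void add_to_sum(work_item *w) { sum += w->result; }

/* Plays the role of the single-threaded loop: submit n items, run n callbacks. */
static int run_loop(int n) {
    pthread_t threads[NUM_WORKERS];
    work_item items[MAX_ITEMS];
    assert(n >= 0 && n <= MAX_ITEMS);
    sum = 0;
    shutting_down = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    pthread_mutex_lock(&lock);
    for (int i = 0; i < n; i++) {
        items[i] = (work_item){ square, add_to_sum, i + 1, 0, NULL };
        push(&pending, &items[i]);
    }
    pthread_cond_broadcast(&work_ready);
    for (int completed = 0; completed < n; completed++) {
        work_item *w;
        while (!(w = pop(&finished)))
            pthread_cond_wait(&done_ready, &lock);
        pthread_mutex_unlock(&lock);
        w->done(w);                    /* callback runs on the loop thread only */
        pthread_mutex_lock(&lock);
    }
    shutting_down = 1;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return sum;
}
```

In this sketch `run_loop(4)` squares 1 through 4 on the pool threads and sums the results in the callbacks. The point of the split is that callbacks never run concurrently with each other, so adding cores to the pool only helps the blocking `work` half.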

> I have found through experimentation and reading that UMS has special
> benefits that cannot be found in any other way.
>
> Also, from experimentation I have found that the new thread pool api can
> sometimes show an improvement over direct usage of a pool of threads.
>
> I forked libuv with the intention of adding these experiments to the
> benchmarks, and if we find that there are some improvements with the
> additional options, then I hope we add them to the master tree.
>
> see
>
> https://github.com/codepilot/libuv
>
>
> Can someone let me know if XP and Vista are still important targets for
> libuv? Or, should I be targeting win 7 and newer?
>

We still support those. Ideally we'll drop XP and Server 2k3 support in
the next major release, but Vista is still supported by MS.


Cheers,

Iñaki Baz Castillo

Jun 17, 2015, 12:43:36 PM
to li...@googlegroups.com
2015-06-16 23:13 GMT+02:00 <code...@gmail.com>:
> Hopefully, with these new options, multicore performance scaling can be
> improved.

It seems that you are trying to improve something that cannot be
improved given that libuv is mostly single threaded.


--
Iñaki Baz Castillo
<i...@aliax.net>

Bert Belder

Jun 17, 2015, 1:50:29 PM
to li...@googlegroups.com, i...@aliax.net
I am greatly in favor of doing an experiment with user-mode scheduling. 
UMS is this magical thing where the operating system asks you what to do next when it blocks a thread on I/O (or a page fault/yield).
I think it would be possible to create a much more efficient thread pool than what we currently have.

So +1 from me; try to implement a UMS-based thread pool and run the 'stat' benchmark to see if it makes for a good improvement.

- Bert

codepilot Account

Jun 18, 2015, 12:49:33 AM
to li...@googlegroups.com, i...@aliax.net
Thanks Bert. I have been getting some reading done on libuv, and running the fs_stat benchmark. I already forked and made a new ums branch, and committed a tiny change to increase a buffer used for printing.

I am mapping out how I want to integrate UMS into libuv right now, and updating my ums1 project too, so that I have a good example of runtime behavior for when I get started changing the libuv code.

There are many magical things in UMS. I would suggest that it is likely the best way to do a thread pool that uses I/O. For example, it was created specifically to make thread pools for MS SQL Server, according to the Going Deep video I mentioned above. If there were a faster way, I suspect that Microsoft would have found it.

code...@gmail.com

Jun 29, 2015, 7:50:40 PM
to li...@googlegroups.com
I successfully added UMS to libuv, and it works. See commit https://github.com/codepilot/libuv/commit/facca7bbd9c40e90199470179d993915dd159341

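The speed isn't too much slower; compare fs_stat.txt <https://github.com/codepilot/libuv/blob/libuv/ums/test/fs_stat.txt> with fs_stat_ums.txt <https://github.com/codepilot/libuv/blob/libuv/ums/test/fs_stat_ums.txt>.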

The speed difference is mostly because UMS has some built-in synchronization, and so does the work queue. These conflict somewhat, but they cause only slowdowns, not errors.

code...@gmail.com

Jun 30, 2015, 10:41:15 AM
to li...@googlegroups.com
FYI, my workstation for testing and development consists of:
  • OS: Windows 8.1 x64, with all of the updates
  • CPU: Haswell-E i7-5820K, stock clock of 3.3GHz, 6 physical cores, 12 logical cores
  • Memory: 16GB total, stock DDR4-2133, quad channel with a single 4GB stick per channel
  • System drive: Samsung 840 EVO 250GB, with Rapid Mode Enabled
  • Data drives: 3x Hitachi 4TB, not used for testing

Saúl Ibarra Corretgé

Jul 1, 2015, 5:15:03 AM
to li...@googlegroups.com
On 30/06/15 01:50, code...@gmail.com wrote:
> I successfully added UMS to libuv, it works. See commit
> https://github.com/codepilot/libuv/commit/facca7bbd9c40e90199470179d993915dd159341
>
> The speed isn't too much slower, see fs_stat.txt
> <https://github.com/codepilot/libuv/blob/libuv/ums/test/fs_stat.txt> vs
> fs_stat_ums.txt
> <https://github.com/codepilot/libuv/blob/libuv/ums/test/fs_stat_ums.txt>
>
> The speed difference is mostly because UMS has some builtin
> synchronization, and so does the work queue. These conflict some, but do
> not cause errors, just slow downs.
>

Great work, thanks for sharing!

IMHO, this is not the right way to integrate support for UMS, though
it's a great way to see it working.

I think uv_thread_create should always use native threads. Then Windows
could use a threadpool implementation which uses UMS threads instead of
native threads. Doing this you might be able to simplify synchronization
and thus get some speedups.


Cheers,

--
Saúl Ibarra Corretgé
http://bettercallsaghul.com

codepilot Account

Jul 1, 2015, 12:05:48 PM
to li...@googlegroups.com
I don't think it is the right way either, but I was trying to confine the changes to as small an area as possible, just one file, to see whether it still works and to get a sense of the performance implications.

I completely agree with you, but from the benchmarks, the UMS replacement does appear to convince libuv that it is a real thread, not a scheduler with little threads inside.

I will try to move the changes to replace the thread pool implementation instead of individual threads.

Something that I think might be really interesting is if not just the workers were UMS threads, but the thread(s) that generate the requests and run the callbacks were also UMS threads. Then they could generate the requests, with UmsThreadYield for implicit synchronization, and use UmsThreadYield in the polling loop to check for and run callbacks.

But, I have read all of the documentation, and thoroughly looked over the source, and I see no real way to do that without massive external interface changes, other than suspending the thread, and swapping the Thread Environment Blocks. I might try that.

My goal is to have no interface changes, just implementation changes, if they improve something.

