
/dev/poll DP_POLL ioctl insanely slow


Eric B.

Aug 30, 2007, 5:50:12 PM

Hi all,

I'm writing some software that opens a few thousand TCP sockets and
receives on them. I want to make it as efficient as possible, and on
Solaris < 10 I've been led to believe that usually involves using
/dev/poll.

When I add a new file descriptor to my interest set or change events
I'm interested in or remove the file descriptor, I do something like:

write(dpfd, &(dp_fds[i]), sizeof(struct pollfd));

checking that the return from write == sizeof(pollfd), of course.
When I want to wait for some file descriptors to pop, I do something
like:

ioctl(dpfd, DP_POLL, &(devpoll_dopoll));

with devpoll_dopoll.dp_timeout variable, but usually around 500 ms,
and devpoll_dopoll.dp_nfds = my maximum number of fds. I then service
the fd's returned by the ioctl.
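
Boiled down, what I'm doing looks roughly like this - just a sketch
rather than my real code, with MAX_FDS and the helper names made up and
error handling trimmed:

#include <sys/devpoll.h>    /* struct dvpoll, DP_POLL */
#include <stropts.h>        /* ioctl() on Solaris */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 20000       /* upper bound on fds I might ever watch */

/* Register interest (or change the events) for one fd. */
static void dp_add(int dpfd, int fd, short events)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;
    if (write(dpfd, &pfd, sizeof(struct pollfd)) != sizeof(struct pollfd)) {
        /* handle error */
    }
}

/* Wait up to 500 ms and service whatever pops. */
static void dp_wait(int dpfd, struct pollfd *results)
{
    struct dvpoll dopoll;
    int nready, i;

    dopoll.dp_fds = results;        /* results[] has room for MAX_FDS entries */
    dopoll.dp_nfds = MAX_FDS;
    dopoll.dp_timeout = 500;        /* milliseconds */

    nready = ioctl(dpfd, DP_POLL, &dopoll);   /* -1 on error, 0 on timeout */
    for (i = 0; i < nready; i++) {
        /* service results[i].fd according to results[i].revents */
    }
}

The dpfd itself comes from open("/dev/poll", O_RDWR) at startup.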

Everything works. No errors, no problems, everything gets handled.
However, a completely insane amount of time is spent in the ioctl
call. The exact same app, using a regular poll() to wait instead of
the ioctl(), takes far, far less CPU. Some example truss -c output
from a /dev/poll run:

syscall              seconds   calls  errors
_exit                   .000       1
read                    .001       8
write                   .165   13450
close                   .293    3017
brk                     .077    9478
ioctl                 33.480    6065
fcntl                   .018    3009
fcntl                   .019    3009
lwp_park                .002     142
lwp_unpark              .001     142
mmap                    .000       1
yield                   .000       4
lwp_exit                .000       1
lwp_sigmask             .000       1
lwp_wait                .000       1
nanosleep               .003      49
so_socket               .258    3009
bind                    .137    3009
connect                 .435    3009     796
recv                    .156   10215
recvfrom                .231    5470
sendmsg                 .142    1897
getsockopt              .134    6018
                    --------  ------    ----
sys totals:           35.559   71005     796
usr time:              6.580
elapsed:              51.130


and the exact same app, same conditions, using poll() instead:

syscall              seconds   calls  errors
_exit                   .000       1
write                   .001      21
close                   .033     507
brk                     .009    1584
fcntl                   .002     500
fcntl                   .002     500
lwp_park                .000       2
lwp_unpark              .000       2
mmap                    .000       1
lwp_exit                .000       1
lwp_sigmask             .000       1
lwp_wait                .000       1
pollsys                 .291    1576
nanosleep               .001      19
lwp_mutex_timedlock     .000       1
so_socket               .024     500
bind                    .013     500
connect                 .047     500     166
recv                    .845   55685
recvfrom                .014     677
sendmsg                 .008     102
getsockopt              .016    1000
                    --------  ------    ----
sys totals:            1.312   63681     166
usr time:              4.547
elapsed:              18.900


This is using Solaris 10 on 64-bit SPARC. I got similar results on a
Solaris 8 64-bit SPARC machine and on a Solaris 10 64-bit AMD
machine.

I glanced at Solaris 10's /dev/poll code briefly, but didn't see
anything particularly brain damaged. What am I likely doing wrong?
(I'm not running out of allowable fd's; I set ulimit -n 20000 and I'm
only using up to around 6000).

Any ideas? I can't post real code, unfortunately. It's based on
Sun's example code with hints from various other Solaris /dev/poll
implementations from projects that ostensibly work and perform well.
I don't know what I'm doing wrong.

Thanks,
Eric

Casper H.S. Dik

Aug 31, 2007, 5:44:07 AM

"Eric B." <ebo...@gmail.com> writes:

>When I add a new file descriptor to my interest set or change events
>I'm interested in or remove the file descriptor, I do something like:

>write(dpfd, &(dp_fds[i]), sizeof(struct pollfd));

That looks correct.

>checking that the return from write == sizeof(pollfd), of course.
>When I want to wait for some file descriptors to pop, I do something
>like:

>ioctl(dpfd, DP_POLL, &(devpoll_dopoll));

>with devpoll_dopoll.dp_timeout variable, but usually around 500 ms,
>and devpoll_dopoll.dp_nfds = my maximum number of fds. I then service
>the fd's returned by the ioctl.

Why the short wait? Is there other work to be done? Does it help
if you wait much longer?

How many pollfds do you pass to the DP_POLL ioctl?

I notice that you use around 6x as many file descriptors in the /dev/poll
example, so the comparison isn't completely like for like.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Eric B.

Aug 31, 2007, 11:05:43 AM

On Aug 31, 4:44 am, Casper H.S. Dik <Casper....@Sun.COM> wrote:

> "Eric B." <ebow...@gmail.com> writes:
> >When I add a new file descriptor to my interest set or change events
> >I'm interested in or remove the file descriptor, I do something like:
> >write(dpfd, &(dp_fds[i]), sizeof(struct pollfd));
>
> That looks correct.
>
> >checking that the return from write == sizeof(pollfd), of course.
> >When I want to wait for some file descriptors to pop, I do something
> >like:
> >ioctl(dpfd, DP_POLL, &(devpoll_dopoll));
> >with devpoll_dopoll.dp_timeout variable, but usually around 500 ms,
> >and devpoll_dopoll.dp_nfds = my maximum number of fds. I then service
> >the fd's returned by the ioctl.
>
> Why the short wait? Is there other work to be done? Does it help
> if you wait much longer?

There's other work to be done, but mostly it's a very latency-
sensitive application, so I want to return from the ioctl and service
other events in a small amount of time. The timeouts when I use plain
poll() are exactly the same. Could there be an issue with the pollfd
caching in Solaris? The application fd use profile basically starts
out registering interest in a few thousand fd's rather quickly, but is
poll()'ing as it registers them. It then receives for a while, not
adding any new fds while poll()'ing, and when it's done receiving, it
removes interest in all fds and closes up the sockets.

> How many pollfds do you pass to the DP_POLL ioctl?

It grows on application startup from around 10 up to 6000.

> I notice that you use around 6x as many file descriptors in the /dev/poll
> example, so the comparison isn't completely like for like.

Whoops - you're right, I accidentally copied output from a test with
poll using a much lower number of fds. The output with both using the
same number still shows similarly terrible ioctl performance, though.
Here's another set of output with the same number of fds (about 6000):

with /dev/poll:

syscall              seconds   calls  errors
_exit                   .000       1
read                    .000       1
write                   .154   14244
close                   .186    3008
brk                     .073    9478
ioctl                 19.897  530106
fcntl                   .017    3000
fcntl                   .019    3000
lwp_park                .011     637
lwp_unpark              .006     636
mmap                    .000       1
yield                   .000      20
lwp_exit                .000       1
lwp_sigmask             .000       1
lwp_wait                .000       1
nanosleep               .004      70
lwp_mutex_timedlock     .000       1
so_socket               .230    3000
bind                    .133    3000
connect                 .409    3000     640
recv                    .956   60892
recvfrom                .185    4968
sendmsg                 .283    3915
getsockopt              .129    6000
                    --------  ------    ----
sys totals:           22.700  648981     640
usr time:             11.382
elapsed:              71.970


with poll():

syscall              seconds   calls  errors
_exit                   .000       1
read                    .003     288
write                   .013    1113
close                   .245    3627
brk                     .068    9488
fcntl                   .020    3620
fcntl                   .023    3620
lwp_park                .015     898
lwp_unpark              .009     898
mmap                    .000       1
yield                   .000      25
lwp_exit                .000       1
lwp_sigmask             .000       1
lwp_wait                .000       1
pollsys                1.117    7924
nanosleep               .003      56
so_socket               .222    3620
bind                    .123    3620
connect                 .351    3620    1701
recv                   1.124   67715
recvfrom                .132    5481
sendmsg                 .124    1831
getsockopt              .134    7240
                    --------  ------    ----
sys totals:            3.733  124689    1701
usr time:              9.730
elapsed:              58.780


Looks like in this run, poll() actually used a few more fds. But nearly
20 seconds in ioctl with /dev/poll vs. a bit more than 1 second in
pollsys with poll()? That's still pretty bad. I could see /dev/poll
not providing a huge speedup over regular poll(), but I can't really
see how it could be 20 times slower with the same timeouts and same
number of fds. I almost have to be doing something wrong - but then
again, my poll() works fine, and I got fantastic results trying
epoll() on Linux. And /dev/poll _works_, it's just acting very slow.

Would you recommend skipping /dev/poll altogether and going with
Solaris 10's Event Completion Ports? I want to maintain
Solaris 8/9 support, but I'm not sure how necessary that is. I'd like to
eventually support both.

Thanks!

Eric

Casper H.S. Dik

Sep 3, 2007, 5:55:35 AM

>There's other work to be done, but mostly it's a very latency-
>sensitive application, so I want to return from the ioctl and service
>other events in a small amount of time. The timeouts when I use plain
>poll() are exactly the same. Could there be an issue with the pollfd
>caching in Solaris? The application fd use profile basically starts
>out registering interest in a few thousand fd's rather quickly, but is
>poll()'ing as it registers them. It then receives for a while, not
>adding any new fds while poll()'ing, and when it's done receiving, it
>removes interest in all fds and closes up the sockets.

The number of calls to "ioctl" seems rather high, though. 500000?
You only have 8000 calls to poll?

I don't quite understand that, as you are saying it calls ioctl(DP_POLL)
about as often as poll()?

Eric B.

Sep 4, 2007, 12:08:59 PM

On Sep 3, 4:55 am, Casper H.S. Dik <Casper....@Sun.COM> wrote:
> >There's other work to be done, but mostly it's a very latency-
> >sensitive application, so I want to return from the ioctl and service
> >other events in a small amount of time. The timeouts when I use plain
> >poll() are exactly the same. Could there be an issue with the pollfd
> >caching in Solaris? The application fd use profile basically starts
> >out registering interest in a few thousand fd's rather quickly, but is
> >poll()'ing as it registers them. It then receives for a while, not
> >adding any new fds while poll()'ing, and when it's done receiving, it
> >removes interest in all fds and closes up the sockets.
>
> The number of calls to "ioctl" seems rather high, though. 500000?
> You only have 8000 calls to poll?
>
> I don't quite understand that, as you are saying it calls ioctl(DP_POLL)
> about as often as poll()?

Hum - you're right, that does seem really odd. There's no reason I
know of that ioctl _should_ be getting called nearly that many times;
it's called in the same place poll() would be, in the same program,
with the same test case... I'll try looking into why it's being
called so many more times. If you assume about the same length of
time for each ioctl call, calling it the 8000 times poll was called
instead of 500,000 times would yield about the right performance gain
for /dev/poll over poll. Suspicious!

I'll post again when I figure out what's causing so many more calls to
ioctl than to poll and let you know. It's also possible that the number
of calls to "poll()" itself and to ioctl() are similar, but poll()
doesn't always call pollsys. But I doubt it - time for some
heavier-duty instrumentin' with the fancy Sun Performance Analyzer.

Eric

Eric B.

Sep 4, 2007, 1:11:36 PM

Okay - I made an oopsie and had dopoll.dp_nfds hard-coded to just 10
as an experiment, which made ioctl get called many more times than
poll (which was still set to poll all fds), as more than 10 fds were
popping at a time. But strangely, hard-coding the number of fds to
poll with /dev/poll to 10, although dramatically increasing the number
of calls to ioctl, actually made overall time spent in ioctl go
_down_. With nfds set to 10, I get about 20 seconds total in ioctl.
With nfds set to the actual number of fds in the set I'm watching, I
call ioctl an order of magnitude fewer times - but my total time spent
in ioctl increases to about 35 seconds!

That doesn't seem to make much sense. The same number of events ends
up getting serviced either way. I'll keep poking at things.

Side (possibly related) question: on this man page for IRIX's
/dev/poll:

http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=man&fname=/usr/share/catman/a_man/cat7/poll.z

it says "Users wishing to remove flags from the events field of a
registered pollfd should write two pollfds to the device; the first
should remove the pollfd with the POLLREMOVE flag and the second
should re-add the pollfd with the new set of desired flags." I was
under the impression that the two writes were unnecessary with
Solaris's /dev/poll, since if it saw a pollfd with the same fd as one
it was already monitoring, it would automatically "merge" them into
one pollfd by ORing their two event fields together. Is that correct,
or should I actually be doing the two writes each time I remove an
event type I'm interested in?

Right now, I just change the event flags and do a write if I'm still
interested in at least one event type. If I'm no longer interested in
any events, I set the POLLREMOVE flag and do a write. I'll try doing
a full remove and re-add for every change of events I'm interested in
as an experiment; if I've been mistaken and Solaris's /dev/poll is
similar to IRIX's, then I'm probably ending up with a bunch of
essentially duplicate pollfds in the set /dev/poll is monitoring, with
about half of them just never popping.
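
Concretely, the experiment would turn my event-change path into
something like the following (again only a sketch; dp_change_events is
a made-up name):

#include <sys/devpoll.h>
#include <poll.h>
#include <unistd.h>

/* Change the events for an fd that's already registered: first drop
 * whatever /dev/poll has cached for it, then re-add it with the new
 * flags, the way the IRIX man page describes. */
static int dp_change_events(int dpfd, int fd, short new_events)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLREMOVE;
    pfd.revents = 0;
    if (write(dpfd, &pfd, sizeof(struct pollfd)) != sizeof(struct pollfd))
        return -1;

    pfd.events = new_events;
    if (write(dpfd, &pfd, sizeof(struct pollfd)) != sizeof(struct pollfd))
        return -1;

    return 0;
}

If /dev/poll really doesn't merge duplicate registrations, this should
keep the interest set from growing on every event change.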

Eric

Eric B.

Sep 4, 2007, 2:14:34 PM

On Sep 4, 12:11 pm, "Eric B." <ebow...@gmail.com> wrote:
> Okay - I made an oopsie and had dopoll.dp_nfds hard-coded to just 10
> as an experiment, which made ioctl get called many more times than
> poll (which was still set to poll all fds), as more than 10 fds were
> popping at a time. But strangely, hard-coding the number of fds to
> poll with /dev/poll to 10, although dramatically increasing the number
> of calls to ioctl, actually made overall time spent in ioctl go
> _down_. With nfds set to 10, I get about 20 seconds total in ioctl.
> With nfds set to the actual number of fds in the set I'm watching, I
> call ioctl an order of magnitude fewer times - but my total time spent
> in ioctl increases to about 35 seconds!
>
> That doesn't seem to make much sense. The same number of events ends
> up getting serviced either way. I'll keep poking at things.
>
> Side (possibly related) question: on this man page for IRIX's
> /dev/poll:
>
> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=m...

>
> it says "Users wishing to remove flags from the events field of a
> registered pollfd should write two pollfds to the device; the first
> should remove the pollfd with the POLLREMOVE flag and the second
> should re-add the pollfd with the new set of desired flags." I was
> under the impression that the two writes were unnecessary with
> Solaris's /dev/poll, since if it saw a pollfd with the same fd as one
> it was already monitoring, it would automatically "merge" them into
> one pollfd by ORing their two event fields together. Is that correct,
> or should I actually be doing the two writes each time I remove an
> event type I'm interested in?
>
> Right now, I just change the event flags and do a write if I'm still
> interested in at least one event type. If I'm no longer interested in
> any events, I set the POLLREMOVE flag and do a write. I'll try doing
> a full remove and re-add for every change of events I'm interested in
> as an experiment; if I've been mistaken and Solaris's /dev/poll is
> similar to IRIX's, then I'm probably ending up with a bunch of
> essentially duplicate pollfds in the set /dev/poll is monitoring, with
> about half of them just never popping.
>
> Eric

Aaack! That was it!

Doing the full two writes (one to POLLREMOVE, and then another to re-
add the pollfd to the interest set) each time I want to change which
events I'm interested in seems to do the trick. Here's a run with
that change:

syscall              seconds   calls  errors
_exit                   .000       1
read                    .002     209
write                   .195   18383
close                   .246    3810
brk                     .067    9478
ioctl                   .636    8844
fcntl                   .022    3802
fcntl                   .024    3802
lwp_park                .019    1059
lwp_unpark              .011    1059
mmap                    .000       1
yield                   .000       4
lwp_exit                .000       1
lwp_sigmask             .000       1
lwp_wait                .000       1
nanosleep               .003      57
lwp_mutex_timedlock     .000       2
so_socket               .236    3802
bind                    .134    3802
connect                 .358    3802    2061
recv                   1.367   82616
recvfrom                .147    6754
sendmsg                 .130    1830
getsockopt              .140    7604
                    --------  ------    ----
sys totals:            3.743  160724    2061
usr time:              9.819
elapsed:              60.020


0.636 seconds in ioctl ain't bad - depending on the run, that's now
almost half the time I spend in regular poll().

I could've sworn I read somewhere that Solaris's /dev/poll automatically
merges pollfds with the same fd into one internal pollfd (which is why
I hadn't bothered with the remove/re-add approach in the first place),
but that sure seems not to be the case. What was probably
happening was I was creating a whole duplicate pollfd every time I
changed which events I cared about - so I was monitoring an ever-
increasing set of (mostly duplicate) pollfds.

Casper, thanks bunches for the help! Having another set of eyes
pointing out bits that didn't make sense was incredibly useful. The
product I'm working on should officially support /dev/poll on
Solaris in the near future.

Would you recommend adding Event Ports support as well for Solaris
10? Am I likely to see much performance improvement over /dev/poll?

Thanks again,
Eric

Casper H.S. Dik

Sep 5, 2007, 5:18:29 AM

"Eric B." <ebo...@gmail.com> writes:

>Would you recommend adding Event Ports support as well for Solaris
>10? Am I likely to see much performance improvement over /dev/poll?

To measure is to know.

I'm assuming they've been implemented to be yet more efficient.

Eric B.

Sep 5, 2007, 11:45:15 AM

On Sep 5, 4:18 am, Casper H.S. Dik <Casper....@Sun.COM> wrote:

I'm adding Event Ports support now; I'm betting it'll perform better
than /dev/poll for this application. I'll post again in a while with
results/an informal comparison in case anyone's curious.

Eric

Henry Townsend

Sep 5, 2007, 12:14:14 PM

Eric B. wrote:
> I'm adding Event Ports support now; I'm betting it'll perform better
> than /dev/poll for this application. I'll post again in a while with
> results/an informal comparison in case anyone's curious.

Have you looked at libevent? If this is a Solaris-only app it may not be
worth the trouble, but if it wants to be portable, libevent may be the
ticket. As I understand it, libevent is an abstraction layer which by
default will choose the fastest/best event interface available per platform.

Note that I haven't used it myself, I only looked into it for a
potential future need. Also, there appear to be two different OSS
products by this name so be forewarned and don't give up if the first
one you find is the wrong thing.

HT

Eric B.

Sep 6, 2007, 5:08:57 PM


Oh, I've definitely looked at libevent. For my purposes, coding
things myself ended up being the better solution - but having libevent
code to look through to see an example of how /dev/poll and Event
Ports could be used was very helpful. A few of the other example
programs listed on Dan Kegel's famous C10K page were also pretty
helpful.

Eric

Eric B.

Sep 6, 2007, 5:27:23 PM


An update: I've tried Event Ports now and... performance is actually
just about exactly the same as /dev/poll, for my particular app. It
might even introduce a tiny amount of additional overhead. This made
me strongly suspect that when associating fds with a port using
PORT_SOURCE_FD, Event Ports is more or less behaving as a wrapper for
/dev/poll under the covers...

After a really brief glance at the Solaris 10 Event Ports code, it
looks like it's not exactly a wrapper for /dev/poll, but a lot of the
code for the two is certainly very similar and they both use the same
underlying notification mechanisms. So it's not too surprising that
they'd perform roughly the same. Anyone have a different experience?

(I do think the Event Ports API is much more elegant and universal
than /dev/poll's, though.)
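
For anyone curious, the Event Ports version I ended up with is shaped
roughly like this (only a sketch: one fd list, POLLIN only, no error
handling - but it shows the re-association step, which is the main
practical difference from /dev/poll):

#include <port.h>           /* port_create(), port_associate(), port_get() */
#include <poll.h>           /* POLLIN */
#include <errno.h>

static void port_loop(int *fds, int nfds)
{
    int port = port_create();
    port_event_t pe;
    timespec_t timeout;
    int i;

    /* Associate every socket with the port up front. */
    for (i = 0; i < nfds; i++)
        port_associate(port, PORT_SOURCE_FD, (uintptr_t)fds[i],
            POLLIN, NULL);

    for (;;) {
        timeout.tv_sec = 0;
        timeout.tv_nsec = 500 * 1000 * 1000;    /* ~500 ms, like my DP_POLL */

        if (port_get(port, &pe, &timeout) < 0) {
            if (errno == ETIME)
                continue;       /* timed out - go do the other work */
            break;              /* real error */
        }

        /* service the socket in pe.portev_object / pe.portev_events */

        /* A PORT_SOURCE_FD association is one-shot: retrieving the event
         * removes it, so the fd has to be re-associated to keep watching. */
        port_associate(port, PORT_SOURCE_FD, pe.portev_object,
            POLLIN, pe.portev_user);
    }
}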

Eric
