The design I inherited (though it is not yet finalized) is to monitor
a large number of idle devices, each with a 2-way connection via
sockets. I was hoping to use Tcl, but I need to know how far it can
scale. Are we talking hundreds, thousands, or what?
The OS in question is some kind of Linux.
Thanks,
Keith
It is common practice to choose one port to listen on (HTTP usually
uses 80, 8000, etc.). You can also have more than one interface on a
physical server, so the special IP address 0.0.0.0 is shorthand for all
available addresses on the machine. A server listens on a fixed number
of well-known ports, but when a client connection is accepted, a new
socket (file descriptor) is created to handle it, still on the same
port. The maximum number of these client connections is adjustable, but
note that there are only about 64k ports in total, and those below 1024
are reserved.
Tcl itself doesn't impose any additional limit. Usually applications
are ignorant of these details.
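To make that concrete, here is a rough C-level sketch of what happens
underneath a listening server (error handling omitted; the port number
8000 is just an example):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* 0.0.0.0: all interfaces */
    addr.sin_port = htons(8000);               /* one fixed, well-known port */

    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    for (;;) {
        /* Each accepted client gets a brand new descriptor; the port
         * stays the same, so the limit you eventually hit is on file
         * descriptors, not on ports. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            break;
        /* ... hand cfd to the event loop ... */
    }
    return 0;
}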
The obvious bottleneck is the maximum number of file descriptors
allowed per process, which is typically 1024 but can be raised (per
process with ulimit/setrlimit, system-wide via sysctl). However, as a
recent discussion on tclcore showed, there is another limit that kicks
in, and that one is not configurable at runtime: FD_SETSIZE, the static
size of the bit arrays passed to select(). IIRC it is 1024 for glibc.
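A quick way to see where that constant lives (the snippet below just
prints it; note that an fd_set is a fixed-size bit array, so using an
fd >= FD_SETSIZE with it is undefined behaviour):

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    /* fd_set is a bit array of exactly FD_SETSIZE bits, fixed in the
     * libc headers (1024 on glibc). */
    printf("FD_SETSIZE = %d\n", FD_SETSIZE);

    /* FD_SET(fd, &set) with fd >= FD_SETSIZE writes past the end of
     * that fixed-size array, i.e. undefined behaviour. */
    return 0;
}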
You can look up this discussion in the tclcore archive; you'll learn
about the plans to get past this limitation. In short, the select()
API is inherently inefficient for larger numbers of fds, which is why
you won't find a libc with a much bigger FD_SETSIZE anyway. Hence
epoll() seems the way to go when available. The only problem is that,
pending universal availability of epoll, the Tcl core will need to
maintain an ifdef'ed pair of implementations. Eeek.
-Alex
On at least some OSes, you can set it at build-time.
Donal.
Yes. Recompile your libc, reboot, and pray.
-Alex
AOLserver has an implementation/abstraction/ifdef for poll:
http://junom.com/gitweb/gitweb.perl?p=aolserver.git;a=blob;f=aolserver/nsd/sock.c#l745
When poll isn't available, an emulation using select is provided. Of
course, it isn't epoll.
Then why mention this? poll is just like select wrt scaling up: it
doesn't; only epoll does.
-Alex
No. Their libc is built without the restriction; the fixed value of
FD_SETSIZE is just a feature of the headers. But I don't know if glibc
supports that.
Donal.
But epoll is (of course) completely unportable.
Donal.
My point is that you must first harmonize epoll with the other two
possibilities: epoll has to work with the rest of the io code. The
example I gave shows that a successful abstraction of poll involved
only one tiny ifdef. But if epoll requires extensive modification to
other levels of the io API, it probably can't be used effectively.
Since this code was stolen from the GNU C Library, maybe an epoll
emulation exists as well.
No. Guessing is neither necessary nor useful here. There is no
interaction with the "rest of the io code". Just read the manpages:
you'll find that epoll_wait() has the distinguishing feature of
returning *only* the hot fds in a compact list, while select/poll
force you to scan the whole list looking for the hot ones. Hence
emulation at the API level is useless; it will not solve the
fundamental performance issue. That's the kind of situation that calls
for a new syscall. In Linuxland that's also the minimum.
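For anyone who hasn't read those manpages yet, the shape of the API is
roughly this (Linux-only sketch; error handling is omitted and
service_fd() is a placeholder I made up for the real work):

#include <sys/epoll.h>

#define MAX_EVENTS 64

extern void service_fd(int fd);     /* placeholder for real work */

void event_loop(int *fds, int nfds)
{
    struct epoll_event ev, events[MAX_EVENTS];
    int epfd = epoll_create(nfds);  /* size hint; must be > 0 */

    /* Register each descriptor once; the kernel keeps the interest set. */
    for (int i = 0; i < nfds; i++) {
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);
    }

    for (;;) {
        /* Only the hot fds come back: work per wakeup is proportional
         * to activity, not to the number of watched descriptors. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++)
            service_fd(events[i].data.fd);
    }
}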
-Alex
Hi, I recently hit the same problem described here: after sock1024 is
used, the socket command basically stops working. I was trying to find
out what I need to do to get it working with more than 1024 sockets,
but I don't really understand the exact relation between Tcl and libc
(I'm a Tcl newbie). How do I find out which libc my Tcl uses?
Cheers,
Tom
Well, it uses _the_ libc present on the system (unless you're
intentionally juggling several versions, you have just one of them).
Then look at the value of FD_SETSIZE in the associated include files.
-Alex
poll() is also a lot more efficient than select(). You can preallocate
the pollfd array ONCE and reuse it over and over, unlike select(),
where you generally rebuild the fd_sets from scratch or memcpy them
back each time. And it's possible to build maps that tie the pollfd
array directly to your internal per-fd data structure using the same
index. So switching to poll() would definitely be a win/win in a
couple of areas.
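A minimal sketch of that idea (the struct names conn and poller, and
service_conn(), are invented for illustration):

#include <poll.h>
#include <stdlib.h>

struct conn {                    /* hypothetical per-connection state */
    int fd;
    /* ... buffers, protocol state, ... */
};

struct poller {
    struct pollfd *pfds;         /* allocated once, reused on every poll() */
    struct conn  **conns;        /* conns[i] goes with pfds[i] */
    int count;
};

void poller_init(struct poller *p, int max)
{
    p->pfds  = malloc(max * sizeof(*p->pfds));
    p->conns = malloc(max * sizeof(*p->conns));
    p->count = 0;
}

void poller_add(struct poller *p, struct conn *c)
{
    p->pfds[p->count].fd = c->fd;
    p->pfds[p->count].events = POLLIN;
    p->conns[p->count] = c;      /* same index maps the fd slot to its conn */
    p->count++;
}

extern void service_conn(struct conn *c);   /* placeholder for real work */

void poller_run_once(struct poller *p)
{
    if (poll(p->pfds, p->count, -1) <= 0)
        return;
    /* Still O(N): every entry has to be scanned to find the hot ones. */
    for (int i = 0; i < p->count; i++)
        if (p->pfds[i].revents & POLLIN)
            service_conn(p->conns[i]);
}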
ps: epoll is a "recent" Linux thing and should be used when available.
I'm on RHEL3 and RHEL4 systems that don't have epoll.
Yes, but you still have to iterate over the whole array to find the
hot ones, so it's O(N). Only epoll() is O(1), in that it picks them
out for you (and doesn't iterate under the hood either; they are
"pushed" by the kernel instead of being "pulled" by the caller).
-Alex
I think the basic problem with this reasoning is that you assume a
single choke point. That comes from not using threads (or other
application features) to divide the work. A simple system which must
run everything through one "select" is going to run into problems, but
that ignores many other opportunities for handling increased load.
Even within a single thread, you could move fds into different sets,
kind of like a queue of sets. This helps more than you might think:
you process new fds and push them to the next set, then you process
the next set, and so on. The result is that responsive fds move very
quickly through the queue and slow fds tend to wait. If you use
threads, then the fd gets moved to a set of one.
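A rough sketch of that bucket idea (everything here, including bucket,
NBUCKETS and service_fd(), is invented for illustration; the logic for
moving fds between buckets is left out):

#include <sys/select.h>
#include <sys/time.h>

#define NBUCKETS   4
#define PER_BUCKET 256           /* keep each bucket well under FD_SETSIZE */

struct bucket {
    int fds[PER_BUCKET];
    int count;
};

extern void service_fd(int fd);  /* placeholder for real work */

/* One scheduling pass: select() each bucket separately with a zero
 * timeout, so a bucket full of slow fds cannot stall the others. */
void poll_buckets(struct bucket *buckets)
{
    for (int b = 0; b < NBUCKETS; b++) {
        fd_set readable;
        int maxfd = -1;

        FD_ZERO(&readable);
        for (int i = 0; i < buckets[b].count; i++) {
            FD_SET(buckets[b].fds[i], &readable);
            if (buckets[b].fds[i] > maxfd)
                maxfd = buckets[b].fds[i];
        }
        if (maxfd < 0)
            continue;            /* empty bucket */

        struct timeval zero = {0, 0};
        if (select(maxfd + 1, &readable, NULL, NULL, &zero) <= 0)
            continue;

        for (int i = 0; i < buckets[b].count; i++)
            if (FD_ISSET(buckets[b].fds[i], &readable))
                service_fd(buckets[b].fds[i]);
    }
}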
But scalability potential is not limited by one particular syscall.
You might have to create a larger architecture, but that usually
brings benefits beyond the fd choke-point problem, like running
specific code based upon the current state.
But epoll seems great for a dumb-and-dirty scalability fix; it isn't
as powerful as a divide-and-conquer solution.
Not on all platforms. I've seen systems where poll() was implemented
on top of select(), though I forget which; I just remember seeing that
in stack traces of various programs I was working on. Because of this,
which low-level API to build the Tcl Notifier on top of has to be a
configuration option (select is the lowest common denominator, but it
is widely available) *and* someone has to actually submit an
implementation...
Donal.
Thanks Alex, I recompiled the libc (setting FD_SETSIZE to 10000 for a
start), updated, restarted, but it didn't help :(. Still the same...
Well, the kernel has some influence too ;-) If select is a syscall
(instead of being a library function implemented in terms of a poll/
epoll syscall), then clearly FD_SETSIZE will have to be a constant in
the kernel itself. Which you can recompile too, if you have an open
source OS.
-Alex
so, for example, on Linux 2.6.5-7.111.19-default the hard limit for open files is 2**20
uwe
Yep, ulimit was the first thing I increased - without it socket
returns "too many open files", but even if you increase that, the
socket command just stops working when you hit sock1024. I recompiled
libc as Alex suggested and I'm in the process of recompiling the
kernel as well (I just needed to update to CentOS 5.4 first - we are
using CentOS).
I'll let you know how it went :)
T.
As everyone else pointed out, a host of things effectively limit you
to 1024 sockets. I recently needed to test a system that involved
running just under 10000 network servers, connected in roughly a tree
fashion, with at least one connection from a monitoring process to
each server (all on a single machine).
I got it going in the end by using a 2-tier system, where a master
monitor process spawns slaves (open |slave.tcl) and communicates with
each of them over its stdin/stdout. At this level of fan-out there are
about 180 connections from the master to the slaves. The master then
sends jobs to the slaves to get them to spawn the servers and report
the state back to the master over their stdout, threadpool style. I
didn't use actual threads because I use Tcl builds with threading
disabled (it breaks other stuff I do), and using processes means I
don't need to tweak any OS fd limits.
In the end it worked well. It was throwaway code, only meant to
support a single test, but if you want to use it as the basis for what
you need to do, give me a shout.
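In case it helps, the plumbing boils down to something like this at
the C level (purely illustrative; the real thing was plain Tcl, and
the slave path and pipe protocol here are stand-ins):

#include <sys/types.h>
#include <unistd.h>

struct slave {
    pid_t pid;
    int   to_slave;      /* write end: master -> slave stdin  */
    int   from_slave;    /* read end:  slave stdout -> master */
};

/* Spawn one slave process and wire its stdin/stdout to a pair of
 * pipes, much like Tcl's [open |slave.tcl r+].  The master holds only
 * two fds per slave; each slave opens its own sockets under its own
 * fd limit. */
int spawn_slave(struct slave *s, const char *path)
{
    int in[2], out[2];           /* [0] = read end, [1] = write end */

    if (pipe(in) < 0 || pipe(out) < 0)
        return -1;

    s->pid = fork();
    if (s->pid == 0) {           /* child: become the slave */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]);  close(in[1]);
        close(out[0]); close(out[1]);
        execl(path, path, (char *)NULL);
        _exit(127);              /* exec failed */
    }

    close(in[0]);                /* master keeps the other two ends */
    close(out[1]);
    s->to_slave   = in[1];
    s->from_slave = out[0];
    return 0;
}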
Cyan