On 6/30/21 11:36 PM, Ashok wrote:
> Hi David,
>
> Tx again for your interest and input. With respect to your suggestions,
> as a general comment I'll say I don't plan to spend any time on
> changes to iocp (except bug fixes, of course) wrt the TCP component. It
> is fast enough (anywhere from 2-10 times faster than Tcl's socket,
> depending on traffic loads), stable, and I have other fish (projects) to
> fry.
I concur: 2-10 times faster, more robust, and lower resource usage. Have
your cake and eat it too. Complex code, yeah.
> - Regarding zero-byte copies - this is also suggested in the "Network
> Programming for Windows" book (two decades old but still the reference
> for Winsock). I did implement this as a strawman but found no
> measurable difference in either throughput or CPU. My guess is that the
> reduced byte copies are offset by increased kernel transitions (one to
> post the zero-byte read and a second one to then read from the kernel).
> So added complexity for no measurable gain.
The gain is in lowering the per-socket usage of the non-paged global
memory pool. Good for a web server with many thousands of concurrent
connections, I would think. I also had a user of iocpsock who benefited
from this when matching the socket's SO_RCVBUF to the channel buffer
size. (IIRC)
If only for academic reasons, can we have the code do all the modes?
Here's a more detailed description of the memory limitation as it exists
in IOCPsock. If one runs this in a tclsh shell:
C:\WINNT\system32>tclsh84
% for {set i 0} {$i < 10000} {incr i} {socket2 localhost 80}
% foreach s [file channels] {if {[string match iocp* $s]} {close $s}; after 5}
% exit
With those 10,000 sockets connected (no data transfer), you'll see on
the tclhttpd status page that "General pool bytes in use" will be
1,800,516 bytes and "Special pool bytes in use" will be at 41,755,025 bytes.
Using a zero-byte receive algorithm with the same 10,000-socket
connection script, I can lower the "Special pool bytes in use" to a mere
774,472 bytes. ("Special pool" is the non-paged pool.)
https://techcommunity.microsoft.com/t5/windows-blog-archive/pushing-the-limits-of-windows-paged-and-nonpaged-pool/ba-p/723789
> - Regarding critical sections / thread synchronization - I'm not sure if
> you noticed but the channel locking is based on SRWLocks, not critical
> sections unless you specifically compile for XP which does not support
> the former. Nevertheless, I would not think there is really a need to
> avoid critical sections either. Unless there is contention, they do not
> transition to the kernel (which is expensive) and contention is limited
> because there are at most two threads competing for a channel - the Tcl
> thread owning the channel and the iocp thread. Given how much work the
> Tcl channel subsystem has to do outside of the TCP driver itself, I
> really would not expect much contention.
Years back I did some profiling and found that Tcl's event loop polled
event sources at ~1,500 iterations per second, while the completion
thread ran at least ~3,500 per second. So yes, faster than Tcl could
consume.
> - Regarding atomic lock-less lists, I do not know of any that support
> *queues* as opposed to *stacks* (which is what your link points to).
> Moreover, the list manipulation is associated with other sync'ed state
> as well so making the lists lockfree does not really help as access to
> other state information still has to be sync-ed.
I haven't done a deep read yet :) I have to set up a Windows dev
environment so I can have some fun with your work. You see, this right
here is where I left off around 14 years ago. Let me have my fun.
https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-singly-linked-lists
Yeah, that's incomplete. Off-hand, there are quite a few caveats about
using InterlockedCompare64Exchange128() directly. I think I'm starting
to remember why this was challenging.
Off-hand, there are only two "free-running" operations the completion
thread is essentially doing -- returning completed AcceptEx calls to the
pool, and growing the count of outstanding overlapped WSARecv calls in a
burst condition, provided it is under the limit. Shaving a few
microseconds off those is, I feel, worth the effort of profiling across
a range of test conditions.
When it comes down to it, it isn't the normal-case behavior but how the
use of IOCP responds to abuse from a DDoS attack that interests me.
...and then there's UDP and multicast.
...and all the other LSP types such as Bluetooth and IrDA. Who even
uses NetWare anymore? Is it still even there in Windows 10? AppleTalk?
...and other native HANDLE types that support overlapped I/O, as a
complete overhaul of Tcl's I/O underpinnings on Windows.
This is exciting. I gave up coding ages ago, even as a hobby. I'll
come out of retirement for this.