Non-blocking TCPIP server in GAWK

Kenny McCormack

Feb 3, 2017, 10:44:44 AM

At this point in my research, I have the need to write a TCPIP server that
accepts connections, but doesn't block when no connection is forthcoming.
I.e., the server needs to do other work while waiting for a connection.

There seems to be no way to do this using the GAWK built-in networking.

After much thought and kicking around a bunch of ideas, many of which
involved either writing a C extension library or hacking the GAWK source
code, I came up with a (not perfect) solution (workaround). The purpose
of this post is to present my solution and to ask if there are any plans to
make this easier to do in native GAWK (and/or if there is anything I've
missed in the current implementation).

My workaround uses the 'netcat' program as a co-process. Note that
'netcat' is a really nifty program, but the bad thing about it is that
there are about 20 versions of it floating around. I'm using one called
"nc6", which seems to have the best feature set.

Here is my GAWK script (this is a demo of the idea; not my actual use case
code):

--- Cut Here ---
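BEGIN {
    cmd = "nc6 -w10 -l -p 2000 2>&1"
    while (1) {
        # The first line is nc6's status: a timeout or an accepted connection.
        cmd |& getline
        print "Status:",$0
        if (!/nc6: connection timed out/) {
            # Test "> 0" so a read error (-1) ends the loop instead of spinning.
            while ((cmd |& getline) > 0) {
                print "Received line:",$0
                print "You said:",$0 |& cmd
            }
            print "End of file from the co-process"
        }
        close(cmd)
    }
}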
--- Cut Here ---

A couple of notes on the above code:
  1) The "-w10" parameter tells netcat how long to wait for an incoming
     connection. The above script will loop, printing out the "timed
     out" message every 10 seconds until a connection is initiated.
  2) The first line from netcat will always be either the "timed out"
     message or the message "nc6: using stream socket", which means that
     a connection was received. In either case, that first line tells
     you the status of the connection attempt.

Here is a bash script you can use as the "other side" of this for testing:

--- Cut Here ---
#!/bin/bash
exec 3<>/dev/tcp/localhost/2000
echo "this is a test" >&3
read -u3
exec 3>&-
echo "REPLY: $REPLY"
sleep 500
--- Cut Here ---

--
The difference between communism and capitalism?
In capitalism, man exploits man. In communism, it's the other way around.

- Daniel Bell, The End of Ideology (1960) -

Mike Sanders

Feb 3, 2017, 11:34:02 AM

Kenny McCormack <gaz...@shell.xmission.com> wrote:

> At this point in my research, I have the need to write a TCPIP server that
> accepts connections, but doesn't block when no connection is forthcoming.
> I.e., the server needs to do other work while waiting for a connection.

Fascinating stuff. Hope you'll post your completed project
when/if the time comes for further study...

The following (note 'while (x)' ...) does not address blocking; I'm
simply feeding ideas into the mix. Good luck k.

# server.awk v1.01 - Michael Sanders 2009
# a simple, single user, web server built with gawk
#
# creates an html menu of local applications - season to taste...
# usage requires two steps...
#
# 1. run: 'gawk -f server.awk'
# 2. open browser at: http://localhost:8080
#
# based on the examples located at:
# http://www.gnu.org/software/gawk/manual/gawkinet/gawkinet.html

BEGIN {
    x = 1                           # script exits if x < 1
    port = 8080                     # port number
    host = "/inet/tcp/" port "/0/0" # host string
    url = "http://localhost:" port  # server url
    RS = ORS = "\r\n"               # header line terminators
    doc = Setup()                   # html document

    while (x) {
        if ($1 == "GET") RunApp(substr($2, 2))
        if (! x) break
        Message(doc)
        host |& getline             # wait for new client request
    }

    Message(Bye())                  # server terminated...

}

# ----------------------------------------------------------------------

function Message(txt) {

    status = 200                    # 200 == OK
    reason = "OK"                   # server response
    len = length(txt) + length(ORS) # length of document

    print "HTTP/1.0", status, reason |& host
    print "Connection: Close" |& host
    print "Pragma: no-cache" |& host
    print "Content-length:", len |& host
    print ORS txt |& host
    close(host)

}

# ----------------------------------------------------------------------

function RunApp(app) {

    if (app == "xterm") {system("xterm&"); return}
    if (app == "xcalc") {system("xcalc&"); return}
    if (app == "xload") {system("xload&"); return}
    if (app == "exit") {x = 0}

}

# ----------------------------------------------------------------------

function Setup() {

    tmp = "<html>\
<head><title>Simple gawk server</title></head>\
<body>\
<p><a href=" url "/xterm>xterm</a>\
<p><a href=" url "/xcalc>xcalc</a>\
<p><a href=" url "/xload>xload</a>\
<p><a href=" url "/exit>terminate script</a>\
</body>\
</html>"

    return tmp

}

# ----------------------------------------------------------------------

function Bye() {

    tmp = "<html>\
<head><title>Simple gawk server</title></head>\
<body><p>Script Terminated...</body>\
</html>"

    return tmp

}

# eof

--
later on,
Mike

http://busybox.hypermart.net

Andrew Schorr

Feb 4, 2017, 9:50:54 AM

On Friday, February 3, 2017 at 10:44:44 AM UTC-5, Kenny McCormack wrote:
> At this point in my research, I have the need to write a TCPIP server that
> accepts connections, but doesn't block when no connection is forthcoming.
> I.e., the server needs to do other work while waiting for a connection.
>
> There seems to be no way to do this using the GAWK built-in networking.
>
...
> The purpose
> of this post is to present my solution and to ask if there are any plans to
> make this easier to do in native GAWK (and/or if there is anything I've
> missed in the current implementation).

I think you are correct that it can't be done with current gawk networking facilities. If you grab a copy of gawk from the master branch, there is support for non-blocking multiplexed I/O using the gawkextlib select extension. That's the hard part, and it's been solved. The unsolved problem is that the existing gawk TCP/IP networking facilities were designed with single-threaded behavior in mind. I think solving this correctly will require us to take 1 of 2 approaches:

1. Modify, possibly incompatibly, the existing gawk tcp/ip networking facilities. At the moment, when you first call getline on a tcp/ip server socket, it blocks until a client connects. Getline then returns the first line of input received over that connected socket. I think we need to introduce the notion of a separate listener socket, and I think getline on the listener should return a new socket handle for a connected client socket. One can then call getline on the connected client sockets to receive data. Of course, all of this would be non-blocking and work with the select extension. We can repurpose the existing socket syntax for this new scheme, or we can come up with new special socket filenames to leave the existing behavior in place without breaking compatibility.

2. Implement a separate socket extension library that uses a functional BSD-style interface. This can currently be done by anybody who has the time and energy to write a socket extension library. So far, nobody has had the need to do so, but all the hooks are there in the master branch.

Actually, these approaches are orthogonal, so we could implement one or both or neither. I'm hoping eventually we'll get at least one of them done. #2 is easy, because it doesn't require any changes to core gawk or existing socket behavior, but #1 might be easier to use and more consistent with AWK style.

Regards,
Andy

Kenny McCormack

Feb 4, 2017, 10:23:27 AM

In article <ec5a6b13-b9a6-43aa...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote much good stuff:

>(Much good stuff)

Very interesting. Thanks for posting.

I've thought about going it on my own with your suggestion #2. It seems
pretty doable.

Kees Nuyt

Feb 5, 2017, 7:02:22 AM

On Sat, 4 Feb 2017 15:23:26 +0000 (UTC),
gaz...@shell.xmission.com (Kenny McCormack) wrote:

> In article <ec5a6b13-b9a6-43aa...@googlegroups.com>,
> Andrew Schorr <asc...@telemetry-investments.com> wrote much good stuff:
>
>> (Much good stuff)
>
> Very interesting. Thanks for posting.
>
> I've thought about going it on my own with your suggestion #2. It seems
> pretty doable.

On Unix/Linux it is worth investigating what can be done with the
inetd daemon. It probably could start gawk on every incoming
request.
(untested)
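
A minimal sketch of what the inetd route could look like, under stated assumptions: inetd hands the accepted connection to the program as stdin/stdout, so the per-connection gawk script is just a filter. The function name, the port, the user and the paths in the comment are hypothetical, and the inetd.conf field layout varies between inetd implementations. Each connection gets its own gawk process, so nothing is shared between clients.

```shell
# Assumed inetd.conf entry (classic field layout; port and paths are hypothetical):
#   2000 stream tcp nowait nobody /usr/bin/gawk gawk -f /usr/local/lib/handler.awk
#
# handler.awk would hold the one-rule program below. It is wrapped in a shell
# function here so it can be tried from a terminal without inetd.
handle_client() {
    # Echo every line back to the client; flush so replies are not held in a buffer.
    gawk '{ print "You said:", $0; fflush() }'
}
```

To try it without inetd, pipe a line in: printf 'this is a test\n' | handle_client
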
--
Regards,
Kees Nuyt

Kenny McCormack

Feb 5, 2017, 9:37:34 AM

In article <9k4e9cp3dm7qghchi...@dim53.demon.nl>,
Kees Nuyt <k.n...@nospam.demon.nl> wrote:
...
>On Unix/Linux it is worth investigating what can be done with the
>inetd daemon. It probably could start gawk on every incoming
>request.

An interesting idea - running a GAWK script out of inetd - but not relevant
to the instant topic. You'd still need a way to do the non-blocking IO in
the GAWK script. I.e., it is not feasible to have a separate invocation of
GAWK for each connection.

I had thought about using the fork() function in one of the provided
extension libs (which one escapes me ATM, but it is in one of them) and
then letting the fork'd copy do the "S |& getline", but then you're still
left with needing a way for the main function to be alerted to the
availability of data. Note that there *is* a rudimentary "timeout on IO"
functionality in GAWK, but as I found out a while back (and posted here in
regards to), it doesn't work the way you'd like it to.

--
It's possible that leasing office space to a Starbucks is a greater liability
in today's GOP than is hitting your mother on the head with a hammer.

Andrew Schorr

Feb 5, 2017, 10:40:32 AM

On Saturday, February 4, 2017 at 10:23:27 AM UTC-5, Kenny McCormack wrote:
> Very interesting. Thanks for posting.

You're welcome!

> I've thought about going it on my own with your suggestion #2. It seems
> pretty doable.

Even within the scope of approach #2, it will be nice if the handle returned by a socket open call can be used with getline. To achieve that, one must use the gawk master branch, which I hope may be released as gawk 4.2 sometime this year. The new "get_file" API hook should enable the extension to add the socket to gawk's file table.

Regards,
Andy

Andrew Schorr

Feb 5, 2017, 10:48:39 AM

On Sunday, February 5, 2017 at 9:37:34 AM UTC-5, Kenny McCormack wrote:
> I had thought about using the fork() function in one of the provided
> extension libs (which one escapes me ATM, but it is in one of them) and
> then letting the fork'd copy do the "S |& getline", but then you're still
> left with needing a way for the main function to be alerted to the
> availability of data. Note that there *is* a rudimentary "timeout on IO"
> functionality in GAWK, but as I found out a while back (and posted here in
> regards to), it doesn't work the way you'd like it to.

The fork() function is in the fork library bundled with gawk:

bash-4.2$ gawk -l fork 'BEGIN {pid = fork(); print pid}'
10222
0
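
The two output lines come from the two processes: the parent prints the child's PID and the child prints 0. A minimal sketch of branching on that return value, assuming the bundled fork extension (which also provides waitpid()) is installed. The function name and the messages are placeholders, and nothing here solves the problem of alerting the parent when data is available.

```shell
fork_demo() {
    gawk -l fork 'BEGIN {
        pid = fork()
        if (pid < 0) {                  # fork failed
            print "fork failed" > "/dev/stderr"
            exit 1
        }
        if (pid == 0) {                 # child: fork() returned 0
            print "child: a blocking getline could go here"
            exit 0
        }
        waitpid(pid)                    # parent: fork() returned the child PID
        print "parent: child has exited"
    }'
}
```

Calling fork_demo prints the child line first, because the parent waits for the child before printing.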

And yes, the timeout on I/O features were not so good. That has been fixed, but I think you need the master branch (gawk 4.2 at some point) to handle this stuff properly with the new I/O RETRY features.
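
A minimal sketch of how the timeout and retry features can be combined, assuming gawk 4.2 or later. PROCINFO[cmd, "READ_TIMEOUT"] caps each read at a number of milliseconds, and PROCINFO[cmd, "RETRY"] makes getline return -2 instead of failing when a read times out, so the loop can do other work and try again. The "sleep 1; echo done" co-process is only a stand-in for a slow peer and the function name is hypothetical. This covers reads on an already open stream; it does not make the initial accept on an /inet/tcp server socket non-blocking.

```shell
retry_demo() {
    gawk 'BEGIN {
        cmd = "sleep 1; echo done"            # stand-in for a slow peer
        PROCINFO[cmd, "READ_TIMEOUT"] = 200   # each read waits at most 200 ms
        PROCINFO[cmd, "RETRY"] = 1            # a timed-out read returns -2
        while ((rc = (cmd |& getline line)) != 0) {
            if (rc == -2) {                   # no data yet: other work goes here
                idle++
                continue
            }
            if (rc < 0) {                     # genuine I/O error
                print "read failed:", ERRNO > "/dev/stderr"
                break
            }
            print "Received line:", line
        }
        close(cmd)
        print "Idle passes:", idle + 0
    }'
}
```

Calling retry_demo should print "Received line: done" followed by the number of idle passes made while waiting. On a gawk older than 4.2 the RETRY element is ignored and the first timeout is reported as a read failure.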

Regards,
Andy

Kaz Kylheku

Feb 5, 2017, 3:07:16 PM

On 2017-02-03, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> At this point in my research, I have the need to write a TCPIP server that
> accepts connections, but doesn't block when no connection is forthcoming.
> I.e., the server needs to do other work while waiting for a connection.

TXR Lisp example with timed-out socket accept, and awk macro
processing the socket input. The awk job here just prints the records
received from the client, with comma as the output field separator.

Log of server side:

$ txr sock-awk.tl
a,b,c,d,e
f,g
d
timeout
timeout

Log from client side:

$ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
a b c d e
f g
d
^]
telnet> q
Connection closed.

Code:

(defun timed-out-accept (sock milliseconds)
  (let ((polled (poll ^((,(fileno sock) . ,poll-in)) milliseconds)))
    (if polled
      (sock-accept sock))))

(let ((sock (open-socket af-inet sock-stream)))
  (sock-bind sock (new sockaddr-in
                       addr inaddr-loopback
                       port 12345))
  (sock-listen sock)

  (while t
    (let ((acc-sock (timed-out-accept sock 5000)))
      (if acc-sock
        (awk (:inputs acc-sock)
             (:set ofs ",")
             (t (set f f) (prn)))
        (put-line "timeout")))))