system() using vfork() or posix

Jilles Tjoelker

unread,

Jul 30, 2012, 6:25:23 AM7/30/12

to freebsd...@freebsd.org

People sometimes use system() from large address spaces where it would
improve performance greatly to use vfork() instead of fork().

A simple approach is to change fork() to vfork(), although I have not
tried this. It seems safe enough to use sigaction and sigprocmask system
calls in the vforked process.

Alternatively, we can have posix_spawn() do the vfork() with signal
changes. This avoids possible whining from compilers and static
analyzers about using vfork() in system.c. However, I do not like the
tricky code for signals and that it adds lines of code.

This is lightly tested.

Index: lib/libc/stdlib/system.c
===================================================================
--- lib/libc/stdlib/system.c (revision 238371)
+++ lib/libc/stdlib/system.c (working copy)
@@ -42,16 +42,21 @@
#include <unistd.h>
#include <paths.h>
#include <errno.h>
+#include <spawn.h>
#include "un-namespace.h"
#include "libc_private.h"

+extern char **environ;
+
int
__system(const char *command)
{
pid_t pid, savedpid;
- int pstat;
+ int error, pstat;
struct sigaction ign, intact, quitact;
- sigset_t newsigblock, oldsigblock;
+ sigset_t newsigblock, oldsigblock, defmask;
+ const char *argv[4];
+ posix_spawnattr_t attr;

if (!command) /* just checking... */
return(1);
@@ -65,28 +70,36 @@
ign.sa_flags = 0;
(void)_sigaction(SIGINT, &ign, &intact);
(void)_sigaction(SIGQUIT, &ign, &quitact);
+ (void)sigemptyset(&defmask);
+ if ((intact.sa_flags & SA_SIGINFO) != 0 ||
+ intact.sa_handler != SIG_IGN)
+ (void)sigaddset(&defmask, SIGINT);
+ if ((quitact.sa_flags & SA_SIGINFO) != 0 ||
+ quitact.sa_handler != SIG_IGN)
+ (void)sigaddset(&defmask, SIGQUIT);
(void)sigemptyset(&newsigblock);
(void)sigaddset(&newsigblock, SIGCHLD);
(void)_sigprocmask(SIG_BLOCK, &newsigblock, &oldsigblock);
- switch(pid = fork()) {
- case -1: /* error */
- break;
- case 0: /* child */
- /*
- * Restore original signal dispositions and exec the command.
- */
- (void)_sigaction(SIGINT, &intact, NULL);
- (void)_sigaction(SIGQUIT, &quitact, NULL);
- (void)_sigprocmask(SIG_SETMASK, &oldsigblock, NULL);
- execl(_PATH_BSHELL, "sh", "-c", command, (char *)NULL);
- _exit(127);
- default: /* parent */
+ argv[0] = "sh";
+ argv[1] = "-c";
+ argv[2] = command;
+ argv[3] = NULL;
+ if ((error = posix_spawnattr_init(&attr)) != 0 ||
+ (error = posix_spawnattr_setsigmask(&attr, &oldsigblock)) != 0 ||
+ (error = posix_spawnattr_setsigdefault(&attr, &defmask)) != 0 ||
+ (error = posix_spawnattr_setflags(&attr,
+ POSIX_SPAWN_SETSIGDEF | POSIX_SPAWN_SETSIGMASK)) != 0 ||
+ (error = posix_spawn(&pid, _PATH_BSHELL, NULL, &attr,
+ __DECONST(char **, argv), environ)) != 0) {
+ pid = -1;
+ errno = error;
+ } else {
savedpid = pid;
do {
pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
} while (pid == -1 && errno == EINTR);
- break;
}
+ posix_spawnattr_destroy(&attr);
(void)_sigaction(SIGINT, &intact, NULL);
(void)_sigaction(SIGQUIT, &quitact, NULL);
(void)_sigprocmask(SIG_SETMASK, &oldsigblock, NULL);

--
Jilles Tjoelker
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"

Konstantin Belousov

unread,

Jul 30, 2012, 6:53:53 AM7/30/12

to Jilles Tjoelker, freebsd...@freebsd.org

On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote:
> People sometimes use system() from large address spaces where it would
> improve performance greatly to use vfork() instead of fork().
>
> A simple approach is to change fork() to vfork(), although I have not
> tried this. It seems safe enough to use sigaction and sigprocmask system
> calls in the vforked process.
>
> Alternatively, we can have posix_spawn() do the vfork() with signal
> changes. This avoids possible whining from compilers and static
> analyzers about using vfork() in system.c. However, I do not like the
> tricky code for signals and that it adds lines of code.
>
> This is lightly tested.

It is interesting to note that for some time our vfork(2) no longer stops
the whole forked process (parent), only the forking thread is waiting for
the child exit or exec. I am not sure is this point important for system(3),
but determined code can notice the difference from the fork->vfork switch.

Jilles Tjoelker

unread,

Aug 5, 2012, 5:55:11 PM8/5/12

to Konstantin Belousov, freebsd...@freebsd.org

Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is
not a difference.

Thread singling may be noticeable from a failing execve() (but only in
the process doing execve()) and in the rare case of rfork() without
RFPROC.

Konstantin Belousov

unread,

Aug 6, 2012, 4:26:56 AM8/6/12

to Jilles Tjoelker, freebsd...@freebsd.org

On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote:
> On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote:
> > On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote:
> > > People sometimes use system() from large address spaces where it would
> > > improve performance greatly to use vfork() instead of fork().
>
> > > A simple approach is to change fork() to vfork(), although I have not
> > > tried this. It seems safe enough to use sigaction and sigprocmask system
> > > calls in the vforked process.
>
> > > Alternatively, we can have posix_spawn() do the vfork() with signal
> > > changes. This avoids possible whining from compilers and static
> > > analyzers about using vfork() in system.c. However, I do not like the
> > > tricky code for signals and that it adds lines of code.
>
> > > This is lightly tested.
>
> > It is interesting to note that for some time our vfork(2) no longer
> > stops the whole forked process (parent), only the forking thread is
> > waiting for the child exit or exec. I am not sure is this point
> > important for system(3), but determined code can notice the difference
> > from the fork->vfork switch.
>
> Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is
> not a difference.

It is the difference, because vforked child shares parent address space.

>
> Thread singling may be noticeable from a failing execve() (but only in
> the process doing execve()) and in the rare case of rfork() without
> RFPROC.

No, other running threads in parent affect vforked child till exec or exit.
In fact, I would classify this as bug, but not a serious one.

Jilles Tjoelker

unread,

Aug 9, 2012, 6:57:43 AM8/9/12

to Konstantin Belousov, dav...@freebsd.org, freebsd...@freebsd.org

There are some ugly ways this parallel execution is depended on. If the
vforked child calls sigaction() while another thread is also in
sigaction() for that signal, the vforked child needs to wait for the
other thread to release the lock.

This uses a per-process lock to synchronize threads in different
processes, which may not work properly.

If the vforked child is killed (such as by SIGKILL) while holding the
lock, the parent is not killed but its _thr_sigact is damaged.

These problems could be avoided in libthr by skipping the lock in
_sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and
the old action is not queried. In those cases, _thr_sigact is not
touched so no lock is required. This change also helps applications,
provided they call sigaction() and not signal().

Alternatively, posix_spawn() and system() could use the sigaction system
call directly, bypassing libthr (if present). However, this will not
help applications that call vfork() and sigaction() themselves (such as
a shell that wants to implement ... & using vfork()).

posix_spawn() likely still needs some adjustment so that having it reset
all signals (sigfillset()) to the default action will not cause it to
[EINVAL] because libthr does not allow changing SIGTHR's disposition.

Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c (revision 238970)
+++ lib/libthr/thread/thr_sig.c (working copy)
@@ -519,8 +519,16 @@
return (-1);
}

- if (act)
+ if (act) {
newact = *act;
+ /*
+ * Short-circuit cases where we do not touch _thr_sigact.
+ * This allows performing these safely in a vforked child.
+ */
+ if ((newact.sa_handler == SIG_DFL ||
+ newact.sa_handler == SIG_IGN) && oact == NULL)
+ return (__sys_sigaction(sig, &newact, NULL));
+ }

__sys_sigprocmask(SIG_SETMASK, &_thr_maskset, &oldset);
_thr_rwl_wrlock(&_thr_sigact[sig-1].lock);

Konstantin Belousov

unread,

Aug 9, 2012, 7:10:28 AM8/9/12

to Jilles Tjoelker, freebsd...@freebsd.org, dav...@freebsd.org

On Thu, Aug 09, 2012 at 12:56:49PM +0200, Jilles Tjoelker wrote:
> On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote:
> > No, other running threads in parent affect vforked child till exec or exit.
> > In fact, I would classify this as bug, but not a serious one.
>
> There are some ugly ways this parallel execution is depended on. If the
> vforked child calls sigaction() while another thread is also in
> sigaction() for that signal, the vforked child needs to wait for the
> other thread to release the lock.
>
> This uses a per-process lock to synchronize threads in different
> processes, which may not work properly.
>
> If the vforked child is killed (such as by SIGKILL) while holding the
> lock, the parent is not killed but its _thr_sigact is damaged.
>
> These problems could be avoided in libthr by skipping the lock in
> _sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and
> the old action is not queried. In those cases, _thr_sigact is not
> touched so no lock is required. This change also helps applications,
> provided they call sigaction() and not signal().
>
> Alternatively, posix_spawn() and system() could use the sigaction system
> call directly, bypassing libthr (if present). However, this will not
> help applications that call vfork() and sigaction() themselves (such as
> a shell that wants to implement ... & using vfork()).

Third alternative, which seems to be even better, is to restore
single-threading of the parent for vfork().

David Xu

unread,

Aug 9, 2012, 10:17:10 PM8/9/12

to Jilles Tjoelker, Konstantin Belousov, freebsd...@freebsd.org

> a shell that wants to implement ...& using vfork()).

>
> posix_spawn() likely still needs some adjustment so that having it reset
> all signals (sigfillset()) to the default action will not cause it to
> [EINVAL] because libthr does not allow changing SIGTHR's disposition.
>
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c (revision 238970)
> +++ lib/libthr/thread/thr_sig.c (working copy)
> @@ -519,8 +519,16 @@
> return (-1);
> }
>
> - if (act)
> + if (act) {
> newact = *act;
> + /*
> + * Short-circuit cases where we do not touch _thr_sigact.
> + * This allows performing these safely in a vforked child.
> + */
> + if ((newact.sa_handler == SIG_DFL ||

> + newact.sa_handler == SIG_IGN)&& oact == NULL)

> + return (__sys_sigaction(sig,&newact, NULL));
> + }
>
> __sys_sigprocmask(SIG_SETMASK,&_thr_maskset,&oldset);
> _thr_rwl_wrlock(&_thr_sigact[sig-1].lock);
>

Your patch is better than nothing, I don't object.
The problem is visble to you, but there is also invisible user - rtld.
If a symbol is never used in parent process, but now it is used
in a vforked child, the rtld will be involved, if the child
is killed, the rtld's data structure may be in inconsistent state,
such as locking or link list etcs...
I think this problem might be a non-fixable problem.

Konstantin Belousov

unread,

Aug 10, 2012, 6:16:45 AM8/10/12

to Jilles Tjoelker, freebsd...@freebsd.org, dav...@freebsd.org

On Thu, Aug 09, 2012 at 02:08:50PM +0300, Konstantin Belousov wrote:
> Third alternative, which seems to be even better, is to restore
> single-threading of the parent for vfork().

I mean this patch.

diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 6cb95cd..e59ee21 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -756,7 +756,7 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,
struct thread *td2;
struct vmspace *vm2;
vm_ooffset_t mem_charged;
- int error;
+ int error, single_threaded;
static int curfail;
static struct timeval lastfail;
#ifdef PROCDESC
@@ -815,6 +815,19 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,
}
#endif

+ if (((p1->p_flag & (P_HADTHREADS | P_SYSTEM)) == P_HADTHREADS) &&
+ (flags & RFPPWAIT) != 0) {
+ PROC_LOCK(p1);
+ if (thread_single(SINGLE_BOUNDARY)) {
+ PROC_UNLOCK(p1);
+ error = ERESTART;
+ goto fail2;
+ }
+ PROC_UNLOCK(p1);
+ single_threaded = 1;
+ } else
+ single_threaded = 0;
+
mem_charged = 0;
vm2 = NULL;
if (pages == 0)
@@ -945,6 +958,12 @@ fail1:
if (vm2 != NULL)
vmspace_free(vm2);
uma_zfree(proc_zone, newproc);
+ if (single_threaded) {
+ PROC_LOCK(p1);
+ thread_single_end();
+ PROC_UNLOCK(p1);
+ }
+fail2:
#ifdef PROCDESC
if (((flags & RFPROCDESC) != 0) && (fp_procdesc != NULL)) {
fdclose(td->td_proc->p_fd, fp_procdesc, *procdescp, td);
diff --git a/tools/test/pthread_vfork/pthread_vfork_test.c b/tools/test/pthread_vfork/pthread_vfork_test.c
index e004727..88956c2 100644
--- a/tools/test/pthread_vfork/pthread_vfork_test.c
+++ b/tools/test/pthread_vfork/pthread_vfork_test.c
@@ -29,6 +29,8 @@
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

+#include <sys/types.h>
+#include <sys/wait.h>
#include <err.h>
#include <pthread.h>
#include <signal.h>
@@ -39,10 +41,11 @@ __FBSDID("$FreeBSD$");

#define NUM_THREADS 100

-void *
-vfork_test(void *threadid)
+static void *
+vfork_test(void *threadid __unused)
{
- pid_t pid;
+ pid_t pid, wpid;
+ int status;

for (;;) {
pid = vfork();
@@ -50,10 +53,20 @@ vfork_test(void *threadid)
_exit(0);
else if (pid == -1)
err(1, "Failed to vfork");
+ else {
+ wpid = waitpid(pid, &status, 0);
+ if (wpid == -1)
+ err(1, "waitpid");
+ }
}
return (NULL);
}

+static void
+sighandler(int signo __unused)
+{
+}
+
/*
* This program invokes multiple threads and each thread calls
* vfork() system call.
@@ -63,19 +76,24 @@ main(void)
{
pthread_t threads[NUM_THREADS];
struct sigaction reapchildren;
+ sigset_t sigchld_mask;
int rc, t;

memset(&reapchildren, 0, sizeof(reapchildren));
- reapchildren.sa_handler = SIG_IGN;
-
- /* Automatically reap zombies. */
+ reapchildren.sa_handler = sighandler;
if (sigaction(SIGCHLD, &reapchildren, NULL) == -1)
err(1, "Could not sigaction(SIGCHLD)");

+ sigemptyset(&sigchld_mask);
+ sigaddset(&sigchld_mask, SIGCHLD);
+ if (sigprocmask(SIG_BLOCK, &sigchld_mask, NULL) == -1)
+ err(1, "sigprocmask");
+
for (t = 0; t < NUM_THREADS; t++) {
rc = pthread_create(&threads[t], NULL, vfork_test, (void *)t);
if (rc)
errc(1, rc, "pthread_create");
}
+ pause();
return (0);
}

Jilles Tjoelker

unread,

Aug 11, 2012, 9:12:15 AM8/11/12

to David Xu, Konstantin Belousov, freebsd...@freebsd.org

Hmm. Rtld cannot be fixed like libthr because its data structures are
inherently in userland.

Perhaps signal handling should be different for a vforked child, like
the default action of a signal sent to a thread affects the entire
process and not just the thread. This cannot be implemented in the
calling code because resolving execve() itself also needs rtld (ugly
hacks like performing an execve() call that is guaranteed to fail
aside).

The rtld problem can be avoided specifically by linking with '-z now'.
This might be acceptable for sh and csh; most applications can use
posix_spawn() which would have to become a system call.

--
Jilles Tjoelker

David Xu

unread,

Aug 11, 2012, 9:41:35 PM8/11/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org, Jilles Tjoelker

On 2012/08/10 18:13, Konstantin Belousov wrote:
> On Thu, Aug 09, 2012 at 02:08:50PM +0300, Konstantin Belousov wrote:
>> Third alternative, which seems to be even better, is to restore
>> single-threading of the parent for vfork().

single-threading is slow for large threaded process, don't know if it
is necessary for vfork(), POSIX says nothing about threaded process.

Konstantin Belousov

unread,

Aug 13, 2012, 7:51:40 AM8/13/12

to David Xu, freebsd...@freebsd.org, dav...@freebsd.org, Jilles Tjoelker

On Sun, Aug 12, 2012 at 08:11:29AM +0800, David Xu wrote:
> On 2012/08/10 18:13, Konstantin Belousov wrote:
> >On Thu, Aug 09, 2012 at 02:08:50PM +0300, Konstantin Belousov wrote:
> >>Third alternative, which seems to be even better, is to restore
> >>single-threading of the parent for vfork().
> single-threading is slow for large threaded process, don't know if it
> is necessary for vfork(), POSIX says nothing about threaded process.

I agree that with both of your statements. But, being fast but allowing
silent process corruption is not good behaviour. Either we need to
actually support vfork() for threaded processes, or disable it with some
error code.

I prefer to support it. I believe that vfork() should be wrapped by
libthr in the same way as fork() is wrapped. Not sure should we call
atfork handlers, for now I decided not to call, since the handlers
assume separate address spaces for parent/child. But we could only call
parent handler in child, however weird this sounds.

The real complication with wrapping is the fact that we cannot return
from wrapper in child without destroying parent state. So I tried to
prototype the code to handle the wrapping in the same frame, thus
neccessity of using asm.

Below is WIP, only for amd64 ATM.

diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc
index e6d99ec..476d26a 100644
--- a/lib/libthr/arch/amd64/Makefile.inc
+++ b/lib/libthr/arch/amd64/Makefile.inc
@@ -1,3 +1,4 @@
#$FreeBSD$

-SRCS+= pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+= pthread_md.c _umtx_op_err.S vfork.S
diff --git a/lib/libthr/arch/amd64/amd64/vfork.S b/lib/libthr/arch/amd64/amd64/vfork.S
new file mode 100644
index 0000000..07d813d
--- /dev/null
+++ b/lib/libthr/arch/amd64/amd64/vfork.S
@@ -0,0 +1,74 @@
+/*-
+ * Copyright (c) 2012 Konstantin Belousov <k...@freebsd.org>
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * William Jolitz.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#if defined(SYSLIBC_SCCS) && !defined(lint)
+ .asciz "@(#)Ovfork.s 5.1 (Berkeley) 4/23/90"
+#endif /* SYSLIBC_SCCS and not lint */
+#include <machine/asm.h>
+__FBSDID("$FreeBSD$");
+
+#include "SYS.h"
+
+ .weak _vfork
+ .set _vfork,__sys_vfork
+ .weak vfork
+ .set vfork,__sys_vfork
+ENTRY(__sys_vfork)
+ call _thr_vfork_pre
+ popq %rsi /* fetch return address (%rsi preserved) */
+ mov $SYS_vfork,%rax
+ KERNCALL
+ jb 2f
+ cmpl $0,%eax
+ jne 1f
+ pushq %rsi
+ pushq %rsi /* twice for stack alignment */
+ call _thr_vfork_post
+ popq %rsi
+ popq %rsi
+ xorl %eax,%eax
+1:
+ jmp *%rsi
+2:
+ pushq %rsi
+ pushq %rax
+ call _thr_vfork_post
+ call PIC_PLT(CNAME(__error))
+ popq %rdx
+ movq %rdx,(%rax)
+ movq $-1,%rax
+ movq $-1,%rdx
+ retq
+END(__sys_vfork)
+
+ .section .note.GNU-stack,"",%progbits
diff --git a/lib/libthr/pthread.map b/lib/libthr/pthread.map
index 355edea..40d14b4 100644
--- a/lib/libthr/pthread.map
+++ b/lib/libthr/pthread.map
@@ -157,6 +157,7 @@ FBSD_1.0 {
system;
tcdrain;
usleep;
+ vfork;
wait;
wait3;
wait4;
diff --git a/lib/libthr/thread/Makefile.inc b/lib/libthr/thread/Makefile.inc
index ddde6e9..92e82ac 100644
--- a/lib/libthr/thread/Makefile.inc
+++ b/lib/libthr/thread/Makefile.inc
@@ -55,4 +55,5 @@ SRCS+= \
thr_switch_np.c \
thr_symbols.c \
thr_umtx.c \
+ thr_vfork.c \
thr_yield.c
diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
index ba272fe..30f3de2 100644
--- a/lib/libthr/thread/thr_private.h
+++ b/lib/libthr/thread/thr_private.h
@@ -777,6 +777,8 @@ int _thr_setscheduler(lwpid_t, int, const struct sched_param *) __hidden;
void _thr_signal_prefork(void) __hidden;
void _thr_signal_postfork(void) __hidden;
void _thr_signal_postfork_child(void) __hidden;
+void _thr_vfork_pre(void) __hidden;
+void _thr_vfork_post(void) __hidden;
void _thr_try_gc(struct pthread *, struct pthread *) __hidden;
int _rtp_to_schedparam(const struct rtprio *rtp, int *policy,
struct sched_param *param) __hidden;
@@ -833,6 +835,7 @@ pid_t __sys_getpid(void);
ssize_t __sys_read(int, void *, size_t);
ssize_t __sys_write(int, const void *, size_t);
void __sys_exit(int);
+pid_t __sys_vfork(void);
#endif

static inline int
diff --git a/lib/libthr/thread/thr_vfork.c b/lib/libthr/thread/thr_vfork.c
new file mode 100644
index 0000000..bed7db5
--- /dev/null
+++ b/lib/libthr/thread/thr_vfork.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2012 Konstantin Belousov <k...@freebsd.org>
+ * Copyright (c) 2005 David Xu <dav...@freebsd.org>
+ * Copyright (c) 2003 Daniel Eischen <deis...@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Neither the name of the author nor the names of any co-contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+/*
+ * Copyright (c) 1995-1998 John Birrell <j...@cimlogic.com.au>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the author nor the names of any co-contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY JOHN BIRRELL AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "namespace.h"
+#include <errno.h>
+#include <link.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <spinlock.h>
+#include "un-namespace.h"
+
+#include "libc_private.h"
+#include "rtld_lock.h"
+#include "thr_private.h"
+
+static int rtld_locks[MAX_RTLD_LOCKS];
+static int cancelsave, was_threaded;
+
+void
+_thr_vfork_pre(void)
+{
+ struct pthread *curthread;
+
+ curthread = _get_curthread();
+ _thr_rwl_rdlock(&_thr_atfork_lock);
+ cancelsave = curthread->no_cancel;
+ curthread->no_cancel = 1;
+ _thr_signal_block(curthread);
+ _thr_signal_prefork();
+ if (_thr_isthreaded() != 0) {
+ was_threaded = 1;
+ _malloc_prefork();
+ _rtld_atfork_pre(rtld_locks);
+ } else
+ was_threaded = 0;
+}
+
+void
+_thr_vfork_post(void)
+{
+ struct pthread *curthread;
+
+ _thr_signal_postfork();
+ if (was_threaded) {
+ _rtld_atfork_post(rtld_locks);
+ _malloc_postfork();
+ }
+ curthread = _get_curthread();
+ _thr_signal_unblock(curthread);
+ curthread->no_cancel = cancelsave;
+ _thr_rwlock_unlock(&_thr_atfork_lock);
+ if (curthread->cancel_async)
+ _thr_testcancel(curthread);
+}
diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 6cb95cd..8735c25 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -150,11 +150,7 @@ sys_vfork(struct thread *td, struct vfork_args *uap)
int error, flags;
struct proc *p2;

-#ifdef XEN
- flags = RFFDG | RFPROC; /* validate that this is still an issue */
-#else
flags = RFFDG | RFPROC | RFPPWAIT | RFMEM;
-#endif
error = fork1(td, flags, 0, &p2, NULL, 0);
if (error == 0) {
td->td_retval[0] = p2->p_pid;
@@ -756,7 +752,7 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,

struct thread *td2;
struct vmspace *vm2;
vm_ooffset_t mem_charged;
- int error;
+ int error, single_threaded;
static int curfail;
static struct timeval lastfail;
#ifdef PROCDESC

@@ -815,6 +811,19 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,

}
#endif

+ if (((p1->p_flag & (P_HADTHREADS | P_SYSTEM)) == P_HADTHREADS) &&
+ (flags & RFPPWAIT) != 0) {
+ PROC_LOCK(p1);
+ if (thread_single(SINGLE_BOUNDARY)) {
+ PROC_UNLOCK(p1);
+ error = ERESTART;
+ goto fail2;
+ }
+ PROC_UNLOCK(p1);
+ single_threaded = 1;
+ } else
+ single_threaded = 0;
+
mem_charged = 0;
vm2 = NULL;
if (pages == 0)

@@ -945,6 +954,12 @@ fail1:

David Xu

unread,

Aug 13, 2012, 7:55:33 AM8/13/12

to Jilles Tjoelker, Konstantin Belousov, freebsd...@freebsd.org, David Xu

Or the libc's posix_spawn() should use each system call directly, it should
not call a function which is exported from libc globally, otherwise a PLT
resolving will cause rtld locking to be acquired, if it dies, the parent is
locked.

David Xu

unread,

Aug 13, 2012, 8:42:26 PM8/13/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org, Jilles Tjoelker

Single threading might be too excessive, the whole point of vfork is it
is faster,
if it is slow, I will use fork() instead. And another question is do you
think child
won't want to talk with another thread in parent process ?

It is unlikely you can use locking in vfork wrapper, this becauses a
vfork child
can call vfork again, if child process dies after it acquired thread
libraries'
locks, it will cause lock leaks. Also, if a child process needs to enter
RTLD
to resolve a PLT, the rtld_bind rwlock will be acquired, if the child
process
dies after it acquired read lock, it also causes lock leak.
I think Jilles patch is good enough to make posix_spawn and system() work,
it seems Solaris also only allows SIG_IGN or SIG_DFL to be use in vfork
child,
same as Jilles' patch.

David Xu

unread,

Aug 14, 2012, 12:43:13 AM8/14/12

to Jilles Tjoelker, Konstantin Belousov, freebsd...@freebsd.org

> a shell that wants to implement ...& using vfork()).

>
> posix_spawn() likely still needs some adjustment so that having it reset
> all signals (sigfillset()) to the default action will not cause it to
> [EINVAL] because libthr does not allow changing SIGTHR's disposition.
>
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c (revision 238970)
> +++ lib/libthr/thread/thr_sig.c (working copy)
> @@ -519,8 +519,16 @@
> return (-1);
> }
>
> - if (act)
> + if (act) {
> newact = *act;
> + /*
> + * Short-circuit cases where we do not touch _thr_sigact.
> + * This allows performing these safely in a vforked child.
> + */
> + if ((newact.sa_handler == SIG_DFL ||

> + newact.sa_handler == SIG_IGN)&& oact == NULL)
> + return (__sys_sigaction(sig,&newact, NULL));
> + }
>
> __sys_sigprocmask(SIG_SETMASK,&_thr_maskset,&oldset);
> _thr_rwl_wrlock(&_thr_sigact[sig-1].lock);
>

I simply duplicated idea from OpenSolaris, here is my patch
which has similar feature as your patch, and it also tries to
prevent vforked child from corrupting parent's data:
http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff

Konstantin Belousov

unread,

Aug 14, 2012, 4:19:51 AM8/14/12

to David Xu, freebsd...@freebsd.org, Jilles Tjoelker

On Tue, Aug 14, 2012 at 12:42:15PM +0800, David Xu wrote:
> I simply duplicated idea from OpenSolaris, here is my patch
> which has similar feature as your patch, and it also tries to
> prevent vforked child from corrupting parent's data:
> http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff

You shall not return from vfork() frame in the child. Otherwise, the
same frame is appears to be destroyed in parent, and parent dies. More
often on !x86, but right combination of events on x86 is deadly too.
If pid or curthread local variables are spilled into stack save area,
then child will override them, and e.g. parent could see pid == 0,
returning it to caller.

This was the reason why I went to asm wrapper for vfork.

Also, it seems that in mt process, malloc and rtld are still broken,
or am I missing something ?

David Xu

unread,

Aug 14, 2012, 5:19:01 AM8/14/12

to Konstantin Belousov, freebsd...@freebsd.org, Jilles Tjoelker

On 2012/08/14 16:18, Konstantin Belousov wrote:
> On Tue, Aug 14, 2012 at 12:42:15PM +0800, David Xu wrote:
>> I simply duplicated idea from OpenSolaris, here is my patch
>> which has similar feature as your patch, and it also tries to
>> prevent vforked child from corrupting parent's data:
>> http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff
> You shall not return from vfork() frame in the child. Otherwise, the
> same frame is appears to be destroyed in parent, and parent dies. More
> often on !x86, but right combination of events on x86 is deadly too.
> If pid or curthread local variables are spilled into stack save area,
> then child will override them, and e.g. parent could see pid == 0,
> returning it to caller.
>
> This was the reason why I went to asm wrapper for vfork.
>

OK.

> Also, it seems that in mt process, malloc and rtld are still broken,
> or am I missing something ?

I will not call it as broken, malloc and rtld are working properly.
vfork is not a normal function, for MT process, fork() is even
very special too.

POSIX says a lot about multi-threaded:
http://pubs.opengroup.org/onlinepubs/000095399/functions/fork.html

I quoted some POSIX document here:
> A process shall be created with a single thread. If a multi-threaded
> process calls fork(), the new process shall contain a replica of the
> calling thread and its entire address space, possibly including the
> states of mutexes and other resources. Consequently, to avoid errors,
> the child process may only execute async-signal-safe operations until
> such time as one of the exec functions is called. [THR] Fork
> handlers may be established by means of the pthread_atfork() function
> in order to maintain application invariants across fork() calls.

This means child process should only do very simple things, and
quickly call execv().

For mt process, fork() is already a very complicated problem,
one of problems I still remembered is that when fork() is called in
signal handler, should the thread library execute pthread_atfork
handler ? if it should do, but none of lock is async-signal safe,
though our internal rwlock allows to be used in signal handler,
but it is not supported by POSIX.
Also are those atfork handler prepared to be executed in signal
handler ? it is undefined. POSIX had opened a door here.
Above is one of complicated problem, the vfork is even more
restrictive than fork().
If it is possible, I would avoid such a complicated problem
which vfork() would cause.

Regards,
David Xu

Konstantin Belousov

unread,

Aug 14, 2012, 5:42:43 AM8/14/12

to David Xu, freebsd...@freebsd.org, Jilles Tjoelker

Sure.

But to call execv*, the child may need a working rtld, so we fixed it.
User code routinely called malloc() or even created threads in the child,
so we fixed that too. Otherwise, we have nothing to answer for the demands,
and 'other' OSes do support such usage. It is beyond POSIX, but this does
not matter, since the feature is expected to be available by application
writers.

>
> For mt process, fork() is already a very complicated problem,
> one of problems I still remembered is that when fork() is called in
> signal handler, should the thread library execute pthread_atfork
> handler ? if it should do, but none of lock is async-signal safe,
> though our internal rwlock allows to be used in signal handler,
> but it is not supported by POSIX.
> Also are those atfork handler prepared to be executed in signal
> handler ? it is undefined. POSIX had opened a door here.

POSIX authors were aware of this problem.
In the rationale for SUSv4, they wrote

"While the fork() function is async-signal-safe, there is no way for an
implementation to determine whether the fork handlers established by
pthread_atfork() are async-signal-safe. The fork handlers may attempt to
execute portions of the implementation that are not async-signal-safe,
such as those that are protected by mutexes, leading to a deadlock
condition. It is therefore undefined for the fork handlers to execute
functions that are not async-signal-safe when fork() is called from a
signal handler."

IMO, since fork() is specified to be async-signal safe, and since fork()
is specified to call atfork() handlers, SUSv4 requires, without any
misinterpreations, that atfork calling machinery must be async-signal
safe. The only possibility for undefined behaviour is the application
code registering non-async safe handlers.

> Above is one of complicated problem, the vfork is even more
> restrictive than fork().
> If it is possible, I would avoid such a complicated problem
> which vfork() would cause.

I fully agree that the issues caused by vfork() in multithreaded code
are complicated, but ignoring them lowers the quality of our implementation.
Fixing vfork in multithreaded process is not trivial, but it is possible.

My patch aims at working rtld and malloc in child.
As I said earlier, we might even try to call parent atfork handlers in child.

Sure, if child dies at wrong time, then rtld and malloc locks and data
structures can be left in unusable state for parent, but currently we
do not work even if child is relatively well-behaving.

David Xu

unread,

Aug 14, 2012, 11:16:25 AM8/14/12

to Konstantin Belousov, freebsd...@freebsd.org, David Xu, Jilles Tjoelker

But in real word, pthread atfork handlers are not async-signal safe,
they mostly do mutex locking and unlocking to keep consistent state,
mutex is not async-signal safe.
The malloc prefork and postfork handlers happen to work because I have
some hacking code in library for malloc locks. Otherwise, you even
can not use fork() in signal handler.

>> Above is one of complicated problem, the vfork is even more
>> restrictive than fork().
>> If it is possible, I would avoid such a complicated problem
>> which vfork() would cause.
> I fully agree that the issues caused by vfork() in multithreaded code
> are complicated, but ignoring them lowers the quality of our implementation.
> Fixing vfork in multithreaded process is not trivial, but it is possible.
>
> My patch aims at working rtld and malloc in child.
> As I said earlier, we might even try to call parent atfork handlers in child.
>
> Sure, if child dies at wrong time, then rtld and malloc locks and data
> structures can be left in unusable state for parent, but currently we
> do not work even if child is relatively well-behaving.

You are requiring the thread library to implement such a mutex
and other locks, that after vfork(), the mutex and other lock types must
still work across processes, the PTHREAD_PROCESS_PRIVATE type of
mutex and other locks now need to work in a PTHREAD_PROCESS_SHARE
mode.

Jilles Tjoelker

unread,

Aug 14, 2012, 5:10:37 PM8/14/12

to dav...@freebsd.org, Konstantin Belousov, freebsd...@freebsd.org

On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
> But in real word, pthread atfork handlers are not async-signal safe,
> they mostly do mutex locking and unlocking to keep consistent state,
> mutex is not async-signal safe.
> The malloc prefork and postfork handlers happen to work because I have
> some hacking code in library for malloc locks. Otherwise, you even
> can not use fork() in signal handler.

This problem was also reported to the Austin Group at
http://austingroupbugs.net/view.php?id=62

Atfork handlers are inherently async-signal-unsafe.

An interpretation was issued suggesting to remove fork() from the list
of async-signal-safe functions and add a new async-signal-safe function
_Fork() which does not call the atfork handlers.

This change will however not be in POSIX.1-2008 TC1 but only in the next
issue (SUSv5).

A slightly earlier report http://austingroupbugs.net/view.php?id=18 just
requested the _Fork() function because an existing application
deadlocked when calling fork() from a signal handler.

--
Jilles Tjoelker

David Xu

unread,

Aug 14, 2012, 7:41:40 PM8/14/12

to Jilles Tjoelker, Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org

On 2012/08/15 05:09, Jilles Tjoelker wrote:
> On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
>> But in real word, pthread atfork handlers are not async-signal safe,
>> they mostly do mutex locking and unlocking to keep consistent state,
>> mutex is not async-signal safe.
>> The malloc prefork and postfork handlers happen to work because I have
>> some hacking code in library for malloc locks. Otherwise, you even
>> can not use fork() in signal handler.
> This problem was also reported to the Austin Group at
> http://austingroupbugs.net/view.php?id=62
>
> Atfork handlers are inherently async-signal-unsafe.
>
> An interpretation was issued suggesting to remove fork() from the list
> of async-signal-safe functions and add a new async-signal-safe function
> _Fork() which does not call the atfork handlers.
>
> This change will however not be in POSIX.1-2008 TC1 but only in the next
> issue (SUSv5).
>
> A slightly earlier report http://austingroupbugs.net/view.php?id=18 just
> requested the _Fork() function because an existing application
> deadlocked when calling fork() from a signal handler.
>

Thanks, although SUSv5 will have _Fork(), but application will not catch up.

One solution for this problem is thread library does not execute atfork
handler
when fork() is called from signal handler, but it requires some work to
be done in
thread library's signal wrapper, for example, set a flag that the thread is
executing signal handler, but siglongjmp can mess the flag, so I have
to tweak sigsetjmp and siglongjmp to save/restore the flag, I have such
a patch:
it fetches target stack pointer stored in jmpbuf, and compare it with
top most
stack pointer when a first signal was delivered to the thread, if the
target stack
pointer is larger than the top most stack pointer, the flag is cleared.

Konstantin Belousov

unread,

Aug 15, 2012, 1:47:49 PM8/15/12

to dav...@freebsd.org, freebsd...@freebsd.org, Jilles Tjoelker

On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:

> You are requiring the thread library to implement such a mutex
> and other locks, that after vfork(), the mutex and other lock types must
> still work across processes, the PTHREAD_PROCESS_PRIVATE type of
> mutex and other locks now need to work in a PTHREAD_PROCESS_SHARE
> mode.

In fact, yes. In my patch I achieve this by single-threading the parent,
which means that existing _PRIVATE mutexes are enough.

Konstantin Belousov

unread,

Aug 15, 2012, 1:50:42 PM8/15/12

to dav...@freebsd.org, freebsd...@freebsd.org, Jilles Tjoelker

I do not understand how this interacts with altstacks.

Also, there are longjmp()s implemented outside the base, e.g. in the
libunwind, which cannot be fixed this way.

Also, there are language runtimes that relies heavily on the (synchronous)
signals and which use their internal analogues of the stack unwinding,
which again be broken by such approach.

David Xu

unread,

Aug 15, 2012, 7:58:34 PM8/15/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org, Jilles Tjoelker

On 2012/08/16 01:46, Konstantin Belousov wrote:
> On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
>> You are requiring the thread library to implement such a mutex
>> and other locks, that after vfork(), the mutex and other lock types must
>> still work across processes, the PTHREAD_PROCESS_PRIVATE type of
>> mutex and other locks now need to work in a PTHREAD_PROCESS_SHARE
>> mode.
> In fact, yes. In my patch I achieve this by single-threading the parent,

I still think single-threading is execussive, vfork should be fast, and
because parent thread is already waiting for child process, there
is no problem to reuse the parent's stack in child process, it is
compatible.

> which means that existing _PRIVATE mutexes are enough.

Well, you forget that if private mutex sleep-wakeup queue is in kernel,
you only can see it in same process, otherwise it is a security problem.
Now It works because it is me implementing umtx in such a way that it
comparings two vmspace pointers in kernel umtx code, and treat two
threads are in same process if they are same. But there are implementations
do not work in this way, they simply look up lwpid in same process, and if
not found, the mutex is broken. process-private and proecess-shared locks
work in very different way, then your assumptions is false.

David Xu

unread,

Aug 15, 2012, 8:08:02 PM8/15/12

to dav...@freebsd.org, Konstantin Belousov, freebsd...@freebsd.org, Jilles Tjoelker

I must say my implementation is a lucky, not is the intention.

David Xu

unread,

Aug 15, 2012, 8:25:02 PM8/15/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org, Jilles Tjoelker

My patch is very experimental. There are setcontext and getcontext
which also can break it. Another solution would save a flag into
jmpbuf or ucontext, and indicates the signal handler is executing.
a setjmp or getcontext executed in normal context would not
have such a flag, but if they executes in signal handler, the per-thread
flag will be saved. but it requires lots of changes, and setcontext and
getcontext are syscall, kernel does know such a userland flag, unless
they are shared between kernel and userland.

Konstantin Belousov

unread,

Aug 16, 2012, 7:45:29 AM8/16/12

to dav...@freebsd.org, freebsd...@freebsd.org, Jilles Tjoelker

My point is that the fact that fork() is called from dynamic context
that was identified as the signal handler does not mean anything.
It can be mis-identified for many reasons, which both me and you
tried to partially enumerate above.

The really important thing is the atomicity of the current context,
e.g. synchronous SIGSEGV caused by a language runtime GC is very
much safe place to call atfork handlers, since runtimes usually cause
signal generations only at the safe place.

I do not think that such approach as you described can be completed,
the _Fork() seems the only robust way.

BTW, returning to Jilles proposal, can we call vfork() only for
single-threaded parent ? I think it gives good boost for single-threaded
case, and also eliminates the concerns of non-safety.

Jilles Tjoelker

unread,

Aug 16, 2012, 6:40:09 PM8/16/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org

On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
> My point is that the fact that fork() is called from dynamic context
> that was identified as the signal handler does not mean anything.
> It can be mis-identified for many reasons, which both me and you
> tried to partially enumerate above.

> The really important thing is the atomicity of the current context,
> e.g. synchronous SIGSEGV caused by a language runtime GC is very
> much safe place to call atfork handlers, since runtimes usually cause
> signal generations only at the safe place.

> I do not think that such approach as you described can be completed,
> the _Fork() seems the only robust way.

Agreed, that way (detecting signal handler) lies madness.

If need be, _Fork() is easily implemented and used.

> BTW, returning to Jilles proposal, can we call vfork() only for
> single-threaded parent ? I think it gives good boost for single-threaded
> case, and also eliminates the concerns of non-safety.

This would probably fix the safety issues but at a price. There is a
correlation between processes so large that they benefit greatly from
vfork and threaded processes.

On the other hand, I think direct calls to vfork() in applications are
risky and it may not be possible to support them safely in all
circumstances. However, if libc is calling vfork() such as via popen(),
system() or posix_spawn(), it should be possible even in a
multi-threaded process. In that case, the rtld and libthr problems can
be avoided by using aliases with hidden visibility for all functions the
vforked child needs to call (or any other method that prevents
interposition and hard-codes a displacement into libc.so).

There may still be a problem in programs that install signal handlers
because the set of async-signal-safe functions is larger than what can
be done in a vforked child. Userland can avoid this by masking affected
signals before calling vfork() and resetting them to SIG_DFL before
unmasking them. This will add many syscalls if the code does not know
which signals are affected (such as libc). Alternatively, the kernel
could map caught signals to the default action for processes with
P_PPWAIT (just like it does not stop such processes because of signals
or TTY job control).

--
Jilles Tjoelker

Konstantin Belousov

unread,

Aug 17, 2012, 8:45:30 AM8/17/12

to Jilles Tjoelker, freebsd...@freebsd.org, dav...@freebsd.org

On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
> On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
> > My point is that the fact that fork() is called from dynamic context
> > that was identified as the signal handler does not mean anything.
> > It can be mis-identified for many reasons, which both me and you
> > tried to partially enumerate above.
>
> > The really important thing is the atomicity of the current context,
> > e.g. synchronous SIGSEGV caused by a language runtime GC is very
> > much safe place to call atfork handlers, since runtimes usually cause
> > signal generations only at the safe place.
>
> > I do not think that such approach as you described can be completed,
> > the _Fork() seems the only robust way.
>
> Agreed, that way (detecting signal handler) lies madness.
>
> If need be, _Fork() is easily implemented and used.
>
> > BTW, returning to Jilles proposal, can we call vfork() only for
> > single-threaded parent ? I think it gives good boost for single-threaded
> > case, and also eliminates the concerns of non-safety.
>
> This would probably fix the safety issues but at a price. There is a
> correlation between processes so large that they benefit greatly from
> vfork and threaded processes.

Ok, so I will continue with my patch.

>
> On the other hand, I think direct calls to vfork() in applications are
> risky and it may not be possible to support them safely in all
> circumstances. However, if libc is calling vfork() such as via popen(),
> system() or posix_spawn(), it should be possible even in a
> multi-threaded process. In that case, the rtld and libthr problems can
> be avoided by using aliases with hidden visibility for all functions the
> vforked child needs to call (or any other method that prevents
> interposition and hard-codes a displacement into libc.so).

I do not see how using any aliases could help there. Basically, if mt
process is not single-threaded for vfork, you can have both some parent
thread and child enter rtld. This is complete mess.

>
> There may still be a problem in programs that install signal handlers
> because the set of async-signal-safe functions is larger than what can
> be done in a vforked child. Userland can avoid this by masking affected
> signals before calling vfork() and resetting them to SIG_DFL before
> unmasking them. This will add many syscalls if the code does not know
> which signals are affected (such as libc). Alternatively, the kernel
> could map caught signals to the default action for processes with
> P_PPWAIT (just like it does not stop such processes because of signals
> or TTY job control).

If rtld does not work, then any library function call from a signal handler
is problematic. My goal is to get a working rtld and possibly malloc.

BTW, not quite related, it seems that the placement of openat,
setcontext and swapcontext in the pthread.map is wrong. openat is
at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while
libthr exports them at 1.2. App or library gets linked to arbitrary
version depending on whether libphread was specified at link time, and
interposition from libthr does not work.

Below is the latest version of my patch for vfork, which survives (modified)
tools/test/pthread_vfork_test. Patch is only for x86 right now.

diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc
index e6d99ec..476d26a 100644
--- a/lib/libthr/arch/amd64/Makefile.inc
+++ b/lib/libthr/arch/amd64/Makefile.inc
@@ -1,3 +1,4 @@
#$FreeBSD$

-SRCS+= pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+= pthread_md.c _umtx_op_err.S vfork.S
diff --git a/lib/libthr/arch/amd64/amd64/vfork.S b/lib/libthr/arch/amd64/amd64/vfork.S
new file mode 100644

index 0000000..1f0714f
--- /dev/null
+++ b/lib/libthr/arch/amd64/amd64/vfork.S
@@ -0,0 +1,77 @@

+#include <machine/asm.h>
+__FBSDID("$FreeBSD$");
+
+#include "SYS.h"
+
+ .weak _vfork
+ .set _vfork,__sys_vfork
+ .weak vfork
+ .set vfork,__sys_vfork
+ENTRY(__sys_vfork)
+ call _thr_vfork_pre
+ popq %rsi /* fetch return address (%rsi preserved) */
+ mov $SYS_vfork,%rax
+ KERNCALL
+ jb 2f
+ cmpl $0,%eax
+ jne 1f
+ pushq %rsi
+ pushq %rsi /* twice for stack alignment */

+ call CNAME(_thr_vfork_post_child)

+ popq %rsi
+ popq %rsi
+ xorl %eax,%eax

+ jmp *%rsi
+1:

+ pushq %rsi
+ pushq %rax

+ call CNAME(_thr_vfork_post_child)
+ popq %rax
+ popq %rsi

+ jmp *%rsi
+2:
+ pushq %rsi
+ pushq %rax

+ call CNAME(_thr_vfork_post_child)

+ call PIC_PLT(CNAME(__error))
+ popq %rdx
+ movq %rdx,(%rax)
+ movq $-1,%rax

+ movq %rax,%rdx

+ retq
+END(__sys_vfork)
+
+ .section .note.GNU-stack,"",%progbits

diff --git a/lib/libthr/arch/i386/Makefile.inc b/lib/libthr/arch/i386/Makefile.inc
index 01290d5..13d55e2 100644
--- a/lib/libthr/arch/i386/Makefile.inc
+++ b/lib/libthr/arch/i386/Makefile.inc
@@ -1,3 +1,4 @@
# $FreeBSD$

-SRCS+= pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+= pthread_md.c _umtx_op_err.S vfork.S

diff --git a/lib/libthr/arch/i386/i386/vfork.S b/lib/libthr/arch/i386/i386/vfork.S
new file mode 100644
index 0000000..ca8896b
--- /dev/null
+++ b/lib/libthr/arch/i386/i386/vfork.S
@@ -0,0 +1,82 @@

+#include <machine/asm.h>
+__FBSDID("$FreeBSD$");
+
+#include "SYS.h"
+
+ .weak _vfork
+ .set _vfork,__sys_vfork
+ .weak vfork
+ .set vfork,__sys_vfork
+ENTRY(__sys_vfork)

+ PIC_PROLOGUE
+ call PIC_PLT(CNAME(_thr_vfork_pre))
+ PIC_EPILOGUE
+ popl %ecx /* my rta into ecx */
+ mov $SYS_vfork,%eax
+ KERNCALL
+ jb error_ret
+ pushl %ecx
+ cmpl $0,%eax
+ jne parent_ret
+ PIC_PROLOGUE
+ call PIC_PLT(CNAME(_thr_vfork_post_child))
+ PIC_EPILOGUE
+ popl %ecx
+ xorl %eax,%eax
+ jmp *%ecx
+parent_ret:
+ pushl %eax
+ PIC_PROLOGUE
+ call PIC_PLT(CNAME(_thr_vfork_post_parent))
+ PIC_EPILOGUE
+ popl %eax
+ popl %ecx
+ jmp *%ecx
+error_ret:
+ pushl %ecx
+ pushl %eax
+ PIC_PROLOGUE
+ call PIC_PLT(CNAME(_thr_vfork_post_child))
+ call PIC_PLT(CNAME(__error))
+ PIC_EPILOGUE
+ popl %edx
+ movl %edx,(%eax)
+ movl $-1,%eax
+ movl %eax,%edx
+ retl

index ba272fe..951c3b8 100644
--- a/lib/libthr/thread/thr_private.h
+++ b/lib/libthr/thread/thr_private.h
@@ -757,6 +757,7 @@ void _thr_cancel_leave(struct pthread *, int) __hidden;
void _thr_testcancel(struct pthread *) __hidden;
void _thr_signal_block(struct pthread *) __hidden;
void _thr_signal_unblock(struct pthread *) __hidden;
+void _thr_signal_unblock_noref(struct pthread *) __hidden;
void _thr_signal_init(void) __hidden;
void _thr_signal_deinit(void) __hidden;
int _thr_send_sig(struct pthread *, int sig) __hidden;
@@ -777,6 +778,9 @@ int _thr_setscheduler(lwpid_t, int, const struct sched_param *) __hidden;

void _thr_signal_prefork(void) __hidden;
void _thr_signal_postfork(void) __hidden;
void _thr_signal_postfork_child(void) __hidden;
+void _thr_vfork_pre(void) __hidden;

+void _thr_vfork_post_child(void) __hidden;
+void _thr_vfork_post_parent(void) __hidden;

void _thr_try_gc(struct pthread *, struct pthread *) __hidden;
int _rtp_to_schedparam(const struct rtprio *rtp, int *policy,
struct sched_param *param) __hidden;

@@ -833,6 +837,7 @@ pid_t __sys_getpid(void);

ssize_t __sys_read(int, void *, size_t);
ssize_t __sys_write(int, const void *, size_t);
void __sys_exit(int);
+pid_t __sys_vfork(void);
#endif

static inline int

diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
index 3dee8b7..3e5b25e 100644
--- a/lib/libthr/thread/thr_sig.c
+++ b/lib/libthr/thread/thr_sig.c
@@ -107,6 +107,13 @@ _thr_signal_unblock(struct pthread *curthread)
__sys_sigprocmask(SIG_SETMASK, &curthread->sigmask, NULL);
}

+void
+_thr_signal_unblock_noref(struct pthread *curthread)
+{
+ if (curthread->sigblock == 0)
+ __sys_sigprocmask(SIG_SETMASK, &curthread->sigmask, NULL);
+}
+
int
_thr_send_sig(struct pthread *thread, int sig)
{

diff --git a/lib/libthr/thread/thr_vfork.c b/lib/libthr/thread/thr_vfork.c
new file mode 100644

index 0000000..80c9d1e
--- /dev/null
+++ b/lib/libthr/thread/thr_vfork.c
@@ -0,0 +1,123 @@

+ _thr_rwl_wrlock(&_thr_atfork_lock);

+ cancelsave = curthread->no_cancel;
+ curthread->no_cancel = 1;
+ _thr_signal_block(curthread);
+ _thr_signal_prefork();
+ if (_thr_isthreaded() != 0) {
+ was_threaded = 1;
+ _malloc_prefork();
+ _rtld_atfork_pre(rtld_locks);
+ } else
+ was_threaded = 0;
+}
+
+void

+_thr_vfork_post_child(void)

+{
+ struct pthread *curthread;
+
+ _thr_signal_postfork();
+ if (was_threaded) {
+ _rtld_atfork_post(rtld_locks);
+ _malloc_postfork();
+ }
+ curthread = _get_curthread();
+ _thr_signal_unblock(curthread);
+ curthread->no_cancel = cancelsave;
+ _thr_rwlock_unlock(&_thr_atfork_lock);
+ if (curthread->cancel_async)
+ _thr_testcancel(curthread);
+}

+
+void
+_thr_vfork_post_parent(void)

+{
+ struct pthread *curthread;
+
+ curthread = _get_curthread();

+ _thr_signal_unblock_noref(curthread);

+ if (curthread->cancel_async)
+ _thr_testcancel(curthread);
+}
diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c

index 46cdca1..134ba80 100644

Jilles Tjoelker

unread,

Aug 17, 2012, 12:09:42 PM8/17/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org

On Fri, Aug 17, 2012 at 03:43:12PM +0300, Konstantin Belousov wrote:
> On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
> > On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
> > > BTW, returning to Jilles proposal, can we call vfork() only for
> > > single-threaded parent ? I think it gives good boost for single-threaded
> > > case, and also eliminates the concerns of non-safety.

> > This would probably fix the safety issues but at a price. There is a
> > correlation between processes so large that they benefit greatly from
> > vfork and threaded processes.
> Ok, so I will continue with my patch.

> > On the other hand, I think direct calls to vfork() in applications are
> > risky and it may not be possible to support them safely in all
> > circumstances. However, if libc is calling vfork() such as via popen(),
> > system() or posix_spawn(), it should be possible even in a
> > multi-threaded process. In that case, the rtld and libthr problems can
> > be avoided by using aliases with hidden visibility for all functions the
> > vforked child needs to call (or any other method that prevents
> > interposition and hard-codes a displacement into libc.so).

> I do not see how using any aliases could help there. Basically, if mt
> process is not single-threaded for vfork, you can have both some parent
> thread and child enter rtld. This is complete mess.

If libc calls a function with hidden visibility, this proceeds directly
(not via the PLT) and rtld is not involved.

Several ways to do this are in section 2.2.7 Avoid Using Exported
Symbols of Ulrich Drepper's dsohowto. One of them is something like

extern __typeof(next) next_int
__attribute((alias("next"), visibility("hidden")));

in the same source as the definition of the function

int next(void) { ...; }

As Drepper notes, the visibility attribute is not strictly required if a
version script keeps the symbol local but it might lead to better code.
At least on i386, though, the optimal direct near call instruction is
generated even without it. For example, _nsdispatch() calls
libc_dlopen() (kept local by the version script) without going through
the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj).

In the assembler syscall stubs using code from lib/libc/arch/SYS.h this
can be done by adding another .set (we currently have foo, _foo and
__sys_foo for a syscall foo; some syscalls have only _foo and
__sys_foo) such as __syshidden_foo. The ones that are actually used
then need prototypes (similar to the __sys_* ones in lib/libthr/?).

For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported.
Either this should be kept local or a local uninterposable alias should
be added and used (as with the syscalls).

The function __error() (to get errno's address for the current thread)
is and must be called via the PLT (because libthr is separate).
Therefore, we should ensure it is called at least once before vfork so
calls in the child do not involve rtld. The implementations for the
various architectures use the TLS register (or memory location for ARM),
so they seem safe.

This should suffice to fix posix_spawn() but the _execvpe() used by
posix_spawnp also uses various string functions. If not all of these
have already been called earlier, this will not work. Making calls to
them not go through the PLT seems fairly hard, even though it would make
sense in general, so perhaps I should instead reimplement it such that
the parent does almost all of the work.

An alternative is to write the core of posix_spawn() in assembler using
system calls directly but I would rather avoid that :(

> > There may still be a problem in programs that install signal handlers
> > because the set of async-signal-safe functions is larger than what can
> > be done in a vforked child. Userland can avoid this by masking affected
> > signals before calling vfork() and resetting them to SIG_DFL before
> > unmasking them. This will add many syscalls if the code does not know
> > which signals are affected (such as libc). Alternatively, the kernel
> > could map caught signals to the default action for processes with
> > P_PPWAIT (just like it does not stop such processes because of signals
> > or TTY job control).

> If rtld does not work, then any library function call from a signal handler
> is problematic. My goal is to get a working rtld and possibly malloc.

> BTW, not quite related, it seems that the placement of openat,
> setcontext and swapcontext in the pthread.map is wrong. openat is
> at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while
> libthr exports them at 1.2. App or library gets linked to arbitrary
> version depending on whether libphread was specified at link time, and
> interposition from libthr does not work.

Oops :( Can this still be fixed (like by exporting identical functions
in multiple versions)?

> Below is the latest version of my patch for vfork, which survives (modified)
> tools/test/pthread_vfork_test. Patch is only for x86 right now.

Why does this patch call thread_single(SINGLE_BOUNDARY)? I think this
just causes breakage by causing short writes, [ERESTART] and the like
where not permitted by POSIX while not fixing userland's problems. From
userland's point of view, thread_single(SINGLE_BOUNDARY) just freezes
threads at whatever point they are executing, including while holding an
internal lock. Also, the thread_single_end() is before waiting for the
child to exec or exit (because that is now just before returning to
userland).

If single-threading is needed, it should be done in userland such as via
pthread_suspend_all_np(). This function already ensures no other thread
is in a libthr critical section. If rtld blocks SIGTHR during its
critical sections, it likely also waits for the threads to leave rtld.
As with kernel-level single-threading, this is slow and causes incorrect
short writes.

If single-threading is only necessary because umtx locks are formally
per-process instead of per-vmspace, then I suggest avoiding the trouble
and depending on the fact that they are per-vmspace.

> [snip]

Konstantin Belousov

unread,

Aug 19, 2012, 8:27:33 AM8/19/12

to Jilles Tjoelker, freebsd...@freebsd.org, dav...@freebsd.org

On Fri, Aug 17, 2012 at 06:08:47PM +0200, Jilles Tjoelker wrote:
> On Fri, Aug 17, 2012 at 03:43:12PM +0300, Konstantin Belousov wrote:
> > On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
> > > On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
> > > > BTW, returning to Jilles proposal, can we call vfork() only for
> > > > single-threaded parent ? I think it gives good boost for single-threaded
> > > > case, and also eliminates the concerns of non-safety.
>
> > > This would probably fix the safety issues but at a price. There is a
> > > correlation between processes so large that they benefit greatly from
> > > vfork and threaded processes.
> > Ok, so I will continue with my patch.
>
> > > On the other hand, I think direct calls to vfork() in applications are
> > > risky and it may not be possible to support them safely in all
> > > circumstances. However, if libc is calling vfork() such as via popen(),
> > > system() or posix_spawn(), it should be possible even in a
> > > multi-threaded process. In that case, the rtld and libthr problems can
> > > be avoided by using aliases with hidden visibility for all functions the
> > > vforked child needs to call (or any other method that prevents
> > > interposition and hard-codes a displacement into libc.so).
>
> > I do not see how using any aliases could help there. Basically, if mt
> > process is not single-threaded for vfork, you can have both some parent
> > thread and child enter rtld. This is complete mess.
>
> If libc calls a function with hidden visibility, this proceeds directly
> (not via the PLT) and rtld is not involved.

Oh, so you are only concerned with libc there ?

If spending so much efforts on this issue, IMO it is pity to only fix
libc and not fix vfork for all consumers.

>
> Several ways to do this are in section 2.2.7 Avoid Using Exported
> Symbols of Ulrich Drepper's dsohowto. One of them is something like
>
> extern __typeof(next) next_int
> __attribute((alias("next"), visibility("hidden")));
>
> in the same source as the definition of the function
>
> int next(void) { ...; }
>
> As Drepper notes, the visibility attribute is not strictly required if a
> version script keeps the symbol local but it might lead to better code.
> At least on i386, though, the optimal direct near call instruction is
> generated even without it. For example, _nsdispatch() calls
> libc_dlopen() (kept local by the version script) without going through
> the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj).

I am not confident, so this might be a hallucination, but 'optimization'
in the recent gnu ld might rewrite the call target to avoid PLT when
symbol visibility is hidden.

>
> In the assembler syscall stubs using code from lib/libc/arch/SYS.h this
> can be done by adding another .set (we currently have foo, _foo and
> __sys_foo for a syscall foo; some syscalls have only _foo and
> __sys_foo) such as __syshidden_foo. The ones that are actually used
> then need prototypes (similar to the __sys_* ones in lib/libthr/?).
>
> For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported.
> Either this should be kept local or a local uninterposable alias should
> be added and used (as with the syscalls).
>
> The function __error() (to get errno's address for the current thread)
> is and must be called via the PLT (because libthr is separate).
> Therefore, we should ensure it is called at least once before vfork so
> calls in the child do not involve rtld. The implementations for the
> various architectures use the TLS register (or memory location for ARM),
> so they seem safe.
>
> This should suffice to fix posix_spawn() but the _execvpe() used by
> posix_spawnp also uses various string functions. If not all of these
> have already been called earlier, this will not work. Making calls to
> them not go through the PLT seems fairly hard, even though it would make
> sense in general, so perhaps I should instead reimplement it such that
> the parent does almost all of the work.
>
> An alternative is to write the core of posix_spawn() in assembler using
> system calls directly but I would rather avoid that :(

BTW, we have very gross inefficiency in our libc syscall stubs on i386.
Namely, the PIC_PROLOGUE generates 'call 1f; 1f:' sequence, which trashes
return stack predictor in CPU. When compiling for pentium pro and later,
gcc calls dummy procedure that moves return address into %ebx. I think
we ought to do this unconditionally for libc stubs in modern times.

>
> > > There may still be a problem in programs that install signal handlers
> > > because the set of async-signal-safe functions is larger than what can
> > > be done in a vforked child. Userland can avoid this by masking affected
> > > signals before calling vfork() and resetting them to SIG_DFL before
> > > unmasking them. This will add many syscalls if the code does not know
> > > which signals are affected (such as libc). Alternatively, the kernel
> > > could map caught signals to the default action for processes with
> > > P_PPWAIT (just like it does not stop such processes because of signals
> > > or TTY job control).
>
> > If rtld does not work, then any library function call from a signal handler
> > is problematic. My goal is to get a working rtld and possibly malloc.
>
> > BTW, not quite related, it seems that the placement of openat,
> > setcontext and swapcontext in the pthread.map is wrong. openat is
> > at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while
> > libthr exports them at 1.2. App or library gets linked to arbitrary
> > version depending on whether libphread was specified at link time, and
> > interposition from libthr does not work.
>
> Oops :( Can this still be fixed (like by exporting identical functions
> in multiple versions)?

My take is that we shall declare this an ABI bug and fix it before 9.1
by exporting only the right version from libthr. This would indeed break
some binaries, but I having the same symbol at different versions gives
not the semantic we want for bugfix (since you would still link to the
default version).

>
> > Below is the latest version of my patch for vfork, which survives (modified)
> > tools/test/pthread_vfork_test. Patch is only for x86 right now.
>
> Why does this patch call thread_single(SINGLE_BOUNDARY)? I think this
> just causes breakage by causing short writes, [ERESTART] and the like
> where not permitted by POSIX while not fixing userland's problems. From
> userland's point of view, thread_single(SINGLE_BOUNDARY) just freezes
> threads at whatever point they are executing, including while holding an
> internal lock. Also, the thread_single_end() is before waiting for the
> child to exec or exit (because that is now just before returning to
> userland).

Oh, the placement of the single_end() is bug. Thank you for noting it.

The EINTR return from syscalls in other threads when doing fork(2) in
one thread is common and well-known situation. I believe that older
Solarises and SVr4 did exactly this. So I do not consider the EINTR or
short i/o an unacceptable cost, but please see below.

>
> If single-threading is needed, it should be done in userland such as via
> pthread_suspend_all_np(). This function already ensures no other thread
> is in a libthr critical section. If rtld blocks SIGTHR during its
> critical sections, it likely also waits for the threads to leave rtld.
> As with kernel-level single-threading, this is slow and causes incorrect
> short writes.

User-mode single-threading, like pthread_suspend_all_np() is order of
magnitude slower then kernel-mode single-threading (I admit that I was
not aware of pthread_suspend_all_np()).

The cause is that kernel often can declare single-threading done when
the threads are stopped while still on sleepqueue. Only recently we started
marking some sleeps as 'unsafe to stop', mostly in NFS land.

So pthread_suspend_all_np() needs to wake up all threads, deliver SIGCANCEL,
wait for threads to leave kernel and get ast. For large process were lot
of threads typically just sleep waiting for (socket) i/o, there is a lot
of cycles to spend.

>
> If single-threading is only necessary because umtx locks are formally
> per-process instead of per-vmspace, then I suggest avoiding the trouble
> and depending on the fact that they are per-vmspace.

I agree. I worried about inter-process locking to prevent simultaneous
entry into rtld or malloc from parent other thread and child, but as
David rightfully pointed out, still relied on single vmspace for unlock.

So I removed the single-threading from the kernel, relying on the usermode
locks to prevent parallel rtld entrance and to lock fork/vfork in parent.

Patch below also fixes a typo in amd64/vfork.S, which also passes
pthread_vfork_test now.

diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc
index e6d99ec..476d26a 100644
--- a/lib/libthr/arch/amd64/Makefile.inc
+++ b/lib/libthr/arch/amd64/Makefile.inc
@@ -1,3 +1,4 @@
#$FreeBSD$

-SRCS+= pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+= pthread_md.c _umtx_op_err.S vfork.S
diff --git a/lib/libthr/arch/amd64/amd64/vfork.S b/lib/libthr/arch/amd64/amd64/vfork.S
new file mode 100644

index 0000000..7997032

+ call CNAME(_thr_vfork_post_parent)

diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 46cdca1..6f2cfed 100644

Jilles Tjoelker

unread,

Aug 26, 2012, 6:05:38 PM8/26/12

to Konstantin Belousov, freebsd...@freebsd.org, dav...@freebsd.org

True. If vfork can be fixed more generally without too much penalty, it
is better to do that. And it looks like that is possible, including
detecting unexpected kills, see below.

> > Several ways to do this are in section 2.2.7 Avoid Using Exported
> > Symbols of Ulrich Drepper's dsohowto. One of them is something like

> > extern __typeof(next) next_int
> > __attribute((alias("next"), visibility("hidden")));

> > in the same source as the definition of the function

> > int next(void) { ...; }

> > As Drepper notes, the visibility attribute is not strictly required if a
> > version script keeps the symbol local but it might lead to better code.
> > At least on i386, though, the optimal direct near call instruction is
> > generated even without it. For example, _nsdispatch() calls
> > libc_dlopen() (kept local by the version script) without going through
> > the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj).

> I am not confident, so this might be a hallucination, but 'optimization'
> in the recent gnu ld might rewrite the call target to avoid PLT when
> symbol visibility is hidden.

If symbol visibility is hidden, rtld does not know about the symbol's
name; therefore, the only way ld can make it work is to make the call
instruction point directly at the called function, not to a PLT entry.

Different from what I wrote earlier, on i386, there is still a penalty
in omitting the visibility attribute: the compiler will generate code to
load %ebx which may not be necessary.

> > In the assembler syscall stubs using code from lib/libc/arch/SYS.h this
> > can be done by adding another .set (we currently have foo, _foo and
> > __sys_foo for a syscall foo; some syscalls have only _foo and
> > __sys_foo) such as __syshidden_foo. The ones that are actually used
> > then need prototypes (similar to the __sys_* ones in lib/libthr/?).

> > For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported.
> > Either this should be kept local or a local uninterposable alias should
> > be added and used (as with the syscalls).

I sent a patch for .cerror (i386 only) in a new mailing list thread.

> > The function __error() (to get errno's address for the current thread)
> > is and must be called via the PLT (because libthr is separate).
> > Therefore, we should ensure it is called at least once before vfork so
> > calls in the child do not involve rtld. The implementations for the
> > various architectures use the TLS register (or memory location for ARM),
> > so they seem safe.

> > This should suffice to fix posix_spawn() but the _execvpe() used by
> > posix_spawnp also uses various string functions. If not all of these
> > have already been called earlier, this will not work. Making calls to
> > them not go through the PLT seems fairly hard, even though it would make
> > sense in general, so perhaps I should instead reimplement it such that
> > the parent does almost all of the work.

> > An alternative is to write the core of posix_spawn() in assembler using
> > system calls directly but I would rather avoid that :(

> BTW, we have very gross inefficiency in our libc syscall stubs on i386.
> Namely, the PIC_PROLOGUE generates 'call 1f; 1f:' sequence, which trashes
> return stack predictor in CPU. When compiling for pentium pro and later,
> gcc calls dummy procedure that moves return address into %ebx. I think
> we ought to do this unconditionally for libc stubs in modern times.

This is indeed the case, although I don't think the inefficiency is very
gross since there will not be very many return addresses left after a
system call.

Some post-pentiumpro CPUs do know about 'call 1f; 1f:'; on these CPUs it
is slightly faster than the get_pc_thunk while it is much slower than
the get_pc_thunk on CPUs that do not treat it specially.

> > > > There may still be a problem in programs that install signal handlers
> > > > because the set of async-signal-safe functions is larger than what can
> > > > be done in a vforked child. Userland can avoid this by masking affected
> > > > signals before calling vfork() and resetting them to SIG_DFL before
> > > > unmasking them. This will add many syscalls if the code does not know
> > > > which signals are affected (such as libc). Alternatively, the kernel
> > > > could map caught signals to the default action for processes with
> > > > P_PPWAIT (just like it does not stop such processes because of signals
> > > > or TTY job control).

> > > If rtld does not work, then any library function call from a signal handler
> > > is problematic. My goal is to get a working rtld and possibly malloc.

> > > Below is the latest version of my patch for vfork, which survives (modified)
> > > tools/test/pthread_vfork_test. Patch is only for x86 right now.

> [snip single-threading]

> > If single-threading is only necessary because umtx locks are formally
> > per-process instead of per-vmspace, then I suggest avoiding the trouble
> > and depending on the fact that they are per-vmspace.
> I agree. I worried about inter-process locking to prevent simultaneous
> entry into rtld or malloc from parent other thread and child, but as
> David rightfully pointed out, still relied on single vmspace for unlock.

> So I removed the single-threading from the kernel, relying on the usermode
> locks to prevent parallel rtld entrance and to lock fork/vfork in parent.

Hmm, if the locks work properly in parent and child, is it necessary to
do prefork/postfork at all? If another thread holds an internal lock
during vfork(), the child can wait for this normally, right?

I think _thr_vfork_post_parent() can detect most of the situations where
the vforked child was killed unexpectedly. If THR_IN_CRITICAL(curthread)
at the start of _thr_vfork_post_parent(), the vforked child was in a
critical section or held an internal lock, and things are broken. In
this case, the process should be terminated because continuing is likely
to cause deadlock or other undefined behaviour.

One way to trigger such a kill is by passing an invalid struct
sched_param pointer to pthread_setschedparam(). Of course it makes no
sense to call this function in a vforked child but it provides a
deterministic way to trigger the problem.

Single-threaded processes without libthr cannot detect an unexpected
kill in this way. Then again, they are unlikely to deadlock and will
probably crash again.

system() using vfork() or posix_spawn()

Jilles Tjoelker

Konstantin Belousov

Jilles Tjoelker

Konstantin Belousov

Jilles Tjoelker

Konstantin Belousov

David Xu

Konstantin Belousov

Jilles Tjoelker

David Xu

Konstantin Belousov

David Xu

David Xu

David Xu

Konstantin Belousov

David Xu

Konstantin Belousov

David Xu

Jilles Tjoelker

David Xu

Konstantin Belousov

Konstantin Belousov

David Xu

David Xu

David Xu

Konstantin Belousov

Jilles Tjoelker

Konstantin Belousov

Jilles Tjoelker

Konstantin Belousov

Jilles Tjoelker