net.UnixListener.Close() always removes the socket file

tnl...@gmail.com

Feb 7, 2015, 4:14:22 AM
to golan...@googlegroups.com
Hi,

There is a syscall.Unlink(l.path) call in net.UnixListener.Close().
I believe I have found a use case where it would be better to have an option to bypass this call.

Namely, I think I need a way to close the file descriptor of a unix domain socket without deleting the socket file, in a scenario where one process listens on sockets on behalf of another process and hands the open file descriptors over to it (announced through environment variables).

I know of just one such scenario so far (though I suppose more exist): using Go to build a systemd service unit with socket activation, configured to use unix domain sockets.

Here is a link to read about socket activation: http://0pointer.de/blog/projects/socket-activation.html. In a nutshell, systemd can listen on sockets according to the service's configuration and lazily start the service when a connection comes in. When it creates the process that runs the actual service, systemd passes all the prepared sockets to that process.
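For readers unfamiliar with the protocol, here is a minimal sketch of how a service can work out which descriptors it inherited. It assumes the documented systemd convention: LISTEN_PID names the intended process, LISTEN_FDS gives the count, and the descriptors start at 3. The function name activationFDs and its parameters are made up for illustration and are not part of go-systemd.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// listenFDsStart is the first inherited descriptor (SD_LISTEN_FDS_START in systemd).
const listenFDsStart = 3

// activationFDs returns the descriptor numbers announced through the environment.
// getenv and pid are parameters (a hypothetical design choice) so that the
// function can be exercised without running under systemd.
func activationFDs(getenv func(string) string, pid int) ([]uintptr, error) {
	p, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || p != pid {
		return nil, errors.New("no sockets were passed to this process")
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n < 0 {
		return nil, errors.New("invalid LISTEN_FDS")
	}
	fds := make([]uintptr, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, uintptr(listenFDsStart+i))
	}
	return fds, nil
}

func main() {
	fds, err := activationFDs(os.Getenv, os.Getpid())
	if err != nil {
		fmt.Println("not socket-activated:", err)
		return
	}
	for _, fd := range fds {
		// A real service would continue with os.NewFile(fd, "listener")
		// followed by net.FileListener on the resulting file.
		fmt.Println("inherited descriptor", fd)
	}
}
```

Each returned number can be turned into a listener with os.NewFile and net.FileListener, which is roughly what activation.Listeners from go-systemd does for the caller.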

Although it is not stated very clearly, there is a phrase in the systemd docs: "A daemon listening on an AF_UNIX socket may, but does not need to, call close(2) on the received socket before exiting. However, it must not unlink the socket from a file system" (http://www.freedesktop.org/software/systemd/man/systemd.socket.html). And indeed, in our experiments with socket activation and unix domain sockets (on CentOS 7) we ran into an issue. If we close the UnixListener in our process before exiting, then the next time systemd executes our service we get the error "dial unix /tmp/activation-test-1.sk: no such file or directory", because the file was deleted by net.UnixListener.Close(). If we do not close the socket, then we cannot break out of the endless loop in the method func (fd *netFD) accept() (netfd *netFD, err error) (https://golang.org/src/net/fd_unix.go), and therefore cannot exit our program cleanly (performing the necessary cleanup and graceful-shutdown logic).
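The unlink behavior is easy to reproduce outside systemd. The following is a small sketch, assuming a Unix-like system and a listener created by net.Listen in the same process; the helper name socketFileSurvivesClose and the temporary path are invented for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// socketFileSurvivesClose listens on a fresh unix socket, closes the listener,
// and reports whether the socket file is still present afterwards.
func socketFileSurvivesClose() (bool, error) {
	dir, err := os.MkdirTemp("", "unlink-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "demo.sk")
	l, err := net.Listen("unix", path)
	if err != nil {
		return false, err
	}
	// While the listener is open, the socket file must exist.
	if _, err := os.Stat(path); err != nil {
		return false, fmt.Errorf("socket file missing while listening: %w", err)
	}
	if err := l.Close(); err != nil {
		return false, err
	}
	// After Close, check whether the file is still there.
	_, err = os.Stat(path)
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	return false, err
}

func main() {
	survives, err := socketFileSurvivesClose()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("socket file still exists after Close:", survives)
}
```

Any later net.Dial to that path fails with "no such file or directory", which is the error quoted above.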

The motivation for using socket activation here is to be able to restart a failed server automatically and as transparently to the client as possible (without additional logic on the client side).

We are trying to use this project to achieve our goals: https://github.com/coreos/go-systemd. I forked the repository to add a test that better reflects our particular use case, and raised an issue: https://github.com/coreos/go-systemd/issues/71. But I think I have exhausted the other options, short of somehow patching the standard library to make the call to syscall.Unlink(l.path) optional. In my final test I compiled Go from source with an additional method:

func (l *UnixListener) CloseFD() error {
    if l == nil || l.fd == nil {
        return syscall.EINVAL
    }
    return l.fd.Close()
}

and invoked this method in place of UnixListener.Close() to exit the program. With this code the test passed.

So, I would like some opinions on all this. I think it may be better to have a property and some additional logic in the existing Close() method (instead of an additional CloseFD() method, which I created just to prove my guess to myself).

Thanks.

Ian Lance Taylor

Feb 8, 2015, 12:37:18 AM
to tnl...@gmail.com, golang-nuts
On Sat, Feb 7, 2015 at 1:14 AM, <tnl...@gmail.com> wrote:
>
> If we do not close the socket, then we cannot break out of the endless
> loop in the method func (fd *netFD) accept() (netfd *netFD, err error)
> (https://golang.org/src/net/fd_unix.go), and therefore cannot exit our
> program cleanly (performing the necessary cleanup and graceful-shutdown
> logic).

I don't quite follow this step. It sounds like you are saying that
your process exits. If your process exits, then you don't need to
close your UnixListener. What happens if you simply don't close it?

Ian

tnl...@gmail.com

Feb 8, 2015, 8:19:56 AM
to golan...@googlegroups.com, tnl...@gmail.com
Hi, Ian,

Thanks for your interest.

Sorry, maybe it was not clear from my post; I tried to describe two possible scenarios there.

In both cases I use Listener.Close() to break out of the endless loops in http.Serve() and netFD.accept(); both have endless loops, and the one in netFD.accept() is nested inside the loop of http.Serve(). One can break out of both of them with a socket error by closing the socket and, honestly, I don't know another way. In the real program I invoke Close() from a signal handler on SIGTERM. In the test code I also added an invocation of Close() right in the connection handler.

But UnixListener.Close() has a side effect: it also removes the file from the file system. So the next time systemd invokes the service with the same unix socket, the file is absent and the code fails with an error. So I tried not invoking UnixListener.Close(), in the hope that the file descriptor would be closed automatically by the runtime finalizer. Alas, in that case I have no way to terminate the serving goroutine. So I need to close the file descriptor to exit the program, and I must preserve the socket file in the file system.

So, in exact reply to your questions: no, the process won't exit without Listener.Close(). At least, I cannot make it exit gracefully, with all the necessary cleanup logic. If I simply don't close the socket, the process hangs until I kill it from the console.
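The mechanism described here, where closing the listener from another goroutine makes a blocked Accept return an error, can be shown in isolation. This is only a sketch under assumptions: it uses a throwaway unix socket in a temporary directory, relies on net.ErrClosed (available in Go 1.16 and later), and the function names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// acceptUntilClosed loops on Accept and returns the error that ends the loop.
func acceptUntilClosed(l net.Listener) error {
	for {
		c, err := l.Accept()
		if err != nil {
			return err
		}
		c.Close()
	}
}

// closeUnblocksAccept starts an accept loop, closes the listener from another
// goroutine, and reports whether Accept failed with net.ErrClosed.
func closeUnblocksAccept() (bool, error) {
	dir, err := os.MkdirTemp("", "accept-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	l, err := net.Listen("unix", filepath.Join(dir, "demo.sk"))
	if err != nil {
		return false, err
	}

	done := make(chan error, 1)
	go func() { done <- acceptUntilClosed(l) }()

	// Give the goroutine a moment to block in Accept, then close the listener.
	time.Sleep(50 * time.Millisecond)
	if err := l.Close(); err != nil {
		return false, err
	}

	select {
	case acceptErr := <-done:
		return errors.Is(acceptErr, net.ErrClosed), nil
	case <-time.After(2 * time.Second):
		return false, errors.New("Accept did not return after Close")
	}
}

func main() {
	closed, err := closeUnblocksAccept()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Accept returned net.ErrClosed:", closed)
}
```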

Ian Lance Taylor

Feb 8, 2015, 12:24:19 PM
to tnl...@gmail.com, golang-nuts
On Sun, Feb 8, 2015 at 5:19 AM, <tnl...@gmail.com> wrote:
>
> But UnixListener.Close() has a side effect: it also removes the file from
> the file system. So the next time systemd invokes the service with the same
> unix socket, the file is absent and the code fails with an error. So I tried
> not invoking UnixListener.Close(), in the hope that the file descriptor would
> be closed automatically by the runtime finalizer. Alas, in that case I have
> no way to terminate the serving goroutine. So I need to close the file
> descriptor to exit the program, and I must preserve the socket file in the
> file system.
>
> So, in exact reply to your questions: no, the process won't exit without
> Listener.Close(). At least, I cannot make it exit gracefully, with all the
> necessary cleanup logic. If I simply don't close the socket, the process
> hangs until I kill it from the console.

Thanks. There is still something I don't understand. It is not
necessary to explicitly stop all goroutines on process exit. When
the process exits, all goroutines are stopped, and all file
descriptors are closed.

So why does your program hang on exiting? Can you provide a small
self-contained test case that demonstrates the problem using "go run"?

Ian

tnl...@gmail.com

Feb 9, 2015, 1:51:25 AM
to golan...@googlegroups.com, tnl...@gmail.com
You are right. I should have posted a snippet of code already. Here it is.

This is how another process invokes it (in production systemd does this part): https://github.com/nikolay-turpitko/go-systemd/blob/master/activation/listeners_test.go.
I modified both files in my fork of https://github.com/coreos/go-systemd to provide a test for unix domain sockets.

Here is the relevant part of the listener program:

package main

import (
    "fmt"
    "net"
    "os"
    "os/signal"
    "sync"
    "time"

    "github.com/coreos/go-systemd/activation"
)

// ...

// Whether to close the listeners received from the external process.
// If we close them, the first execution ends without error, but all
// subsequent executions (with unix domain sockets) fail with the error
// "dial unix /tmp/activation-test-1.sk: no such file or directory"
// (because the file is deleted in UnixListener.Close() during the first execution).
// If we don't close them, the process hangs.
const closeListeners = false

func main() {
    // ...

    listeners, err := activation.Listeners(true)
    if err != nil {
        panic(err)
    }

    if !closeListeners {
        listeners[0] = &noCloseListener{listeners[0]}
        listeners[1] = &noCloseListener{listeners[1]}
    }

    // ...

    go func() {
        c := make(chan os.Signal, 1)
        signal.Notify(c)
        <-c
        // os.Exit(1) // not suitable, as we have cleanup and graceful shutdown logic
        listeners[0].Close() // usual approach - pointless here
        listeners[1].Close() // usual approach - pointless here
        // todo: what should we put here???
    }()

    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        serve(listeners[0], func(c net.Conn) {
            defer c.Close()
            c.Write([]byte("Hello world"))
            listeners[0].Close()
        })
    }()
    wg.Add(1)
    go func() {
        defer wg.Done()
        serve(listeners[1], func(c net.Conn) {
            defer c.Close()
            c.Write([]byte("Goodbye world"))
            listeners[1].Close()
        })
    }()
    wg.Wait()
}

// simplified http.Serve()
func serve(l net.Listener, serveConn func(c net.Conn)) error {
    defer l.Close()
    for {
        c, e := l.Accept()
        if e != nil {
            if ne, ok := e.(net.Error); ok && ne.Temporary() {
                time.Sleep(5 * time.Millisecond)
                continue
            }
            return e
        }
        go serveConn(c)
    }
}

// no-close listener wrapper
type noCloseListener struct {
    net.Listener
}

// Close is a no-op for unix listeners, so the socket file is never unlinked;
// every other listener type is closed as usual.
func (l *noCloseListener) Close() error {
    switch l.Listener.(type) {
    case *net.UnixListener:
        return nil
        //return l.Listener.(*net.UnixListener).CloseFD() // there is no such method in the standard lib, it's my "extension"
    }
    return l.Listener.Close()
}

Here is the part that invokes the previous code as a child process:


// How many times to invoke the child process.
// If we fork the child process only once, then the tests for unix domain
// sockets pass even if the child process deletes the socket file.
// If we fork the child process several times, then all subsequent forks
// terminate with the error "dial unix /tmp/activation-test-1.sk: no such file or directory".
const forksNumber = 2

func TestUnixListeners(t *testing.T) {
    exec.Command("go", "build", "-o", "../examples/activation/listen", "../examples/activation/listen.go").Run()

    l1, err := net.Listen("unix", "/tmp/activation-test-1.sk")
    if err != nil {
        t.Fatal(err)
    }
    defer l1.Close()

    l2, err := net.Listen("unix", "/tmp/activation-test-2.sk")
    if err != nil {
        t.Fatal(err)
    }
    defer l2.Close()

    t1 := l1.(*net.UnixListener)
    t2 := l2.(*net.UnixListener)

    f1, _ := t1.File()
    defer f1.Close()
    f2, _ := t2.File()
    defer f2.Close()

    for i := 0; i < forksNumber; i++ {
        cmd := exec.Command("../examples/activation/listen")
        // The child inherits the two listening sockets as descriptors 3 and 4.
        cmd.ExtraFiles = []*os.File{
            f1,
            f2,
        }

        cmd.Env = os.Environ()
        cmd.Env = append(cmd.Env, "LISTEN_FDS=2", "FIX_LISTEN_PID=1")

        r1, err := net.Dial("unix", "/tmp/activation-test-1.sk")
        if err != nil {
            t.Fatal(err)
        }
        defer r1.Close()
        r1.Write([]byte("Hi"))

        r2, err := net.Dial("unix", "/tmp/activation-test-2.sk")
        if err != nil {
            t.Fatal(err)
        }
        defer r2.Close()
        r2.Write([]byte("Hi"))

        var b bytes.Buffer
        cmd.Stdout = &b
        cmd.Stderr = &b
        if err := cmd.Start(); err != nil {
            println(b.String())
            t.Fatal(err)
        }

        // Ask the child to terminate after 5 seconds, and kill it 5 seconds later.
        go func() {
            time.Sleep(5 * time.Second)
            p := cmd.Process
            if err := p.Signal(syscall.SIGTERM); err != nil {
                println("Cannot terminate process")
                println(err.Error())
            }
            time.Sleep(5 * time.Second)
            if err := p.Kill(); err != nil {
                println("Cannot kill process")
                println(err.Error())
            }
        }()

        if err := cmd.Wait(); err != nil {
            println(b.String())
            t.Fatal(err)
        }

        correctStringWrittenNet(t, r1, "Hello world")
        correctStringWrittenNet(t, r2, "Goodbye world")
    }
}





Well, as you can see, the program hangs because it explicitly waits for the serving goroutines to complete. I use this wait to be sure that no handler is terminated abruptly (the real program uses https://github.com/stretchr/graceful, which tracks active connections and allows active requests to complete within a timeout).

I understand your point that I can drop all running goroutines when I need to exit. For example, in main I can wait on a channel that the kill-signal handler writes to, instead of waiting on a wait group. But, if I understand correctly, in that case I should move my graceful-shutdown logic into the main routine, because otherwise all serving goroutines and active requests will be terminated immediately.

On the other hand, the exact same code works perfectly with TCP sockets, and it works as I expect if I slightly patch the standard net library and add a UnixListener.CloseFD() method as follows:
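One way to sketch that reorganization: wait on a stop channel, stop accepting, then drain active handlers with a timeout, all without closing the listener. Note the swapped-in technique: instead of Close, this sketch unblocks Accept with SetDeadline(time.Now()), a method that *net.UnixListener and *net.TCPListener both provide; this is not what the thread's code does. The names serveUntil, deadlineListener and drainDemo are hypothetical, and retrying on temporary Accept errors is omitted for brevity.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// deadlineListener is a listener whose pending Accept can be interrupted.
type deadlineListener interface {
	net.Listener
	SetDeadline(t time.Time) error
}

// serveUntil accepts connections until stop is closed, then waits up to
// timeout for active handlers. It never closes l, so the socket file stays.
// It reports whether all handlers finished in time.
func serveUntil(l deadlineListener, stop <-chan struct{}, timeout time.Duration, handle func(net.Conn)) bool {
	var active sync.WaitGroup
	acceptDone := make(chan struct{})
	go func() {
		defer close(acceptDone)
		for {
			c, err := l.Accept()
			if err != nil {
				return // deadline expired or listener failed: stop accepting
			}
			active.Add(1)
			go func() {
				defer active.Done()
				defer c.Close()
				handle(c)
			}()
		}
	}()

	<-stop                     // in a real program: closed by the SIGTERM handler
	l.SetDeadline(time.Now()) // wake the blocked Accept without closing the socket
	<-acceptDone

	drained := make(chan struct{})
	go func() { active.Wait(); close(drained) }()
	select {
	case <-drained:
		return true
	case <-time.After(timeout):
		return false
	}
}

// drainDemo serves one request, stops, and reports the reply, whether the
// handlers drained, and whether the socket file is still present.
func drainDemo() (string, bool, bool, error) {
	dir, err := os.MkdirTemp("", "drain-demo")
	if err != nil {
		return "", false, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo.sk")
	l, err := net.Listen("unix", path)
	if err != nil {
		return "", false, false, err
	}
	stop := make(chan struct{})
	result := make(chan bool, 1)
	go func() {
		result <- serveUntil(l.(*net.UnixListener), stop, time.Second, func(c net.Conn) {
			c.Write([]byte("Hello world"))
		})
	}()
	conn, err := net.Dial("unix", path)
	if err != nil {
		return "", false, false, err
	}
	data, err := io.ReadAll(conn)
	conn.Close()
	if err != nil {
		return "", false, false, err
	}
	close(stop)
	drained := <-result
	_, statErr := os.Stat(path)
	return string(data), drained, statErr == nil, nil
}

func main() {
	fmt.Println(drainDemo())
}
```

In a real service the stop channel would be closed from a goroutine that receives SIGTERM via signal.Notify, and main would return once serveUntil does, leaving the listener open for the process exit to clean up.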

func (l *UnixListener) CloseFD() error {
    if l == nil || l.fd == nil {
        return syscall.EINVAL
    }
    return l.fd.Close()
}

I can then use it to close the file descriptor without removing the file from the file system.

tnl...@gmail.com

Feb 9, 2015, 1:10:19 PM
to golan...@googlegroups.com, tnl...@gmail.com
Well, I was able to reorganize my code to leave the listeners open and the serving goroutine unfinished.
The workaround works, but seems a bit hacky to me. Also, it prevents me from reusing the existing library.

I think the code could be cleaner and more straightforward if I could call something like this on UnixListener: listener.(*net.UnixListener).UnlinkFileOnClose(false), so that the offending branch of code won't execute. In that case it would be safe to call UnixListener.Close(), and the rest of the code would be the same as for TCP listeners.
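For readers on newer toolchains: Go releases later than this thread (1.8 and up) provide a method of this kind, (*net.UnixListener).SetUnlinkOnClose. Its documentation also states that listeners obtained through net.FileListener do not unlink the file by default. A minimal sketch, assuming such a Go version; the helper names closeKeepingSocketFile and keepFileDemo are invented for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// closeKeepingSocketFile closes l. For unix listeners it first disables
// unlink-on-close, so the socket file is left in place.
func closeKeepingSocketFile(l net.Listener) error {
	if ul, ok := l.(*net.UnixListener); ok {
		ul.SetUnlinkOnClose(false)
	}
	return l.Close()
}

// keepFileDemo reports whether the socket file still exists after the
// listener has been closed through closeKeepingSocketFile.
func keepFileDemo() (bool, error) {
	dir, err := os.MkdirTemp("", "keep-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "demo.sk")
	l, err := net.Listen("unix", path)
	if err != nil {
		return false, err
	}
	if err := closeKeepingSocketFile(l); err != nil {
		return false, err
	}
	_, err = os.Stat(path)
	return err == nil, nil
}

func main() {
	kept, err := keepFileDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("socket file kept after Close:", kept)
}
```

With this in place, Close can be used to break out of the accept loop exactly as with TCP listeners.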

tnl...@gmail.com

Feb 9, 2015, 1:13:47 PM
to golan...@googlegroups.com, tnl...@gmail.com
Ian, thank you for the advice!

Ian Lance Taylor

Feb 9, 2015, 1:22:51 PM
to Nikolay Turpitko, golang-nuts
On Mon, Feb 9, 2015 at 10:10 AM, <tnl...@gmail.com> wrote:
>
> Well, I was able to reorganize my code to leave the listeners open and the
> serving goroutine unfinished.
> The workaround works, but seems a bit hacky to me. Also, it prevents me from
> reusing the existing library.

I'm glad you were able to get it working.


> I think the code could be cleaner and more straightforward if I could call
> something like this on UnixListener:
> listener.(*net.UnixListener).UnlinkFileOnClose(false), so that the offending
> branch of code won't execute. In that case it would be safe to call
> UnixListener.Close(), and the rest of the code would be the same as for TCP
> listeners.

This means making the API more complex, which is something we only
want to do where necessary.

Ian