Go binary that tries to run a shell command in an ESXi environment is failing


Harsh Rathore

May 18, 2022, 12:15:10 PM
to golang-nuts
I am attaching the code file and the strace logs.

It basically tries to run a shell command via golang.

When I try to run it in ESXI-670, it errors with:

program is running
fork/exec /bin/sh: no space left on device

Here is what I have tried:

  1. I have tried doing this with cmd.Output().
  2. I have tried building using CGO_ENABLED=1
  3. In the code above, I have tried to run by redirecting os.Stdout.

This code works and provides desired output in all locations, other than ESXI. I am using ESXI-6.7.0.

Please help me in identifying the cause.

  1. I am certain there is space left on device
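
The attached golangEsxiShellRun.go is not reproduced in this thread. For readers following along, here is a minimal sketch of the kind of program described above, assuming it runs the command through /bin/sh -c with os/exec. The helper name runShell and the `ls -l` command are illustrative assumptions, not taken from the attachment.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// runShell is a hypothetical helper: it runs command through /bin/sh -c and
// returns what the child wrote to standard output. The child's standard
// error is passed through to this process's stderr.
func runShell(command string) (string, error) {
	var stdout bytes.Buffer
	cmd := exec.Command("/bin/sh", "-c", command)
	cmd.Stdout = &stdout
	cmd.Stderr = os.Stderr
	// Run starts the child (the fork/exec step) and waits for it to exit.
	err := cmd.Run()
	return stdout.String(), err
}

func main() {
	fmt.Println("program is running")
	out, err := runShell("ls -l")
	if err != nil {
		// A failure to start the child, such as the fork/exec error
		// quoted above, would be reported here.
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Calling cmd.Output() instead of wiring up Stdout by hand goes through the same process-start path, which would be consistent with that variation failing in the same way.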

I am grateful for the insight of kostix and larsks
The conversation is as follows:
  • Does /bin/sh -c 'ls -l' work OK when run the same way your Go program runs? I'm asking because problems like this have to be dissected.
  • @kostix , yes the command you mentioned works as it should. Thank you for looking into it.  2 hours ago    
  • Do you have strace (manpage) available in that environment? 
    – kostix
     1 hour ago
  • Yes, I do have strace in my environment. I have edited my Question with strace dump. Thank you.  1 hour ago    
  • "I am certain there is space left on device": An ENOSPC error in response to a clone() system call doesn't mean you have a full disk. There are a number of documented reasons that clone() may fail with ENOSPC, as described in the man page, but it doesn't look like you're hitting one of those. I would guess that you're hitting some kind of resource limit, but I don't have an ESXI system on which to test.
    – larsks
     55 mins ago
  • Thank you @larsks, for looking into this. If this helps, I have the same program in python and C++, and they both run without any issues in the same environment.  52 mins ago   
  • That's why SO is not a good fit for this kind of question: they require multiple rounds of back-and-forth, which is quite inconvenient to do using comments. I would recommend reposting the question to the mailing list and adding a summary of our comment thread. I'd also recommend retaking the strace capture using its -o command-line option so that its output is not intermixed with the Go runtime's output, as it is in your comment.
    – kostix
     19 mins ago
  • Basically, I'm with @larsks on this: the clone syscall on Linux underlies fork, and my hunch is that the Go runtime performs the fork+exec combo to run a child process using some setup which trips some limit on ESXi. So it's a question for the devs to look into; maybe it even warrants filing a bug report.
    – kostix
     18 mins ago
  • …and by the way, what kernel version is there in ESXI? I'm asking because of this
    – kostix
     15 mins ago
  • @kostix, thank you for the guidance. I will do as you mentioned and post it on the mailing list. ESXI kernel version (output of uname -a): VMkernel 6.7.0 #1 SMP Release build-15160138 Nov 22 2019 20:49:31 x86_64 x86_64 x86_64 ESXi  11 mins ago
  • Oh, looks like VMKernel of ESXi is not based on Linux; it merely provides compatibility for the drivers and implements POSIX syscalls. Hence the situation is even more interesting. If/when you post a message to the mailing list, please post a link here. 
    – kostix
     48 secs ago
sTraceLogs.txt
golangEsxiShellRun.go

Ian Lance Taylor

May 18, 2022, 12:40:42 PM
to Harsh Rathore, golang-nuts
On Wed, May 18, 2022 at 9:15 AM Harsh Rathore <hvrcon...@gmail.com> wrote:
>
> I will attach, the Code file and the sTrace logs.
>
> It basically tries to run a shell command via golang.
>
> When I try to run it in ESXI-670, it errors with:
>
> program is running
> fork/exec /bin/sh: no space left on device
>
> Here is what I have tried:
>
> I have tried doing this with cmd.Output().
> I have tried building using CGO_ENABLED=1
> In the code above, I have tried to run by redirecting os.Stdout.
>
> This code works and provides desired output in all locations, other than ESXI. I am using ESXI-6.7.0.
>
> Please help me in identifying the cause.
>
> I am certain there is space left on device

Thanks for including the strace output. It shows that the clone
system call is failing with ENOSPC. There are several reasons that
clone can fail with ENOSPC. None of them have anything to do with
disk space (though shortage of inodes appears to be a possibility).
ESXI appears to map several internal errors to ENOSPC as well, mostly
dealing with lack of memory.

You may need to discuss this with someone familiar with ESXI.

Sorry this isn't much help.

Ian

Konstantin Khomoutov

May 19, 2022, 7:01:43 AM
to Harsh Rathore, Ian Lance Taylor, golang-nuts
On Wed, May 18, 2022 at 09:40:03AM -0700, Ian Lance Taylor wrote:

[...]

> Thanks for including the strace output. It shows that the clone
> system call is failing with ENOSPC. There are several reasons that
> clone can fail with ENOSPC. None of them have anything to do with
> disk space (though shortage of inodes appears to be a possibility).
> ESXI appears to map several internal errors to ENOSPC as well, mostly
> dealing with lack of memory.
>
> You may need to discuss this with someone familiar with ESXI.

Quick googling turns up [1] which supports what Ian said.

It would be interesting to know whether there is an ESXi-native tool analogous
to strace that would report "native" kernel error codes. That could possibly
move us closer to the root cause of the problem.


By the way, Harsh, in your original SO post you state that you

| have the same program in python and C++, and they both run without any
| issues in the same environment.

Could you please show bits of this code? I'm asking because it's interesting
whether you really do fork+exec (or something) like this in those programs.
It would also be interesting to inspect the traces captured by strace of the
(successful) runs of these programs to see whether the clone(2) syscall gets
performed with the same arguments (I think the "flags" argument is of the most
interest). Of course, this might be a red herring, and the problem might lie
elsewhere, so the ability to somehow obtain the "native" kernel's error code
would probably help a lot.

1. http://virtualization24x7.blogspot.com/2017/03/table-of-all-vmkernel-error-codes-in.html
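
When comparing the clone(2) calls across traces as suggested above, it can help to decode the raw "flags" value into names. The following is a minimal sketch, assuming the Linux clone(2) flag layout (flag bits plus an exit signal in the low byte). The helper name decodeCloneFlags is hypothetical, the table covers only a subset of flags, and the two sample values in main are illustrative: they are typical of a vfork-style clone and of a plain fork() on Linux, and are not taken from the attached strace log.

```go
package main

import (
	"fmt"
	"strings"
)

// A subset of the Linux clone(2) flag bits, in ascending order.
var cloneFlagNames = []struct {
	bit  uint64
	name string
}{
	{0x00000100, "CLONE_VM"},
	{0x00000200, "CLONE_FS"},
	{0x00000400, "CLONE_FILES"},
	{0x00000800, "CLONE_SIGHAND"},
	{0x00004000, "CLONE_VFORK"},
	{0x00008000, "CLONE_PARENT"},
	{0x00010000, "CLONE_THREAD"},
	{0x00200000, "CLONE_CHILD_CLEARTID"},
	{0x01000000, "CLONE_CHILD_SETTID"},
}

// decodeCloneFlags renders a raw clone(2) flags value as "NAME|NAME|signal=N".
// The low byte holds the exit signal; bits not in the table are shown in hex.
func decodeCloneFlags(flags uint64) string {
	var parts []string
	sig := flags & 0xff
	rest := flags &^ 0xff
	for _, f := range cloneFlagNames {
		if rest&f.bit != 0 {
			parts = append(parts, f.name)
			rest &^= f.bit
		}
	}
	if rest != 0 {
		parts = append(parts, fmt.Sprintf("0x%x", rest))
	}
	if sig != 0 {
		parts = append(parts, fmt.Sprintf("signal=%d", sig))
	}
	return strings.Join(parts, "|")
}

func main() {
	// Illustrative values only (assumptions, not from the attached trace).
	fmt.Println(decodeCloneFlags(0x4111))    // vfork-style clone
	fmt.Println(decodeCloneFlags(0x1200011)) // plain fork()-style clone
}
```

If the failing and the successful programs turn out to pass different flag sets, that difference would be a concrete starting point for anyone familiar with how ESXi implements clone.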
