Passenger Apache 2 Module won't start on Solaris 11 ("Unable to start the Phusion Passenger logging agent")

Johannes Fahrenkrug

Jan 22, 2013, 6:34:33 AM
to phusion-...@googlegroups.com
Hi,

We are trying to get Passenger 3 (we tried 3.0.10 and 3.0.19) up and running on Solaris 11. It is running fine on Solaris 10 machines, but not on 11.
The gem installs fine and passenger-install-apache2-module finishes without any issues.

When starting the apache2 server, however, we get this error in the apache error log:

[Tue Jan 22 11:38:21 2013] [info] Init: Seeding PRNG with 0 bytes of entropy
[Tue Jan 22 11:38:21 2013] [info] Init: Generating temporary RSA private keys (512/1024 bits)
[Tue Jan 22 11:38:21 2013] [info] Init: Generating temporary DH parameters (512/1024 bits)
[Tue Jan 22 11:38:21 2013] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
[Tue Jan 22 11:38:21 2013] [info] Init: Initializing (virtual) servers for SSL
[Tue Jan 22 11:38:21 2013] [info] mod_ssl/2.2.22 compiled against Server: Apache/2.2.22, Library: OpenSSL/1.0.0e
[Tue Jan 22 11:38:21 2013] [info] mod_unique_id: using ip addr 10.144.42.172
[ pid=1140 thr=1 file=ext/apache2/Hooks.cpp:1378 time=2013-01-22 11:38:22.0 ]: Initializing Phusion Passenger...
[ pid=1148 thr=1 file=ext/common/LoggingAgent/Main.cpp:287 time=2013-01-22 11:38:22.55 ]: *** ERROR: Cannot create an event loop
     (empty)
[Tue Jan 22 11:38:22 2013] [error] *** Passenger could not be initialized because of this error: Unable to start the Phusion Passenger watchdog because it encountered the following error during startup: Unable to start the Phusion Passenger logging agent: it seems to have crashed during startup for an unknown reason, with exit code 1

After some research, I tried to run ./agents/PassengerLoggingAgent from the gem directory. This is what I get:

[ pid=9172 thr=1 file=ext/common/AgentBase.cpp:419 time=2013-01-22 12:24:01.779 ]: *** ERROR: read() failed: Operation not applicable (89)
     in 'void Passenger::VariantMap::readFrom(int)' (VariantMap.h:140)
     in 'Passenger::VariantMap Passenger::initializeAgent(int, char**, const char*)' (AgentBase.cpp:355)

When running the same command on the Solaris 10 machine, I get this (as expected):

You're not supposed to start this program from the command line. It's used internally by Phusion Passenger.

Any ideas? I couldn't spot any significant differences between the environments on both machines.

Thank you so much!

- Johannes

Johannes Fahrenkrug

Jan 22, 2013, 8:56:10 AM
to phusion-...@googlegroups.com
A short update:

My colleague found something interesting. After debugging AgentBase.cpp a bit, we found out that these lines (https://github.com/FooBarWidget/passenger/blob/master/ext/common/agents/Base.cpp#L1219) cause trouble (in 3.0.19 this file is still called ext/common/AgentBase.cpp):

int ret = fcntl(FEEDBACK_FD, F_GETFL);
if (ret == -1) {
    if (errno == EBADF) {
        fprintf(stderr,
            "You're not supposed to start this program from the command line. "
            "It's used internally by Phusion Passenger.\n");
        exit(1);
    } else {
        int e = errno;
        fprintf(stderr,
            "Encountered an error in feedback file descriptor 3: %s (%d)\n",
            strerror(e), e);
        exit(1);
    }
}

The check gets tripped up when the LDAP client on Solaris 11 is running, because AgentBase.cpp assumes that file descriptor 3 (FEEDBACK_FD) is not open when the agent is started by hand. When the LDAP client is running, however, file descriptor 3 does exist.

So when stopping the LDAP client and then running ./agents/PassengerLoggingAgent from the command line, it doesn't crash anymore. Unfortunately this does not fix the crash when running it with Apache 2, though.
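
The fcntl() probe quoted above can only tell whether a descriptor number is open, not what it refers to. The following is a minimal sketch of that probe, assuming a POSIX system; descriptorIsOpen is a hypothetical helper name and not part of Passenger.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not Passenger code: reports whether `fd` is an open
// descriptor in the current process, using the same fcntl(F_GETFL) probe
// as the snippet above.
bool descriptorIsOpen(int fd) {
    // fcntl() returns -1 with errno == EBADF only when `fd` is not open.
    // Any other outcome means the number refers to *something*.
    return fcntl(fd, F_GETFL) != -1 || errno != EBADF;
}
```

If descriptorIsOpen(3) returns true for a process started from a shell, something in the environment left descriptor 3 open. In that case the guard passes and the agent goes on to read() from an unrelated descriptor, which would be consistent with the "read() failed" error shown earlier.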

- Johannes

Hongli Lai

Jan 22, 2013, 9:46:36 AM
to phusion-...@googlegroups.com
I don't think file descriptor 3 is the problem. PassengerLoggingAgent is supposed to be started from the Watchdog, which in turn is supposed to be started from the web server. It looks like the Watchdog starts properly in your case, so I won't go into that part. Here's what goes on:

The Watchdog creates an anonymous Unix socket pair, calls fork(), then sets file descriptor 3 in the child process to one end of the pair. The Watchdog uses the other end to communicate startup information to PassengerLoggingAgent. We do not use command line arguments in order to make the process name in 'ps' look nicer.

The fcntl() check is just a safeguard to ensure that users do not try to start PassengerLoggingAgent from the command line.
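
A minimal sketch of the handshake described above, assuming a POSIX system. This is not Passenger's actual code: sendStartupInfoToChild is a hypothetical name, and the child simply echoes what it reads on descriptor 3 instead of exec()ing the real agent.

```cpp
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static const int FEEDBACK_FD = 3;

// Hypothetical demo: plays the watchdog (parent) and the agent (child).
// Returns what the child read on descriptor 3, or "" on failure.
// Assumes a short, non-empty message (at most 256 bytes).
std::string sendStartupInfoToChild(const std::string &info) {
    char buf[256];
    if (info.empty() || info.size() > sizeof(buf)) {
        return "";
    }
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return "";
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return "";
    }
    if (pid == 0) {
        // Child: drop the parent's end, then move our end onto descriptor 3.
        close(sv[0]);
        if (sv[1] != FEEDBACK_FD) {
            if (dup2(sv[1], FEEDBACK_FD) == -1) {
                _exit(1);
            }
            close(sv[1]);
        }
        // A real watchdog would exec() the agent here. This sketch just
        // echoes the startup information back over descriptor 3.
        ssize_t n = read(FEEDBACK_FD, buf, sizeof(buf));
        if (n <= 0 || write(FEEDBACK_FD, buf, (size_t) n) != n) {
            _exit(1);
        }
        _exit(0);
    }
    // Parent: keep one end and use it to talk to the child.
    close(sv[1]);
    std::string reply;
    if (write(sv[0], info.data(), info.size()) == (ssize_t) info.size()) {
        ssize_t n = read(sv[0], buf, sizeof(buf));
        if (n > 0) {
            reply.assign(buf, (size_t) n);
        }
    }
    close(sv[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return reply;
}
```

Because the child inherits descriptor 3 from its parent, a process started this way passes the fcntl() safeguard, while one started from a clean shell normally fails it with EBADF.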

I think the real problem is here:

[ pid=1148 thr=1 file=ext/common/LoggingAgent/Main.cpp:287 time=2013-01-22 11:38:22.55 ]: *** ERROR: Cannot create an event loop

Phusion Passenger uses the libev event loop library, which cannot create an event loop in your case. Here's what our code looks like:

static struct ev_loop *
createEventLoop() {
    struct ev_loop *loop;
    // libev doesn't like choosing epoll and kqueue because the author thinks they're broken,
    // so let's try to force it.
    loop = ev_default_loop(EVBACKEND_EPOLL);
    if (loop == NULL) {
        loop = ev_default_loop(EVBACKEND_KQUEUE);
    }
    if (loop == NULL) {
        loop = ev_default_loop(0);
    }
    if (loop == NULL) {
        throw RuntimeException("Cannot create an event loop");
    } else {
        return loop;
    }
}

Epoll and kqueue are obviously unavailable on Solaris, so that leaves ev_default_loop(0), which uses the default settings. Apparently that fails too.

According to http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#SOLARIS_PROBLEMS_AND_WORKAROUNDS libev will use Solaris event ports by default. The document also claims that event ports are buggy and recommends installing all the latest kernel patches.

So there are two things you can try:

1. Install all Solaris updates.
2. Set the environment variable LIBEV_FLAGS=3 to force using the select and poll backends.
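
The value 3 is a bitmask: libev's EVBACKEND_SELECT is 1 and EVBACKEND_POLL is 2. The sketch below decodes such a value. The constants mirror the values in libev's ev.h but are redefined locally so that the example compiles without libev; describeBackends is a hypothetical helper, not a libev function.

```cpp
#include <cstddef>
#include <string>

// Backend bits as defined in libev's ev.h (EVBACKEND_*), copied here so
// the sketch does not depend on libev being installed.
enum {
    BACKEND_SELECT  = 0x01,
    BACKEND_POLL    = 0x02,
    BACKEND_EPOLL   = 0x04,
    BACKEND_KQUEUE  = 0x08,
    BACKEND_DEVPOLL = 0x10,
    BACKEND_PORT    = 0x20  // Solaris event ports
};

// Hypothetical helper: turns a LIBEV_FLAGS-style bitmask into a
// comma-separated list of backend names.
std::string describeBackends(unsigned int flags) {
    static const struct {
        unsigned int bit;
        const char *name;
    } table[] = {
        { BACKEND_SELECT,  "select"  },
        { BACKEND_POLL,    "poll"    },
        { BACKEND_EPOLL,   "epoll"   },
        { BACKEND_KQUEUE,  "kqueue"  },
        { BACKEND_DEVPOLL, "devpoll" },
        { BACKEND_PORT,    "port"    }
    };
    std::string result;
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (flags & table[i].bit) {
            if (!result.empty()) {
                result += ",";
            }
            result += table[i].name;
        }
    }
    return result;
}
```

Here describeBackends(3) yields "select,poll", and the Solaris event port backend alone would be 0x20. Note that libev only consults LIBEV_FLAGS when the program does not pass EVFLAG_NOENV and is not running setuid or setgid, and the variable has to be present in the environment of the process that creates the loop.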

Johannes Fahrenkrug

Jan 22, 2013, 10:21:00 AM
to phusion-...@googlegroups.com
Hi Hongli,

Thank you so much for your quick response.

> I don't think file descriptor 3 is the problem. PassengerLoggingAgent is supposed to be started from the Watchdog, which in turn is supposed to be started from the web server. It looks like the Watchdog starts properly in your case, so I won't go into that part. [...]
>
> I think the real problem is here:
>
> [ pid=1148 thr=1 file=ext/common/LoggingAgent/Main.cpp:287 time=2013-01-22 11:38:22.55 ]: *** ERROR: Cannot create an event loop


I'm afraid it IS the LoggingAgent after all: The above error message about not being able to create an event loop does not appear in the apache error log anymore (unfortunately my colleague can't recall exactly which step made it disappear). But this error still occurs:

 [Tue Jan 22 11:38:22 2013] [error] *** Passenger could not be initialized because of this error: Unable to start the Phusion Passenger watchdog because it encountered the following error during startup: Unable to start the Phusion Passenger logging agent: it seems to have crashed during startup for an unknown reason, with exit code 1

If you have any other idea or pointer in your bag of tricks, I'd be eternally grateful and will buy you some Poffertjes or Hoegaarden next time I'm in Amsterdam :)

- Johannes

Johannes Fahrenkrug

Jan 22, 2013, 10:26:47 AM
to phusion-...@googlegroups.com
Hi Hongli,

sorry, I overlooked something: The event loop error DOES still occur: It was just logged into the PassengerDebugLogFile. D'oh.
So I have to keep investigating the event loop error. Thanks!

- Johannes

Hongli Lai

Jan 22, 2013, 2:43:27 PM
to phusion-...@googlegroups.com
On Tue, Jan 22, 2013 at 6:03 PM, Dagobert Michelsen <honk...@googlemail.com> wrote:
> Hi folks,
>
> I finally found the issue: no backends were configured for the embedded ext/libev. The configure script bootstrapped from configure.ac has a flaw. Before the detection of the backend functions,
>   AC_CHECK_FUNCS(inotify_init epoll_ctl kqueue port_create poll select eventfd signalfd)
> which ends up in configure as
>   for ac_func in inotify_init epoll_ctl kqueue port_create poll select eventfd signalfd
> conftest is still there, and it is a directory. Hence all tests fail with "conftest is a directory", because the rm -f that is used does not remove it. When I add
>   rm -rf conftest.$ac_objext conftest$ac_exeext
> in configure at line 20009, right before the "for ac_func ..." loop, the detection works and Passenger starts. Maybe a new bootstrap with GNU autotools would suffice, but I can't say for sure as my build machine lacks the proper packages.

I've attached a rebootstrapped configure script to this email. This upgrades Autoconf 2.67 to 2.69. Can you try this and tell me whether it solves your problem?
configure.gz
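
The failure mode in the quoted report hinges on a leftover directory surviving a plain-file removal. Here is a minimal sketch of that behaviour, assuming Linux-like POSIX semantics, with unlink() standing in for what rm -f attempts on a single path. The function and struct names are hypothetical, and the temporary path only illustrates a directory named like conftest.

```cpp
#include <cstdlib>
#include <unistd.h>

// Outcome of trying to remove a freshly created directory in two ways.
struct RemovalResult {
    bool unlinkWorked;  // plain-file style removal
    bool rmdirWorked;   // directory removal
};

// Hypothetical demo: creates a scratch directory, tries to remove it as if
// it were a regular file, then falls back to rmdir().
RemovalResult tryRemovingDirectory() {
    char path[] = "/tmp/conftest-demo-XXXXXX";
    RemovalResult result = { false, false };
    if (mkdtemp(path) == NULL) {
        return result;
    }
    // unlink() is meant for non-directories and is expected to fail here.
    result.unlinkWorked = (unlink(path) == 0);
    if (!result.unlinkWorked) {
        // Only a directory-aware removal cleans up the leftover.
        result.rmdirWorked = (rmdir(path) == 0);
    }
    return result;
}
```

On a typical Linux system unlinkWorked stays false and rmdirWorked becomes true, which mirrors why the report suggests rm -rf rather than rm -f for the leftover conftest entry.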

Johannes Fahrenkrug

Jan 23, 2013, 3:48:35 AM
to phusion-...@googlegroups.com
Hi Hongli,

thank you very much. Yes, this fixes the problem! It would be great if you could update the gem as well (or send us a pre-release gem) so we can check whether a clean gem installation on Solaris 11 also works now.

Thanks so much for your help!

- Johannes

Hongli Lai

Jan 23, 2013, 4:17:24 AM
to phusion-...@googlegroups.com
Alright, I'll check this into the Phusion Passenger 4 branch. Could
you also post this bug report to the libev author so that the fix gets
merged upstream? The website is
http://software.schmorp.de/pkg/libev.html

Hongli Lai

Jan 23, 2013, 5:55:10 AM
to phusion-...@googlegroups.com
On Tue, Jan 22, 2013 at 6:22 PM, Dagobert Michelsen
<honk...@googlemail.com> wrote:
> Updating the embedded libev to the latest 4.11 immediately solves the issue,
> so I suggest updating the embedded libev to 4.11.

The embedded libev was already at version 4.11.

Hongli Lai

Jan 23, 2013, 5:56:40 AM
to phusion-...@googlegroups.com
On Wed, Jan 23, 2013 at 11:05 AM, Dagobert Michelsen
<honk...@googlemail.com> wrote:
> Would it be possible to get a 3.0.20 also?

Phusion Passenger 4's release is so close, I don't think it makes
sense anymore to do a 3.0.20 release.

> The libev integrated in 3.0.19 was 3.9, which is already in Marc's attic;
> the current 4.11 works fine. As the bug is already fixed I don't think this
> would help much.

Ah you were talking about the 3.0 branch, fair enough. I thought you
were talking about the 4.0 branch.

Johannes Fahrenkrug

Jan 23, 2013, 6:14:04 AM
to phusion-...@googlegroups.com
Hi Hongli,

a 3.0.20 bugfix release would be greatly appreciated: the app is critical, so we have to be conservative about jumping to a new major release and need to test it internally first.

I guess we can manually patch 3.0.19, but a fixed gem would be much prettier :)

- Johannes

Hongli Lai

Jan 23, 2013, 9:31:16 AM
to phusion-...@googlegroups.com
Alright. I'm not so sure upgrading libev in Phusion Passenger 3.0
won't break anything so I've taken the conservative route of
rebootstrapping the configure script. Can you test whether the
stable-3.0 branch in the git repository works for you?