Long Running Weinre Process Issues

Andrew Lunny

Sep 28, 2011, 4:53:38 PM
to wei...@googlegroups.com
Hey guys,

As Patrick, and many people reading this group, know, we at Nitobi run a hosted instance of weinre at http://debug.phonegap.com. It has minimal changes from the open-source version - just a different landing page, no changes to the JS or Java code.

We're just running the weinre jar bound to port 80, forever, like so:
sudo nohup java -jar weinre/current/bin/weinre.jar --httpPort 80 --boundHost -all- >> log/weinre.log 2>&1 &

This actually works well enough for our needs, but yesterday the server stopped creating new channels. You could still connect to http://debug.phonegap.com and see the landing page, and see the weinre interface through http://debug.phonegap.com/client, but devices were not being recognized. According to my understanding of how weinre works, that means new channels were not being registered or created.

The process had been running for about four months with no admin, nor any CPU/memory issues - I'm just wondering if there's anything in the channel code that would cause it to stop allocating new ones after some limit is reached? Any ideas?

Cheers,
Andrew

Patrick Mueller

Sep 28, 2011, 8:56:36 PM
to weinre
Four months uptime? Wow! I wonder if that's unusually long for a
Java process? I always see Java folks dealing with fallback servers
and what not.

No, can't think of what the problem might be. I assume you rebooted
your image and it came back to life?

I wonder if it was some kind of os resource issue? I could probably
upgrade the version of Jetty I'm shipping, maybe some bugs have been
fixed.

Or there's some weird bug in the code. I should look for some kind of
counter that maybe overflowed 32bits or something.
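
For illustration, a minimal sketch of what such a 32-bit overflow looks like in Java. The `CounterOverflow` class and its `nextId` field are hypothetical and not taken from the weinre source; the point is only that a plain `int` increment wraps silently, while `Math.addExact` (Java 8+) fails loudly.

```java
class CounterOverflow {
    // Hypothetical 32-bit id counter; NOT taken from the weinre source.
    private int nextId;

    CounterOverflow(int start) {
        nextId = start;
    }

    // Plain ++ wraps silently from Integer.MAX_VALUE to Integer.MIN_VALUE.
    int nextWrapping() {
        return nextId++;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    int nextChecked() {
        int id = nextId;
        nextId = Math.addExact(nextId, 1);
        return id;
    }

    public static void main(String[] args) {
        CounterOverflow wrapping = new CounterOverflow(Integer.MAX_VALUE);
        System.out.println(wrapping.nextWrapping()); // 2147483647
        System.out.println(wrapping.nextWrapping()); // -2147483648

        try {
            new CounterOverflow(Integer.MAX_VALUE).nextChecked();
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

If ids like these were used as map keys or compared against zero, a wrap to a negative value could plausibly make lookups or range checks fail while the rest of the server keeps running.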

Go ahead and report this, so we have it tracked anyway ...
https://github.com/phonegap/weinre/issues

Simon MacDonald

Sep 28, 2011, 10:14:27 PM
to wei...@googlegroups.com
Four months uptime is crazy long for a Java process. Back in my
telecom days some of our products were Java based. We generally
recommended a restart of the process every 2 to 4 weeks depending on
the load of the system. Amazingly enough there are still bugs in the
JVM and sometimes not all resources are cleaned up properly.

I once had a system that would crash after 8 hours of intense load. We
had 4 harrowing days trying to figure out the issue. We could get
3 runs in a day, so basically I lived at work for those 4 days.
Eventually we tracked it down to a bug in the JVM. Downgrading to the
previous point release fixed the problem.

Basically, what I'm trying to say is that you shouldn't spend a ton of
time on this issue. Instead the debug.phonegap.com server should have
some scheduled maintenance time once a month where the service is
cycled.

Simon Mac Donald
http://hi.im/simonmacdonald

> --
> From the weinre Google Group -- http://groups.google.com/group/weinre
>

Patrick Mueller

Sep 28, 2011, 10:58:37 PM
to wei...@googlegroups.com
That does seem pretty reasonable.  Might even want to do it once/week or something.  I'd think just killing the process and restarting it would be the easiest way to go.  I don't write the pid out - not sure if I can (can Java get its pid?), but presumably you can find the process somehow with ps, pull the pid, do a kill.  Then maybe wait a minute, and start back up.  cron job(s)?
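
On whether Java can get its own pid: a minimal sketch, assuming Java 9 or later for `ProcessHandle`. The class name `PidFile` and the default file name `weinre.pid` are made up for illustration and are not part of weinre. On older JVMs the usual workaround is to parse the `RuntimeMXBean` name, which is conventionally `pid@hostname` on HotSpot but is not guaranteed to have that form.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PidFile {
    // Java 9+: the supported way to ask for the current process id.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Pre-Java 9 workaround: the runtime name is conventionally "pid@hostname"
    // on HotSpot. That format is an assumption, so fail clearly if it differs.
    static long pidFromRuntimeName() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        if (at <= 0) {
            throw new IllegalStateException("unexpected runtime name: " + name);
        }
        return Long.parseLong(name.substring(0, at));
    }

    // Write the pid as plain text so a restart script can read it back.
    static void write(Path file) throws IOException {
        Files.write(file, Long.toString(currentPid()).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // "weinre.pid" is a hypothetical default path.
        Path file = Paths.get(args.length > 0 ? args[0] : "weinre.pid");
        write(file);
        System.out.println("wrote pid " + currentPid() + " to " + file);
    }
}
```

A scheduled restart job could then read the file, kill that pid, wait, and start the jar again, instead of searching the `ps` output.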
--
Patrick Mueller
http://muellerware.org

Simon MacDonald

Sep 28, 2011, 11:08:05 PM
to wei...@googlegroups.com
Assuming you are running on some version of UNIX you could spawn a
runtime process and execute the command "echo $PPID" and that would
get you the PID of weinre. It's kinda kludgy but if you want me to
write up some code to do just that and write it out to a file just let
me know.
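
A rough sketch of that trick, assuming a UNIX-like system with `sh` on the PATH. The class and method names (`ParentPid`, `viaShell`) are invented for illustration. The child shell's `$PPID` is the pid of the process that spawned it, which here is the JVM itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ParentPid {
    // Spawn "sh -c 'echo $PPID'" and parse the single line it prints.
    // $PPID inside the child shell is the pid of its parent, i.e. this JVM.
    static long viaShell() throws IOException, InterruptedException {
        Process child = new ProcessBuilder("sh", "-c", "echo $PPID").start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            child.waitFor();
            if (line == null) {
                throw new IOException("shell printed nothing");
            }
            return Long.parseLong(line.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("JVM pid according to child shell: " + viaShell());
    }
}
```

This works on JVMs that predate any built-in pid API, at the cost of starting a short-lived child process once at startup.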

Null Pointer

Sep 29, 2011, 4:18:55 PM
to weinre
The weinre jar is a jetty server, so IMO your best bet is to download
Jetty or google "jetty pid", which comes up with stuff like this
http://svn.codehaus.org/jetty/jetty/branches/jetty-6.1/bin/jetty.sh

There is also Apache JSVC, which I can't really recommend.

In every place I worked no one ever had to reboot a Tomcat or WebLogic
server running on the Sun JVM, except for software upgrades. Actually one of
our system admins was always raving about our servers, which were
running for months.

Resource bugs are usually problems in user Java code, which are
unfortunately very common in the Java world and not under your control
in this case.

Patrick Mueller

Sep 29, 2011, 5:28:23 PM
to wei...@googlegroups.com
Well, actually, I would consider weinre to be "user code", and can easily imagine something it's doing that is slowly, oh so slowly, eating memory.  And it's under our control :-)

Still, if it takes months for this problem to occur, it's cheaper/easier to just reboot the server regularly, than try to figure out where the error is.  I may still try to see if weinre is leaking anything ...

Andrew Lunny

Sep 29, 2011, 3:24:55 PM
to wei...@googlegroups.com
Thanks for the responses. Sorry if my concern was unclear - the uptime is fantastic, and the server has been rock-solid. We'll set up a cron job for restarting the service every so often.

I do think it's an issue with the code - possibly an integer overflow error or something - as Jetty kept accepting connections and serving the static files. As we (hopefully) get more users of the hosted weinre service, it may be something that arises more quickly.

I am going to look through the code this afternoon so I can get a useful bug report for you - I'm guessing it's the channelMap member of the channelManager running out of space (Integer.MAX_VALUE entries, according to the Java docs).

Andrew
Andrew Lunny
Chief N00b, Nitobi
604 685 9287
blogs.nitobi.com/andrew