Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
The story of an annoying bug that got nailed and fixed ;-)
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
eboutin  
View profile  
 More options Feb 4, 8:51 am
From: eboutin <emmanuel.bou...@free.fr>
Date: Sat, 4 Feb 2012 05:51:40 -0800 (PST)
Local: Sat, Feb 4 2012 8:51 am
Subject: The story of an annoying bug that got nailed and fixed ;-)
Hi,

We chose Zotonic a while ago as the target framework for building our
web site.
Everything was nice and smooth while developping on our pcs (both
under Windows and Debian boxes under VirtualBox).

And then... we decided it was time to deploy Zotonic on our dedicated
server.
And then... Zotonic refused to start.

In the logs, we had something like :

=INFO REPORT==== 29-Jan-2012::14:57:07 ===
DEBUG: z_install_data:196  "Inserting username for the admin"

=INFO REPORT==== 29-Jan-2012::14:57:07 ===
Exception in transaction: "exit,{noproc,{gen_server,call,[z_ids,{id,
10}]}}"
=SUPERVISOR REPORT==== 29-Jan-2012::14:57:07 ===
      Supervisor: {<0.107.0>,z_site_sup}
      Context:    start_error
      Reason:     {'EXIT',
                      {{badmatch,
                           {rollback,
                               {noproc,{gen_server,call,[z_ids,{id,
10}]}}}},
                       [{z_install,install,1},
                        {z_installer,start_link,1},
                        {supervisor,do_start_child,2},
                        {supervisor,start_children,3},
                        {supervisor,init_children,2},
                        {gen_server,init_it,6},
                        {proc_lib,init_p_do_apply,3}]}}
      Offender:   [{pid,undefined},
                   {name,z_installer},
                   {mfa,
                       {z_installer,start_link,
                           [[
                             %%%%% config removed %%%%%
                            ]]}},
                   {restart_type,permanent},
                   {shutdown,1},
                   {child_type,worker}]

Google revealed that "jeff" and Michael Connors had had the same
issue :
http://groups.google.com/group/zotonic-users/browse_thread/thread/41b...
http://groups.google.com/group/zotonic-developers/browse_thread/threa...

But there was no solution proposed.

Over several days we tried (almost) everything to get it right:
- zotonic 0.7.3
- zotonic 0.7.4
- hacking the code in all sorts of inventive ways
- playing with pgsql settings
- creating new dbs
- re-installing pgsql
- formatting the server and reinstalling the OS (i'm quite ashamed of
that one)

Nothing would do. And then I said to myself maybe i could use my
brains a little bit for once... doh!

Everything we had experienced was consistent with the z_ids gen_server
stopped:
     {noproc,{gen_server,call,[z_ids,{id,10}]}}}},

What could have stopped the z_ids gen_server? Since z_ids is under a
supervision tree, it had to be another process that exited
prematurely.

I used the dbg module to trace process events:
        dbg:tracer(),
        dbg:p(all,[p]).

This logs all the spawn/exit process events. In the logs, I searched
the first occurence of exit that was not "exit normal". I got:
        (<0.135.0>) unregister z_sites_manager
        Terminating due to shutdown
        (<0.135.0>) exit {timeout,{gen_server,call,[<0.136.0>,
{start_child,xxx}]}}

So something went wrong inside start_child and caused a time out.

I searched the spawn event that created process <0.135.0>. I got:
        (<0.101.0>) spawn <0.135.0> as proc_lib:init_p(zotonic_sup,
[<0.79.0>],gen,init_it,[gen_server,<0.101.0>,<0.101.0>,
         {local,z_sites_manager},
         z_sites_manager,[],
         [{timeout,3600000}]])
        (<0.101.0>) link <0.135.0>
        (<0.135.0>) register z_sites_manager

This confirmed the issue was inside z_sites_manager:start_child.

I followed the code:

In z_sites_manager:
handle_cast({start, Site}, State) ->
    z_supervisor:start_child(State#state.sup, Site), ...

In z_supervisor:
start_child(Pid, Name) ->
    gen_server:call(Pid, {start_child, Name}).

The gen_server documentation confirmed what i remembered:
gen_server:call can time out with a default timeout of 5s.

I realised that the dedicated server is not as powerful as our desktop
beasts. So it's normal it takes longer to populate zotonic's tables at
first startup. Definitely more than 5s.

So I changed the code in z_supervisor and recompiled:

start_child(Pid, Name) ->
    gen_server:call(Pid, {start_child, Name}, infinity).

Et voilà! Zotonic started for the first time on our dedicated server
and we lived happily ever after with the shame of all the dumb things
we did to get this sorted.

Note: this last piece of code in z_supervisor is different in 0.7.3
and 0.7.4 but they both suffer from the same problem and the same fix
is applicable in both versions.

I think z_sites_manager should not assume anything regarding the time
needed by a given site to start. So it's seems perfectly logical to
have no timeout and let the site itself decide if its start time is
too much.

A better fix would be:

In z_supervisor:
start_child(Pid, Name, Timeout) ->
    gen_server:call(Pid, {start_child, Name}, Timeout).

handle_cast({start, Site}, State) ->
    z_supervisor:start_child(State#state.sup, Site, infinity), ...

What do you think?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Arjan Scherpenisse  
View profile  
 More options Feb 12, 2:24 am
From: Arjan Scherpenisse <ar...@scherpenisse.net>
Date: Sun, 12 Feb 2012 08:24:53 +0100
Local: Sun, Feb 12 2012 2:24 am
Subject: Re: [Zotonic-Dev] RE: The story of an annoying bug that got nailed and fixed ;-)
Hi Emmanuel,

thanks for your detailed report. There's nothing wrong with it, however
it might have lingered in mailing list queue for a while (we review all
messages first time somebody sends something, to prevent spam).

The issue that sometimes a site takes too long to startup is indeed very
annoying and leads to strange behaviour.

I am tempted to commit your solution (changing z_supervisor to allow a
longer startup time for childs) but I'd first like some feedback from
the other developers (Marc, Andreas, Maas?)

One more thought: choosing 'infinity' means you trust all sites (and
modules!) not to do anything stupid in their startup code: the whole
system may fail to start up if one site decides to block.

Cheers,
Arjan

On 02/11/2012 09:53 PM, Emmanuel Boutin wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
eboutin  
View profile  
 More options Feb 12, 3:31 am
From: eboutin <emmanuel.bou...@free.fr>
Date: Sun, 12 Feb 2012 00:31:11 -0800 (PST)
Local: Sun, Feb 12 2012 3:31 am
Subject: Re: The story of an annoying bug that got nailed and fixed ;-)
Hi Arjan,

I understand your point: one site should not be allowed to affect the
others.

My point is that zotonic should not kill itself because of a timeout
at the z_sites_manager/z_supervisor level.
You can always have a situation where a "should be long enough"
timeout is not long enough.

So in my mind:
- if a site takes too long to start because of the zotonic code then
that's a zotonic bug and it should be fixed in zotonic
- if a site takes too long to start because of the way it behaves in
its private code, then zotonic should be able to protect itself from
bad site behaviour

Maybe one cheap way of doing this would be to have timeout checks in
place at the places where zotonic calls site code.
The timeout could then come from the site config so that the site has
a chance to say what "long enough" means for itself.
I'm not familiar enough with zotonic insides to know where those
transitions from zotonic code to site code are made.

A cheaper alternative would be to use the 'infinity' timeout only when
the site starts for the first time but I found no easy way to
implement this: we only know if the site starts for the first time
later, inside z_installer, where we have access to the site's db.

Regards,
Emmanuel

On 12 fév, 08:24, Arjan Scherpenisse <ar...@scherpenisse.net> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andreas Stenius  
View profile  
 More options Feb 13, 7:26 am
From: Andreas Stenius <andreas.sten...@astekk.se>
Date: Mon, 13 Feb 2012 13:26:23 +0100
Local: Mon, Feb 13 2012 7:26 am
Subject: Re: [Zotonic-Dev] RE: The story of an annoying bug that got nailed and fixed ;-)
Marc has addressed this in
https://github.com/kaos/zotonic/commit/769e1999afaeb029fcec205d8b80df...
(oct 28, master branch)
with a comment of "more to follow"...

The change, as far as I can see from a quick glance, is to give sites
a full minute to start; and a few more changes...
Perhaps back-port this to 0.7.x?

//Andreas

2012/2/12 Arjan Scherpenisse <ar...@scherpenisse.net>:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Arjan Scherpenisse  
View profile  
 More options Feb 13, 8:27 am
From: Arjan Scherpenisse <ar...@scherpenisse.net>
Date: Mon, 13 Feb 2012 14:27:13 +0100
Local: Mon, Feb 13 2012 8:27 am
Subject: Re: [Zotonic-Dev] RE: The story of an annoying bug that got nailed and fixed ;-)
That sounds like a good idea..

I'm actually never experiencing this startup annoyance anymore, but that
is probably because I only run master :-)

If no further objections I'll backport it and churn out a release
somewhere this week.

grt Arjan

On 02/13/2012 01:26 PM, Andreas Stenius wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Arjan Scherpenisse  
View profile  
 More options Feb 16, 5:13 am
From: Arjan Scherpenisse <ar...@scherpenisse.net>
Date: Thu, 16 Feb 2012 11:13:05 +0100
Local: Thurs, Feb 16 2012 5:13 am
Subject: Re: [Zotonic-Dev] RE: The story of an annoying bug that got nailed and fixed ;-)
OK I looked into merging Marc's "module stability fix" (769e1999afa) but
it is part of a bigger change (module dependencies and changing the
module startup sequence) that is too  hard to merge properly.
I'm afraid it cannot be backported to the 0.7 series.

Emmanuel, you are suggested to run on the git master while we are
preparing the next stable 0.8 release ("real soon now!")

Arjan

On 02/13/2012 02:27 PM, Arjan Scherpenisse wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
eboutin  
View profile  
 More options Feb 16, 6:17 pm
From: eboutin <emmanuel.bou...@free.fr>
Date: Thu, 16 Feb 2012 15:17:27 -0800 (PST)
Local: Thurs, Feb 16 2012 6:17 pm
Subject: Re: The story of an annoying bug that got nailed and fixed ;-)
Hi Arjan,

Thanks for following this up.
For the moment we're happy with our locally hacked version.

We'll be glad to switch to the next stable 0.8 release real soon ;-)

Regards,

Emmanuel

On 16 fév, 11:13, Arjan Scherpenisse <ar...@scherpenisse.net> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »