strange "ass" fatal error message. Please help!!

65 views
Skip to first unread message

badnaam

unread,
Oct 5, 2010, 11:02:32 AM10/5/10
to delayed_job
No kidding, or spamming.

My delayed job died last night and monit couldn't restart it. Neither
can I do it manually, things are falling apart in my production. I see
the following in production.log file, notice the last fatal message,
that's all it says. Never seen this, this is the collective idea 2.1.1
pre version. Manual start just seem like it's starting but then it
dies, without any error on the console.

[DEBUG 05-10-2010 07:56:07] Delayed::Backend::ActiveRecord::Job
Update (1.1ms) UPDATE `delayed_jobs` SET locked_at = '2010-10-05
14:56:07', locked_by = 'delayed_job host:myapp pid:18021' WHERE (id =
84 and (locked_at is null or locked_at < '2010-10-05 10:56:07') and
(run_at <= '2010-10-05 14:56:07'))
[DEBUG 05-10-2010 07:56:07] Delayed::Backend::ActiveRecord::Job
Update (0.4ms) UPDATE `delayed_jobs` SET locked_by = null, locked_at
= null WHERE (locked_by = 'delayed_job h
ost:myapp pid:18021')
[FATAL 05-10-2010 07:56:07] ass

Brandon Keepers

unread,
Oct 5, 2010, 11:08:07 AM10/5/10
to delay...@googlegroups.com
Woah, that is strange. Try running `script/delayed_job run` to see if it shows you any other errors.

=b

badnaam

unread,
Oct 5, 2010, 11:16:04 AM10/5/10
to delayed_job
here is what I get..

********************************************************************************************************************************
/usr/local/lib/ruby/gems/1.8/gems/rails-2.3.5/lib/rails/
gem_dependency.rb:119:Warning: Gem::Dependency#version_requirements is
deprecated and will be removed on or after August 2010. Use
#requirement
/var/www/myapp/releases/20101004225511/config/initializers/
geokit_for_named_scope.rb:40: warning: already initialized constant
VALID_FIND_OPTIONS
ass
***************************************************************************************************************************
The warning for the geokit stuff has been theere since day one never
affected anything. If you are interested the contents of that file are
here.

http://pastie.org/1200925

Where is this message coming from?

badnaam

unread,
Oct 5, 2010, 11:17:10 AM10/5/10
to delayed_job
This is running REE 1.8.7, on Ubuntu 10.04, nginx/Passenger, rails
2.3.5

badnaam

unread,
Oct 5, 2010, 11:33:23 AM10/5/10
to delayed_job
Here is the delayed_job log.

It seems everything was working fine and then at some point it just
exited. The first time it happened it came back and did some more work
and then it just died and couldn't wake up.

http://pastie.org/1200950

ambert ho

unread,
Oct 5, 2010, 12:58:25 PM10/5/10
to delay...@googlegroups.com
I'm not sure Brandon will be able determine where that error comes from, as I don't think it's being raised by delayed_job (maybe he will). 

- First, as a baseline, have you pgrep/ps'd that there are no workers running, at all? paste in a ps -ef | grep #{name of the process}

- what does it say when you SQL in and select the most current, failling job from the database? (when it fails it puts in an error message. Hopefully it recorded an error other than "ass"... just a guess, though)

- Try invoking manually from the console? Something like (check the source code to make sure):
worker = Delayed::Worker.new 
job = Delayed::Backend::ActiveRecord::Job.find_available(worker, 1)[0]   # grab the first job on the queue
job.invoke_job

Can you invoke the first job? What about the second job? Are all the jobs in the database the same method? If you can get a job to work and another doesn't, what's different?

In last ditch efforts, there are always ways to reset EVERYTHING if all else fails (for example kill all ruby processes just to make sure they're all gone, backup the delayed_jobs table and wipe it, then restart the worker with an empty table, make sure its humming along and doesn't crash - queue in and do a test job to make sure it works - then reimport the original jobs table)

Ambert

badnaam

unread,
Oct 5, 2010, 1:23:07 PM10/5/10
to delayed_job
Here is the answer to you questions.

http://pastie.org/1201142

The console invocation fails with the same "ass" runtime error
message. The stack trace is interesting. I can invoke and run the job/
method manually. The method is essentially a email notification from
the Notifier class (Notifier.deliver_new_post_notification post)

The method called directly on the Notifier class with the last post as
the parameter works just fine.

I would rather understand the root cause of this, rather than wipe
things clean..because this will be really bad if it happens. Given how
good delayed_job is, lot of things depend on it :).

Thanks for your help

On Oct 5, 9:58 am, ambert ho <amber...@gmail.com> wrote:
> I'm not sure Brandon will be able determine where that error comes from, as
> I don't think it's being raised by delayed_job (maybe he will).
>
> - First, as a baseline, have you pgrep/ps'd that there are no workers
> running, at all? paste in a ps -ef | grep #{name of the process}
>
> - what does it say when you SQL in and select the most current, failling job
> from the database? (when it fails it puts in an error message. Hopefully it
> recorded an error other than "ass"... just a guess, though)
>
> - Try invoking manually from the console? Something like (check the source
> code to make sure):
> worker = Delayed::Worker.new
> job = Delayed::Backend::ActiveRecord::Job.find_available(worker, 1)[0]   #
> grab the first job on the queue
> job.invoke_job
>
> Can you invoke the first job? What about the second job? Are all the jobs in
> the database the same method? If you can get a job to work and another
> doesn't, what's different?
>
> In last ditch efforts, there are always ways to reset EVERYTHING if all else
> fails (for example kill all ruby processes just to make sure they're all
> gone, backup the delayed_jobs table and wipe it, then restart the worker
> with an empty table, make sure its humming along and doesn't crash - queue
> in and do a test job to make sure it works - then reimport the original jobs
> table)
>
> Ambert
>

badnaam

unread,
Oct 5, 2010, 1:30:35 PM10/5/10
to delayed_job
OK, it seems the "ass" is actually coming from Searchlogic, it is not
a typo, SL actually raises that exception.

Not sure why it is affecting only delayed_job or why it's being raised
only in this certain case.

ambert ho

unread,
Oct 5, 2010, 1:35:34 PM10/5/10
to delay...@googlegroups.com
yeah I was about to respond to you with that. Looking at your pastie, the odd thing is that there's no error recorded in the DB.

Check out delayed_job/worker.rb:120

 def run(job)
      runtime =  Benchmark.realtime do
        Timeout.timeout(self.class.max_run_time.to_i) { job.invoke_job }
        job.destroy
      end
      # TODO: warn if runtime > max_run_time ?
      say "* [JOB] #{name} completed after %.4f" % runtime
      return true  # did work
    rescue Exception => e
      handle_failed_job(job, e)
      return false  # work failed
    end

handle_failed_job takes care of writing a "last_error" field into the database, and there is none for your job id 84, so the exception is not happening within the job.

I was gonna look through searchlogic source to see what's going on, but i guess you've got that covered. good luck

Ambert

badnaam

unread,
Oct 5, 2010, 1:39:58 PM10/5/10
to delayed_job
>>I was gonna look through searchlogic source to see what's going on, but i
guess you've got that covered. good luck

Well trying to..but help appreciated, if you can. I am curious, why
the method works fine on it's own and only fails when called by DJ?
Trying to look through both DJ and SL source code, but someone
familiar with the source might be better at spotting what is causing
this condition.

Thanks!

On Oct 5, 10:35 am, ambert ho <amber...@gmail.com> wrote:
> yeah I was about to respond to you with that. Looking at your pastie, the
> odd thing is that there's no error recorded in the DB.
>
> Check out delayed_job/worker.rb:120
>
>  def run(job)
>       runtime =  Benchmark.realtime do
>         Timeout.timeout(self.class.max_run_time.to_i) { job.invoke_job }
>         job.destroy
>       end
>       # TODO: warn if runtime > max_run_time ?
>       say "* [JOB] #{name} completed after %.4f" % runtime
>       return true  # did work
>     rescue Exception => e
>       handle_failed_job(job, e)
>       return false  # work failed
>     end
>
> handle_failed_job takes care of writing a "last_error" field into the
> database, and there is none for your job id 84, so the exception is not
> happening within the job.
>
> I was gonna look through searchlogic source to see what's going on, but i
> guess you've got that covered. good luck
>
> Ambert
>

ambert ho

unread,
Oct 5, 2010, 1:40:10 PM10/5/10
to delay...@googlegroups.com
You might be able to solve this temporarily and get production working again with monkey patching, since Searchlogic:

RuntimeError: ass
  from /usr/local/lib/ruby/gems/1.8/gems/searchlogic-2.4.26/lib/searchlogic/search/to_yaml.rb:14:in `yaml_new'
  from /usr/local/lib/ruby/1.8/yaml.rb:133:in `transr'

module ClassMethods
        def yaml_new(klass, tag, val)
          raise "ass"
        end
      end

What? the hell does that method do? It's overwriting something in the ruby core libs? 

Ambert


On Tue, Oct 5, 2010 at 10:30 AM, badnaam <asitk...@gmail.com> wrote:

Brandon Keepers

unread,
Oct 5, 2010, 1:47:23 PM10/5/10
to delay...@googlegroups.com

Well that's asinine.

Are you trying to pass a searchlogic object as an argument?

Brandon Keepers

unread,
Oct 5, 2010, 1:48:44 PM10/5/10
to delay...@googlegroups.com

badnaam

unread,
Oct 5, 2010, 2:19:59 PM10/5/10
to delayed_job
No, it's not a searchlogic object but I guess internally it becomes
one, no idea how..I am not that good. Yes, the owner of SL just
removed it. This problem actually could in fact be even related to
friendly_id, where this error in SL occurs if a frienly_id scope is
missing in the object. Not sure if that's really what's happening here
or not. This is a known issue with friendly_id, at least for me.
Wow..this is a good start to my morning!


here is the thread in SL google group.

http://groups.google.com/group/searchlogic/browse_thread/thread/4e2c40ac29668e4b#

This particular job is a mail delivery job "new_post_notification"
method in Notifier, that takes a post object as a parameter.
There are other similar jobs, like new_user_activation instruction
@user, which take a complex object too i.e. User object and they work
just fine.

What might be happening in this case, is that this Post model is also
a scoped friendly_id model and I constantly run into issues where if I
do a

Post.find ('friendly_id_of_that_post'), the find fails, complaining a
scope (i.e. category) is missing. so If I do a

Post.find('friendly_id_of_that_post', :scope =>
'category_of_that_post') then it works.

The post object being passed in this case does not include the
category

@post = Post.create(params[:post_id])
Notifier.delay.deliver_new_post_notification @post

Pls bear with me, I am far from an expert Ruby programmer, but does it
have any bearing on this error/condition?


On Oct 5, 10:48 am, Brandon Keepers <bran...@opensoul.org> wrote:
> Hah, looks like that was just removed:
>
>        http://github.com/binarylogic/searchlogic/commit/8cd742f28f3ee0ee92a5...
>
> =b
>
> On Oct 5, 2010, at 1:40 PM, ambert ho wrote:
>
> > You might be able to solve this temporarily and get production working again with monkey patching, since Searchlogic:
>
> > RuntimeError
> > : ass
> >   from
> > /usr/local/lib/ruby/gems/1.8/gems/searchlogic-2.4.26/lib/searchlogic/search/to_yaml.rb:14:in `yaml_new'
> >   from /usr/local/lib/ruby/1.8/yaml.rb:133:in `transr'
>
> > module ClassMethods
> >         def yaml_new(klass, tag, val)
> >           raise "ass"
> >         end
> >       end
>
> > What? the hell does that method do? It's overwriting something in the ruby core libs?
>
> > Ambert
>
Reply all
Reply to author
Forward
0 new messages