Add data to job when buried or released


Jurian Sluiman

Oct 28, 2012, 6:24:53 PM
to beansta...@googlegroups.com
In various situations I'd like to bury a job during its work cycle. To record the reason a job was buried, I would like to append some data to it (all job data is JSON encoded).

Is there some way to do this? The only method I can come up with is to delete the job being worked on, create a clone of it, add the new data, put the clone back in the queue, and bury it directly. That seems convoluted, with a lot of steps. Am I missing something? The same question applies to releasing jobs during the work cycle, since I can't imagine it working differently there.

I have two examples that illustrate these requirements:
  1. When a job signals the worker that it must be buried, that buried state is different from the job throwing an exception during execution, so I would likely add a "buried_type" field to the job;
  2. A job can connect to an external 3rd-party service. When a time-out happens, the job must be released and rerun after a delay. Once the job has been retried x times, I want to bury it for user inspection.
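The delete/clone/put/bury workaround described above can be sketched as follows. The payload handling is plain JSON; the beanstalkd calls are shown with a hypothetical client object (`conn`, with pheanstalk-style method names — an assumption, not a specific library):

```python
import json

def clone_with_bury_reason(payload, buried_type):
    """Decode a JSON job payload and record why it is being buried,
    returning the payload for the cloned job."""
    data = json.loads(payload)
    data["buried_type"] = buried_type  # the extra field suggested above
    return json.dumps(data)

# Workflow sketch (conn is a hypothetical beanstalkd client):
#   job = conn.reserve()
#   conn.delete(job.id)                                 # drop the original
#   body = clone_with_bury_reason(job.body, "timeout")  # annotate the clone
#   new_id = conn.put(body)                             # re-insert it
#   ...reserve the clone again, then conn.bury(new_id)
#   (the protocol only lets you bury a job you have reserved)
```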
---
Jurian Sluiman

Jurian Sluiman

Oct 30, 2012, 9:13:54 AM
to beansta...@googlegroups.com
On Sunday, October 28, 2012 11:24:53 PM UTC+1, Jurian Sluiman wrote:
The only method I can come up with is to delete the job being worked on, create a clone of it, add the new data, put the clone back in the queue, and bury it directly. That seems convoluted, with a lot of steps. Am I missing something? The same question applies to releasing jobs during the work cycle, since I can't imagine it working differently there.

Now that I have looked more closely at the protocol (https://github.com/kr/beanstalkd/blob/master/doc/protocol.md), I notice a flaw in the outline above. You can only bury a job while you have it reserved. The bury command is "bury <id> <pri>", but according to the documentation you get NOT_FOUND if you try to bury a job you have not reserved.

Releasing with additional data is still possible: I can delete the job and put a new clone into the queue (not ideal, but doable). The bury command is quite important to me (one big advantage of beanstalk over other queues). But how is "user inspection" possible after a bury if I cannot tell the user why the job was buried? Usually you want to catch an exception and log the exception type and message.

I have no experience with C. If this is a missing feature, I really hope a contributor can help with it; if it is not possible at all, please let me know. I could also find an expert to implement it for me so I can use these features. I just checked that the license is MIT, so that shouldn't be a problem.
---
Jurian Sluiman

Chad Kouse

Oct 30, 2012, 12:13:53 PM
to beansta...@googlegroups.com
An easy workaround is to add a flag to the cloned job to tell your consumer to bury it as soon as it picks it up. 
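That flag check on the consumer side could look like the sketch below. The flag name `bury_on_receipt` and the `conn` client object are illustrative assumptions, not an established convention:

```python
import json

def should_bury_immediately(payload):
    """Check the marker flag a producer set on a cloned job
    to request an immediate bury."""
    data = json.loads(payload)
    return bool(data.get("bury_on_receipt", False))

# In the consumer loop (conn is a hypothetical beanstalkd client):
#   job = conn.reserve()
#   if should_bury_immediately(job.body):
#       conn.bury(job.id)   # legal here: the job is reserved
#   else:
#       process(job)
```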

What we normally do with buried jobs is run the same payload against a consumer in our dev environment and observe what happens. We also log all exceptions (by just echoing from our consumer), so we can usually tell exactly why a buried job failed.

--
Chad Kouse

--
You received this message because you are subscribed to the Google Groups "beanstalk-talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/beanstalk-talk/-/C5to9zYr4OUJ.
To post to this group, send email to beansta...@googlegroups.com.
To unsubscribe from this group, send email to beanstalk-tal...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beanstalk-talk?hl=en.

Jurian Sluiman

Oct 30, 2012, 3:35:26 PM
to beansta...@googlegroups.com
Hi Chad


On Tuesday, October 30, 2012 5:13:57 PM UTC+1, chadkouse wrote:
An easy workaround is to add a flag to the cloned job to tell your consumer to bury it as soon as it picks it up. 

I have thought about that too, but it all becomes too messy. You cannot kick a buried job if you want to modify its data, and if you set a bury flag you have to remove it again afterwards. That gives two downsides:
  1. The job is queued again before it can be buried. If you have a long queue because of an enormous traffic spike, a problem that must be buried can take quite some time to reach the bury list for inspection;
  2. If the job carries a flag, you must remove the flag when you kick it again. That means the job is cloned once more: removed from the queue and inserted again with a put command.
There are too many steps that can each go wrong, just to append some data to a job. Also, with clones of the original job floating around, you can no longer rely on the job id. I am writing a bridge that other developers will build on. If anyone uses the job id to track a job over time (just to name an example), it is not transparent that the buried job is a completely different job from the inserted one. Or the kicked one. Or the released one.

If you inspect the protocol, the put command is:
put <pri> <delay> <ttr> <bytes>\r\n
<data>\r\n
The bury command is:
bury <id> <pri>\r\n
You could extend it like this, keeping backwards compatibility:
bury <id> <pri>\r\n
<data>\r\n
The data would overwrite any previously set data for this job. If no data is sent, the original data remains.
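To make the proposal concrete, here is how a client might frame the extended command on the wire. This is a hypothetical protocol extension, not part of the real beanstalkd protocol:

```python
def frame_bury(job_id, pri, data=None):
    """Build the wire bytes for bury, optionally with the proposed
    trailing data block (hypothetical extension; real beanstalkd
    accepts only the bare two-argument form)."""
    cmd = "bury {} {}\r\n".format(job_id, pri).encode("ascii")
    if data is not None:
        cmd += data + b"\r\n"  # would replace the job's stored body
    return cmd
```

Omitting the data block produces exactly today's command bytes, which is what keeps the extension backwards compatible.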

As said, I am willing to hire a C expert to implement this and another feature I would really like to see (a peek-buried-all or similar, returning a list of ids of all buried jobs, which I could then fetch with peek <id>). But before taking that step, I need more insight into how this all works, whether the repository owner would accept such pull requests, and whether I am overlooking something in this process. Or perhaps there is another job queue manager that handles this aspect better than beanstalk?

What we normally do with buried jobs is run the same payload against a consumer in our dev environment and observe what happens. We also log all exceptions (by just echoing from our consumer), so we can usually tell exactly why a buried job failed.

Well, the reason a job is buried is highly dependent on the environment. I am talking about a webapp where jobs are things like "create pdf", "send email" and "resize image". If I connect to a 3rd-party service and it fails, I retry (a release with a delay). I keep a counter, and after x retries I bury the job.

I do not want the app looping forever trying to reach a 3rd-party service, and a user might want to check whether something has gone wrong there. I expect to bury jobs in many cases, and I do not want to sift through logs to see what went wrong.
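Keeping that retry counter inside the JSON payload is one way to do it without external storage; a sketch, where the field name `retries` and the limit are assumptions:

```python
import json

MAX_RETRIES = 5  # illustrative limit

def next_step(payload):
    """Return ('release', new_payload) with the counter bumped, or
    ('bury', payload) once the retry budget is spent. Since the job
    body cannot be changed in place, 'release' here really means
    delete + put of the updated clone (as discussed above)."""
    data = json.loads(payload)
    retries = data.get("retries", 0)
    if retries >= MAX_RETRIES:
        return "bury", payload
    data["retries"] = retries + 1
    return "release", json.dumps(data)
```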
---
Jurian Sluiman

Chad Kouse

Oct 30, 2012, 3:37:20 PM
to beansta...@googlegroups.com
A simpler solution might be to have a generic "failed_jobs" tube that you move a job into when it fails (cloning it, adding exception data, etc.).

-- 
Chad Kouse


Keith Rarick

Oct 30, 2012, 6:07:15 PM
to beansta...@googlegroups.com
Hi Jurian,

Here's my advice for dealing with failed jobs:
http://xph.us/2010/05/02/how-to-handle-job-failures.

Instead of adding features to beanstalkd to handle other
things (such as logging stack traces, or tracking the
application's long-term state), it's better to keep beanstalkd
itself focused on scheduling work to be done, and leave
those other things to other tools.

If you have a work item that has two separate phases
of execution, and those phases can fail independently,
it might make sense to break it apart into two jobs.

Having said all that, we definitely need ways to get better
visibility into beanstalkd's internal state while it's running
in production.

Jurian Sluiman

Oct 30, 2012, 6:41:28 PM
to beansta...@googlegroups.com
Hi Keith,


On Tuesday, October 30, 2012 11:07:37 PM UTC+1, Keith Rarick wrote:
Here's my advice for dealing with failed jobs:
http://xph.us/2010/05/02/how-to-handle-job-failures

Thanks for the link (that should be http://xph.us/2010/05/02/how-to-handle-job-failures.html, by the way, to help others). Interesting to read your solution! I am going to pull a few snippets out of that post, if you don't mind.

1. "See what sorts of failures happen in production": yes, I have logging for my workers. In fact, my webapp is half http and half cli, and the cli half also runs the workers. I have app-wide logging to various channels, so the workers are logged too. However, how do you relate the errors from jobs to your log? I want to create a web interface with information about the buried jobs. I would have to open my log file, parse it, peek the bury list (preferably all at once), relate the job ids to the logging information, and show that to the user.

I log the exceptions too, but to make the process more comfortable I see no objection to also storing the exception type + message in the job data itself. It only gets more complicated with the reasoning I see a lot here: "just delete the job and put a new one back". Ids are lost and tracking becomes near-impossible.
 
2. "It might also make sense to retry some jobs only a limited number of times before deleting them": the only method I can come up with is to delete the job and put a clone back. Releasing a job with a delay can't be done, for the reasons above (i.e., where do you store the counter?).

3. "For retries, [...], but do add a time delay with exponential backoff": same as above — where do you store this logic?

Ad 1: If you want to minimize errors (always good) and thus keep the bury queue small (also good), you probably need more data. The complete stack trace might go to your real logging service, and you would use the log to process that information. But for quickly inspecting why a job was buried, I think that is much too complicated.

Instead of adding features to beanstalkd to handle other
things (such as logging stack traces, or tracking the
application's long-term state), it's better to keep beanstalkd
itself focused on scheduling work to be done, and leave
those other things to other tools.

I completely agree. The simplicity of beanstalk is something I really appreciate. However, I think the enhancements described here could increase beanstalk's usefulness without losing focus, compromising the memory footprint, or ending up as a clumsy one-size-fits-all solution.

If you have a work item that has two separate phases
of execution, and those phases can fail independently,
it might make sense to break it apart into two jobs. 

It's what we do already :) We currently have a custom solution: a nodejs app scheduling work with Redis as the queue service. All jobs are atomic, to avoid chains of failures.

Having said all that, we definitely need ways to get better
visibility into beanstalkd's internal state while it's running
in production.

I would really like to hear more of your thoughts on this. If you have more information, ideas, or developments, please share :)
---
Jurian Sluiman 

Chad Kouse

Oct 30, 2012, 9:13:29 PM
to beansta...@googlegroups.com
from the protocol doc:
The stats-job data is a YAML file representing a single dictionary of strings
to scalars. It contains these keys:
..
- "releases" is the number of times a client has released this job from a
   reservation.

We use php for our consumer, so I'll just give you our function for deleting jobs that stick around too long.
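[The PHP function itself did not survive in this archive. As a stand-in, here is a sketch of the idea in Python: read "releases" from stats-job and delete jobs that have been released too often. The threshold and the client object are assumptions, not a reconstruction of the original code.]

```python
MAX_RELEASES = 10  # illustrative threshold

def too_many_releases(stats, limit=MAX_RELEASES):
    """stats is the dict parsed from the YAML body of the
    stats-job response; 'releases' is maintained by beanstalkd."""
    return int(stats.get("releases", 0)) >= limit

# Consumer-side use (conn is a hypothetical beanstalkd client):
#   stats = conn.stats_job(job.id)
#   if too_many_releases(stats):
#       conn.delete(job.id)   # the job has stuck around too long
```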


-- chad


Keith Rarick

Oct 30, 2012, 11:52:20 PM
to beansta...@googlegroups.com
On Tue, Oct 30, 2012 at 3:41 PM, Jurian Sluiman <jur...@juriansluiman.nl> wrote:
> how do you relate the errors from jobs to your log?

I'd suggest making sure the logs contain enough context for
debugging failures, including the job id and possibly also a
description of what the job was trying to do.

The idea is to avoid the need to match up log messages to
beanstalkd jobs by putting all necessary info in the logs.

> "just delete the job and put a new one back".

I don't usually use that pattern, but if you do, you can generate
your own request-id (for example, a version 4 UUID) and use
that consistently across several beanstalkd jobs.

> Releasing a job with a delay can't be done for
> above reasons (ie, where do you store the counter?).

Beanstalkd counts the number of times a job was released.
You can get the count from stats-job.

> 3. "For retries, [...], but do add a time delay with exponential backoff":
> same as above, where do you store this logic?

You can get the job's previous delay time with stats-job.

See https://github.com/kr/beanstalk-client-ruby/blob/master/lib/beanstalk-client/job.rb#L97
for an example combining an exponential backoff with a cutoff
to limit the total number of retries.
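In that spirit, a backoff helper driven entirely by the "releases" count from stats-job might look like this (the growth formula and cutoff are illustrative, not copied from the Ruby client):

```python
def release_delay(releases, cutoff=10, base=3):
    """Exponential backoff keyed on beanstalkd's own release count.
    Returns the next release delay in seconds, or None once the
    cutoff is reached (the caller should then bury or delete)."""
    if releases >= cutoff:
        return None
    return base * 2 ** releases
```

Because the counter lives in beanstalkd itself, no state needs to be stored in the job body or in an external database for this pattern.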

Jurian Sluiman

Oct 31, 2012, 4:18:34 AM
to beansta...@googlegroups.com
Hi Chad,


On Wednesday, October 31, 2012 2:13:35 AM UTC+1, chadkouse wrote:
from the protocol doc:
The stats-job data is a YAML file representing a single dictionary of strings
to scalars. It contains these keys:
..
- "releases" is the number of times a client has released this job from a
   reservation.

We use php for our consumer, so I'll just give you our function for deleting jobs that stick around too long.


Thanks for this info! It seems I do not need to track the count myself. I am integrating beanstalk into Zend Framework 2 now (I also use Pheanstalk, but I wrap it in a self-contained service): https://github.com/juriansluiman/SlmQueue. I will do some more refactoring so I can drop my own counter and use stats-job.
---
Jurian Sluiman 

Jurian Sluiman

Oct 31, 2012, 4:27:34 AM
to beansta...@googlegroups.com
Hi Keith,


On Wednesday, October 31, 2012 4:52:42 AM UTC+1, Keith Rarick wrote:
On Tue, Oct 30, 2012 at 3:41 PM, Jurian Sluiman <jur...@juriansluiman.nl> wrote:
> how do you relate the errors from jobs to your log?

I'd suggest making sure the logs contain enough context for
debugging failures, including the job id and possibly also a
description of what the job was trying to do.

The idea is to avoid the need to match up log messages to
beanstalkd jobs by putting all necessary info in the logs.

Our logs are not really suited to interacting with business logic like this. The log is beanstalk-worker specific but in a standard log format (which makes aggregating the logs very easy and helps us monitor the complete state of our servers). That being said, I think I will just store the buried jobs in a SQLite database. There are several bury queues floating around, and the SQLite database provides the context for those jobs. It creates a coupling I am not really fond of, but it will probably do the job well (and behaves like a "log" in the sense you mean).

I can easily store the job id, job data, and any exception type, message, and trace as I want. This database works very well with the business logic I need, and there is no problem parsing long log files and relating them to the state of beanstalk's bury queue. If it eventually becomes possible to do this inside beanstalk, I would love to hear about it.
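A minimal sketch of that SQLite side channel (the table layout is an assumption):

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS buried_jobs (
    job_id      INTEGER,
    payload     TEXT,
    exc_type    TEXT,
    exc_message TEXT,
    buried_at   TEXT DEFAULT CURRENT_TIMESTAMP
)"""

def record_burial(db, job_id, payload, exc):
    """Store the buried job's id, body, and exception context so a
    web interface can later show why the job was buried."""
    db.execute(SCHEMA)
    db.execute(
        "INSERT INTO buried_jobs (job_id, payload, exc_type, exc_message) "
        "VALUES (?, ?, ?, ?)",
        (job_id, payload, type(exc).__name__, str(exc)),
    )
    db.commit()
```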
---
Jurian Sluiman