Given how simple it is to just schedule things in the past, I'd be behind
providing that functionality as a quick way to get back-fill capabilities.
But I'm not sure that it's the way things will/should be done in some future
world where Azkaban is perfect and world hunger is solved.

So, I guess that despite some objections, for lack of a "better" solution,
I'm +1 for getting it out there and then we can let whatever pains are faced
inform decisions on a better solution (if, in fact, there is one).

--Eric

On 9/22/10 5:27 PM, "Vikram Oberoi" <vob...@gmail.com> wrote:
> Hey folks,
>
> I'm going to introduce the notion of scheduling jobs in the past in Azkaban.
> The change is actually easy to make, but it's confusing and potentially
> contentious. I'd like some feedback on: whether this is a good idea, whether
> this should be in Azkaban, and whether it should even be implemented the way
> I'm proposing.
>
> *What does "scheduling in the past" mean?* Let's take a simple example:
>
> It's September 20th. I've implemented a Pig job that grabs 7-day trailing
> uniques for product X, and I want it to run daily at midnight. However, I
> want to grab this data from August 1st onward. I want to be able to tell
> Azkaban to schedule this job to run on August 1st and run it daily. My
> expected behavior is that Azkaban will immediately begin running the job and
> schedule it to run every day on a recurring basis. Until the job catches up to
> the current day, it'll keep executing immediately.
>
> The sole reason I want to do this is that I want the correct date
> interpolated into my logfile path strings. That's it. I don't want to write a
> separate script to execute a job flow I've already defined for Azkaban.
>
> *How will you implement this?*
>
> My master branch for Azkaban already has a change that tracks each run's
> scheduled time. The rule for this scheduled time is that *every
> consecutive job's scheduled time is the previous job's scheduled time + the
> period*.
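
To make the rule above concrete, here is a minimal sketch of how the backlog of scheduled times could be derived for the August 1st example. It is not Azkaban code (Azkaban itself is Java), and every name in it (scheduled_times, log_path, the /logs/product_x path template) is hypothetical and chosen only for illustration. It assumes the period is a fixed one day and that "now" is midnight on September 20th.

```python
from datetime import datetime, timedelta


def scheduled_times(first, period, now):
    """Yield each scheduled time, starting at `first`, that is not after `now`.

    Each scheduled time is the previous scheduled time plus the period,
    so the values never drift with the actual execution time.
    """
    current = first
    while current <= now:
        yield current
        current = current + period


def log_path(template, scheduled):
    """Interpolate the scheduled date (not the wall-clock date) into a path."""
    return scheduled.strftime(template)


# Hypothetical values taken from the example in the message.
first = datetime(2010, 8, 1)    # schedule the job "in the past"
now = datetime(2010, 9, 20)     # the day the job is actually submitted
period = timedelta(days=1)      # run daily

# Every entry here is already due, so a scheduler following this rule
# would run them back to back until it has caught up to `now`.
backlog = list(scheduled_times(first, period, now))
paths = [log_path("/logs/product_x/%Y/%m/%d", t) for t in backlog]
```

Because each path is built from the scheduled time rather than the time the job happens to execute, the back-fill runs read the data for the day they stand for, which is the behavior the message asks for.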