--
You received this message because you are subscribed to the Google Groups "Luigi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to luigi-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi!
A more robust approach is to trigger hourly (or even more often) rather than at a particular time of day, and invoke something like
luigi --module your.module RangeDaily --of YourActualTask --start 2016-01-01
as documented in http://luigi.readthedocs.org/en/latest/api/luigi.tools.range.html.
That way you’re saying “finish please as soon as possible, when the dependencies have come in” and “finish the tasks for yesterday (and contiguously back) even if something major blocked them from executing all day”.
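To make the "contiguously back" idea concrete, here is a simplified plain-Python model of the gap-filling a range task performs. It walks each day from the start date up to, but not including, a stop date and collects the days whose output is not complete yet. This is an illustration under assumptions, not Luigi's RangeDaily implementation (which has more parameters and handles more cases); `missing_days` and the `is_complete` callback are hypothetical names.

```python
from datetime import date, timedelta
from typing import Callable, List


def missing_days(start: date, stop: date, is_complete: Callable[[date], bool]) -> List[date]:
    """Return each day in [start, stop) whose output is not complete yet (hypothetical helper)."""
    pending = []
    day = start
    while day < stop:  # stop is exclusive, e.g. today, so only finished days are eligible
        if not is_complete(day):
            pending.append(day)
        day += timedelta(days=1)
    return pending


if __name__ == "__main__":
    # Pretend only Jan 1 and Jan 3 have finished outputs.
    done = {date(2016, 1, 1), date(2016, 1, 3)}
    print(missing_days(date(2016, 1, 1), date(2016, 1, 5), lambda d: d in done))
```

Triggered hourly, a check like this keeps returning yesterday (and any older gaps) until their outputs exist, and then returns nothing, which is why a frequent trigger is harmless.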
On Wed, Feb 3, 2016 at 2:52 PM, Dillon Stadther <dlsta...@gmail.com> wrote:
I have reached the point where I am putting the dozens of Luigi jobs I've written into production, and I'm running into unexpected walls, so I'd like to learn how others execute their jobs. Presently, most of my jobs are executed by one of two wrapper tasks. These wrapper tasks are then scheduled in crontab as follows:
00 05 * * * cd /home/user/path/to/my/luigi/jobs/; python five_utc.py # wrapper task
00 06 * * * cd /home/user/path/to/my/luigi/jobs/; python emr_job.py
00 10 * * * cd /home/user/path/to/my/luigi/jobs/; python ten_utc.py # wrapper task
00 11 * * * cd /home/user/path/to/my/luigi/jobs/; python another_job.py
[I change directories first so that my client.cfg will be found (it is located in the same directory as my Luigi files).]
The first wrapper job (at 5 UTC) launches and runs with no issue. However, the second (at 10 UTC) never runs. The two files are identical except for which jobs are yielded within requires().
Note: Both five_utc.py and ten_utc.py run successfully from a terminal using the exact commands above. Has anyone encountered this or a similar issue and can help?
If you can share console output of the 10 UTC invocation, we can help decipher it.
A classic reason: if something is already running with a particular task ID, the central scheduler (if you're using one) prevents a concurrent run of the same task.
Also, I know that Luigi jobs can be executed with 'luigi <task> --module <filename>'. However, I cannot get this to run successfully; I get the import error "no module named <whatever>". This made me wonder: do Luigi users have to install their tasks on their systems (i.e. create __init__.py and setup.py, then 'sudo python setup.py install')?
One does not necessarily have to install them, but the modules need to be on PYTHONPATH. Typically that means prefixing the command with something like PYTHONPATH=.
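A small plain-Python sketch of why PYTHONPATH fixes the "no module named" error: setting PYTHONPATH=<dir> puts that directory on `sys.path` at startup, and the `--module` value (a module name such as five_utc, not a file name such as five_utc.py) is then importable from it. The helper `import_from_dir` and the demo module name are hypothetical; the code mimics the effect at runtime instead of via the environment variable.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_dir(module_name: str, directory: str):
    """Import module_name after adding directory to sys.path (what PYTHONPATH=<dir> does at startup)."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # make sure newly created files are noticed
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module standing in for a file of Luigi tasks.
        Path(tmp, "my_jobs_demo.py").write_text("GREETING = 'hello from my_jobs_demo'\n")
        mod = import_from_dir("my_jobs_demo", tmp)
        print(mod.GREETING)
```

From a shell, the equivalent is running the luigi command from the jobs directory with PYTHONPATH=. in front of it, so no setup.py install is required.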
Any help would be greatly appreciated! Thanks
On Thu, Feb 4, 2016 at 2:37 PM, Dillon Stadther <dlsta...@gmail.com> wrote:
As time allows, I will try to round up all my dependencies, move to a virtualenv and install via setuptools, as suggested by Dave and Lars. Regarding the Luigi range tools, am I correct that they will make sure the task is run for every day between --start and today? I have specific times of day (5 and 10 UTC) specified because we use Luigi to perform our daily data warehouse ETL, for which some external and internal scripts must run and complete before our execution. A number of our dependencies are, in fact, constantly being updated, but we only want a daily snapshot of them.
That’s a typical use for Luigi’s completeness checking. You can put arbitrary logic in an ExternalTask’s complete() to check whether the preconditions for the ETL are met. For the ETL job itself, include the date in its output (create a dummy marker output if that’s not easy natively in the data warehouse), so that Luigi sees the ETL job as complete on subsequent retries and does nothing.
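A minimal sketch of that pattern in plain Python, assuming a local directory of marker files stands in for the warehouse. In real Luigi code this would be a luigi.Task with a luigi.DateParameter, an output() returning a luigi.LocalTarget whose path contains the date, and a requires() pointing at a luigi.ExternalTask whose complete() holds the precondition logic. The class and function names below are hypothetical and only mimic that behaviour so it runs without Luigi installed.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Callable


class DailyEtlMarker:
    """Hypothetical stand-in for a Luigi task whose output is a dated marker file."""

    def __init__(self, day: date, marker_dir: str):
        self.day = day
        self.marker_dir = marker_dir

    def output_path(self) -> Path:
        # The date is part of the path, so each day has its own completeness marker.
        return Path(self.marker_dir, f"etl_done_{self.day.isoformat()}.marker")

    def complete(self) -> bool:
        return self.output_path().exists()

    def run(self) -> None:
        # The real warehouse load would happen here; afterwards, write the marker.
        self.output_path().write_text("ok\n")


def run_if_needed(task: DailyEtlMarker, upstream_ready: Callable[[], bool]) -> bool:
    """Mimic the scheduler: skip complete tasks, wait on preconditions. True if run() was called."""
    if task.complete():
        return False  # already done today: a retry is a no-op
    if not upstream_ready():
        return False  # preconditions not met yet: the next trigger will try again
    task.run()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = DailyEtlMarker(date(2016, 2, 3), tmp)
        print(run_if_needed(task, lambda: True), run_if_needed(task, lambda: True))
```

Because the marker carries the date, triggering this hourly runs the load at most once per day, as soon as the upstream check passes.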
The issue with the 10 UTC invocation is that it does not happen via crontab when scheduled (it does not even show up in the cron logs: 'grep CRON /var/log/syslog'). For giggles, I just changed the crontab execution time to 2 minutes in the future, expecting the usual fate. However, I was completely caught off guard when it worked! The cron log for today is below (sorry for the size): Basically, the cron log shows that nothing was executed for the ubuntu user between 6 UTC and 13:05 UTC (there should have been a cron execution at 10 UTC).
Weird. I won’t be able to help troubleshoot that.