If you mean the simple file timestamp as seen by the OS, doit already provides tooling for it:
Dependencies & Targets - basic introduction. It talks about tracking the file's MD5 signature as the sign of a file being modified.
More on dependencies - explains custom functions usable within the "uptodate" section to determine whether the file/task is considered up to date.
check_timestamp_unchanged() - an example for a file "foo" that checks whether its timestamp has changed or not.
But if you mean timestamps stored within the HDFS file, the options are:
It should even be possible to treat the HDFS file as a couple of
independent virtual files, each being updated by doit (including the
evaluation of whether it needs to be updated or not) one by one. This
would require a custom function to determine the HDFS file state,
and removing the task dependency on the HDFS file itself (as the file
would change with an update of any of its sections and trigger updates
which are sometimes not necessary).
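A minimal sketch of such a custom function, assuming doit passes "task" and "values" to an "uptodate" callable that names those parameters and that task.value_savers stores values for the next run (as the "More on dependencies" documentation describes). The names section_unchanged, get_stamp, process, the section names and the in-memory STAMPS stand-in are all hypothetical; replace get_stamp with code that reads the timestamp stored inside your HDFS file.

```python
# dodo.py (sketch): one sub-task per section of the big file.
# STAMPS stands in for timestamps stored inside the HDFS file.
STAMPS = {"alpha": 100, "beta": 200}


def get_stamp(section):
    """Hypothetical reader: return the timestamp stored for `section`."""
    return STAMPS[section]


def section_unchanged(section, read=get_stamp):
    """Build an "uptodate" callable that watches a single section."""
    key = "stamp_" + section

    def check(task, values):
        # After a successful run doit calls the saver and stores its result.
        task.value_savers.append(lambda: {key: read(section)})
        # Up to date only if the stamp saved last time still matches.
        return key in values and values[key] == read(section)

    return check


def process(section):
    """Placeholder action; the real work on the section goes here."""
    print("processing", section)


def task_section():
    for section in STAMPS:
        yield {
            "name": section,
            "actions": [(process, [section])],
            # No file_dep on the big file itself, only the section check.
            "uptodate": [section_unchanged(section)],
        }
```

With this layout a change in one section should re-run only the sub-task for that section, because no task lists the big file in "file_dep".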
With best regards
Jan
--
You received this message because you are subscribed to the Google Groups "python-doit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-doit...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I am not sure whether you can stop doit from using MD5 checks on files listed as file dependencies (this is a question for Eduardo Schettino).
My guess is that currently all files (mentioned in any "file_dep") are checked using MD5, so any additional check would only add one extra check.
A workaround could be to use a small file recording the size and modification time (or the MD5) of the HDFS file, and to keep this small file as the representative of the HDFS file for doit. If your fixed set of actions on the HDFS file works as one unit and always modifies the representative file, it could work well.
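One way to sketch this workaround: the task that modifies the HDFS file also rewrites a small stamp file, and every other task lists only the stamp file in "file_dep". The file names and the functions write_stamp, update_data and report are hypothetical, and the JSON layout is just one possible choice.

```python
import json
import os

DATA = "big_file.dat"      # hypothetical name of the large file
STAMP = DATA + ".stamp"    # small representative file tracked by doit


def write_stamp(data_path=DATA, stamp_path=STAMP):
    """Record size and modification time of the large file."""
    st = os.stat(data_path)
    with open(stamp_path, "w") as fh:
        json.dump({"size": st.st_size, "mtime_ns": st.st_mtime_ns}, fh)


def update_data():
    """Placeholder for the actions that modify the large file."""
    with open(DATA, "ab") as fh:
        fh.write(b"new record\n")


def report():
    """Placeholder for work that consumes the large file."""
    print("size:", os.path.getsize(DATA))


def task_update():
    # Updating and stamping run as one unit, so the stamp never lags.
    return {"actions": [update_data, write_stamp], "targets": [STAMP]}


def task_report():
    # doit checks only the small stamp file, not the large file.
    return {"actions": [report], "file_dep": [STAMP]}
```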
doit also allows for some calculated task results; this is an alternative. See calculated-dependencies, which might be exactly what you need.
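As a rough sketch of the calculated-dependencies approach, assuming doit's "calc_dep" field and its convention that the calculating task's action returns a dict such as {"file_dep": [...]}: one task works out which files matter, and another task takes its dependencies from that result. The "stamps" directory and the functions find_stamps and process are hypothetical names.

```python
import glob
import os

STAMP_DIR = "stamps"  # hypothetical directory of per-section stamp files


def find_stamps(stamp_dir=STAMP_DIR):
    """Return the dependencies to use, as a dict with a "file_dep" list."""
    pattern = os.path.join(stamp_dir, "*.stamp")
    return {"file_dep": sorted(glob.glob(pattern))}


def process():
    """Placeholder action; the real work goes here."""
    print("processing")


def task_find_stamps():
    # Its result is read by any task that names it in calc_dep.
    return {"actions": [find_stamps]}


def task_process():
    # Dependencies are not listed here; they come from find_stamps.
    return {"actions": [process], "calc_dep": ["find_stamps"]}
```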
Regarding MD5 and the huge HDFS file size: is this really a problem? How much time does it take to calculate the MD5? Is your code really time constrained? I can imagine that if you accept your task running 10 seconds longer, you can keep your code simple and clean (and with a higher chance of providing proper results). But you know your constraints best.
Jan Vlčinský
Hi Anton
Best would be to ignore doit internals (it does not do trivial
things) and use what it already offers. Try the calculated-dependencies
example. Even if it is not clear at first sight, it
would be better to work through all the existing examples and tutorials
than to modify doit's code. The design is nice and applicable
to many situations. The other option is to mess with code
whose design you do not understand well.
Jan