The other day I had reason to run a curation task over our entire
repository. It found a large number of Items that needed
modification, and I watched as it got slower...and slower...and
s l o w e r ... until it ran out of memory and crashed, leaving no work
completed. I got a list of the Collections to be affected, and ran
the curator over each one separately, and the job was (eventually)
completed.

It seems to me that the proper unit of work for a curation run is not
the whole set of affected objects, but a single invocation of the task
on one object. We should be committing work each time a task returns,
so that a crash loses only the work in flight. I would expect that a
well-designed task can be re-run in the same scope without causing
problems.
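
To make the proposal concrete, here is a minimal sketch of a per-task commit loop. It is not the DSpace curation API: `WorkContext`, `Task`, `CountingContext`, and `curate` are hypothetical names invented for illustration, and a real context would presumably wrap a database transaction.

```java
import java.util.Arrays;
import java.util.List;

public class PerTaskCommit {
    /** Hypothetical unit-of-work handle; stands in for a real transactional context. */
    interface WorkContext {
        void commit();
        void rollback();
    }

    /** Hypothetical curation task applied to one object at a time. */
    interface Task {
        void perform(String itemId) throws Exception;
    }

    /** Demo context that only counts calls, so the behavior can be observed. */
    static class CountingContext implements WorkContext {
        int commits = 0;
        int rollbacks = 0;

        @Override
        public void commit() { commits++; }

        @Override
        public void rollback() { rollbacks++; }
    }

    /**
     * Runs the task over each item and commits every time the task returns.
     * A failure rolls back only the item in flight; earlier commits stand.
     * Returns the number of items whose work was committed.
     */
    static int curate(List<String> itemIds, Task task, WorkContext ctx) {
        int committed = 0;
        for (String itemId : itemIds) {
            try {
                task.perform(itemId);
                ctx.commit();      // work for this item is now durable
                committed++;
            } catch (Exception e) {
                ctx.rollback();    // discard only this item's partial work
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        CountingContext ctx = new CountingContext();
        // Simulated task that fails on the second item.
        Task task = id -> {
            if ("b".equals(id)) {
                throw new IllegalStateException("simulated failure");
            }
        };
        int done = curate(Arrays.asList("a", "b", "c"), task, ctx);
        System.out.println("committed=" + done + " rolledBack=" + ctx.rollbacks);
    }
}
```

With a loop of this shape, a failure partway through costs only the item being processed at the time. Whether to continue past a failure or stop is a separate choice; this sketch continues.
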
Comments?
--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu