
Bug#826543: cupsd fails to remove job files at end of PreserveJobFiles interval


Sergio Gelato

Jun 6, 2016, 5:30:03 AM
Package: cups-daemon
Version: 1.7.5-11+deb8u1

My /var/spool/cups filled up yesterday. The reason turned out to be that
job files had not been purged for the past six days. I'm running with the
default setting for PreserveJobFiles (86400 seconds), so this is definitely
not intended behaviour.

I don't know yet how easy this is to reproduce; the fact that it only started
last Tuesday on a system that has been running this version of CUPS for months
suggests that it may not be entirely trivial to trigger.

Inspection of the source code led me to spot a logic flaw in
scheduler/job.c:cupsdCleanJobs(). I'm not yet 100% certain that it
accounts for yesterday's incident, but it is definitely a flaw (apparently
still uncorrected in version 2.1.3-5).

The flaw is that cupsdCleanJobs() does not consistently guard against
the possibility that job->history_time==0 or job->file_time==0 when
updating JobHistoryUpdate. On the one hand there are checks like

  else if (job->file_time && job->file_time <= curtime)

and on the other hand one sees, in a different code branch,

  if (job->file_time < JobHistoryUpdate || !JobHistoryUpdate)
    JobHistoryUpdate = job->file_time;

My impression is that job->file_time can be zero if a job's files have
already been cleaned up by remove_job_files(). That could lead to
JobHistoryUpdate being 0 on exit from cupsdCleanJobs(), thereby
unscheduling further calls to cupsdCleanJobs().

Accordingly, I'd recommend changing the latter statement to

  if (job->file_time > 0 &&
      (job->file_time < JobHistoryUpdate || !JobHistoryUpdate))
    JobHistoryUpdate = job->file_time;

and doing something similar to the two statements involving job->history_time.

(Aside: CUPS probably won't work very well before 1970-01-01 but I can live
with that. I'd be more concerned about INT_MAX being too small.)

Sergio Gelato

Jun 6, 2016, 12:40:03 PM
I've now tested my proposed change and played with LogLevel debug2.
The change doesn't seem to do any harm but it didn't obviously cure
the problem either. It took several daemon restarts (and one access
to the web interface, though I'm not sure that was the trigger) for
expired job files to be removed.

I'll keep monitoring the situation, although I'm now running with
PreserveJobFiles Off since that seems better for privacy.

Brian Potkin

Nov 9, 2019, 12:10:02 PM
tags 826543 moreinfo upstream
thanks
Thank you for your report, Sergio. I reckon this could be Issue 5538:

https://github.com/apple/cups/issues/5538

Please would you test on buster or later.

Regards,

Brian.

Sergio Gelato

Nov 10, 2019, 11:40:03 AM
Thank you for your reply. The issue I reported is distinct from both upstream #5538 and Debian #921741, but the upstream patch does indeed seem to incorporate my suggested fix; I assume upstream independently spotted the same problem. Note that the original fix for #5538 had a misplaced right parenthesis, corrected in

[53cac1d08bba395942df638d74526470c9fc7975] Fix parentheses in cupsdCleanJobs

The latter patch is not in buster yet. (I should probably cherry-pick it for local use if you won't issue an update for buster.)

As for testing again, the issue doesn't seem easy to trigger to begin with. I had deployed my suggested fix locally on jessie but not on stretch, and it hasn't come back to bite us. (I'm not saying it was never triggered, only that the spool never filled up again as a result. Unrelated changes to our printing infrastructure --- we no longer run a central CUPS server --- have probably helped.)

Sergio Gelato

Nov 10, 2019, 4:10:03 PM
Earlier I pointed out that upstream commit

[53cac1d08bba395942df638d74526470c9fc7975] Fix parentheses in cupsdCleanJobs

isn't in buster. I've now given some more thought to the implications of this. In the typical case (default settings) job->history_time is INT_MAX. The misplaced parenthesis makes the code behave as

  if (1 < JobHistoryUpdate || !JobHistoryUpdate)
    JobHistoryUpdate = job->history_time;

which means JobHistoryUpdate will nearly always be updated. It will therefore most likely (until 2038, anyway) end up with the file_time value of the last job in the array. I haven't checked how the jobs array is sorted; I think the code should avoid relying on a particular ordering. The correct result is of course the earliest file_time value of all jobs in the array, so the misplaced parenthesis could result in cleanup being unduly delayed.

By the way, the way in which history_time and file_time are calculated in cupsdLoadJob() and cupsdUpdateJobs() could result in wraparound on platforms with signed 32-bit time_t. For plausible values of PreserveJobHistory and PreserveJobFiles this isn't going to bite until 2038 or maybe late 2037, but what if one now were to test with "1000w" on a 32-bit platform?

Brian Potkin

Nov 11, 2019, 11:20:03 AM
On Sun 10 Nov 2019 at 16:27:42 +0000, Sergio Gelato wrote:

> Thank you for your reply. The issue I reported is distinct from
> upstream #5538 or Debian #921741 but the upstream patch does indeed
> seem to incorporate my suggested fix; I assume upstream independently
> spotted the same problem. Note that the original fix for #5538 had a
> misplaced right parenthesis, corrected in

> [53cac1d08bba395942df638d74526470c9fc7975] Fix parentheses in cupsdCleanJobs

The fix for this is in unstable.

> The latter patch is not in buster yet. (I should probably cherry-pick
> it for local use if you won't issue an update for buster.)
>
> As for testing again, the issue doesn't seem easy to trigger to begin
> with. I had deployed my suggested fix locally on jessie but not on
> stretch, and it hasn't come back to bite us. (I'm not saying it was
> never triggered, only that the spool never filled up again as a
> result. Unrelated changes to our printing infrastructure --- we no
> longer run a central CUPS server --- have probably helped.)

You appear to have the issue under control at your end, so, unless
you wish to take your distinct issue upstream, I am not sure where we
are going. As far as triaging is concerned, I am not happy to work
on something I cannot reproduce here; it does not make life easy.

Regards,

Brian.