GhostScript adding 30GB files to tmp folder

Oliver Slay

Jul 26, 2015, 11:51:32 PM
to ResourceSpace
I have just received several Storage 28 errors from a RS installation...  it has a 40GB system disk, 80GB data disk and a 1TB external disk for the filestore.

It had been running very well, with no problems, until Thursday and Friday.  I removed the gs_Ffd3455 (or similarly named) files from the tmp folder.  They are GhostScript temporary files and are not needed once the process has ended.  GhostScript is not cleaning them up, especially when it fails mid-process.  One file was 29GB.  After I removed it, another file appeared on Friday that was 30GB.  This leaves 0% free on the system disk and completely prevents RS from running properly.

I have run pdfinfo on the suspected file and the dimensions are 11339 x 3458.  This seems a lot larger than 850x850.  I was wondering if the PDF was also layered.

Is the size of this PDF the problem?  It produces an 11MB jpeg file when it completes.  Is this one file creating a 30GB file during the process?  for a 1-page spread PDF?

Our GhostScript is 9.10 (2013); there's a 9.16 (2015). Are there any problems with updating GS?  We're still on r6025.

Our options appear to be:
1)  $pdf_dynamic_rip ... 
2)  Point tempdir and tmp env variables to the larger 1TB disk... 
3)  Crontab the deletion of gs temporary files each night that are older than 2 hours?   Would a gs process run longer than 2 hours?
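
For option 3, a minimal sketch of the cleanup in Python, assuming the temporary files sit directly in the temp folder and are named gs_*. The function name, the 2-hour threshold and the dry-run switch are illustrative only, not anything ResourceSpace ships. A file that gs is still writing to should have a recent modification time, so it would normally be skipped.

```python
import time
from pathlib import Path


def remove_stale_gs_temp_files(tmp_dir, max_age_hours=2.0, dry_run=True):
    """Delete gs_* files in tmp_dir whose mtime is older than max_age_hours.

    Returns the paths that were (or, in dry-run mode, would be) removed.
    """
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for path in Path(tmp_dir).glob("gs_*"):
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                if not dry_run:
                    path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # The file vanished between listing and checking,
            # for example because gs removed it itself.
            continue
    return removed
```

Run from cron it would be called as remove_stale_gs_temp_files("/tmp", dry_run=False); the default dry run only reports what it would delete.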

Any ideas why this has just started happening and hasn't happened previously?  Could it just be that these are the first oversized PDFs they have tried on the system?

Regards

Oliver

Dan Huby

Jul 29, 2015, 3:14:12 PM
to ResourceSpace, ma...@oliverslay.com, ma...@oliverslay.com
Hi Oliver,

I've never seen this issue but I'd guess the problem is caused by broken or unsupported PDF files.

Dan

Oliver Slay

Jul 29, 2015, 8:05:03 PM
to ResourceSpace, d...@montala.com
Hi

I've just downloaded the PDF... looks ok to me.. although the print size is 163.68 inches x 49.6 inches at 100%.  That would fit on 84 pages of A4... So I presume it is attempting to create a preview image at the default 150 dpi.

Oliver

Dan Huby

Jul 30, 2015, 3:36:15 AM
to ResourceSpace, ma...@oliverslay.com, ma...@oliverslay.com
Hi Oliver,

A quick bit of maths - that's a 182 megapixel image at 150dpi, which is about 500MB at true colour. That's a long way from 30GB, so I wonder if a different DPI is used initially, perhaps so it can then scale back down to produce an anti-aliased image. As the files are being left behind it seems something is failing - but perhaps it's just running out of disk space.
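
The arithmetic can be checked with a short Python sketch. It assumes the 163.68 x 49.6 inch page size quoted earlier in the thread and 3 bytes per pixel for true colour; the function names are made up for illustration. The second function only inverts the same formula to show what DPI an uncompressed 24-bit raster of 30GB would imply. It says nothing about what Ghostscript actually stores in its temporary files.

```python
import math


def raster_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, megapixels, raw_bytes) for an uncompressed raster."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return width_px, height_px, pixels / 1e6, pixels * bytes_per_pixel


def dpi_for_raw_bytes(width_in, height_in, raw_bytes, bytes_per_pixel=3):
    """DPI at which an uncompressed raster of this page would occupy raw_bytes."""
    return math.sqrt(raw_bytes / (bytes_per_pixel * width_in * height_in))


w, h, mp, raw = raster_size(163.68, 49.6, 150)
print(f"{w} x {h} px, {mp:.1f} MP, {raw / 1e6:.0f} MB raw")
# prints: 24552 x 7440 px, 182.7 MP, 548 MB raw

print(f"{dpi_for_raw_bytes(163.68, 49.6, 30e9):.0f} dpi")
# prints: 1110 dpi
```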

I think we should scale to a fixed pixel size rather than use DPI but I don't think Ghostscript supports this. We'd need to detect the document size first ourselves and calculate an appropriate DPI.

Dan

Oliver Slay

Jul 30, 2015, 5:04:03 AM
to Dan Huby, ResourceSpace
Hi Dan

Isn't that what $pdf_dynamic_rip does?  It uses pdfinfo to get the dimensions of the pdf, calculates a new DPI and then passes that into the variable $resolution, overriding the default... But they would have to test that to see if they are happy with the results for smaller PDFs and AI files etc.
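
A rough Python sketch of that idea, not the actual $pdf_dynamic_rip code. It assumes the "Page size: W x H pts" line that pdfinfo prints, treats the 11339 x 3458 figures from the first post as points, and uses 850 pixels as the target for the longest side. The function names and the 150 dpi cap are illustrative.

```python
import re

POINTS_PER_INCH = 72.0


def parse_page_size_pts(pdfinfo_output):
    """Extract (width_pts, height_pts) from the 'Page size:' line of pdfinfo output."""
    match = re.search(r"Page size:\s+([\d.]+) x ([\d.]+) pts", pdfinfo_output)
    if match is None:
        raise ValueError("no 'Page size' line found")
    return float(match.group(1)), float(match.group(2))


def dpi_for_target(width_pts, height_pts, target_px, max_dpi=150):
    """Pick a DPI so the longest side renders at about target_px, capped at max_dpi."""
    longest_in = max(width_pts, height_pts) / POINTS_PER_INCH
    return min(max_dpi, max(1, round(target_px / longest_in)))


sample = "Page size:      11339 x 3458 pts"
width_pts, height_pts = parse_page_size_pts(sample)
print(dpi_for_target(width_pts, height_pts, target_px=850))
# prints: 5
```

With the same target an A4 page (595 x 842 pts) comes out at 73 dpi, so ordinary documents would also render at a different resolution from the fixed default, which is why the results need testing on smaller files.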

Yes, it is running out of disk space... The /tmp and /tempdir folders are on the 40GB system drive.  The resourcespace www folder is on the 80GB datadisk .. and the resources are on a 1TB attached storage...
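
If the temporary files were moved instead, one way to sketch it in Python is to hand gs a modified environment. Ghostscript's documentation lists TEMP and TMPDIR as the variables it reads for its temporary directory. The helper names below are hypothetical, and whether ResourceSpace passes the environment through to gs in this way is an assumption that would need checking.

```python
import os
import subprocess


def gs_environment(temp_dir, base_env=None):
    """Return a copy of the environment with Ghostscript's temp-dir variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Set both documented variables so the result does not depend on
    # which one takes precedence.
    env["TMPDIR"] = temp_dir
    env["TEMP"] = temp_dir
    return env


def run_gs(args, temp_dir):
    """Run gs with its temporary files redirected to temp_dir (hypothetical helper)."""
    return subprocess.run(["gs", *args], env=gs_environment(temp_dir), check=False)
```

The directory passed as temp_dir would be somewhere on the 1TB disk and must already exist and be writable by the web server user.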

When the temporary GhostScript file reaches 30GB that's 100% of the diskspace, and it could then crash out leaving the temp file on the disk (which GS is notorious for) and then every time someone attempts to view a resource they see the Storage Error 28 message (and I get a flood of error emails).  Without removing the temp file, no further uploads can be made to the server until the offending file is removed.  Since the ingest can be repeated, I figure that these temporary files could be removed without harm.

When I get a free moment I'll upload the file to my local Ubuntu install which is a mirror of the production server and run the GhostScript command manually... 

Oliver

--


------------------------------------------------
Oliver Slay MIAP DipHealthSci(Open)

Mob: 07930 420656
------------------------------------------------

Dan Huby

Jul 30, 2015, 6:01:10 AM
to Oliver Slay, ResourceSpace
Hi Oliver,

> Isn't that what $pdf_dynamic_rip does?

Ah, exactly right, I wasn't aware that had been implemented but looks
like it's been there since 2009! Hopefully that will be the solution
here - interested to hear if that's the fix.

Dan


--
Dan Huby
Montala Limited

http://www.montala.com/
UK: 01367 710245
Intl: +44 136 771 0245

Oliver Slay

Aug 4, 2015, 2:05:11 PM
to ResourceSpace, ma...@oliverslay.com
I'm running the gs command by itself... and watching two gs_.. tmp files grow in FileZilla... one is much smaller than the other... but the larger is up to 15GB already... and the input file is 2.4MB... 

 gs -dBATCH -r150 -dUseCIEColor -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=90 -sOutputFile=6mbackground.jpg  -dFirstPage=1 -dLastPage=1 -dEPSCrop -dUseCropBox "6m_x_2.5m flightpath background.pdf" -verbose

This file did actually get ingested successfully after 16 minutes previously..  I am thinking that after 10 minutes they gave up...   Mine has reached 15GB and I'm going to kill the command, as I don't know how much 10 minutes of 15GB is going to cost on Azure... I'd like to know though...
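
A runaway conversion like this could be bounded with a time limit. This is a minimal Python sketch with a hypothetical helper name; the limit to use is a guess, and it does not describe how ResourceSpace itself launches gs. Any gs_ temporary files left behind after a kill would still have to be removed separately.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s and was killed."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
```

For example, run_with_timeout(["gs", "-dBATCH", "-dNOPAUSE", ...], 600) would give up after ten minutes; the argument list is abbreviated here.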

I have run with -r11 (11 dpi) and it took a few seconds... a 3MB tmp file was created, but disappeared fairly quickly... unfortunately the jpg looks a bit pixelated.. although I can't actually find where this image is used... This is the ref_refhash.jpg file... not refscr_refhash.jpg  for page 1... Is this file displayed anyway?  Or is it used to generate all of the other paper sizes?

Oliver