Use shell command to judge whether a file is in pdf format or not.

Hongyi Zhao

Feb 15, 2010, 9:47:34 PM
Hi all,

I want to use a shell command to determine whether a file is in PDF
format or not. Any hints?
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

Ben Finney

Feb 15, 2010, 9:51:18 PM
Hongyi Zhao <hongy...@gmail.com> writes:

> I want to use shell command to judge whether a file is in pdf format
> or not. Any hints?

The ‘file(1)’ command queries file contents and identifies the type of
data they contain.

--
\ “If you continue running Windows, your system may become |
`\ unstable.” —Microsoft, Windows 95 bluescreen error message |
_o__) |
Ben Finney

Janis Papanagnou

Feb 15, 2010, 9:53:09 PM
Hongyi Zhao wrote:
> Hi all,
>
> I want to use shell command to judge whether a file is in pdf format
> or not. Any hints?

man file
man grep

On my system file(1) gives, for example, the following information...

ASCII text
ISO-8859 text
Microsoft Office Document
PDF document, version 1.3
PDF document, version 1.4
...

...then apply grep(1) and check the return code in a shell if statement.
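
A minimal sketch of that file(1) plus grep(1) approach. The function name is_pdf_by_file is made up for illustration, and the -b (brief) option is assumed to be available, as it is in the common Ian Darwin implementation of file:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when file(1) reports
# a PDF document. Assumes file(1) supports -b, which suppresses the
# "filename:" prefix so a file *named* "PDF document" cannot confuse
# the match.
is_pdf_by_file() {
    file -b -- "$1" | grep -q '^PDF document'
}

# Usage in an if statement:
#   if is_pdf_by_file report.pdf; then echo "PDF"; else echo "not PDF"; fi
```

The function's exit status is that of grep -q, the last command in the pipeline, which is what the if statement tests.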

Janis

mop2

Feb 16, 2010, 5:09:01 AM
On Tue, 16 Feb 2010 00:47:34 -0200, Hongyi Zhao
<hongy...@gmail.com> wrote:

> Hi all,
>
> I want to use shell command to judge whether a file is in pdf
> format
> or not. Any hints?

As an alternative to the file command, you can write a small bash
function, for example:

ispdf(){ IFS= read -r -n5 < "$1" && [ "$REPLY" = %PDF- ]; }

$ ispdf uhf.htm && echo yes
$ ispdf irfr3303pbf.pdf && echo yes
yes
$
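
For shells without bash's read -n, the same magic-number check can be sketched with head(1). The name ispdf_magic is made up for illustration, and head -c is assumed to be available (it is widespread, though not required by POSIX):

```shell
#!/bin/sh
# Hypothetical variant of the check above: compare the first five
# bytes of the file with the PDF magic string "%PDF-".
# Assumes head(1) supports -c (byte count).
ispdf_magic() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Usage:
#   ispdf_magic irfr3303pbf.pdf && echo yes
```

Like the bash function, this only inspects the header and says nothing about the rest of the file.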

Ben Bacarisse

Feb 16, 2010, 6:27:26 AM
Hongyi Zhao <hongy...@gmail.com> writes:

> I want to use shell command to judge whether a file is in pdf format
> or not. Any hints?

In addition to 'file' you could use the exit code from pdfinfo. This
is part of a suite of PDF manipulating programs that might not be
available on your target systems, but it seems worth mentioning.
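
A rough sketch of the exit-code test. The name is_pdf_by_pdfinfo is made up for illustration, and it assumes pdfinfo exits non-zero when it cannot open its argument as a PDF; check the man page of your build to confirm:

```shell
#!/bin/sh
# Hypothetical helper: run pdfinfo quietly and use only its exit status.
# Assumption: pdfinfo returns non-zero if the file cannot be opened
# as a PDF document.
is_pdf_by_pdfinfo() {
    pdfinfo "$1" >/dev/null 2>&1
}

# Usage:
#   if is_pdf_by_pdfinfo "$f"; then echo "$f: PDF"; fi
```

Both standard output and standard error are discarded, so only the status is visible to the caller.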

--
Ben.

Hongyi Zhao

Feb 16, 2010, 7:44:50 AM
On Tue, 16 Feb 2010 11:27:26 +0000, Ben Bacarisse
<ben.u...@bsb.me.uk> wrote:

>In addition to 'file' you could use the exit code from pdfinfo. This
>is part of a suite of PDF manipulating programs that might not be
>available on your target systems, but it seems worth mentioning.

Excellent tool! It's also available on my system. Thanks a lot.

Fred

Feb 16, 2010, 10:27:48 AM
On Feb 16, 4:44 am, Hongyi Zhao <hongyi.z...@gmail.com> wrote:
> On Tue, 16 Feb 2010 11:27:26 +0000, Ben Bacarisse
>
> <ben.use...@bsb.me.uk> wrote:
> >In addition to 'file' you could use the exit code from pdfinfo.  This
> >is part of a suite of PDF manipulating programs that might not be
> >available on your target systems, but it seems worth mentioning.
>
> Excellent tool!  It's also available on my system.  Thanks a lot.

Note that just because the first line of a file begins with "%PDF-",
there is no guarantee that the rest of the file is actually in PDF
format.
So the test can only determine that a file is either *not* PDF, or
that it "might be PDF".
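
To illustrate how a header test can be tightened while remaining only a heuristic, here is a sketch that also looks for the "%%EOF" marker that PDF files carry near their end. The name looks_like_pdf and the 1024-byte window are arbitrary choices for illustration, and head -c is assumed to be available:

```shell
#!/bin/sh
# Hypothetical helper: still a "might be PDF" test, just a stricter one.
# 1. The file must exist and be a regular file.
# 2. It must start with the magic string "%PDF-".
# 3. Its last 1024 bytes (arbitrary window) must contain "%%EOF".
looks_like_pdf() {
    [ -f "$1" ] || return 1
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ] || return 1
    tail -c 1024 "$1" | LC_ALL=C grep -q '%%EOF'
}
```

A file passing all three checks can still be damaged in between, so a real PDF parser is needed for anything stronger.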
--
Fred K

Hongyi Zhao

Feb 16, 2010, 10:37:58 PM
On Tue, 16 Feb 2010 07:27:48 -0800 (PST), Fred
<fred.l.kl...@boeing.com> wrote:

>Note that just because the first line of a file begins with "%PDF-",
>there is no guarantee that the rest of the file is actually in PDF
>format.
>So the test can only determine that a file is either *not* PDF, or
>that it "might be PDF".

Not so: pdfinfo performs more checks than just looking for that magic
string.

See the following test:

$ echo %PDF- > 11
$ pdfinfo 11
Error: PDF version 
-- xpdf supports version 1.7 (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
