
wordcount of pdf in miktex


rajkumar...@gmail.com

Jul 11, 2014, 1:05:23 AM
Please suggest how I can count the number of words in the generated PDF in MiKTeX.


Thank You

Robin Fairbairns

Jul 11, 2014, 6:08:57 AM
rajkumar...@gmail.com writes:

> Please suggest how I can count the number of words in the generated
> PDF in MiKTeX.

i don't know any means of counting words in a pdf file. (it's the sort
of thing that you would need full adobe acrobat for, i imagine.)

the tex faq lists a set of mechanisms. some (if not all) require a un*x
machine to run; i don't use windows machines and have only occasionally
picked up a w*-relevant hint for the faq.

the faq will cease to operate some time this year, since i'm retiring.
--
Robin Fairbairns, Cambridge

rajkumar...@gmail.com

Jul 11, 2014, 7:21:48 AM
Sir, what about on Linux?
I tried the commands wc filename.tex and wc filename.pdf on Linux,
but got confused about which one gives the correct word count. Does
wc filename.tex give the correct word count of the PDF, or does wc filename.pdf?

Which of the two is the correct command for counting words in a PDF file?

T.K.

Jul 11, 2014, 8:44:22 AM


On 11.7.2014 13:21, rajkumar...@gmail.com wrote:
> Sir, what about on Linux?
> I tried the commands wc filename.tex and wc filename.pdf on Linux,
> but got confused about which one gives the correct word count. Does
> wc filename.tex give the correct word count of the PDF, or does wc filename.pdf?
>
> Which of the two is the correct command for counting words in a PDF file?
>

I think neither. The command wc -w <filename> counts "words" in a PLAIN
TEXT file, words being defined as SEQUENCES OF NON-BLANK CHARACTERS. This
means that in the tex file control sequences (if followed by a space) are
counted too, while, on the contrary, words generated by control
sequences are not.
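A minimal Python sketch of the counting rule described above. It approximates wc -w with str.split() (GNU wc's handling of unusual whitespace and non-printable bytes may differ slightly), and the two LaTeX fragments are invented for illustration: markup tokens inflate the count, while a macro such as \today, which typesets several words, counts as one.

```python
def wc_words(text: str) -> int:
    """Rough stand-in for wc -w: count runs of non-whitespace characters."""
    return len(text.split())

# Invented LaTeX fragments, used only to illustrate the counting rule.
markup_heavy = r"\begin{abstract} We thank \emph{all} referees. \end{abstract}"
macro_text = r"Printed \today."

# \begin{abstract}, \emph{all} and \end{abstract} are each one "word"...
print(wc_words(markup_heavy))  # 6, although only 4 words of body text are typeset
# ...while \today, which typesets as a full date, counts as a single word.
print(wc_words(macro_text))    # 2
```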

Using wc -w for a pdf file has the same effect as opening it with a
PLAIN TEXT editor and counting the non-blank sequences.

Tomáš

T.K.

Jul 11, 2014, 8:47:54 AM
On 11.7.2014 14:44, T.K. wrote:
Not to mention that, for the tex file, wc does not process the files
pulled in with \include or \input…
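One way to picture the missing step is to inline \input and \include targets before counting. The helper below, expand_includes, is hypothetical and deliberately naive: it ignores comments, brace-less \input and TEXINPUTS search paths, resolves names against the main file's directory, and looks for name.tex when no extension is given, roughly as LaTeX does.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Naive pattern for \input{name} and \include{name}; it ignores comments,
# brace-less \input, and TEXINPUTS search paths.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")

def expand_includes(path: Path, base: Path | None = None,
                    seen: set[Path] | None = None) -> str:
    """Hypothetical helper: return the file's text with included files inlined."""
    path = path.resolve()
    base = path.parent if base is None else base  # names resolve against the main file's folder
    seen = set() if seen is None else seen
    if path in seen:  # each file is inlined once, which also stops include loops
        return ""
    seen.add(path)
    text = path.read_text(encoding="utf-8")

    def inline(match: re.Match[str]) -> str:
        target = base / match.group(1).strip()
        if target.suffix == "":
            target = target.with_suffix(".tex")  # look for name.tex when no extension is given
        if not target.exists():
            return match.group(0)  # leave unresolvable commands untouched
        return expand_includes(target, base, seen)

    return INCLUDE_RE.sub(inline, text)

# Tiny demo with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "chapter1.tex").write_text("Some words from chapter one.\n", encoding="utf-8")
    (root / "main.tex").write_text("Intro text. \\input{chapter1}\n", encoding="utf-8")
    raw = (root / "main.tex").read_text(encoding="utf-8")
    expanded = expand_includes(root / "main.tex")

print(len(raw.split()), len(expanded.split()))  # 3 7
```

Counting only main.tex sees three "words", one of which is the \input command itself; after inlining, the chapter's text is counted too.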

TK

Peter Percival

Jul 12, 2014, 6:57:01 AM
rajkumar...@gmail.com wrote:
> Please suggest how I can count the number of words in the generated PDF in MiKTeX.
>
>
> Thank You

It occurs to me that the answer may depend on why you want the word
count. If you're writing a dissertation and some busybody wants to know
that you haven't exceeded some limit, then guess. If you feel under a
moral obligation to make a reasonably accurate guess, then count by hand
the number of words on one page and multiply by the number of pages.
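Peter's estimate is simple arithmetic: words per page times number of pages. A small Python sketch with made-up page counts; averaging a few hand-counted pages instead of one is a small variation on his suggestion, not part of it.

```python
from __future__ import annotations

def estimate_total_words(sample_page_counts: list[int], total_pages: int) -> int:
    """Average words per hand-counted page, scaled to the whole document."""
    if not sample_page_counts:
        raise ValueError("count at least one page by hand")
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return round(sum(sample_page_counts) * total_pages / len(sample_page_counts))

# Hypothetical figures: one page counted by hand, 120 pages in total.
print(estimate_total_words([310], 120))            # 37200
# Variation: average over a few pages to smooth out figure-heavy pages.
print(estimate_total_words([310, 280, 350], 120))  # 37600
```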

--
[Dancing is] a perpendicular expression of a horizontal desire.
G.B. Shaw quoted in /New Statesman/, 23 March 1962

Peter Flynn

Jul 13, 2014, 7:01:19 PM
On 07/11/2014 06:05 AM, rajkumar...@gmail.com wrote:
> Please suggest how I can count the number of words in the generated PDF in MiKTeX.

This has nothing whatsoever to do with MiKTeX.

Use the pdftotext program and a word-counter, e.g., on Linux/Mac this would be:

$ pdftotext myfile.pdf - | wc -w

On Windows you may have to do it as two separate commands:

C:\> pdftotext myfile.pdf >myfile.txt
C:\> wc myfile.txt

pdftotext for Windows is available at
http://www.foolabs.com/xpdf/download.html
wc for Windows is available at http://www.tawbaware.com/wc.htm
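The same pipeline driven from Python, as a sketch that assumes pdftotext (from xpdf or poppler) is on the PATH. The trailing - argument makes pdftotext write to standard output instead of creating myfile.txt, and -enc UTF-8 is requested because the default output encoding differs between builds. Splitting on whitespace treats the form feeds pdftotext puts between pages as word separators, just as wc -w does.

```python
from __future__ import annotations

import shutil
import subprocess

def count_words(text: str) -> int:
    """wc -w style count: runs of non-whitespace (form feeds count as whitespace)."""
    return len(text.split())

def pdf_word_count(pdf_path: str) -> int:
    """Extract the text layer with pdftotext and count its words."""
    exe = shutil.which("pdftotext")
    if exe is None:
        raise FileNotFoundError("pdftotext is not on PATH")
    # "-" as the output file sends the text to stdout instead of <name>.txt.
    result = subprocess.run(
        [exe, "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return count_words(result.stdout.decode("utf-8", errors="replace"))
```

For example, pdf_word_count("myfile.pdf") should normally agree with pdftotext myfile.pdf - | wc -w; either way the count is only as good as the text pdftotext can extract.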

///Peter
--
XML FAQ: http://xml.silmaril.ie/


Peter Flynn

Jul 13, 2014, 7:03:15 PM
On 07/11/2014 12:21 PM, rajkumar...@gmail.com wrote:
> On Friday, 11 July 2014 15:38:57 UTC+5:30, Robin Fairbairns wrote:
>> rajkumar...@gmail.com writes:
> I tried the commands wc filename.tex and wc filename.pdf on Linux,

wc only works on plaintext files. It is meaningless on markup (TeX) and
binary (PDF) files. Convert to text first: there is a detex script to do
this for TeX files, and pdftotext for PDF files.
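To show what "convert to text first" involves on the TeX side, here is a deliberately crude markup stripper. It is a hypothetical stand-in, not the detex program, and it mishandles many things (for instance, \cite keys and \label names survive as "words").

```python
import re

def crude_detex(tex: str) -> str:
    """Strip the most common LaTeX markup; a rough illustration, not detex."""
    tex = re.sub(r"(?<!\\)%.*", "", tex)                 # comments (an escaped \% is kept)
    tex = re.sub(r"\\\\", " ", tex)                      # \\ line breaks
    tex = re.sub(r"\$[^$]*\$", " ", tex)                 # inline $...$ math, dropped entirely
    tex = re.sub(r"\\(?:begin|end)\{[^}]*\}", " ", tex)  # environment delimiters
    tex = re.sub(r"\\[A-Za-z]+\*?", " ", tex)            # control words such as \emph, \section*
    tex = re.sub(r"[{}]", "", tex)                       # leftover grouping braces
    return tex

sample = r"""\begin{abstract}
We thank \emph{all} referees. % TODO: name them
The bound $x+y=y+x$ holds.\\
\end{abstract}"""

print(len(sample.split()))               # 14: markup, comment and math all counted
print(len(crude_detex(sample).split()))  # 7: We thank all referees. The bound holds.
```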

John Harper

Jul 13, 2014, 7:32:46 PM
Peter Percival wrote:

> rajkumar...@gmail.com wrote:
>> Please suggest how I can count the number of words in the generated PDF
>> in MiKTeX.
>>
>>
>> Thank You
>
> It occurs to me that the answer may depend on why you want the word
> count. If you're writing a dissertation and some busybody wants to know
> that you haven't exceeded some limit, then guess. If you feel under a
> moral obligation to make a reasonably accurate guess, then count by hand
> the number of words on one page and multiply by the number of pages.

In the former case I wish the busybodies would follow the example of Earth
Sciences, Cambridge, UK, who specified a maximum number of pages for a PhD
thesis and also the format. But other departments in the same university
specified a maximum number of words. Disclaimer: it is some years since I
looked up the requirements, and they may no longer be as I describe. I have
no quarrel with the existence of a maximum, having once been an external
examiner of a PhD in another university that did not specify a maximum. That
thesis was two fat volumes...

In the latter case, first define "word". Is $x+y=y+x$ one word or seven, or
even nine if "=" is interpreted as "is equal to"? Is "Lieutenant-colonel"
one word or two? Is your answer the same if "Lieutenant-" is at the end of
one line and "colonel" at the beginning of the next? What about "headmaster"
and "head-" and "master"? Is a picture worth a thousand words? I do not
assume that Microsoft is infallible on this topic.
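John's point can be made concrete: two plausible counting rules disagree on exactly these cases. A short Python sketch comparing the whitespace rule used by wc -w with a rule that counts runs of letters (the latter is just one arbitrary alternative). The line-broken strings imitate what text extraction can produce when a hyphen falls at the end of a line.

```python
import re

def count_whitespace_words(text: str) -> int:
    """Rule 1 (what wc -w does): anything between whitespace is one word."""
    return len(text.split())

def count_letter_runs(text: str) -> int:
    """Rule 2 (one arbitrary alternative): every run of letters is a word,
    so hyphens split words and symbols such as + and = are ignored."""
    return len(re.findall(r"[^\W\d_]+", text))

samples = [
    "Lieutenant-colonel",
    "Lieutenant-\ncolonel",  # explicit hyphen that happens to fall at a line break
    "head-\nmaster",         # "headmaster" hyphenated by TeX at a line end
    "$x+y=y+x$",
]

for s in samples:
    print(repr(s), count_whitespace_words(s), count_letter_runs(s))
# Rule 1 gives 1, 2, 2, 1; rule 2 gives 2, 2, 2, 4.
```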

--
John Harper

Martin Heller

Jul 14, 2014, 12:42:41 PM
Peter Flynn wrote, on 14-07-2014 01:01:
> On Windows you may have to do it as two separate commands:
>
> C:\> pdftotext myfile.pdf >myfile.txt
> C:\> wc myfile.t

On Windows I use

pdftotext myfile.pdf - | wc

i.e., you need a dash to tell pdftotext to write to standard output, so
that it can be piped, rather than to the file myfile.txt. I use pdftotext
and wc from GnuWin32.




Dan Luecking

Jul 14, 2014, 2:26:43 PM
On Mon, 14 Jul 2014 18:42:41 +0200, Martin Heller <mr_h...@yahoo.dk>
wrote:
TeX Live on Windows comes with pdftotext if one wants to avoid
GnuWin32.

wc is part of MSYS's coreutils as well as GnuWin32's textutils.
(I have both but I don't know what the difference might be.)


Dan
To reply by email, change LookInSig to luecking

William Unruh

Jul 14, 2014, 3:48:23 PM
Tried pdftotext on one of my latexed files. It simply came out complete
garbage. The pdflatex file seemed to have been created as an image file,
not a text file (I used dvipdf to create it).


Bob Tennent

Jul 14, 2014, 4:12:16 PM
On Mon, 14 Jul 2014 19:48:23 +0000 (UTC), William Unruh wrote:
>
> Tried pdftotext on one of my latexed files. It simply came out complete
> garbage. The pdflatex file seemed to have been created as an image file,
> not a text file (I used dvipdf to create it).

Was it pdflatex or dvipdf? It can't be both.

Guido Milanese

Jul 14, 2014, 6:37:29 PM
There is also a Perl script, texcount, included in several TeX
distributions. It also runs online with a web interface and offers a lot
of features and options. It counts words not from the PDF file but from
your LaTeX source file.

See http://app.uio.no/ifi/texcount/
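texcount is normally run from the command line on the main .tex file. A small Python wrapper, assuming texcount is on the PATH; its -inc option asks it to parse files pulled in with \input and \include as well, which addresses T.K.'s earlier objection to counting the source. The wrapper only returns texcount's report as text and does not try to parse it.

```python
from __future__ import annotations

import shutil
import subprocess

def texcount_command(main_tex: str, follow_includes: bool = True) -> list[str]:
    """Build the texcount command line (the program path is resolved later)."""
    cmd = ["texcount"]
    if follow_includes:
        cmd.append("-inc")  # also count files pulled in by \input / \include
    cmd.append(main_tex)
    return cmd

def run_texcount(main_tex: str, follow_includes: bool = True) -> str:
    """Run texcount and return its plain-text report."""
    exe = shutil.which("texcount")
    if exe is None:
        raise FileNotFoundError("texcount is not on PATH (it ships with TeX Live and MiKTeX)")
    cmd = texcount_command(main_tex, follow_includes)
    cmd[0] = exe
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```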

Best wishes,
Guido Milanese

William Unruh

Jul 14, 2014, 9:45:53 PM
It was latex with dvipdf.



Robin Fairbairns

Jul 15, 2014, 6:58:38 PM
i thought i had already pointed at the uk tex faq link
http://www.tex.ac.uk/cgi-bin/texfaq2html?label=wordcount
which lists that app (which is in tex live and miktex as well as on
ctan).

i was discussing today, with a uk tug colleague, whether it's worth
carrying on with the faq. it really does seem it's irrelevant to most
texies' work...
--
Robin Fairbairns, Cambridge

Axel Berger

Jul 15, 2014, 7:16:11 PM
Robin Fairbairns wrote:
> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to
> most texies' work...

Is it? Whenever I hazily remember there was a solution for a problem
that has come up, but have forgotten what exactly it was, your FAQ is my
first and most reliable port of call. I use it regularly.

Axel

Timothy Murphy

Jul 15, 2014, 8:06:38 PM
Robin Fairbairns wrote:

> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to most
> texies' work...

I am very grateful for it, as I find it most useful.

--
Timothy Murphy
e-mail: gayleard /at/ eircom.net
School of Mathematics, Trinity College, Dublin 2, Ireland

Mauro Orlandini

Jul 16, 2014, 3:15:45 AM
On Tue, 15 Jul 2014 23:58:38 +0100, Robin Fairbairns wrote:

> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to most
> texies' work...

The UK FAQ is THE place to go for any tex-related question. Full stop.

Thank you for maintaining it, Robin.

Ciao, Mauro

Nicola Talbot

Jul 16, 2014, 3:44:29 AM
On 15/07/14 23:58, Robin Fairbairns wrote:

> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to most
> texies' work...
>

It's certainly not irrelevant to me. I use it and reference it. There
are some commands or code fragments that I don't use regularly enough to
remember the spelling or syntax, but I know they're described in the
faq, so that's the easiest place for me to look them up.

Thank you for all your hard work.

Nicola Talbot
--
Home: http://www.dickimaw-books.com/
Creating a LaTeX Minimal Example:
http://www.dickimaw-books.com/latex/minexample/

Robin Fairbairns

Jul 16, 2014, 7:22:05 AM
Nicola Talbot <n.ta...@uea.ac.uk> writes:

> On 15/07/14 23:58, Robin Fairbairns wrote:
>
>> i was discussing today, with a uk tug colleague, whether it's worth
>> carrying on with the faq. it really does seem it's irrelevant to most
>> texies' work...
>>
>
> It's certainly not irrelevant to me. I use it and reference it. There
> are some commands or code fragments that I don't use regularly enough
> to remember the spelling or syntax, but I know they're described in
> the faq, so that's the easiest place for me to look them up.
>
> Thank you for all your hard work.

thank you, and axel, timothy and mauro, for your kind words.

however, the fact is that i can't maintain the faq in its present form,
in the medium term. i have been discouraged for a long time by the
observation that web crawlers (such as google and the like) make up
something like 99% of the traffic to the faq.

furthermore, if i post to somewhere (e.g., tex/sx) referencing the faq,
readers seem to ignore what i've written ... and more biting than that,
often follow up my answer with one saying "this is the way to do it", in
words that could have been copied from the faq.

my assumption is that, with a few exceptions, the faq is somehow
"bad news" to today's users. i'm afraid this means that a rather large
activation energy is going to be required to restart work on a new
platform that uk tug hope to support. (the new machine will also
support the uk ctan node, since no-one anywhere -- other than here and
germany -- seems willing to undertake the ctan work.)

i retire at the end of september, and i had somehow imagined the faq
might help provide mental exercise for the coming idle hours. but if i
fail to get started, it's not going to help. :-(
--
Robin Fairbairns, Cambridge

Timothy Murphy

Jul 16, 2014, 9:46:00 AM
Robin Fairbairns wrote:

> i retire at the end of september, and i had somehow imagined the faq
> might help provide mental exercise for the coming idle hours. but if i
> fail to get started, it's not going to help. :-(

When I think of the handful who keep TeX afloat,
I'm reminded of the lines by another Cambridge man:

These, in the day when heaven was falling,
The hour when earth’s foundations fled,
Followed their mercenary calling
And took their wages and are dead.

Their shoulders held the sky suspended;
They stood, and earth’s foundations stay;
What God abandoned, these defended,
And saved the sum of things for pay.

Well, substitute "retired" for "dead",
and take away the emphasis on pay ...

Peter Wilson

Jul 16, 2014, 1:05:43 PM
On 15/07/14 23:58, Robin Fairbairns wrote:

> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to most
> texies' work...
>

I have rarely asked a LaTeX question in any of the forums as the answer
has nearly always been in the FAQ. Robin, you have saved me hours of
work over the years.

Thank you
Peter W.

John Harper

Jul 16, 2014, 6:28:38 PM
Robin Fairbairns wrote:

> thank you, and axel, timothy and mauro, for your kind words.
>
> however, the fact is that i can't maintain the faq in its present form,
> in the medium term. i have been discouraged for a long time by the
> observation that web crawlers (such as google and the like) make up
> something like 99% of the traffic to the faq.

I plead guilty to that - it saves me having to remember the faq's http code.
But I agree with Axel, Timothy, Nicola, and Mauro that the faq is really
useful (especially when Lamport and Kopka & Daly don't help).

> furthermore, if i post to somewhere (e.g., tex/sx) referencing the faq,
> readers seem to ignore what i've written ... and more biting than that,
> often follow up my answer with one saying "this is the way to do it", in
> words that could have been copied from the faq.

I plead not guilty to that. And I wish you luck in finding a successor to
keep running it if you are prevented from doing that yourself. Cambridge may
have changed since I had a sabbatical there some years ago and found that my
computer access remained valid for several years. Does someone in authority
there imagine that LaTeX and TeX are lost causes and feel that Oxford is the
home of those?

--
John Harper

Denis Bitouzé

Jul 17, 2014, 2:25:20 AM
On Thu, 17 Jul 2014 at 00:28, John Harper <john....@vuw.ac.nz> wrote:

>> however, the fact is that i can't maintain the faq in its present form,
>> in the medium term. i have been discouraged for a long time by the
>> observation that web crawlers (such as google and the like) make up
>> something like 99% of the traffic to the faq.
>
> I plead guilty to that - it saves me having to remember the faq's http code.
> But I agree with Axel, Timothy, Nicola, and Mauro that the faq is really
> useful (especially when Lamport and Kopka & Daly don't help).

I fully agree as well: although I'm not a LaTeX newbie, I look at this
faq more than once a week.

Thanks for all this very nice work!
--
Denis

Joseph Wright

Jul 17, 2014, 3:25:27 AM
Agreed: certainly a vital resource that I'm confident UK-TUG will
support as best they can [it still having that vague UK-TUG link :-)]
--
Joseph Wright

Robin Fairbairns

Jul 17, 2014, 4:52:40 AM
Joseph Wright <joseph...@morningstar2.co.uk> writes:

> On 16/07/2014 18:05, Peter Wilson wrote:
>> On 15/07/14 23:58, Robin Fairbairns wrote:
>>
>>> i was discussing today, with a uk tug colleague, whether it's worth
>>> carrying on with the faq. it really does seem it's irrelevant to most
>>> texies' work...
>>>
>>
>> I have rarely asked a LaTeX question in any of the forums as the answer
>> has nearly always been in the FAQ. Robin, you have saved me hours of
>> work over the years.
>
> Agreed: certainly a vital resource that I'm confident UK-TUG will
> support as best they can [it still having that vague UK-TUG link :-)]

not exactly "vague" -- it was originally a uk tug project, and the first
edition had contributions from every member of the committee. (that
first edition was printed, and collated and stapled up manually by
jonathan fine and me, sitting at his kitchen table, which was less
cluttered than mine -- he didn't have clutter-generators in his house
(i.e., children).)
--
Robin Fairbairns, Cambridge

blmblm.m...@gmail.com

Jul 17, 2014, 6:08:25 PM
In article <qfbnsqd...@dev-rf10-linux.cl.cam.ac.uk>,
Robin Fairbairns <rf...@cl.cam.ac.uk> wrote:

[ snip ]

> i was discussing today, with a uk tug colleague, whether it's worth
> carrying on with the faq. it really does seem it's irrelevant to most
> texies' work...
>

One more voice here in the chorus of "not so!" I have your TeX FAQ
bookmarked, and it's one of the first places I look when I have a
question about TeX and friends. Google searches are all well and
good, and often do produce a useful result, but sometimes at the
cost of wading through too many "hits" of dubious value. I'd like
to hope there's still a place for sites such as your FAQ, where
the information is almost sure to be reliable.

Somewhat tangentially: I've noticed over the years I've followed
this group, sometimes more closely than others, that from time to
time you express discouragement about putting in a lot of work for
little return. I notice too that there's usually a (small?) chorus
of "no, no, don't give up!" replies, and -- well, I think it's not
impossible that they speak for a larger group of people like me, also
appreciative but not quite motivated enough to pipe up themselves.

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.

Peter Flynn

Jul 18, 2014, 4:44:43 PM
On 07/17/2014 11:08 PM, blm...@myrealbox.com wrote:
[...]
> Somewhat tangentially: I've noticed over the years I've followed
> this group, sometimes more closely than others, that from time to
> time you express discouragement about putting in a lot of work for
> little return. I notice too that there's usually a (small?) chorus
> of "no, no, don't give up!" replies, and -- well, I think it's not
> impossible that they speak for a larger group of people like me,
> also appreciative but not quite motivated enough to pipe up
> themselves.

I think I posted earlier that I am certainly prepared to allocate some
time towards keeping Robin's work going. I can't do so until certain
other commitments are finished, which will be towards the end of this year.

///Peter
