Can Get Info be made to ignore HTML code?


RobS

Dec 16, 2009, 2:44:44 PM
to BBEdit Talk
Hi,

I'm trying to answer the question asked by a client of mine: can you
tell me how many words there are on my web site?

Sure, I said, I'll just ask BBEdit to tell me. But I suspected my
all-time favourite app was inflating the count a little, so to find out I
copied the text of one page of the site, as presented in a browser,
into a new BBEdit document, counted it there, and found it had 2159 words.
However, BBEdit reports that the actual HTML file has 2970 words, so
obviously the count function is NOT ignoring the code.

Is there maybe a Pref somewhere I've missed? If not, can someone
suggest a better way to get a word count of all the pages on a rather
large site? I know how many words (419) there are in the metadata and
navigation stuff at the top of every page, so I can manually subtract
that, but even doing so I still get a higher count than I should -- by
almost 20%.
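
For reference, the arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch using the three numbers quoted above; the variable names are labels chosen for illustration.

```python
# Counts quoted in this message.
raw_html_words = 2970      # Get Info on the HTML file
boilerplate_words = 419    # metadata and navigation at the top of every page
browser_text_words = 2159  # text copied from the browser into a new document

# Subtract the known boilerplate, then compare with the browser text.
adjusted = raw_html_words - boilerplate_words
excess = adjusted - browser_text_words
excess_pct = round(100 * excess / browser_text_words, 1)

print(adjusted, excess, excess_pct)  # 2551 392 18.2
```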

Rob

Charlie Garrison

Dec 16, 2009, 11:41:31 PM
to bbe...@googlegroups.com
Good afternoon,

On 16/12/09 at 11:44 AM -0800, RobS <rstev...@accesscable.net> wrote:

>Is there maybe a Pref somewhere I've missed? If not, can someone
>suggest a better way to get a word count of all the pages on a rather
>large site?

Translate the file to text first: Markup -> Utilities -> Translate
Do the word count
Then Undo to get all the HTML back

I would expect that process could be scripted as well.


Charlie

--
Ꮚ Charlie Garrison ♊ <garr...@zeta.org.au>
〠 PO Box 141, Windsor, NSW 2756, Australia

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
http://www.ietf.org/rfc/rfc1855.txt

Dave

Dec 16, 2009, 11:05:36 PM
to BBEdit Talk
Try Markup > Utilities > Remove Comments, then Markup > Utilities >
Remove Markup, count the words, then undo.

If there are script and style elements, you'll need to remove them
as well, because their contents will add to the word count too.
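
Outside BBEdit, the same idea (drop comments, tags, and the contents of script and style elements, then count what is left) can be sketched in Python with the standard library's html.parser module. This is an illustration under assumptions, not what BBEdit does internally: the class and function names are made up, "word" here simply means a whitespace-separated token, and text in the head such as the title is still counted.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts whitespace-separated words outside script and style elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside script or style
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags and comments never reach this method; only text does.
        if self._skip_depth == 0:
            self.words += len(data.split())


def count_visible_words(html_text):
    """Return the number of words left once markup is ignored."""
    parser = VisibleTextCounter()
    parser.feed(html_text)
    parser.close()
    return parser.words


sample = (
    "<html><head><style>p { color: red }</style>"
    "<script>var a = 1;</script></head>"
    "<body><!-- hidden note --><p>Hello <b>big</b> world</p></body></html>"
)
print(count_visible_words(sample))  # 3
```

One caveat: a word split by an inline tag, such as `<b>W</b>ord`, is counted as two words, so the result can differ slightly from what a browser shows.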

RobS

Dec 18, 2009, 7:28:57 AM
to BBEdit Talk
Thanks Dave and Charlie. I'll look into scripting that over the
holidays, and if I get it right, I'll post it here for others.

But I solved the client's request in the short term by first trying
out Word Count Plus, a plug-in for Firefox. It seemed to work well so
I suggested the client install it. They did and are happy to click it
whenever they wonder how many words they have on a page. It's manual
on a page by page basis, but the good thing is, they're doing it, not
me. ;-)

Rob

Carlton Gibson

Dec 18, 2009, 7:32:52 AM
to bbe...@googlegroups.com

On 18 Dec 2009, at 12:28, RobS wrote:

> It's manual
> on a page by page basis, but the good thing is, they're doing it, not
> me. ;-)

Amen.

Johan Solve

Dec 18, 2009, 7:55:40 AM
to bbe...@googlegroups.com
Copy from the preview window, paste into a new text window, and use Get Info.



--
Johan Sölve [FSA Member, Lasso Partner]
Web Application/Lasso/FileMaker Developer
MONTANIA SOFTWARE & SOLUTIONS
http://www.montania.se mailto:jo...@montania.se
(spam-safe email address, replace '-' with 'a')

RobS

Dec 19, 2009, 7:34:54 AM
to BBEdit Talk
On Dec 18, 8:55 am, Johan Solve <inbox...@solve.se> wrote:
> Copy from the preview window, paste into a new text window, and use Get Info.

Thanks Johan. Even faster to select Preview In > New Text Window. :-)

Results so far...
a raw XHTML file - 86,008 words (it's a big one!)
Preview in text window - 60,815
copy/paste from web browser - 60,546
Word Count Plus plug-in - 59,922

I've no idea why those last two are so different, but that's not BBE's
fault.
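
To put those gaps in proportion, here is a rough Python sketch that uses only the four counts listed above. The helper name percent_drop is made up for illustration.

```python
# The four counts reported above for the same page.
raw_xhtml = 86008
preview_text_window = 60815
browser_copy_paste = 60546
word_count_plus = 59922


def percent_drop(before, after):
    """Percentage by which `after` is smaller than `before`, to one decimal."""
    return round(100 * (before - after) / before, 1)


# Share of the raw count that disappears once the markup is gone.
print(percent_drop(raw_xhtml, preview_text_window))           # 29.3
# Gap between the BBEdit text preview and the browser copy/paste.
print(percent_drop(preview_text_window, browser_copy_paste))  # 0.4
# Gap between the browser copy/paste and the Firefox plug-in.
print(percent_drop(browser_copy_paste, word_count_plus))      # 1.0
```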

Thanks all,

Rob

Bill Hernandez

Dec 19, 2009, 10:19:14 AM
to bbe...@googlegroups.com

On Dec 19, 2009, at 6:34 AM, RobS wrote:

> copy/paste from web browser - 60,546

A web browser generally collapses runs of multiple spaces into a single space, unless the text is inside

<pre>
....
</pre>

I doubt that BBEdit is the source of the problem. It could be, but I doubt it.

If the file doesn't contain anything sensitive, send it to sup...@barebones.com and let them have a look at it.

Best Regards,

Bill Hernandez
Plano, Texas

Johan Solve

Dec 19, 2009, 1:49:37 PM
to bbe...@googlegroups.com
At 04.34 -0800 2009-12-19, RobS wrote:
>On Dec 18, 8:55 am, Johan Solve <inbox...@solve.se> wrote:
>> Copy from the preview window, paste into a new text window, and use Get Info.
>
>Thanks Johan. Even faster to select Preview In > New Text Window. :-)

Wow, I hadn't seen that... Thanks back :)


>Results so far...
>a raw XHTML file - 86,008 words (it's a big one!)
>Preview in text window - 60,815
>copy/paste from web browser - 60,546
>Word Count Plus plug-in - 59,922

Preview In > New Text Window outputs extra info for links, so that might explain some of the difference. On a simple test web page with some links in it, I get 553 words instead of 499.
