
Word-processing challenge anyone?


David Frank

Feb 13, 2005, 7:33:29 AM
FYI, I just posted the text below in comp.lang.fortran, topic "Word-processing
challenge anyone?", and I expect to get at least 1 solution in reply.

OTOH, despite the claims that PL/I has the superior string handling that's
needed in a word-processing application, there won't be any solutions posted
here in comp.lang.pl1, and it's for sure Weinkam can't produce a competitive
Cobol solution.

------------- below posted in comp.lang.fortran ----------
Want to give this a shot?
Get the text file at: http://patriot.net/~bmcgin/kjvpage.html
Using a text editor (Notepad), remove the text before "Book 01 Genesis" and
the text after the last word in Revelations ("amen"), producing a bible.txt
file containing 4,947,047 chars.

The challenge is to process the bible.txt file into a words array and count
unique word occurrences in a counts array.
(This challenge was initiated by LR, whose C++ source/results will be posted
later, along with my own source/results.)

In this processing, convert all punctuation and numbers to blanks, and
uppercase to lower.
One punctuation exception: an apostrophe (') within a word is deleted, leaving
wife's as wifes.
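The following is a minimal Python 3 sketch of the rules just stated. It assumes a plain-ASCII file and deletes every apostrophe rather than only those inside a word. The AWK and Perl solutions later in this thread make the same simplification. All other non-letters become blanks, and everything is lowercased. The sample line is made up for illustration.

```python
import re
from collections import Counter


def normalize(text):
    """Return the words of `text` under the challenge's rules, in order."""
    text = text.replace("'", "")            # wife's -> wifes (apostrophe deleted)
    text = re.sub(r"[^A-Za-z]", " ", text)  # punctuation and digits -> blanks
    return text.lower().split()             # uppercase -> lower; split on blanks


sample = "001:001 The man's WIFE, and the man's 2 sons."  # hypothetical input line
counts = Counter(normalize(sample))
print("total words =", sum(counts.values()))  # 7
print("unique words =", len(counts))          # 5
```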

When bible.txt is "thusly" processed, I expect you should get the following outputs:
total words = 789781
unique words = 12691
xx.xx Sec ?.?? Ghz PC

Pls post your time and PC speed..

! A few template statements in this word-processing benchmark challenge to
get started are:
! -----------------------------
program count_word_occurances ! in bible.txt
  implicit none
  integer,parameter :: maxw = 65536 ! or lower if possible
  character(24) :: words(maxw)
  integer :: i, n, counts(maxw)=0, t1(8), t2(8)

  call date_and_time(values=t1) ! get benchmark start time

  ! open file='bible.txt' ...

  ! process word occurrences into the 2 arrays words,counts
  ! until EOF as per your word-processing algorithm

  call date_and_time(values=t2) ! get benchmark stop time

  n = 0 ! count unique words found
  do i = 1,maxw
    if (counts(i) /= 0) n = n+1
  end do

  write (*,*) 'total words =',sum(counts)
  write (*,*) 'unique words =',n

  write (*,'(f0.2,a)') (t2(5)*3600.+t2(6)*60.+t2(7) +t2(8)/1000.) &
                     - (t1(5)*3600.+t1(6)*60.+t1(7) +t1(8)/1000.), ' Sec <- 2.8 Ghz PC'
end program
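The template leaves the core loop to the reader. Here is one hedged illustration, written in Python rather than Fortran. It is not David Frank's own hash, which is not shown in this post. The sketch fills two parallel arrays of MAXW slots, as the template's words/counts declarations suggest. A hash picks the starting slot, and linear probing handles collisions. The FNV-1a hash is an assumption chosen only for brevity.

```python
MAXW = 65536  # same table size as the template's maxw parameter


def hash_word(word):
    """FNV-1a (32-bit) hash reduced to a slot index; an illustrative choice."""
    h = 2166136261
    for byte in word.encode("ascii"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h % MAXW


def add_word(words, counts, word):
    """Store or bump `word` using linear probing; return the number of collisions."""
    i = hash_word(word)
    collisions = 0
    while words[i] is not None and words[i] != word:
        collisions += 1
        if collisions == MAXW:
            raise RuntimeError("table full; increase MAXW")
        i = (i + 1) % MAXW        # slot taken by another word: try the next one
    words[i] = word
    counts[i] += 1
    return collisions


words = [None] * MAXW             # parallel array 1: the word in each used slot
counts = [0] * MAXW               # parallel array 2: its occurrence count

for w in "in the beginning god created the heaven and the earth".split():
    add_word(words, counts, w)

total = sum(counts)
unique = sum(1 for c in counts if c != 0)   # same test as the template's loop
print("total words =", total)     # 10
print("unique words =", unique)   # 8
```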

James J. Weinkam

Feb 13, 2005, 2:40:39 PM
David Frank wrote:
> FYI, I just posted the text below in comp.lang.fortran, topic "Word-processing
> challenge anyone?", and I expect to get at least 1 solution in reply.
>
> OTOH, despite the claims that PL/I has the superior string handling that's
> needed in a word-processing application, there won't be any solutions posted
> here in comp.lang.pl1, and it's for sure Weinkam can't produce a competitive
> Cobol solution.
>
I have never written a single line of COBOL code in my life and don't intend to
start now. You are confusing me with someone else.

William M. Klein

Feb 13, 2005, 3:33:20 PM
I certainly will NOT post a COBOL solution in the PL/I newsgroup - for a
question asked in the Fortran newsgroup.

As soon as someone provides a valid specification in the COBOL newsgroup, I
suspect there will be a number of solutions provided.

NOTE:
I won't even ask in the Fortran newsgroup how to provide all the functionality
of the "built-in" Report-Writer feature of COBOL - which among other things
provides:

- page headers and footers
- control break headings
- running totals
- "print" (or screen) output based on column and line specifications

all produced by the single COBOL "generate" statement.
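For readers outside COBOL, here is a rough Python sketch of the kind of output the Report Writer produces declaratively. It prints a heading at each control break, a subtotal when the break field changes, and a running total. This is hand-written logic under assumed inputs: (word, count) pairs, broken on the first letter. It is not a model of how GENERATE is implemented.

```python
from itertools import groupby


def control_break_report(pairs):
    """Build report lines from (word, count) pairs, breaking on first letter."""
    lines = []
    running = 0
    for letter, group in groupby(sorted(pairs), key=lambda p: p[0][0]):
        lines.append(f"--- {letter.upper()} ---")        # control-break heading
        subtotal = 0
        for word, count in group:
            subtotal += count
            running += count                              # running total
            lines.append(f"{word:<12}{count:>6}{running:>8}")
        lines.append(f"{'subtotal':<12}{subtotal:>6}")   # footing at each break
    lines.append(f"{'total':<12}{running:>6}")           # final footing
    return lines


for line in control_break_report([("god", 2), ("and", 3), ("ark", 1)]):
    print(line)
```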

--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...

William M. Klein

Feb 13, 2005, 5:38:47 PM

"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...
> FYI, I just posted the text below in comp.lang.fortran, topic "Word-processing
> challenge anyone?", and I expect to get at least 1 solution in reply.
>
> OTOH, despite the claims that PL/I has the superior string handling that's
> needed in a word-processing application, there won't be any solutions posted
> here in comp.lang.pl1, and it's for sure Weinkam can't produce a competitive
> Cobol solution.
>
> ------------- below posted in comp.lang.fortran ----------
> Want to give this a shot?
> Get the text file at: http://patriot.net/~bmcgin/kjvpage.html
> Using a text editor (Notepad), remove the text before "Book 01 Genesis" and
> the text after the last word in Revelations ("amen"), producing a bible.txt
> file containing 4,947,047 chars.
>

Are you counting CR, LF, and other record separators in your

4,947,047 chars

count? I am NOT counting those - and I get

4,719,042

characters in the file (when only dealing with characters IN each record)
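The two figures can be compared directly. Read the file in binary mode, so that no newline translation happens, and count bytes with and without CR/LF. Below is a small Python sketch. The helper names and the temporary sample file are illustrative only. On the real bible.txt, the result depends on which line endings the download used.

```python
import os
import tempfile


def count_chars(data):
    """Return (bytes including CR/LF, bytes excluding CR/LF)."""
    without_separators = data.replace(b"\r", b"").replace(b"\n", b"")
    return len(data), len(without_separators)


def char_counts(path):
    with open(path, "rb") as f:   # binary mode: line endings are not translated
        return count_chars(f.read())


# Hypothetical two-record sample with DOS-style CR/LF line endings.
sample = b"Book 01 Genesis\r\n001:001 In the beginning\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
try:
    print(char_counts(tmp.name))  # (43, 39): 2 records x 2 separator bytes
finally:
    os.remove(tmp.name)
```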

William M. Klein

Feb 13, 2005, 6:59:38 PM
More "specification" clarification needed.

1) Am I correct that you are ignoring (in your "words") the "00n : 00n" in
columns 1-7 (i.e. chapter and verse information)?

2) Do you want to count "1" in "1 Samuel" and "2" in "2 Samuel" (etc) as words?

3) Are you treating hyphenated words as a single word? E.g., is
"BRICK-KILN" one or two words in the line

"IRON AND MADE THEM PASS THROUGH THE BRICK-KILN AND THUS DID "

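Under the rule as stated, all punctuation becomes a blank, so the hyphen in BRICK-KILN is simply a separator and the line yields two words there. The replies below that matched the expected totals treated hyphens this way. Here is a quick Python check of that reading:

```python
import re

line = "IRON AND MADE THEM PASS THROUGH THE BRICK-KILN AND THUS DID "
# Challenge rules: delete apostrophes, blank every other non-letter, lowercase.
words = re.sub(r"[^A-Za-z]", " ", line.replace("'", "")).lower().split()
print(words)
print(len(words))  # 12: "brick" and "kiln" are counted separately
```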
--
Bill Klein
wmklein <at> ix.netcom.com

"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:XFQPd.1786116$B07.2...@news.easynews.com...

William M. Klein

Feb 13, 2005, 7:14:16 PM
OOPS - my mistake (on questions 1 and 2) - I just noticed the requirement to
convert numbers to spaces (before counting)

I still need to find out about hyphenated words.

--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:KRRPd.1791168$B07.2...@news.easynews.com...

William M. Klein

Feb 13, 2005, 10:32:29 PM
I have been able to reproduce SOME of the expected results of:

> total words = 789781
> unique words = 12691

My COBOL program does get 12691 for "unique" words (when I treat hyphenated
words as TWO separate words).

However, I am only getting
+789715

input words. I wonder what the original poster was counting that I am not.
(Possibly some of the CR/LF or other record delimiters?) (Or could they be counting
input words BEFORE eliminating numbers?)

For access to my actual source code, see (separate) post in comp.lang.cobol

--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:KRRPd.1791168$B07.2...@news.easynews.com...

ep...@juno.com

Feb 13, 2005, 10:36:07 PM
I considered hyphen as a word delimiter and got the same results as the
OP. (Sorry, but the program was not written in COBOL.) The problem here
seems to be apostrophe embedded in words. It's far from obvious to me
how to squeeze them out simply in COBOL.

ep...@juno.com

Feb 13, 2005, 10:40:20 PM
William M. Klein wrote:

> My COBOL program does get 12691 for "unique" words (when I treat
> hyphenated words as TWO separate words). However, I am only getting
> +789715 input words.

I have not seen your source code yet due to Google latency, but I
suspect that you are not counting the word 'book' that appears in the
first 8 columns. It appears 66 times. That added to your total gives
the correct number of total words.
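The arithmetic checks out. Suppose each of the 66 book header lines looks like "Book 01 Genesis", the layout named in the original challenge. Then skipping columns 1-8 drops exactly one word ("book") per book. Here is a small Python check under that assumed layout:

```python
import re


def tokens(text):
    # Challenge rules: delete apostrophes, blank other non-letters, lowercase.
    return re.sub(r"[^A-Za-z]", " ", text.replace("'", "")).lower().split()


header = "Book 01 Genesis"          # assumed header layout; "Book 01 " fills columns 1-8
whole_line = tokens(header)         # ['book', 'genesis']
after_col_8 = tokens(header[8:])    # columns 9 onward: ['genesis']
lost_per_book = len(whole_line) - len(after_col_8)

books = 66
print(789715 + books * lost_per_book)  # 789781, the expected total
```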

William M. Klein

Feb 13, 2005, 10:49:21 PM
Thanks for the correction. You are right.

I asked (earlier) about what to do with columns 1-8 - and I decided just to
ignore them (before seeing the rule about converting numbers to spaces).

My code is now posted (and could EASILY be changed if I really wanted to pick up
those "BOOK" in columns 1-8)

For those NOT in comp.lang.cobol, the source code (with LOTS of extraneous
debugging information) is at:
http://home.comcast.net/~wmklein/DOX/WORDCNT.txt

However, if you aren't reading this in comp.lang.cobol, please note that I
commented there on lots of "performance tuning" things that could be done - if
that was the goal. (Also it would be prettier - if using some of the extensions
available in many PC and Unix COBOL compilers)

For the person asking about "squeezing out" apostrophes, this is done with
INSPECT TALLYING (to figure out where they are) and then REFERENCE MODIFICATION
to move the data - to get rid of them.
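The same two-step idea can be sketched in Python for readers who don't use COBOL: first locate the apostrophes, then copy the pieces around them. The sketch only mirrors the shape of the INSPECT/reference-modification approach described above. It is not a translation of Klein's program. In Python itself, `field.replace("'", "")` does the job in one call.

```python
def squeeze_apostrophes(field):
    """Remove apostrophes by locating them, then copying the pieces between them."""
    # Step 1 (loosely like INSPECT ... TALLYING): find where the apostrophes are.
    positions = [i for i, ch in enumerate(field) if ch == "'"]
    # Step 2 (loosely like reference modification): copy the pieces around them.
    pieces, start = [], 0
    for pos in positions:
        pieces.append(field[start:pos])
        start = pos + 1
    pieces.append(field[start:])
    return "".join(pieces)


print(squeeze_apostrophes("wife's"))  # wifes
```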

--
Bill Klein
wmklein <at> ix.netcom.com

<ep...@juno.com> wrote in message
news:1108352420.7...@c13g2000cwb.googlegroups.com...

William M. Klein

Feb 13, 2005, 11:09:07 PM
Corrected COBOL code (to handle "BOOK" in columns 1-8) is now posted and
available (to anyone with a COBOL compiler <G> ... Sorry about that PL/I and
Fortran people) at:

http://home.comcast.net/~wmklein/DOX/WORDCNT.txt

For those who don't like "verbose" COBOL, there are LOTS of ways this could be
cut down. I coded this for "understandability" (for the average COBOL
programmer) - not for "terseness" - which could be done *IF* I saw that as a
desirable goal.

As stated in another post, this is "fully conforming" ANSI/ISO 1985 (not even
2002) source code with the single exception that it expects a "line sequential"
input file. (Removing the single word "line" from the Select/Assign clause
would make this valid source code for any "mainframe" or other environment using
"record sequential" text files.)

--
Bill Klein
wmklein <at> ix.netcom.com

"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:5dVPd.1804097$B07.2...@news.easynews.com...

Richard

Feb 13, 2005, 11:18:13 PM
> there on lots of "performance tuning" things that could be done - if
> that was the goal.

With these types of 'challenge' it is likely that the goal is simply
'my language is better than yours'. Responding even with a program
that is only fractionally slower or has marginally more lines or only
slightly confuses the setter (compared to the one that they wrote
themselves) simply proves (in the setter's mind) that theirs really is
better.

Unless you can blow them away with a 3 line program that runs 10 times
faster and could be understood by a 5 year old then there is probably
no point in feeding their ego.

glen herrmannsfeldt

Feb 14, 2005, 2:15:57 AM
Richard wrote:

Here is how I do it in AWK, including removing the text before
(but not including) the Genesis line. My first version also removed
that one, as I thought that was requested, and also the lines
at the end. It makes sense for programs to do all the work, instead
of only part of it.

The actual unique word identification is done by the statement

words[tolower($i)]++;

and counting unique words by the statement

for(i in words) w++;

Three statements to not count the beginning and end.

Until PL/I, Fortran or COBOL get associative arrays, it will take
a lot more code to do problems like this. It isn't hard to do in Java
as there is a Hashtable class in the standard class libraries.
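In Python, the built-in dict plays the same role as AWK's associative array. Below is a rough line-for-line sketch of the AWK program, including its /Genesis/ and /^End/ flag logic. The sample lines are invented for illustration and are not taken from the actual file.

```python
import re


def count_words(lines):
    """Return (total words, unique words), mirroring the AWK program."""
    words = {}                       # associative array: word -> count
    total = 0
    flag = False
    for line in lines:
        if "Genesis" in line:        # /Genesis/ { flag=1;}
            flag = True
        if line.startswith("End"):   # /^End/ { flag=0;}
            flag = False
        if not flag:                 # if(!flag) next;
            continue
        line = line.replace("'", "")                       # gsub("'","");
        fields = re.sub(r"[^A-Za-z]", " ", line).split()   # gsub("[^A-Za-z]"," ");
        total += len(fields)                               # t += NF;
        for f in fields:
            key = f.lower()
            words[key] = words.get(key, 0) + 1             # words[tolower($i)]++;
    return total, len(words)         # for(i in words) w++;


sample = [
    "Some preamble text",            # hypothetical lines, not the real file
    "Book 01 Genesis",
    "001:001 In the beginning God created the heaven and the earth.",
    "End of this sample",
]
print(count_words(sample))  # (12, 10)
```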

A few years ago I did one like this in Java, including printing out the
high frequency words on about 18GB of text. It ran for about a day,
which isn't much longer than it takes to read in the whole file.

BEGIN {start=systime();}
/Genesis/ { flag=1;}
/^End/ { flag=0;}
{
    if(!flag) next;
    gsub("'","");
    gsub("[^A-Za-z]"," ");
    t += NF;
    for(i=1;i<=NF;i++) words[tolower($i)]++;
}
END {
    for(i in words) w++;
    print "total words",t,"unique words",w,"in",systime()-start,"Sec";
}

total words 789781 unique words 12691 in 18 Sec on a 350MHz P-II.

-- glen

David Frank

Feb 14, 2005, 4:21:46 AM

"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:6eGdnYleAJG...@comcast.com...

Read the challenge:
you haven't loaded 2 arrays (words, counts) with individual word
identifiers and counts of word occurrences.


William M. Klein

Feb 14, 2005, 5:35:12 AM
"David Frank" <dave_...@hotmail.com> wrote in message
news:42106da7$0$38857$ec3e...@news.usenetmonster.com...

>
> "glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
> news:6eGdnYleAJG...@comcast.com...
>> Richard wrote:
<snip>

>
> Read the challenge:
> you haven't loaded 2 arrays (words, counts) with individual word
> identifiers and counts of word occurrences.

D.F.
Are you REALLY as dumb as you appear - or do you just try to appear dense?

Don't you understand that programming languages are designed to SOLVE
application problems, NOT to "implement" specific methodology or algorithms in
the internal code used to solve those problems?

The "awk" solution SOLVES the application problem (counting all and unique words
in the source file), and that was the point of the original specifications. The
fact that it did NOT need to (internally) create arrays has NOTHING to do with
"meeting the challenge" - as it specified a "logic" problem.

So, if you really can't understand the difference, then can you provide a
Fortran solution that provides the correct "output" withOUT creating any
internal arrays? Don't ask me why this would be important but if you think that
internal methodology is so important, then you need to provide a solution to
meet the (better?) "awk" way of working.

Tim Challenger

Feb 14, 2005, 4:25:55 AM
On 13 Feb 2005 20:18:13 -0800, Richard wrote:

>> there on lots of "performance tuning" things that could be done - if
> that was the goal.
>
> With these types of 'challenge' it is likely that the goal is simply
> 'my language is better than yours'. Responding even with a program
> that is only fractionally slower or has marginally more lines or only
> slightly confuses the setter (compared to the one that they wrote
> themselves) simply proves (in the setter's mind) that theirs really is
> better.

Ah, you've seen some of Fuckwit Frank's posts already then?

> Unless you can blow them away with a 3 line program that runs 10 times
> faster and could be understood by a 5 year old then there is probably
> no point in feeding their ego.

Even then it wouldn't work as he'd change the goalposts.

--
Tim C.

William M. Klein

Feb 14, 2005, 11:34:16 AM
Then, I guess, your challenge is about as useful as my asking you to produce the
count output using an UNSTRING statement (which is the statement used in many
COBOL solutions).

Any cross-language "challenge" is useful ONLY to the extent that it actually
challenges multiple programming languages to SOLVE a problem - not if they
pre-determine HOW the problem is to be solved.

Your challenge just shows (again) your inability to write an application
specification - and then to understand multiple solutions to the specification.

--
Bill Klein
wmklein <at> ix.netcom.com

"David Frank" <dave_...@hotmail.com> wrote in message
news:4210897c$0$39301$ec3e...@news.usenetmonster.com...


>
> "William M. Klein" <wmk...@nospam.netcom.com> wrote in message
> news:A9%Pd.2120565$f47.3...@news.easynews.com...

> It's not my fault if participants are unable to read the posted challenge's
> statement that 2 arrays are to be produced. In order to enforce this
> challenge, I am extending the outputs as shown in my Fortran solution/results
> at
>
> http://home.cfl.rr.com/davegemini/wp_bible.f90
>
> which, IMO, makes it clear that 2 arrays are to be produced, and that
> furthermore the results are to be written to a file.
>
> btw,
> This is the first time I have ever coded/tested a hash algorithm of my own design.
> I tried a couple found on the net, and didn't get anywhere near the minimum
> collisions my unique algorithm produces. Hopefully someone will post a MUCH
> better hash function that has very few collisions processing bible.txt
>
>


ep...@juno.com

Feb 14, 2005, 9:46:27 AM
> you haven't loaded 2 arrays (words, counts) with individual word
> identifiers and counts of word occurrences.

Actually the AWK program does just that except that it fills a hash
table with the WORDS as subscripts and the counts as DATA. But you can
think of them as two parallel arrays if you want!
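To make that concrete: start from any word-to-count table. Here it is a Python dict, standing in for AWK's associative array. The two parallel arrays the challenge asks for then fall out in two lines. The sample phrase below is made up. Sorting the words is an arbitrary choice that gives the arrays a stable order.

```python
from collections import Counter

table = Counter("the lord said unto the man".split())  # word -> count (hypothetical sample)

words = sorted(table)                # array 1: each distinct word once
counts = [table[w] for w in words]   # array 2: counts[i] belongs to words[i]

for w, c in zip(words, counts):
    print(w, c)
```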

glen herrmannsfeldt

Feb 14, 2005, 10:56:29 AM
David Frank wrote:
(snip)

> Read the challenge:
> you haven't loaded 2 arrays (words, counts) with individual word
> identifiers and counts of word occurrences.

Why waste two arrays when one will do?

The array stores the counts, the subscript is the word.

-- glen

David Frank

Feb 14, 2005, 6:20:32 AM

"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:A9%Pd.2120565$f47.3...@news.easynews.com...

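It's not my fault if participants are unable to read the posted challenge's
statement that 2 arrays are to be produced. In order to enforce this
challenge, I am extending the outputs as shown in my Fortran solution/results
at

http://home.cfl.rr.com/davegemini/wp_bible.f90

which, IMO, makes it clear that 2 arrays are to be produced, and that
furthermore the results are to be written to a file.

btw,
This is the first time I have ever coded/tested a hash algorithm of my own design.
I tried a couple found on the net, and didn't get anywhere near the minimum
collisions my unique algorithm produces. Hopefully someone will post a MUCH
better hash function that has very few collisions processing bible.txt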

Mark Yudkin

Feb 15, 2005, 1:40:55 AM
Clearly, since Fortran doesn't have associative arrays, you are incapable of
understanding them, and thus continue to post your usual crap.
And, no I am not going to post a PL/I program. I have better things to do
than respond to non-challenges to write silly little programs, whose
originator isn't even capable of understanding a solution that beats shit
out of his.

"David Frank" <dave_...@hotmail.com> wrote in message
news:42106da7$0$38857$ec3e...@news.usenetmonster.com...


>
> "glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
> news:6eGdnYleAJG...@comcast.com...

Mark Yudkin

Feb 15, 2005, 1:42:26 AM
Fuckwit Frank already has changed the subgoals. He now wants *two* arrays.

"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108372881.c2a8b0e0945b15b9ef1c989af1351ca4@teranews...

David Frank

Feb 15, 2005, 3:26:08 AM

"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...

>
> The challenge is to process the bible.txt file into a words array and count
> unique word occurrences in a counts array.
> (This challenge was initiated by LR, whose C++ source/results will be posted
> later, along with my own source/results.)
>

Here is my Fortran source/results as promised:

http://home.cfl.rr.com/davegemini/wp_bible.f90

I won't post LR's lengthy and, IMO, unreadable final solution, but will leave
it to him to post.
Believe me, it's a p.o.s. ("pile of scary syntax").


David Frank

Feb 15, 2005, 3:47:56 AM

"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:cq4Qd.1866806$B07.2...@news.easynews.com...

> Then, I guess, your challenge is about as useful as my asking you to
> produce the count output using an UNSTRING statement (which is the
> statement used in many COBOL solutions).
>
> Any cross-language "challenge" is useful ONLY to the extent that it
> actually challenges multiple programming languages to SOLVE a problem -
> not if they pre-determine HOW the problem is to be solved.
>
> Your challenge just shows (again) your inability to write an application
> specification - and then to understand multiple solutions to the
> specification.
>
> --
> Bill Klein
> wmklein <at> ix.netcom.com

Sooo, does your FINAL? solution posted in YOUR topic write its results to a
words.cnt file, as I requested in response to your re-hashed spec criticisms
above?


Tim Challenger

Feb 15, 2005, 3:53:33 AM
On Tue, 15 Feb 2005 03:47:56 -0500, David Frank wrote:

> Sooo, does your FINAL? solution posted in YOUR topic write its results to a
> words.cnt file, as I requested in response to your re-hashed spec criticisms
> above?

That's YOUR addition, it's not specified in the original.

--
Tim C.

David Frank

Feb 15, 2005, 4:15:12 AM

"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108457337.7276c74e3a8f715b412a0fb4f3c1ebf0@teranews...

Well, yes, it is my addition, which he has since responded to with a Cobol
version that doesn't appear to write results to a file. Instead he continues
to criticize "the spec", and I presume he will continue to do so.

Writing results allows awk, perl, etc. to "do it their way without 2 arrays"
and still prove they have counted occurrences of unique words. Plus it's
obviously the right spec extension to get something useful at no extra
charge.

Where are the Awk, Perl solutions that write results to file?
Where is LR's C++ solution? (He initially challenged me to write this
Fortran program)..
Where is your or anyone's PL/I solution?

Tim Challenger

Feb 15, 2005, 4:22:12 AM
On Tue, 15 Feb 2005 04:15:12 -0500, David Frank wrote:

> "Tim Challenger" <tim.cha...@aon.at> wrote in message
> news:1108457337.7276c74e3a8f715b412a0fb4f3c1ebf0@teranews...
>> On Tue, 15 Feb 2005 03:47:56 -0500, David Frank wrote:
>>
>>> Sooo, does your FINAL? solution posted in YOUR topic write its results to a
>>> words.cnt file, as I requested in response to your re-hashed spec criticisms
>>> above?
>>
>> That's YOUR addition, it's not specified in the original.
>>
>> --
>> Tim C.
>
> Well, yes, it is my addition, which he has since responded to with a Cobol
> version that doesn't appear to write results to a file. Instead he continues
> to criticize "the spec", and I presume he will continue to do so.

Whereas you keep changing it. So that's fair isn't it?

--
Tim C.

William M. Klein

Feb 15, 2005, 4:22:19 AM
No, it displays the results to the screen.

Do you have problems reading a screen? That *might* explain many of your posts.

P.S. Of course any COBOL programmer could add a "write" statement instead of a
"display" statement - or, on systems that support such things, "redirect" the
DISPLAY output to a file (called anything that the O/S supports).

--
Bill Klein
wmklein <at> ix.netcom.com

"David Frank" <dave_...@hotmail.com> wrote in message
news:4211b73e$0$38884$ec3e...@news.usenetmonster.com...

William M. Klein

Feb 15, 2005, 4:25:15 AM
My first solution (still available on the web) DID create an output file with
the actual words. This must be useful to someone to have after running the
program - but it is sure hard to figure out WHY someone would need it.

--
Bill Klein
wmklein <at> ix.netcom.com

"David Frank" <dave_...@hotmail.com> wrote in message
news:4211bda2$0$39274$ec3e...@news.usenetmonster.com...

Sven Axelsson

Feb 15, 2005, 5:37:46 AM
On Tue, 15 Feb 2005 04:15:12 -0500, David Frank wrote:

(snip)

> Writing results allows awk, perl, etc. to "do it their way without 2 arrays"
> and still prove they have counted occurrences of unique words. Plus it's
> obviously the right spec extension to get something useful at no extra
> charge.
>
> Where are the Awk, Perl solutions that write results to file?
> Where is LR's C++ solution? (He initially challenged me to write this
> Fortran program)..
> Where is your or anyone's PL/I solution?

Sorry for feeding the troll, but this is too much fun to pass up. Frank, do
you really think outputting the words and counts is "difficult" in any way?

I could do it with a single statement in both Perl and Python, but I'll use
a loop to make it easier to understand. Here are amended versions.

Perl:
--------
use Time::HiRes qw(gettimeofday tv_interval);

$totwords = 0;
%words = ();
$start = [gettimeofday];

open FILE, "<bible12.txt";
while (<FILE>) {
    s/'//g;
    foreach $word (split(/\W+/, lc)) {
        if ($word =~ /^[a-z]+$/) {
            $totwords++;
            $words{$word}++;
        }
    }
}

open OUTFILE, ">words.cnt";
foreach $word (sort keys %words) {
    print OUTFILE "$word $words{$word}\n"
}

printf("total words : %d\n", $totwords);
printf("unique words : %d\n", scalar(keys %words));
printf("'god' : %s\n", $words{"god"});
printf("collisions : Who cares?\n");
printf("time : %f s.\n", tv_interval($start));
print("1.5 GHz P4 Mobile");
--------


total words : 789781
unique words : 12691

'god' : 4446
collisions : Who cares?
time : 2.987000 s.
1.5 GHz P4 Mobile
--------

Python:
--------
import re, time

words = {}
line_split = re.compile("\W+")
word_test = re.compile("^[a-z]+$")
start = time.clock()

for line in open("bible12.txt").xreadlines():
    for word in line_split.split(line.replace("'", "").lower()):
        if word_test.match(word):
            try: words[word] += 1
            except KeyError: words[word] = 1

sorted_keys = words.keys()
sorted_keys.sort()
outfile = open("words.cnt", "w+")
for word in sorted_keys:
    print >> outfile, word, words[word]

print "total words : %d" % sum(words.values())
print "unique words : %d" % len(words)
print "'god' : %d" % words["god"]
print "time : %f s." % (time.clock() - start)
print "1.5 GHz P4 Mobile"
--------


total words : 789781
unique words : 12691

'god' : 4446
time : 4.546000 s.
1.5 GHz P4 Mobile

/s. axelsson

David Frank

Feb 15, 2005, 5:52:01 AM

"Sven Axelsson" <sve...@SPAMmurrays.HEREnu> wrote in message
news:1108463857.aba1f57af0980c0ed16d00b05027d96c@teranews...

OK, finally ...
Those are impressive results especially for Perl. (did you suspect it would
out-perform Python?)

FWIW, I modified my Fortran to use 1 array via a type declaration. I would
have done it initially, but I thought there would be a significant time
penalty; it turns out there isn't (it adds .02 sec to runtime).
So I have replaced my previous source/results with the 1-array version.

http://home.cfl.rr.com/davegemini/wp_bible.f90
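The "1 array via a type declaration" layout is an array of records, each holding a word and its count. Frank's actual Fortran is only at the URL above and is not reproduced here. As an assumption-laden Python analogue, a dataclass can stand in for the derived type. Python's built-in hash() and linear probing are placeholder choices:

```python
from dataclasses import dataclass


@dataclass
class WordCount:          # stands in for a derived type with word and count components
    word: str = ""
    count: int = 0


def add_word(table, word):
    """Insert or bump `word` in an open-addressing table of WordCount records."""
    i = hash(word) % len(table)       # placeholder hash, not Frank's function
    for _ in range(len(table)):
        slot = table[i]
        if slot.count == 0 or slot.word == word:
            slot.word = word
            slot.count += 1
            return
        i = (i + 1) % len(table)      # collision: try the next slot
    raise RuntimeError("table full; use a larger table")


table = [WordCount() for _ in range(64)]   # small size for the demo
for w in "and the earth was without form and void".split():
    add_word(table, w)

print("total words =", sum(s.count for s in table))         # 8
print("unique words =", sum(1 for s in table if s.count))   # 7
```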

Sven Axelsson

Feb 15, 2005, 7:09:43 AM
On Tue, 15 Feb 2005 12:05:24 +0100, Sven Axelsson wrote:

> On Tue, 15 Feb 2005 05:52:01 -0500, David Frank wrote:
>
>> OK, finally ...
>> Those are impressive results especially for Perl. (did you suspect it would
>> out-perform Python?)
>

> Yes. Since Perl's regular expression handling is highly optimized and more
> integrated with the language than in Python. I actually thought the
> difference would be even bigger.
>
> /s. axelsson

And just to see how the algorithm chosen affects the languages
differently:

Perl (change main loop):
--------
while (<FILE>) {
    $_ = lc;
    s/'//g;
    s/[^a-z]+/ /g;
    foreach $word (split) {
        $totwords++;
        $words{$word}++;
    }
}
--------
time : 2.755350 s.

Python (change main loop):
--------
trim_nws = re.compile("[^a-z]+")

for line in open("bible12.txt").xreadlines():
    for word in re.sub(trim_nws, " ",
                       line.lower().replace("'", "")).split():
        try: words[word] += 1
        except KeyError: words[word] = 1

--------
time : 3.523639 s.

So, cutting down on the use of regular expressions gains more for Python
than for Perl.
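Taking that further, the regular expressions can be dropped entirely. The Python 3 sketch below uses a str.translate table to delete apostrophes and blank every other non-letter in a single pass. It is not one of the timed versions above, and no timing claim is attached to it. The table covers only code points 0-255, so it assumes ASCII/Latin-1 text.

```python
import string
from collections import Counter

# Translation table: apostrophe -> deleted, any other non-lowercase char -> blank.
# Built for code points 0-255 only (assumes ASCII/Latin-1 input).
TABLE = {
    c: (None if chr(c) == "'" else " ")
    for c in range(256)
    if chr(c) not in string.ascii_lowercase
}


def count_words(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.lower().translate(TABLE).split())
    return counts


counts = count_words(["The man's WIFE, and 2 sons.", "the END"])  # made-up lines
print(sum(counts.values()), len(counts))  # 7 6
```

Passing an open file object works the same way, since files iterate line by line.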

/s. axelsson

Sven Axelsson

Feb 15, 2005, 6:05:24 AM
On Tue, 15 Feb 2005 05:52:01 -0500, David Frank wrote:

> OK, finally ...
> Those are impressive results especially for Perl. (did you suspect it would
> out-perform Python?)

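Yes. Since Perl's regular expression handling is highly optimized and more
integrated with the language than in Python. I actually thought the
difference would be even bigger.

/s. axelsson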

robin

Feb 15, 2005, 7:43:40 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Mon, 14 Feb 2005 06:20:32 -0500

| It's not my fault if participants are unable to read the posted challenge's
| statement that 2 arrays are to be produced. In order to enforce this
| challenge, I am extending the outputs as shown in my Fortran
| solution/results at
| http://home.cfl.rr.com/davegemini/wp_bible.f90
| which, IMO, makes it clear that 2 arrays are to be produced, and that
| furthermore the results are to be written to a file.

.
Once again, you are changing the specs AFTER the fake "challenge".
(Is this a fishing expedition?)


.
| btw,
| This is the first time I have ever coded/tested a hash algorithm of my own
| design. I tried a couple found on the net, and didn't get anywhere near the
| minimum collisions my unique algorithm produces. Hopefully someone will
| post a MUCH better hash function that has very few collisions processing
| bible.txt

.
And now, yet another change - hashing.

LR

Feb 15, 2005, 6:21:42 PM
David Frank wrote:

> Where is LR's C++ solution? (He initially challenged me to write this
> Fortran program)..

You're kidding about that right? I sent you my original code. You never
managed to duplicate what my original C++ does.

LR

glen herrmannsfeldt

Feb 15, 2005, 10:48:42 PM
William M. Klein wrote:

> My first solution (still available on the web) DID create an output file with
> the actual words. This must be useful to someone to have after running the
> program - but it is sure hard to figure out WHY someone would need it.

I did almost this program for a real problem not so long ago.
I was working on text searching, and needed some statistics on the
text we were using. I sorted it and printed out the highest frequency
words, not the whole list.

Mine had many more words, as I had about 18GB of text.

I also did a Zipf's law test with the results, which would make it
a little more applicable to a scientific programming language.

OK, add that to the requirements. Fit the data to one form of
Zipf's law and print out the results of the fit.
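One common form of Zipf's law says that the r-th most frequent word has a frequency of roughly f(r) = C / r^s, with s near 1. Taking logs gives log f = log C - s log r, which is a straight line. An ordinary least-squares fit on (log rank, log frequency) therefore estimates C and s. Below is a minimal Python sketch of that fit. It assumes strictly positive counts, such as the values of any word table built in this thread. The synthetic data is constructed to follow the law exactly, so the fit should recover it.

```python
import math


def fit_zipf(counts):
    """Least-squares fit of log f = log C - s * log r.

    `counts` is any collection of positive word frequencies; returns (C, s).
    """
    freqs = sorted(counts, reverse=True)          # rank 1 = most frequent
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a line")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]   # log rank
    ys = [math.log(f) for f in freqs]                       # log frequency
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # this is -s
    intercept = mean_y - slope * mean_x   # this is log C
    return math.exp(intercept), -slope


synthetic = [1000 / r for r in range(1, 51)]   # exact Zipf data: C = 1000, s = 1
C, s = fit_zipf(synthetic)
print(f"C = {C:.1f}, s = {s:.3f}")  # C = 1000.0, s = 1.000
```

Calling `fit_zipf(table.values())` on a real word-count table would give the fit for the actual text. No fitted values for bible.txt are claimed here.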

-- glen

William M. Klein

Feb 16, 2005, 12:28:23 AM
It's always nice to admit my ignorance (as opposed to one other who posts in
this forum - claiming he "knows all").

I certainly didn't (don't) know what "Zipf's law" is. I looked on the web and
read a couple of hits - and can honestly say, that I (personally) couldn't write
a program for it in ANY programming language - as I still have no idea "how it
really works".

P.S. I think my reply to the original AWK solution got lost (and/or I posted it
to the wrong place).

I am QUITE happy to say that awk, perl, rexx, python, etc are ALL more likely to
be the "right tool for the right job" in a "text" processing application.
(COBOL, historically, was targeted at a "business data processing" environment.)
Given a detailed spec of an application requirement, I can often/usually find a
COBOL solution (unless it is highly scientific - and even that is now easier
with some of the intrinsic functions added in '89 and '02 to COBOL). However, I
certainly do NOT claim that COBOL is always the BEST language to meet a specific
application task. Providing a COBOL solution to a (well specified) "application
specification" to prove it CAN be done (in a "reasonably well performing" - and
portable way) can be fun, but it certainly doesn't prove to me which language is
"better" than others.

--
Bill Klein
wmklein <at> ix.netcom.com

"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:apudnVk2Gcs...@comcast.com...

David Frank

Feb 16, 2005, 6:03:58 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:08mQd.162333$K7.1...@news-server.bigpond.net.au...

My 1st solution posted here used hashing; there has been no change.
OTOH, you have proven over and over that when you can translate a challenge,
YOU DO!!
When you can't, YOU BLATHER!!

You don't have to use hashing, IF you have a better/faster way, but YOU
DON'T!!


David Frank

Feb 16, 2005, 6:35:49 AM

"LR" <lr...@superlink.net> wrote in message
news:auvQd.1112$Sd.2...@newshog.newsread.com...

Others here have stated over and over that the result is what counts <pun>
not the methodology used.
However, rest assured they won't berate you for bad-mouthing me about not
using your map algorithm,


Re: some recent runtimes for your map version vs. your version of my hash
method.

I think you are getting your runtimes mixed up. The last map time was 3.2
sec?
And now my algorithm coded in C++ ran on your PC in 1.7 sec?

I ran your 123.exe and it's a remarkable 583 = 0.58 sec.
How can you not be impressed with an algorithm that's 1/2 (?) the runtime of
your map version?
PLUS, the stuff you bad-mouthed my Fortran for doesn't apply to your C++
ability to adapt this algorithm for dynamic allocation (and I see in your
123.cpp you already have, to a great degree).

You are in a dilemma: on one hand, you have MY algorithm coded that runs in
1/2 the time of your MAP version.
Which of these are you going to post as YOUR C++ solution?

I have uploaded a revised http://home.cfl.rr.com/davegemini/wp_bible.f90

that ALSO takes the file name from the command line as per your change, outputs
counts to the screen (which can be redirected to a file), and runs in 2.14 sec.


LR

Feb 16, 2005, 10:50:46 AM
David Frank wrote:
> "LR" <lr...@superlink.net> wrote in message
> news:auvQd.1112$Sd.2...@newshog.newsread.com...
>
>>David Frank wrote:
>>
>>
>>
>>
>>>Where is LR's C++ solution? (He initially challenged me to write this
>>>Fortran program)..
>>
>>You're kidding about that right? I sent you my original code. You never
>>managed to duplicate what my original C++ does.
>>
>>LR
>
>
> Others here have stated over and over that the result is what counts <pun>
> not the methodology used.
> However rest assured they wont berate you for bad-mouthing me about not
> using your map algorithm,


I'm not bad mouthing you. I'm stating a simple fact. And the
requirement wasn't to use the map algorithm, although, you would have to
use something very much like it to match the functionality. OTOH,
std::string is likely to be causing you problems for a while.


>
>
> Re: some recent runtimes for your map version vs. your version of my hash
> method.
>
> I think you are getting your runtimes mixed up. The last map time was 3.2
> sec?
> and now my algorithm coded in c++ ran on your pc in 1.7 sec ?
>
> I ran your 123.exe and its a remarkable 583 = 0.58 sec
> How can you not be impressed with a algorithm thats 1/2 ? the runtime of
> your map version

Who said I wasn't impressed? But speed isn't the be-all and end-all of
programming. At least not for me. Besides which, that speed increase
doesn't come for free. Will the effort to get it pay for itself over
the lifetime of the code? I doubt it. We aren't discussing weather
forecasting here.

Sure, lots of fun for the sake of fun, but I suspect that any
programmer who spent as much time as I did on this at work would be
getting a bad review. (Not to mention that IMO the final C++ code I
wrote that more or less duplicates what you wrote is not very pretty and
somewhat less general.)


> PLUS the stuff you bad-mouthed my Fortran for doesnt apply to your C++
> ability to adapt this algorithm for dynamic allocation (and I see in your
> 123.cpp you already have to a great degree)

Not quite sure what you mean by this.


>
> You are in a dilemma, on one hand you have MY algorithm coded thats runs in
> 1/2 the time of your MAP version.
> which of these are you going to post as YOUR C++ solution?

If I were to post something that off topic, I'd post the original
std::map<std::string,int> thing, because it is so much more readable (by
any halfway decent c++ programmer) and so much more maintainable. There
are certainly times when speed is important. But this probably isn't
one of them. I read somewhere that programs spend around 90%, or was
that 99%, of their life cycle in either maintenance or upgrade. Given
that, in most cases, I'd gladly trade a little speed for readability.
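For readers who haven't seen a std::map version, here is a rough, hypothetical sketch of the kind of program LR describes; it is not LR's program, which was never posted to the group. The normalization rules (letters lower-cased, an apostrophe between letters dropped so wife's becomes wifes, every other character treated as a blank) are taken from the challenge text, and the function name count_words is made up.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical std::map-based word counter (not LR's actual code).
// Assumed rules: letters are lower-cased, an apostrophe followed by a letter
// is dropped ("wife's" -> "wifes"), any other character ends the word.
std::map<std::string, int> count_words(std::istream& in) {
    std::map<std::string, int> counts;  // keys are kept in sorted order
    std::string word;
    char c;
    while (in.get(c)) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalpha(uc)) {
            word += static_cast<char>(std::tolower(uc));
            continue;
        }
        if (c == '\'' && !word.empty()) {
            const int next = in.peek();  // EOF or a value in 0..255
            if (next != std::istream::traits_type::eof() && std::isalpha(next))
                continue;  // apostrophe between letters: drop it
        }
        if (!word.empty()) {  // any other character ends the current word
            ++counts[word];
            word.clear();
        }
    }
    if (!word.empty()) ++counts[word];  // last word if the file has no trailing blank
    return counts;
}
```

Because std::map keeps its keys ordered, iterating over the result already gives the alphabetical listing, so no separate sort step is needed.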


>
> I have uploaded a revised http://home.cfl.rr.com/davegemini/wp_bible.f90
>
> that ALSO uses command line file name as per your change, and outputs
> counts to screen (can be
> re-directed to file) and runs in 2.14 sec


The important thing, is that you _never_ managed to duplicate the
functionality that exists in the original C++ program I sent you. Can
you do it? Or is this a FORTRAN CAN'T?

LR

David Frank

Feb 16, 2005, 1:52:21 PM

"LR" <lr...@superlink.net> wrote in message
news:qZJQd.2440$h06.4...@monger.newsread.com...

>
> The important thing, is that you _never_ managed to duplicate the
> functionality that exists in the original C++ program I sent you. Can you
> do it? Or is this a FORTRAN CAN'T?
>
> LR

There you go again, my program processes/outputs what your program
processes/outputs..
If you want to post a link to a MUCH larger text file than bible.txt, be my
guest, but
I already have a solution that handles MUCH larger text files than bible.txt
waiting in the wings.


David Frank

Feb 16, 2005, 2:34:37 PM

"LR" <lr...@superlink.net> wrote in message
news:qZJQd.2440$h06.4...@monger.newsread.com...

> David Frank wrote:
>
>> PLUS the stuff you bad-mouthed my Fortran for doesnt apply to your C++
>> ability to adapt this algorithm for dynamic allocation (and I see in your
>> 123.cpp you already have to a great degree)
>
> Not quite sure what you mean by this.
>

Sure you do, I declare my strings with fixed length = 30, you dont.


LR

Feb 16, 2005, 2:58:39 PM
David Frank wrote:

> "LR" <lr...@superlink.net> wrote in message
> news:qZJQd.2440$h06.4...@monger.newsread.com...
>
>
>>The important thing, is that you _never_ managed to duplicate the
>>functionality that exists in the original C++ program I sent you. Can you
>>do it? Or is this a FORTRAN CAN'T?
>>
>>LR
>
>
> There you go again, my program processes/outputs what your program
> processes/outputs..

As far as I have seen you never produced a program that did what the
original C++ I sent you did, including writing out the results in sorted
order. I think that if you attempt to do this, you'd have to include
whatever sorting you need to do in your timings.


> If you want to post a link to a MUCH larger text file than bible.txt, be my
> guest, but

No. I don't need to do that, here's one line of code from
http://home.cfl.rr.com/davegemini/wp_bible.f90

character(30) :: fname, word, ch*1

If the filename I pass on the argument line is longer than 30 characters
your code won't work.

If any word in the text file is longer than 30 characters your code
won't work. Say, for example, supercalifragilisticexpialidocious.

But ok, make 'word' larger, much larger, so what? You'll still have to
make it a finite size. Make it really long, say 65k, and you'll
probably be out of memory while you're loading. 65k*65k equals what?

Not to mention what happens if you have more entries in your table than 65k.

Of course, at some point std::string won't be able to get bigger either.
But at least I don't have to have every string in my table reserve
the same amount of memory. And there are mechanisms in C++ to make
std::map read/write the disk. Although I understand that's a lot of work.
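LR's "65k*65k" question is simple arithmetic. Assuming one byte per character and a table of 65,536 fixed-width entries (the maxw from the original challenge template), the reserved space is entries times width. The helper below is only an illustration, not code from any posted solution.

```cpp
#include <cassert>
#include <cstdint>

// Bytes reserved by a table of `entries` fixed-width character strings,
// assuming one byte per character and no padding between entries.
constexpr std::uint64_t fixed_table_bytes(std::uint64_t entries, std::uint64_t width) {
    return entries * width;
}

static_assert(fixed_table_bytes(65536, 24) == 1572864, "1.5 MiB for width 24");
static_assert(fixed_table_bytes(65536, 30) == 1966080, "about 1.9 MiB for width 30");
static_assert(fixed_table_bytes(65536, 65536) == 4294967296ULL, "4 GiB for width 65536");
```

A std::string table, by contrast, pays roughly per word for the characters actually stored (plus per-object overhead), which is LR's point about not reserving the same amount for every entry.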


> I already have a solution that handles MUCH larger text files than bible.txt
> waiting in the wings.

I'm not sure I understand why your basic algorithm needs to be rewritten
because of file size. I do understand why your code needs to be
rewritten to handle a) larger words, b) files with more than 'maxw' words.

And the best part? I'll still have std::map to do other things with if
I need it. I'm still particularly fond of
std::map<std::set<int>,std::set<int> > for some problems I face in the
real world. Your hash is really rather specific. Maintenance costs money.

LR

William M. Klein

Feb 16, 2005, 3:29:42 PM
LR,
Have you posted (or could you provide) the actual "specs" for your original
program? All I have ever seen is the version that DF originally posted in
comp.lang.pl1. (I never even saw what he wanted in his "word.cnt" file.)

I have a version that can create a "sorted and unique output" file and that can
handle arbitrary size input files. However, even this program has an
"arbitrary" limit on each text line (a maximum; not that each line needs to be
that size) as well as an arbitrary limit on word size. (COBOL is *very* poor at
handling strings of potentially infinite-minus-one sizes.) It also (currently)
has a limit on the maximum number of words per line, but this could be
fixed more easily.

The COBOL code (does) provide "user friendly" detection and reporting of when
the limits are exceeded, but I would still like to know what the original spec
was.

LR

Feb 16, 2005, 7:18:13 PM
David Frank wrote:

Ok, I think I understand now. You're saying that std::string is a
FORTRAN CAN'T.

LR
>
>

glen herrmannsfeldt

Feb 16, 2005, 10:40:53 PM
William M. Klein wrote:
> It's always nice to admit my ignorance (as opposed to one other who posts in
> this forum - claiming he "knows all").
>
> I certainly didn't (don't) know what "Zipf's law" is. I looked on the web and
> read a couple of hits - and can honestly say, that I (personally) couldn't write
> a program for it in ANY programming language - as I still have no idea "how it
> really works".

In the original Zipf's law, if you take a ranked list of almost anything
where the counts are large enough to have a statistical distribution
(populations of cities, states, or countries, or word frequencies in
a document), the count is proportional to 1/n, where n is the
rank.

In an improved Zipf's law the frequency is proportional to n**-a,
for a near 1. Using this, then, one can do a one variable
least squares fit to find a.
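Glen's fit can be sketched as ordinary least squares on logarithms: if count ≈ C·n^(-a), then log(count) = log C - a·log(n), a straight line in log(rank) whose slope is -a. This sketch assumes the counts are already sorted in descending order (rank 1 first) and fits the intercept log C as well; the function name zipf_exponent is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the Zipf exponent a from counts sorted in descending order by
// least-squares fitting log(count) = b - a*log(rank), with rank = 1, 2, ...
double zipf_exponent(const std::vector<double>& counts) {
    const std::size_t m = counts.size();
    assert(m >= 2);
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < m; ++i) {
        const double x = std::log(static_cast<double>(i + 1));  // log(rank)
        const double y = std::log(counts[i]);                   // log(count)
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    const double n = static_cast<double>(m);
    const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    return -slope;  // count ~ rank^(-a), so a is minus the fitted slope
}
```

Feeding it the sorted counts column from a word-count run would give an estimate of a for that text; a value near 1 is what the classic law predicts.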

-- glen

William M. Klein

Feb 17, 2005, 12:21:32 AM
Thanks for the explanation.

--
Bill Klein
wmklein <at> ix.netcom.com
"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message

news:eKCdnW4mcbW...@comcast.com...

David Frank

Feb 17, 2005, 2:42:13 AM

"LR" <lr...@superlink.net> wrote in message
news:9pRQd.2458$h06.4...@monger.newsread.com...

Dynamic string declaration/assignment is in the new Fortran standard as I
have informed you several times.
You obviously have a mental block against remembering this info.

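character(:),allocatable :: string   ! an allocatable can't be initialized in its declaration
string = 'the lazy dog'              ! allocated to length 12 on assignment
write (*,*) len(string) ! = 12
string = 'hello'                     ! reallocated to length 5
write (*,*) len(string) ! = 5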


David Frank

Feb 17, 2005, 3:36:51 AM

"David Frank" <dave_...@hotmail.com> wrote in message
news:4211b222$0$38903$ec3e...@news.usenetmonster.com...
>
>
> Here is my Fortran source/results as promised:
>
> http://home.cfl.rr.com/davegemini/wp_bible.f90
>

Above currently runs in 2.14 sec
Using dynamic array for file, I have reduced my runtime to 1.16 sec
which is a VERY competitive result ( its faster than LR's map exe he sent
me).
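Reading the whole file into one memory buffer instead of record by record is a common way to cut I/O overhead in a benchmark like this. A minimal C++ sketch of the idea, assuming the file fits in memory (bible.txt is about 4.9 MB); it is an illustration, not a translation of wc_file.f90, and the name slurp is made up.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read an entire file into a std::string in one go. Binary mode avoids
// newline translation. Throws std::runtime_error if the file can't be opened.
std::string slurp(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The scanning loop then works on the in-memory string, so the per-line read calls disappear from the timed section.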

I changed the source file name to reflect its ability to process
commandline specified file.

http://home.cfl.rr.com/davegemini/wc_file.f90

Tim Challenger

Feb 17, 2005, 3:52:20 AM

Is that the law first devised by a couple of physicists after an evening
drinking too much Zipfer beer in Austria?

--
Tim C.

Tim Challenger

Feb 17, 2005, 4:01:54 AM
On Wed, 16 Feb 2005 14:34:37 -0500, David Frank wrote:

>>> PLUS the stuff you bad-mouthed my Fortran for doesnt apply to your C++
>>> ability to adapt this algorithm for dynamic allocation (and I see in your
>>> 123.cpp you already have to a great degree)
>>
>> Not quite sure what you mean by this.
>>
>
> Sure you do, I declare my strings with fixed length = 30, you dont.

Clear as mud.

--
Tim C.

Tim Challenger

Feb 17, 2005, 4:05:32 AM

This is, of course, your 3rd or 4th iteration of this program, after LR's
quickly knocked out version. I would expect it to get faster, as that seems
to be your aim here.

--
Tim C.

Tim Challenger

unread,
Feb 17, 2005, 4:03:30 AM2/17/05
to

Oh you mean like VARYING strings, that PLI has had since I can remember?
Wow, I'm impressed. Not.

--
Tim C.

Tim Challenger

Feb 17, 2005, 3:59:13 AM
>> The important thing, is that you _never_ managed to duplicate the
>> functionality that exists in the original C++ program I sent you. Can you
>> do it? Or is this a FORTRAN CAN'T?
>>
>> LR
>
> There you go again, my program processes/outputs what your program
> processes/outputs..

It seems the tables have turned. LR is giving you what you always "demanded"
of us and now you're twisting and turning like a twisty, turny thing.

--
Tim C.

LR

Feb 17, 2005, 6:52:21 AM
David Frank wrote:

Ok, so this is a Fortran Almost, or a Fortran Will Be When Implemented,
or a Fortran Will Someday Have The Features That C++ Has Right Now!

Yet, ALLOCATE, POINTER and TYPE are available right now. Why didn't you
use them to implement analogs for std::string and std::map? Too hard?
Too ugly? Too unmaintainable?


LR

David Frank

Feb 17, 2005, 7:04:30 AM

"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108630854.1c44d7007d17fc4a9cf617c5680d32c1@teranews...

Its one of three aims or claims that I made that LR and I have been
bandying about.
I said Fortran produced more readable, briefer, faster source than C++
Thats when he initiated this word count challenge to me via email.

The facts so far (and you have no way of deciding for yourself as LR hasnt
PUBLICLY posted)
are: My solution IS more readable, briefer, faster.

Tim Challenger

unread,
Feb 17, 2005, 7:33:08 AM2/17/05
to

Of course I can only see what's been published, so I maintain you've had
more goes at it. I'd expect yours to be faster and shorter at least at this
stage.

Readability is debatable and a matter of personal style to a great degree.
So a statement that something *is* more readable can be little more than an
opinion. I often find that the two words "short" and "readable" tend to
work against each other and it's a good programmer that can combine both.
I've seen precious little evidence to suggest you are one.

--
Tim C.

David Frank

Feb 17, 2005, 7:57:51 AM

"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108643309.64e595bae371cb9e256d0ced219ec2a5@teranews...

> On Thu, 17 Feb 2005 07:04:30 -0500, David Frank wrote:
>
>> My solution IS more readable, briefer, faster.
>
> Of course I can only see what's been published, so I maintain you've had
> more goes at it.

Its more like a toss-up, LR has sent me several versions, each time running
faster than the previous.

> I'd expect yours to be faster and shorter at least at this
> stage.
>
> Readability is debatable and a matter of personal style to a great degree.
> So a statement that something *is* more readable can be little more than
> an
> opinion. I often find that the two words "short" and "readable" tend to
> work against each other and it's a good programmer that can combine both.
> I've seen precious little evidence to suggest you are one.
>

And we have seen NIL evidence to suggest you have ANY skill level at being a
PL/I programmer.
Lets see a post be-moaning the lack of a PL/I solution, since other readers
have submitted
results (some inadequate with no output of counts) for AWK, Perl, Python,
but NO PL/I.

I may challenge Peter Elderon to have a go, or perhaps some other long-gone
from this newsgroup,
intelligent Pathetic Loser Ibm'er will drop by and dash off a PL/I
solution, but somehow I doubt it.

Where is a REXX solution, isnt this the kind of processing thats supposed to
be right up its alley?

David Frank

Feb 17, 2005, 8:06:57 AM

"David Frank" <dave_...@hotmail.com> wrote in message
news:421494d0$0$39337$ec3e...@news.usenetmonster.com...

>
> Lets see a post be-moaning the lack of a PL/I solution, since other
> readers have submitted
> results (some inadequate with no output of counts) for AWK, Perl, Python,
> but NO PL/I.
>

I forgot to mention Klein's Cobol solution, altho I'm not sure it produces a
output of counts either.


robin

Feb 17, 2005, 8:14:54 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Wed, 16 Feb 2005 06:03:58 -0500
.

| "robin" <rob...@bigpond.mapson.com> wrote in message news:08mQd.162333$K7.1...@news-server.bigpond.net.au...
| > From: "David Frank" <dave_...@hotmail.com>, Usenet Monster -
| > http://www.usenetmonster.com
| > Date: Mon, 14 Feb 2005 06:20:32 -0500
| >
| > | Its not my fault if participants are unable to read the posted challenge's
| > | statement that 2 arrays are to be produced. In order to enforce this
| > | challenge, I am extending the outputs as shown in my Fortran
| > | solution/results at
| > | http://home.cfl.rr.com/davegemini/wp_bible.f90
| > | which, IMO makes it clear that 2 arrays are to be produced, and that
| > | furthermore the results are to written to a file.
| >
| > Once again, you are changing the specs AFTER the fake "challenge".
| > (Is this a fishing expedition?)
.
Apparently yes.

.
| > | btw,
| > | This is first time I have ever coded/tested a hash algorithm of my own
| > | design. I tried a couple found on the net, and didnt get anywhere near the
| > | minimum collisions my unique algorithm produces. Hopefully someone will
| > | post a MUCH better hash function that has very few collisions processing
| > | bible.txt
| >
| > And now, yet another change - hashing.
|
| My 1st solution posted here used hashing, there has been no change.
.
You mentioned hashing BEFORE you posted your solution,
and my reply was written before your solution was posted.
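One way to compare hash functions by collision count, as DF describes doing on bible.txt, is to hash every distinct word into a fixed-size table and count the words whose slot is already taken. The sketch uses 32-bit FNV-1a, a published general-purpose hash chosen purely for illustration (it is not DF's own hash); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a: offset basis 2166136261, prime 16777619.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;  // wraps modulo 2^32
    }
    return h;
}

// Count words (assumed distinct) whose home slot in a table of
// `table_size` slots is already occupied by an earlier word.
std::size_t count_collisions(const std::vector<std::string>& distinct_words,
                             std::size_t table_size) {
    assert(table_size > 0);
    std::vector<bool> used(table_size, false);
    std::size_t collisions = 0;
    for (const auto& w : distinct_words) {
        const std::size_t slot = fnv1a(w) % table_size;
        if (used[slot]) ++collisions;
        else used[slot] = true;
    }
    return collisions;
}
```

Running count_collisions over the 12,691 unique words with the same table size for each candidate hash gives a like-for-like comparison.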

Tim Challenger

Feb 17, 2005, 8:33:32 AM
On Thu, 17 Feb 2005 07:57:51 -0500, David Frank wrote:

> "Tim Challenger" <tim.cha...@aon.at> wrote in message
> news:1108643309.64e595bae371cb9e256d0ced219ec2a5@teranews...
>> On Thu, 17 Feb 2005 07:04:30 -0500, David Frank wrote:
>>
>>> My solution IS more readable, briefer, faster.
>>
>> Of course I can only see what's been published, so I maintain you've had
>> more goes at it.
>
> Its more like a toss-up, LR has sent me several versions, each time running
> faster than the previous.

Maybe he has, I don't know that, and I can't act upon that unknown "fact".
But as that is an argument about C++ and Fucktran on a PLI group you're
hardly keeping us all on tenterhooks.



>> I'd expect yours to be faster and shorter at least at this
>> stage.
>>
>> Readability is debatable and a matter of personal style to a great degree.
>> So a statement that something *is* more readable can be little more than
>> an
>> opinion. I often find that the two words "short" and "readable" tend to
>> work against each other and it's a good programmer that can combine both.
>> I've seen precious little evidence to suggest you are one.
>
> And we have seen NIL evidence to suggest you have ANY skill level at being a
> PL/I programmer.

Fair enough, at least we have similar opinions of each other's ability.
I've posted solutions to your so-called challenges occasionally. I never
claimed to be any good.

> Lets see a post be-moaning the lack of a PL/I solution, since other readers
> have submitted
> results (some inadequate with no output of counts) for AWK, Perl, Python,
> but NO PL/I.

I might feel inclined to do so if it weren't for the fact that I know you'd
quibble about the number of leading blanks in the output lines, or that once
it was done you'd start changing what you want every 10 minutes as usual,
dictating how many variables are used and their names, even though, as you
said yourself in another recent post, it's the output and functionality that
counts. I have enough problems at work
trying to deal with tossers who keep changing their minds about what they
want without adding another to my list.


> I may challenge Peter Elderon to have a go, or perhaps some other long-gone
> from this newsgroup,
> intelligent Pathetic Loser Ibm'er will drop by and dash off a PL/I
> solution, but somehow I doubt it.

Of course not, for the very reasons I gave above.

> Where is a REXX solution, isnt this the kind of processing thats supposed to
> be right up its alley?

Who cares? This is a PLI group. Or it should be.
Post it on a REXX group, make yourself universally popular.

--
Tim C.

James J. Weinkam

Feb 17, 2005, 3:53:34 PM
David Frank wrote:
> Lets see a post be-moaning the lack of a PL/I solution, since other readers
> have submitted
> results (some inadequate with no output of counts) for AWK, Perl, Python,
> but NO PL/I.
>
> I may challenge Peter Elderon to have a go, or perhaps some other long-gone
> from this newsgroup,
> intelligent Pathetic Loser Ibm'er will drop by and dash off a PL/I
> solution, but somehow I doubt it.
>
> Where is a REXX solution, isnt this the kind of processing thats supposed to
> be right up its alley?
>
Why do you always expect the whole world to dance to your tune? Your so-called
challenges are all first- or second-semester homework exercises that no serious
person has any interest in wasting time on. You are one of the most obnoxious
posters on the newsgroups, rivaling Tim Martin on the OS/2 groups.

If you want to propose challenges, propose something that is: a) genuinely
useful, and b) not already well known.

If you aren't up to that then please leave us in peace.

David Frank

Feb 18, 2005, 4:39:15 AM

"James J. Weinkam" <j...@cs.sfu.ca> wrote in message
news:iv7Rd.78$Fu.73@edtnps89...

> please leave us in peace.

No-one is forcing you to read one of my very informative topics revealing
SHOCKING info about
the inadequacies of PL/I and those here who claim its more powerful than
Fortran.

btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
latest code in
0.38 sec using the Fortran world's top compiler, Intel ifort 8.0

Attn: Vowels ( before you blather about this or another message in this
topic),
he also ran it under SEVERAL commercial compilers with no code
modifications.
only 2 free compilers currently being developed G95 and gfortran had
problems due to lack of
read binary. As you know (and ignore) read binary (stream) data is now in
the standard.
When might your translation get posted, or is it running to embarrassing
looooong?
Not going to post a solution?
Then will you at least admit you have downloaded bible.txt and had a try at
processing it?


Tim Challenger

Feb 18, 2005, 4:47:00 AM
On Fri, 18 Feb 2005 04:39:15 -0500, David Frank wrote:

> ... my very informative topics ...

???


--
Tim C.

Tim Challenger

Feb 18, 2005, 4:48:07 AM
On Fri, 18 Feb 2005 04:39:15 -0500, David Frank wrote:

> btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
> latest code in
> 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0

Good for him.

--
Tim C.

David Frank

Feb 18, 2005, 7:53:50 AM

"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108719806.5fabd32a04a55e94306c39ebc4b00bd0@teranews...

I added a quicksort to my words output and amazingly my runtime decreased to
0.91 sec

see: http://home.cfl.rr.com/davegemini/wc_file.f90

Good for me.


David Frank

Feb 19, 2005, 6:42:04 AM

ATTN: Awk, C++, Cobol, Fortran, Perl, Python, Rexx, Etc respondees to this
challenge, thanks..
If you update your source/response to create following outputs, I will
enter it in a table of results.

---
The requirements are to time the execution of reading the bible.txt file
and producing a sorted list of unique words and their counts. ALL non-alpha
chars are to be treated as blanks, except a quote (apostrophe) within a word,
e.g. Wife's, which is deleted so that it becomes the word wifes.
Upper-case is lower-cased.
Document your results by posting the following info extracted from your
output file.
8177 a
319 aaron
..........
5 zurishaddai
1 zuzims
bible.txt
total words = 789781
unique words = 12691
xx.xx Sec ?.?? Ghz CPU ID
+ any further distinguishing info , e.g.
language/compiler/version/programmer's name
---
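A sketch of producing the report layout requested above from a word-to-count table: one "count word" line per unique word in alphabetical order, then the file name and the two totals. The use of std::map and the function name format_report are assumptions for illustration; the timing line is left out because it depends on the machine.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Format the report: "count word" lines in sorted order (std::map iterates
// by key), then the file name and the total/unique word counts.
std::string format_report(const std::map<std::string, long>& counts,
                          const std::string& filename) {
    std::ostringstream out;
    long total = 0;
    for (const auto& entry : counts) {
        out << entry.second << ' ' << entry.first << '\n';
        total += entry.second;
    }
    out << filename << '\n'
        << "total words = " << total << '\n'
        << "unique words = " << counts.size() << '\n';
    return out.str();
}
```

For the full bible.txt table, the last two lines should read "total words = 789781" and "unique words = 12691" if the word splitting follows the rules above.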

DF


robin

Feb 19, 2005, 8:13:53 AM
"David Frank" <dave_...@hotmail.com> writes: >
> "David Frank" <dave_...@hotmail.com> wrote in message

start = datetime();
collisions, lines = 0;
do forever;
get edit (text) (a(L));
lines = lines + 1;
if length(text) > 0 then call look(1);
end;
finish = datetime();
put skip list ('total words =', sum(counts));
put skip list ('unique words =', sum(counts>0));
put skip list ('time taken=', secs(finish) - secs(start), ' secs');
put skip list ('collisions=', collisions, ' lines=', lines);

/* Extract the words of TEXT from position POS onward. SEARCH finds  */
/* the first letter; VERIFY finds the first non-letter after it.     */
look: procedure (pos) recursive;
dcl pos fixed binary;
dcl (start, ending) fixed binary, word character (24) varying;
start = search(text, alphabet, pos);
if start = 0 then return;
ending = verify (text, alphabet, start);
if ending = 0 then
ending = length(text)+1;
else if (substr(text,ending,1) = '''') then
   do;   /* apostrophe: continue the word past it */
      ending = verify (text, alphabet, ending+1);
      if ending = 0 then ending = length(text)+1; /* word ends the line */
   end;
if ending < length(text) then call look(ending);
word = substr(text,start,ending-start);
k = index(word, '''');
/* rebuild rather than assign to SUBSTR, which would pad with a blank */
if k > 0 then word = substr(word,1,k-1) || substr(word,k+1);
call insert_word (word);
end;
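PL/I's SEARCH returns the position of the first character that is in a given set and VERIFY the first that is not, so they correspond roughly to std::string::find_first_of and find_first_not_of (which are 0-based and return npos rather than 0 for "not found"). A C++ sketch of a similar scan, written as a loop instead of recursion; split_words is a made-up name and the apostrophe rule follows the challenge text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split one line into lower-case words: find the first letter (like SEARCH),
// then the first non-letter after it (like VERIFY). An apostrophe followed by
// a letter is dropped and the word continues ("wife's" -> "wifes").
std::vector<std::string> split_words(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::vector<std::string> words;
    std::size_t pos = 0;
    while ((pos = text.find_first_of(alphabet, pos)) != std::string::npos) {
        std::string word;
        std::size_t end = pos;
        for (;;) {
            std::size_t stop = text.find_first_not_of(alphabet, end);
            if (stop == std::string::npos) stop = text.size();
            word.append(text, end, stop - end);
            const bool apostrophe_then_letter =
                stop + 1 < text.size() && text[stop] == '\'' &&
                alphabet.find(text[stop + 1]) != std::string::npos;
            if (!apostrophe_then_letter) { end = stop; break; }
            end = stop + 1;  // skip the apostrophe, keep reading letters
        }
        for (char& ch : word)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        words.push_back(word);
        pos = end;
    }
    return words;
}
```

A loop avoids the recursion depth growing with the number of words on a line, at the cost of collecting words in reading order rather than reverse order.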

robin

Feb 19, 2005, 8:13:04 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Fri, 18 Feb 2005 04:39:15 -0500

.
| No-one is forcing you to read one of my very informative topics revealing SHOCKING info about
| the inadequacies of PL/I and those here who claim its more powerful than
| Fortran.
.
Your "topics" are neither informative, nor interesting.
Your Fortran word processor code can be translated to PL/I
virtually line-for-line. But why would we bother,
when your Fortran code contains at least 2 bugs.
.

| btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
| latest code in
| 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
.
so what?
.

| he also ran it under SEVERAL commercial compilers with no code
| modifications.
.
So what? It contains at least two bugs just lurking, ready to surface.
We told you about this a while back in re another program,
but you haven't learnt a thing.
.
You're relying on
1. ASCII
2. 7-bit codes. Using another text file that uses codes like á or ś
might give unexpected results.
.
And, as well, the timed section of your code excludes the section to
count the number of unique words.
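On the 7-bit point: in C and C++, passing a plain char holding a byte above 127 (negative where char is signed) to std::isalpha is undefined behavior, and the "C" locale only counts A-Z and a-z as letters anyway. A sketch of an explicit byte classifier, assuming the input is ISO-8859-1 (Latin-1), where á is 0xE1; UTF-8 text would need a real decoder instead, and ś is not in Latin-1 at all. The function names are hypothetical.

```cpp
#include <cassert>

// Letter test assuming ISO-8859-1 input: ASCII A-Z/a-z plus the accented
// letters 0xC0-0xFF, excluding the multiplication (0xD7) and division (0xF7) signs.
bool is_latin1_letter(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return true;
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

// Lower-case one Latin-1 byte: ASCII A-Z and the capitals 0xC0-0xDE (except
// 0xD7) move up by 0x20; everything else, including 0xDF (sharp s), is unchanged.
unsigned char latin1_lower(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return static_cast<unsigned char>(c + 0x20);
    return c;
}
```

Taking unsigned char parameters forces callers to convert each byte explicitly, which sidesteps the signed-char problem.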

robin

Feb 19, 2005, 8:11:49 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Fri, 18 Feb 2005 07:53:50 -0500

| I added a quicksort to my words output and amazingly my runtime decreased to
| 0.91 sec
.
You really don't expect anyone to believe this?
You added an extra routine, and the whole thing
takes LESS time?
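One way to check whether adding a sort can make the whole run faster is to time each phase separately within a single run, using a monotonic clock, and to repeat the run, since effects such as file caching can shift timings between runs. A generic C++ sketch; time_phases and the phase names are placeholders, not part of anyone's posted program.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Run each named phase in order and record its wall-clock time in seconds,
// using std::chrono::steady_clock (monotonic, unaffected by clock changes).
std::vector<std::pair<std::string, double>>
time_phases(const std::vector<std::pair<std::string, std::function<void()>>>& phases) {
    std::vector<std::pair<std::string, double>> timings;
    for (const auto& phase : phases) {
        const auto t0 = std::chrono::steady_clock::now();
        phase.second();
        const auto t1 = std::chrono::steady_clock::now();
        timings.emplace_back(phase.first,
                             std::chrono::duration<double>(t1 - t0).count());
    }
    return timings;
}
```

If the per-phase times add up to the total and the sort phase is small and positive, any drop in the overall time has to come from somewhere else, such as a different run order or caching.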

Mark Yudkin

Feb 20, 2005, 4:03:06 AM
On the REXX forum, dope! Where you're making a further fuckwit of yourself
by posting a "challenge" to which LR already pointed out:

"The C++ I mailed to you was very different from the challenge you've posted
here."

"Fuckwit Frank" <dave_...@hotmail.com> wrote in message
news:421494d0$0$39337$ec3e...@news.usenetmonster.com...

David Frank

Feb 20, 2005, 5:23:20 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:pWGRd.167357$K7.1...@news-server.bigpond.net.au...

I just asked Fortran'ers their opinion..

"David Frank" <dave_...@hotmail.com> wrote in message
news:421855dd$0$38884$ec3e...@news.usenetmonster.com...
>

I just re-confirmed that doing a quicksort of the output word list reduces
my runtime from
1.20 sec to 0.88 sec Can anyone look at below code and give a rational
answer?

The quicksort is fast at 0.016 sec but thats a positive runtime not a
twilight zone negative runtime..

http://home.cfl.rr.com/davegemini/wc_file.f90


David Frank

Feb 20, 2005, 5:26:50 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:AXGRd.167358$K7.1...@news-server.bigpond.net.au...
> 2. 7-bit codes. Using another text file that uses codes like á or ś
> might give unexpected results.
>

Those are your bugs? ha ha..


> And, as well, the timed section of your code excludes the section to
> count the number of unique words.

Look again, the benchmark times ALL the runtime.
http://home.cfl.rr.com/davegemini/wc_file.f90

David Frank

Feb 20, 2005, 5:34:26 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:lYGRd.167360$K7.1...@news-server.bigpond.net.au...

Your objective of course in posting above is so you can say in the future:
"you have been shown a PL/I solution"

Its pathetic that your fellow pli'ers tolerate or even more pathetic, cant
recognize,
your usual blathering non-solution.

robin

Feb 20, 2005, 8:18:04 PM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Fri, 18 Feb 2005 04:39:15 -0500
.
| btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
| latest code in
| 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
.
Is this not the compiler that Fortran users say contains
bugs, including producing wrong outputs from WRITE statements?

robin

Feb 21, 2005, 8:09:45 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Sun, 20 Feb 2005 05:34:26 -0500

.
Solutions to forming lists of words, hashing, and quicksort
are topics covered in a first year computer science course.
There's nothing new in any of the above, and PL/I solutions
are to be found in introductory texts.
.
The above demonstrates PL/I code using SEARCH and VERIFY.

Mark Yudkin

Feb 20, 2005, 8:51:11 AM
Using Cogent/SQL, which is an interpreted language written in 100% pure
PL/I, and running on multiple platforms.

As a language aimed at data transformation of multi-GBs, it's not exactly
optimized for high speed with the specified task. As can be seen the times
are for the complete task, and include both elapsed and CPU times (very
important on z/OS). The code runs on Windows, OS/2 and z/OS (using an EBCDIC
source and with the *scan statement changed to use the right file name).

The source is the bit between <Source> and </Source>:
<Source>
*call time ('R') -- reset CPU timer
*start = time('S')
*words = 0
*total = 0
*uniq = 0
*scan dataset ("D:\TempJunk\WordCount\kjv12.txt") into (line)
* -- As per stupid requirement concerning case conversion and digit elimination.
* -- Punctuation not specified, so guess.
* line = translate (line, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz0123456789().,;:!?-/"')
* do while (line <> "")
* word = compress(first(line), "'", "") -- stupid requirement: wife's -> wifes
* line = rest(line)
* words[word] = words[word] + 1
* end
*end scan
*
*traverse reference (words) into (count, word)
&word; &count:l;
* total = total + count
* uniq = uniq + 1
*end

Total words: &total:l;
Unique words: &uniq:l;
Elapsed: &(time('S') - start):ld3; secs on 1.5GHz CPU
CPU: &time('E'):ld3; secs on 1.5GHz CPU
</Source>

The output is the bit between <Output> and </Output>:
<Output>
A 8177
AARON 319
AARONITES 2
AARONS 31
ABADDON 1
ABAGTHA 1
... snipped ...
ZUR 5
ZURIEL 1
ZURISHADDAI 5
ZUZIMS 1

Total words: 789781
Unique words: 12691
Elapsed: ?.??? secs on 1.5GHz CPU
CPU: ?.??? secs on 1.5GHz CPU
</Output>

To determine the times, you'll have to run the program yourself (meaning
you'll need to install a PL/I program that you'll have to obtain legally!).
I'll admit it as being a little slower than Ian's REXX solution.


David Frank

Feb 22, 2005, 7:08:12 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:t4lSd.170074$K7.1...@news-server.bigpond.net.au...

If the task at hand is for first year CS students, what does that imply
about the "dynamic duo"
skills, neither of you can write a pl/i program that meets the challenge.

David Frank

Feb 22, 2005, 8:04:24 AM

"LR" <lr...@superlink.net> wrote in message
news:Vz%Qd.2474$h06.4...@monger.newsread.com...

Not being standard also has the advantage to customizing function source to
the application, to wit:

program test
character,allocatable :: word(:)
allocate (word(999))
open (1,file='test.f90',form='binary')
do while (getword(1,word))
write (*,*) word,size(word)
read(*,*)
end do
stop
contains
! ---------------------
logical function getword(iunit,word)
< my code for this function is about 10 statements>
end function

1st 5 words in execution of this program:
program 7
test 4
character 9
allocatable 11
word 4


David Frank

Feb 25, 2005, 6:22:38 AM

"LR" <lr...@superlink.net> wrote in message
news:Vz%Qd.2474$h06.4...@monger.newsread.com...

> Fortran Will Someday Have The Features That C++ Has Right Now!
>

Below is more F2003 syntax in response to my inquiry yesterday in c.l.f.

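real,allocatable :: x(:)   ! an allocatable can't be initialized in its declaration
real :: y(3) = [4,5,6]
real :: z(4) = [7,8,9,0]
x = [1.0,2.0,3.0]          ! allocated to size 3 on assignment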

can c++ replace element plus extend an array with 1 statement a-la below?

x = [0.0,x(2:),y,z] ! replace x(1) with 0 and extend x with y,z

write (*,*) x ! = 0. 2. 3. 4. 5. 6. 7. 8. 9. 0.
write (*,*) size(x) ! = 10
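As a rough point of comparison in another language from this thread, here is a minimal Ruby sketch of the same replace-and-extend in one assignment. The splat operator `*` flattens each piece into a new array literal, much as the F2003 array constructor concatenates its items. Note the index shift: Ruby's 0-based `x[1..-1]` corresponds to Fortran's 1-based `x(2:)`.

```ruby
# Ruby analogue of the F2003 statement x = [0.0,x(2:),y,z]:
# build a new array from a replacement element, the tail of x, then y and z.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [7.0, 8.0, 9.0, 0.0]

x = [0.0, *x[1..-1], *y, *z]   # x[1..-1] is everything after the first element

p x        # prints [0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0]
p x.size   # prints 10
```

The right-hand side is built completely before `x` is rebound, so using `x` on both sides is safe, as it is in the Fortran statement.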

David Frank

Feb 28, 2005, 10:15:55 AM

"robin" <rob...@bigpond.mapson.com> wrote in message
news:AXGRd.167358$K7.1...@news-server.bigpond.net.au...

> From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
> Date: Fri, 18 Feb 2005 04:39:15 -0500
> .
> | No-one is forcing you to read one of my very informative topics revealing SHOCKING info about
> | the inadequacies of PL/I and those here who claim its more powerful than
> | Fortran.
> .
> Your "topics" are neither informative, nor interesting.
> Your Fortran word processor code can be translated to PL/I
> virtually line-for-line. But why would we bother,
> when your Fortran code contains at least 2 bugs.
>

Quit lying. You would dearly love to be able to translate it line for line,
but failed.
The dynamic duo's putrid code posted elsewhere shows both of you DEFINITELY
bothered yourselves, in failed attempts.

> | btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
> | latest code in
> | 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
> .
> so what?
>

I used Fortran to come up with the fastest version of this program (tying
with a C version) over MANY solutions in other languages: 0.265 sec vs.
LR's 2.9 sec for this version.

> | he also ran it under SEVERAL commercial compilers with no code
> | modifications.
> .
> So what? It contains at least two bugs just lurking, ready to surface.
> We told you about this a while back in re another program,
> but you haven't learnt a thing.
> .
> You're relying on
> 1. ASCII

> 2. 7-bit codes. Using another text file that uses codes like á or o
> might give unexpected results.
>

The latest version, http://home.cfl.rr.com/davegemini/wc_file.f90, processes
8-bit codes with no loss of execution time.
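Whether a word counter "relies on ASCII" comes down to how it decides what a letter is. A minimal Ruby sketch, assuming Ruby 2.4 or later (for Unicode-aware `downcase`), UTF-8 text, and the hypothetical helper names `words_ascii` and `words_unicode`, contrasts a hard-coded `a-z` range with the POSIX class `[[:alpha:]]`, which in Ruby also matches non-ASCII letters in a UTF-8 string. Deleting every apostrophe tokenizes the same way as the challenge's "delete ' within a word" rule, because an apostrophe that is not between two letters already sits at a word boundary.

```ruby
# ASCII-only notion of a letter: accented letters act as word separators.
def words_ascii(text)
  text.delete("'").downcase.scan(/[a-z]+/)
end

# Encoding-aware notion of a letter: [[:alpha:]] matches Unicode letters
# when the string is UTF-8, and downcase maps É to é (Ruby 2.4+).
def words_unicode(text)
  text.delete("'").downcase.scan(/[[:alpha:]]+/)
end

sample = "Café naïve ÉCOLE wife's"

p words_ascii(sample)    # prints ["caf", "na", "ve", "cole", "wifes"]
p words_unicode(sample)  # keeps café, naïve, école and wifes whole
```

With the ASCII range, "café" is counted as "caf", which is the kind of unexpected result robin's point 2 warns about. Neither version says anything about EBCDIC input; that part of the objection is about the character codes themselves.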

William M. Klein

Feb 28, 2005, 12:05:52 PM
"David Frank" <dave_...@hotmail.com> wrote in message
news:422335ac$0$39329$ec3e...@news.usenetmonster.com...
<snip>

>
> Quit lying, you would dearly love to be able to translate it line for line
> but failed.

What possible evidence do you have that ANYONE other than you cares (in the
SLIGHTEST) about "line-for-line" translations from one programming language to
another?

I have seen ZERO evidence in this - or any other forum - that this is something
important (or even interesting) to other programmers.

--
Bill Klein
wmklein <at> ix.netcom.com


David Frank

Feb 28, 2005, 12:48:15 PM

"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:QbIUd.3361163$B07.5...@news.easynews.com...

>
> I have seen ZERO evidence in this - or any other forum - that this is
> something important (or even interesting) to other programmers.
>
> --
> Bill Klein
> wmklein <at> ix.netcom.com

The word-processing topic you opened in comp.lang.fortran has drawn over
200 messages, which makes your statement above R I D I C U L O U S !!


David Frank

Feb 28, 2005, 12:59:17 PM

"David Frank" <dave_...@hotmail.com> wrote in message
news:42235960$0$39319$ec3e...@news.usenetmonster.com...

correction: your topic was opened in comp.lang.cobol..


William M. Klein

Feb 28, 2005, 1:26:47 PM
and the people have provided a NUMBER of solutions that use "COBOL technology"
and none (that I know of) that have tried to do a "line-for-line" translation.

The whole thing that you seem (repeatedly) to miss is that different programming
languages can (and do) use different techniques for solving the same "problem".

Using "one line of source" code to do the same "thing" is unimportant - while
finding "native" (and easily readable to programmers of THAT language)
techniques and algorithms is useful.

What is the "one line" statement in Fortran to do the COBOL "sort" statement? or
report-writer "generate" statement? The fact that Fortran has no "single line"
translations of those verbs doesn't mean that "native Fortran" can't accomplish
the TASK that these statements implement.

--
Bill Klein
wmklein <at> ix.netcom.com

"David Frank" <dave_...@hotmail.com> wrote in message
news:42235bf6$0$39266$ec3e...@news.usenetmonster.com...

robin

Mar 2, 2005, 3:38:21 AM
From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
Date: Mon, 28 Feb 2005 10:15:55 -0500

| "robin" <rob...@bigpond.mapson.com> wrote in message news:AXGRd.167358$K7.1...@news-server.bigpond.net.au...
| > From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
| > Date: Fri, 18 Feb 2005 04:39:15 -0500
.

| > Your "topics" are neither informative, nor interesting.
| > Your Fortran word processor code can be translated to PL/I
| > virtually line-for-line. But why would we bother,
| > when your Fortran code contains at least 2 bugs.

.
| Quit lying,
.
I'm not. Your code contains at least 2 bugs.
.


| you would dearly love to be able to translate it line for line
| but failed.

.
Don't talk rot. You still don't understand a line of PL/I.
.
Your code has a line-for-line equivalent in PL/I.


.
| The dynamic duo's putrid code posted elsewhere

.
For someone who knows nil of PL/I, you seem to
think you can judge code quality?
.


| shows both of you DEFINITELY bothered yourself in failed attempts..

.
Again, you talk nonsense.


.
| > | btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
| > | latest code in
| > | 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
| >

| > so what?
|
| I used Fortran to come up with the fastest version of this program (tying
| with a C version)
| over MANY solutions in other languages

.
Only after you saw their code.


.
| > | he also ran it under SEVERAL commercial compilers with no code
| > | modifications.
| >

| > So what? It contains at least two bugs just lurking, ready to surface.
| > We told you about this a while back in re another program,
| > but you haven't learnt a thing.
| >

| > You're relying on
| > 1. ASCII
| > 2. 7-bit codes. Using another text file that uses codes like á or o
| > might give unexpected results.
|
| Latest version http://home.cfl.rr.com/davegemini/wc_file.f90 processes
| 8bit codes
| with no loss of execution time.

.
But it still relies on ASCII, doesn't it.

David Frank

Nov 16, 2007, 5:07:17 AM
This reply to an ancient PL/I topic from 13 Feb 2005 is being "re-freshed" to
allow new responses here instead of in comp.lang.fortran.
Among those respondents, of course, are Robin Vowels, who as a PL/I advocate
is again claiming he has solution(s) without posting proof, as usual, and a
Ruby advocate, who is invited to re-post his Ruby solution to the "bible word
count problem" of two years ago here, now that the topic is "visible" again.

"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...
> FYI I just posted below in comp.lang.fortran topic "Word-processing
> challenge anyone? "
> and I expect to get at least 1 solution in reply.
>
> OTOH, despite the claims that PL/I has superior string-handling thats
> needed in a word-processing application,
> there wont be any solutions posted here in comp.lang.pl1

<snip>


Eric I.

Nov 16, 2007, 11:16:53 AM
I've attached a Ruby solution below. It takes 2.593646 seconds on a
2.33 GHz Intel Core 2 Duo (although only one core is used). That is
with an interpreter. There are at least four highly active parallel
efforts to create a compiled Ruby underway as I type this. They might
produce faster times.

My code not only keeps track of the unique words, it counts how many
times each appears in the text. Then, after the timing result is
produced, the code outputs information about the most frequent word(s)
(which happens to be "the") and the least frequent words (4004 appear
only once, from "abaddon" to "zuzims"). But again, that output does
not affect the timing.

Also, since I wanted to be able to access the data from the web
directly, there's a little bit of code to allow it to skip the
non-included material at the top and bottom of the file.

Eric

========

# Reads a file containing the text of the Bible and, after processing
# the data slightly, prints out how many total words and how many
# unique words it contained. See http://tinyurl.com/354hry for the
# full problem description.

# This solution is offered by LearnRuby.com (http://learnruby.com).

# If there is a file named "kjv12.txt" in the current directory, the
# data will be read from that file. Otherwise, the data will be read
# from the URI "http://patriot.net/users/bmcgin/kjv12.txt".

start_time = Time.now

Bible_Filename = "kjv12.txt"
Bible_URI = "http://patriot.net/users/bmcgin/kjv12.txt"

input = begin
  File.open(Bible_Filename)
rescue SystemCallError # e.g. Errno::ENOENT when the local file is missing
  require 'open-uri'
  puts "NOTE: time taken is invalid since it includes web access\n\n"
  URI.open(Bible_URI) # Kernel#open no longer opens URIs in Ruby 3.0+
end

state = :skip_top
word_count = 0
words_seen = Hash.new(0)

input.each_line do |line|
  state = :process if
    state == :skip_top && line =~ /Book\s+01\s+Genesis/

  next unless state == :process

  state = :skip_bottom if line =~ /022:021.*Amen\./

  # remove apostrophe between letters
  mod_line = line.gsub(/([[:alpha:]])'([[:alpha:]])/, '\1\2')

  # convert sequences of non-letters to single spaces, remove white
  # space at either end, and convert letters to lower case
  # (non-bang methods: gsub!/strip!/downcase! return nil when nothing
  # changes, so chaining them can raise NoMethodError)
  mod_line = mod_line.gsub(/[^[:alpha:]]+/, ' ').strip.downcase

  words = mod_line.split
  word_count += words.size
  words.each { |word| words_seen[word] += 1 }
end

input.close

puts "Number of words: %d" % word_count
puts "Number of unique words: %d" % words_seen.size

end_time = Time.now

puts "Time taken to compute: %f seconds" % (end_time - start_time)

#
# Extra information, just for the fun of it...
#

# figure out the counts for the most and least frequent words
word_counts = words_seen.values
top_word_count = word_counts.max
bottom_word_count = word_counts.min

# put together a list of the most frequent word(s) and the least
# frequent word(s)
top_words = words_seen.select { |word, count|
  count == top_word_count
}.map { |e| e[0] }
bottom_words = words_seen.select { |word, count|
  count == bottom_word_count
}.map { |e| e[0] }

# output information about the most and least frequent words
puts("\nThe following %d most frequent word(s) each appeared %d time(s):" %
     [top_words.size, top_word_count])
puts top_words.sort.join("\n").gsub(/^/, ' ')

puts "\nThe following %d least frequent word(s) each appeared %d time(s):" %
     [bottom_words.size, bottom_word_count]
puts bottom_words.sort.join("\n").gsub(/^/, ' ')
====

robin

Dec 4, 2007, 7:09:24 AM
"David Frank" <dave_...@hotmail.com> wrote in message news:13jqquo...@corp.supernews.com...

> This reply to a PL/I ancient topic from 13 Feb 2005 is being "re-freshed" to
> allow new responses here

This is a translation to PL/I of DF's Fortran code.
Like that code, it's ASCII specific.

(nofofl, nosize):
word_counts: proc options (main);
dcl (hashbits value(17), maxw value ((2**hashbits)), wlen value (30)) fixed binary (31);
dcl line char (80) var, sword(wlen) char (1), ch char (1);
dcl (ich, i, k, n, nchars, (nc, collisions, total, unique) init (0), odd, even) fixed bin (31);
dcl 1 wc(maxw) static,
2 word char(wlen),
2 count fixed binary (31);
dcl letters char (52) init ('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ');

wc.word = ' ' ; wc.count = 0;

on endfile (sysin) go to compact;

more:
nc = 0;
get edit (line) (L);
line = translate(line, 'abcdefghijklmnopqrstuvwxyz',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
ml:
do forever; nc = nc+1 ;
if nc > length(line) then go to more;
nc = search(line, letters, nc); /* skip until alpha start */
if nc = 0 then go to more; /* no alpha chars at end of line. */
ch = substr(line,nc,1);
ich = unspec(ch);
k = 1 ; sword(1) = ch; /* found start new word */
odd = ich ; even = 1; /* init. hash with 1st char */
il:
do forever;
nc = nc+1;
if nc > length(line) then leave il;
ch = substr(line,nc,1);
if ch = '''' then iterate il; /* delete ' from word */
if ch < 'A' | ch > 'z' then leave il; /* end word */
if ch > 'Z' & ch < 'a' then leave il;
ich = unspec (ch);
k = k+1 ; sword(k) = ch;
if (iand(k,1) = 0) then
even = ieor(isll(even,5),ich); /* accum even hash */
else
odd = ieor(isll(odd, 5),ich); /* accum odd hash */
end;

n = ieor(isrl(odd*even,hashbits+2),odd*even); /* hash product pieces */
n = iand(n,maxw-1); /* positive index */
do forever;
n = n+1 ; if (n > maxw-1) then n = 1; /* reset index */
if (wc(n).count = 0) then
do; wc(n).word = substr(string(sword),1,k) ; wc(n).count = 1 ; iterate ml; end;
/* initial entry */
else if substr(string(sword),1,k) = wc(n).word then
do; wc(n).count = wc(n).count+1 ; iterate ml; end; /* count occurrences */
else
collisions = collisions+1;
end;
end;

compact:
n = 0;
do i = 1 to maxw; /* make entries contiguous from wc(1: */
if (wc(i).count = 0) then iterate;
n = n+1 ; wc(n) = wc(i);
total = total + wc(i).count ; unique = unique+1;
end;

call qsort(0,n-1); /* quicksort wc(1:n) entries */

put skip edit ( (wc(i).count, trim(wc(i).word) do i=1 to n)) (f(5), x(1), a, skip);
put skip list ( 'total words =', total );
put skip list ( 'unique words =', unique);
put skip data (collisions);

qsort: proc (l,r) recursive;
dcl 1 tempwc,
2 word char(wlen),
2 count fixed binary (31);
dcl sword char(wlen);
dcl ( l, r, i,j ) fixed binary (31);

i = l ; j = r ; sword = wc((l+r+2)/2).word;
do while (i <= j);
do while (wc(i+1).word < sword & i < r);
i = i+1;
end;
do while (sword < wc(j+1).word & j > l);
j = j-1;
end;
if (i <= j) then
do;
tempwc = wc(i+1);
wc(i+1) = wc(j+1);
wc(j+1) = tempwc; /* swap words,counts */
i = i+1;
j = j-1;
end;
end;
if (l < j) then call qsort(l, j);
if (i < r) then call qsort(i, r);
end qsort;
end word_counts;
/*
.......
3 zuph
5 zur
1 zuriel
5 zurishaddai
1 zuzims


total words = 789781
unique words = 12691

COLLISIONS= 4318;
*/
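Both DF's Fortran and this PL/I translation count words with a fixed-size open-addressing hash table: a slot whose count is zero is empty, a clash with a different word counts as a collision, and the probe moves to the next slot, wrapping around at the end. A minimal Ruby sketch of that technique follows. The class name `ProbingCounter` is hypothetical, and 32-bit FNV-1a is swapped in for the odd/even shift-and-xor hash above, so its collision counts will not match the 4318 reported. Unlike the PL/I, it refuses to fill the last free slot, so probing for a new word always terminates.

```ruby
# Fixed-size open-addressing word counter with linear probing.
class ProbingCounter
  attr_reader :collisions

  def initialize(bits = 17)
    @size = 1 << bits             # power of two, so masking yields an index
    @keys = Array.new(@size)      # nil marks an empty slot (count = 0 in PL/I)
    @counts = Array.new(@size, 0)
    @used = 0
    @collisions = 0
  end

  def add(word)
    i = fnv1a(word) & (@size - 1)
    loop do
      if @keys[i].nil?                          # empty slot: first occurrence
        raise "table full" if @used == @size - 1  # keep one slot free
        @keys[i] = word
        @counts[i] = 1
        @used += 1
        return
      elsif @keys[i] == word                    # word already present
        @counts[i] += 1
        return
      end
      @collisions += 1                          # slot holds a different word
      i = (i + 1) & (@size - 1)                 # probe next slot, wrapping
    end
  end

  # Occupied slots as [word, count] pairs sorted by word, like the
  # compact + qsort steps of the PL/I program.
  def sorted_entries
    @keys.each_index.select { |i| @keys[i] }
         .map { |i| [@keys[i], @counts[i]] }
         .sort_by(&:first)
  end

  def total
    @counts.inject(0, :+)
  end

  def unique
    @used
  end

  private

  # 32-bit FNV-1a hash of the word's bytes
  def fnv1a(s)
    h = 0x811c9dc5
    s.each_byte { |b| h = ((h ^ b) * 0x01000193) & 0xffffffff }
    h
  end
end

counter = ProbingCounter.new(4)   # 16 slots, plenty for this tiny demo
%w[the lord said unto the lord].each { |w| counter.add(w) }
p counter.sorted_entries          # prints [["lord", 2], ["said", 1], ["the", 2], ["unto", 1]]
p [counter.total, counter.unique] # prints [6, 4]
```

For the Bible text the default of 17 bits (131072 slots) comfortably exceeds the 12691 unique words, keeping probe chains short, which is presumably why the PL/I uses `hashbits value(17)`.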

David Frank

Dec 5, 2007, 4:54:44 AM

"robin" <rob...@bigpond.com> wrote in message
news:Urb5j.20210$CN4....@news-server.bigpond.net.au...

>
> This is a translation to PL/I of DF's Fortran code.
> Like that code, it's ASCII specific.
>

<snip source>
Congratulations,
very interesting translation of my
http://home.earthlink.net/~dave_gemini/wc.f90 Fortran source,
which NO-ONE expected to see almost 3 yrs after the original challenge.

> .......
> 3 zuph
> 5 zur
> 1 zuriel
> 5 zurishaddai
> 1 zuzims
> total words = 789781
> unique words = 12691
> COLLISIONS= 4318;
> */

I note you have removed any benchmark timing; surely the code will beat
Eric's Ruby version (2.6 sec)?

Prove that PL/I "once upon a time" supported EASY distribution of a Windows
exe program by making your exe available to run on our PCs, even though I
sense that is no longer supported with the "web-sphere PL/I system".

OTOH, I can EASILY make my Windows exe (1 file) available on request for
anyone's use, and note that it will process ANY text file.

Perhaps someone will confirm your source is valid for their compiler.

Gerard Schildberger

Dec 5, 2007, 4:48:07 PM
| David Frank wrote:

|> robin wrote:
|> This is a translation to PL/I of DF's Fortran code.
|> Like that code, it's ASCII specific.

| <snip source>
| Congratulations,
| very interesting translation of my
| http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
| that NO-ONE expected to see after almost 3yrs from the original challenge.
|
| > .......
| > 3 zuph
| > 5 zur
| > 1 zuriel
| > 5 zurishaddai
| > 1 zuzims
| > total words = 789781
| > unique words = 12691
| > COLLISIONS= 4318;
| > */
|
| I note you have removed any benchmark timing, surely the code will beat
| Eric's RUBY version = 2.6 sec ??

Benchmark timings from different CPUs are meaningless unless
all benchmarks are done on the same CPU, same operating
system, same hard drive(s), etc. Say I ran some code on my
computer and it ran in 2.2 seconds. So what? I'd have to run
the Fortran code (with a particular Fortran compiler) on the
same machine too.

I also wish you would write the code so that it isn't ASCII-dependent.
______________________________________________________Gerard S.
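Gerard's point suggests timing every candidate with the same harness on the same machine. A minimal Ruby sketch of such a harness reports wall-clock and CPU time separately, like the "Elapsed:" and "CPU:" lines in the PL/I output earlier in the thread. The method name `measure` and the repeated-sentence workload are hypothetical stand-ins, not the Bible file or anyone's posted solution.

```ruby
# Time a block by wall clock (monotonic, unaffected by clock changes) and by
# process CPU time (user + system), returning the block's result as well.
def measure
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.times
  result = yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu_end = Process.times
  cpu = (cpu_end.utime - cpu_start.utime) + (cpu_end.stime - cpu_start.stime)
  [result, wall, cpu]
end

# Stand-in workload: 10 words per sentence, repeated 20,000 times.
text = "In the beginning God created the heaven and the earth. " * 20_000
count, wall, cpu = measure { text.downcase.scan(/[[:alpha:]]+/).size }

puts "Words: #{count}"   # prints Words: 200000
printf("Elapsed: %.3f secs\n", wall)
printf("CPU: %.3f secs\n", cpu)
```

CPU time excludes time spent waiting on the disk or on other processes, so the two figures can differ noticeably for a file-reading benchmark; neither is comparable across different PCs.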

robin

Dec 13, 2007, 6:34:12 AM
"David Frank" <dave_...@hotmail.com> wrote in message news:13lctb4...@corp.supernews.com...

>
> "robin" <rob...@bigpond.com> wrote in message
> news:Urb5j.20210$CN4....@news-server.bigpond.net.au...
> >
> > This is a translation to PL/I of DF's Fortran code.
> > Like that code, it's ASCII specific.
> >
> <snip source>
> Congratulations,
> very interesting translation of my
> http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
> that NO-ONE expected to see after almost 3yrs from the original challenge.
>
> > .......
> > 3 zuph
> > 5 zur
> > 1 zuriel
> > 5 zurishaddai
> > 1 zuzims
> > total words = 789781
> > unique words = 12691
> > COLLISIONS= 4318;
> > */
>
> I note you have removed any benchmark timing,

I didn't remove anything. I just didn't implement it,
as it's irrelevant.

> surely the code will beat
> Eric's RUBY version = 2.6 sec ??

As I posted previously, timings from different PCs are irrelevant.

> Prove that PL/I "once upon a time" supported EASY distribution of a windows
> exe program by
> making your exe available to run on our PCs, even tho I sense that no
> longer is supported with the
> "web-sphere PL/I system"

It's a function of the linker, not the compiler.

> OTOH, I can EASILY make my windows exe (1 file) available on request for
> anyone's use, and note that it will process
> ANY text file

No, your code WON'T process any text file.
It will ONLY handle ASCII.

> Perhaps someone will confirm your source is valid for their compiler..

Perhaps someone will confirm that YOUR source is valid
for their compiler.
But YOUR source code is non-portable, so it's not guaranteed to
compile on every Fortran compiler.

