OTOH, despite the claims that PL/I has the superior string handling that's needed
in a word-processing application,
there won't be any solutions posted here in comp.lang.pl1, and it's for sure
Weinkam can't produce a competitive
Cobol solution.
------------- below posted in comp.lang.fortran ----------
Want to give this a shot?
Get the text file at: http://patriot.net/~bmcgin/kjvpage.html
Using a text editor (notepad) remove text before
Book 01 Genesis and text after last word in Revelations, (amen)
producing bible.txt file containing
4,947,047 chars.
The challenge is to process the bible.txt file into a words array and count unique
word occurrences in a counts array.
(this challenge was initiated by LR, whose C++ source/results will be posted
later, along with my own source/results)..
In this processing, convert all punctuation and numbers to blanks, and
uppercase to lower.
The one punctuation exception is ' (within a word), which is deleted, leaving wife's
as wifes.
When bible.txt is "thusly" processed I expect you should get the following outputs:
total words = 789781
unique words = 12691
xx.xx Sec ?.?? Ghz PC
Pls post your time and PC speed..
! A few template statements in this word-processing benchmark challenge to
get started are:
! -----------------------------
program count_word_occurances ! in bible.txt
  implicit none
  integer,parameter :: maxw = 65536 ! or lower if possible
  character(24) :: words(maxw)
  integer :: i, n, counts(maxw)=0, t1(8), t2(8)
  call date_and_time(values=t1) ! get benchmark start time
  ! open file='bible.txt' ...
  ! process word occurrences into the 2 arrays words,counts
  ! until EOF as per your word-processing algorithm
  call date_and_time(values=t2) ! get benchmark stop time
  n = 0 ! count unique words found
  do i = 1,maxw
    if (counts(i) /= 0) n = n+1
  end do
  write (*,*) 'total words =',sum(counts)
  write (*,*) 'unique words =',n
  write (*,'(f0.2,a)') (t2(5)*3600.+t2(6)*60.+t2(7) +t2(8)/1000.) &
    -(t1(5)*3600.+t1(6)*60.+t1(7) +t1(8)/1000.), ' Sec <- 2.8 Ghz PC'
end program
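For readers who want to check the processing rules before writing a timed version, here is a minimal Python sketch of the rules as stated above: apostrophes are deleted, every other non-letter becomes a blank, and uppercase is folded to lower. The function name `count_words` and the sample line are made up for illustration; this is not any poster's solution and says nothing about speed.

```python
import re
from collections import Counter

def count_words(text):
    """Apply the challenge's stated rules and return a Counter of words."""
    text = text.replace("'", "")             # wife's -> wifes
    text = re.sub(r"[^A-Za-z]", " ", text)   # punctuation and digits -> blanks
    return Counter(text.lower().split())     # fold case, split on blanks

# Made-up sample line (not taken from bible.txt)
sample = "001:001 In the beginning God created the wife's BRICK-KILN, the end."
counts = count_words(sample)
total_words = sum(counts.values())
unique_words = len(counts)
print("total words =", total_words)
print("unique words =", unique_words)
```

Note that under these rules a hyphenated word such as BRICK-KILN falls apart into two words, and the "001:001" prefix vanishes entirely.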
As soon as someone provides a valid specification in the COBOL newsgroup, I
suspect there will be a number of solutions provided.
NOTE:
I won't even ask in the Fortran newsgroup how to provide all the functionality
of the "built-in" Report-Writer feature of COBOL - which among other things
provides:
- page headers and footers
- control break headings
- running totals
- "print" (or screen) output based on column and line specifications
all produced by the single COBOL "generate" statement.
--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...
Are you counting CR, LF, and other record separators in your
4,947,047 chars
count? I am NOT counting those - and I get
4,719,042
characters in the file (when only dealing with characters IN each record)
1) Am I correct that you are ignoring (in your "words" the "00n : 00n" in
columns 1-7 (i.e. chapter and verse information)
2) Do you want to count "1" in "1 Samuel" and "2" in "2 Samuel" (etc) as words?
3) Are you treating hyphenated words as a single word, e.g.
"BRICK-KILN" as one or two words in the line
"IRON AND MADE THEM PASS THROUGH THE BRICK-KILN AND THUS DID "
--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:XFQPd.1786116$B07.2...@news.easynews.com...
I still need to find out about hyphenated words.
--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:KRRPd.1791168$B07.2...@news.easynews.com...
> total words = 789781
> unique words = 12691
My COBOL program does get 12691 for "unique" words (when I treat hyphenated
words as TWO separate words).
However, I am only getting
+789715
input words. I wonder what the original poster was counting that I am not?
(Possibly some of the CR/LF or other record delimiters? Or could they be counting
input words BEFORE eliminating numbers?)
For access to my actual source code, see the (separate) post in comp.lang.cobol
--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:KRRPd.1791168$B07.2...@news.easynews.com...
> My COBOL program does get 12691 for "unique" words (when I treat hyphenated
> words as TWO separate words), However, I am only getting +789715 input words.
I have not seen your source code yet due to Google latency, but I
suspect that you are not counting the word 'book' that appears in the
first 8 columns. It appears 66 times. That added to your total gives
the correct number of total words.
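A quick check of that arithmetic, using only the numbers quoted in this thread (the variable names are mine):

```python
klein_total = 789715      # total reported by the COBOL program
book_lines = 66           # claimed occurrences of 'book' in the first 8 columns
expected_total = 789781   # total given in the challenge

missing = expected_total - klein_total
print("missing words:", missing)
print("explained by 'book' lines:", missing == book_lines)
```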
I asked (earlier) about what to do with columns 1-8 - and I decided just to
ignore them (before seeing the rule about converting numbers to spaces).
My code is now posted (and could EASILY be changed if I really wanted to pick up
those "BOOK" in columns 1-8)
For those NOT in comp.lang.cobol, the source code (with LOTS of extraneous
debugging information) is at:
http://home.comcast.net/~wmklein/DOX/WORDCNT.txt
However, if you aren't reading this in comp.lang.cobol, please note that I
commented there on lots of "performance tuning" things that could be done - if
that was the goal. (Also it would be prettier - if using some of the extensions
available in many PC and Unix COBOL compilers)
For the person asking about "squeezing out" apostrophes, this is done with
INSPECT TALLYING (to figure out where they are) and then REFERENCE MODIFICATION
to move the data - to get rid of them.
--
Bill Klein
wmklein <at> ix.netcom.com
<ep...@juno.com> wrote in message
news:1108352420.7...@c13g2000cwb.googlegroups.com...
http://home.comcast.net/~wmklein/DOX/WORDCNT.txt
For those who don't like "verbose" COBOL, there are LOTS of ways this could be
cut down. I coded this for "understandability" (for the average COBOL
programmer) - not for "terseness" - which could be done *IF* I saw that as a
desirable goal.
As stated in another post, this is "fully conforming" ANSI/ISO 1985 (not even
2002) source code with the single exception that it expects a "line sequential"
input file. Removing the single word "line" from the Select/Assign clause
would make this valid source code for any "mainframe" or other environment using
"record sequential" text files.
--
Bill Klein
wmklein <at> ix.netcom.com
"William M. Klein" <wmk...@nospam.netcom.com> wrote in message
news:5dVPd.1804097$B07.2...@news.easynews.com...
With these types of 'challenge' it is likely that the goal is simply
'my language is better than yours'. Responding even with a program
that is only fractionally slower or has marginally more lines or only
slightly confuses the setter (compared to the one that they wrote
themselves) simply proves (in the setter's mind) that theirs really is
better.
Unless you can blow them away with a 3 line program that runs 10 times
faster and could be understood by a 5 year old then there is probably
no point in feeding their ego.
Here is how I do it in AWK, including removing the text before
(but not including) the Genesis line. My first version also removed
that one, as I thought that was requested, and also the lines
at the end. It makes sense for programs to do all the work, instead
of only part of it.
The actual unique word identification is done by the statement
words[tolower($i)]++;
and counting unique words by the statement
for(i in words) w++;
Three statements to not count the beginning and end.
Until PL/I, Fortran or COBOL get associative arrays, it will take
a lot more code to do problems like this. It isn't hard to do in Java
as there is a Hashtable class in the standard class libraries.
A few years ago I did one like this in Java, including printing out the
high frequency words on about 18GB of text. It ran for about a day,
which isn't much longer than it takes to read in the whole file.
BEGIN {start=systime();}
/Genesis/ { flag=1;}
/^End/ { flag=0;}
{
  if(!flag) next;
  gsub("'","");
  gsub("[^A-Za-z]"," ");
  t += NF;
  for(i=1;i<=NF;i++) words[tolower($i)]++;
}
END {
  for(i in words) w++;
  print "total words",t,"unique words",w,"in",systime()-start,"Sec";
}
total words 789781 unique words 12691 in 18 Sec on a 350MHz P-II.
-- glen
Read the challenge,
you haven't loaded 2 arrays ( words,counts ) with individual word
identifiers and count of word occurrences
D.F.
Are you REALLY as dumb as you appear - or do you just try to appear dense?
Don't you understand that programming languages are designed to SOLVE
application problems, NOT to "implement" a specific methodology or algorithm in
the internal code used to solve those problems?
The "awk" solution SOLVES the application problem (counting all and unique words
in the source file) and that was the point of the original specifications. The
fact that it did NOT need to (internally) create arrays has NOTHING to do with
"meeting the challenge" - as it specified a "logic" problem.
So, if you really can't understand the difference, then can you provide a
Fortran solution that provides the correct "output" withOUT creating any
internal arrays? Don't ask me why this would be important but if you think that
internal methodology is so important, then you need to provide a solution to
meet the (better?) "awk" way of working.
>> there on lots of "performance tuning" things that could be done - if
> that was the goal.
>
> With these types of 'challenge' it is likely that the goal is simply
> 'my language is better than yours'. Responding even with a program
> that is only fractionally slower or has marginally more lines or only
> slightly confuses the setter (compared to the one that they wrote
> themselves) simply proves (in the setter's mind) that theirs really is
> better.
Ah, you've seen some of Fuckwit Frank's posts already then?
> Unless you can blow them away with a 3 line program that runs 10 times
> faster and could be understood by a 5 year old then there is probably
> no point in feeding their ego.
Even then it wouldn't work as he'd change the goalposts.
--
Tim C.
Any cross-language "challenge" is useful ONLY to the extent that it actually
challenges multiple programming languages to SOLVE a problem - not if they
pre-determine HOW the problem is to be solved.
Your challenge just shows (again) your inability to write an application
specification - and then to understand multiple solutions to the specification.
--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:4210897c$0$39301$ec3e...@news.usenetmonster.com...
>
> "William M. Klein" <wmk...@nospam.netcom.com> wrote in message
> news:A9%Pd.2120565$f47.3...@news.easynews.com...
> Its not my fault if participants are unable to read the posted challenge's
> statement that 2 arrays are to be produced. In order to enforce this
> challenge, I am extending the outputs as shown in my Fortran solution/results
> at
>
> http://home.cfl.rr.com/davegemini/wp_bible.f90
>
> which, IMO makes it clear that 2 arrays are to be produced, and that
> furthermore the results are to written to a file.
>
> btw,
> This is first time I have ever coded/tested a hash algorithm of my own design.
> I tried a couple found on the net, and didnt get anywhere near the minimum
> collisions my unique algorithm produces. Hopefully someone will post a MUCH
> better hash function that has very few collisions processing bible.txt
>
>
Actually the AWK program does just that except that it fills a hash
table with the WORDS as subscripts and the counts as DATA. But you can
think of them as two parallel arrays if you want!
> Read the challenge,
> you havent loaded 2 arrays ( words,counts ) with individual word
> identifiers and count of word occurances
Why waste two arrays when one will do?
The array stores the counts, the subscript is the word.
-- glen
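To make the "one array versus two" point concrete, here is a small Python sketch (not anyone's posted solution) showing that an associative array keyed by word can be unpacked into two parallel arrays whenever they are wanted. The names `table`, `words` and `counts` and the sample sentence are chosen only for illustration.

```python
from collections import Counter

# Associative array: the subscript is the word, the value is its count
table = Counter("the lord is my shepherd the lord".split())

# The same information as two parallel arrays, in sorted word order
words = sorted(table)
counts = [table[w] for w in words]

for w, c in zip(words, counts):
    print(w, c)
```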
"David Frank" <dave_...@hotmail.com> wrote in message
news:42106da7$0$38857$ec3e...@news.usenetmonster.com...
>
> "glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
> news:6eGdnYleAJG...@comcast.com...
"Tim Challenger" <tim.cha...@aon.at> wrote in message
news:1108372881.c2a8b0e0945b15b9ef1c989af1351ca4@teranews...
Here is my Fortran source/results as promised:
http://home.cfl.rr.com/davegemini/wp_bible.f90
I won't post LR's lengthy and imo unreadable final solution, but will leave
it to him to post.
Believe me, it's a p.o.s. ("pile of scary syntax").
Sooo, does your FINAL? solution posted in YOUR topic, write its results to
a words.cnt file
as I requested in response to your re-hashed spec criticisms above?
> Sooo, does your FINAL? solution posted in YOUR topic, write its results to
> a words.cnt file
> as I requested in response to your re-hashed spec criticisms above?
That's YOUR addition, it's not specified in the original.
--
Tim C.
Well yes it is my addition which he has since responded to with a Cobol
version that doesn't appear to
write results to file. Instead he continues to criticize "the spec" and I
presume he will continue to do so.
Writing results allows awk,perl etc to "do it their way without 2 arrays"
and still prove they have counted occurrences of unique words. Plus it's
obviously the right spec extension to get something useful at no extra
charge.
Where are the Awk, Perl solutions that write results to file?
Where is LR's C++ solution? (He initially challenged me to write this
Fortran program)..
Where is your or anyone's PL/I solution?
> "Tim Challenger" <tim.cha...@aon.at> wrote in message
> news:1108457337.7276c74e3a8f715b412a0fb4f3c1ebf0@teranews...
>> On Tue, 15 Feb 2005 03:47:56 -0500, David Frank wrote:
>>
>>> Sooo, does your FINAL? solution posted in YOUR topic, write its results
>>> to
>>> a words.cnt file
>>> as I requested in response to your re-hashed spec criticisms above?
>>
>> That's YOUR addition, it's not specified in the original.
>>
>> --
>> Tim C.
>
> Well yes it is my addition which he has since responded to with a Cobol
> version that doesnt appear to
> write results to file. Instead he continues to criticize "the spec" and I
> presume he will continue to do so.
Whereas you keep changing it. So that's fair isn't it?
--
Tim C.
Do you have problems reading a screen? That *might* explain many of your posts.
P.S. Of course any COBOL programmer could add a "write" statement instead of a
"display" statement. - or on systems that support such things "redirect" the
DISPLAY output to a file (called anything that the O/S supports)
--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:4211b73e$0$38884$ec3e...@news.usenetmonster.com...
--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:4211bda2$0$39274$ec3e...@news.usenetmonster.com...
(snip)
> Writing results allows awk,perl etc to "do it their way without 2 arrays"
> and still prove they have counted occurances of unique words. Plus its
> obviously the right spec extension to get something useful at no extra
> charge.
>
> Where are the Awk, Perl solutions that write results to file?
> Where is LR's C++ solution? (He initially challenged me to write this
> Fortran program)..
> Where is your or anyone's PL/I solution?
Sorry for feeding the troll, but this is too much fun to pass up. Frank, do
you really think outputting the words and counts is "difficult" in any way?
I could do it with a single statement in both Perl and Python, but I'll use
a loop to make it easier to understand. Here are amended versions.
Perl:
--------
use Time::HiRes qw(gettimeofday tv_interval);
$totwords = 0;
%words = ();
$start = [gettimeofday];
open FILE, "<bible12.txt";
while (<FILE>) {
    s/'//g;
    foreach $word (split(/\W+/, lc)) {
        if ($word =~ /^[a-z]+$/) {
            $totwords++;
            $words{$word}++;
        }
    }
}
open OUTFILE, ">words.cnt";
foreach $word (sort keys %words) {
    print OUTFILE "$word $words{$word}\n"
}
printf("total words : %d\n", $totwords);
printf("unique words : %d\n", scalar(keys %words));
printf("'god' : %s\n", $words{"god"});
printf("collisions : Who cares?\n");
printf("time : %f s.\n", tv_interval($start));
print("1.5 GHz P4 Mobile");
--------
total words : 789781
unique words : 12691
'god' : 4446
collisions : Who cares?
time : 2.987000 s.
1.5 GHz P4 Mobile
--------
Python:
--------
import re, time
words = {}
line_split = re.compile("\W+")
word_test = re.compile("^[a-z]+$")
start = time.clock()
for line in open("bible12.txt").xreadlines():
    for word in line_split.split(line.replace("'", "").lower()):
        if word_test.match(word):
            try: words[word] += 1
            except KeyError: words[word] = 1
sorted_keys = words.keys()
sorted_keys.sort()
outfile = open("words.cnt", "w+")
for word in sorted_keys:
    print >> outfile, word, words[word]
print "total words : %d" % sum(words.values())
print "unique words : %d" % len(words)
print "'god' : %d" % words["god"]
print "time : %f s." % (time.clock() - start)
print "1.5 GHz P4 Mobile"
--------
total words : 789781
unique words : 12691
'god' : 4446
time : 4.546000 s.
1.5 GHz P4 Mobile
/s. axelsson
OK, finally ...
Those are impressive results especially for Perl. (did you suspect it would
out-perform Python?)
FWIW, I modified my Fortran to use 1 array via a type declaration. I would
have done it initially but thought there would be a significant time penalty;
it turns out there is not (it adds .02 sec to runtime).
So I have replaced my previous source/results with the 1 array version.
> On Tue, 15 Feb 2005 05:52:01 -0500, David Frank wrote:
>
>> OK, finally ...
>> Those are impressive results especially for Perl. (did you suspect it would
>> out-perform Python?)
>
> Yes. Since Perl's regular expression handling is highly optimized and more
> integrated with the language than in Python. I actually thought the
> difference would be even bigger.
>
> /s. axelsson
And just to see how the algorithm chosen affects the languages
differently:
Perl (change main loop):
--------
while (<FILE>) {
    $_ = lc;
    s/'//g;
    s/[^a-z]+/ /g;
    foreach $word (split) {
        $totwords++;
        $words{$word}++;
    }
}
--------
time : 2.755350 s.
Python (change main loop):
--------
trim_nws = re.compile("[^a-z]+")
for line in open("bible12.txt").xreadlines():
    for word in re.sub(trim_nws, " ",
                       line.lower().replace("'", "")).split():
        try: words[word] += 1
        except KeyError: words[word] = 1
--------
time : 3.523639 s.
So, cutting down on the use of regular expressions gains more for Python
than for Perl.
/s. axelsson
| Its not my fault if participants are unable to read the posted challenge's
| statement that 2 arrays are to be produced. In order to enforce this
| challenge, I am extending the outputs as shown in my Fortran
| solution/results at
| http://home.cfl.rr.com/davegemini/wp_bible.f90
| which, IMO makes it clear that 2 arrays are to be produced, and that
| furthermore the results are to written to a file.
.
Once again, you are changing the specs AFTER the fake "challenge".
(Is this a fishing expedition?)
.
| btw,
| This is first time I have ever coded/tested a hash algorithm of my own
| design. I tried a couple found on the net, and didnt get anywhere near the
| minimum collisions my unique algorithm produces. Hopefully someone will
| post a MUCH better hash function that has very few collisions processing
| bible.txt
.
And now, yet another change - hashing.
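Since "collisions" are being quoted as a figure of merit, here is one way they could be counted. This is only a sketch: `toy_hash` is a hypothetical polynomial hash, not the function in wp_bible.f90 (which is not reproduced in this thread), and a collision is taken here to mean a distinct word whose home slot is already claimed by another word.

```python
def toy_hash(word, table_size):
    """Hypothetical polynomial hash, for illustration only."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % table_size
    return h

def count_collisions(words, table_size):
    """Number of distinct words minus number of distinct home slots used."""
    unique = set(words)
    slots = {toy_hash(w, table_size) for w in unique}
    return len(unique) - len(slots)

sample = ["ab", "ba", "ab"]
small = count_collisions(sample, 10)      # tiny table: "ab" and "ba" share a slot
large = count_collisions(sample, 65536)   # 65536-slot table: no sharing for this sample
print(small, large)
```

The same word list gives different collision counts for different table sizes and hash functions, so a collision figure only means something alongside both.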
> Where is LR's C++ solution? (He initially challenged me to write this
> Fortran program)..
You're kidding about that right? I sent you my original code. You never
managed to duplicate what my original C++ does.
LR
> My first solution (still available on the web) DID create an output file with the
> actual words. This must be useful to someone to have after running the
> program - but it is sure hard to figure out WHY someone would need it.
I did almost this program for a real problem not so long ago.
I was working on text searching, and needed some statistics on the
text we were using. I sorted it and printed out the highest frequency
words, not the whole list.
Mine had many more words, as I had about 18GB of text.
I also did a Zipf's law test with the results, which would make it
a little more applicable to a scientific programming language.
OK, add that to the requirements. Fit the data to one form of
Zipf's law and print out the results of the fit.
-- glen
I certainly didn't (don't) know what "Zipf's law" is. I looked on the web and
read a couple of hits - and can honestly say, that I (personally) couldn't write
a program for it in ANY programming language - as I still have no idea "how it
really works".
P.S. I think my reply to the original AWK solution got lost (and/or I posted it
to the wrong place).
I am QUITE happy to say that awk, perl, rexx, python, etc are ALL more likely to
be the "right tool for the right job" in a "text" processing application.
(COBOL, historically, was targeted at a "business data processing" environment.)
Given a detailed spec of an application requirement, I can often/usually find a
COBOL solution (unless it is highly scientific - and even that is now easier
with some of the intrinsic function added in '89 and '02 to COBOL). However, I
certainly do NOT claim that COBOL is always the BEST language to meet a specific
application task. Providing a COBOL solution to a (well specified) "application
specification" to prove it CAN be done (in a "reasonably well performing" - and
portable way) can be fun, but it certainly doesn't prove to me which language is
"better" than others.
--
Bill Klein
wmklein <at> ix.netcom.com
"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:apudnVk2Gcs...@comcast.com...
My 1st solution posted here used hashing, there has been no change.
Otoh, you have proven over and over that when you can translate a challenge,
YOU DO!!
when you can't, YOU BLATHER!!
You don't have to use hashing, IF you have a better/faster way, but YOU
DON'T!!
Others here have stated over and over that the result is what counts <pun>
not the methodology used.
However rest assured they wont berate you for bad-mouthing me about not
using your map algorithm,
Re: some recent runtimes for your map version vs. your version of my hash
method.
I think you are getting your runtimes mixed up. The last map time was 3.2
sec?
and now my algorithm coded in c++ ran on your pc in 1.7 sec ?
I ran your 123.exe and it's a remarkable 583 = 0.58 sec
How can you not be impressed with an algorithm that's 1/2 ? the runtime of
your map version
PLUS the stuff you bad-mouthed my Fortran for doesn't apply to your C++
ability to adapt this algorithm for dynamic allocation (and I see in your
123.cpp you already have to a great degree)
You are in a dilemma, on one hand you have MY algorithm coded that runs in
1/2 the time of your MAP version.
which of these are you going to post as YOUR C++ solution?
I have uploaded a revised http://home.cfl.rr.com/davegemini/wp_bible.f90
that ALSO uses command line file name as per your change, and outputs
counts to screen (can be
re-directed to file) and runs in 2.14 sec
I'm not bad mouthing you. I'm stating a simple fact. And the
requirement wasn't to use the map algorithm, although, you would have to
use something very much like it to match the functionality. OTOH,
std::string is likely to be causing you problems for a while.
>
>
> Re: some recent runtimes for your map version vs. your version of my hash
> method.
>
> I think you are getting your runtimes mixed up. The last map time was 3.2
> sec?
> and now my algorithm coded in c++ ran on your pc in 1.7 sec ?
>
> I ran your 123.exe and its a remarkable 583 = 0.58 sec
> How can you not be impressed with a algorithm thats 1/2 ? the runtime of
> your map version
Who said I wasn't impressed? But speed isn't the be all and end all of
programming. At least not for me. Besides which that speed increase
doesn't come for free. Will the effort to get it pay for itself over
the lifetime of the code? I doubt it. We aren't discussing weather
forecasting here.
Sure, lots of fun for the sake of fun, but I suspect that any
programmer who spent as much time, as I did on this, at work, would be
getting a bad review. (Not to mention that IMO the final C++ code I
wrote that more or less duplicates what you wrote is not very pretty and
somewhat less general.)
> PLUS the stuff you bad-mouthed my Fortran for doesnt apply to your C++
> ability to adapt this algorithm for dynamic allocation (and I see in your
> 123.cpp you already have to a great degree)
Not quite sure what you mean by this.
>
> You are in a dilemma, on one hand you have MY algorithm coded thats runs in
> 1/2 the time of your MAP version.
> which of these are you going to post as YOUR C++ solution?
If I were to post something that off topic, I'd post the original
std::map<std::string,int> thing, because it is so much more readable (by
any halfway decent c++ programmer) and so much more maintainable. There
are certainly times when speed is important. But this probably isn't
one of them. I read somewhere that programs spend around 90%, or was
that 99%, of their life cycle in either maintenance or upgrade. Given
that, in most cases, I'd gladly trade a little speed for readability.
>
> I have uploaded a revised http://home.cfl.rr.com/davegemini/wp_bible.f90
>
> that ALSO uses command line file name as per your change, and outputs
> counts to screen (can be
> re-directed to file) and runs in 2.14 sec
The important thing, is that you _never_ managed to duplicate the
functionality that exists in the original C++ program I sent you. Can
you do it? Or is this a FORTRAN CAN'T?
LR
>
> The important thing, is that you _never_ managed to duplicate the
> functionality that exists in the original C++ program I sent you. Can you
> do it? Or is this a FORTRAN CAN'T?
>
> LR
There you go again, my program processes/outputs what your program
processes/outputs..
If you want to post a link to a MUCH larger text file than bible.txt, be my
guest, but
I already have a solution that handles MUCH larger text files than bible.txt
waiting in the wings.
Sure you do, I declare my strings with fixed length = 30, you don't.
> "LR" <lr...@superlink.net> wrote in message
> news:qZJQd.2440$h06.4...@monger.newsread.com...
>
>
>>The important thing, is that you _never_ managed to duplicate the
>>functionality that exists in the original C++ program I sent you. Can you
>>do it? Or is this a FORTRAN CAN'T?
>>
>>LR
>
>
> There you go again, my program processes/outputs what your program
> processes/outputs..
As far as I have seen you never produced a program that did what the
original C++ I sent you did, including writing out the results in sorted
order. I think that if you attempt to do this, you'd have to include
whatever sorting you need to do in your timings.
> If you want to post a link to a MUCH larger text file than bible.txt, be my
> guest, but
No. I don't need to do that, here's one line of code from
http://home.cfl.rr.com/davegemini/wp_bible.f90
character(30) :: fname, word, ch*1
If the filename I pass on the argument line is longer than 30 characters
your code won't work.
If any word in the text file is longer than 30 characters your code
won't work. Say for example supercalifragilisticexpialidocious.
But ok, make 'word' larger, much larger, so what? You'll still have to
make it a finite size. Make it really long, say 65k, and you'll
probably be out of memory while you're loading. 65k*65k equals what?
Not to mention what happens if you have more entries in your table than 65k.
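To put a number on "65k*65k", here is a back-of-the-envelope Python calculation. It assumes one byte per character and a table of 65536 entries (the `maxw` from the posted template); the 65536-character word length is the hypothetical "really long" case, and 30 is the length in the quoted declaration.

```python
max_words = 65536        # table entries, as in the posted template's maxw
current_len = 30         # character(30) from the quoted declaration
huge_len = 65536         # hypothetical "really long" fixed word length

current_bytes = max_words * current_len
huge_bytes = max_words * huge_len

print("fixed length 30   :", current_bytes, "bytes")
print("fixed length 65536:", huge_bytes, "bytes =", huge_bytes // 2**30, "GiB")
```

Under those assumptions every entry reserves the full fixed length whether the word needs it or not, which is the point being made about fixed-length strings.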
Of course, at some point std::string won't be able to get bigger either.
But at least, I don't have to have every string in my table reserve
the same amount of memory. And there are mechanisms in C++ to make
std::map read/write the disk. Although I understand that's a lot of work.
> I already have a solution that handles MUCH larger text files than bible.txt
> waiting in the wings.
I'm not sure I understand why your basic algorithm needs to be rewritten
because of file size. I do understand why your code needs to be
rewritten to handle a) larger words, b) files with more than 'maxw' words.
And the best part? I'll still have std::map to do other things with if
I need it. I'm still particularly fond of
std::map<std::set<int>,std::set<int> > for some problems I face in the
real world. Your hash is really rather specific. Maintenance costs money.
LR
I have a version that can create a "sorted and unique output" file and that can
handle arbitrary size input files. However, even this program has an
"arbitrary" limit on each text line (a maximum, not that each line needs to be
that size) as well as an arbitrary limit on word size. (COBOL is *very* poor at
handling strings of potentially infinite-minus-one sizes.) It also (currently)
has a limit on the maximum number of words per line - but this could be
fixed more easily.
The COBOL code (does) provide "user friendly" detection and reporting of when
the limits are exceeded, but I would still like to know what the original spec
was.
Ok, I think I understand now. You're saying that std::string is a
FORTRAN CAN'T.
LR
>
>
In the original Zipf's law, if you take a ranked list of almost anything
where the counts are large enough to have a statistical distribution,
populations of cities, states, countries, or word frequencies in
a document, the distribution is proportional to 1/n where n is the
rank.
In an improved Zipf's law the frequency is proportional to n**-a,
for a near 1. Using this, then, one can do a one variable
least squares fit to find a.
-- glen
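For anyone who wants to try the fit described above, here is a minimal Python sketch of a one-variable least-squares fit for the exponent a. It rests on an assumption of mine, not something stated in the post: the constant of proportionality is pinned to the count of the top-ranked word, so that log(f(n)/f(1)) = -a*log(n) is a straight line through the origin and a is the only unknown. The function name `fit_zipf_exponent` and the synthetic data are for illustration only.

```python
import math

def fit_zipf_exponent(counts):
    """Least-squares estimate of a in f(n) = f(1) * n**-a, where n is the rank.

    Needs at least two counts; the rank-1 entry fixes the scale.
    """
    ranked = sorted(counts, reverse=True)
    top = ranked[0]
    sxx = 0.0
    sxy = 0.0
    for rank, freq in enumerate(ranked[1:], start=2):
        x = math.log(rank)
        y = math.log(freq / top)
        sxx += x * x
        sxy += x * y
    # Minimizing sum((y + a*x)**2) over a gives a = -sum(x*y) / sum(x*x)
    return -sxy / sxx

# Synthetic counts that follow the original 1/n law exactly
ideal = [1200.0 / n for n in range(1, 11)]
a_ideal = fit_zipf_exponent(ideal)

# Synthetic counts that follow n**-1.2
steeper = [1000.0 * n ** -1.2 for n in range(1, 11)]
a_steeper = fit_zipf_exponent(steeper)

print(round(a_ideal, 6), round(a_steeper, 6))
```

With real word counts the points will scatter around the line, and fitting the scale as a second unknown (an ordinary straight-line fit in log-log space) is a common alternative.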
--
Bill Klein
wmklein <at> ix.netcom.com
"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:eKCdnW4mcbW...@comcast.com...
Dynamic string declaration/assignment is in the new Fortran standard as I
have informed you several times.
You obviously have a mental block against remembering this info.
character(:),allocatable :: string
string = 'the lazy dog'
write (*,*) len(string) ! = 12
string = 'hello'
write (*,*) len(string) ! = 5
Above currently runs in 2.14 sec
Using a dynamic array for the file, I have reduced my runtime to 1.16 sec,
which is a VERY competitive result (it's faster than the map exe LR sent
me).
I changed the source file name to reflect its ability to process
a file specified on the command line.
Is that the law first devised by a couple of physicists after an evening
drinking too much Zipfer beer in Austria?
--
Tim C.
>>> PLUS the stuff you bad-mouthed my Fortran for doesnt apply to your C++
>>> ability to adapt this algorithm for dynamic allocation (and I see in your
>>> 123.cpp you already have to a great degree)
>>
>> Not quite sure what you mean by this.
>>
>
> Sure you do, I declare my strings with fixed length = 30, you dont.
Clear as mud.
--
Tim C.
This is, of course, your 3rd or 4th iteration of this program, after LR's
quickly knocked out version. I would expect it to get faster, as that seems
to be your aim here.
--
Tim C.
Oh you mean like VARYING strings, that PLI has had since I can remember?
Wow, I'm impressed. Not.
--
Tim C.
It seems the tables have turned. LR is giving you what you always "demanded"
of us and now you're twisting and turning like a twisty, turny thing.
--
Tim C.
Ok, so this is a Fortran Almost, or a Fortran Will Be When Implemented,
or a Fortran Will Someday Have The Features That C++ Has Right Now!
Yet, ALLOCATE, POINTER and TYPE are available right now. Why didn't you
use them to implement analogs for std::string and std::map? Too hard?
Too ugly? Too unmaintainable?
LR
It's one of three aims or claims that I made that LR and I have been
bandying about.
I said Fortran produced more readable, briefer, faster source than C++.
That's when he initiated this word count challenge to me via email.
The facts so far (and you have no way of deciding for yourself as LR hasn't
PUBLICLY posted)
are: My solution IS more readable, briefer, faster.
Of course I can only see what's been published, so I maintain you've had
more goes at it. I'd expect yours to be faster and shorter at least at this
stage.
Readability is debatable and a matter of personal style to a great degree.
So a statement that something *is* more readable can be little more than an
opinion. I often find that the two words "short" and "readable" tend to
work against each other and it's a good programmer that can combine both.
I've seen precious little evidence to suggest you are one.
--
Tim C.
Its more like a toss-up, LR has sent me several versions, each time running
faster than the previous.
> I'd expect yours to be faster and shorter at least at this
> stage.
>
> Readability is debatable and a matter of personal style to a great degree.
> So a statement that something *is* more readable can be little more than an
> opinion. I often find that the two words "short" and "readable" tend to
> work against each other and it's a good programmer that can combine both.
> I've seen precious little evidence to suggest you are one.
>
And we have seen NIL evidence to suggest you have ANY skill level at being a
PL/I programmer.
Let's see a post bemoaning the lack of a PL/I solution, since other readers
have submitted results (some inadequate, with no output of counts) for AWK,
Perl, Python, but NO PL/I.
I may challenge Peter Elderon to have a go, or perhaps some other intelligent
Pathetic Loser Ibm'er, long gone from this newsgroup, will drop by and dash
off a PL/I solution, but somehow I doubt it.
Where is a REXX solution? Isn't this the kind of processing that's supposed to
be right up its alley?
I forgot to mention Klein's Cobol solution, altho I'm not sure it produces an
output of counts either.
> "Tim Challenger" <tim.cha...@aon.at> wrote in message
> news:1108643309.64e595bae371cb9e256d0ced219ec2a5@teranews...
>> On Thu, 17 Feb 2005 07:04:30 -0500, David Frank wrote:
>>
>>> My solution IS more readable, briefer, faster.
>>
>> Of course I can only see what's been published, so I maintain you've had
>> more goes at it.
>
> Its more like a toss-up, LR has sent me several versions, each time running
> faster than the previous.
Maybe he has, I don't know that, and I can't act upon that unknown "fact".
But as that is an argument about C++ and Fucktran on a PLI group you're
hardly keeping us all on tenterhooks.
>> I'd expect yours to be faster and shorter at least at this
>> stage.
>>
>> Readability is debatable and a matter of personal style to a great degree.
>> So a statement that something *is* more readable can be little more than an
>> opinion. I often find that the two words "short" and "readable" tend to
>> work against each other and it's a good programmer that can combine both.
>> I've seen precious little evidence to suggest you are one.
>
> And we have seen NIL evidence to suggest you have ANY skill level at being a
> PL/I programmer.
Fair enough, at least we have similar opinions of each other's ability.
I've posted solutions to your so-called challenges occasionally. I never
claimed to be any good.
> Lets see a post be-moaning the lack of a PL/I solution, since other readers
> have submitted
> results (some inadequate with no output of counts) for AWK, Perl, Python,
> but NO PL/I.
I might feel inclined to do so if it weren't for the fact that I know you'd
quibble about the number of leading blanks in the output lines or that once
done you'd then start off on a track of changing what you want every 10
minutes as usual by defining how many variables are used and their names,
not - as you said yourself in another recent post - that it's the output
and functionality that counts of course. I have enough problems at work
trying to deal with tossers who keep changing their minds about what they
want without adding another to my list.
> I may challenge Peter Elderon to have a go, or perhaps some other long-gone
> from this newsgroup,
> intelligent Pathetic Loser Ibm'er will drop by and dash off a PL/I
> solution, but somehow I doubt it.
Of course not, for the very reasons I gave above.
> Where is a REXX solution, isnt this the kind of processing thats supposed to
> be right up its alley?
Who cares? This is a PLI group. Or it should be.
Post it on a REXX group, make yourself universally popular.
--
Tim C.
If you want to propose challenges, propose something that is: a) genuinely
useful, and b) not already well known.
If you aren't up to that then please leave us in peace.
> please leave us in peace.
No-one is forcing you to read one of my very informative topics revealing
SHOCKING info about the inadequacies of PL/I and those here who claim it's
more powerful than Fortran.
btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
latest code in 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
Attn: Vowels (before you blather about this or another message in this topic),
he also ran it under SEVERAL commercial compilers with no code modifications.
Only 2 free compilers currently being developed, G95 and gfortran, had
problems due to lack of read binary. As you know (and ignore), read binary
(stream) data is now in the standard.
When might your translation get posted, or is it running embarrassingly
looooong?
Not going to post a solution?
Then will you at least admit you have downloaded bible.txt and had a try at
processing it?
> ... my very informative topics ...
???
--
Tim C.
> btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
> latest code in
> 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
Good for him.
--
Tim C.
I added a quicksort to my words output and amazingly my runtime decreased to
0.91 sec
see: http://home.cfl.rr.com/davegemini/wc_file.f90
Good for me.
---
The requirements are to time the execution of reading the bible.txt file and
producing a sorted list of unique words and their counts. ALL non-alpha
chars are to be treated as blanks, except a quote within a word, e.g.
Wife's, in which case it's deleted and the word becomes wifes.
Upper-case is lower-cased.
Document your results by posting the following info extracted from your
output file.
8177 a
319 aaron
..........
5 zurishaddai
1 zuzims
bible.txt
total words = 789781
unique words = 12691
xx.xx Sec ?.?? Ghz CPU ID
+ any further distinguishing info , e.g.
language/compiler/version/programmer's name
---
DF
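For readers who want to check their reading of these rules before running the full file, here is a minimal sketch in Ruby. It works on a short in-memory sample rather than bible.txt, and count_words is a hypothetical helper name, not taken from any solution posted in this thread. It assumes "quote within a word" means an apostrophe with a letter on both sides.

```ruby
# Sketch of the stated rules on an in-memory sample (not bible.txt).
# count_words is a hypothetical helper, not from any posted solution.
def count_words(text)
  counts = Hash.new(0)
  cleaned = text.gsub(/(?<=[A-Za-z])'(?=[A-Za-z])/, '') # wife's -> wifes
                .gsub(/[^A-Za-z]/, ' ')                  # non-alpha -> blank
                .downcase                                # upper -> lower
  cleaned.split.each { |word| counts[word] += 1 }
  counts
end

sample = "In the beginning, God created... 1:1 Adam's wife; the WIFE's name."
counts = count_words(sample)

# Sorted list in the "count word" layout requested above.
counts.sort.each { |word, n| puts format('%8d %s', n, word) }
puts "total words = #{counts.values.sum}"
puts "unique words = #{counts.size}"
```

On this sample the sketch counts 10 words in total, 9 of them unique, with "the" appearing twice.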
start = datetime();
collisions, lines = 0;
do forever;
   get edit (text) (a(L));
   lines = lines + 1;
   if length(text) > 0 then call look(1);
end;
finish = datetime();
put skip list ('total words =', sum(counts));
put skip list ('unique words =', sum(counts>0));
put skip list ('time taken=', secs(finish) - secs(start), ' secs');
put skip list ('collisions=', collisions, ' lines=', lines);
look: procedure (pos) recursive;
   dcl pos fixed binary;
   dcl (start, ending) fixed binary, word character (24) varying;
   start = search(text, alphabet, pos);     /* first letter at or after pos */
   if start = 0 then return;
   ending = verify (text, alphabet, start); /* first non-letter after the word */
   if ending > 0 then
      if substr(text,ending,1) = '''' then  /* apostrophe: scan on past it */
         ending = verify (text, alphabet, ending+1);
   if ending = 0 then                       /* word runs to the end of the line */
      ending = length(text)+1;
   if ending < length(text) then call look(ending);
   word = substr(text,start,ending-start);
   k = index(word, '''');
   if k > 0 then substr(word,k) = substr(word,k+1);
   call insert_word (word);
end;
"The C++ I mailed to you was very different from the challenge you've posted
here."
"Fuckwit Frank" <dave_...@hotmail.com> wrote in message
news:421494d0$0$39337$ec3e...@news.usenetmonster.com...
I just asked Fortran'ers their opinion..
"David Frank" <dave_...@hotmail.com> wrote in message
news:421855dd$0$38884$ec3e...@news.usenetmonster.com...
>
I just re-confirmed that doing a quicksort of the output word list reduces
my runtime from 1.20 sec to 0.88 sec. Can anyone look at the code below and
give a rational answer?
The quicksort is fast at 0.016 sec, but that's a positive runtime, not a
twilight zone negative runtime..
http://home.cfl.rr.com/davegemini/wc_file.f90
Those are your bugs? ha ha..
> And, as well, the timed section of your code excludes the section to
> count the number of unique words.
Look again, the benchmark times ALL the runtime.
http://home.cfl.rr.com/davegemini/wc_file.f90
Your objective of course in posting the above is so you can say in the future:
"you have been shown a PL/I solution"
It's pathetic that your fellow pli'ers tolerate or, even more pathetic, can't
recognize your usual blathering non-solution.
.
Solutions to forming lists of words, hashing, and quicksort
are topics covered in a first year computer science course.
There's nothing new in any of the above, and PL/I solutions
are to be found in introductory texts.
.
The above demonstrates PL/I code using search and verify.
As a language aimed at data transformation of multi-GBs, it's not exactly
optimized for high speed with the specified task. As can be seen the times
are for the complete task, and include both elapsed and CPU times (very
important on z/OS). The code runs on Windows, OS/2 and z/OS (using an EBCDIC
source and with the *scan statement changed to use the right file name).
The source is the bit between <Source> and </Source>:
<Source>
*call time ('R') -- reset CPU timer
*start = time('S')
*words = 0
*total = 0
*uniq = 0
*scan dataset ("D:\TempJunk\WordCount\kjv12.txt") into (line)
* -- As per stupid requirement concerning case conversion and digit elimination.
* -- Punctuation not specified, so guess.
* line = translate (line, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz0123456789().,;:!?-/"')
* do while (line <> "")
* word = compress(first(line), "'", "") -- stupid requirement: wife's -> wifes
* line = rest(line)
* words[word] = words[word] + 1
* end
*end scan
*
*traverse reference (words) into (count, word)
&word; &count:l;
* total = total + count
* uniq = uniq + 1
*end
Total words: &total:l;
Unique words: &uniq:l;
Elapsed: &(time('S') - start):ld3; secs on 1.5GHz CPU
CPU: &time('E'):ld3; secs on 1.5GHz CPU
</Source>
The output is the bit between <Output> and </Output>:
<Output>
A 8177
AARON 319
AARONITES 2
AARONS 31
ABADDON 1
ABAGTHA 1
... snipped ...
ZUR 5
ZURIEL 1
ZURISHADDAI 5
ZUZIMS 1
Total words: 789781
Unique words: 12691
Elapsed: ?.??? secs on 1.5GHz CPU
CPU: ?.??? secs on 1.5GHz CPU
</Output>
To determine the times, you'll have to run the program yourself (meaning
you'll need to install a PL/I program that you'll have to obtain legally!).
I'll admit it as being a little slower than Ian's REXX solution.
If the task at hand is for first year CS students, what does that imply
about the "dynamic duo's" skills, when neither of you can write a PL/I program
that meets the challenge?
Not being standard also has the advantage of allowing function source to be
customized to the application, to wit:
program test
  character,allocatable :: word(:)
  allocate (word(999))
  open (1,file='test.f90',form='binary')
  do while (getword(1,word))
    write (*,*) word,size(word)
    read(*,*)
  end do
  stop
contains
! ---------------------
  logical function getword(iunit,word)
    < my code for this function is about 10 statements>
  end function
end program test
1st 5 words in execution of this program:
program 7
test 4
character 9
allocatable 11
word 4
> Fortran Will Someday Have The Features That C++ Has Right Now!
>
Below is more F2003 syntax in response to my inquiry yesterday in c.l.f.
real,allocatable :: x(:)
real :: y(3) = [4,5,6]
real :: z(4) = [7,8,9,0]
x = [1,2,3] ! an allocatable cannot be initialized in its declaration; F2003 allocates it on assignment
can c++ replace element plus extend an array with 1 statement a-la below?
x = [0.0,x(2:),y,z] ! replace x(1) with 0 and extend x with y,z
write (*,*) x ! = 0. 2. 3. 4. 5. 6. 7. 8. 9. 0.
write (*,*) size(x) ! = 10
Quit lying, you would dearly love to be able to translate it line for line
but failed.
The dynamic duo's putrid code posted elsewhere shows both of you DEFINITELY
bothered yourself in failed attempts..
> | btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs
> my
> | latest code in
> | 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
> .
> so what?
>
I used Fortran to come up with the fastest version of this program (tying
with a C version)
over MANY solutions in other languages 0.265 sec vs. LR's 2.9 sec
for this version.
> | he also ran it under SEVERAL commercial compilers with no code
> | modifications.
> .
> So what? It contains at least two bugs just lurking, ready to surface.
> We told you about this a while back in re another program,
> but you haven't learnt a thing.
> .
> You're relying on
> 1. ASCII
> 2. 7-bit codes. Using another text file that uses codes like á or o
> might give unexpected results.
>
Latest version http://home.cfl.rr.com/davegemini/wc_file.f90 processes
8bit codes
with no loss of execution time.
What possible evidence do you have that ANYONE other than you cares (in the
SLIGHTEST) about "line-for-line" translations from one programming language to
another?
I have seen ZERO evidence in this - or any other forum - that this is something
important (or even interesting) to other programmers.
--
Bill Klein
wmklein <at> ix.netcom.com
The word-processing topic you opened in comp.lang.fortran has drawn over
200 messages, which makes your statement above R I D I C U L O U S !!
correction: your topic was opened in comp.lang.cobol..
The whole thing that you seem (repeatedly) to miss is that different programming
languages can (and do) use different techniques for solving the same "problem".
Using "one line of source" code to do the same "thing" is unimportant - while
finding "native" (and easily readable to programmers of THAT language)
techniques and algorithms is useful.
What is the "one line" statement in Fortran to do the COBOL "sort" statement? or
report-writer "generate" statement? The fact that Fortran has no "single line"
translations of those verbs doesn't mean that "native Fortran" can't accomplish
the TASK that these statements implement.
--
Bill Klein
wmklein <at> ix.netcom.com
"David Frank" <dave_...@hotmail.com> wrote in message
news:42235bf6$0$39266$ec3e...@news.usenetmonster.com...
| "robin" <rob...@bigpond.mapson.com> wrote in message news:AXGRd.167358$K7.1...@news-server.bigpond.net.au...
| > From: "David Frank" <dave_...@hotmail.com>, Usenet Monster - http://www.usenetmonster.com
| > Date: Fri, 18 Feb 2005 04:39:15 -0500
.
| > Your "topics" are neither informative, nor interesting.
| > Your Fortran word processor code can be translated to PL/I
| > virtually line-for-line. But why would we bother,
| > when your Fortran code contains at least 2 bugs.
.
| Quit lying,
.
I'm not. Your code contains at least 2 bugs.
.
| you would dearly love to be able to translate it line for line
| but failed.
.
Don't talk rot. You still don't understand a line of PL/I.
.
Your code has a line-for-line equivalent in PL/I.
.
| The dynamic duo's putrid code posted elsewhere
.
For someone who knows nil of PL/I, you seem to
think you can judge code quality?
.
| shows both of you DEFINITELY bothered yourself in failed attempts..
.
Again, you talk nonsense.
.
| > | btw, a tester in comp.lang.fortran reports his 3.2 Ghz Pentium4 runs my
| > | latest code in
| > | 0.38 sec using the Fortran world's top compiler, Intel ifort 8.0
| >
| > so what?
|
| I used Fortran to come up with the fastest version of this program (tying
| with a C version)
| over MANY solutions in other languages
.
Only after you saw their code.
.
| > | he also ran it under SEVERAL commercial compilers with no code
| > | modifications.
| >
| > So what? It contains at least two bugs just lurking, ready to surface.
| > We told you about this a while back in re another program,
| > but you haven't learnt a thing.
| >
| > You're relying on
| > 1. ASCII
| > 2. 7-bit codes. Using another text file that uses codes like á or o
| > might give unexpected results.
|
| Latest version http://home.cfl.rr.com/davegemini/wc_file.f90 processes
| 8bit codes
| with no loss of execution time.
.
But it still relies on ASCII, doesn't it.
"David Frank" <dave_...@hotmail.com> wrote in message
news:420f4915$0$38863$ec3e...@news.usenetmonster.com...
> FYI I just posted below in comp.lang.fortran topic "Word-processing
> challenge anyone? "
> and I expect to get at least 1 solution in reply.
>
> OTOH, despite the claims that PL/I has superior string-handling thats
> needed in a word-processing application,
> there wont be any solutions posted here in comp.lang.pl1
<snip>
My code not only keeps track of the unique words, it counts how many
times each appears in the text. Then, after the timing result is
produced, the code outputs information about the most frequent word(s)
(which happens to be "the") and the least frequent words (4004 appear
only once, from "abaddon" to "zuzims"). But again, that output does
not affect the timing.
Also, since I wanted to be able to access the data from the web
directly, there's a little bit of code to allow it to skip the non-
included material at the top and bottom of the file.
Eric
====
Are you interested in on-site Ruby training that uses well-designed,
real-world, hands-on exercises? http://LearnRuby.com
========
# Reads a file containing the text of the Bible and, after processing
# the data slightly, prints out how many total words and how many
# unique words it contained. See http://tinyurl.com/354hry for the
# full problem description.
# This solution is offered by LearnRuby.com (http://learnruby.com).
# If there is a file named "kjv12.txt" in the current directory, the
# data will be read from that file. Otherwise, the data will be read
# from the URI "http://patriot.net/users/bmcgin/kjv12.txt".
start_time = Time.now
Bible_Filename = "kjv12.txt"
Bible_URI = "http://patriot.net/users/bmcgin/kjv12.txt"
input = begin
  File.open Bible_Filename
rescue
  require 'open-uri'
  puts "NOTE: time taken is invalid since it includes web access\n\n"
  # URI.open is the open-uri entry point on current Rubies (2.5 and later)
  URI.open Bible_URI
end
state = :skip_top
word_count = 0
words_seen = Hash.new(0)
input.each_line do |line|
  state = :process if
    state == :skip_top && line =~ /Book\s+01\s+Genesis/
  next unless state == :process
  state = :skip_bottom if line =~ /022:021.*Amen\./
  # remove apostrophe between letters
  mod_line = line.gsub(/([[:alpha:]])'([[:alpha:]])/, '\1\2')
  # convert sequences of non-letters to single spaces, remove white
  # space at either end, and convert letters to lower case
  # (non-bang methods: the bang forms return nil when nothing changes,
  # which would break the chain)
  mod_line = mod_line.gsub(/[^[:alpha:]]+/, ' ').strip.downcase
  words = mod_line.split
  word_count += words.size
  words.each { |word| words_seen[word] += 1 }
end
input.close
puts "Number of words: %d" % word_count
puts "Number of unique words: %d" % words_seen.size
end_time = Time.now
puts "Time taken to compute: %f seconds" % (end_time - start_time)
#
# Extra information, just for the fun of it...
#
# figure out the counts for the most and least frequent words
word_counts = words_seen.values
top_word_count = word_counts.max
bottom_word_count = word_counts.min
# put together a list of the most frequent word(s) and the least
# frequent word(s)
top_words = words_seen.select { |word, count|
  count == top_word_count
}.map { |e| e[0] }
bottom_words = words_seen.select { |word, count|
  count == bottom_word_count
}.map { |e| e[0] }
# output information about the most and least frequent words
puts("\nThe following %d most frequent word(s) each appeared %d time(s):" %
     [top_words.size, top_word_count])
puts top_words.sort.join("\n").gsub(/^/, ' ')
puts("\nThe following %d least frequent word(s) each appeared %d time(s):" %
     [bottom_words.size, bottom_word_count])
puts bottom_words.sort.join("\n").gsub(/^/, ' ')
====
This is a translation to PL/I of DF's Fortran code.
Like that code, it's ASCII specific.
(nofofl, nosize):
word_counts: proc options (main);
dcl (hashbits value(17), maxw value ((2**hashbits)), wlen value (30)) fixed binary (31);
dcl line char (80) var, sword(wlen) char (1), ch char (1);
dcl (ich, i, k, n, nchars, (nc, collisions, total, unique) init (0), odd, even) fixed bin (31);
dcl 1 wc(maxw) static,
2 word char(wlen),
2 count fixed binary (31);
dcl letters char (52) init ('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ');
wc.word = ' ' ; wc.count = 0;
on endfile (sysin) go to compact;
more:
nc = 0;
get edit (line) (L);
line = translate(line, 'abcdefghijklmnopqrstuvwxyz',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
ml:
do forever; nc = nc+1 ;
if nc > length(line) then go to more;
nc = search(line, letters, nc); /* skip until alpha start */
if nc = 0 then go to more; /* no alpha chars at end of line. */
ch = substr(line,nc,1);
ich = unspec(ch);
k = 1 ; sword(1) = ch; /* found start new word */
odd = ich ; even = 1; /* init. hash with 1st char */
il:
do forever;
nc = nc+1;
if nc > length(line) then leave il;
ch = substr(line,nc,1);
if ch = '''' then iterate il; /* delete ' from word */
if ch < 'A' | ch > 'z' then leave il; /* end word */
if ch > 'Z' & ch < 'a' then leave il;
ich = unspec (ch);
k = k+1 ; sword(k) = ch;
if (iand(k,1) = 0) then
even = ieor(isll(even,5),ich); /* accum even hash */
else
odd = ieor(isll(odd, 5),ich); /* accum odd hash */
end;
n = ieor(isrl(odd*even,hashbits+2),odd*even); /* hash product pieces */
n = iand(n,maxw-1); /* positive index */
do forever;
n = n+1 ; if (n > maxw-1) then n = 1; /* reset index */
if (wc(n).count = 0) then
do; wc(n).word = substr(string(sword),1,k) ; wc(n).count = 1 ; iterate ml; end;
/* initial entry */
else if substr(string(sword),1,k) = wc(n).word then
do; wc(n).count = wc(n).count+1 ; iterate ml; end; /* count occurrences */
else
collisions = collisions+1;
end;
end;
compact:
n = 0;
do i = 1 to maxw; /* make entries contiguous from wc(1: */
if (wc(i).count = 0) then iterate;
n = n+1 ; wc(n) = wc(i);
total = total + wc(i).count ; unique = unique+1;
end;
call qsort(0,n-1); /* quicksort wc(1:n) entries */
put skip edit ( (wc(i).count, trim(wc(i).word) do i=1 to n)) (f(5), x(1), a, skip);
put skip list ( 'total words =', total );
put skip list ( 'unique words =', unique);
put skip data (collisions);
qsort: proc (l,r) recursive;
dcl 1 tempwc,
2 word char(wlen),
2 count fixed binary (31);
dcl sword char(wlen);
dcl ( l, r, i,j ) fixed binary (31);
i = l ; j = r ; sword = wc((l+r+2)/2).word;
do while (i <= j);
do while (wc(i+1).word < sword & i < r);
i = i+1;
end;
do while (sword < wc(j+1).word & j > l);
j = j-1;
end;
if (i <= j) then
do;
tempwc = wc(i+1);
wc(i+1) = wc(j+1);
wc(j+1) = tempwc; /* swap words,counts */
i = i+1;
j = j-1;
end;
end;
if (l < j) then call qsort(l, j);
if (i < r) then call qsort(i, r);
end qsort;
end word_counts;
/*
.......
3 zuph
5 zur
1 zuriel
5 zurishaddai
1 zuzims
total words = 789781
unique words = 12691
COLLISIONS= 4318;
*/
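The table handling in the translation above is open addressing with linear probing: hash the word to a slot, then step to the next slot until an empty one or the same word is found, counting each occupied-by-another-word step as a collision. A minimal sketch of that scheme in Ruby follows. ProbeTable and its shift-and-xor hash are illustrative stand-ins chosen for brevity; the odd/even hash of the posted code is not reproduced, so the collision figure will not match the 4318 reported above.

```ruby
# Open addressing with linear probing and a collision counter.
# ProbeTable and slot_for are hypothetical names; the hash is a simple
# stand-in, NOT the odd/even hash used in the posted code.
class ProbeTable
  attr_reader :collisions

  def initialize(bits)
    @size = 1 << bits               # power of two, so a mask yields a slot
    @words = Array.new(@size)
    @counts = Array.new(@size, 0)
    @collisions = 0
  end

  def add(word)
    n = slot_for(word)
    @size.times do
      if @counts[n].zero?           # empty slot: first occurrence
        @words[n] = word
        @counts[n] = 1
        return
      elsif @words[n] == word       # same word: bump its count
        @counts[n] += 1
        return
      end
      @collisions += 1              # slot holds another word: probe onward
      n = (n + 1) & (@size - 1)
    end
    raise 'table full'
  end

  # Occupied slots as [word, count] pairs, sorted by word.
  def entries
    @words.zip(@counts).reject { |_, c| c.zero? }.sort
  end

  private

  def slot_for(word)
    h = 0
    word.each_byte { |b| h = ((h << 5) ^ b) & 0x7fffffff }
    h & (@size - 1)
  end
end

table = ProbeTable.new(4)
%w[the lord is my shepherd the lord].each { |w| table.add(w) }
table.entries.each { |word, n| puts format('%5d %s', n, word) }
puts "collisions = #{table.collisions}"
```

The final compaction-and-sort step of the posted code corresponds to entries here, which drops empty slots before sorting.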
<snip source>
Congratulations,
very interesting translation of my
http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
that NO-ONE expected to see after almost 3yrs from the original challenge.
> .......
> 3 zuph
> 5 zur
> 1 zuriel
> 5 zurishaddai
> 1 zuzims
> total words = 789781
> unique words = 12691
> COLLISIONS= 4318;
> */
I note you have removed any benchmark timing; surely the code will beat
Eric's RUBY version = 2.6 sec ??
Prove that PL/I "once upon a time" supported EASY distribution of a windows
exe program by making your exe available to run on our PCs, even tho I sense
that is no longer supported with the "web-sphere PL/I system".
OTOH, I can EASILY make my windows exe (1 file) available on request for
anyone's use, and note that it will process ANY text file.
Perhaps someone will confirm your source is valid for their compiler..
| <snip source>
| Congratulations,
| very interesting translation of my
| http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
| that NO-ONE expected to see after almost 3yrs from the original challenge.
|
| > .......
| > 3 zuph
| > 5 zur
| > 1 zuriel
| > 5 zurishaddai
| > 1 zuzims
| > total words = 789781
| > unique words = 12691
| > COLLISIONS= 4318;
| > */
|
| I note you have removed any benchmark timing, surely the code will beat
| Eric's RUBY version = 2.6 sec ??
Benchmark timings from different CPUs are meaningless unless
all benchmarks are done on the same CPU, same operating
systems, same harddrive(s), etc. Say I ran some
code on my computer, and it ran in 2.2 seconds. So what?
I'd have to run the Fortran code (with a particular Fortran
compiler) also.
I also wish you would write the code to not be ASCII dependent.
______________________________________________________Gerard S.
I didn't remove anything. I just didn't implement it,
as it's irrelevant.
> surely the code will beat
> Eric's RUBY version = 2.6 sec ??
As I posted previously, time is irrelevant on different PCs.
> Prove that PL/I "once upon a time" supported EASY distribution of a windows
> exe program by
> making your exe available to run on our PCs, even tho I sense that no
> longer is supported with the
> "web-sphere PL/I system"
It's a function of the linker, not the compiler.
> OTOH, I can EASILY make my windows exe (1 file) available on request for
> anyone's use, and note that it will process
> ANY text file
No, your code WON'T process any text file.
It will ONLY handle ASCII.
> Perhaps someone will confirm your source is valid for their compiler..
Perhaps someone will confirm that YOUR source is valid
for their compiler.
But YOUR source code is non-portable, so it's not guaranteed to
compile on every Fortran compiler.
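The "7-bit codes" half of the ASCII objection raised in this thread is easy to see in a small sketch. The Ruby below compares a letter test built on the ASCII ranges A-Z and a-z with one built on a character class, using a sample containing accented letters. ascii_words and alpha_words are hypothetical helper names; this illustrates only the accented-letter case, not EBCDIC, and it is not a test of the Fortran or PL/I programs themselves.

```ruby
# Two ways of deciding "is this a letter?", applied to accented text.
# ascii_words and alpha_words are hypothetical helper names.
def ascii_words(text)
  text.gsub(/[^A-Za-z]/, ' ').downcase.split    # ASCII ranges only
end

def alpha_words(text)
  text.gsub(/[^[:alpha:]]/, ' ').downcase.split # any letter in the string's encoding
end

sample = "caf\u00e9 na\u00efve" # "café naïve", written with escapes
p ascii_words(sample) # accented letters are treated as separators
p alpha_words(sample) # accented letters stay inside the words
```

With the range test the two words come out as three fragments (caf, na, ve); with the character class they survive intact.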