On Friday, April 27, 2012 6:15:49 AM UTC-6, Alex Wegel wrote:
> For those still there - here's a final(?) version of my approach to the
> alien program.
> Now it's down to 150 words of source - i think that's how i leave it.
>
> #! /usr/local/bin/gforth-fast
> \ : key pad dup 1 stdin read-file throw drop c@ ; ( fast vsn. of KEY)
> : <n> 0 begin key [char] 0 - dup 10 u< while swap 10 * + repeat drop ;
> 0 value L
> : <dims> <n> to L <n> <n> ;
>
> : ord [char] a - ;
> : az? ord dup 26 u< ;
> : az* begin key az? while c, repeat drop ;
> : <dic> 0 do az* loop align ;
>
> : >msk [ -1 dup 1 rshift xor ] literal swap rshift invert ;
> : az*) -1 begin key az? while >msk and repeat drop ;
> : -pat -1 cells L * allot ;
> :
>
> begin key
> az? if
> >msk
> else
> [char] ( ord = if az*) else exit then
> then ,
> again ;
>
> : m?
> here dup rot cells - do
> i @ over c@ lshift
> 0< if unloop drop false exit then
> char+
> cell +loop
> drop true ;
>
> : #m 0 -rot L * bounds do i L m? - L +loop ;
> : .m ." Case #" 1 .r ." : " 1 .r cr ;
> : chk 0 do 2dup
> #m -pat i 1+ .m loop 2drop ;
>
> here <dims> over <dic> chk bye
Here is a new version of my program, including these upgrades:
1.) I implemented a SSEQ data type that is similar to my SEQ data type except that it allows for big (>255 char) strings.
2.) I replaced SEQ with SSEQ throughout the ALIEN program as needed so that the program will now handle the large sample file.
3.) I switched to bit-masks similar to Alex's program.
4.) I fixed a stack-picture comment that was incorrect.
This is not plain Forth --- I've never written a plain Forth program in my life. I find plain Forth to be largely unreadable (although I can *decipher* it given enough time). If I were offered a job maintaining plain Forth software, I would refuse and just stick with cab driving --- unless it paid a *lot* of money, but that is unrealistic because the employer would just have the program rewritten from scratch in a readable language for less money.
My program seems rather slow on the large sample file. I think this is due to my heavy use of the heap, which is quite slow on all Forth systems (I don't know why). I also compile functions at run-time, which may be slow depending upon which compiler is being used and how much optimization is being done.
This is SSEQ which will go into the novice package:
list
w field .cnt
w field .chars \ pointer to heap
constant sseq
: sseqable ( adr cnt -- new-adr cnt ) \ put string in heap for SSEQ so <KILL-SSEQ> will work on the SSEQ
>r
r@ alloc \ -- adr new-adr \r: -- cnt
tuck r@ cmove> \ -- new-adr
r> ;
: init-sseq ( adr cnt node -- node )
init-list >r
sseqable r@ .cnt ! r@ .chars !
r> ;
: new-sseq ( adr cnt -- node )
sseq alloc
init-sseq ;
: <kill-sseq> ( node -- )
dup .chars @ dealloc
dealloc ;
: kill-sseq ( head -- )
each[ <kill-sseq> ]each ;
macro: <sseq> ( node -- adr cnt )
dup .chars @ swap .cnt @ ;
: <show-sseq> ( node -- )
<sseq> type cr ;
: show-sseq ( head -- )
cr
['] <show-sseq> each ;
1000 value sseq-size \ making this too big could be a problem if such a large block can't be found in the heap
: <read-sseq> ( adr cnt -- head )
r/o <open-file> >r \ r: -- file-id
nil begin \ -- head
sseq-size 2+ alloc \ -- head chars
dup sseq-size r@ read-line abort" *** READ-SSEQ failed to read line ***"
while \ -- head chars cnt
dup >r realloc r> \ -- head new-chars cnt
new-sseq \ -- head node
link repeat 2drop
r> <close-file> ;
: read-sseq ( name -- head )
count <read-sseq> ;
: <write-sseq> ( head adr cnt -- )
w/o <create-file> swap \ -- file-id head
each[ <sseq> rover write-line abort" *** <WRITE-SSEQ> failed to write ***" ]each
<close-file> ;
: write-sseq ( head name -- )
count <write-sseq> ;
This is the new ALIEN program:
\ This is the solution to the "alien language" example problem from Google CodeJam.
\
http://code.google.com/codejam/contest/90101/dashboard#s=p0
\ Written by Hugh Aguilar --- copyright (c) 2012 --- BSD license
\ requires novice.4th and list.4th
marker alien.4th
\ ******
\ ****** input and output
\ ******
variable #letters
variable #words
variable #patterns
: sseqs ( name-str -- word-sseq pattern-sseq )
read-sseq
dup <sseq> evaluate
#patterns ! #words ! #letters !
.fore @ \ -- word-seq
dup #words @ nth \ -- word-seq pattern-seq
delink
dup #patterns @ nth \ -- word-seq pattern-seq extraneous-seq
delink kill-sseq \ -- word-seq pattern-seq
\ error checking
over length #words @ <> abort" *** bad word-sseq ***"
dup length #patterns @ <> abort" *** bad pattern-sseq ***"
over each[ .cnt @ #letters @ <> abort" *** string in word-sseq is wrong length ***" ]each ;
: dump-result ( pattern-list name-str -- )
<cstr +cstr c" .result" +cstr cstr> write-sseq ;
\ ******
\ ****** convert pattern string into any-seq list
\ ****** each node in the list represents one char in the target string
\ ****** the .LINE string of each node contains all of the character that would match
\ ******
: check-any-str { pattern-adr pattern-cnt head -- }
head length #letters @ <> if
cr ." *** any-seq is the wrong length ***"
cr pattern-adr pattern-cnt type
cr true abort" *** aborting ***
then ;
: <make-any-seq> ( head str -- head )
dup c@ if new-seq link \ str has characters in it
else drop then ;
: make-any-seq { pattern-adr pattern-cnt | group? -- any-seq }
nil <cstr
pattern-adr pattern-cnt bounds ?do \ -- head
group? if \ if inside of ( ) group
I c@ [char] ) = if false to group? cstr> <make-any-seq> <cstr
else I c@ char+cstr then
else \ else outside of ( ) group
I c@ [char] ( = if true to group?
else I c@ char+cstr cstr> <make-any-seq> <cstr then
then
loop
cstr> <make-any-seq>
pattern-adr pattern-cnt rover check-any-str ;
\ ******
\ ****** generate pattern matcher
\ ******
char & comment \ this is an example of what will get generated
:noname ( mask-adr -- match? ) \ this is for: a(bc)
dup @ 1 and 0= if drop false exit then
w + \ -- new-mask-adr
dup @ 6 and 0= if drop false exit then
w + \ -- new-mask-adr
drop true ;
&
: char>mask ( char -- mask ) \ this assumes that there are < 32 chars, and they start with 'a'
[char] a - \ -- ordinal
1 swap lshift ;
: <generate-pattern> ( node -- )
>r
s" dup @ " evaluate
0 r> .line @ count bounds do \ -- mask
I c@ char>mask or loop lit, \ --
s" and 0= if drop false exit then w + " evaluate ;
: generate-pattern ( any-seq -- xt )
>r
s" :noname ( char-adr -- match? ) " evaluate
r> ['] <generate-pattern> each
s" drop true ; " evaluate ;
\ ******
\ ****** upgrade word-seq
\ ******
list
w field .mask \ pointer to array of words
constant mask
: <kill-mask> ( node -- )
dup .mask @ dealloc
dealloc ;
: kill-mask ( head -- )
each[ <kill-mask> ]each ;
: init-mask ( word-adr word-cnt node -- node )
init-list >r
dup w * alloc \ -- word-adr word-cnt mask-array
dup r@ .mask !
-rot bounds do \ -- mask-element
I c@ char>mask over !
w + loop drop \ --
r> ;
: new-mask ( word-adr word-cnt -- )
mask alloc
init-mask ;
: upgrade-word ( word-sseq -- mask-list ) \ create a MASK list given a SSEQ list
nil swap \ -- mask-list word-sseq
each[ <sseq> new-mask link ]each ;
\ ******
\ ****** upgrade pattern-seq
\ ******
sseq \ starts out as pattern-str, later gets changed to result-str
w field .any \ pointer to any-seq
w field .xt \ xt of generated pattern matcher
w field .matches \ count of matches for this pattern
constant pattern
: <kill-pattern> ( node -- )
dup .any @ kill-seq
<kill-sseq> ;
: kill-pattern ( head -- )
each[ <kill-pattern> ]each ;
: init-pattern ( pattern-adr pattern-cnt node -- node )
init-sseq >r
r@ .chars @ r@ .cnt @ make-any-seq r@ .any !
r@ .any @ generate-pattern r@ .xt !
0 r@ .matches !
r> ;
: new-pattern ( adr cnt -- node )
pattern alloc
init-pattern ;
: upgrade-pattern ( pattern-sseq -- pattern-list ) \ create a PATTERN list given a SSEQ list
nil swap \ -- pattern-list pattern-sseq
each[ <sseq> new-pattern link ]each ;
\ ******
\ ****** pattern-match
\ ******
: <check-word> ( pattern-list-node mask-node -- pattern-list-node )
.mask @ \ -- pattern-list-node mask-adr
over .xt @ execute \ -- pattern-list-node match?
if 1 over .matches +! then ; \ -- pattern-list-node
: <check-pattern> ( mask-list pattern-list-node -- mask-list )
over ['] <check-word> each \ -- word-sseq pattern-list-node
drop ;
: check-pattern ( mask-list pattern-list -- )
['] <check-pattern> each \ -- mask-list
drop ;
\ ******
\ ****** make result strings
\ ******
: u>str ( u -- adr cnt )
u>d <# #s #> ;
: <fill-result> ( pattern# pattern-list-node -- new-pattern# )
dup .chars @ dealloc \ get rid of pattern-str
<cstr
c" Case #" +cstr
over u>str <+cstr>
c" : " +cstr
dup .matches @ u>str <+cstr>
cstr> count sseqable rover .cnt ! swap .chars ! \ -- pattern# \ set result-str
1+ ; \ -- new-pattern#
: fill-result ( pattern-list -- )
1 swap ['] <fill-result> each \ -- pattern#
drop ;
\ ******
\ ****** main program
\ ******
: alien ( name-str -- )
dup sseqs { name-str word-sseq pattern-sseq | mask-list pattern-list -- }
s" marker upgrade-pattern-stuff " evaluate \ so we can get rid of the UPGRADE-PATTERN words
word-sseq upgrade-word to mask-list
pattern-sseq upgrade-pattern to pattern-list
mask-list pattern-list check-pattern
pattern-list fill-result
pattern-list name-str dump-result
\ clean up
word-sseq kill-sseq
pattern-sseq kill-sseq
mask-list kill-mask
pattern-list kill-pattern
s" upgrade-pattern-stuff " evaluate ;