The peculiar use of THROW is ... interesting. Note that I didn't test
for stack depth with other than the given test file.
-marcel
-- ---------------------------------------------------------------------------------
\ LC53_3.frt -- Skeletal LC53, compiles under many 32-bit ANS Forth systems
\ Copyright (C) 2009 by some cab driver
\ Apparently the THROW simply *aborts* the recursive loop when a match is found
\ (to clean up the stacks).
ANEW -lc53_3
#32 2^x 5 - CONSTANT rnd-unity
#32 2^x #333333333 - CONSTANT rnd-mult
1 #31 LSHIFT CONSTANT |word
1 7 LSHIFT DUP >< OR
DUP #16 LSHIFT OR CONSTANT |bytes
HERE VALUE seedi
2VARIABLE sample
: <prng> rnd-mult UM* rnd-unity UM/MOD DROP ; ( rnd -- new-rnd )
: init-seed #123456789 TO seedi ; ( -- )
: prng seedi <prng> DUP TO seedi ; ( -- u )
: rnd-bytes prng ; ( -- r )
: crypt BOUNDS DO I @ rnd-bytes XOR I ! CELL +LOOP ; ( addr cnt -- )
: test-file S" test.txt" ; ( -- addr cnt )
: show-file sample 2@ TYPE ; ( -- )
: crypt-file init-seed sample 2@ crypt ; ( -- )
: load-file ( -- ) \ sets sample
test-file R/O OPEN-FILE ABORT" *** failure to open test file ***" LOCAL fid
fid FILE-SIZE NIP ABORT" *** failure to determine file size ***"
DUP 4 + ALLOCATE ABORT" *** failure to allocate memory ***"
SWAP
2DUP fid READ-FILE NIP ABORT" *** failure to read test file ***"
fid CLOSE-FILE ABORT" *** failure to close test file *** "
2DUP sample 2! + 0! ; ( erase 4 bytes at EOF )
: how-far ( seed -- count|true ) \ how far we went, or TRUE if all the way
TO seedi sample 2@ OVER LOCAL start
BOUNDS ?DO I @ rnd-bytes XOR |bytes AND IF I start - UNLOOP EXIT ENDIF CELL +LOOP TRUE ;
: <find-seed> ( upper-count seed mask -- u ) RECURSIVE
DUP 0= IF NIP NIP EXIT ENDIF
LOCALS| mask seed |
( upper-count ) seed mask OR how-far
DUP 0< IF seed mask OR THROW ENDIF ( escape to stack cleaner, found it )
2DUP < IF NIP seed mask OR mask 1 RSHIFT <find-seed>
seed mask 1 RSHIFT <find-seed> EXIT
ENDIF
DROP seed mask 1 RSHIFT <find-seed>
seed mask OR mask 1 RSHIFT <find-seed> ;
\ returns FALSE if no seed was found
: find-seed ( -- seed|false ) 0 0 |word ['] <find-seed> CATCH DUP IF NIP NIP NIP ENDIF ;
: test ( -- )
load-file show-file crypt-file 0 TO seedi
CR ." SAMPLE is encrypted ..."
TIMER-RESET find-seed .ELAPSED
?DUP IF DUP TO seedi CR ." The seed is: " sample 2@ crypt DEC.
ELSE CR ." We did a complete search and were unable to find a valid seed."
ENDIF ;
\ visits to <find-seed> are: 464723587
\ Factor time: 9 minutes and 14 seconds
\ Forth time: 22 seconds
\ iForth time: 10.992 seconds ( PIV 3 GHz )
\ iForth time: 4.553 seconds ( Core-Duo 2.66 GHz )
\ assembly language: 17 seconds
Here is the link to my program. Weirdly enough, you didn't include it
in your post, or mention my name as the author.
www.rosycrew.org/LC53.4th
I didn't compile your program because it is not ANS-Forth. I don't
have iForth and I'm not going to spend the $100 to buy it. I really
need ANS-Forth in order to be able to run programs such as this.
I did look over your program however, and it doesn't work. You can't
test 4 bytes at a time! That doesn't make any sense. In my experience,
HOW-FAR typically only tests 0 to 3 bytes of the file before it runs
into a byte that doesn't decrypt properly. Your HOW-FAR is going to
return zero pretty much *every* time that it is called. In the cases
that my HOW-FAR would return a value of 4, yours will return a value
of 1. By the time that your seed is good enough to get 4 bytes into
the file without failing, you are so close to completion that it
doesn't matter any more. The important thing is to make correct
decisions early in the process, when you are testing the high bits, so
that you can avoid testing millions of seeds unnecessarily. Your
HOW-FAR has to be working properly to do this though.
The HOW-FAR function is used to determine if you test a 0 bit before a
1 bit, or vice-versa. If HOW-FAR isn't working, the program will test
the 0 and 1 bits in an arbitrary order. You will still find the
correct seed eventually, and your program will seem to work, even
though your HOW-FAR function makes no sense. You might actually get a
fast execution time. For example, consider the case in which you
*always* test a 0 bit before a 1 bit. If the high bits of the seed
that you are looking for happen to be mostly 0 bits, then your program
may actually run as fast as (maybe even faster) than the correctly
written program. This is just luck though --- it all depends upon what
the seed is that you are looking for.
You seem to think that my use of CATCH and THROW is unidiomatic, and
you're not the first person to say this. I can relate; I've never
before seen CATCH and THROW used this way either. How else am I
supposed to escape from a recursive-descent search though? Forth isn't
Icon --- it doesn't have generators, which are designed for recursive-
descent searches. I tried to talk the Forth-200x committee into
introducing code that could be used to implement generators in Forth,
but they absolutely hated the idea --- pretty much the same response
as I've gotten with all of my ideas. I've given up on Forth-200x now.
Note that I only put that copyright notice on top so that people
wouldn't take my program to a job interview and pretend that they
wrote it themselves to impress the interviewers. In the rather
unlikely event that there is a job opening for Forth programmers, I
would want that job myself. I did write LC53, which has got to be
worth something. If you change the algorithm and rewrite LC53 however,
then I want you to remove my name from the copyright notice ---
especially if your modification doesn't make sense.
> I didn't compile your program because it is not ANS-Forth. I don't
> have iForth and I'm not going to spend the $100 to buy it. I really
> need ANS-Forth in order to be able to run programs such as this.
I suppose "gforth" is not an ANS-Forth? Your version of the program
does not compile under gforth 0.7.0:
lc53.f:27: Cannot tick compile-only word (try COMP' ... DROP)
over ['] >>>;<<< = if 2drop postpone ; immediate exit
then
I compiled under SwiftForth, but I made every effort to be ANS-Forth
compatible. What line did that error message come from?
I don't use gforth. What does that error message mean? What is a
"compile-only word?"
Line 27 (the ":27:" at the beginning tells you the line number).
It doesn't like you taking the address of ";" , though I don't
understand why. I don't use gforth myself except when ANS-iness is
required. My "Reva" forth is not ANS, so I cannot directly test your
code on it at the moment.
It might be informative to run it against Win32Forth as well; though I
no longer use Windows.
>> >             over ['] >>>;<<< = if  2drop  postpone ;
...
>> I don't use gforth. What does that error message mean? What is a
>> "compile-only word?"
A word without interpretation semantics.
>It doesn't like you taking the address of ";" , though I don't
>understand why.
Because ";" has no execution or interpretation semantics. The
execution token returned by ['] represents the execution semantics,
but since ";" does not have that ...
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2009: http://www.euroforth.org/ef09/
I have trouble with the word "semantics" when applied to Forth words.
Sorry to sound like an idiot.
What does it mean? Does it mean "action"? As in "thing that it does"
Execution semantic: "Thing(s) or action(s) that the word does if
executing (running a program)"
Interpretation semantic: "Thing(s) or action(s) that the word does if
interpreting"
Would that be another way of saying it?
Sorry to hijack.
Mark
Yes. That's programming language terminology. A language (or
language construct) has syntax and semantics. I.e., semantics is
pretty much everything that is not syntax. In Forth that means "what
it does".
>Execution semantic: "Thing(s) or action(s) that the word does if
>executing (running a program)"
>Interpretation semantic: "Thing(s) or action(s) that the word does if
>interpreting"
That would be uses of the various semantics, and you did not quite get
them correct. From the usage side, execution semantics are not really
used directly, except for ' and ['] (and even there one can argue that
one should use interpretation semantics) and [COMPILE] (and one can
discuss about that). The semantics used are:
Interpretation semantics: in the text interpreter in interpret state,
in ' and ['].
Compilation semantics: in the text interpreter in compilation state,
in POSTPONE (and [COMPILE]).
The other side are the definitions of the various semantics. For most
words, execution semantics are defined, and from that we derive
interpretation and compilation semantics through a default mechanism.
But for some words, interpretation semantics and compilation semantics
are defined explicitly (and usually no execution semantics are defined
in those cases, because they are not needed); in some cases, e.g.,
";", the interpretation semantics are undefined.
> Because ";" has no execution or interpretation semantics. The
> execution token returned by ['] represents the execution semantics,
> but since ";" does not have that ...
Thanks Anton, that explains it concisely enough. Reva lets you take
the address of ';' (or any other word, though it may not evaluate to
an executable address) -- I'm pretty sure I could do that in other
Forths as well, but it's been a while since I played with them
extensively.
It puzzles me that ";" doesn't have execution semantics -- does it not
cause the word to "move on" to the next word (return to caller or
whatever gforth does)?
Regards,
Ron
Yes, most Forth systems implement ";" as immediate word (which has
interpretation semantics), and/or implement ' and ['] such that they
produce an xt for the compilation semantics, if a word is
compile-only. So they don't complain when you tick ";", they produce
some xt.
>It puzzles me that ";" doesn't have execution semantics -- does it not
>cause the word to "move on" to the next word (return to caller or
>whatever gforth does)?
No, the compilation semantics of ";" is to compile an EXIT, switch to
interpretation, make the colon definition visible (if it has a name),
and maybe clean up some other stuff (depending on the system).
The word you are thinking of that just returns and has execution
semantics is called ";s" or UNNEST in various systems (";s" in
Gforth); and in full-blown native-code systems it may not exist;
instead, the compiler just generates a native return instruction or
even rewrites the preceding call into a jump.
Yeah, most systems don't have truly separate interpretation and
compilation semantics: gforth is one of the few systems that does.
> It puzzles me that ";" doesn't have execution semantics -- does it not
> cause the word to "move on" to the next word (return to caller or
> whatever gforth does)?
Sure, but it does that at compile time, so it only needs compilation
semantics.
From the standard:
Interpretation: Interpretation semantics for this word are undefined.
Compilation: ( C: colon-sys -- )
Append the run-time semantics below to the current definition. End
the current definition, allow it to be found in the dictionary and
enter interpretation state, consuming colon-sys. If the data-space
pointer is not aligned, reserve enough data space to align it.
Run-time: ( -- ) ( R: nest-sys -- )
Return to the calling definition specified by nest-sys.
--Josh
>It puzzles me that ";" doesn't have execution semantics -- does it not
>cause the word to "move on" to the next word (return to caller or
>whatever gforth does)?
Interpretation and compilation refer to when the text is processed.
It is perfectly legal for a system encountering ";" while interpreting
to issue an error message.
In essence, run-time refers to what happens when the compiled code
executes, which is to return from the word. Run-time is one time
frame later than interpretation and compilation.
Although it is conventional to implement ";" as an immediate word
that checks interpretation/compilation state, doing this for
other words can introduce problems, as Anton is fond of pointing
out.
To avoid these sorts of problems, some Forths separate the
interpretation and compilation actions, e.g. for IF, into
separate words. In VFX Forth from v4.3/4.4 onwards IF and
string words such as S" are no longer immediate. These days
I have a lot more sympathy for Anton's view that "state smart
words are evil", although I would actually prefer to
say that defining words with immediate and state-smart
children are evil. There is still a place for state-smart
words in application code, even if language lawyers can
produce pathological examples that break the code.
--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
I'm still having a little trouble with this!
My ; word in my hobby Forth program is something like:
: ; ['] EXIT , LATEST @ HIDDEN [ ; IMMEDIATE
(it's actually written using assembler DATA directives that reference
the entry points to the actual routines but it's something like the
above in real Forth. The HIDDEN word toggles the hidden bit,
previously set by : )
So,
Interpretation semantics: Compile EXIT into HERE, un-hide the word as
defined by LATEST and set interpretation state.
Compilation Semantics: Exit the currently defined word (into NEXT)
Right?
Mark
I don't understand how a word can not have execution semantics. That
would be like a person not having a body --- we seem to be descending
into mysticism. For most words (DUP, +, etc.) the execution semantics
takes place at run-time. For immediate words (IF, ;, etc.) the
execution semantics takes place at compile-time. These words may use
COMPILE, to compile the xt of some word (0BRANCH for IF, UNNEST
for ;), but this is not an issue that ['] has to worry about --- [']
just provides the xt of the word that it is given (DUP, +, IF, ;,
etc.).
I looked up ['] in the ANS-Forth document and it says:
"Place name's execution token xt on the stack."
The ANS-Forth document does NOT say:
"Place name's execution token xt on the stack unless we don't want to,
in which case the programmer will be given an error message implying
that this is somehow his fault."
What I am saying is that I didn't have any warning that what I was
doing in MACRO: was going to be a problem. This whole issue with
gforth has come as a complete surprise to me. Is gforth the only
system that fails to compile my program? If it is, then the solution
is obvious --- ignore gforth. I've been ignoring gforth for years, and
I'm willing to continue. :-)
: <find-seed> { upper-count passkey mask | lower-count -- }
   [ debug? ] [if] 1 visits +! [then]
   mask if                                          \ we still have bits to try
      passkey mask or how-far to lower-count        \ -1 or count
      lower-count 0< if passkey mask or throw then  \ throw good seed back to CATCH
      upper-count lower-count < if                  \ setting this bit helped
         lower-count  passkey mask or  mask 1 rshift  recurse
         upper-count  passkey  mask 1 rshift  recurse
      else                                          \ setting this bit didn't help
         upper-count  passkey  mask 1 rshift  recurse
         lower-count  passkey mask or  mask 1 rshift  recurse
      then
   then ;
I wrote a test program to determine how many of each possible value
HOW-FAR returns:
40 constant count-max
create <counts> count-max 1+ cells allot
: counts ( index -- adr )
cells <counts> + ;
: test-how-far ( -- ) \ fill counts
   load-file crypt-file
   count-max 1+ 0 do  0 I counts !  loop
   0 1 do
      I how-far
      dup 0< if  drop
      else  count-max min counts 1 swap +!  then
   loop
   count-max 1+ 0 do  cr I 6 u.r  I counts @ 12 u.r  loop
   0 count-max 1+ 0 do  I counts @ +  loop
   cr ." total:" 12 u.r ;
Searching for the seed value 123456789, I get these results:
0 2147483647
1 1073741823
2 536870906
3 268435469
4 134217717
5 67108916
6 33554450
7 16777294
8 8388637
9 4194373
10 2097153
11 1048391
12 524327
13 262137
14 130973
15 65512
16 32698
17 16446
18 8201
19 4161
20 2026
21 1004
22 487
23 264
24 140
25 77
26 28
27 17
28 8
29 7
30 5
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
total: 4294967294
From the above test, we can see that the values < 4 are pretty common.
2147483647 1073741823 536870906 268435469 + + + u.
gives us: 4026531845
This is about 4.0 billion HOW-FAR values that are < 4 out of about 4.3
billion possible seeds. What I said before is true --- you can't test
4 bytes at a time --- that isn't going to work.
Probably not.
> If it is, then the solution
> is obvious --- ignore gforth. I've been ignoring gforth for years,
> and
> I'm willing to continue. :-)
See the fifth ambiguous condition in section 4.1.2 of the ANS Forth
standard. Standard systems can do anything when an ambiguous condition
occurs. If you want portability then avoid ambiguous conditions.
This one is easy to avoid in your definition of macro: by doing a
string comparison before find e.g. (untested):
dup count s" ;" compare 0= if drop postpone ; immediate exit then
and remove the appropriate stuff after find.
Gerry
>My ; word in my hobby Forth program is something like:
>
>: ; ['] EXIT , LATEST @ HIDDEN [ ; IMMEDIATE
...
>Interpretation semantics: Compile EXIT into HERE, un-hide the word as
>defined by LATEST and set interpretation state.
>Compilation Semantics: Exit the currently defined word (into NEXT)
>
>Right?
Sorry, no - you are confusing compilation action (how ; compiles)
with run-time (what happens when the compiled code executes). Because
your version of ; is IMMEDIATE your code is executed at compile time,
and so the compilation and interpretation semantics are the same.
Start with the normal case, i.e. what happens in a colon
definition. The ANS phrase for "Compile <foo> at HERE" is "Append the
run-time semantics below to the current definition" because some
CPUs have separate code and data address spaces. Your compilation
action is to append EXIT to the current definition and enter
interpretation state.
Because the word is immediate, your interpretation action is the same
as the compilation action.
The run-time action of ; is the action of EXIT.
Here's the complete ANS text for ;
6.1.0460 ;
semicolon CORE
Interpretation: Interpretation semantics for this word are
undefined.
Compilation: ( C: colon-sys -- )
Append the run-time semantics below to the current definition. End the
current definition, allow it to be found in the dictionary and enter
interpretation state, consuming colon-sys. If the data-space pointer
is not aligned, reserve enough data space to align it.
Run-time: ( -- ) ( R: nest-sys -- )
Return to the calling definition specified by nest-sys.
Stephen
Yes.
The compilation semantics is the same, because this word is immediate.
>Compilation Semantics: Exit the currently defined word (into NEXT)
No. The definition of ";" in the standard contains a paragraph on
"run-time semantics", and your descritption fits that. But the
run-time semantics is just used to define the compilation semantics,
it's used nowhere else.
That's quite simple: By not defining execution semantics in the
definition of the word in the standard document. Look up the
definition of ";" in the standard, and you will see that there is no
"Execution:" section there; and there are other labeled sections
there, so the "omitted label" sentence of 3.4.3.1 does not apply.
>For immediate words (IF, ;, etc.)
IF and ";" are not immediate words in the standard. A system can
implement them as immediate words, but a program cannot rely on their
immediacy.
>I looked up ['] in the ANS-Forth document and it says:
>
>"Place name's execution token xt on the stack."
What's the execution token of a word that has no execution semantics?
Hmm, since you think that ";" is immediate, I guess you want an
execution token that, when executed, performs the compilation
semantics. You can get that as follows. Before the rest of the
program, define:
: ; postpone ; ; immediate
Now you have an immediate ";" with an execution semantics that's the
same as the compilation semantics, and you can tick it.
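A quick sketch (untested) of what that buys you:

: ; postpone ; ; immediate    \ POSTPONE still finds the old ";"
' ; drop                      \ ticking ";" is now unambiguous
: five 5 ;                    \ ordinary definitions still end normally
five .                        \ prints 5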
>What I am saying is that I didn't have any warning that what I was
>doing in MACRO: was going to be a problem.
A system that would tell us all non-standard usages would be nice, but
we don't have that. For now the solution is to test on as many
systems as possible.
The ANS effort tried to allow a wide variety of different methods to
build Forth compilers. There was the example of the Forth-83 standard
which AIR officially required Forths to be 16-bit and indirect-threaded.
We can't be sure what will be the most efficient way to build a Forth in
the future, so they made an attempt not to require it to be one
particular way.
I'm not sure exactly what they were trying to allow with this stuff, but
somehow what they did was to make a whole class of special words --
basically almost all of the IMMEDIATE words that the compiler is
supposed to provide already. If you make your own IMMEDIATE word you
just say IMMEDIATE and it becomes IMMEDIATE . If you use a word like IF
or ; then it might be special and you can't treat it the same.
I think part of what they were doing was to allow some Forth systems to
have interpreted versions of things like IF and ' so that on those
systems you get exactly the same result from the code whether it's just
typed in at the keyboard or compiled. That's handy for engineers who
just learn the language in a couple of days and start using it. They
don't want to figure out ' and ['] , CHAR and [CHAR] , IF and [IF] etc.
So some Forth systems do it that way, and some Forth systems do it the
old way, and they can both be standard. But if you want your code to run
on both of them, you can't use anything that would call up the special
features of the special standard Forth systems.
So anyway, if you know whether the word you want to ['] will be a
special word, you can handle it some other way.
If you want to compare to see whether you have a ; or not, you can
compare strings instead. S" ;" COMPARE .
If you want to compile a ; you can do POSTPONE ;
If you need to handle whatever comes in the door then you can find
something which will work on most systems.
> I looked up ['] in the ANS-Forth document and it says:
>
> "Place name's execution token xt on the stack."
>
> The ANS-Forth document does NOT say:
>
> "Place name's execution token xt on the stack unless we don't want to,
> in which case the programmer will be given an error message implying
> that this is somehow his fault."
>
> What I am saying is that I didn't have any warning that what I was
> doing in MACRO: was going to be a problem. This whole issue with
> gforth has come as a complete surprise to me. Is gforth the only
> system that fails to compile my program? If it is, then the solution
> is obvious --- ignore gforth. I've been ignoring gforth for years, and
> I'm willing to continue. :-)
The ANS document was designed to be correct more than to be easily
readable. There have been remarkably few gaffes where it turned out to
be truly incorrect. But it has never been particularly easy to
understand. Lots of people mostly ignore the standards, partly for that
reason. You can get a long ways on portability if you just remember to
use CELLS instead of 2* or 4* or whatever. A few tricks like that will
handle 95% of the problems with porting your code. You can then put in a
tremendous effort trying to get to 100%, or at some point you can just
accept that 97% or 99% is good enough.
I can special-case the ['] of ; prior to doing FIND. The problem I
foresee however, is that FIND has to produce an xt of a wide variety
of words, including words such as >R and so forth, that are likely to
be as problematic as ; was. If these words don't have an xt, then they
are going to fail too, but I won't know about this until I write a
MACRO: word that has one of the bad'uns in it and FIND crashes when it
tries to produce an xt for that word. The end result is going to be a
gigantic CASE statement special-casing all of the bad words that I
discover (through trial and error) will crash FIND. The problem here
is that I will never know when I am done special-casing bad words,
because I have no way to predict which words have an xt and which
words don't. Some words might have an xt on one system, but not on
another, so I would have to test all of the popular systems and
special-case anything that fails anywhere.
If MACRO: is going to become this complicated, then I'll have to
forget about it and just write colon words. I originally had a version
of LC53 written using colon words, but it ran in about two and a half
minutes, which is an order of magnitude slower than the version that
uses MACRO: words. It was also much more difficult to read, as all of
the access of the seed and mask on the r-stack had to be done with R>
and R@, and this code couldn't be factored out into words like NEXT-
SEED and NEXT-MASK. That tangle of R> and R@ words was confusing the
heck out of me, so I can only imagine what effect it would have on
novices studying LC53 in their effort to learn Forth.
> ... The HIDDEN word toggles the hidden bit, ...
Most Forths call it SMUDGE .
Jerry
--
Engineering is the art of making what you want from things you can get.
There's a much simpler way to define a single line macro (I'm not sure
where it came from, possibly Wil Baden)
: macro
: char parse postpone sliteral postpone evaluate
postpone ; immediate
;
Usage is e.g.
macro square " dup *" ok
: foo 5 square . ; ok
foo 25 ok
Gerry
> If MACRO: is going to become this complicated, then I'll have to
> forget about it and just write colon words. I originally had a version
> of LC53 written using colon words, but it ran in about two and half
> minutes, which is an order of magnitude slower than the version that
> uses MACRO: words. It was also much more difficult to read, as all of
> the access of the seed and mask on the r-stack had to be done with R>
> and R@, and this code couldn't be factored out into words like NEXT-
> SEED and NEXT-MASK. That tangle of R> and R@ words was confusing the
> heck out of me, so I can only imagine what effect it would have on
> novices studying LC53 in their effort to learn Forth.
OK, I looked at MACRO: but not at all the ways you use it.
It looks like it creates a compile-only word. The compile-only word will
itself compile a predefined sequence of code. It gives an error if the
original source code for that sequence does not fit on one line.
This looks like a useful function and it's only natural that someone
would have done it before. I did it with the following syntax:
Where you do
MACRO: <name> <code ;>
I do
: <name> ]] code [[ ;
I believe I didn't get around to allowing negative number literals in my
macros. Apart from that I believe my version does everything yours does,
and it runs on gforth and was designed to be widely portable to Forth-94
systems.
However, it looks like you aren't building new words at run-time that
you compile macros into. So the speed that your macros work is not a
very big deal. If you were to waste time it would be wasted compile
time.
So something like this might work:
: TEXT: ( "name" "macro text" ";" -- )
   :
   [CHAR] ; PARSE
   POSTPONE SLITERAL
   POSTPONE EVALUATE
   POSTPONE ; IMMEDIATE ;
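If I have not botched it, usage would look something like this
(untested):

TEXT: square dup * ;
: foo 5 square . ;   \ SQUARE expands to "dup *" while FOO is compiled
foo                  \ prints 25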
It has limitations. You can't do ( ... ; ... ) inside the macro text. (
) comments get saved and then executed each time, which is wasteful. You
mustn't rename words that you use inside macros or you'll get the new
version every time you make a macro after the word gets redefined.
But it's simple and it will probably work on standard systems without a
lot of fuss.
If you care about running on most standard Forth systems, text macros
might be easy to port and they have a different set of constraints than
compiler macros.
> > See the fifth ambiguous condition in section 4.1.2 of the ANS Forth
> > standard. Standard systems can do anything when an ambiguous
> > condition occurs. If you want portability then avoid ambiguous
> > conditions.
> >
> > This one is easy to avoid in your definition of macro: by doing a
> > string comparison before find e.g. (untested):
> >
> > dup count s" ;" compare 0= if drop postpone ; immediate exit then
> >
> > and remove the appropriate stuff after find.
>
> I can special-case the ['] of ; prior to doing FIND. The problem I
> foresee however, is that FIND has to produce an xt of a wide variety
> of words, including words such as >R and so forth, that are likely to
> be as problematic as ; was. If these words don't have an xt, then they
> are going to fail too, but I won't know about this until I write a
> MACRO: word that has one of the bad'uns in it and FIND crashes when it
> tries to produce an xt for that word. The end result is going to be a
> gigantic CASE statement special-casing all of the bad words that I
> discover (through trial and error) will crash FIND. The problem here
> is that I will never know when I am done special-casing bad words,
> because I have no way to predict which words have an xt and which
> words don't. Some words might have an xt on one system, but not on
> another, so I would have to test all of the popular systems and
> special-case anything that fails anywhere.
Since this is entirely a problem for standard systems, you could look at
the standard and count the words which have no execution behavior. It's
a finite number. I believe every one of them is in the core wordset, but
I could be wrong, there may be some in core extension and maybe some in
the tools wordset.
> If MACRO: is going to become this complicated, then I'll have to
> forget about it and just write colon words. I originally had a version
> of LC53 written using colon words, but it ran in about two and half
> minutes, which is an order of magnitude slower than the version that
> uses MACRO: words. It was also much more difficult to read, as all of
> the access of the seed and mask on the r-stack had to be done with R>
> and R@, and this code couldn't be factored out into words like NEXT-
> SEED and NEXT-MASK. That tangle of R> and R@ words was confusing the
> heck out of me, so I can only imagine what effect it would have on
> novices studying LC53 in their effort to learn Forth.
I haven't studied your code to see what you're doing, but if you're just
compiling code you could do it with text macros.
Instead of getting a sequence of xt's to compile whenever you want, you
could get a sequence of words. " XLERB THEN ;" You can compile them with
EVALUATE . Just make sure you haven't already made new definitions with
old names, because if you've done that then you'll compile the new words
instead of the old words.
You could make it easier to read by using locals in place of R> R@ but
you still couldn't factor it easily. Locals only factor when you put
them back onto the data stack before you call the factor. Then you can
make new locals inside the factor and it's added overhead. Easier to
read than lots of R@ R'@ 2R@ R> 2R> though.
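A tiny untested sketch of that point (names made up here):

: next-state { seed mask -- seed mask' }   \ the factored-out piece: fresh locals
   seed  mask 1 rshift ;
: try-bit { seed mask -- }
   seed mask next-state   \ put the locals back on the data stack first
   . . ;                  \ then use the results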
Try:
: * ." bug" ;
: foo 5 square . ;
foo
You're not alone.
For whatever reason ANS-94 chose to use legalese instead of
common language. There's a look-up list in ANS section 2.1.
Interesting to note that even the legal profession is looking
at replacing legal terminology and buzz words with everyday
language. It seems not only did the average person not
understand documents written in legalese, neither did the
lawyers!
Substitute the word "semantics" with "behaviour".
> MarkWills wrote:
>>>Because ";" has no execution or interpretation semantics
>>
>> I have trouble with the word "semantics" when applied to Forth words.
>> Sorry to sound like an idiot.
>>
>> What does it mean? Does it mean "action"? As in "thing that it does"
>>
>> Execution semantic: "Thing(s) or action(s) that the word does if
>> executing (running a program)"
>> Interpretation semantic: "Thing(s) or action(s) that the word does if
>> interpreting"
>>
>> Would that be another way of saying it?
>>
>> Sorry to hijack.
>
> You're not alone.
>
> For whatever reason ANS-94 chose to use legalese instead of
> common language. There's a look-up list in ANS section 2.1.
>
For whatever reason?
The Standard Document is a document recognised by ANSI and ISO and
contains therefore legalese terminology. It's an offical document.
> Interesting to note that even the legal profession is looking
> at replacing legal terminology and buzz words with everyday
> language. It seems not only did the average person not
> understand documents written in legalese, neither did the
> lawyers!
Nonsense, look at the standard documents of other languages like
Pascal, COBOL, C etcetera. They use the same type of language. As an
analogy, the constitution of a country is not meant for everyday use,
but as a legal help and must not be ambiguous.
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
Understanding. Don't use an obscure word when a common one is
equally (or more) effective.
> > Interesting to note that even the legal profession is looking
> > at replacing legal terminology and buzz words with everyday
> > language. It seems not only did the average person not
> > understand documents written in legalese, neither did the
> > lawyers!
>
> Nonsense, look at the standard documents of other languages like
> Pascal, COBOL, C etcetera. They use the same type of language. As an
> analogy, the constitution of a country is not meant for every day use,
> but as a legal help and must not be ambiguous.
Constitutions are unambiguous? Usually they're so terse they're
subject to constant "interpreting" by courts :)
Unlike constitutions, a computer language standard is meant for
everyday use - like legal contracts. Consequently implementors
and users alike need to understand what it means.
"semantics" is not an everyday term - in computing or elsewhere.
If all it means is "behaviour" then use that. The benefit is one
won't need to look it up each time one hears it.
String expansion as done by the C preprocessor (with #define) has
problems of its own. Forth is supposed to be an improvement over C, at
least that is the rumor I heard.
Yes. Totally agree. It's not a snub to the standards members. I guess
they followed the style and wording of other coding standards that
existed at the time.
I do find the standards documents hard to read, however. They are
long-winded and could be simplified with more natural language,
illustrations, diagrams, etc. Just my humble opinion, not really
grumbling, as it makes little difference to me. I'm sure I would
struggle with the standard document, though, if I were writing a
standard-compliant version of the language.
I remember having the same problems trying to understand the Allen
Bradley DH communications protocol years ago. The book was obviously
written by somebody that was paid by the word! After two conversations
with a knowledgeable chap at AB, I understood perfectly and was able
to write some code.
Regards
Mark
That's reasonable enough.
> > Sorry to hijack.
> You're not alone.
> For whatever reason ANS-94 chose to use legalese instead of
> common language.
It's nothing to do with legalese, it's the usual meaning of
"semantics" used in the literature of programming languages. "Common
language" would have been inappropriate: it's a programming language
standard, not a novel.
Andrew.
> Try:
> : * ." bug" ;
> : foo 5 square . ;
> foo
This prints "bug5", as you'd expect.
What's your point? You've redefined * , so * has a new behaviour.
This is the way that macros have always worked.
Andrew.
Who would expect that? That's a deviation from the usual early name
binding of Forth, so I would expect that quite a few people would be
surprised by this behaviour.
>What's your point? You've redefined * , so * has a new behaviour.
But why has SQUARE a new behaviour? It's un-Forth-ish that defining a
new word changes an old one.
>This is the way that macros have always worked.
: square ]] dup * [[ ; immediate
: * ." bug" ;
: foo 5 square . ;
foo
prints 25, just as I would expect. All the macros I have written have
worked that way, so obviously macros have not always worked the wrong
way.
The Scheme community has been fighting for getting hygienic macros for
a long time. We have them, we just need to use them.
I thought there was an ANS Forth definition of the GForth macro
words ]] ... [[ but I can't find it anywhere - does it exist?
I seem to remember that some Forth systems carry out source code
in-lining as an optimising technique, if so do they suffer the same
problem with redefinitions?
Gerry
> Who would expect that? That's a deviation from the usual early name
> binding of Forth, so I would expect that quite a few people would be
> surprised by this behaviour.
Of course it's a deviation from early name binding. Given that the
whole purpose of MACRO is, presumably, to provide later name binding,
that's what I'd expect.
> >What's your point? You've redefined * , so * has a new behaviour.
> But why has SQUARE a new behaviour? It's un-Forth-ish that defining
> a new word changes an old one.
Because that's what MACRO does: it provides that type of binding
behaviour. If you wanted : you'd use : .
> >This is the way that macros have always worked.
> : square ]] dup * [[ ; immediate
> : * ." bug" ;
> : foo 5 square . ;
> foo
> prints 25, just as I would expect.
The default behaviour I expect of a macro is the same as cpp:
#define square dup *
: * ." bug" ;
: foo 5 square . ;
which expands to:
: * ." bug" ;
: foo 5 dup * . ;
While you might argue that cpp is not a great macro processor, you
could hardly argue that people don't expect that behaviour.
> All the macros I have written have worked that way, so obviously
> macros have not always worked the wrong way.
> The Scheme community has been fighting for getting hygienic macros for
> a long time. We have them, we just need to use them.
This is, rather, a different way of thinking about macros. Fair
enough, but it doesn't make the common usage "wrong".
Andrew.
There is "late", and there is "later".
With early binding, and:
: square dup * ;
then 'square' uses what '*' pointed to at the time that 'square' was
defined.
With the macro that Anton talks about, when he uses 'square', the '*'
which is used is the one which was in force when 'square' was compiled
into the word 'foo' -- that is, at the time the compilation semantics
of the 'square' macro are invoked. This is definitely late binding.
With your macro, the '*' which is used is the one which is in action
when 'foo' is invoked -- i.e. when the _execution semantics_ associated
with 'square' are invoked. This is definitely late binding, and
definitely later binding than the kind of late binding used in Anton's
example.
There are uses for both kinds of late binding and neither is more
"legitimate" than the other, but your kind of late binding is
usually embodied by deferred words.
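For instance, the run-time-resolved flavour is exactly what a plain
deferred word gives you (untested, standard DEFER/IS):

defer times   ' * is times
: square  dup times ;
: foo  5 square . ;
foo                                 \ prints 25
:noname ( a b -- a*b ) ." bug " * ; is times
foo                                 \ prints "bug 25": bound at run time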
> The default behaviour I expect of a macro is the same as cpp:
>
> #define square dup *
> : * ." bug" ;
> : foo 5 square . ;
>
> which expands to:
>
> : * ." bug" ;
> : foo 5 dup * . ;
>
> While you might argue that cpp is not a great macro processor, you
> could hardly argue that people don't expect that behaviour.
That's not the code which was used before in this discussion. In
cpp terms, it would rather look like:
#define square dup *
: foo 5 square . ;
: * ." bug" ;
and then one would expect 'foo' to use the '*' in force _before_ its
redefinition, not after.
--Thomas Pornin
>I seem to remember that some Forth systems carry out source code
>in-lining as an optimising technique, if so do they suffer the same
>problem with redefinitions?
VFX Forth up to v4.2 used to do this. VFX v4.4 uses a tokeniser.
The tokeniser is simpler and smaller. To avoid problems like the
above, a source inliner has to perform a lot of housekeeping. Our
observation with a tokeniser is that it is more resistant to guru
code (thanks, Bernd), requires less kernel support, and is more
reliable.
> There is "late", and there is "later".
> With early binding, and:
> : square dup * ;
> then 'square' uses what '*' pointed to at the time that 'square' was
> defined.
> With the macro that Anton talks about, when he uses 'square', the
> '*' which is used is the one which was in force when 'square' was
> compiled into the word 'foo' -- that it, at the time compilation
> semantics of the 'square' macro are invoked.
Yes.
> This is definitely late binding.
> With your macro,
My macro? It's nothing to do with me.
> the '*' which is used is the one which is in action when 'foo' is
> invoked -- i.e. when the _execution semantics_ associated with
> 'square' are invoked. This is definitely late binding, and
> definitely later binding than the kind of late binding used in
> Anton's example.
But we understand all of this. What is your point?
> > The default behaviour I expect of a macro is the same as cpp:
> >
> > #define square dup *
> > : * ." bug" ;
> > : foo 5 square . ;
> >
> > which expands to:
> >
> > : * ." bug" ;
> > : foo 5 dup * . ;
> >
> > While you might argue that cpp is not a great macro processor, you
> > could hardly argue that people don't expect that behaviour.
> That's not the code which was used before in this discussion. In
> cpp terms, it would rather look like:
> #define square dup *
> : foo 5 square . ;
> : * ." bug" ;
The code Anton posted was:
> : * ." bug" ;
> : foo 5 square . ;
> foo
i.e. * is [re]defined before square is expanded.
Andrew.
> On Fri, 27 Nov 2009 11:44:34 -0000, "Gerry"
> <ge...@jackson9000.fsnet.co.uk> wrote:
>
>>I seem to remember that some Forth systems carry out source code
>>in-lining as an optimising technique, if so do they suffer the same
>>problem with redefinitions?
>
> VFX Forth up to v4.2 used to do this. VFX v4.4 uses a tokeniser.
> The tokeniser is simpler and smaller. To avoid problems like the
> above, a source inliner has to perform a lot of housekeeping. Our
> observation with a tokeniser is that it is more resistant to guru
> code (thanks, Bernd), requires less kernel support, and is more
> reliable.
And at least as far as I can tell, also significantly faster, which usually
comes along with "simpler&smaller".
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
I don't believe that that's the whole purpose, nor that it is the
primary purpose. Actually I believe that it's an unwanted side
effect.
So what _is_ the purpose of such a word:
1) The primary purpose is to provide a convenient syntax for giving a
name to things that cannot be done in ordinary (non-immediate) colon
definitions, e.g, things like:
: map[ postpone (map) postpone ?do postpone i postpone @ ; immediate
People don't like all these POSTPONEs, so they try to define a more
convenient syntax, such as
macro map[ " (map) ?do i @"
If the implementation of that syntax has a different binding
behaviour, that's an unwanted side effect. They are either unaware of
that, or they just think that the difference produces a different
result in few enough cases that it's acceptable.
2) Another purpose of macros is as a poor man's replacement for an
inlining compiler. Looking at the comment
\ The speed-critical words can be converted from colon to MACRO: so that they compile inline.
\ MACRO: has several limitations:
from <http://www.rosycrew.org/LC53.4th>, that's the purpose of the
macros there. And the implementation of MACRO: there performs early
binding, as is appropriate in Forth.
The in-between-binding implementation was suggested by someone else as
a replacement, so the purpose there obviously was not to provide
non-early binding.
BTW, I have real trouble imagining circumstances in which anybody
would want such in-between binding semantics. I think that late
binding is inferior to Forth's early binding, though there are idioms
where it can be useful; but that definition of SQUARE used neither
early nor late binding.
Whenever I pointed out the non-early binding (and other problems) of
this kind of construction before, the defenders of this usage replied
by saying that one should program in such a way that this usage
behaves in the same way as early binding. This suggests to me that
none of the defenders actually intended the result to be different
from early binding. If that is your purpose in using such macros, you
appear to be alone in that usage.
>If you wanted : you'd use : .
That's certainly a good suggestion for the SQUARE and the usage of
MACRO: in LC53.4th. However, for MAP[ you cannot use a simple
(non-immediate) colon definition, you have to use a number of
POSTPONEs and IMMEDIATE.
>The default behaviour I expect of a macro is the same as cpp:
>
>#define square dup *
You obviously really mean the C preprocessor. There is a reason why
it's not called the all-language preprocessor: its shortcomings are
less severe in C than in some other languages, including Forth. In
particular, you cannot redefine a function in C, and any reference to
a function refers to the single definition, wherever it is defined; so
referencing C functions in CPP macros does not change the semantics at
all.
But cpp macros (especially with parameters) have a few unpleasant
properties even in C, and I have often read warnings about these
properties, so even in C the deviation from the usual function
evaluation semantics is an unwanted side effect. C++ includes a
number of language features that are intended to make the definition
of cpp macros unnecessary.
>While you might argue that cpp is not a great macro processor, you
>could hardly argue that people don't expect that behaviour.
They certainly don't expect it in Forth, because Forth is not used
with cpp. And they obviously don't even expect it in C, or the
warnings about the unusual behaviour would be unnecessary.
>This is, rather, a different way of thinking about macros. fair
>enough, but it doesn't make the common usage "wrong".
What is common about EVALUATE-based macros? In this thread we have
discussed three macro implementations, only one of which used
EVALUATE.
Redefinition is not the only risk. Search order changes, BASE, and
other global state are additional risks.
>Most systems do warn of redefinitions which helps with this. Also if
>use is restricted to core words, the risk is reduced even further.
They typically don't warn if the definition is in a different word
list. And people usually don't have qualms defining words like "*" in
other word lists (e.g., in a word list intended to provide a different
language to the end user).
>I thought there was an ANS Forth definition of the GForth macro
>words ]] ... [[ but I can't find it anywhere - does it exist?
I also remember seeing something like this. I guess I should put a
definition for it in the compat library.
>I seem to remember that some Forth systems carry out source code
>in-lining as an optimising technique, if so do they suffer the same
>problem with redefinitions?
VFX used to do that, but no longer does, for good reason: My first
test (IIRC involving STATE) already did not work the same as without
inlining.
> I don't believe that that's the whole purpose, nor that it is the
> primary purpose. Actually I believe that it's an unwanted side
> effect.
Oh, I see. I was assuming the author actually wanted to do what he'd
done. That seems like a reasonable starting position. :-)
For example, you might have a word VECTORS that picks up the most
recent definitions of a few words in the dictionary and creates a new
task using those words.
A simple example:
macro vectors ['] (type) ['] (accept) ... ['] run
: (type) type for this task ... ;
: (accept) accept for this task ... ;
... etc
: run ... whatever the task will do ... ;
: launch-task1 vectors start-task ;
This kind of thing is basic meat-n'-potatoes Forth: I've certainly
seen it done in several industrial applications.
> Whenever I pointed out the non-early binding (and other problems) of
> this kind of construction before, the defenders of this usage
> replied by saying that one should program in such a way that this
> usage behaves in the same way as early binding. This suggests to me
> that none of the defenders actually intended the result to be
> different from early binding. If that is your purpose in using such
> macros, you appear to be alone in that usage.
I can live with that, but, as I said, I've certainly seen this sort of
thing done before, and not just by me.
> You obviously really mean the C preprocessor.
Obviously.
> > While you might argue that cpp is not a great macro processor, you
> > could hardly argue that people don't expect that behaviour.
> They certainly don't expect it in Forth, because Forth is not used
> with cpp.
I wouldn't expect that a Forth macro which works in the way that the
most commonly used macro processor works would be much of a surprise
to a programmer working in this area. But each of us has their own
expectations.
> >This is, rather, a different way of thinking about macros. fair
> >enough, but it doesn't make the common usage "wrong".
> What is common about EVALUATE-based macros?
By "the common usage" I mean "what cpp does". Which is common in
programming, albeit not in Forth.
Andrew.
> >Of course it's a deviation from early name binding. Given that the
> >whole purpose of MACRO is, presumably, to provide later name
> >binding, that's what I'd expect.
>
> I don't believe that that's the whole purpose, nor that it is the
> primary purpose. Actually I believe that it's an unwanted side
> effect.
>
> So what _is_ the purpose of such a word:
>
> 1) The primary purpose is to provide a convenient syntax for giving a
> name to things that cannot be done in ordinary (non-immediate)
> colon definitions, e.g, things like:
>
> : map[ postpone (map) postpone ?do postpone i postpone @ ;
> immediate
>
> People don't like all these POSTPONEs, so they try to define a more
> convenient syntax, such as
>
> macro map[ " (map) ?do i @"
>
> If the implementation of that syntax has a different binding
> behaviour, that's an unwanted side effect. They are either unaware
> of that, or they just think that the difference produces a
> different result in few enough cases that it's acceptable.
Yes. You can expand your compiler so it can do POSTPONE for every word
and LITERAL POSTPONE LITERAL for numbers, and something similar for
double numbers and floats, and maybe something special for [CHAR] and
['] etc.
Or ]] [[ will write a bunch of POSTPONEs for you, and if you want to do
something fancy you can drop out of ]] [[ to handle it.
Text macros give you very similar results with somewhat different
cautions. Since you usually care about inlining the particular code you
foresee inlining, not random code forever, you have pretty good control
over whether the words you use get redefined in the meantime or not, or
BASE changed, etc. You can test the words and if they work then your
macro has done its job.
> 2) Another purpose of macros is as a poor man's replacement for an
> inlining compiler. Looking at the comment
>
> \ The speed-critical words can be converted from colon to MACRO: so
> that they compile inline.
> \ MACRO: has several limitations:
>
> from <http://www.rosycrew.org/LC53.4th>, that's the purpose of the
> macros there. And the implementaion of MACRO: there performs early
> binding, as is appropriate in Forth.
>
> The in-between-binding implementation was suggested by someone else
> as a replacement, so the purpose there obviously was not to provide
> non-early binding.
Yes, same deal exactly.
> BTW, I have real trouble imagining circumstances when anybody would
> want such in-between binding semantics. I think that late binding is
> inferior to Forth's early binding, but there are idioms where it can
> be useful, but that definition of SQUARE used neither early nor late
> binding.
>
> Whenever I pointed out the non-early binding (and other problems) of
> this kind of construction before, the defenders of this usage replied
> by saying that one should program in such a way that this usage
> behaves in the same way as early binding. This suggests to me that
> none of the defenders actually intended the result to be different
> from early binding. If that is your purpose in using such macros, you
> appear to be alone in that usage.
I say, if your intent is to provide macros that your following known
code will use, then it's entirely your choice whether to redefine words
in the meantime and use the non-early binding. If that's useful to you,
do it. If it would break things, don't do it.
The big problems come when you try to provide macros that anybody can
use anytime in the future. Then the text macros can bite because you
don't know what redefinitions anybody in the future might make.
If everybody in the future is knowledgeable and smart, they can redefine
words to get fancy working results that you didn't anticipate, and
nobody loses. But that isn't the way to bet.
> >If you wanted : you'd use : .
>
> That's certainly a good suggestion for the SQUARE and the usage of
> MACRO: in LC53.4th. However, for MAP[ you cannot use a simple
> (non-immediate) colon definition, you have to use a number of
> POSTPONEs and IMMEDIATE.
> >This is, rather, a different way of thinking about macros. fair
> >enough, but it doesn't make the common usage "wrong".
>
> What is common about EVALUATE-based macros? In this thread we have
> discussed three macro implementations, only one of which used
> EVALUATE.
There are lots of ways to do it, no two quite alike or with quite the
same gotchas.
> The recently announced lc53 algorithm implementation presented some
> intriguing statistics. Here it is in slightly condensed form, without
> unnecessary optimizations, for easier evaluation. It is a classical
> example of optimizing the wrong things too early. The obvious change
> to processing four bytes at a time made it three times faster.
Well, that "algorithm" lends itself to even more optimization, because
using the original code to experiment with other seeds (very slowly) proves
that it is not much more than a convoluted brute-force search.
This latest version runs the built-in test in 1.9 seconds now (quoted was
17 seconds in assembly language and 554 seconds in Factor). The worst case
run time is around 64 seconds. It parallelizes trivially.
-marcel
-- -------------------------------------------------------------------------------
\ LC53_3.frt -- Skeletal LC53, compiles under many 32-bit ANS Forth systems
\ This is just a brute-force search that takes a maximum of 152 / 64 seconds worst
\ case on the test machines.
ANEW -lc53_3
#32 2^x 5 - CONSTANT rnd-unity
#32 2^x #333333333 - CONSTANT rnd-mult
1 7 LSHIFT CONSTANT |byte1
|byte1 8 LSHIFT CONSTANT |byte2
|byte2 8 LSHIFT CONSTANT |byte3
|byte3 8 LSHIFT CONSTANT |byte4
|byte1 |byte2 OR
|byte3 OR |byte4 OR CONSTANT |bytes
HERE VALUE seedi
2VARIABLE sample
: <prng> rnd-mult UM* rnd-unity UM/MOD DROP ; ( rnd -- new-rnd )
: init-seed #123456789 ( -97436470 ) TO seedi ; ( -- ) ( worst case )
: prng seedi <prng> DUP TO seedi ; ( -- u )
: rnd-bytes prng ; ( -- r )
: crypt BOUNDS DO I @ rnd-bytes XOR I ! CELL +LOOP ; ( addr cnt -- )
: test-file S" test.txt" ; ( -- addr cnt )
: show-file sample 2@ TYPE ; ( -- )
: crypt-file init-seed sample 2@ crypt ; ( -- )
: load-file ( -- ) \ sets sample
test-file R/O OPEN-FILE ABORT" *** failure to open test file ***" LOCALS| fid |
fid FILE-SIZE NIP ABORT" *** failure to determine file size ***"
DUP CELL+ ALLOCATE ABORT" *** failure to allocate memory ***"
SWAP
2DUP fid READ-FILE NIP ABORT" *** failure to read test file ***"
fid CLOSE-FILE ABORT" *** failure to close test file *** "
2DUP sample 2! + OFF ; ( erase 4 bytes at EOF )
: all? ( seed -- bool ) \ all high bits tested as 0 => 1, else 0
TO seedi sample 2@
BOUNDS ?DO I @ rnd-bytes XOR |bytes AND
IF 0 UNLOOP EXIT ENDIF
CELL +LOOP 1 ;
: find-seed ( -- u ) 0 BEGIN 1+ DUP all? ?EXIT AGAIN ;
: test ( -- )
load-file show-file crypt-file 0 TO seedi
CR ." SAMPLE is encrypted ..."
TIMER-RESET find-seed .ELAPSED
?DUP IF DUP TO seedi CR ." The seed is: " . CR sample 2@ crypt show-file
ELSE CR ." A complete search did not find a valid seed."
ENDIF ;
\ Factor time: 9 minutes and 14 seconds ( machine X )
\ SwiftForth time: 22 seconds ( machine X )
\ iForth time: 4.440 seconds ( PIV 3 GHz )
\ iForth time: 1.911 seconds ( Core-Duo 2.66 GHz )
\ assembly language: 17 seconds ( machine X )
> I thought there was an ANS Forth definition of the GForth macro
> words ]] ... [[ but I can't find it anywhere - does it exist?
I'm sure I have it somewhere. Since I started using mostly GForth I
didn't need to copy it much.
Over some time using it, I started getting annoyed with some of the
features of the original code. I rewrote it as follows:
: ]] ( "words" -- )
BEGIN
>IN @ BL WORD COUNT S" [[" COMPARE WHILE
>IN ! POSTPONE POSTPONE
REPEAT DROP ; IMMEDIATE
The original problem was that I didn't like to write a whole bunch of
POSTPONEs. This solves that problem. For one POSTPONE it uses 6
characters instead of 9 characters, and it still uses 6 characters for
as long a sequence of words as you want to POSTPONE up to one line.
: insert-irrelevant-loop
3 0 ]] 2literal do i . loop cr .s [[ ; immediate
: foo 3 9 insert-irrelevant-loop . . ;
It doesn't work with numbers. To put in numbers, drop out of the [[ and
put your numbers on the stack and then ]] LITERAL them.
It doesn't work with comments.
3 0 ]] 2literal do i . [[ ( irrelevant loop ) ]] loop cr .s [[ ;
immediate
It doesn't work with anything you have to do while the macro is being
created.
.... [[ ['] . ]] literal execute loop cr .s [[ ....
Instead of having [[ handle any special parsing words, you always drop
out with [[ and do the parsing and then figure out how to save your
results in the macro.
I complicated it to do multiple lines.
: MAYBE-PARSE-NEXT-LINE
>IN @ BL WORD C@ 0= IF
DROP REFILL 0= 117 AND THROW ELSE
>IN !
THEN ;
: ]] ( "words" -- )
BEGIN
MAYBE-PARSE-NEXT-LINE
>IN @ BL WORD COUNT S" [[" COMPARE WHILE
>IN ! POSTPONE POSTPONE
REPEAT DROP ; IMMEDIATE
Now it doesn't stop until it reaches [[ or a word it can't POSTPONE .
If you leave off a [[ that needs to be there I'm not sure how
informative the error message will be. It saves having to write ]] at
the start of each new postponed line and [[ at the end, but I don't
actually make defining words that long. I'm not at all sure this is
worth doing.
I started adding syntactic sugar that reduced typing.
: ]]L
]] ]] LITERAL [[ ; IMMEDIATE
: ]]2L
]] ]] 2LITERAL [[ ; IMMEDIATE
.... CHAR A ]]L WORD COUNT [[ ....
It saves writing ]] LITERAL .
Now I think this is not fully compatible with the GForth ]] and so I
should change the name. Any suggestions for a name with no more than 2
characters?
Well, I rewrote my MACRO: to use string expansion, as ANS-Forth seems
to not support my MACRO: as I originally wrote it. Here is the new
version:
: macro: ( -- )
: [char] ; parse
postpone sliteral
postpone evaluate
postpone ; immediate ;
Here is the new program: http://www.rosycrew.com/LC53.4th
I don't really like the string-expansion MACRO: very much, but using
it will get us past the ANS-Forth problems and allow us to compile and
run the LC53 program, which is supposedly the topic of this thread.
Nope --- my LC53 is not a convoluted brute-force search. The whole
point of comparing the HOW-FAR values is to avoid testing large
portions of the possible seeds. Your program just linearly searches
through the possible seeds starting at the low values and working up
to the high values. This is similar to your last effort in that it
provides a fast execution time if the high bits of the seed that you
are looking for are 0 bits. This is just luck though; it all depends
upon what seed you are looking for. If the seed is close to the top,
then you would have to search through some 4.3 billion seeds before
you find it. Also, btw, the lowest possible seed is 1, not 0.
Keep trying though!
Thanks. I don't think I've ever used ]] ... [[ ; when I've had several
words to postpone I've put them into another colon definition and
postponed that. That doesn't work for unbalanced >Rs or incomplete
control structures of course.
[...]
>
> I complicated it to do multiple lines.
>
> : MAYBE-PARSE-NEXT-LINE
> >IN @ BL WORD C@ 0= IF
> DROP REFILL 0= 117 AND THROW ELSE
> >IN !
> THEN ;
>
> : ]] ( "words" -- )
> BEGIN
> MAYBE-PARSE-NEXT-LINE
> >IN @ BL WORD COUNT S" [[" COMPARE WHILE
> >IN ! POSTPONE POSTPONE
> REPEAT DROP ; IMMEDIATE
>
Having to mess around with >IN suggests that we could do with a
non-parsing version of POSTPONE.
[..]
Gerry
I thought that Gforth's ]] has the same restriction, but apparently it
doesn't, at least partially. Gforth's ]] manages singles and doubles,
but not floats.
>It doesn't work with comments.
Neither does Gforth's implementation of ]].
>It doesn't work with anything you have to do while the macro is being
>created.
>
> .... [[ ['] . ]] literal execute loop cr .s [[ ....
>
>Instead of having [[ handle any special parsing words, you always drop
>out with [[ and do the parsing and then figure out how to save your
>results in the macro.
Gforth's implementation of ]] does not work for parsing words, either.
>I started adding syntactic sugar that reduced typing.
>
>: ]]L
> ]] ]] LITERAL [[ ; IMMEDIATE
>
>: ]]2L
> ]] ]] 2LITERAL [[ ; IMMEDIATE
>
>
> .... CHAR A ]]L WORD COUNT [[ ....
>
>It saves writing ]] LITERAL .
I guess that's a better solution than trying to deal with literal
numbers.
>Now I think this is not fully compatible with the GForth ]] and so I
>should change the name. Any suggestions for a name with no more than 2
>characters?
The common subset is useful enough, and moreover, AFAIK there are no
cases that work in both implementations but produce a different
result, so it's a good idea to use the same name: Someone using the
common subset can just switch between the implementations without
ado.
> On Nov 27, 2:49 pm, m...@iae.nl (Marcel Hendrix) wrote:
>> m...@iae.nl (Marcel Hendrix) wrote Re: LC53 statistics
>> Well, that "algorithm" lends itself to even more optimization, because
>> using the original code to experiment with other seeds (very slowly) proves
>> that it is not much more than a convoluted brute-force search.
>> This latest version runs the built-in test in 1.9 seconds now (quoted was
>> 17 seconds in assembly language and 554 seconds in Factor). The worst case
>> run time is around 64 seconds. It parallelizes trivially.
> Nope --- my LC53 is not a convoluted brute-force search.
If you say so. You seem to have a huge success with that approach.
Plugging in different seed values (by modifying init-seed) reveals a
huge variation in run-time. Here it is compared to the brute-force
search runtime (no macros), which is of course directly proportional to
the value of the seed.
( Gforth-fast 0.7.0 on an old PIV 3 GHz machine )
       seed  |  run-time [seconds]  |  brute-force [seconds]
-------------+----------------------+------------------------------
           4 |  217 ( finds -1 )    |    0 ( finds 4 )
  $0000FFFF  |  795                 |    0
   123456789 |   44                 |   10 ( default seed in demo )
  1234567890 |  599                 |  100
  4234567890 |  300                 |  358
          -1 |  585 ( finds -1 )    |    0 ( finds 4 )
Finding a seed that produces the longest runtime is left as an
exercise for the reader.
This data is enough to conclude that:
1. the worst-case runtime of the lc53 cracking algorithm is more
than twice the worst-case time for the brute-force
search.
2. There are at least two seeds that are indistinguishable (4 and -1).
I checked that these seeds indeed give a readable result.
It is of course possible that I have accidentally found the set
of values with exceptional runtimes, and the single bad seed,
while all other seeds produce vastly lower runtimes and are all unique.
I'll leave finding the actual distribution as an exercise for the reader too.
My point is not that the algorithm does not do what it is designed to do
(I don't know what it is designed to do), only that a straightforward and
vastly simpler approach delivers better results.
-marcel
Gforth factors that into several parts:
FIND-NAME c-addr u -- nt | 0
Find the name c-addr u in the current search order. Return its nt, if
found, otherwise 0.
NAME>COMP nt -- w xt
w xt is the compilation token for the word nt.
POSTPONE, w xt --
Compile the compilation semantics represented by the compilation token
w xt.
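For illustration, a non-parsing POSTPONE can be sketched on top of these
factors. This assumes a Gforth-like system that actually provides
FIND-NAME, NAME>COMP and POSTPONE, with the effects given above; the
name postpone-name is made up here:
: postpone-name ( c-addr u -- )
   find-name dup 0= -13 and throw  \ -13: undefined word
   name>comp postpone, ;
Then there is no need to mess with >IN at all: executing
s" dup" postpone-name  while a definition is being compiled appends DUP
to that definition.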
> Jonah Thomas <jeth...@gmail.com> writes:
> [Implementation of ]] in Forth-94]
>>It doesn't work with numbers. To put in numbers, drop out of the [[ and
>>put your numbers on the stack and then ]] LITERAL them.
>
> I thought that Gforth's ]] has the same restriction, but apparently it
> doesn't, at least partially. Gforth's ]] manages singles and doubles,
> but not floats.
>
>>It doesn't work with comments.
>
> Neither does Gforth's implementation of ]].
>
In a delayed way, it does:
: quote ]] ." [[ ; immediate
: greet quote Hello" ;
Not a recommended technique, though.
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
> >>It doesn't work with comments.
> >
> > Neither does Gforth's implementation of ]].
> >
>
> In a delayed way, it does:
>
> : quote ]] ." [[ ; immediate
> : greet quote Hello" ;
>
> Not a recommended technique, though.
Yes. Everything you want ]] to parse and handle immediately, you have to
add to ]] . So ]] gets bigger and bigger the more special cases you want
to handle. Or we could have a special word that says "go ahead and
execute the following word even though you're doing ]]".
I guess at the extreme we could build ]] into the compiler. Say, when
STATE is -2 you're postponing everything. Except we could have
exceptions, words that execute even while postponing. FIND could set bit
1 for those, so 2 would say it's an IMMEDIATELY word, and 1 would say
it's IMMEDIATE while 3 would say it's both IMMEDIATE and IMMEDIATELY .
Then we'd want special words to use when we want to postpone an
immediately word after all, and so on.
I think it's simpler to just turn on ]] when you want to POSTPONE things
and then turn it off [[ when you want to stop POSTPONEing things.
But even with that, you don't just want execution, you also want a
delay. E.g., you probably want
]] ." bla" [[
not to mean the same as
." bla"
but the same as
s" bla" ]] 2literal type [[
>I guess at the extreme we could build ]] into the compiler.
That's what we have done in Gforth. It's not sufficient. We would
also have to define parsing words as consisting of a number of stages
(in a way accessible by the compiler):
1) parse-time action (parsing, initial allocation)
2) time-shifting action (typically LITERAL or somesuch). This is used
once for each time-shift that has to occur: i.e., 0 times when
interpreting, 1 time when compiling, 2 times inside ]] ... [[.
3) run-time action (TYPE in the case of .")
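As a rough illustration only (a hypothetical factoring, not Gforth's
actual code, and the names are invented), the three stages of ." could
look like this:
: ."-parse ( "text<quote>" -- c-addr u )  [char] " parse ;    \ 1) parse time
: ."-shift ( compilation: c-addr u -- )   postpone sliteral ; \ 2) time shift
: ."-run   ( c-addr u -- )  type ;                            \ 3) run time
\ ordinary compilation applies the time-shift exactly once:
: my-."  ."-parse ."-shift postpone ."-run ; immediate
Inside ]] ... [[ the compiler would have to apply stage 2 twice instead
of once, which a plain POSTPONE-based ]] cannot express for parsing
words.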
But I am not convinced that all these complications are really
justified. Parsing words are not that important in Forth, and ]]
... [[ is not very common. IMO asking the user to write [[ ... ]]L or
somesuch around parsing words is good enough.
Well, at least it's simpler and has fewer limitations than your original
definition (but beware Anton Ertl's forebodings :-)
> but using
> it will get us past the ANS-Forth problems and allow us to compile
> and
> run the LC53 program, which is supposedly the topic of this thread.
Gerry
Have you tried this? The LITERAL here would try to take a value from
]]L (but none is provided) and compile that value into any word that
contains ]]L. Also, with a parsing ]] the literal would only be
compiled after all the postponed stuff.
Correct:
: postpone-literal postpone literal ;
: ]]L ( postponing: x -- ; compiling: -- x ) \ gforth right-bracket-bracket-l
]] postpone-literal ]] [[ ; immediate
To see how this works, let's look at
: foo 1 ]]l . [[ ; immediate
: bar foo ;
Here the POSTPONE-LITERAL is compiled into FOO, and when FOO executes,
it compiles the 1 as literal into BAR. The second ]] in ]]L executes
in the definition of FOO and postpones "." there, which then compiles
"." into BAR. As a result, I see:
see bar
: bar
1 . ; ok
> >: ]]L
> > ]] ]] LITERAL [[ ; IMMEDIATE
>
> Have you tried this? The LITERAL here would try to take a value from
> ]]L (but none is provided) and compile that value into any word that
> contains ]]L. Also, with a parsing ]] the literal would only be
> compiled after all the postponed stuff.
Yes, I tried it and it didn't work. My second approach didn't work
either, for similar reasons.
I thought about publishing the third, correct approach, but then I
reconsidered: it was possible that nobody cared enough to notice, and in
that case they probably wouldn't care to see the correction.
> Correct:
>
> : postpone-literal postpone literal ;
> : ]]L ( postponing: x -- ; compiling: -- x ) \ gforth
> right-bracket-bracket-l
> ]] postpone-literal ]] [[ ; immediate
Yes, that works. I did "postpone literaled postpone ]]" because it was
only two of them. One of my attempts at complexifying ]] left me with
something that needed to be factored, and so I had
: POSTPONED
POSTPONE POSTPONE ;
leading to
: LITERALED ( -- )
POSTPONE LITERAL ;
: 2LITERALED
POSTPONE 2LITERAL ;
etc.
> To see how this works, let's look at
>
> : foo 1 ]]l . [[ ; immediate
> : bar foo ;
>
> Here the POSTPONE-LITERAL is compiled into FOO, and when FOO executes,
> it compiles the 1 as literal into BAR. The second ]] in ]]L executes
> in the definition of FOO and postpones "." there, which then compiles
> "." into BAR. As a result, I see:
>
> see bar
> : bar
> 1 . ; ok
I apologise for leaving you to repeat the work. But a lot of the time
nobody is interested.
Here is yet another version of the string-expansion MACRO:. This one
allows for multiple lines. Those one-line macros tended to be
unreadable. Also, the programmer can just replace colon with MACRO:
and not have to edit the source-code any further.
: parse? ( char -- adr cnt more? )
>in @ >r
parse \ -- adr cnt
>in @ r> - \ -- adr cnt consumed
over = ; \ -- adr cnt ?
: macro: ( -- )
: begin
[char] ; parse? >r
postpone sliteral postpone evaluate
r> while
refill 0= abort" *** MACRO: no ; ***"
repeat
postpone ; immediate ;
\ RR@ is an example of a useful macro.
macro: rr@ ( -- n ) \ r: n x -- n x
2r@ drop ;
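For instance (a made-up usage sketch, not part of the posted code), RR@
only behaves as its stack comment promises because the macro expands in
line; an ordinary colon definition would see its own return address on
top of the return stack instead:
: demo ( a b -- )
   >r           \ b goes to the return stack
   .            \ print a
   10 >r        \ push something on top of b
   rr@ .        \ the inlined 2r@ drop still sees b, so b is printed
   r> drop r> drop ;
1 2 demo  \ prints 1 2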
I am starting to like string-expansion better. One good thing is that
a macro can be used interpretively (assuming that it doesn't have any
words like 2R@ or ." that can't be used interpretively), which wasn't
true of my original MACRO: words.
Maybe it would help if I explained what my algorithm does. I am taking
advantage of the fact that when PRNG is run, the high bits in the seed
have more of an effect than the low bits do on whether the high bit of
the seed will be 1 or 0. The high bit, of course, is the one that gets
xor'd into the high bit of the character, which is the bit that we
know (0 in 7-bit ascii). This is why I start with the high bit in my
guess and work down toward the low bit.
When I guess at a bit as being 1 or 0, I use HOW-FAR to obtain a
qualitative measure of how good my guess is. If 1 is better than 0,
then I recurse with a 1 bit first and a 0 bit second, or vice-versa if
0 is better than 1. The idea is to try the more likely guess first in
order to improve my chances of running into the correct seed. It is a
little bit like a binary search of a sorted array, but it is not
guaranteed to work. For one thing, HOW-FAR often returns equal values
for a 1-bit guess and a 0-bit guess, in which case I have to just
choose arbitrarily as I don't know which guess is better.
Imagine that ANS-Forth provided us with these words:
DUM* ( duA duB -- quA*B )
DUM/MOD ( qu du -- du-remainder du-quotient )
If this were true, I could have written the encryption program to use
a 64-bit seed rather than a 32-bit seed. Given a 32-bit seed, your
exhaustive search takes about one minute to run. Given a 64-bit seed,
an exhaustive search would take 4.3 billion minutes to run. Obviously,
an exhaustive search would no longer be practical. By comparison, my
algorithm would still be practical, and would run in a few minutes.
This is because my algorithm is directed toward the correct seed. It
does not just blindly search from low to high the way that yours does.
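To make that concrete (only a sketch, since DUM* and DUM/MOD do not
exist; the 64-bit constants below are invented, with rnd-unity64 chosen
as 2^64 - 5 by analogy with rnd-unity):
$FFFFFFFFFFFFFFFB. 2CONSTANT rnd-unity64   \ 2^64 - 5
$0123456789ABCDEF. 2CONSTANT rnd-mult64    \ made-up 64-bit multiplier
: <prng64> ( d-rnd -- d-new-rnd )
   rnd-mult64 DUM* rnd-unity64 DUM/MOD 2DROP ; \ keep the remainder, like <prng>
The rest of the code would then have to work with double-cell values
throughout.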
On Nov 28, 2:48 am, m...@iae.nl (Marcel Hendrix) wrote:
>> My point is not that the algorithm does not do what it is designed to do
>> (I don't know what it is designed to do), only that a straightforward and
>> vastly simpler approach delivers better results.
> Maybe it would help if I explained what my algorithm does. I am taking
> advantage of the fact that when PRNG is run, the high bits in the seed
> have more of an effect than the low bits do on whether the high bit of
> the seed will be 1 or 0. The high bit, of course, is the one that gets
> xor'd into the high bit of the character, which is the bit that we
> know (0 in 7-bit ascii). This is why I start with the high bit in my
> guess and work down toward the low bit.
That is very instructive, thanks. So you are using a known deficiency
of the particular PRNG you are using, and you are exploiting a known
effect this deficiency will have on plain text. Maybe it would be good
to supply a reference for that observed behavior, or a test program
that demonstrates it?
I was going by the assumption that your PRNG would be perfect, or at
least passed G. Marsaglia's Diehard test suite (see my 2006 Forth
article). In that case it is obvious that the only possible plan of
attack is a brute-force search, and your approach deeply confused me.
Although this additional piece of information removes my distaste for
your solving strategy, it does not answer all questions with respect
to its realization. The experiments I did with the original algorithm
do not show a tendency for the run-time to be bounded, unless that
bound is around 800 seconds. And even if the run-time were constant,
if the algorithm is to be useful with 64-bit PRNGs it remains to be
seen how it scales with the number of bits of the PRNG.
> If this were true, I could have written the encryption program to use
> a 64-bit seed rather than a 32-bit seed. Given a 32-bit seed, your
> exhaustive search takes about one minute to run. Given a 64-bit seed,
> an exhaustive search would take 4.3 billion minutes to run. Obviously,
> an exhaustive search would no longer be practical. By comparison, my
> algorithm would still be practical, and would run in a few minutes.
> This is because my algorithm is directed toward the correct seed. It
> does not just blindly search from low to high the way that yours does.
It is not completely blind. If the PRNG has a deficiency, different
initial seeds will generate the same cycle. This is demonstrated in
my program for seed -1, which gives the same result as seed 4 and is
therefore found immediately. Of course, there could be PRNG deficiencies
constructed that do not result in lower brute-force search times.
My program does a brute-force search because that is a better approach
for this particular problem (with a 32-bit PRNG). If you change the
conditions, the solution approach might also change. There is a Forth
truth in that.
-marcel
Before the additional information on LC53 was supplied
( <0981a799-0cbc-4875...@k13g2000prh.googlegroups.com> ),
I had already adapted the brute-force code so that it can be run in
parallel on two cores, so why not show it.
It will take a while before the ANS committee tackles these kinds of
issues, so the chance you can run it on just any Forth is nil. Just
consider this for reference.
The idea here is to run the already shown algorithm with two threads,
each on its own partition of the seed space.
( iForth 4.0 on a Core-Duo 2.66 GHz )
 .------------+-----------+-----------.
 |    seed    | bf(1) [s] | bf(2) [s] |
 `------------+-----------+-----------'
            4 |     0     |     0     |   (1) == 1 thread
   $0000FFFF  |     0     |     0     |   (2) == 2 threads
    123456789 |     1.876 |     1.974 |
   1234567890 |    18.859 |    19.801 |
   4234567890 |    65.717 |    35.252 |
           -1 |     0     |     0     |
It's nice to see that the parallel code slows down only a little from
the threading overhead. And indeed, the worst case timing drops to 50%.
On a single core machine (not shown), for most entries the run-time
worsens by more than a factor of two (searching the second half of the
seed space in parallel is fruitless but still costs 50% of the single
core's available CPU time).
-marcel
-- --------------------------------------------------------------------
\ LC53_4.frt -- Skeletal LC53.
\ This is a brute-force search that takes a maximum of 183 / 35 seconds
\ worst case on the test machines. Uses multi-threading.
NEEDS -threads
ANEW -lc53_4
#32 2^x 5 - CONSTANT rnd-unity
#32 2^x #333333333 - CONSTANT rnd-mult
1 7 LSHIFT CONSTANT |byte1
|byte1 8 LSHIFT CONSTANT |byte2
|byte2 8 LSHIFT CONSTANT |byte3
|byte3 8 LSHIFT CONSTANT |byte4
|byte1 |byte2 OR
|byte3 OR |byte4 OR CONSTANT |bytes
0 VALUE result
2VARIABLE sample
HERE VALUE seedi1
0 VALUE stop1?
#10000 ALLOT ( thread variables should not be in the same cache lines )
HERE VALUE seedi2
0 VALUE stop2?
: <prng> rnd-mult UM* rnd-unity UM/MOD DROP ; ( rnd -- new-rnd )
: rnd-bytes1 seedi1 <prng> DUP TO seedi1 ; ( -- u )
: rnd-bytes2 seedi2 <prng> DUP TO seedi2 ; ( -- u )
: show-file sample 2@ TYPE ; ( -- )
: crypt-file sample 2@ BOUNDS DO I @ rnd-bytes1 XOR I ! CELL +LOOP ; ( -- )
: free-file sample 2@ DROP FREE ?ALLOCATE ; ( -- )
: load-file ( -- ) \ sets sample
S" test.txt" R/O OPEN-FILE ABORT" *** failure to open test file ***" LOCALS| fid |
fid FILE-SIZE NIP ABORT" *** failure to determine file size ***"
DUP CELL+ ALLOCATE ABORT" *** failure to allocate memory ***"
SWAP
2DUP fid READ-FILE NIP ABORT" *** failure to read test file ***"
fid CLOSE-FILE ABORT" *** failure to close test file *** "
2DUP sample 2! + OFF ; ( erase CELL bytes at EOF )
: all1? ( seed -- bool ) \ all high bits tested as 0 => 1, else 0
TO seedi1 sample 2@ BOUNDS ?DO I @ rnd-bytes1 XOR |bytes AND IF 0 UNLOOP EXIT ENDIF CELL +LOOP 1 ;
: all2? ( seed -- bool ) \ all high bits tested as 0 => 1, else 0
TO seedi2 sample 2@ BOUNDS ?DO I @ rnd-bytes2 XOR |bytes AND IF 0 UNLOOP EXIT ENDIF CELL +LOOP 1 ;
: find-seed ( -- u )
0 TO result
0 TO stop1?
0 TO stop2?
PAR
STARTP $80000000 1 DO stop1? ?LEAVE I all1? IF I TO result -1 TO stop2? LEAVE ENDIF LOOP ENDP
STARTP 0 $80000000 DO stop2? ?LEAVE I all2? IF I TO result -1 TO stop1? LEAVE ENDIF LOOP ENDP
ENDPAR
result ;
: (test) ( seed -- )
DUP TO seedi1 TO seedi2 load-file show-file crypt-file
CR ." SAMPLE is encrypted ..."
0 TO seedi1 0 TO seedi2 TIMER-RESET find-seed .ELAPSED
?DUP IF DUP TO seedi1 CR ." The seed is: " U. CR crypt-file show-file
ELSE CR ." A complete search failed to find a valid seed."
ENDIF
free-file ;
: test #123456789 (test) ;
: tests #123456789 (test)
#1234567890 (test)
#4234567890 (test)
$FFFF (test)
4 (test)
-1 (test) ;
I have now added an implementation of ]] ... [[ in Forth200x to the
compat library <http://www.complang.tuwien.ac.at/forth/compat.zip>, in
the file compat/macros.fs. It has all the features: multiple lines
and syntactic sugar: ]]L ]]2L ]]FL ]]SL (this syntactic sugar is not
in Gforth 0.7.0, but will be in the next version).
You can look at it directly on
http://www.complang.tuwien.ac.at/viewcvs/cgi-bin/viewcvs.cgi/gforth/compat/macros.fs?view=markup
I have also written a test file (not included in compat.zip) that
tests all of these features:
http://www.complang.tuwien.ac.at/viewcvs/cgi-bin/viewcvs.cgi/gforth/test/macros.fs?view=markup
Have fun.
That's a nice way to detect that ; has occurred.
> : macro: ( -- )
> : begin
> [char] ; parse? >r
> postpone sliteral postpone evaluate
> r> while
> refill 0= abort" *** MACRO: no ; ***"
> repeat
> postpone ; immediate ;
>
I find this rather amusing:
macro: square dup * . \ This compiles and works ;
macro: square dup * . \; This fails but the macro still works
macro: square \ The following fails and the macro doesn't work ;
dup * .
;
It's not worth worrying about though: either it works but is a bit
obscure, or something fails immediately.
[...]
Gerry
> Hugh Aguilar wrote:
>> On Nov 28, 9:28 am, "Gerry" <ge...@jackson9000.fsnet.co.uk> wrote:
>>> Well at least its simpler and has fewer limitations that your
>>> original definition ( but beware Anton Ertl's forebodings :-)
>>
>> Here is yet another version of the string-expansion MACRO:. This one
>> allows for multiple lines. Those one-line macros tended to be
>> unreadable. Also, the programmer can just replace colon with MACRO:
>> and not have to edit the source-code any further.
>>
>> : parse? ( char -- adr cnt more? )
>> >in @ >r
>> parse \ -- adr cnt
>> >in @ r> - \ -- adr cnt consumed
>> over = ; \ -- adr cnt ?
>>
>
> That's a nice way to detect that ; has occurred.
No, it detects the end of the input line.
This one will detect it without the need of parsing:
: end-of-input? ( -- flag )
source nip >in @ = ;
I'm aware of those problems, but I don't have a solution. It is
unlikely that anybody is going to put a ; in a comment. A more likely
problem would occur if a person had a Forth word that contained a ; in
it, and used that word in the macro. Remember that I only wrote this
version as a workaround for gforth's problem regarding [']. I
originally wrote MACRO: to extract each word and obtain an xt for it.
I may go back to that approach again, although the result is likely to
be pretty ugly, with a significant portion of the standard word-set
being special-cased. Even without the ['] problem, a lot of words have
to be special-cased (with S" being one example).
If I were writing a compiler, I wouldn't mess around with either
solution. I would just make sure that words are compiled to use
relative jumps internally, which allows them to be relocated. If a
word is to be inlined, its machine-code representation can just be
pasted into the new word. How simple would that be? To help code
optimization, an intermediate form of the word could be compiled in
rather than the machine-code itself. Most likely, that is how Factor
does inlining. This may also be the reason why Factor lacks an EXIT
word.
I like EXIT. If you compare these files, you will see that the Forth
code is much more readable than the Factor code because the Factor
code has stair-stepped conditionals, whereas the Forth code uses EXIT.
http://www.rosycrew.org/symtab.4th
http://www.rosycrew.org/symtab.factor
See the words STRONG-SIDE and <FIND-KEY> for example.
I think that ANS-Forth should have provided macros as part of the
language. It makes no sense to expect common programmers to write
their own macro facility.
I'm always willing to explain my code. The other side of the coin
however, is that your original post should have indicated that the
algorithm was mine, and also provided a link to my program. Posting a
modified version of my program with no mention of me as the author was
uncool.
I don't understand; parse? detects whether the char being parsed for
is present or not, e.g.
char ; parse? a b c ok
. 2drop -1 ok
char ; parse? a b ; ok
. 2drop 0 ok
How does use of end-of-input? replace that functionality?
Gerry
Use of buzz words is endemic in technical writing as it alludes
to expertise. On c.l.f. "semantics" and "normative" have assumed
plague proportions :)
comp.lang.c is no different, and is pretty much understandable, like
this newsgroup. English is not my native tongue, but I do understand
the writings here, with a little effort. It is not meant for seventh
grade.
I have never seen it anywhere, except maybe as a bad example. Can you
cite a public source for this usage, e.g., a text book or a program
with published source code? That's totally unidiomatic in Forth. A
Forth programmer is much more likely to write the following instead:
: (type) type for this task ... ;
: (accept) accept for this task ... ;
... etc
: run ... whatever the task will do ... ;
: vectors ['] (type) ['] (accept) ... ['] run ;
: launch-task1 vectors start-task ;
Even more likely is to define VECTORS as a table (and resulting
changes in START-TASK), in the traditional but non-standard way as
follows:
create vectors ] (type) (accept) ... run [
>> They certainly don't expect it in Forth, because Forth is not used
>> with cpp.
>
>I wouldn't expect that a Forth macro which works in the way that the
>most commonly used macro processor works would be much of a surprise
>to a programmer working in this area.
It would be just as much of a surprise as using an obscure Chinese
grammar construction in English would be to an English speaker, even
though the Chinese grammar is the most commonly used grammar.
Also, even in C the properties of the C preprocessor that deviate from
the usual semantics of function calls are usually unwanted. But since
C came out of New Jersey [1], they are not working on making the
macros hygienic like the MIT school of Scheme and Lisp; instead, they
added a number of features to C++ that allows to avoid most uses of
cpp macros.
[1] "New Jersey" and "MIT" are references to the Worse-is-better paper
by Richard Gabriel <http://www.jwz.org/doc/worse-is-better.html>.
No.
> That's totally unidiomatic in Forth.
We may be familiar with different idioms. :-)
> A Forth programmer is much more likely to write the following
> instead:
> : (type) type for this task ... ;
> : (accept) accept for this task ... ;
> ... etc
> : run ... whatever the task will do ... ;
> : vectors ['] (type) ['] (accept) ... ['] run ;
> : launch-task1 vectors start-task ;
It all depends on how many tasks you have: without using a macro or
something similar, VECTORS has to be redefined for every task. That's
clumsy at best.
> >> They certainly don't expect it in Forth, because Forth is not used
> >> with cpp.
> >
> >I wouldn't expect that a Forth macro which works in the way that
> >the most commonly used macro processor works would be much of a
> >surprise to a programmer working in this area.
> It would be just as much of a surprise as using an obscure Chinese
> grammar construction in English would be to an English speaker, even
> though the Chinese grammar is the most commonly used grammar.
Well, our experiences differ. Without doing a representative survey,
all I can say is that it wouldn't surprise *me* in the slightest. But
you knew that already...
Andrew.
Sure, for a throwaway program these kinds of macros (bugros) may be
acceptable.
For programs that are to be maintained, however, bugros make the
maintenance harder: E.g., if you change BASE for a short region, then
you can normally determine the effect by just looking at this region.
But not if you use bugros; then you have to check every word in the
region, and if it is a bugro, you have to check the implementation of
that word, and all the words used in that implementation, and so on.
And BASE is not the only feature that is affected by bugros; word
definitions, search order, ticking, POSTPONE, STATE and other
variables are also affected.
>The big problems come when you try to provide macros that anybody can
>use anytime in the future.
Yes, that's even worse than maintaining an existing program with
bugros.
>If everybody in the future is knowledgeable and smart, they can redefine
>words to get fancy working results that you didn't anticipate, and
>nobody loses.
Sure there is a loss. The implementation of the macro can then not be
changed, or the code relying on that implementation would break.
E.g., if there is a bug or inefficiency in the implementation, it
cannot be fixed.
In Forth we have a better solution for letting users plug in
alternative words inside existing words: deferred words. There is
much better control about what is changed and how with deferred words;
and if you don't overdo the deferred words, you can change the
implementation later.
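A minimal sketch of that style, with invented names (assuming a system
that provides DEFER and IS):
defer emit-line ( c-addr u -- )        \ the plug-in point
: default-emit-line  type cr ;
' default-emit-line is emit-line
: report  s" all tests passed" emit-line ;
\ a user plugs in an alternative later, without touching report:
\   : quiet ( c-addr u -- )  2drop ;
\   ' quiet is emit-line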
As I wrote, an easy fix is to define
: ; postpone ; ; immediate
before your definition of MACRO:.
A better plan is to leave the colon definition as it is and use a
Forth system that inlines on its own (e.g., VFX).
It is a factor of parse?.
I think parse? was badly designed: it parses _and_ checks whether the
input buffer is empty. Better to use separate steps: first check whether
the input buffer is empty and refill if necessary; then look for the
word ';' with parse-name s" ;" compare or something like that.
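Something along those lines might look like this (just a sketch with
invented names; parse-name is the Forth-200x word):
: next-word ( -- c-addr u ) \ parse a word, refilling across line ends
   begin parse-name dup 0= while
      2drop refill 0= abort" *** MACRO: no ; ***"
   repeat ;
: semicolon? ( c-addr u -- flag )  s" ;" compare 0= ;
MACRO: would then loop with next-word until semicolon? is true,
compiling each piece with sliteral/evaluate as before.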
That's good, many thanks. One minor improvement to the source code
file would be to include any limitations e.g. no comments, no parsing
words ...
Gerry
Essentially, what PARSE? does is tell you whether PARSE found its
terminator (which is ; in MACRO:), or whether there was no terminator
and PARSE grabbed the entire input buffer. This is pretty crucial
information. Your solution described above involves searching for the
terminator character twice --- first to determine if it is there, and
then in PARSE to grab the data up to the terminator character. This is
redundant. What if you decide to change your terminator from ; to some
other character, but you only change one of the [CHAR] ; in your
source-code? You get a bug, that's what happens!
The idea is to put PARSE? in a WHILE loop; it provides data and it
also provides a flag that tells you if there is more input that needs
to be read. This isn't exactly the same as READ-LINE, whose flag tells
you if it had obtained data or not, but the idea of making the word
WHILE-friendly is similar.
I think that it would be confusing to have a word that tells you if
there is data remaining in the input buffer or not. If the last char
in the input buffer is ;, then PARSE is going to empty the input
buffer. Also, if there was no ; in the input buffer, then PARSE is
going to empty the input buffer. The fact that the input buffer is
empty doesn't tell you whether the last call to PARSE found a ; or
not, which is the question that you are interested in.
That will allow MACRO: to compile, which is definitely a step
forward. :-) The reason why I didn't do this is because I foresaw
trouble with FIND. If some words don't have an xt, then FIND is
presumably going to crash when it tries to provide their xt. As I
don't know which words words have an xt and which don't, and this
varies between ANS-Forth implementations, the only solution for me is
to special-case *all* of the words that seem likely to fail. I may
very well end up taking this approach eventually, but the string-
expansion MACRO: that I provided right now is a workable hack. My
primary goal right now is to just allow people to compile LC53 and run
it, so that they can criticize it. It was never my intention for a
discussion of LC53 to get diverted into a discussion of MACRO:, which
is very peripheral to the subject of encryption cracking. So far only
Mr. Hendrix has looked at the encryption-cracking aspect of LC53, or
at least he is the only one to comment on that subject.
I have heard a lot of good things about VFX. If I were to buy a
commercial Forth system, I might choose VFX rather than SwiftForth.
Unfortunately, I don't have any money right now. Besides that, LC53
was intended for public exposure, so tying it to a commercial system
would be a bad idea --- I am pretty much obliged to stick with ANS-
Forth.
So it's only for experts i.e. those who can navigate the buzz
words?
I always considered an expert to be one who can explain what
he means using the simplest of language.
It depends on the purpose of the document and its intended audience.
When I'm writing user manuals I use a very different style from that
which is common and appropriate in formal standards documents.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
But isn't that what the OP was trying to avoid? Anyway if the line is
parsed using parse-name then it indicates end of line when it returns
zero length, so your end-of-input? is unnecessary.
Gerry
We're not talking about style but the use of awkward or archaic
terminology when plain language would have been more effective.
If governments and law reform commissions can come to the
conclusion that technical documents don't need to be written
in quaint language - that indeed such language is a hindrance
and costs - then what is stopping others from seeing it?
When I read a standards document should I change hats to
demonstrate to the world that the target audience has changed?
Some divisions are simply arbitrary.
It doesn't matter very much what words are used, so long as they are
defined somewhere in the document. I agree that the words should
correspond to plain language whenever possible to help with the
readability. This isn't always possible, as some programming concepts
don't really have a parallel in the physical world, in which case a
bad analogy is worse than no analogy at all, and you are better off to
just introduce a new word and provide a definition so that people know
what you are talking about.
I don't consider the ANS-Forth document to be too bad. On the other
hand though, I was totally surprised that ['] ; crashed gforth, so
maybe my understanding of ANS-Forth was weak. A good argument can be
made, however, that no standard should contain ambiguity. What kind of
dumb standard is that? The problem isn't that the words are hard to
read, the problem is that they don't make any sense. You could write
the document in Esperanto and it still wouldn't make any sense to have
ambiguity in there. Duh!
We also have inconsistency in the ANS-Forth document. In some cases
the concept of a string means an address of a counted string (first
byte being the count, the rest being the data). In other cases the
concept of a string means a two word pair, the first being the address
and the second being the count (possibly > 256). That is just
inconsistent. It should have been one or the other throughout.
We also have words such as CREATE that have two completely different
behaviors, depending upon the circumstance. Mostly, CREATE is a
synonym for HERE CONSTANT. If there is a DOES> after the CREATE (not
necessarily in the same word) though, the CREATE word is not a
constant any more. That is really confusing!
A lot of ANS-Forth is just 30-year-old cruft. I seriously doubt that
Chuck Moore ever intended his hacks of the 1970s to be written in
stone under the imposing title of "ANSI-standard." I've never met the
man, but my impression of him is that he doesn't try to force people
into a procrustean bed of his making --- he is pretty casual about
letting people do what they want to do and he is not above learning
new tricks from reading their code, even if those programmers aren't
(for the most part) in the same league as he is.
I read a ghost story one time. There was a house with a hallway that
echoed. The strange thing about this echo however, was that only the
bad echoed. If you said something foolish or mean, it would echo back
and forth for all to hear. If you said something wise and kind, it
would fall flat as if the walls were carpeted. In many ways, ANS-Forth
is that hallway. Every mistake that Chuck Moore made in the 1970s is
echoed repeatedly as late as 2009. But every intelligent idea that he
had, is forgotten immediately. Maybe it would be different if Chuck
Moore had written the ANS-Forth document, but I doubt it --- it is
human nature to screw everything up, and Chuck Moore would have likely
done as badly as anybody --- maybe worse, as he has more ego
involvement, which is always a major factor in most screw ups. The
wisest thing Chuck Moore ever did was to refuse to get involved in
writing a standards document.
I could be wrong here, but I think Chuck may have made some
suggestions for inclusion in one of the standards many years ago, but
they weren't included for some reason.
ANS is probably going in a different direction to Chuck anyways. ANS
is about portability and general purpose use, whereas Chuck advocates
the opposite: Only code what you need for the application at hand. So
I can't see that Chuck would be particularly interested in ANS in its
current form. I guess if there was a 'Micro Forth' standard (say,
perhaps for Embedded machines) that might interest him.
Still, I shouldn't speak for him! Like you, never met him. But I find
his view points very interesting, and I largely agree with them.
That is what I am interested in. I did write a cross-compiler. I had
HOST and TARG and SYSTEM environments, with H', T' and S' for each
environment ([H'], [T'] and [S'] also). This obviated the problem in
ANS-Forth that we ran into in gforth in regard to my MACRO: --- that
there was no run-time meaning for ;, but there was a compile-time
meaning for it. In my system, only [T'] would fail for ;, but [H']
would work. That was so much better than the ANS-Forth system in which
there is just ['] that is supposed to work on every word, without
regard for whether this is in host or target. I would really like to
design a standard Forth based on my cross-compiler that would work as
described above. Even if you are compiling for a desktop computer, you
can still think of this as cross-compiling and have a host and target
environment.
ANS-Forth is trying to address both cross-compiling and desktop-
compiling, and it represents the worst of both worlds. If we focused
on cross-compiling, we could fix most of the problems in ANS-Forth. I
would be interested in a standard like that. I have no interest in
Forth-200x though, as it just gives us all of the same mistakes made
in ANS-Forth --- and more!
> Still, I shouldn't speak for him! Like you, never met him. But I find
> his view points very interesting, and I largely agree with them.
I don't speak for Chuck Moore, or for anybody else. It is just
speculation as to what he was thinking back in the 1970s when he
invented Forth. Right now we need to look forward, not backward --- we
need to start focusing on Forth's future, which is embedded
controllers as far as I am concerned.
If you had an ANS-Forth system that did not have [T'] and [H'] and
wanted that, wouldn't adding that to an ANS-Forth system do the job?
It'd still be an ANS-Forth system after all, since [T'] and [H'] have
no conflict with the standard specification.
What's this about? Are host/target systems covered by ANS Forth?
Jerry
--
Engineering is the art of making what you want from things you can get.
No. Cross compilers are not supported by Forth-94, nor in the current
draft of Forth200x. There is a cross-compiler proposal, and there are
people working on getting it ready for Forth 200x (or 201x).
> > If you had an ANS-Forth system that did not have [T'] and [H'] and
> > wanted that, wouldn't adding that to an ANS-Forth system do the job?
> > It'd still be an ANS-Forth system after all, since [T'] and [H'] have
> > no conflict with the standard specification.
> What's this about? Are host/Ttarget systems covered by ANS Forth?
I was having trouble pinning down what it's about, which is why I tried
asking a question about the part that puzzled me. I could not work out
why it is a surprise that Forth-94 standard words are insufficient for
coping with target compilers, since target compilation is not within
the standard's scope.
"This is so much better than the ANS-Forth system" did not make any
sense to me when I read it, since if "this" was added to an existing
Forth-94 system, it would still continue to be a Forth-94 system.
So what tree is Mr. Aguilar barking up?
No, although there is a published proposed separate standard for target
compilers. Neither of these words is in it, although there are
specified ways to access host and target words.
I think that it is a mistake to draw a distinction between cross-
compiling and desktop-compiling. If Forth *only* supported cross-
compiling, this would make desktop-compiling much simpler. The big
problem in ANS-Forth is that words such as ' and ['] are ambiguous
because it is not clear if you mean the compile-time action or the run-
time action. In some compilers (including mine), *all* of the words
are immediate. Most of the words' compile-time action is automatically
generated and consists of giving an xt to COMPILE, for compilation.
The run-time action is what the programmer wrote. The word has both a
compile-time and run-time xt associated with it. Other words were
explicitly written as immediate by the programmer (IF being a good
example) and the compile-time action is what the programmer wrote. My
own convention is to use the same name for both compile-time and run-time
actions. For example, IF (written in HOST mode) will compile an IF
(written in TARG mode) that executes at run-time. There is no
confusion between the two IF words, despite their having the same
name, because they are in different word-lists. If you use ' or ['] to
obtain the xt of IF, which one you get will depend upon whether you
are in HOST or TARG mode (or SYSTEM mode, although that is not used
very much). If you want to, you can explicitly specify which xt you
want by using H', T' or S' (or [H'], [T'] or [S']). The need to be
explicit like this is pretty common in my experience.
I think that Forth would be much simplified if we just assumed that
*everything* we did was cross-compilation --- because it is! This
would also allow us to generate smaller desktop executables because we
could more easily discard all of the HOST mode code without confusion
about what functions are needed at run-time, compile-time or both.
> "This is so much better than the ANS-Forth system" did not make any
> sense to me when I read it, since if "this" was added to an existing
> Forth-94 system, it would still continue to be a Forth-94 system.
My cross-compiler was written in Forth-83 with recourse to UR/Forth-
specific assembly-language. To be fair though, I could have written it
in ANS-Forth. I didn't do this because neither I nor my boss (John
Hart) knew ANS-Forth at the time (this was 1995, to the best of my
recollection).
The problem is that a cross-compiler written in ANS-Forth is
unidiomatic. The same thing is true of :NAME written in ANS-Forth ---
it is unidiomatic. Unless the standard adopts these constructs, there
is going to be a continuing inertia to ignore them and program the way
that ANS-Forth has always been programmed --- with the same problems
associated with ['] and CREATE DOES> arising over and over again.
These problems have been screwing us up for 20 years and they are
going to continue to screw us up until some hero eventually fixes the
standard. This is why I'm so turned off by the Forth-200x standard ---
there is zero interest in fixing problems, but the entire focus is on
standardizing cruft. This is a dead-end. We need to ask ourselves why
Forth has become unpopular and strive to fix the problems. The
definition of insanity is to do the same thing repeatedly and expect
different results every time, but this is what the Forth-200x
committee is doing.
> I think that it is a mistake to draw a distinction between cross-
> compiling and desktop-compiling. If Forth *only* supported cross-
> compiling, this would make desktop-compiling much simpler.
... setting this to one side ...
> The big
> problem in ANS-Forth is that words such as ' and ['] are ambiguous
> because it is not clear if you mean the compile-time action or the run-
> time action.
Certainly not. It is spelled out in tedious computer science technicalese
precisely what a Forth system is supposed to understand when it is told
' or ['] by a programmer via a program.
It is not always the thing that some particular programmer might wish it
to do, but then again, any system is of course free to add additional
types of ticks to specify distinctive things about a word in that
system ...
... Forth-94 is, after all, not an implementation standard, but rather a
communication standard, so any system that reads ' and ['] with the
meaning they have in the standard is free to have Tom' [Dick'] and
{{Harry}}' with whatever specific behaviour is desired - it is still a
Forth-94 system.
There are a few implementation options that are ruled out by the
specifics of the Forth-94 standard, but [T'] and [H'] are certainly not
examples of things that are ruled out.
It is true that a programmer can write a cross-compiler on top of ANS-
Forth, and that all of the words that he writes will have two versions
of themselves --- one in HOST mode and one in TARG mode (although a
few, such as THEN, won't have a TARG mode version). The programmer can
write his own H' and T' that return the correct xt. The problem
however, is that the underlying system is ANS-Forth and it is a
jumbled mess --- sometimes the xt is the run-time version of the word
and sometimes it is the compile-time version. Also, ANS-Forth is
currently haphazard and unspecific in its naming. What is the run-time
aspect of IF? On some systems it is 0BRANCH, on other systems it is
<IF>, and on other systems it doesn't have a name because IF just
manually compiles the necessary machine-code.
What I want is for the standard system to be more regular. The HOST
and TARG modes would be part of the standard. Words such as IF would
be defined as existing in both modes and would have distinct
definitions for each version (the HOST mode version is defined as
performing a compilation of the TARG mode IF, and also of leaving
appropriate information on the stack for use by ELSE or THEN, and the
TARG mode version is defined as performing a conditional jump
forward). This would make so much more sense than the way that ANS-
Forth is written right now, in which words such as IF have two
different definitions (run-time and compile-time), but only one xt.
How confusing is that? It seems to me that because IF has two
definitions, there should be two IF words (in different wordlists).
This would correspond perfectly with the Forth philosophy that each
word should be simple and should perform exactly *one* job.
I think that making cross-compilation an inherent aspect of Forth is
much simpler than having an ANS-Forth standard that ignores cross-
compilation, and then writing a cross-compiler as an extension. The
ANS-Forth standard is complicated and confusing because it is not a
cross-compiler --- and yet, cross-compiling is what we are essentially
*always* doing, even on a desktop-system. We could get rid of all that
"tedious technicalese" --- suddenly everything will become simple and
obvious. We won't have to depend upon Anton Ertl to interpret the
standard for us because it will be so straightforward that any novice
can understand it easily. My goal is that blue-collar folks such as
myself won't be made to look foolish by writing words like MACRO: and
then discovering that their code doesn't even compile on many of the
popular Forth systems. I'm really glad that Marcel Hendrix started
this thread about LC53 and I consequently found out that gforth spits
MACRO: out with an error message. Imagine if I had gone to a job
interview and showed LC53 as an example program, but it wouldn't even
compile on the interviewer's computer. Let's just hope that the cab
company hasn't already leased my vehicle out to somebody else, because
I'll still be needing it after all...