Well, this is turning out to be a very interesting discussion. Before I address the points you brought up, allow me some preliminaries.
The Forth community is not one of great ideas. It's very pragmatic. On one hand that's great, because it produces very usable products. On the other hand it's disastrous, because it tends to become an incoherent collection of quirks, dongles and bad decisions. AKA Forth-83. With ANS Forth they found out it's not easy to reverse all that, because there is code involved: code you have to patch in order to make it work again. There are plenty of "bad decisions" in Forth, linguistically speaking:
- DO..LOOP: we've been fighting that one for years now, and still nothing has significantly improved. I find myself using BEGIN..REPEAT more and more, because I can either put the count on the Return stack or leave it on top of the stack. DO..LOOP doesn't save me any space, and the count remains just as accessible either way;
- VALUE: all data is treated like an address, EXCEPT VALUE. Now why is that, and what does it add? Nothing. An initialization? That's how Forth-79 worked. They got rid of that, and for what? As a matter of fact, it just adds behavior to an address (i.e. retrieving the value). In 4tH, such defining words are not allowed an additional DOES> definition, because implicitly they already have one;
- No clear separation between single and double words. There is a cell, and there is a double cell. There are words that act on double words and words that act on single words. Some functionality is missing: the pictured numeric output words, for instance, are only available for double cells, so you have to convert to a double number to use them. Worse, for that reason they were added to the CORE wordset instead of the DOUBLE wordset, making the mess even greater. The same goes for 2DROP, 2DUP, 2OVER and 2SWAP.
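Nothing in the pictured-output algorithm itself needs a double cell: it repeatedly divides by the base and holds each digit from the right end of a buffer. As a rough illustration, here is a minimal C sketch of that algorithm on a single signed cell, assuming a 64-bit cell and bases 2 to 36. The `Picture` type and the `pic_*` and `format_cell` names are hypothetical and do not come from any Forth system.

```c
#include <assert.h>
#include <stdint.h>

/* Pictured numeric output on a single cell (assumed 64 bits here).
   Digits are held from the right end of the buffer, like HOLD.
   All names are hypothetical. */
#define PIC_SIZE 66            /* 64 binary digits, a sign, one spare */

typedef struct {
    char buf[PIC_SIZE];
    int  pos;                  /* index of the leftmost held character */
} Picture;

static void pic_begin(Picture *p) { p->pos = PIC_SIZE; }            /* <#   */

static void pic_hold(Picture *p, char c) {                             /* HOLD */
    assert(p->pos > 0);
    p->buf[--p->pos] = c;
}

/* #  : convert the least significant digit, return the quotient */
static uint64_t pic_digit(Picture *p, uint64_t u, unsigned base) {
    unsigned d = (unsigned)(u % base);
    pic_hold(p, (char)(d < 10 ? '0' + d : 'A' + d - 10));
    return u / base;
}

/* #S : convert digits until the quotient is zero (always at least one) */
static uint64_t pic_digits(Picture *p, uint64_t u, unsigned base) {
    do { u = pic_digit(p, u, base); } while (u != 0);
    return u;
}

/* #> : hand back the held characters as address and count */
static const char *pic_end(Picture *p, int *len) {
    *len = PIC_SIZE - p->pos;
    return p->buf + p->pos;
}

/* Format a signed cell: magnitude first, then the sign in front (SIGN). */
static const char *format_cell(Picture *p, int64_t n, unsigned base, int *len) {
    uint64_t u = n < 0 ? 0 - (uint64_t)n : (uint64_t)n;
    assert(base >= 2 && base <= 36);
    pic_begin(p);
    pic_digits(p, u, base);
    if (n < 0) pic_hold(p, '-');
    return pic_end(p, len);
}
```

For example, `format_cell(&p, -255, 16, &len)` yields the three characters `-FF`, with no conversion to a double number anywhere.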
And that's just a small collection of the idiosyncrasies that populate Forth. And nobody has the courage to correct them fundamentally. Worse, in ANS Forth they even added a few more (like the horrible FILE wordset). And instead of abstracting away the mess that is Forth string support, they declared counted strings "obsolete" without fixing anything, leaving the entire discussion in limbo.
Talking of ANS Forth: unlike C, it is not a language standard. It is an architecture standard. For the love of God, you cannot make an ANS-compliant language on just any architecture. You simply can't. You have to use the classical Forth VM. That's why the 4tH manual states: "According to the ANS-Forth standard, section 5.2.2, this system is capable of compiling:". I quote section 3.3 (all of chapter 3 consists of hard requirements for a Forth system): "Forth words are organized into a structure called the dictionary. While the form of this structure is not specified by the Standard, it can be described as consisting of three logical parts: a name space, a code space, and a data space. The logical separation of these parts does not require their physical separation."
Ever heard of a C standard requiring a symbol table? With a specific layout? That's why C interpreters exist: C is a language standard, not an architecture standard. So that was what I was working with, a badly designed architecture standard:
- Now, I rarely used double cell words, so these had to go. That meant the pictured numeric output words (<# #S #>) had to become single cell words. In order to "fake" the double cell interface, S>D and D>S became dummies;
- Some of the things I hated in Forth-83 had to go, like SIGN and +LOOP with a negative increment. To this day they behave like Forth-79;
- The 4tH Code Segment is one long parameter field. So how do you handle subroutines? Simple: you jump over them, as with AHEAD;
- The words in the Code Segment are a combination of opcode and operand, even for opcodes without an operand. Yes, that's overhead, but it's faster and simpler;
- Where do you put constants? You inline them in the code with a NOOP runtime. Yes, they need a special word to access them: @C (from the unofficial CROSS wordset);
- Where do you put string constants? And how much space do they take up? Well, we've got the source in memory. If we move the strings to the front, we end up with a segment holding all string constants. A quick resize() and we're in business;
- Note that most words are tied to a specific datatype, like C@ and C! for characters and @ and ! for integers, so we can divvy up the data space into dedicated segments. No alignment problems, and addressing each data segment is dead easy, because 1 CELLS is always one and 1 CHARS is always one. It also solves a lot of problems with strict C data typing.
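The design decisions in the list above can be pictured with a toy VM. The C sketch below is a hypothetical model, not 4tH's actual source: every Code Segment entry is an opcode/operand pair, a definition is skipped with a plain branch, an inline constant is a no-op entry whose operand @C reads, and integers and characters live in separate segments, so an address is simply an index. The opcode names and segment sizes are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; these are not 4tH's opcodes. */
typedef enum {
    OP_LIT, OP_NOOP, OP_FETCH_C, OP_FETCH, OP_STORE, OP_CFETCH, OP_CSTORE,
    OP_ADD, OP_BRANCH, OP_CALL, OP_EXIT, OP_HALT
} Opcode;

/* Each Code Segment entry pairs an opcode with an operand, used or not. */
typedef struct { Opcode op; long arg; } Word;

#define NINTS  16
#define NCHARS 64
#define DEPTH  16

typedef struct {
    const Word *code;            /* Code Segment: one long parameter field */
    long   ints[NINTS];          /* Integer Segment: address n = n-th cell */
    char   chars[NCHARS];        /* Character Segment: address n = n-th char */
    long   ds[DEPTH];  int sp;   /* data stack */
    size_t rs[DEPTH];  int rp;   /* return stack */
} VM;

static void push(VM *vm, long v) { assert(vm->sp < DEPTH); vm->ds[vm->sp++] = v; }
static long pop(VM *vm)          { assert(vm->sp > 0);     return vm->ds[--vm->sp]; }

/* Bounds-checked index into a segment of n elements. */
static long idx(long a, long n)  { assert(a >= 0 && a < n); return a; }

static long run(VM *vm, size_t ip) {
    for (;;) {
        Word w = vm->code[ip++];
        long a;
        switch (w.op) {
        case OP_LIT:     push(vm, w.arg); break;
        case OP_NOOP:    break;                       /* inline constant: data only */
        case OP_FETCH_C: push(vm, vm->code[pop(vm)].arg); break;            /* @C */
        case OP_FETCH:   push(vm, vm->ints[idx(pop(vm), NINTS)]); break;     /* @  */
        case OP_STORE:   a = idx(pop(vm), NINTS); vm->ints[a] = pop(vm); break; /* ! */
        case OP_CFETCH:  push(vm, vm->chars[idx(pop(vm), NCHARS)]); break;   /* C@ */
        case OP_CSTORE:  a = idx(pop(vm), NCHARS); vm->chars[a] = (char)pop(vm); break; /* C! */
        case OP_ADD:     a = pop(vm); push(vm, pop(vm) + a); break;
        case OP_BRANCH:  ip = (size_t)w.arg; break;   /* e.g. jump over a definition */
        case OP_CALL:    assert(vm->rp < DEPTH); vm->rs[vm->rp++] = ip; ip = (size_t)w.arg; break;
        case OP_EXIT:    assert(vm->rp > 0); ip = vm->rs[--vm->rp]; break;
        case OP_HALT:    return vm->sp > 0 ? pop(vm) : 0;
        }
    }
}
```

A program can start with `BRANCH 4`, followed by `NOOP 42` (the constant) and `ADD`, `EXIT` (the subroutine), and then fetch the constant with @C and call index 2. No dictionary, name lookup or alignment logic is involved at run time.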
The next thing was to bring a bit of consistency to the thing. Having no data types doesn't mean there are no implicit types. That became more urgent when addr/count strings were introduced, making 2DROP, 2DUP, 2OVER and 2SWAP viable concepts. Having the compiler auto-expand words like that made it possible without introducing new bytecode opcodes. But to this very day, if I need to copy two successive but unrelated stack items, I still use OVER OVER. I'm signaling to my future self: "These have nothing to do with each other. They're not a double word or a string". That's also why I'm reluctant to use a flag as anything but a flag. Not a bit mask. Not an integer. It's a flag. And that's why a flag in 4tH is either "1" or "0". Thou shalt not be tempted to use a flag as anything but a flag.
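One way to picture that auto-expansion: the compiler looks a word up in a table and emits existing single-cell opcodes in its place, so 2DUP presumably compiles to the same code as OVER OVER. The C sketch below assumes a hypothetical opcode set and table layout; it is not 4tH's actual compiler, and the 2OVER and 2SWAP expansions shown are just one valid choice. A tiny evaluator is included only to check that the expansions have the right stack effects.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-cell opcodes; OP_END terminates an expansion. */
typedef enum { OP_END, OP_DROP, OP_OVER, OP_ROT, OP_TOR, OP_RFROM, OP_PICK } Op;
typedef struct { Op op; int arg; } Instr;
typedef struct { const char *name; Instr body[5]; } Expansion;

/* Each "double" word expands into existing opcodes; no new opcode needed. */
static const Expansion expansions[] = {
    { "2DROP", { {OP_DROP, 0}, {OP_DROP, 0} } },
    { "2DUP",  { {OP_OVER, 0}, {OP_OVER, 0} } },                   /* OVER OVER */
    { "2OVER", { {OP_PICK, 3}, {OP_PICK, 3} } },                   /* 3 PICK 3 PICK */
    { "2SWAP", { {OP_ROT, 0}, {OP_TOR, 0}, {OP_ROT, 0}, {OP_RFROM, 0} } }, /* ROT >R ROT R> */
};

/* Append the expansion of word to out; -1 if word is not expandable. */
static int expand(const char *word, Instr *out, int room) {
    for (size_t i = 0; i < sizeof expansions / sizeof expansions[0]; i++) {
        if (strcmp(expansions[i].name, word) != 0) continue;
        int n = 0;
        for (const Instr *in = expansions[i].body; in->op != OP_END; in++) {
            assert(n < room);
            out[n++] = *in;
        }
        return n;
    }
    return -1;
}

/* A tiny evaluator to check the expansions (no overflow checks). */
typedef struct { long d[16]; int sp; long r[8]; int rp; } Stacks;

static void exec(Stacks *s, const Instr *code, int n) {
    for (int i = 0; i < n; i++) {
        long t;
        switch (code[i].op) {
        case OP_DROP:  s->sp--; break;
        case OP_OVER:  s->d[s->sp] = s->d[s->sp - 2]; s->sp++; break;
        case OP_ROT:   t = s->d[s->sp - 3];                 /* a b c -- b c a */
                       s->d[s->sp - 3] = s->d[s->sp - 2];
                       s->d[s->sp - 2] = s->d[s->sp - 1];
                       s->d[s->sp - 1] = t; break;
        case OP_TOR:   s->r[s->rp++] = s->d[--s->sp]; break;
        case OP_RFROM: s->d[s->sp++] = s->r[--s->rp]; break;
        case OP_PICK:  s->d[s->sp] = s->d[s->sp - 1 - code[i].arg]; s->sp++; break;
        case OP_END:   return;
        }
    }
}
```

Because the expansion is purely a compile-time affair, the bytecode interpreter never learns that "double" words exist.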
As you might have noticed, 4tH does not have a cascade of @-like words. Just one, @C, for the Code Segment, although it's a bit more intelligent than you might think. String constant addresses are compiled with a different bytecode than true integer constants, which makes @C act differently. You see, @C will always return one value. I hate words that have different stack diagrams; that's why ?DUP is not supported out of the box. But if that value is a string address, the string will have been secretly copied to PAD, and @C returns its address there.
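The @C behavior can be sketched like this, assuming the compiler tags each Code Segment entry either as an integer constant or as a reference into the String Segment. The tags, the `PAD` offset and the NUL-terminated layout of the String Segment are assumptions for brevity, not 4tH's real representation. Either way the function leaves exactly one value, which is the point of the paragraph above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tags: the compiler marks a Code Segment entry as either
   an integer constant or a reference to a string constant. */
typedef enum { OP_LITERAL, OP_STRING } Tag;
typedef struct { Tag op; long arg; } Word;

#define PAD 128   /* assumed offset of PAD in the Character Segment */

typedef struct {
    const Word *code;      /* Code Segment */
    const char *strings;   /* String Segment (NUL-terminated strings, assumed) */
    char chars[256];       /* Character Segment */
} VM;

/* @C: always leaves exactly one value. An integer constant is returned
   as is; a string constant is first copied to PAD and PAD is returned. */
static long fetch_c(VM *vm, long addr) {
    const Word *w = &vm->code[addr];
    if (w->op != OP_STRING)
        return w->arg;
    const char *s = vm->strings + w->arg;
    size_t len = strlen(s);
    assert(PAD + len < sizeof vm->chars);
    memcpy(vm->chars + PAD, s, len + 1);
    return PAD;
}
```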
C, works differently. It forces you to define an OFFSET. That creates a word with that name which, when invoked, takes an offset from the stack and returns the contents of the string constant at that position. Talking of strings, CMOVE>, CMOVE and MOVE all do the same thing: they shift bytes in the Character Segment in whatever direction you want, overlapping or not.
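Making CMOVE, CMOVE> and MOVE identical is easy to model, because C's `memmove` is specified to copy correctly whether or not the regions overlap. For contrast, the sketch also shows a classic low-to-high CMOVE, which smears the first byte across an overlapping destination that lies above the source. The function names are hypothetical, and the bounds checks stand in for whatever range checking a real system performs.

```c
#include <assert.h>
#include <string.h>

/* Classic CMOVE: copy n bytes one at a time from low to high addresses.
   If the destination overlaps the source from above, bytes propagate. */
static void classic_cmove(char *seg, size_t from, size_t to, size_t n) {
    for (size_t i = 0; i < n; i++)
        seg[to + i] = seg[from + i];
}

/* One routine for all three words: memmove copies as if through a
   temporary buffer, so overlap in either direction is safe. */
static void char_move(char *seg, size_t segsize, size_t from, size_t to, size_t n) {
    assert(from <= segsize && n <= segsize - from);   /* source inside segment */
    assert(to <= segsize && n <= segsize - to);       /* target inside segment */
    memmove(seg + to, seg + from, n);
}
```

Shifting "abcde" up by one byte gives "aaaaa" with the classic loop, but "aabcd" with `char_move`; shifting back down with `char_move` gives "abcdd".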
Files in 4tH are dead easy. Open one: if you treat it like a text file, it is a text file. Treat it like a binary file, and it is a binary file. You don't have to learn any special input or output words; just use the ones you use for the terminal. They'll do fine. Another thing: use the Windows version and it will write Windows text files. Use the Linux version and it will write Linux text files. Reading Windows files under Linux works exactly the same as reading Linux files under Windows. That's all handled for you in the background.
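One way to get that line-ending behavior in C, assuming text input is read in binary mode and output goes through a text-mode stream: on reading, both LF and CRLF terminators are accepted and the CR is dropped; on writing, the C runtime emits the platform's native terminator (in text mode on Windows, `\n` is written as CRLF). This is a hypothetical sketch, not 4tH's code, and `read_line` and `write_line` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from a stream opened in binary mode ("rb"), accepting
   both LF and CRLF terminators. Returns the line length, or -1 at end of
   file. Characters that do not fit in buf are discarded, not overflowed. */
static long read_line(FILE *fp, char *buf, size_t size) {
    size_t n = 0;
    int c;
    assert(size > 0);
    while ((c = fgetc(fp)) != EOF && c != '\n')
        if (n + 1 < size) buf[n++] = (char)c;
    if (c == EOF && n == 0) return -1;
    if (n > 0 && buf[n - 1] == '\r') n--;   /* CRLF (Windows): drop the CR */
    buf[n] = '\0';
    return (long)n;
}

/* Write one line to a stream opened in text mode ("w"). The C runtime
   turns '\n' into the platform's own terminator. Returns 0 on success. */
static int write_line(FILE *fp, const char *line) {
    size_t len = strlen(line);
    if (fwrite(line, 1, len, fp) != len) return -1;
    return fputc('\n', fp) == EOF ? -1 : 0;
}
```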
Use EXPECT as you would in "Starting Forth" (on the terminal) and it will work as expected. Use it to fill a binary file buffer, and it will work as expected. Try to blow it up, and (most likely :-) it will just return an error message instead of bombing inelegantly. So much for the lengthy story of how I tried to make Forth friendlier and more consistent.
However, IMHO it still helps if you know what's happening under the hood. I never think in terms of immediacy, vocabularies, dictionaries, CFA, PFA, or LFA. I think in segments and operands. Not in dictionary entries, but in symbols. It's the translation of those abstractions that actually makes it work. In other words, everything still has its place, but it's a different place. So I can wholeheartedly agree with your: "in a Forth you can play with dictionary contents and word contents in a way you cannot in 4tH". E.g. in 4tH you have no idea of the name of a word at runtime. That was in the symbol table, and that baby is gone.
I agree Chuck is a very wise and clever man, and I wish he would communicate in much greater and more fundamental detail than he already does. Because I don't think the choice of stack manipulation words over a more symbolic implementation was an accident. I think it was a very deliberate choice. Why? Because he didn't implement anything like that in colorForth, although it was more than obvious how to do it after that avalanche of papers on the subject. Another interesting detail: he often expresses "performance" in terms of time to deliver ;-)
One last thing about the REPL and then I'm gone. I don't feel the REPL was such a good thing. In 1985, if you FAFO'd, you got a blank screen and the copyright message. So I never found it that useful. Nowadays, if I work with gForth, I either need to set a MARKER so I can clear things later, or exit and re-enter (if I want a blank slate). With 4tH I usually invoke it from either Kate or gEdit, which gives me a terminal window. So I run it, and when it bombs, I use the window above to edit the source. Click the window below, do a quick <Arrow up><Enter>, and it executes again. In gForth it's hardly any different from 4tH, not counting the additional BYE.
Hans Bezemer