Hi 4tH-ers!
I promised you some insight on how these strings work. First of all, it didn't come easy. I studied other BASIC implementations and at one time I was almost ready to add support myself until I realized that it would not only take massive changes to the interpreter itself, but also would not work with our current implementation of functions. The classic implementation is realized by introducing an entire new datatype with it's own parser. Not that every expression in uBasic is resolved by a recursive descent parser - and strings would need their own. Expressing some in functions helps, e.g. CONCAT() will eradicate the need for the overloading of "+" to join strings, but it would still be messy. But it wasn't a too bad idea itself..
For work, I did some Python programming and quickly learned that strings are immutable. At first I discarded the own idea and did not really mount my already low regard for the language, but it would prove to be an essential component to the solution.
Another part was when I realized that functions returning string pointers to immutable string could really work. But where would these strings be stored? At first I thought of a circular buffer with immutable strings, where you could manipulate the pointers a bit as long as you stayed within the designated area. You could do little harm there. I wrote a 4tH library, but later I abandoned the idea, because a program of considerable size would show strange behavior when long lived strings would be overridden.
Instead I went for a poor mans garbage collection. The current implementation looks a lot like the ANSMEM.4TH library, where an area of memory is divided in PARAGRAPH's - If you request memory, you don't get exactly the amount of memory you requested, but a multiple of a paragraph, usually rounded up. A table, the Heap Allocation Table, keeps track which paragraph is taken by which request. It's primitive, but it works.
But note that dynamic memory allocation is pretty vulnerable, since freeing put memory is delicate and easily corrupted - not quite the robustness you want for a beginners language. So this implementation divides an area up in fixed size strings, to be precise: 256 strings with a maximum length of 64 chars. Note that all kinds of "clippers" are in place that secretly keep user strings confined to these restrictions. And since we use immutable strings, almost each and every function creates a new string, which is allocated in its own space - perfectly fitting and perfectly aligned. E.g. CLIP(PUT("Hello world!"),1) keeps the original string intact and creates another with one character less.To allow for a quick distribution of free addresses I created a "freemap", containing all free addresses. When a new string is needed, the value under the pointer is submitted and a pointer to the "freemap" incremented.
Now we were getting there. A valid string pointer was:
- Within the designated area;
- Aligned at a 64 byte offset.
But there were things missing. You run out of strings pretty quickly. Some garbage collection was required. What I did is easy: When string memory got low I tested all variables (locals, globals and array elements) for addresses which fit that profile. Those strings were kept and the others discarded. This works as follows: first, the entire "freemap" is reinitialized. Then all variables are tested for strings addresses and these are taken from the "freemap". Finally, the "freemap" is compacted and the failed request is repeated.
However, although there are only 256 valid string addresses, they are all located in the low positive numbers, e.g. 10240, etc. The chances that "normal" values are recognized as strings is pretty high. Not that it has immediate adverse effects - a string will be kept that could be freed, that's it. But to minimize this possibility I decided to set the sign bit, so these addresses would move to the low -2000000000 range, where user programs dwell with a much lower frequency and probability. True, hashes come there, but are rarely stored in variables, but instead passed as labels. Added advantage: the large negative number really discourages pointer manipulation. Doing so almost always results in an address that is no longer recognized as a string by uBasic.
So these test became much more precise and effective:
- Is the sign bit set;
- Within the designated area;
- Aligned at a 64 byte offset.
The interpreter simply strips the sign bit before use. Still there was a remote possibility that during a long sequence of string manipulations under low memory conditions the GC would kick in. In itself, that's not too bad. Addresses of temporary strings are still valid, but are marked for reuse. It could be that some temporary string would overwrite another temporary string it is dependent on. So, I added a function, FREE(), which does not only allow inspection of the available free strings, but also preemptive call it, e.g.
IF FREE(0) < 16 THEN a@ = FREE(_All)
May be not a too elegant solution, but IMHO the most robust one. If all fails, the thing gives up, of course. Another nice thing: due to tail call optimization the repeated request does not take up additional stack space.
Whatever you think of all this, most is transparent to the user. And you can believe this or not, but apart from some tables not a line of the original uBasic program has changed. Everything is simply added to the existing source.
Hans Bezemer