uBasic has got strings!

32 views
Skip to first unread message

The Beez

unread,
Jun 27, 2019, 4:57:06 PM6/27/19
to 4tH-compiler
Hi 4tH-ers!

You all know I have a weak spot for uBasic. It's already late down here and I've terminated my first tests successfully just minutes ago. I've accomplished it by adding a little more than a dozen new functions. Note that strings are first class citizens, so you can pass them as parameters and receive them as function returns.

As you might expect, it's a kind of elegant, but a bit oddball implementation, with much more details later. All I can tell you now that these strings are immutable (as in Python) with a light weight garbage collector, which is doing pretty good on its own, but can also be manipulated manually. Bottom line: if you've assigned a string to a variable (ANY variable - global, local, array element), it will be just fine.

A quick peek:

a = put ("Hello")
b = put (" world")
c = join (join (a, b), char (33))

print show (c), len (c), free (0)
free (_All)               ' This will trigger the GC
                              ' but it's not required at all
print peek (c, 3), comp (c, put ("Hello world!"))
d = ask ("Is this cool? ")

I'm curious about your thoughts!

Hans Bezemer

The Beez

unread,
Jun 28, 2019, 2:20:36 AM6/28/19
to 4tH-compiler
Hi 4tH-ers!

I promised you some insight on how these strings work. First of all, it didn't come easy. I studied other BASIC implementations and at one time I was almost ready to add support myself until I realized that it would not only take massive changes to the interpreter itself, but also would not work with our current implementation of functions. The classic implementation is realized by introducing an entire new datatype with it's own parser. Not that every expression in uBasic is resolved by a recursive descent parser - and strings would need their own. Expressing some in functions helps, e.g. CONCAT() will eradicate the need for the overloading of "+" to join strings, but it would still be messy. But it wasn't a too bad idea itself..

For work, I did some Python programming and quickly learned that strings are immutable. At first I discarded the own idea and did not really mount my already low regard for the language, but it would prove to be an essential component to the solution.

Another part was when I realized that functions returning string pointers to immutable string could really work. But where would these strings be stored? At first I thought of a circular buffer with immutable strings, where you could manipulate the pointers a bit as long as you stayed within the designated area. You could do little harm there. I wrote a 4tH library, but later I abandoned the idea, because a program of considerable size would show strange behavior when long lived strings would be overridden.

Instead I went for a poor mans garbage collection. The current implementation looks a lot like the ANSMEM.4TH library, where an area of memory is divided in PARAGRAPH's - If you request memory, you don't get exactly the amount of memory you requested, but a multiple of a paragraph, usually rounded up. A table, the Heap Allocation Table, keeps track which paragraph is taken by which request. It's primitive, but it works.

But note that dynamic memory allocation is pretty vulnerable, since freeing put memory is delicate and easily corrupted - not quite the robustness you want for a beginners language. So this implementation divides an area up in fixed size strings, to be precise: 256 strings with a maximum length of 64 chars. Note that all kinds of "clippers" are in place that secretly keep user strings confined to these restrictions. And since we use immutable strings, almost each and every function creates a new string, which is allocated in its own space - perfectly fitting and perfectly aligned. E.g. CLIP(PUT("Hello world!"),1) keeps the original string intact and creates another with one character less.To allow for a quick distribution of free addresses I created a "freemap", containing all free addresses. When a new string is needed, the value under the pointer is submitted and a pointer to the "freemap" incremented.

Now we were getting there. A valid string pointer was:
- Within the designated area;
- Aligned at a 64 byte offset.

But there were things missing. You run out of strings pretty quickly. Some garbage collection was required. What I did is easy: When string memory got low I tested all variables (locals, globals and array elements) for addresses which fit that profile. Those strings were kept and the others discarded. This works as follows: first, the entire "freemap" is reinitialized. Then all variables are tested for strings addresses and these are taken from the "freemap". Finally, the "freemap" is compacted and the failed request is repeated.

However, although there are only 256 valid string addresses, they are all located in the low positive numbers, e.g. 10240, etc. The chances that "normal" values are recognized as strings is pretty high. Not that it has immediate adverse effects - a string will be kept that could be freed, that's it. But to minimize this possibility I decided to set the sign bit, so these addresses would move to the low -2000000000 range, where user programs dwell with a much lower frequency and probability. True, hashes come there, but are rarely stored in variables, but instead passed as labels. Added advantage: the large negative number really discourages pointer manipulation. Doing so almost always results in an address that is no longer recognized as a string by uBasic.

So these test became much more precise and effective:
- Is the sign bit set;
- Within the designated area;
- Aligned at a 64 byte offset.

The interpreter simply strips the sign bit before use. Still there was a remote possibility that during a long sequence of string manipulations under low memory conditions the GC would kick in. In itself, that's not too bad. Addresses of temporary strings are still valid, but are marked for reuse. It could be that some temporary string would overwrite another temporary string it is dependent on. So, I added a function, FREE(), which does not only allow inspection of the available free strings, but also preemptive call it, e.g.

IF FREE(0) < 16 THEN a@ = FREE(_All)

May be not a too elegant solution, but IMHO the most robust one. If all fails, the thing gives up, of course. Another nice thing: due to tail call optimization the repeated request does not take up additional stack space.

Whatever you think of all this, most is transparent to the user. And you can believe this or not, but apart from some tables not a line of the original uBasic program has changed. Everything is simply added to the existing source.

Hans Bezemer

The Beez

unread,
Jun 30, 2019, 4:28:37 PM6/30/19
to 4tH-compiler
Hi 4tHers!

The whole shebang (including example programs) is committed to SVN. With a little trouble you might get it to run on v3.62.5, 

Hans Bezemer

The Beez

unread,
Jul 3, 2019, 3:32:45 PM7/3/19
to 4tH-compiler
Hi 4tH-ers!

I've added the NAME() function to 4tH. That will convert a string to a label, e.g.

   PRINT _Hello, NAME(DUP("Hello"))

Will print the same number twice (the underscore is not part of the label, just it's prefix).

So - what can you do with it. Like an RPN calculator in Basic in page of code?


Hans Bezemer

The Beez

unread,
Jul 4, 2019, 5:07:09 AM7/4/19
to 4tH-compiler
You know what?

The fun of making your own programming languages is that you can add things whenever you want. Up to now uBasic had no way to validate a label. Now it has, so my RPN calculator doesn't bomb out anymore when you issue an invalid command.

Hans Bezemer
Reply all
Reply to author
Forward
0 new messages