A question about string stacks
I have been reading the discussion of string stacks, and have a
question.
Why?
What existing applications have used this mechanism?
What applications are intended?
In my experience with string processing in all programming languages,
_scanning_ and _cataloging_ are what is needed.
(Maybe it's just me.)
-------------------------------
In the 1980s Ray Duncan used this technique for string pads. It was
widwly adopted.
CREATE SHEAD 0 ,
CREATE SBUF 896 128 + ( = 1024) chars ALLOT
: SPAD ( -- addr ) SHEAD @ 896 AND chars SBUF + ;
: >SPAD ( str len -- addr ) 128 SHEAD +! SPAD dup >R PLACE R> ;
You may want to double the literal values. In my work, strings are
words, phrases, and lines, and this size holds them.
--
Wil Baden Costa Mesa, California Per neil...@earthlink.net
String processing is not a significant part of most embedded systems
programming. When you're talking to a machine, you generally know ahead of
time what you need to tell it.
When string processing does become significant, a string stack has the
same benefits which the data stack has. It provides an reasonably efficient
reentrant mechanism for passing data between functions.
--
-GJC
-gch...@TheWorld.com
-War is the last resort of the incompetent.
> When string processing does become significant, a string stack has the
> same benefits which the data stack has. It provides an reasonably efficient
> reentrant mechanism for passing data between functions.
I guess it is me. I don't see a use for the string stack machinery
that was discussed.
--
Wil Baden
I'm with you, Wil. We have occasionally had string-intensive
applications, and the data stack worked fine for passing string
parameters.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
> I have been reading the discussion of string stacks, and have a
> question.
> Why?
I haven't actually used any of this, but I can see some plausibility to
it.
So far I've seen three approaches.
1. Keep strings in a string area, do garbage collection. Pass string
addresses on a string stack that's separate from the data stack. (As
near as I could tell, the garbage collection method could be made to
work even if the string address were kept on the data stack. The
separate stack is an extra flourish.)
2. Keep strings on a literal string stack and copy actual strings with
MOVE when you need new copies that will be consumed by string
operations.
3. Define a single 1K buffer that holds 8 127-byte strings.
#3 could be generalised to use blocks, of course.
What the first two methods give you is an easy way to reclaim
no-longer-needed string buffers. When the last stack item is gone, so
is the string.
It makes sense to me that if you can mostly get strings out of the input
buffer or a file buffer and do whatever you're going to do to those,
then the buffer will get reclaimed with no problem. You know when
you've finished with it, and when you're done you can refill it.
Similarly, if what you're doing is filling up a single output buffer,
you can just get it the way you want it and send it to output and then
reclaim it. You know when you're done with it and you're only
interested in one at a time.
So the place for garbage collection would seem to involve a lot of
temporary strings, where you don't know ahead of time just how temporary
each of them is. I wouldn't be surprised if there are a lot of
applications like that. I also wouldn't be surprised if many of the
applications that look like they'd need that, can be rethought so they
don't.
>Why?
Because they make the programmers' lives easier. Because it makes
the separation of the system layers much easier to maintain.
>What existing applications have used this mechanism?
The CCS applications (approx 1 million lines of source code) use
them heavily. The application handles very large structured
documents. The standard string unit in this application requires
a 16 bit count, and 32 bit counts are also used.
Stephen
--
Stephen Pelc, s...@mpeltd.demon.co.uk
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
I use a cyclic buffer that can serve 8 strings of 255 bytes (but
perhaps 127 would be enough).
I've a word NEW-STRING that allocates such a buffer
and strings can stored there with PLACE APPEND and APPEND-CHAR
The fact that the buffers are cyclic has bitten me once or twice, but
I don't encounter more than 8 strings at once (in my apps that is..)
Coos Hak
I am afraid I must agree. IMO putting literal strings on
a stack is a terrible waste of cpu time, both in copying/
moving the strings and in the book keeping overhead (since
the strings are of differing lengths). Even doing it for
short strings is wasteful. How do I know? My first attempt
at a FORmula TRANslator did it that way. (I was afraid I
might overwrite the input buffer.) Now, of course, the input
formula is kept in a buffer and pointers to the fragments
are put on a stack. If I were to use a tree for parsing,
rather than a stack, I would still only put pointers in
the leaves, not the literal string fragments themselves.
--
Julian V. Noble
Professor of Physics
j...@virginia.edu
"Science knows only one commandment: contribute to science."
-- Bertolt Brecht, "Galileo".
> On Thu, 16 May 2002 14:44:40 GMT, Wil Baden <neil...@earthlink.net>
> wrote:
>
> >Why?
> Because they make the programmers' lives easier. Because it makes
> the separation of the system layers much easier to maintain.
>
> >What existing applications have used this mechanism?
> The CCS applications (approx 1 million lines of source code) use
> them heavily. The application handles very large structured
> documents. The standard string unit in this application requires
> a 16 bit count, and 32 bit counts are also used.
Thank you. But tell a little more, please?
What is CCS?
Why a stack of strings?
--
Wil Baden
This makes me all the more curious for a real life example
using string stacks. I have also always considered it as -- to put it
politely -- a programming exercise.
So again, applications, anyone?
>Elizabeth D. Rather (US & Canada) 800-55-FORTH
Greetings Albert
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
To suffer is the prerogative of the strong. The weak -- perish.
alb...@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst
I can't think of an application which would make a good example without
being too big and complex to use as an example.
I'll instead offer the whole Quest32 system as an example of how useful
a string stack can be when it is well integrated into the kernel.
>In article <3ce41928...@192.168.0.1>, Stephen Pelc
CCS = Construction Computer Software.
See www.ccssa.com for more details. This is a construction planning
package used all over the world. It was used to plan the construction
of the Hong Kong metro and the Hong Kong airport.
Like so many of our clients, CCS do not lurk on this newsgroup, but
they are usually at the EuroForth conference.
I can appreciate that a large and complex application is needed to
amortize a string stack package. I asked for an example. Not to post
the source code.
> I'll instead offer the whole Quest32 system as an example of how useful
>a string stack can be when it is well integrated into the kernel.
To me ``Quest32'' is just a name. I vaguely remember it being mentioned.
What kind of application is it? Sounds like an adventure game.
Maybe you can tell something about the advantages over alternatives
or some such.
>-GJC
Greetings Albert.
It's my Windows development system. It's written in a Forth dialect
which has evolved in a very different direction then ANS Forth has.
Probably string handling is the area that has diverged the most (it actually
admits that strings exist!).
It may be possible to do garbage collection without a string
stack, but I don't think it's a good idea. Not just for
implementation reasons--to me, that's a conceptual complication.
In my scheme I worried a lot about when to actually copy strings
and when not to. What emerged was that the string stack
provides a mechanism for _avoiding_ copying when you want to
manipulate the same string in various ways. Having the same
string in multiple variables, on the other hand requires
copying. This amplifies a bit the corresponding difference between
variables and the data stack in Forth.
> So the place for garbage collection would seem to involve a lot of
> temporary strings, where you don't know ahead of time just how temporary
> each of them is. I wouldn't be surprised if there are a lot of
> applications like that. I also wouldn't be surprised if many of the
> applications that look like they'd need that, can be rethought so they
> don't.
I basically agree, except that it's also handy when you do know
ahead of time how temporary the strings are. I agree about the
possibility of rethinking to avoid the need, but there's no
point, given that you have a reliable and convenient system.
That of course is the rub! It comes down to whether it's cost
effective to expend the effort to ensure reliability.
-- David
> > [...]
> > 1. Keep strings in a string area, do garbage collection. Pass
> > string addresses on a string stack that's separate from the data
> > stack. (As near as I could tell, the garbage collection method
> > could be made to work even if the string address were kept on the
> > data stack. The separate stack is an extra flourish.)
> It may be possible to do garbage collection without a string
> stack, but I don't think it's a good idea. Not just for
> implementation reasons--to me, that's a conceptual complication.
When I thought about it more, it wasn't as simple as I'd imagined. I
figured that with each string you kept a pointer to the lowest item on
the string stack that used it. When you consumed that lowest item you
no longer needed the string. You could do the same thing keeping the
lowest item on the data stack that uses it, provided you didn't do
things like SWAP and ROT . If you move that last item upward then when
it's consumed you won't know it's the last item. And if you move it
down then you'll think it's gone before it is. Sorry, I didn't think it
through.
> In my scheme I worried a lot about when to actually copy strings
> and when not to. What emerged was that the string stack
> provides a mechanism for _avoiding_ copying when you want to
> manipulate the same string in various ways. Having the same
> string in multiple variables, on the other hand requires
> copying. This amplifies a bit the corresponding difference between
> variables and the data stack in Forth.
I like the argument that unless you have very long strings or a slow
processor, you won't notice the string copying anyway. A string stack
with actual strings on it is conceptually even simpler.
> > So the place for garbage collection would seem to involve a lot of
> > temporary strings, where you don't know ahead of time just how
> > temporary each of them is. I wouldn't be surprised if there are a
> > lot of applications like that. I also wouldn't be surprised if many
> > of the applications that look like they'd need that, can be
> > rethought so they don't.
> I basically agree, except that it's also handy when you do know
> ahead of time how temporary the strings are. I agree about the
> possibility of rethinking to avoid the need, but there's no
> point, given that you have a reliable and convenient system.
> That of course is the rub! It comes down to whether it's cost
> effective to expend the effort to ensure reliability.
Yes. If you know that you'll be done with every string you allocate
inside the current definition by the time the current definition is
over, you could just ALLOT each of them and then negative-allot at the
end. If you have ALLOCATE and FREE you could let the system do the
garbage-collection for you, provided you don't mind keeping track of
when to get rid of each one.
If you don't have strings longer than 1022 chars, you could put them
into block buffers. If it's very temporary use BUFFER , otherwise use
BLOCK . If you don't like the waste of copying to mass storage, put all
your extra memory into block buffers. Or put your block file into a
ramdisk. If you can declare your own block file then when you're done
you can copy whatever final strings you need to keep and then delete the
file.
I'm beginning to see why there's no standard practice. Unless your
needs are complex you can get by without much string manipulation at
all. And if your needs are complex there are so many ways to do it that
no two people are likely to come up with the same methods. And lots of
methods are pretty good, so there isn't one obvious best way.
So this is totally besides the point. Rather c.s. argue that there
are no *applications* needing a string stack of such complexity.
If you can mention *applications* written using this development
system, I would be interested.
By the way ciforth also admits that strings exists. Those strings
are no add-on but are instrumental in defining the Forth itself.
It does away with about as many words as are added (but I must
leave them in, because they are standard.) In other words no
added complexity. This is approximately the same wordset
Bernd Paysan uses.
In other words: I'm still not convinced.
>-GJC
Groetjes Albert.
Yes, but more importantly, the complex situation that demands special
string handling probably has an intrinsic "shape" to its needs that
will heavily influence an optimal solution, so that solution isn't
general. And a general solution will likely fail to satisfy the needs
of this particular complex situation without extensive mods.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH