> On Nov 13, 1:00 am, "Elizabeth D. Rather" <erat...@forth.com> wrote:
> > On 11/12/12 2:49 PM, Rod Pemberton wrote:
> > ...
> > > BTW, this is the original definition of COUNT:
> > > " COUNT addr1 --- addr2 n L0
> > > Leave the byte address addr2 and byte count n of a message
> > > text beginning at addr1. It is presumed that the first
> > > byte at addr1 contains the text byte count and the actual
> > > text starts with the second byte. Typically COUNT is
> > > followed by TYPE."
> > > It's also more accurate. It doesn't assume counted strings.
> > I'm not sure what "original" means in this context. COUNT has been
> > around since 1971, which is 7 years before the first, very preliminary,
> > attempt at standards. In any case, the sentence, "It is presumed that
> > the first byte at addr1 contains the text byte count and the actual text
> > starts with the second byte" is an accurate description of a counted string.
> > Cheers,
> > Elizabeth
> > --
> > ==================================================
> > Elizabeth D. Rather (US & Canada) 800-55-FORTH
> > FORTH Inc. +1 310.999.6784
> > 5959 West Century Blvd. Suite 700
> > Los Angeles, CA 90045
> > http://www.forth.com
> > "Forth-based products and Services for real-time
> > applications since 1973."
> > ==================================================
> The definition of COUNT is quite interesting. It actually indirectly,
> and presumably un-intentionally mandates the format of a string in
> Forth. Here's the ANS definition:
> 6.1.0980 COUNT
> CORE
> ( c-addr1 -- c-addr2 u )
> Return the character string specification for the counted string
> stored at c-addr1. c-addr2 is the address of the first character after
> c-addr1. u is the contents of the character at c-addr1, which is the
> length in characters of the string at c-addr2.
> Note the last line: u *is* the contents of the character at c-addr1...
> So, there's no way around it in actual fact. If you wanted to
> implement strings in a different way under the covers, you'd be
> prevented from doing so. It's slightly off topic, but I thought it
> interesting.
> There is no clue to the format of a string in memory in the ANS
> definition of S" :
> 6.1.2165 S"
> s-quote CORE
> Interpretation: Interpretation semantics for this word are undefined.
> Compilation: ( "ccc<quote>" -- )
> Parse ccc delimited by " (double-quote). Append the run-time semantics
> given below to the current definition.
> Run-time: ( -- c-addr u )
> Return c-addr and u describing a string consisting of the characters
> ccc. A program shall not alter the returned string.
> That's disappointing. If the format of a string is mandatory, then the
> appropriate place to describe it (or refer to it) is within the
> definition of the word S" not COUNT.
On second thoughts, perhaps it's just me not reading the standard in
enough detail. In fairness, the stack picture given in S" does clearly
say:
c-addr u
That is *c-addr* - i.e. the address of a *character*. If I interpret
that correctly, then S" *must* return the address of a *character*,
not the address of a "thing" (for example, the address of entry into
an array of string pointers).
So, whilst the format of a Forth string isn't explicitly described in
English in the standard in the description of S" it *is* there. It's
in the stack sig. The devil is in the details, as they say.
> On Nov 13, 8:50 am, Mark Wills <forthfr...@gmail.com> wrote:
>> [...]
> On second thoughts, perhaps it's just me not reading the standard in
> enough detail. In fairness, the stack picture given in S" does clearly
> say:
> c-addr u
> That is *c-addr* - i.e. the address of a *character*. If I interpret
> that correctly, then S" *must* return the address of a *character*,
> not the address of a "thing" (for example, the address of entry into
> an array of string pointers).
> So, whilst the format of a Forth string isn't explicitly described in
> English in the standard in the description of S" it *is* there. It's
> in the stack sig. The devil is in the details, as they say.
A couple of important points:
The definition of 6.1.0980 COUNT that you cite explicitly states that it is for a "counted string" which is defined in 2.1 Definitions of Terms: "counted string: A data structure consisting of one character containing a length followed by zero or more contiguous data characters. Normally, counted strings contain text."
The definition of S" however makes *no mention* of counted strings, and specifies absolutely nothing about its storage format. The similar word C" *does* return the address of a counted string; that is the difference between them.
Counted strings have been around Forth for a long time (since 1970), because they're a very efficient format that's useful for a wide variety of things, and is still widely used internally. But this format is not *mandated* except in specific circumstances such as C" above.
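As a minimal sketch of that difference (the word names DEMO-COUNTED and DEMO-PAIR are made up for illustration, and C" comes from the CORE EXT word set, so a given system may not provide it):

```forth
\ Sketch only: DEMO-COUNTED and DEMO-PAIR are hypothetical names.
: DEMO-COUNTED ( -- )
  C" Hello"        \ ( c-addr )     address of the count character
  COUNT            \ ( c-addr2 u )  first text character and length
  TYPE ;

: DEMO-PAIR ( -- )
  S" Hello"        \ ( c-addr u )   already an address/length pair
  TYPE ;

DEMO-COUNTED CR
DEMO-PAIR CR
```

Both words display the same text. Only the C" version is guaranteed to have a count character in memory in front of it.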
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
Rod Pemberton <do_not_h...@notemailnotz.cnm> wrote:
> "Andrew Haley" <andre...@littlepinkcloud.invalid> wrote in message
> news:lo6dnQzHTZphWD3NnZ2dnUVZ8qqdnZ2d@supernews.com...
>> Rod Pemberton <do_not_h...@notemailnotz.cnm> wrote:
>> > "Hugh Aguilar" <hughaguila...@yahoo.com> wrote in message
>> >> COUNT does consume adr1 --- and then it gives you back
>> >> some other data
>> > That's a matter of interpretation. len is definitely different.
>> >> (adr2 is not the same datum as adr1).
>> > False. "adr2 is" _typically_ "not the same datum as adr1", but can be.
>> > There is no requirement that adr2 be different from adr1.
>> Of course there is, because the count is at c-addr1, and c-addr2 is
>> the start of the string:
>> 6.1.0980 COUNT
>> CORE
>> ( c-addr1 -- c-addr2 u )
>> Return the character string specification for the counted string
>> stored at c-addr1. c-addr2 is the address of the first character after
>> c-addr1. u is the contents of the character at c-addr1, which is the
>> length in characters of the string at c-addr2.
>> > There is only a requirement that COUNT returns an address to the
>> > start of the string. Historically, since counted strings are used
>> > in Forth, where the count precedes the string, they are different.
>> > But, if no count precedes the string, they'll be the same.
>> The count precedes the string, as COUNT's glossary entry makes clear.
> Did you just pop in and ignore *EVERYTHING* that was said
> previously, i.e., including the now snipped context? You *keep*
> responding near the end of a thread after relevant context for your
> reply has been snipped. Please, read the entire thread first.
I'm responding to what I quoted; nothing more, nothing less. If you
disagree with the part you wrote that I responded to, feel free to say
so.
> BTW, this is the original definition of COUNT:
> " COUNT addr1 --- addr2 n L0
> Leave the byte address addr2 and byte count n of a
> message text beginning at addr1. It is presumed that
> the first byte at addr1 contains the text byte count
> and the actual text starts with the second byte.
> Typically COUNT is followed by TYPE."
> It's also more accurate.
In what way is it more accurate than the standard definition?
>On Nov 13, 1:00 am, "Elizabeth D. Rather" <erat...@forth.com> wrote:
>> [...]
>The definition of COUNT is quite interesting. It actually indirectly,
>and presumably un-intentionally mandates the format of a string in
>Forth. Here's the ANS definition:
>6.1.0980 COUNT
>CORE
> ( c-addr1 -- c-addr2 u )
>Return the character string specification for the counted string
>stored at c-addr1. c-addr2 is the address of the first character after
>c-addr1. u is the contents of the character at c-addr1, which is the
>length in characters of the string at c-addr2.
>Note the last line: u *is* the contents of the character at c-addr1...
>So, there's no way around it in actual fact. If you wanted to
>implement strings in a different way under the covers, you'd be
>prevented from doing so. It's slightly off topic, but I thought it
>interesting.
Well, no. My forth implementation revolves around strings with a
cell count. I.e. even on a 64 bit system
' APE >NFA @
points to a ciforth-regular string stored in memory.
So a subsequent fetch ( @ ) gives the length of the string.
A $@ gives something to be passable to TYPE.
COUNT is aliased as $@-BD .
I feel not much constrained by the Standard, although
WORD and FIND have become loadable extensions.
>There is no clue to the format of a string in memory in the ANS
>definition of S" :
>6.1.2165 S"
>s-quote CORE
> Interpretation: Interpretation semantics for this word are undefined.
> Compilation: ( "ccc<quote>" -- )
>Parse ccc delimited by " (double-quote). Append the run-time semantics
>given below to the current definition.
> Run-time: ( -- c-addr u )
>Return c-addr and u describing a string consisting of the characters
>ccc. A program shall not alter the returned string.
>That's disappointing. If the format of a string is mandatory, then the
>appropriate place to describe it (or refer to it) is within the
>definition of the word S" not COUNT.
Hell no. I would have had a lot of trouble using sensible strings
in the core of my Forth, had the standard prescribed this.
Groetjes Albert
-- Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Mark Wills <forthfr...@gmail.com> writes:
>The definition of COUNT is quite interesting. It actually indirectly,
>and presumably un-intentionally mandates the format of a string in
>Forth. Here's the ANS definition:
>6.1.0980 COUNT
>CORE
> ( c-addr1 -- c-addr2 u )
>Return the character string specification for the counted string
>stored at c-addr1. c-addr2 is the address of the first character after
>c-addr1. u is the contents of the character at c-addr1, which is the
>length in characters of the string at c-addr2.
>Note the last line: u *is* the contents of the character at c-addr1...
>So, there's no way around it in actual fact. If you wanted to
>implement strings in a different way under the covers, you'd be
>prevented from doing so.
At least for counted strings, which I don't recommend using.
It is interesting, though, that some people write about COUNT as if
its specification was more abstract than it is.
>There is no clue to the format of a string in memory in the ANS
>definition of S" :
>6.1.2165 S"
>s-quote CORE
[...]
>Return c-addr and u describing a string consisting of the characters
>ccc. A program shall not alter the returned string.
The format of a c-addr u string is just that the first char is at
c-addr, the next char is at c-addr char+ etc. Hmm, but is that
anywhere in the standard text?
>That's disappointing. If the format of a string is mandatory, then the
>appropriate place to describe it (or refer to it) is within the
>definition of the word S" not COUNT.
COUNT is for counted strings, S" is for c-addr u strings, and whether
it stores the string as a counted string is up to the implementation (a
high-quality S" can deal with arbitrary-length strings, there counted
strings are not an option). Neither says anything about the
arrangement of the characters themselves.
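To make the two representations concrete, here is a small sketch. MY-COUNT, GREETING and SHOW-CHARS are hypothetical names; MY-COUNT is one common way to define COUNT, not a definition taken from the standard or from any particular system.

```forth
\ Sketch only: all three names below are made up for illustration.
: MY-COUNT ( c-addr1 -- c-addr2 u )
  DUP CHAR+        \ address of the first text character
  SWAP C@ ;        \ length held in the count character

\ A counted string laid out by hand: the count first, then the text.
CREATE GREETING  3 C,  CHAR F C,  CHAR o C,  CHAR o C,

\ Walk a c-addr u string one character at a time, lowest address first.
: SHOW-CHARS ( c-addr u -- )
  0 ?DO  DUP C@ EMIT  CHAR+  LOOP  DROP ;

GREETING MY-COUNT SHOW-CHARS CR   \ displays Foo
```

SHOW-CHARS assumes what everyone assumes in practice: the first character is at c-addr and each following one is a CHAR+ further on.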
> [...]
> COUNT is for counted strings, S" is for c-addr u strings, and whether
> it stores the string as a counted string is up to the implementation (a
> high-quality S" can deal with arbitrary-length strings, there counted
> strings are not an option). Neither says anything about the
> arrangement of the characters themselves.
The arrangement is specified. There's the usual Western ASCII right to
left bias specifically in the normative "Terms, notation, and
references" section of the standard.
character string: Data space that is associated with a sequence of
consecutive character-aligned addresses. Character strings usually
contain text. Unless otherwise indicated, the term “string” means
“character string”.
Counted strings get a mention too;
counted string: A data structure consisting of one character
containing a length followed by zero or more contiguous data
characters. Normally, counted strings contain text.
> [...]
> COUNT is for counted strings, S" is for c-addr u strings, and whether
> it stores the string as a counted string is up to the implementation (a
> high-quality S" can deal with arbitrary-length strings, there counted
> strings are not an option). Neither says anything about the
> arrangement of the characters themselves.
Of course. You're right. Elizabeth too. I really should pay more
attention. Somehow I never made the mental *dis*connect between
counted strings and c-addr u strings, which are very different from
each other.
I implemented them both in my system and they work just fine. If I do
S" HELLO" I get 5 and an address on the stack. On the other hand, when
I use file streams, I get a counted stream back (by design):
PAD myFile #GET ABORT" Can't read from the file"
PAD COUNT TYPE
Yet somehow, I'd never really considered them to be different, just
the same, but in different states: c-addr u is for carrying around on
the stack when you want to do work with them. Counted strings, on the
other hand, are how they are stored in memory.
I wonder if my version of S" is a hybrid of both techniques? It's the
classic state-smart implementation as far as I'm aware. If used in
interpretation state, it places the string in a transitory/temporary
memory area and pushes len addr to the stack. If compiled, it compiles
(S") len <s t r i n g> to memory. At run time, len is pushed by (S")
and the address is *derived* (by (S")) by examining the Forth VM IP.
Mark Wills <forthfr...@gmail.com> wrote:
> The definition of COUNT is quite interesting. It actually indirectly,
> and presumably un-intentionally mandates the format of a string in
> Forth.
It mandates (assumes?) the format of a counted string, and quite
deliberately so.
On Tuesday, November 13, 2012 1:50:25 AM UTC-7, M.R.W Wills wrote:
> That's disappointing. If the format of a string is mandatory, then the
> appropriate place to describe it (or refer to it) is within the
> definition of the word S" not COUNT.
There's no reason S" can't store the string length as a long. ANS made an effort to promote the ( c-addr ulength ) string format and also support legacy counted strings.
I don't use S" and ." in my larger apps anyway, because of internationalization needs. In any given application domain, you can probably throw out half of ANS and not miss it.
Alex McDonald <b...@rivadpm.com> writes:
>On Nov 13, 2:06 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Neither says anything about the
>> arrangement of the characters themselves.
>The arrangement is specified. There's the usual Western ASCII right to
>left bias specifically in the normative "Terms, notation, and
>references" section of the standard.
>character string: Data space that is associated with a sequence of
>consecutive character-aligned addresses. Character strings usually
>contain text. Unless otherwise indicated, the term “string” means
>“character string”.
I don't see a clear specification of the arrangement. It says
"sequence of consecutive character-aligned addresses", but that does
not say anything about the order of the characters. I don't see any
right-to-left bias here, either.
Of course, given that there has not been a question on the order of
characters since Forth-94 came out, it is obviously unnecessary to
specify the order of characters in more detail, at least for the main
purpose of the standard.
>Counted strings get a mention too;
>counted string: A data structure consisting of one character
>containing a length followed by zero or more contiguous data
>characters. Normally, counted strings contain text.
This clearly specifies the count concretely, but the other characters
still have no specified order.
> Alex McDonald <b...@rivadpm.com> writes:
> >On Nov 13, 2:06 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >> Neither says anything about the
> >> arrangement of the characters themselves.
> >The arrangement is specified. There's the usual Western ASCII right to
> >left bias specifically in the normative "Terms, notation, and
> >references" section of the standard.
> >character string: Data space that is associated with a sequence of
> >consecutive character-aligned addresses. Character strings usually
> >contain text. Unless otherwise indicated, the term “string” means
> >“character string”.
> I don't see a clear specification of the arrangement. It says
> "sequence of consecutive character-aligned addresses", but that does
> not say anything about the order of the characters. I don't see any
> right-to-left bias here, either.
3.1.4.2 Character strings
A string is specified by a cell pair (c-addr u) representing its
starting address and length in characters.
For non-Western orderings, this is possible if c-addr points at the
starting address (c-addr+u) and the string extends (conceptually)
leftwards or through decreasing addresses. But then;
17.6.1.0245 /STRING “slash-string” STRING
( c-addr1 u1 n -- c-addr2 u2 )
Adjust the character string at c-addr1 by n characters. The resulting
character string, specified by c-addr2 u2, begins at c-addr1 plus n
characters and is u1 minus n characters long.
makes such an ordering impossible to implement. Of course, the string
could be held backwards ("sdrawkcab"), but then that introduces a
whole set of other issues; for instance, how would we interpret the
result of 1 /STRING ? How should a SEARCH for the character 'a' in the
example given operate?
> Of course, given that there has not been a question on the order of
> characters since Forth-94 came out, it is obviously unnecessary to
> specify the order of characters in more detail, at least for the main
> purpose of the standard.
Many (most?) standards have this issue, so I wouldn't consider it a
defect.
> >Counted strings get a mention too;
> >counted string: A data structure consisting of one character
> >containing a length followed by zero or more contiguous data
> >characters. Normally, counted strings contain text.
> This clearly specifies the count concretely, but the other characters
> still have no specified order.
As above: since the result of COUNT must be treatable by /STRING, the
characters must be ordered left to right at ascending addresses.
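A short sketch of what /STRING and SEARCH do with the reversed example (DEMO-/STRING and DEMO-SEARCH are hypothetical names; both standard words are in the optional STRING word set, so this assumes a system that provides it):

```forth
\ Sketch only: the DEMO- names are made up for illustration.
: DEMO-/STRING ( -- )
  S" sdrawkcab"      \ ( c-addr u )
  1 /STRING          \ ( c-addr2 u2 ) one character further on, one shorter
  TYPE ;             \ displays drawkcab

: DEMO-SEARCH ( -- u3 flag )
  S" sdrawkcab" S" a" SEARCH   \ ( c-addr3 u3 flag )
  ROT DROP ;                   \ keep the remaining length and the flag

DEMO-/STRING CR
DEMO-SEARCH . . CR   \ flag first, then 6: the match leaves "awkcab"
```

Both words count from the lowest address upwards, which is the point of the argument above: a string stored to be read towards decreasing addresses would not survive them.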
> Rod Pemberton <do_not_h...@notemailnotz.cnm> wrote:
...
> > BTW, this is the original definition of COUNT:
> > " COUNT addr1 --- addr2 n L0
> > Leave the byte address addr2 and byte count n of a
> > message text beginning at addr1. It is presumed that
> > the first byte at addr1 contains the text byte count
> > and the actual text starts with the second byte.
> > Typically COUNT is followed by TYPE."
> > It's also more accurate.
> [How] is it more accurate than the standard definition?
It doesn't assume counted strings.
> > It doesn't assume counted strings.
> Yes it does. Read it again.
No it doesn't. Read it again.
This time look for the word "presumed".
Look up the definition for "presumed".
> Of course. You're right. Elizabeth too. I really should pay more
> attention. Somehow I never made the mental *dis*connect between
> counted strings and c-addr u strings, which are very different from
> each other.
Why? Why are they different? Why should they be different?
I have but one string format for Forth. It's not a counted string format.
Why would I implement two string formats?
An address describes a string adequately. An address and length does so
too.
> Yet somehow, I'd never really considered them to be different, just
> the same, but in different states: c-addr u is for carrying around on
> the stack when you want to do work with them. Counted strings on the
> other hand is how they are stored in memory.
I see your confusion as resulting from a lack of familiarity
with C's string model. Ms. Rather has demonstrated similar
confusion in the past when discussing the merits of null
terminated strings of C versus counted strings of Forth and PL/1.
> The definition of COUNT is quite interesting. It actually indirectly,
> and presumably un-intentionally mandates the format of a string in
> Forth.
> [This is at least the second time the ANS COUNT definition
> was posted in this thread ...]
...
Yes. The ANS COUNT definition defines a string as counted, or more
precisely assumes a counted string. As long as the stack arguments are
functionally correct, the verbal definition is irrelevant. fig-Forth by
using the word "presumes" doesn't define a string as counted. It allows
for non-counted string implementations also.
> Note the last line: u *is* the contents of the character at c-addr1...
Yes. Also note that Ms. Rather has stated in the past that the count isn't
required to be a character in size for Forth. IIRC, she suggested a word
(16-bits) be used for the count of a counted string.
> So, there's no way around it in actual fact. If you wanted to
> implement strings in a different way under the covers, you'd be
> prevented from doing so.
Wrong, or it should be wrong if it isn't. Officially, the ANS Forth
specifications don't support a machine model. Defining a string format
requires a machine model to be defined in part. Numerous Forth
"experts" here have even stated ANS doesn't define a machine model. Earlier
Forth specifications did define a machine model. Supposedly, those
specifications were very problematic because the sizes of integers
and addresses were hardcoded and inflexible. Well, string formats are no
different and would suffer the same problem. I.e., you have to take the ANS
COUNT definition requiring counted strings as "wrong".
> [more ANS stuff]
ANS Forth specification has a variety of errors in it. E.g., "immediacy" is
not required for ; semicolon.
> If the format of a string is mandatory, then the
> appropriate place to describe it (or refer to it) is within the
> definition of the word S" not COUNT.
The appropriate place is *before* definitions, but the string format
shouldn't be defined. If it is, then the specification hasn't been
fully abstracted from the machine model, i.e., the Forth specification
authors failed in their jobs.
>> > BTW, this is the original definition of COUNT:
>> > " COUNT addr1 --- addr2 n L0
>> > Leave the byte address addr2 and byte count n of a
>> > message text beginning at addr1. It is presumed that
>> > the first byte at addr1 contains the text byte count
>> > and the actual text starts with the second byte.
>> > Typically COUNT is followed by TYPE."
>> > It's also more accurate.
>> [How] is it more accurate than the standard definition?
> It doesn't assume counted strings.
In which case it would be less accurate; if your claim were true,
which it isn't.
FWIW, this isn't the "original" definition of COUNT . COUNT dates
from before fig-FORTH, and we'd need Elizabeth's help to find its
earliest definition.
>> > It doesn't assume counted strings.
>> Yes it does. Read it again.
> No it doesn't. Read it again.
> This time look for the word "presumed".
> Look up the definition for "presumed".
It's the past participle of the verb "to presume", which means
variously, to assume to be true, to take for granted, to suppose, etc.
fig-FORTH isn't a standard; it is defined by its implementation. The
fig-FORTH implementation of COUNT is
> > Of course. You're right. Elizabeth too. I really should pay more
> > attention. Somehow I never made the mental *dis*connect between
> > counted strings and c-addr u strings, which are very different from
> > each other.
> Why? Why are they different? Why should they be different?
They're different because they're, well... different!
S" Hello" results in c-addr u
C" Hello" results in addr and requires COUNT to convert it to c-addr
u.
The advantage of the latter is it is more convenient to carry about on
the stack; you only have to carry the address around, not the address
*and* the length. When you want the length, COUNT will get it for you.
They are different. Clearly. Though the *storage format* (how it is
stored in memory) is probably the same. A C" string, when executed,
will push the address of the count character. An S" string, when
executed, will push the address of the first character, and the length.
*Internally* they are probably stored in memory in the same way, so,
yes, I can see why you might say they are the same!
I say *probably* stored in the same way, because they don't have to
be.
In the early days, my system compiled a string like this:
S" hello"
LIT addr LIT 5 branch xxx h e l l o _
Where the branch jumps over the string payload and _ is an alignment
padding byte. This is probably how most beginners approach it. Later,
I modified it to store a counted string, so S" hello" now compiles:
(S") 5 h e l l o
while C" hello" compiles (C") 5 h e l l o
Stored in the same way, but different effects at run time.
That's what I was getting at, though I admit probably not explained
very well.
> I have but one string format for Forth. It's not a counted string format.
> Why would I implement two string formats?
You don't have to. Just implement counted strings. COUNT converts a
counted string to a c-addr u string that words like TYPE need.
> I see your confusion as resulting from a lack of familiarity
> with C's string model. Ms. Rather has demonstrated similar
> confusion in the past when discussing the merits of null
> terminated strings of C versus counted strings of Forth and PL/1.
> Rod Pemberton
<troll>
And don't get me started on C's crack-smoking "bunch o' bytes with a
\0 at the end"! Pants method of string storage! For reasons well
trodden in previous CLF threads.
</troll>
On Wed, 14 Nov 2012 06:42:19 -0500, "Rod Pemberton"
<do_not_h...@notemailnotz.cnm> wrote:
>Yes. Also note that Ms. Rather has stated in the past that the count isn't
>required to be a character in size for Forth. IIRC, she suggested a word
>(16-bits) be used for the count of a counted string.
In order to test the ANS model, JaxForth used 16 bit characters. As
a consequence, the unit of COUNT was 16 bit items on a byte-addressed
machine.
: count ( addr1 -- addr2 len )
  dup 2 + swap w@   \ addr2 skips the 16-bit count; len is that count
;
There is common practice in some Forth shops to use COUNT to step
through memory. To resolve this, and to cope with multi-byte
character sets including UTF-8, the Forth200x document treats the
word "character" as meaning a primitive character, usually a byte,
from which wide characters and multibyte characters are derived.
COUNT now refers to a byte count followed by primitive characters.
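To illustrate the distinction between primitive characters and the characters built from them, here is a sketch that counts UTF-8 code points in a c-addr u string whose u is a byte count. It assumes byte-sized primitive characters and well-formed UTF-8; the names utf8-length and sample are invented for this example.

```forth
\ Count code points by counting every byte that is not a
\ continuation byte (continuation bytes have the form 10xxxxxx).
: utf8-length ( c-addr u -- n )
  0 rot rot            \ running total under the string: ( 0 c-addr u )
  over + swap          \ loop bounds: ( 0 end start )
  ?do
    i c@ 192 and 128 <> if 1+ then
  loop ;

\ "he" with an acute accent on the e, encoded as UTF-8:
\ 3 bytes, 2 code points.
create sample  104 c, 195 c, 169 c,
sample 3 utf8-length .   \ prints 2
```

The length on the stack stays a count of primitive characters; the number of displayed characters has to be derived from the contents.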
Counted strings are just a storage mechanism, in the same way
that zero terminated strings are a storage mechanism. Serious
string libraries work in terms of objects or structures whose
internal format is unlikely to be either 8-bit counted or zero
terminated.
These days, in order to write internationalised applications
for OS X and Windows, the programmer is likely to standardise
on UTF-16. See http://site.icu-project.org/ for an example
library. As a result, the relevance of counted and zero
terminated strings in larger applications is really a kernel
issue rather than an application issue.
Stephen
-- Stephen Pelc, stephen...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
>>Yes. Also note that Ms. Rather has stated in the past that the count isn't
>>required to be a character in size for Forth. IIRC, she suggested a word
>>(16-bits) be used for the count of a counted string.
>In order to test the ANS model, JaxForth used 16 bit characters. As
>a consequence, the unit of COUNT was 16 bit items on a byte-addressed
>machine.
This works on all standard Forth systems, even on JaxForth, and if
COUNT in JaxForth was defined in high-level, I would be surprised if
it did not use a standard-compliant definition of COUNT.
In any case, COUNT in every Forth standard I know of is required to
use a character-sized count. In Forth-94 characters can be wider than
8 bits, however.
>COUNT now refers to a byte count followed by primitive characters.
Or, more generally, a (p)char count followed by (p)chars.
>These days, in order to write internationalised applications
>for OS X and Windows, the programmer is likely to standardise
>on UTF-16.
UTF-16 is the dead-end extension of UCS-2, which became obsolete with
Unicode 2.0. It is present in some systems that were designed around
1990, like Windows NT and Java, but even there it's not universal.
Even if you have to interface with UTF-16-based Windows API functions,
I would recommend designing your interfaces such that they continue to
work if you switch to UTF-8-based API functions (switching to stuff
like Big5 and GB then is probably no additional effort).
>> > Of course. You're right. Elizabeth too. I really should pay more
>> > attention. Somehow I never made the mental *dis*connect between
>> > counted strings and c-addr u strings, which are very different from
>> > each other.
>> Why? Why are they different? Why should they be different?
>They're different because they're, well... different!
>S" Hello" results in c-addr u
>C" Hello" results in addr and requires COUNT to convert it to c-addr
>u.
>The advantage of the latter is it is more convenient to carry about on
>the stack; you only have to carry the address around, not the address
>*and* the length. When you want the length, COUNT will get it for you.
The main advantage shouldn't be missed. Using c-addr u consistently,
an implementation can interpret a buffer without copying things around
all the time. If you use FIND, which takes a counted string, you're
almost obliged to copy.
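As a rough sketch of what "without copying" can look like, the following picks the first blank-delimited token out of a buffer and returns it as c-addr u pointing into the original text. It assumes /STRING from the STRING word set; skip-blanks and first-word are hypothetical names, not words from any particular system.

```forth
\ Drop leading blanks; the result still points into the same buffer.
: skip-blanks ( c-addr u -- c-addr' u' )
  begin dup while over c@ bl = while 1 /string repeat then ;

\ Return the first blank-delimited token, in place.
: first-word ( c-addr u -- c-addr' u' )
  skip-blanks 2dup                       \ keep the token start and remaining length
  begin dup while over c@ bl <> while 1 /string repeat then
  nip - ;                                \ token length = remaining before - remaining after

s"   foo bar" first-word type
```

Nothing is moved: the token is described by an address inside the source buffer plus a length, which a counted string cannot express without a copy.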
Groetjes Albert
-- Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Anton Ertl wrote:
>>These days, in order to write internationalised applications
>>for OS X and Windows, the programmer is likely to standardise
>>on UTF-16.
> UTF-16 is the dead-end extension of UCS-2, which became obsolete with
> Unicode 2.0. It is present in some systems that were designed around
> 1990, like Windows NT and Java, but even there it's not universal.
> Even if you have to interface with UTF-16-based Windows API functions,
> I would recommend designing your interfaces such that they continue to
> work if you switch to UTF-8-based API functions (switching to stuff
> like Big5 and GB then is probably no additional effort).
Given that both Cocoa and Win32 like "zero terminated strings", you had better not use their strings as native objects in Forth, but convert on the fly when you call Cocoa or Win32. We don't have that in libcc.fs now, but I suggest we should have a datatype string, which is addr len in Forth, and 0-terminated char* in C, and converted on call/return.
For UTF-16, there would be a string16 type (which is still UTF-8 on the Forth side).
It should be noted that Cocoa strings are a rather complex object, and when you feed data in or get data out, you can select the encoding - both UTF-8 and UTF-16 are first class encodings. Once it's inside Cocoa's string class, you access it through Objective-C methods, and don't care about internal representation.
It's a lot better with Xlib. There, strings are represented as addr len entities (yes, *the* addr len you use in Forth anyway, number of bytes/pchars for UTF-8, no zero termination needed), with UTF-8 as the preferred first-class encoding.
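The convert-on-call idea can be sketched in plain Forth. This is not the libcc.fs interface; >cstring and cstring> are hypothetical names, and the sketch assumes the MEMORY word set (ALLOCATE, FREE) and byte-sized characters.

```forth
\ Copy an addr len string into a fresh buffer and append a 0 byte.
\ The caller is responsible for FREEing the result.
: >cstring ( c-addr u -- c-addr' )
  dup 1+ allocate throw     \ ( c-addr u buf ) room for text plus terminator
  dup >r swap               \ ( c-addr buf u )
  2dup + 0 swap c!          \ store the terminator after the text
  move r> ;

\ Recover addr len from a 0-terminated string by scanning for the 0.
: cstring> ( c-addr -- c-addr u )
  dup begin dup c@ while char+ repeat over - ;

s" hello" >cstring  dup cstring> type  free throw
```

Keeping addr len as the native form means the scan for the terminator happens only at the boundary with C, not on every string operation.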
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
> "Elizabeth D. Rather" <erat...@forth.com> wrote in message
> news:mdqdnYH8wqQ6BjzNnZ2dnUVZ_rWdnZ2d@supernews.com...
>> On 11/12/12 2:49 PM, Rod Pemberton wrote:
>> ...
>>> BTW, this is the original definition of COUNT:
>>> " COUNT addr1 --- addr2 n L0
>>> Leave the byte address addr2 and byte count n of a
>>> message text beginning at addr1. It is presumed that
>>> the first byte at addr1 contains the text byte count and
>>> the actual text starts with the second byte. Typically
>>> COUNT is followed by TYPE."
>>> It's also more accurate. It doesn't assume counted strings.
>> I'm not sure what "original" means in this context. COUNT has been
>> around since 1971, which is 7 years before the first, very preliminary,
>> attempt at standards. In any case, the sentence, "It is presumed that
>> the first byte at addr1 contains the text byte count and the actual text
>> starts with the second byte" is an accurate description of a counted
>> string.
> What does "presumed" mean to you?
From dictionary.com:
pre·sume [pri-zoom] verb, pre·sumed, pre·sum·ing.
verb (used with object)
1. to take for granted, assume, or suppose: I presume you're tired after your drive.
2. Law. to assume as true in the absence of proof to the contrary.
In other words, it's a synonym of 'assume', so all definitions of COUNT both assume and presume a counted string format.
Cheers,
Elizabeth
-- ==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
>>> Of course. You're right. Elizabeth too. I really should pay more
>>> attention. Somehow I never made the mental *dis*connect between
>>> counted strings and c-addr u strings, which are very different from
>>> each other.
>> Why? Why are they different? Why should they be different?
> They're different because they're, well... different!
They're different in that a "counted string" is a storage format, and 'c-addr u' is a stack notation providing the address and length of a string independent of its storage format.
Address alone cannot define a string; you need some way to know how long the string is, either by knowing its storage format (e.g., 'counted string' or null-terminated) or by having a length on the stack in addition to the address.
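A small sketch of the distinction, assuming byte-sized characters and a string short enough to fit the buffer: the same text is first described by c-addr u, then stored in counted-string format, then described by c-addr u again via COUNT. The names cbuf and place-counted are invented for the example.

```forth
create cbuf 32 chars allot

\ Store a c-addr u string at dest in counted-string format:
\ one count character followed by the text.
: place-counted ( c-addr u dest -- )
  2dup c!              \ write the length into the first character
  char+ swap move ;    \ copy the text right after it

s" Hello" cbuf place-counted   \ c-addr u in, counted string in memory
cbuf count type                \ the address alone plus COUNT gives c-addr u back
```

Here cbuf by itself is enough only because the storage format is known; TYPE never sees that format, just the address and length.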
Cheers,
Elizabeth
On Nov 14, 7:15 am, stephen...@mpeforth.com (Stephen Pelc) wrote:
> There is common practice in some Forth shops to use COUNT to step
> through memory. To resolve this, and to cope with multi-byte
> character sets including UTF-8, the Forth200x document treats the
> word "character" as meaning a primitive character, usually a byte,
> from which wide characters and multibyte characters are derived.
> COUNT now refers to a byte count followed by primitive characters.
Even when I was 18 and programming the Vic-20, I knew better than to
use COUNT for stepping through an array of chars. For one thing, it
won't work if chars are assumed to be 2 bytes and/or the count is 2
bytes, which was possible even on a 6502 (especially the count being 2
bytes, although chars were generally always 1 byte in those days). For
another thing, it makes for unreadable code, as the reader has to
wonder: "Why is COUNT being used? What is being counted?". I wrote a
word that did the same thing --- I think I called it c@c+ and there
was also w@w+ that was for stepping through an array of words --- or
something like that. I knew about the concept of abstraction way back
then when most of the Forth community was doing things like using
COUNT to step through char arrays or using 2+ for word arrays on the
assumption that words were inherently 2 bytes in size, and so forth.
Back in the 1980s, there seemed to be a lot of Forthers who didn't
understand basic programming concepts such as abstraction --- and,
that is true today too.
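A sketch of such a stepping word, for illustration only. The name c@c+ follows the recollection above, but the stack effect shown is an assumption, and sum-chars is a made-up example of its use.

```forth
\ Fetch the character at c-addr and advance to the next one.
\ Same effect as COUNT on byte-sized counts, but the name says what is meant.
: c@c+ ( c-addr -- c-addr' char )
  dup char+ swap c@ ;

\ Add up the character codes in a c-addr u string.
: sum-chars ( c-addr u -- n )
  0 swap 0 ?do          \ ( c-addr sum ) inside the loop
    swap c@c+ rot +     \ fetch next char, add it to the running sum
  loop nip ;

s" AB" sum-chars .   \ prints 131
```

Because the step uses CHAR+ rather than 1+, the definition does not depend on a character being one address unit wide.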
I don't think that I will support counted strings at all in Straight
Forth (I mean, strings with the count stored in the 0'th array
element). I will only support adr,len strings (the address of the char
array and the size of the array on the stack). I have a doubles stack
that is distinct from the parameter stack. This is for double numbers
and also for adr,len strings. I will also have a float stack that is
distinct from the parameter stack and is for floating-point numbers. I
may actually have two float stacks, one for low-precision and one for
high-precision floats. Modern processors have beaucoup registers, so
there is no need to mix data types together on the parameter stack,
which results in a lot of ugly stack-juggling --- each data type will
have its own stack in Straight Forth --- each stack will have a
register dedicated as its stack-pointer.
BTW Stephen --- I downloaded your VFX evaluation Forth system. So far
all I have done is cycle through the "tip of the day." When I get time
however, I will try to compile and run the novice package. If VFX
works correctly, this will be the first time ever. In all of the other
Forth systems that I have tried (SwiftForth, Gforth, Win32Forth and
FICL), doing this revealed bugs in the Forth system, and I had to
rewrite some portion of the novice package to work-around the bug
(FICL had so many problems that I didn't support it, but the others
did get supported).