CfV: Text Substitution

Peter Knaggs

Feb 23, 2010, 6:26:26 AM

This is actually a poll about how widely the proposal is implemented
and how popular it is among programmers. It is called a CfV
(call for votes) because the process is inspired by the Usenet RfD/CfV
process.

You will find the actual ballot further down (look for "VOTING
INSTRUCTIONS"), after the proposal on which you vote.

PROPOSAL

Text substitution proposal

Revised: 16 September 2009

Author:
Stephen Pelc, MicroProcessor Engineering, ste...@mpeforth.com
Input from:
Peter Knaggs, Leon Wagner, Bernd Paysan, Anton Ertl

Contact:
Stephen Pelc
MicroProcessor Engineering
133 Hill Lane
Southampton SO15 5AF
England

Tel: +44 (0)23 80631441
Fax: +44 (0)23 80339691
Net: ste...@mpeforth.com
Web: http://www.mpeforth.com

Rationale
=========

Why do this?
------------
Text substitution is useful for a number of applications,
including:
1) Directory handling
2) Configuration management
3) Localisation

Many applications need to be able to perform text substitution,
for example:

Your balance at <time> on <date> is <currencyvalue>.

We can provide for these requirements by defining a text
substitution facility. For example, we can provide an initial
string in the form:

Your balance at %time% on %date% is %currencyvalue%.

In this example, % is used as the delimiter around each substitution
name. The names "time", "date" and "currencyvalue" are text
substitutions, where "time" and "date" insert the current time
and date, and "currencyvalue" inserts the top item on the stack
as a string in the active currency.

Implementation
--------------
Implementation of SUBSTITUTE may be considered as being equivalent
to a wordlist which is searched. If the substitution name is
found, the word is executed to return a string that replaces the
substitution name and delimiters. Such words can be deferred or
multiple wordlists can be used. The implementation techniques
required are similar to those many implementers have used with
ENVIRONMENT?. The range of functions required can be handled by
standardising substitution names. Practical use indicates that
implementation with a wordlist eases adding substitutions that
perform run-time manipulations, e.g. time and date.
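
As a rough illustration of the wordlist technique described above, here
is a minimal sketch using only standard search-order words. The names
demo-subst, greeting, answer and expand-name are hypothetical and not
part of the proposal; the reference implementation further down stores
the replacement text in buffers instead.

```forth
wordlist constant demo-subst   \ hypothetical wordlist of substitution words

get-current  demo-subst set-current      \ compile the next words into it
: greeting ( -- c-addr len )  s" hello" ;          \ fixed text
: answer   ( -- c-addr len )  42 0 <# #s #> ;      \ text built at run time
set-current                              \ restore the old compilation wordlist

: expand-name ( c-addr len -- c-addr' len' flag )
  \ Search demo-subst for the name. If it is found, execute the word
  \ to obtain the replacement text; otherwise return the name unchanged.
  2dup demo-subst search-wordlist if
    nip nip execute true
  else
    false
  then ;

s" answer" expand-name . type   \ prints the flag, then the replacement text
```

Because each substitution is an ordinary word, one that computes its
text when it is looked up (such as answer here) needs no special
support.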

This proposal is derived from an implementation used in a
Forth system for more than ten years. The % character was
chosen as the default delimiter because it is an illegal character
in DOS and Windows file and path names. By choosing an illegal
character, substitutions in file name handling can very easily be
implemented, even at the level of OPEN-FILE and CREATE-FILE.

There is no provision for changing the delimiter character because
it causes difficulties when porting code and can have dangerous
side effects at run-time, especially when delimiter usage varies
in different source libraries. Similarly, use of separate leading
and trailing delimiter characters increases the complexity of some
substitutions. The simplest choice is to use one fixed character.
Whatever character is chosen, it will cause problems somewhere. As
with the whole standard document, nothing in this proposal
prohibits a system from providing system-specific extensions.

Usage
-----

Translation of a sentence or message from one language to another
may result in changes to the displayed parameter order. For
example, the substitution in

Your balance at %time% on %date% is %currencyvalue%.

requires a different order in Russian.

When an application provides substitution of parameters on the
data stack by SUBSTITUTE and uses substitutions to access these
items, it is recommended that items not be destroyed but instead
be removed after execution of SUBSTITUTE. For example, the
previous string could be redone as:

Your balance at %$4% on %$2% is %i0%.

where the substitution $x takes the caddr/len string at stack
depth x, and the substitution ix takes a signed integer. The
depths are defined after the parameters to SUBSTITUTE itself have
been removed. The word SUBSTITUTE would then be called as:

$time $tlen $date $dlen val src srclen dest dlen SUBSTITUTE

Proposal
========
The following three words are defined to handle substitution of
text, and output of application data such as date, time, currency,
path and so on.

17.6.2.aaaa REPLACES "replaces" STRING EXT

Execution: c-addr1 len1 c-addr2 len2 --

Set the string defined by c-addr1 and len1 as the text to substitute
for the substitution named by c-addr2 and len2. If the substitution
does not exist it is created. The program may then reuse the buffer
c-addr1/len1 without affecting the definition of the substitution.

Ambiguous conditions occur as follows
1) The substitution cannot be created.
2) The name of a substitution contains a delimiter character.

17.6.2.bbbb SUBSTITUTE "substitute" STRING EXT

Execution: c-addr1 len1 c-addr2 len2 -- c-addr2 len3 n

Perform substitution on the string at c-addr1 and len1 placing the
result at string c-addr2 and len2, returning c-addr2 and len3, the
length of the resulting string. An ambiguous condition occurs if
the resulting string will not fit into c-addr2/len2 or if c-addr2
is the same as c-addr1. The return value n is positive (0..+n) on
success and indicates the number of substitutions made. A negative
value for n indicates that an error occurred, leaving c-addr2 and
len3 undefined. Substitution occurs from the start of c-addr1 and
len1 in one pass and is non-recursive.

When a substitution name surrounded by '%' (ASCII 0x25) delimiters
is encountered by SUBSTITUTE, the following occurs:
1) If the name is null, a single delimiter character is substituted,
i.e. %% is replaced by %.
2) If the name is a valid substitution name, the leading and
trailing delimiter characters and the enclosed substitution name
are replaced by the substitution text.
3) If the name is not a valid substitution name, the name with
leading and trailing delimiters is passed unchanged to the output.

17.6.2.cccc UNESCAPE "unescape" STRING EXT

Execution: c-addr1 len1 c-addr2 -- c-addr2 len2

Replace each '%' character in the input string c-addr1/len1 by two
'%' characters. The output is represented by c-addr2/len2. The
buffer at c-addr2 must be big enough to hold the unescaped string.
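
The intended relationship between UNESCAPE and SUBSTITUTE can be
sketched as a round trip: escaping a string and then expanding it
should give back the original text. This assumes a system that provides
both words as proposed; esc-buf, out-buf, round-trip and .round-trip
are hypothetical names.

```forth
create esc-buf 64 chars allot   \ hypothetical buffer for the escaped text
create out-buf 64 chars allot   \ hypothetical buffer for the expanded text

: round-trip ( c-addr len -- c-addr' len' n )
  esc-buf unescape          \ each % in the input is doubled
  out-buf 64 substitute ;   \ each %% collapses back to a single %

: .round-trip ( -- )
  s" 50% off" round-trip
  0< abort" substitution failed"
  type ;

.round-trip   \ expected: 50% off
```

In the worst case the escaped text is twice as long as the input, so
esc-buf has to be sized with that in mind.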

Reference implementation
========================
decimal

[undefined] bounds [if]
: bounds \ addr len -- addr+len addr
over + swap ;
[then]

[undefined] -rot [if]
: -rot \ a b c -- c a b
rot rot ;
[then]

[undefined] place [if]
: place \ c-addr1 u c-addr2 --
\ Copy the string described by c-addr1 u as a counted string at
\ the memory address described by c-addr2.
2dup 2>r
1 chars + swap move
2r> c! ;
[then]

char % constant delim
\ Character used as the substitution name delimiter.

wordlist constant wid-subst
\ Wordlist ID of the wordlist used to hold substitution names
\ and replacement text.

256 buffer: Name \ -- addr
\ Scratch buffer to hold substitution name as a counted string.
variable DestLen \ -- addr
\ Maximum length of the destination buffer.
2variable Dest \ -- addr
\ Holds destination string current length and address.
variable SubstErr \ -- addr
\ Holds zero or an error code.

[defined] VFXforth [if] \ VFX Forth
: makeSubst \ caddr len -- caddr
\ Given a name string, create a substitution and storage space.
\ Return the address of the buffer for the substitution text.
\ This word requires carnal knowledge of the host Forth.
\ Some systems may need to perform case conversion here.
get-current >r wid-subst set-current
($create) \ like CREATE but takes caddr/len
r> set-current
here 256 allot 0 over c! \ create buffer space
;
[then]

[defined] (WID-CREATE) [if] \ SwiftForth
: makeSubst \ caddr len -- caddr
wid-subst (WID-CREATE) \ like CREATE but takes caddr/len/wid
LAST @ >CREATE !
here 256 allot 0 over c! \ create buffer space
;
[then]

: findSubst \ caddr len -- xt flag | 0
\ Given a name string, find the substitution. Return xt and flag
\ if found, or just zero if not found. Some systems may need to
\ perform case conversion here.
wid-subst search-wordlist
;

: replaces \ text tlen name nlen --
\ Define the string text/tlen as the text to substitute for the
\ substitution named name/nlen. If the substitution does not
\ exist it is created.
2dup findSubst if
nip nip execute \ get buffer address
else
makeSubst
then
place \ copy as counted string
;

: addDest \ char --
\ Add the character to the destination string.
Dest @ DestLen @ < if
Dest 2@ + c! 1 chars Dest +!
else
drop -1 SubstErr !
then
;

: formName \ caddr len -- caddr' len'
\ Given a source string pointing at a leading delimiter, place
\ the name string in the name buffer.
1 /string 2dup delim scan >r drop \ find length of residue
2dup r> - dup >r Name place \ save name in buffer
r> 1 chars + /string \ step over name and trailing %
;

: >dest \ caddr len --
\ Add a string to the output string.
bounds
?do i c@ addDest 1 chars +loop
;

: processName \ -- flag
\ Process the last substitution name. Return true if found,
\ 0 if not found.
Name count findSubst dup >r if
execute count >dest
else
delim addDest Name count >dest delim addDest
then
r>
;

: substitute \ src slen dest dlen -- dest dlen' n
\ Expand the source string using substitutions. Note that this
\ version is simplistic, performs no error checking, and requires
\ a global buffer and global variables.
Destlen ! 0 Dest 2! 0 -rot \ -- 0 src slen
0 SubstErr !
begin
dup 0 >
while
over c@ delim <> if \ character not %
over c@ addDest 1 /string
else
over 1 chars + c@ delim = if \ %% for one output %
delim addDest 2 /string \ add one % to output
else
formName processName
if rot 1+ -rot then \ count substitutions
then
then
repeat
2drop Dest 2@ rot SubstErr @
if drop SubstErr @ then
;

: unescape \ c-addr1 len1 c-addr2 -- c-addr2 len2
\ Replace each '%' character in the input string c-addr1/len1 by
\ two '%' characters. The output is represented by caddr2/len2.
\ If you pass a string through UNESCAPE and then SUBSTITUTE,
\ you get the original string.
dup 2swap over + swap ?do
i c@ [char] % =
if [char] % over c! 1+ then
i c@ over c! 1+
loop
over -
;

Tests
=====
create tb 256 allot \ -- addr
\ Buffer for text.
create db 256 allot \ -- addr
\ destination buffer for text.

: >tb \ caddr len -- caddr' len
\ Place string in TB, and return the string. Done
\ this way to avoid problems with transient regions.
tb place tb count
;

: .sub \ caddr len n --
\ Display the result of a substitution.
cr . ." Substitutions, result:" type ." :"
;

: tsub \ caddr len --
\ Run the substitution text and display the results.
db 256 substitute .sub
;

s" hello" >tb s" hl" replaces
s" world" >tb s" wld" replaces

s" Start: %hl%,%wld%! :End" tsub
s" Hello, world!" tsub
s" aaa%foobar%bbb" tsub
s" aaa%%bbb" tsub

Change history
==============
16 September 2009
Revised definition of SUBSTITUTE to reduce the number of
ambiguous definitions.
Added more test cases.

9 September 2009
Incorporated changes from Leon Wagner to UNESCAPE.

4 September 2009
Added UNESCAPE.

6 April 2009
Reworked rationale.
Specified delimiter in SUBSTITUTE.

2 April 2009
Added reference implementation and test code

26 March 2009
Extracted from the internationalisation proposal.

VOTING INSTRUCTIONS
===================

Fill out the appropriate ballot(s) below and mail it/them to
<vo...@forth200x.org>. Your vote will be published (including your
name (without email address) and/or the name of your system) on
<http://www.forth200x.org/substitute.html>. You can vote (or change
your vote) at any time, and the results will be published there.

Note that you can be both a system implementor and a programmer, so
you can submit both kinds of ballots.

Ballot for systems

If you maintain several systems, please mention the systems separately
in the ballot. Insert the system name or version between the brackets.
Multiple hits for the same system are possible (if they do not
conflict).

[ ] conforms to ANS Forth.
[ ] already implements the proposal in full since release [ ].
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
[ ] There are no plans to implement the proposal in full in [ ].
[ ] will never implement the proposal in full.

If you want to provide information on partial implementation, please do
so informally, and I will aggregate this information in some way.

Ballot for programmers

Just mark the statements that are correct for you (e.g., by putting an
"x" between the brackets). If some statements are true for some of your
programs, but not others, please mark the statements for the dominating
class of programs you write.

[ ] I have used (parts of) this proposal in my programs.
[ ] I would use (parts of) this proposal in my programs if the systems
I am interested in implemented it.
[ ] I would use (parts of) this proposal in my programs if this
proposal was in the Forth standard.
[ ] I would not use (parts of) this proposal in my programs.

If you feel that there is closely related functionality missing from the
proposal (especially if you have used that in your programs), make an
informal comment, and I will collect these, too. Note that the best time
to voice such issues is the RfD stage.

CREDITS

Proponent: Stephen Pelc
Votetaker: Peter Knaggs

--
Peter Knaggs

Graham Smith

Feb 23, 2010, 10:43:44 AM

In message <op.u8kveci7su5d0p@david>, Peter Knaggs <p...@bcs.org.uk>
writes

>
>Translation of a sentence or message from one language to another
>may result in changes to the displayed parameter order. For
>example, the substitution in
>
> Your balance at %time% on %date% is %currencyvalue%.
>
>requires a different order in Russian.
>
>When an application provides substitution of parameters on the
>data stack by SUBSTITUTE and uses substitutions to access these
>items, it is recommended that items not be destroyed but instead
>be removed after execution of SUBSTITUTE. For example, the
>previous string could be redone as:
>
> Your balance at %$4% on %$2% is %i0%.
>
>where the substitution $x takes the caddr/len string at stack
>depth x, and the substitution ix takes a signed integer. The
>depths are defined after the parameters to SUBSTITUTE itself have
>been removed. The word SUBSTITUTE would then be called as:
>
>$time $tlen $date $dlen val src srclen dest dlen SUBSTITUTE
>

Hmm!

I think I don't like this word which may or may not 'use' things at
unknown positions on the stack. Unfortunately I cannot think of an
alternative. Could it be that this rather complicated 'solution' could
be formulated in easier ways?

SUBSTITUTE does two different things:
1) replaces text as defined in some list which has a predictable stack
effect, and
2) replaces text with unknown text specified at unknown positions on the
stack.

This word is crying out to be at least factored!

How about SUBST for the well defined action, and SUBST(DANGER) or
SUBST(RAND) for the other?

>
>17.6.2.bbbb SUBSTITUTE "substitute" STRING EXT
>
> Execution: c-addr1 len1 c-addr2 len2 -- c-addr2 len3 n
>
> Perform substitution on the string at c-addr1 and len1 placing the
> result at string c-addr2 and len2, returning c-addr2 and len3, the
> length of the resulting string. An ambiguous condition occurs if
> the resulting string will not fit into c-addr2/len2 or if c-addr2
> is the same as c-addr1. The return value n is positive (0..+n) on
> success and indicates the number of substitutions made. A negative
> value for n indicates that an error occurred, leaving c-addr2 and
> len3 undefined. Substitution occurs from the start of c-addr1 and
> len1 in one pass and is non-recursive.
>
> When a substitution name surrounded by '%' (ASCII 0x25) delimiters
> is encountered by SUBSTITUTE, the following occurs:
> 1) If the name is null, a single delimiter character is substituted,
> i.e. %% is replaced by %.
> 2) If the name is a valid substitution name, the leading and
> trailing delimiter characters and the enclosed substitution name
> are replaced by the substitution text.
> 3) If the name is not a valid substitution name, the name with
> leading and trailing delimiters is passed unchanged to the output.

But now no mention of %$n% and related. What have I missed?

>
>
>17.6.2.cccc UNESCAPE "unescape" STRING EXT
>
> Execution: c-addr1 len1 c-addr2 -- c-addr2 len2
>
> Replace each '%' character in the input string c-addr1/len1 by two
> '%' characters. The output is represented by c-addr2/len2. The
> buffer at c-addr2 must be big enough to hold the unescaped string.
>

No! No! No!

To my mind ESCAPE means "translate escaped characters in a string
returning the result". (This word is conspicuously absent from the
Escaped Strings proposal).

UNESCAPE is the reverse of ESCAPE.

These two words allow me to write text containing escape characters to
disk (as text files) and read them back again later. They allow me to
communicate with other programs written in other languages.

The above definition of UNESCAPE does NOT do what it sounds like. If you
want a "reverse substitution" of some sort, say so! E.g. RSUBST.

UNESCAPE as defined above is, for me, a code breaker, and I cannot
think of any reason why this name was chosen!

Graham Smith

--
E-mail: Remove X's and underscores from X_gra...@tectime.com

Bruce McFarling

Feb 23, 2010, 11:56:12 AM

On Feb 23, 10:43 am, Graham Smith <w...@tectime.com> wrote:
> I think I don't like this word which may or may not 'use' things at
> unknown positions on the stack. Unfortunately I cannot think of an
> alternative. Could it be that this rather complicated 'solution' could
> be formulated in easier ways?

The proposal is only for the part of the existing implementation that
uses predefined search strings. There is nothing in the text of the
proposal regarding special leading characters that trigger text
substitutions from the stack, even though that is, of course, part of
the original implementation.

So as I said during one of the previous RfDs, the "usage" needs to be
updated to conform with the text of the proposal.

So to be specific:

> SUBSTITUTE does two different things:
> 1) replaces text as defined in some list which has a predictable stack
> effect, and

> 2) replaces text with unknown text specified at unknown positions on the
> stack.

SUBSTITUTE is required to do (1). (2) is an extension of the proposed
standard SUBSTITUTE. Looking at the text of SUBSTITUTE, it is not
clear that (2) is in fact compatible with the proposed language.

> This word is crying out to be at least factored!

The proposed SUBSTITUTE is indeed a possible factor of the existing
substitute: a first factor can do a pass that replaces *only* the
recognized stack substitutions and, provided that it leaves %%
alone, the proposed SUBSTITUTE acting on the resulting string would
complete the behavior of the existing word.

But the "usage" in the CfV lines up with the usage of these two
factors used in sequence, not with the usage of the actual proposed
SUBSTITUTE.

> How about SUBST for the well defined action, and SUBST(DANGER) or
> SUBST(RAND) for the other?

SUBSTITUTE is a fine name for the proposed action, it's just not the
action described in the "Usage" section.

> >17.6.2.cccc UNESCAPE "unescape" STRING EXT

> No! No! No!

> To my mind ESCAPE means "translate escaped characters in a string
> returning the result". (This word is conspicuously absent from the
> Escaped Strings proposal).

> UNESCAPE is the reverse of ESCAPE.

Precisely my argument in the last RfD - UNESCAPE means to reverse the
action of "ESCAPING" a string. I've got an unescape.sed script that
filters html files and translates Javascript escapes into their
original contents.

If using the INCLUDE:INCLUDED REQUIRE:REQUIRED naming convention, the
word "UNESCAPE" ought to be called "ESCAPED".

Because of likely name clashes, I'd have suggested a more specific
name, but I know that the C/Forth bilingual contingent claim to have
trouble reading ordinary commonplace hyphenated words like
STRING-ESCAPED or whatever.

Anton Ertl

Feb 24, 2010, 6:10:15 AM

Graham Smith <w...@tectime.com> writes:
>UNESCAPE as defined above is, for me, a code breaker,

What code does it break, and have you mentioned that during the RfDs?

>and I cannot
>think of any reason why this name was chosen!

I proposed the word and did not find a good name, so I proposed
DEESCAPE, and said that I don't insist on the name, only on the
functionality. The idea behind the name DEESCAPE was to demine the
string of any incidental escape characters by escaping these
characters.

I am not sure how it became UNESCAPE, but IIRC in every RfD the name
was criticized. I would have expected it to change for the final RfD
and CfV, but that has not happened.

Anyway, I wonder why people discuss so much about names, and so little
about the functionality. Do we always get the functionality right, or
is it the bike shed effect?

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2009: http://www.euroforth.org/ef09/

Graham Smith

Feb 24, 2010, 10:55:01 AM

In message <2010Feb2...@mips.complang.tuwien.ac.at>, Anton Ertl
<an...@mips.complang.tuwien.ac.at> writes

>Graham Smith <w...@tectime.com> writes:
>>UNESCAPE as defined above is, for me, a code breaker,
>
>What code does it break, and have you mentioned that during the RfDs?

My code.

And I believe I have mentioned UNESCAPE before - but IIRC it was part of
the discussion here about Escaped Strings.

>
>>and I cannot
>>think of any reason why this name was chosen!
>
>I proposed the word and did not find a good name, so I proposed
>DEESCAPE, and said that I don't insist on the name, only on the
>functionality. The idea behind the name DEESCAPE was to demine the
>string of any incidental escape characters by escaping these
>characters.

OK.

>
>I am not sure how it became UNESCAPE, but IIRC in every RfD the name
>was criticized. I would have expected it to change for the final RfD
>and CfV, but that has not happened.
>
>Anyway, I wonder why people discuss so much about names, and so little
>about the functionality. Do we always get the functionality right, or
>is it the bike shed effect?

I see your point.

But naming IS important. In this proposal SUBSTITUTE and UNESCAPE are
related. That is not clear from the names.

My use of ESCAPE and UNESCAPE are related in a way which is (almost)
apparent from the names.

Your use of the word 'functionality' has got me thinking about how
people choose words. I would guess that your choice of the word DEESCAPE
was chosen with thoughts to what the word does - the word's 'mechanics'
if I can put it like that. I might pick a name which reflects my
understanding of the word's 'intent', or the reason it exists. To that
end SUBSTSAFE comes to mind because it would be used to make it safe to
expose a string to SUBSTITUTE.

Bruce McFarling

Feb 24, 2010, 12:35:18 PM

On Feb 24, 10:55 am, Graham Smith <X_Graha...@tectime.com> wrote:
> In message <2010Feb24.121...@mips.complang.tuwien.ac.at>, Anton Ertl
> <an...@mips.complang.tuwien.ac.at> writes
> >Graham Smith <w...@tectime.com> writes:
> >I proposed the word and did not find a good name, so I proposed
> >DEESCAPE, and said that I don't insist on the name, only on the
> >functionality. The idea behind the name DEESCAPE was to demine the
> >string of any incidental escape characters by escaping these
> >characters.

> >I am not sure how it became UNESCAPE, but IIRC in every RfD the name
> >was criticized. I would have expected it to change for the final RfD
> >and CfV, but that has not happened.

> >Anyway, I wonder why people discuss so much about names, and so little
> >about the functionality. Do we always get the functionality right, or
> >is it the bike shed effect?

One element is that the name is a big part of the *point* of the
standardization. Being able to say,

[UNDEFINED] xyz [IF] ... [THEN]

... and a second element is that if the standard picks a name you
aren't using, an implementer is still standard just by doing things
their own way under their own names. So names get the attention both
of people who will use the proposed function and of people who will
not.

> But naming IS important. In this proposal SUBSTITUTE and UNESCAPE are
> related. That is not clear from the names.

> My use of ESCAPE and UNESCAPE are related in a way which is (almost)
> apparent from the names.

> Your use of the word 'functionality' has got me thinking about how
> people choose words. I would guess that your choice of the word DEESCAPE
> was chosen with thoughts to what the word does - the word's 'mechanics'
> if I can put it like that. I might pick a name which reflects my
> understanding of the word's 'intent', or the reason it exists. To that
> end SUBSTSAFE comes to mind because it would be used to make it safe to
> expose a string to SUBSTITUTE.

">SAFE" is the most compact name that conveys that.

"PROTECT-STRING" conveys the meaning if the context is known, but I
guess the C-language speaking and German speaking contingent does not
like plain-english hyphenated words.

Bruce McFarling

Feb 24, 2010, 12:37:19 PM

On Feb 24, 6:10 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> Anyway, I wonder why people discuss so much about names, and so little
> about the functionality. Do we always get the functionality right, or
> is it the bike shed effect?

I raised the problem of whether the definition of SUBSTITUTE as
proposed was consistent with the "Usage" section, but the naming issue
definitely generated more discussion than whether the proposal was in
fact compatible with the prior experience it was partially
standardizing.

Jerry Avins

Feb 24, 2010, 1:03:13 PM

Anton Ertl wrote:

...

> Anyway, I wonder why people discuss so much about names, and so little
> about the functionality. Do we always get the functionality right, or
> is it the bike shed effect?

Bike shed effect? I don't know that allusion.

Jerry
--
Engineering is the art of making what you want from things you can get.

Anton Ertl

Feb 24, 2010, 1:11:17 PM

Jerry Avins <j...@ieee.org> writes:
>Bike shed effect? I don't know that allusion.

http://en.wikipedia.org/wiki/Bike_shed

Especially related to the present discussion is the point at
<http://en.wikipedia.org/wiki/Bike_shed#Related_principles_and_formulations>
about Wadler's Law.

Jerry Avins

Feb 24, 2010, 1:37:30 PM

Anton Ertl wrote:
> Jerry Avins <j...@ieee.org> writes:
>> Bike shed effect? I don't know that allusion.
>
> http://en.wikipedia.org/wiki/Bike_shed
>
> Especially related to the present discussion is the point at
> <http://en.wikipedia.org/wiki/Bike_shed#Related_principles_and_formulations>
> about Wadler's Law.

Thanks. I read that when it was first published, but it didn't make
enough of an impression on me for its name to become canonical.

Bernd Paysan

Feb 24, 2010, 5:33:22 PM

Anton Ertl wrote:
> Anyway, I wonder why people discuss so much about names, and so little
> about the functionality. Do we always get the functionality right, or
> is it the bike shed effect?

I'm not sure we got the functionality right. In my implementation of
SUBSTITUTE, I don't directly implement it as suggested - i.e. I don't
directly populate the output buffer with the results, and check for the
size to fit, I rather use my string words and push the output into a
string buffer which grows to the size of the string. I'm not sure I
did this well enough, though (it would be better to take a string
variable and expand it in place, thereby allowing recursive
substitution). However, it fits the Forth concept of transforming some
input on the stack much better than the original proposal, which
requires a pre-allocated string buffer even though substitution can
generate an arbitrarily long string, and you don't know how long it
will be.

This SUBSTITUTE is based on the general concept of providing a buffer for
the output, the same as READ-LINE or READ-FILE. However, I find words like
SLURP (which reads in a file in total, and cares about the allocated
space) much more suitable - they don't take a buffer, they *provide* the
buffer space necessary.

SUBSTITUTE ( addr1 u1 -- addr2 u2 n )

would be a lot easier to use. You'll have to FREE addr2 later to avoid
memory leaks, though. Or you use lifetime restrictions, like addr2 u2
would live only up to the next SUBSTITUTE, and if you want to preserve
them, you have to save that string away (that's how my $SUBSTITUTE
works).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

J Thomas

Feb 24, 2010, 7:08:36 PM

Jerry Avins wrote:

> Anton Ertl wrote:

>> Anyway, I wonder why people discuss so much about names, and so little
>> about the functionality. Do we always get the functionality right, or
>> is it the bike shed effect?
>
> Bike shed effect? I don't know that allusion.

I believe he's talking about something C Northcote Parkinson wrote about.

A budget committee has a hundred million dollar budget to deal with. They
don't ask questions about the twenty million dollar turbines. Anything
they say might make them look stupid, and if they ask questions they
aren't sure they'll understand the answers. The discussion goes very
fast. But given a $4000 bike shed they have opinions. Everybody
understands a bike shed, and they have a clear concept of that amount of
money. Do they really need a bike shed that costs so much? Couldn't they
get by with a cheaper one? Do they need a bike shed at all?

Parkinson wrote about how to find an item that was just the right size to
give committee members the sense that they were doing something. Too big
and they'd feel intimidated and pass it unexamined. Too small and it
would be beneath them and also ignored. But an item that's the right size
will generate enough discussion to fill up the time allotted for the
meeting, and without that there's the chance someone may feel obligated
to ask hard questions despite himself.

The analogy would be that people here tend to avoid big hard problems and
concentrate on things like names which are important enough to be worth
considering but are also pretty easy.

Bruce McFarling

Feb 24, 2010, 9:37:01 PM

On Feb 24, 5:33 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:

> I'm not sure we got the functionality right. In my implementation of
> SUBSTITUTE, I don't directly implement it as suggested - i.e. I don't
> directly populate the output buffer with the results, and check for the
> size to fit, I rather use my string words and push the output into a
> string buffer which grows to the size of the string.

Fine, just don't call it SUBSTITUTE. SUBSTITUTE as defined is more
portable to space constrained systems ... and also to working with a
pool of string memory that is resized if it runs short, where the size
of the string may well be the balance of the pool, where freeing the
pool has less risk of memory leaks.

The problem remains that the definition might be phrased in a way that
interferes with existing practice, where parametric replacements from
the stack are supported alongside simple text macros.

idknow

Feb 24, 2010, 11:38:12 PM

On Feb 24, 6:10 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

[snip]

> Anyway, I wonder why people discuss so much about names, and so little
> about the functionality. Do we always get the functionality right, or
> is it the bike shed effect?
>
> - anton

Anton, A properly chosen name completely conveys the essence of its
function.

dup swap drop nip tuck squat pike roll pick ; :-)

Using a really good dictionary (print or bits) can help immensely.

Stephen Pelc

Feb 27, 2010, 10:26:46 AM

On Tue, 23 Feb 2010 15:43:44 +0000, Graham Smith <w...@tectime.com>
wrote:

>SUBSTITUTE does two different things:
>1) replaces text as defined in some list which has a predictable stack
>effect, and
>2) replaces text with unknown text specified at unknown positions on the
>stack.
>
>This word is crying out to be at least factored!

The point is that the macros *may* be implemented as words which have
non-null stack effects. You are entitled to do this, but you don't
have to do it.

>>17.6.2.cccc UNESCAPE "unescape" STRING EXT
>>
>> Execution: c-addr1 len1 c-addr2 -- c-addr2 len2
>>
>> Replace each '%' character in the input string c-addr1/len1 by two
>> '%' characters. The output is represented by c-addr2/len2. The
>> buffer at c-addr2 must be big enough to hold the unescaped string.
>>
>
>No! No! No!

None of us who worked on the document like the name UNESCAPE.
Alternatives already proposed are:
ESCAPE-MACRO
APPLY-ESCAPE
EMBED-ESCAPE
%-ESCAPE
ESCAPED-TEXT
DOUBLE-%
SAFE-ESCAPES
SUBST-SAFE
Since the proposal describes the '%' character as a delimiter:
UNDELIMIT
DOUBLE-DELIMITERS
DUPLICATE-DELIMITERS
SAFE-DELIMITERS

I'll be quite happy to change the name.
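
Whatever the word ends up being called, the behaviour in the quoted
definition is small enough to sketch. The following is a minimal
illustration only, not part of the proposal text. It keeps the name
used in the CfV, assumes one character per address unit, and assumes
the caller supplies a buffer big enough for the doubled string (twice
len1 characters in the worst case). The buffer name buf is hypothetical.

```forth
\ Sketch only. Assumes 1 CHARS = 1 address unit and a caller-supplied
\ buffer of at least twice len1 characters.
: UNESCAPE ( c-addr1 len1 c-addr2 -- c-addr2 len2 )
   DUP 2SWAP OVER + SWAP ?DO        \ loop I over the input; stack: c-addr2 dest
      I C@ [CHAR] % = IF
         [CHAR] % OVER C! 1+        \ emit the extra '%'
      THEN
      I C@ OVER C! 1+               \ copy the character itself
   LOOP
   OVER - ;                         \ dest - c-addr2 = len2

CREATE buf 64 CHARS ALLOT           \ hypothetical output buffer
S" 50% off" buf UNESCAPE TYPE       \ displays 50%% off
```

Passing the result through SUBSTITUTE should then give back the
original text, since each %% pair collapses to a single %.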

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Anton Ertl

Feb 27, 2010, 10:41:58 AM

steph...@mpeforth.com (Stephen Pelc) writes:
>None of us who worked on the document like the name UNESCAPE.

...

>I'll be quite happy to change the name.

That would be fine with me, if you had done so in the RfD stage.
However, Peter pulled the trigger and posted this as a CfV, so this
proposal is now frozen; i.e., no changing of word names or semantics
for this proposal.

You could do another proposal (and hopefully with enough RfDs and
working the feedback into the revised RfDs to have one that's good
enough, e.g., with a straw poll on the name), and do a CfV on that
other proposal, and then the TC could decide not to incorporate this
proposal and decide to incorporate the other proposal instead.

But I would very much prefer it if proponents did not post CfVs for
proposals that are apparently so immature that they want to retract
them a few days later. A CfV costs effort on the part of the voters,
the votetaker, potentially system implementors, and also incurs other
costs, so it should not be taken lightly.

Peter Knaggs

Feb 28, 2010, 9:12:53 AM

to fort...@yahoogroups.com

On Sat, 27 Feb 2010 15:41:58 -0000, Anton Ertl
<an...@mips.complang.tuwien.ac.at> wrote:
>
> That would be fine with me, if you had done so in the RfD stage.
> However, Peter pulled the trigger and posted this as a CfV, so this
> proposal is now frozen; i.e., no changing of word names or semantics
> for this proposal.

Given the amount of discussion this has provoked I would recommend
withdrawing the CfV and publishing a revised RfD.

> But I would very much prefer it if proponents did not post CfVs for
> proposals that are apparently so immature that they want to retract
> them a few days later. A CfV costs effort on the part of the voters,
> the votetaker, potentially system implementors, and also incurs other
> costs, so it should not be taken lightly.

True, but I have seen no discussion of this RfD on the mail list, thus it
was a good candidate for a CfV.

--
Peter Knaggs

Bruce McFarling

Feb 28, 2010, 10:56:21 AM

On Feb 28, 9:12 am, "Peter Knaggs" <p...@bcs.org.uk> wrote:
> True, but I have seen no discussion of this RfD on the mail list, thus it
> was a good candidate for a CfV.

But the points that were raised on clf for prior RfD's, both on the
mismatch between "Usage" and proposal specification and on the naming
of "UNESCAPE", were simply left unresolved.

Two unchecked points on a "To Do" list before issuing a revised RfD,
let alone a CfV.

Bernd Paysan

Feb 28, 2010, 1:49:21 PM

Peter Knaggs wrote:
> True, but I have seen no discussion of this RfD on the mail list, thus
> it was a good candidate for a CfV.

Nobody discusses RfDs, they all wait for the CfV to raise their
objections ;-). Or it's just that there are more and less appropriate
times of the year for requesting discussions.

Bruce McFarling

Feb 28, 2010, 3:49:21 PM

On Feb 28, 1:49 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:

> Nobody discusses RfDs, they all wait for the CfV to raise their
> objections ;-). Or it's just that there are more and less appropriate
> times of the year for requesting discussions.

QUOTE
From: Graham Smith <w...@tectime.com>
Newsgroups: comp.lang.forth
Subject: Re: RfD Substitute v3
Date: Thu, 17 Sep 2009 10:20:42 +0100
...

I am uneasy with the word UNESCAPE.

As a user of VFX, I have been using its text substitution capabilities
(which this proposal is based upon I expect), and its "escaped strings"
words and perhaps my opinion is coloured by my familiarity here. But to
me the word UNESCAPE strongly suggests something to do with escaped
strings and not text substitution.

In addition, I have extended (hijacked?) the VFX escaped strings
words to give me
Escape \ caddr ulen -- caddr' ulen'
and
UnEscape \ caddr ulen -- caddr' ulen'

which I hope are self explanatory.

Perhaps another name should be used for the UNESCAPE of this proposal.
How about UNSUBST for instance?

Graham Smith
UNQUOTE

Perhaps the assumption is made that there is a certain *number* of
people who have to chime in, "ditto", "ditto", "ditto", before a
clearly explained problem with an aspect of a proposal that is quite
clearly a problem needs to actually be addressed before the proposal
is issued with a CfV without addressing the issue?

Josh Graham pointed out the fact that the Usage involved use beyond
the scope of the proposal in April 2009 ... while he did not catch the
problem that the specification seems to rule out the actual VFX Forth
usage, clearly a Usage section that does not match up with the
specification indicates that the specification has not received much
consideration as to how it stands up in its own right.

In the brief discussion that followed, PRESUBSTITUTE was one of the
alternative names that was suggested, which Josh Graham was fine with.
I was not following clf in September, but while awkward it is at least
informative, as opposed to the misleading UNESCAPE.

Michael L. Gassanenko offered a list of names that he preferred to
UNESCAPE. QUOTE:

%->%% ( c-addr1 len1 c-addr2 -- c-addr2 len2 )
%>%% ( c-addr1 len1 c-addr2 -- c-addr2 len2 )
DOUBLE-DELIMITERS ( c-addr1 len1 c-addr2 -- c-addr2 len2 )
DUPLICATE-DELIMITERS
DOUBLE-%
SAFE-DELIMITERS
SAFE-ESCAPES
SUBST-SAFE

UNQUOTE

The resolution of the discussion that followed the RfD? A CfV with the
same unsatisfactory name choice. The discussion that took place
following the third RfD was simply ignored.

It was also in April that I pointed out how one would have to do the
kind of thing provided for in the Usage example with parametric
substitution ... at this time, I did not catch the problem that the
specification might rule out the VFX Forth usage, but I did not have
an evaluation copy of VFX Forth at the time.

In any event, the assertion "Nobody discusses RfD's" is simply not
true. The RfD's may not have been discussed in the mailing list, but
issues were indeed raised along the way and never resolved before the
CfV.

Obviously people on clf do not seem to treat an RfD with as much
*urgency* as a CfV, but then again if the habitual practice is to
refuse to resolve issues if they are raised only on clf, that does not
do much to encourage people to give serious consideration to RfD's on
clf.

Bernd Paysan

Feb 28, 2010, 5:29:21 PM

Bruce McFarling wrote:

> On Feb 28, 1:49 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>
>> Nobody discusses RfDs, they all wait for the CfV to raise their
>> objections ;-). Or it's just that there are more and less
>> appropriate times of the year for requesting discussions.

[...]

> Perhaps the assumption is made that there is a certain *number* of
> people who have to chime in, "ditto", "ditto", "ditto", before a
> clearly explained problem with an aspect of a proposal that is quite
> clearly a problem needs to actually be addressed before the proposal
> is issued with a CfV without addressing the issue?

Probably. After all, this follows the IETF principle of rough
consensus. If there's a single voice objecting, it's probably not
enough. The discussion however might be too informal, and maybe we need
a "bug tracker" for more formal tracking of issues with RfCs.

Bruce McFarling

Feb 28, 2010, 7:59:10 PM

On Feb 28, 5:29 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Probably. After all, this follows the IETF principle of rough
> consensus. If there's a single voice objecting, it's probably not
> enough.

Except in this case it was multiple voices objecting, one of those
being offered alternatives and accepting one of the alternatives as
superior ... and it's only at that point that the result of the
discussion is dropped.

Someone just following that discussion who was happy with the same
alternative that was already offered and accepted might be completely
unaware that a chorus of "uh, huh", "sounds good", "yup", etc. is
required for the discussion to actually count.

Perhaps change the name to RfDCoD, "Request for Discussion and a
Chorus of Dittos".

Even if someone does not use google groups to access Usenet, it can
still be used to search inside a specific Usenet Group for a phrase
like "RfD Text Substitution" and yield hooks into the discussion that
took place. That is, after all, how I finally "remembered" the
details, including both the short comment I made and the longer
discussion when I was not following clf.

Stephen Pelc

Mar 1, 2010, 6:46:15 AM

On Tue, 23 Feb 2010 11:26:26 -0000, "Peter Knaggs" <p...@bcs.org.uk>
wrote:

>This is actually a poll about how widely the proposal is implemented
>and how popular it is among the programmers. It is called a CfV
>(call-for-votes) because the process is inspired by the Usenet Rdf/CfV
>process.

Nobody likes the name UNESCAPE. To avoid losing time I suggest the
following approach.

1) Vote for the existing proposal
2) Immediately issue a proposal/CfV to change the name
UNESCAPE to DOUBLE-DELIMITERS.

Bruce McFarling

Mar 1, 2010, 11:43:31 AM

On Mar 1, 6:46 am, stephen...@mpeforth.com (Stephen Pelc) wrote:
> On Tue, 23 Feb 2010 11:26:26 -0000, "Peter Knaggs" <p...@bcs.org.uk>
> wrote:
>
> >This is actually a poll about how widely the proposal is implemented
> >and how popular it is among the programmers. It is called a CfV
> >(call-for-votes) because the process is inspired by the Usenet Rdf/CfV
> >process.
>
> Nobody likes the name UNESCAPE. To avoid losing time I suggest the
> following approach.
>
> 1) Vote for the existing proposal
> 2) Immediately issue a proposal/CfV to change the name
> UNESCAPE to DOUBLE-DELIMITERS.

Repeat of a comment in the other forum, this time with some suggested
language:

(A) There's still the problem that the VFX Forth parametric text
substitution along the lines of %$4% is nowhere allowed for in any
specification of a valid substitution name.

If it's *intended* that SUBSTITUTE allow those, either (i) the leading
characters allowed must be set out, or else (ii) if it is to search
for the named values first and then make the parametric substitution,
that should be specified.

(ii) would be along the lines of:
"2) If the name is a valid substitution name defined by REPLACES,
the leading and trailing delimiter characters and the enclosed
substitution name are replaced by the substitution text.

3) If the name is a valid substitution name otherwise supported by
the implementation, the leading and trailing delimiter characters and
the enclosed substitution name are replaced by text generated by the
implementation.

4) If the name is not a valid substitution name, the name with
leading and trailing delimiters is passed unchanged to the output."
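
The lookup order in (ii) can be sketched as below. This is only an
illustration under stated assumptions: the wordlist names user-macros
and system-macros and the word find-macro are hypothetical, nothing in
the proposal requires macros to live in wordlists, and the body of $4
is placeholder text rather than what VFX Forth actually does.

```forth
\ Sketch only: user-macros, system-macros and find-macro are hypothetical names.
WORDLIST CONSTANT user-macros     \ would hold names defined by REPLACES
WORDLIST CONSTANT system-macros   \ would hold names the implementation supplies

: find-macro ( c-addr u -- xt true | false )
   2DUP user-macros SEARCH-WORDLIST IF   \ rule 2: REPLACES names win
      NIP NIP TRUE EXIT
   THEN
   system-macros SEARCH-WORDLIST 0<> ;   \ rule 3 if found; false means rule 4

\ An implementation-supplied macro, defined directly into system-macros.
GET-CURRENT  system-macros SET-CURRENT
: $4 ( -- c-addr u ) S" fourth" ;        \ placeholder text, not VFX behaviour
SET-CURRENT
```

With this ordering a %$4% defined through REPLACES would shadow the
built-in one, and a name found in neither list leaves find-macro
returning false, so the caller copies the name and its delimiters to
the output unchanged.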

(B) It is quicker to simply withdraw the first CfV and issue a new one
in a single action, by saying:

CfV: Text Substitution (amended)

"This is actually a poll about how widely the proposal is implemented
and how popular it is among the programmers. It is called a CfV (call-
for-votes) because the process is inspired by the Usenet Rdf/CfV
process.

This is a replacement to the original CfV that amends a proposed name
and clarifies an ambiguity in the language of the original CfV. The
CfV titled "CfV: Text Substitution" issued on ## February 2010 is
withdrawn."

Stephen Pelc

Mar 1, 2010, 12:40:23 PM

On Mon, 1 Mar 2010 08:43:31 -0800 (PST), Bruce McFarling
<agi...@netscape.net> wrote:

>(A) There's still the problem that the VFX Forth parametric text
>substitution along the lines of %$4% is nowhere allowed for in any
>specification of a valid substitution name.

The proposal does not mandate how macros are implemented. If, for
example, you implement macros as words in the SUBSTITUTIONS
vocabulary, you can define a macro to do whatever you want.

Because the proposal does not mandate implementation, it cannot
specify the notation for macros such as the one below.

also substitutions definitions
: $4 \ -- caddr len
...
;
previous definitions

Stephen


Anton Ertl

Mar 1, 2010, 1:03:57 PM

steph...@mpeforth.com (Stephen Pelc) writes:
>On Tue, 23 Feb 2010 11:26:26 -0000, "Peter Knaggs" <p...@bcs.org.uk>
>wrote:
>
>>This is actually a poll about how widely the proposal is implemented
>>and how popular it is among the programmers. It is called a CfV
>>(call-for-votes) because the process is inspired by the Usenet Rdf/CfV
>>process.
>
>Nobody likes the name UNESCAPE. To avoid losing time I suggest the
>following approach.
>
>1) Vote for the existing proposal

You mean that we should claim that we implement UNESCAPE in our system
or use UNESCAPE in our programs?

>2) Immediately issue a proposal/CfV to change the name
>UNESCAPE to DOUBLE-DELIMITERS.

That would be a proposal for removing an existing extension (in
addition to adding a new one). If lots of people implement UNESCAPE,
as they should in order to do 1), there is no reason to remove this.
What's more, if lots of people just said that they use the proposal
that contains UNESCAPE, it's certainly a bad idea to remove this
feature; in your words, this would disenfranchise existing users.

If you really want to have a revised proposal, I suggest: Do not vote
on the existing proposal (that saves all of us time, especially me).
Start on a new proposal, and this time don't rush it. Make a straw
poll on the new name for UNESCAPE, and also consider all the other
suggestions that have been made in RfDs for the existing proposal and
that will be made for the new one.

Bruce McFarling

Mar 1, 2010, 3:15:39 PM

On Mar 1, 12:40 pm, stephen...@mpeforth.com (Stephen Pelc) wrote:
> Because the proposal does not mandate implementation, it cannot
> specify the notation for macros such as the one below.

However, it specifies that SUBSTITUTE *only* replaces text when it
encounters a valid substitution name.

There is no reason to believe from the text of the proposal that %$4%
is a *valid* substitution name unless someone has used REPLACES to
assign that as a name. And the specification is explicit that if it's
not a valid substitution name, it's passed on unchanged.

*Being accustomed to using them*, it's natural for you to take for
granted that they are valid substitution names *in practice*. The
question is whether that existing practice complies with the proposal
*as written*.

The question at hand is not what the proposal *mandates*, but what the
proposal as written *forbids*, which is replacing the text if the word
name is not a valid name.

So it's not what implementation is *mandated*, it's what implementation
is *permitted*.

SOLUTION?

As long as, e.g., a %$4% defined by REPLACES would take precedence over
the default parametric substitution, it's relatively easy to give
explicit permission for it. Just add an explicit permission between
handling words defined by REPLACES and skipping names that are not
recognized.

Bruce McFarling

Mar 1, 2010, 3:35:43 PM

On Mar 1, 12:40 pm, stephen...@mpeforth.com (Stephen Pelc) wrote:

> also substitutions definitions
> : $4 \ -- caddr len
> ...
> ;
> previous definitions

A different way of putting it is the question:

Where *in the proposal* is SUBSTITUTE allowed to *execute* $4?

(1) Since the only substitution definition that is in the scope of the
proposal is one defined by REPLACES,

(2) and since nowhere in the proposal does it explicitly *say* that there
is any other way to define a substitution name,

(3) it could be argued that "valid substitution names" are restricted
to names defined by REPLACES.

Perhaps the simplest fix of the language is to specify that "valid
substitution name" is a category that can in fact extend beyond the
REPLACES process:

"Proposal
========
The following three words are defined to handle substitution of
text, and output of application data such as date, time, currency,
path and so on.

An implementation may provide additional means for defining valid
substitution names."

Aleksej Saushev

Mar 6, 2010, 7:24:25 AM

Bernd Paysan <bernd....@gmx.de> writes:

> Bruce McFarling wrote:
>
>> On Feb 28, 1:49 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>>
>>> Nobody discusses RfDs, they all wait for the CfV to raise their
>>> objections ;-). Or it's just that there are more and less
>>> appropriate times of the year for requesting discussions.
> [...]
>> Perhaps the assumption is made that there is a certain *number* of
>> people who have to chime in, "ditto", "ditto", "ditto", before a
>> clearly explained problem with an aspect of a proposal that is quite
>> clearly a problem needs to actually be addressed before the proposal
>> is issued with a CfV without addressing the issue?
>
> Probably. After all, this follows the IETF principle of rough
> consensus. If there's a single voice objecting, it's probably not
> enough. The discussion however might be too informal, and maybe we need
> a "bug tracker" for more formal tracking of issues with RfCs.

There's a big difference you ignore when comparing to the IETF.

The IETF doesn't write standards.
Not every RFC it issues becomes a standard.

What you do is basically you cook up a library with a limited use
and instead of recognising that it is just a library, you start
pushing it into standard ignoring anything else.

That's why instead of having standard name spaces and libraries,
you see ugly code incompatible with all previous practice.

Sure, why bother about the rest of the world when it's so easy to avoid
possible clashes by properly choosing names?

--
HE CE3OH...