Advice wanted: "case" command and regexps

John Ousterhout

May 6, 1993, 11:43:22 AM
For Tcl 7.0 I'd like to extend the "case" command to support alternate
forms of matching. Right now each of the patterns is treated as a
glob-style pattern to be matched against string (same matching rules
as "string match"). I'd like to add a switch to allow either regexp-style
matching or exact matches only, but I'm having trouble doing this in
a compatible way. What I've implemented so far is an optional switch
just after the command name:

case -regexp $x ...
case -exact $x ...
case -glob $x ...

If the switch is omitted (case $x ...) then it defaults to -glob to
duplicate the current behavior. However, this introduces some
compatibility problems. If current scripts are left as-is and they
contain commands like

case $x ...

then if $x should just happen to contain "-regexp" it will be interpreted
as the mode switch rather than the string to use for matching, which will
cause the command to misbehave. This won't happen very often but when
it does happen it's likely to cause a lot of confusion. The fact that
it's data-dependent will make this kind of bug hard to track down.

I'm not sure what to do about this problem, hence this message. I can
see four alternatives:

1. Leave things as I've implemented them so far (i.e. the switch is optional,
with the potential problem described above).

2. Make the switch mandatory. This will avoid confusing errors but makes
the change incompatible: all existing case commands will have to be
modified. Except for the compatibility issue this is the approach that
I think would be best.

3. Remove the change, leaving "case" as it is, with glob-style matching only.

4. Move the switch to be at the end of the command: "case $x ... -regexp"
I think this would eliminate compatibility problems but the syntax is
gross and different from what's used elsewhere in Tcl, so this is my
least favorite option.

I'd be interested in hearing from the Tcl/Tk community on this. If you
have any other suggestions than the above 4, please post to the newsgroup.
If you have preference among the four suggestions above (and in particular
if you think it would be a bad idea to do anything incompatible, like #2),
please send your vote to me. I'll report the results back here in a couple
of weeks, and if there's a strong consensus I'll probably follow it.

Roland Schemers

May 6, 1993, 3:18:10 PM
In article <1sbbmq$q...@agate.berkeley.edu> you write:
>case -regexp $x ...
>case -exact $x ...
>case -glob $x ...
>

>2. Make the switch mandatory. This will avoid confusing errors but makes
> the change incompatible: all existing case commands will have to be
> modified. Except for the compatibility issue this is the approach that
> I think would be best.

I would probably vote for 2. Either that or could you modify it
so that:

case $x -switch in ..... or
case $x in -switch

In other words, make the "in" required in order to specify a switch. I kind of like:

case $x in -regex ...

better, because it sounds more English-like and looks more readable.

Are there inherent parsing problems with that?

I really think the added benefit of the switches is worth the hassle,
whatever it may turn out to be :-)

Roland

--
Roland J. Schemers III | Networking Systems |
Systems Programmer | G16 Redwood Hall (415) 723-6740| # ping Elvis
Distributed Computing Group | Stanford, CA 94305-4122 | Elvis is alive
Stanford University | sche...@Slapshot.Stanford.EDU |

Roland Schemers

May 6, 1993, 3:17:26 PM

In article <1sbbmq$q...@agate.berkeley.edu> ous...@sprite.Berkeley.EDU (John Ousterhout) writes:
>For Tcl 7.0 I'd like to extend the "case" command to support alternate
>forms of matching. Right now each of the patterns is treated as a
...

>2. Make the switch mandatory. This will avoid confusing errors but makes
> the change incompatible: all existing case commands will have to be
> modified. Except for the compatibility issue this is the approach that
> I think would be best.

I would probably vote for 2. Either that or could you modify it
so that:

case $x -switch in ..... or
case $x in -switch

In other words, make the "in" required in order to specify a switch. I kind of like:

case $x in -regex ...

better, because it sounds more English-like and looks more readable.

Ed Gould

May 6, 1993, 4:09:22 PM
> If you have any other suggestions than the above 4 ...

Two possibilities come to mind. Both leave the existing "case" as it is,
and augment it thusly:

5. Add xcase ("extended case") that requires the switch to select the
matching style. case would then be a synonym for xcase -glob.

6. Add regexp-case and exact-case (or case-regexp and case-exact; I have
no preference for the name order) as new commands for the additional
functionality. glob-case might be added as well, for completeness,
as a synonym for case.

Neither of these is completely satisfactory, in that they both add to
namespace pollution. Both are, however, completely compatible with
existing code. I prefer (5) to (6).

--
Ed Gould e...@pa.dec.com Digital Equipment Corporation
+1 415 688 1309 Network Systems Lab 250 University Ave, Palo Alto, CA 94301

"Unison is only one form of harmony." -- LW

Paul Eggert

May 6, 1993, 3:45:32 PM
Aside from introducing an incompatibility, changes #1 and #2 don't
solve the compatibility problem once and for all like they should.
E.g. suppose Tcl 8.x needs a new -nocase option so that
`case -regexp -nocase $x ...' does case-insensitive regexp matching?
We'll have to go through this whole compatibility rigamarole again.

Here's another syntax proposal that's upwards compatible, supports the
new functionality, and allows for future extensions: use a new keyword
`switch' for the enhanced version of the case command.
`case $x in ...' would be equivalent to `switch -glob -- $x ...'.
The `--' option tells `switch' that the next argument is the string to match,
not an option (even if the argument begins with `-').

The mnemonic could be ``Use `switch' if you want a case with switches.''

It would also be nice to add the `--' option to regexp, regsub, glob, etc.,
so that any command can be applied to a string that happens to look
like an option, without mistakenly interpreting it as that option.

Michael Hoegeman

May 7, 1993, 4:30:32 AM
In article <1sbbmq$q...@agate.berkeley.edu> ous...@sprite.Berkeley.EDU (John Ousterhout) writes:
>For Tcl 7.0 I'd like to extend the "case" command to support alternate
>forms of matching. Right now each of the patterns is treated as a
>glob-style pattern to be matched against string (same matching rules
>as "string match"). I'd like to add a switch to allow either regexp-style
>matching or exact matches only, but I'm having trouble doing this in
>a compatible way. What I've implemented so far is an optional switch
>just after the command name:

Here is a suggestion that might keep everyone happy.

-- Make a new command called switch or pick that requires the option field
-- Leave case as it is and just implement it internally as switch -glob
-- Document case as anachronistic and that it should be avoided.
(or take it out of the documentation completely).

This scheme leaves 'case' unchanged but allows the new (and very desirable!)
features to be added.

If you don't find this acceptable, I would then vote to make the
new version of the case command require the option field. A script could
be made that could upgrade most tcl scripts with a minimum of hassle.

The optional switch is the least desirable way to go, I think. It is too
fragile for me. Having -glob, -regexp, and -exact magically excluded from
being used as a first argument to case could break things in very
frustrating ways. I would rather have case predictably incompatible.

Thanks for listening (whatever you decide).

-Mike Hoegeman
--
------------------------------------------------------------------------------
Mike Hoegeman email: m...@wx.gtegsc.com tel: (818)706-4145
GTE Weather Systems Group 31717 La Tienda Dr, Westlake Village CA. 91359

Larry W. Virden

May 7, 1993, 6:29:58 AM

> It would also be nice to add the `--' option to regexp, regsub, glob, etc.,
> so that any command can be applied to a string that happens to look
> like an option, without mistakenly interpreting it as that option.

This, I think, is an important point. If we are fixing Tcl syntax
for a long period of time, why not add support for this idea
everywhere switches and data can be confused?

That way, if Extended Tcl, Tcl DP, or one of the other 10,000,000
extensions needs to augment a command with a flag, folks will at least
have been warned how they should code their commands.
--
:s
:s Larry W. Virden INET: lvi...@cas.org
:s Personal: 674 Falls Place, Reynoldsburg, OH 43068-1614

Marc R. Ewing

May 7, 1993, 9:40:44 AM
I vote for #1 - switch optional w/ potential problem.

I'd say that this is the "right" way to do it. Also, the
incompatibility mentioned is not a difficult one to handle.

There is an option #5. Instead of using a switch, or in addition to
using a switch on the standard "case", you could provide:

case-regexp
case-exact
case-glob

as commands, which do not respond to -regexp, -exact, or -glob
switches.

-Marc

Don Libes

May 8, 1993, 2:17:48 AM
In article <1sbbmq$q...@agate.berkeley.edu> ous...@sprite.Berkeley.EDU (John Ousterhout) writes:
> a compatible way. What I've implemented so far is an optional switch
> just after the command name:
>
> case -regexp $x ...
>
> If the switch is omitted (case $x ...) then it defaults to -glob to
> duplicate the current behavior. However, this introduces some

That's the same route I took with the expect command except that I
used "-re" instead of "-regexp". At the time, I considered the longer
spelling too long - but that's because patterns are so frequent in
Expect scripts. One is constantly writing "-re". In other Tcl
applications, writing "-regexp" might not be so tiresome.

Note that Expect has supported "-re" since October '91. No one has
ever suggested that "-re" was the wrong way to go, so I'd vote for #1
or perhaps amended to use "-re". (Ok, everyone, now you can tell me
that I've been misguided for a year and a half.)

I agree that "--" sounds like a great idea.

Don

Scott Hess

May 10, 1993, 8:53:16 AM
In article <1sbbmq$q...@agate.berkeley.edu>,

ous...@sprite.Berkeley.EDU (John Ousterhout) writes:
>For Tcl 7.0 I'd like to extend the "case" command to support alternate
>forms of matching. Right now each of the patterns is treated as a
>glob-style pattern to be matched against string (same matching rules
>as "string match"). I'd like to add a switch to allow either
>regexp-style matching or exact matches only, but I'm having trouble
>doing this in a compatible way.

It looks like you want to have it affect the entire case command.
Could there be an argument for per-case differentiation? Also,
though I see the need for -exact and -regexp style matching, I don't
see that there's so much need for -glob style. I mean, the regexp
matching in TCL seems to be faster than glob in most cases, anyhow,
and regexp is certainly a superset in terms of functionality.

There also may be arguments for greater revision of the syntax.
For instance, it's often inconvenient to have to ponder what escaping
is required so that your patterns can contain spaces. I'm not
certain I've ever had a really good use for the pattern-list syntax.
Usually, when I want to match multiple entries, I'll either use *
or [], else I have to use something more flexible like regexp.

Perhaps two commands are in order, the plain old case, and another
command called regcase or somesuch. Or, perhaps there could be a
flag set that indicates how case works. If you don't set the flag,
case remains the same, but if you do set it it changes how case
acts. Procs could, of course, set it in their scope and it would
work as expected. After all, I doubt that most people will mix
glob and regexp style cases in their code, in the interests of not
having to switch mental gears too often.

It might also be interesting to somehow open up the case command
so that other packages can hook into it. For instance, expect's
expect command is a lot like case, just with a different source of
input data. So's interact. I can think of a couple more possible
uses. For instance, perhaps a case that would take a list of
possibilities and return the index of the match (that's lsearch).
Or it could return a list of the matches (that's glob - glob's sort
of like expect, case with a different source of input data). If
the match list is formatted correctly, it can allow execution of
code fragments on matches. This would all probably argue for an
alternative command that is more flexible, and the current case
would just be a wrapper around it.

I do like another poster's suggestion about having -- to indicate
explicit end of parameters in cases like this, though. Also, I'm
starting to think it would be nice to have a TCL C function to
handle commands like "file option ..." and "string option ..." more
cleanly, and this applies to -option handling, too. Something like
getopt, where you give the possible options and the arguments you
got, and the C function spits back what options you got and what
arguments are left. Handling of -- could be folded into such a
function relatively easily.
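
A minimal sketch of what such a getopt-like helper could look like, in C. The name parse_options, its signature, and its return conventions are hypothetical and are not part of the Tcl C library; the sketch only illustrates how "--" handling can live in one place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Tcl C API.
 * Scans leading "-option" words in argv[start..argc-1] against a
 * NULL-terminated table of allowed option names (at most 32 entries,
 * since matches are reported as bits in *found).
 * Sets bit k of *found for each allowed[k] that was seen.
 * Returns the index of the first non-option argument, or -1 if an
 * unknown option is encountered. A literal "--" ends option processing,
 * so the word after it is data even if it starts with '-'. */
int parse_options(int argc, char **argv, int start,
                  const char *const *allowed, unsigned *found)
{
    int i = start;

    *found = 0;
    while (i < argc && argv[i][0] == '-') {
        size_t k;

        if (strcmp(argv[i], "--") == 0) {
            return i + 1;               /* everything after "--" is data */
        }
        for (k = 0; allowed[k] != NULL; k++) {
            if (strcmp(argv[i], allowed[k]) == 0) {
                *found |= 1u << k;
                break;
            }
        }
        if (allowed[k] == NULL) {
            return -1;                  /* unknown option */
        }
        i++;
    }
    return i;
}
```

With a table of {"-glob", "-regexp", "-exact", NULL}, the words "-regexp -- -glob body" leave "-glob" as the first data argument, which is the case the "--" convention is meant to handle. Without "--", any data word that starts with '-' is treated as an option in this sketch.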

[In fact, perhaps it'd be interesting to have explicit support for
commands with subcommands, so that TclX and other packages don't
have to do stuff like implement infox, but could just latch their
subcommands onto the info command. You could register C functions
for explicit "command subcommand" pairs, and also a backdrop function
for "command" and "command *" where * doesn't match any registered
subcommands. But I am rambling far afield.]

Later,
--
scott hess <sh...@ssesco.com> <To the BatCube, Robin>
12901 Upton Avenue South, #326 Burnsville, MN 55337 (612) 895-1208 Anytime!

Michael Halle

May 10, 1993, 7:59:08 PM

Here's some food for argument.

I absolutely agree that a clean, consistent way of handling argument
parsing would greatly improve Tcl and Tk. Embedded languages are
useful in part because they take the burden of parsing off of the
programmer. However, argument parsing is restricted to command
names in Tcl. Further parsing and keying off other arguments is
inconsistent and ad hoc for all commands.

One place the parsing problem shows up repeatedly is in "object style"
Tcl coding, such as that used in Tk. Each widget has code like the
following to parse its "object command" args:

len = strlen(s);
if (*s == 'c' && strncmp(s, "clear", len) == 0) {
    ...
} else if (*s == 'c' && strncmp(s, "clobber", len) == 0) {
    ...
}

Not only does this lead to inconsistent parsing, it is error prone.
More than once I've added a new object command and forgot to change
the "*s == 'c'" part, so the test failed. A consistent, modular
parsing scheme would greatly improve this situation, and the code that
implemented it could be optimized to centrally provide efficient
parsing. It would also allow changes in policy like "no abbreviations
allowed" to be made for subcommands as they can be for commands.
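
One possible shape for such a modular scheme is a table-driven lookup, sketched below in C. The function lookup_subcommand, its return codes, and the allow_abbrev flag are hypothetical illustrations and are not taken from Tk's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Tk's actual code.
 * Looks up a possibly abbreviated subcommand s in a NULL-terminated
 * table of names. Returns the table index of the match, -1 if nothing
 * matches, or -2 if the abbreviation matches more than one name.
 * An exact match always wins over prefix matches. Passing
 * allow_abbrev = 0 enforces a "no abbreviations" policy in one place. */
int lookup_subcommand(const char *const *names, const char *s,
                      int allow_abbrev)
{
    size_t len = strlen(s);
    int match = -1;
    int i;

    for (i = 0; names[i] != NULL; i++) {
        if (strcmp(names[i], s) == 0) {
            return i;                       /* exact match */
        }
        if (allow_abbrev && len > 0 && strncmp(names[i], s, len) == 0) {
            /* first prefix hit is remembered; any further hit is ambiguous */
            match = (match == -1) ? i : -2;
        }
    }
    return match;
}
```

Because the names live in a single table, adding a subcommand means adding one table entry; there is no per-branch first-character test to keep in sync. The caller would then dispatch on the returned index.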

Of course, that's software philosophy, not software engineering, and I
haven't yet been able to think of any way to provide a Tcl-like
mechanism for parsing. "getopt"-like solutions are straightforward,
but are essentially string-switch statements; is there an interpreter
or hash-table based approach that offers more flexibility or is more
like a recursive interpreter call? How would such an idea fit with
previously-discussed ideas for multiple interpreters or encapsulated
scope? Can it be clean *and manageable*?

It's a hard problem, for sure. Ideas?

--Michael Halle
Spatial Imaging Group
MIT Media Lab
hal...@media.mit.edu

Scott Schwartz

May 10, 1993, 9:41:18 PM
hal...@media.mit.edu (Michael Halle) writes:
| if(*s == 'c' && strncmp(s, "clear", len) == 0){
|
| Not only does this lead to inconsistent parsing, it is error prone.

Agreed. But this at least is easy to improve upon. From cnews/h/news.h:

/* STREQN is an optimised strncmp(a,b,n)==0; assumes n > 0 */
#define STREQN(a, b, n) ((a)[0] == (b)[0] && strncmp(a, b, n) == 0)

Louis A. Mamakos

May 10, 1993, 11:17:15 PM
Why not use TCL's hashtable for subcommands? It is the same mechanism
that TCL uses to manage its commands.

[Begin slight digression into Objective-C land.. bear with me]

That's what I do in the Objective-C based program that I use TCL in.
The base class (TCLObj, subclass of Object) of an object has a certain
set of subcommands (the "name" of the object is the TCL command name).
This base class has TCL subcommands like "hashstats", and "list" to
manage the TCL subcommands. It also implements Objective-C methods
like:

- createSubCommand:(char *)name for:(SEL)selector;

which maps a subcommand name to an Objective-C method. The methods
look like:

- (int) TCLcmdHashStats:(Tcl_Interp *)interp argc:(int)argc args:(char **)argv;

Each subclass can easily add its own commands, and they just get stuck
into the hashtable. The TCL commands get created when the object's
init: method is invoked. It works great and is dynamically
extensible: a big advantage if you load code/classes at run time.
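
The same idea can be sketched in plain C, without Objective-C and without Tcl's own hash table routines. Everything below (SubCmdTable, subcmd_register, subcmd_find, and the two sample procedures) is a hypothetical illustration, not the code described above.

```c
#include <stddef.h>
#include <string.h>

#define SUBCMD_BUCKETS 31

/* Signature of a subcommand implementation (hypothetical). */
typedef int (*SubCmdProc)(int argc, char **argv);

typedef struct SubCmd {
    const char *name;        /* not copied; must outlive the table */
    SubCmdProc proc;
    struct SubCmd *next;     /* chain of entries sharing a bucket */
} SubCmd;

typedef struct {
    SubCmd *buckets[SUBCMD_BUCKETS];   /* must start out all NULL */
} SubCmdTable;

static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s != '\0') {
        h = h * 31u + (unsigned char)*s++;
    }
    return h % SUBCMD_BUCKETS;
}

/* Adds a caller-allocated entry; a later registration under the same
 * name shadows an earlier one, so a subclass can override a command. */
void subcmd_register(SubCmdTable *t, SubCmd *entry)
{
    unsigned h = hash_name(entry->name);

    entry->next = t->buckets[h];
    t->buckets[h] = entry;
}

/* Returns the procedure registered under name, or NULL if there is none. */
SubCmdProc subcmd_find(const SubCmdTable *t, const char *name)
{
    const SubCmd *e;

    for (e = t->buckets[hash_name(name)]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return e->proc;
        }
    }
    return NULL;
}

/* Two placeholder subcommands, only to show registration. */
int list_cmd(int argc, char **argv)      { (void)argv; return argc; }
int hashstats_cmd(int argc, char **argv) { (void)argc; (void)argv; return 0; }
```

One trade-off of exact-name hashing is that unique-prefix abbreviations no longer work unless they are registered explicitly or handled by a separate fallback scan.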

Louis Mamakos

Thomas A Fine

May 10, 1993, 11:42:26 PM

At this point I'm in a pretty confused state. Are you all trying to
tell me that strncmp does something BESIDES comparing the first character
of each string first? Because I can't imagine how this could possibly
be an optimization.

Certainly I could see it if this was used for some sort of hashing:

if (*s == 'c') {
    if (strncmp(s, "clear", len) == 0) {
        foo();
    } else if (strncmp(s, "cloak", len) == 0) {
        bar();
    } else ...

} else if (*s == 'd') {
    ...

But I don't get the impression that that is the case.

tom

Juergen Wagner

May 11, 1993, 1:36:04 AM
Argument passing can be made arbitrarily complex (cf. the find(1)
command). In my view, an embedded language should strive for a high
degree of flexibility, not a plethora of argument syntax forms. LISP
is a nice example of a language allowing arbitrary syntactic
constructs for argument lists (leaving the parsing to the called
function), and yet providing the most basic ones (required, optional,
and keyworded arguments). As for Tcl, I would strongly recommend not
to go further than LISP in interpreting argument lists. An embedded
(!) language shouldn't attempt to be a kitchen sink with respect to
argument parsing. It should only provide the basics.

Personally, I use the following function to be able to use a mixture
of required and keyworded arguments on Tcl functions (optional
arguments are already provided):

proc args {args} {
    upvar args arguments

    foreach i $args {
        uplevel [list set [lindex $i 0] [lindex $i 1]]
    }

    while { $arguments != {} } {
        set key [lindex $arguments 0]
        set value [lindex $arguments 1]
        set arguments [lrange $arguments 2 end]

        set found 0
        foreach s $args {
            if { "-[lindex $s 0]" == $key } {
                uplevel [list set [lindex $s 0] $value]
                set found 1
                break
            }
        }
        if { $found == 0 } {
            puts stderr "\nUnknown option: $key ($value)"
            puts stderr " ** Args $arguments"
            puts stderr " ** Opts: $args"
            puts stderr " ** Cmd: [info level -1]"
            catch {puts stderr " : [info level -2]"}
            puts stderr " -----"
        }
    }
    return {}
}

An example of its typical use would be something like

proc new(message) {name args} {
    global font color

    args text {relief sunken} {textfont text} {aspect 200} layout

    message $name \
        -aspect $aspect -relief $relief \
        -borderwidth 2 -padx 5 -pady 5 \
        -background $color(bg) -foreground $color(fg) \
        -font $font($textfont) -justify left \
        -text $text

    layout(window) $name $layout
}

The function "new(message)" takes one required argument, "name", and
the options "-text", "-relief" (default "sunken"), "-textfont"
(default "text"), "-aspect" (default 200), and "-layout". The values
supplied in a call are bound to the respective variables whose names
are identical to option names, except for the leading dash.

Although I'm not quite happy with the current argument parsing in Tcl
("args" being somewhat magic, and no support for keyworded argument
lists), special functionality can be provided where needed. If more
syntactic sugar is called for, a procedure definition "defun" could be
defined:

proc defun {name arglist body} {
    eval [list proc $name {args} \
        [concat [concat {args} $arglist] ";" $body]]
}

(This, of course, is just a crude attempt, ignoring any required or
optional arguments.)

To make it short: please keep the syntax of Tcl as simple as
possible, while providing easy means of extending it.
It might be useful to add something like the above keyworded list
parsing routine to Tcl. On the other hand, as you can see, it can be
done very easily in Tcl itself (and performance isn't the problem
here).

Greetings,
--Juergen


J_Wa...@iao.fhg.de
gan...@csli.stanford.edu

Larry W. Virden

May 11, 1993, 8:16:06 AM
If someone wants argument parsing procedures more than Tcl/Tk has,
the parseargs package provides a common interface across quite a few
languages...

Michael Halle

May 11, 1993, 9:01:16 AM

Testing the first characters of two strings for equality before
calling strcmp() is always "legal" with any length non-NULL string,
avoids the cost of the function call if the simple test fails,
and costs very little additional if the strings are in fact equal.
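
As a small C sketch of that idea (the name streq_fast is hypothetical, and whether the extra test actually saves time depends on the compiler and C library):

```c
#include <string.h>

/* Returns 1 if a and b are equal, 0 otherwise.
 * Both must be non-NULL, NUL-terminated strings. Reading a[0] and b[0]
 * is always valid, because even an empty string has its terminator at
 * index 0. strcmp is only called when the first characters agree. */
int streq_fast(const char *a, const char *b)
{
    return a[0] == b[0] && strcmp(a, b) == 0;
}
```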

--Mike


Joe Armstrong

May 12, 1993, 5:49:39 AM
I'm trying to run the TCL Windows port (i.e. w_tclbin.zip). I have
followed the readme.txt instructions and installed TCL, but when I click on
the TCL icon to start everything I get a prompt box saying something
like:

usage: entry <command> <dll> <entry> <language> <return> { ....}


and an OK button. When I click on OK TCL terminates.

help - anybody - pleeeeeeeeeeeeeese


Joe



Norm MacNeil

May 12, 1993, 8:16:17 AM


I had that exact same problem. It turns out that there is a typo in one of the
files included in the bundle. I can't remember which file it is (init.tcl?)
but there is a file where there is just the word "entry". Anyway, it's about
the 10th line in the file so it's easy to see. Since this word is just by
itself, I wonder if there is some "corruption" in the file although I haven't
noticed any degradation in the application.

--
Norm.

+-----------------------------------------------------------------------+
Norm MacNeil Phone: (613) 763-3372
Data Systems Fax: (613) 765-2854
Bell-Northern Research Ltd. EMail: no...@bnr.ca (INTERNET)
#include <disclaimer.std> "Roller bladers do it in-line!"

John Ousterhout

May 17, 1993, 4:21:36 PM
After reading through all the responses to my request for advice
on the "case" command, I've decided that the best solution is to
leave the current "case" command alone so that there are no
compatibility problems. Tcl 7.0 will contain a new "switch" command
that allows different forms of matching, and you will *have* to say
which form of matching you want so that there's no ambiguity. The
"case" command will continue to be supported, but it will become
deprecated and I may drop it from the documentation to discourage
its use.

Since "switch" is going to be a new command, there's no reason why
its syntax has to be the same as the current "case" command. I'm
considering the possibility of changing the patterns from pattern
lists to single patterns. For example, where you can now say

case $x {*a *b} foo ...

and foo will be executed if $x matches either *a or *b, you'd have
to say

switch -glob $x *a foo *b foo ...

There are two reasons for this change: (a) I suspect that the list
feature is rarely used, if ever, and (b) the use of lists requires
extra braces in some situations, which can lead to confusion. For
example, if you want a pattern to consist of a single backslash,
you have to say

case $x {\\} ...

in the current "case" command. If switch has single patterns rather
than pattern lists, you'd be able to say

switch -glob $x \\ ...

which is more obvious, I think.

I'd like to get feedback on this proposed change. If you have an opinion
about whether I should change pattern lists back to single patterns for
the new "switch" command, send me your vote. If the number of "no"s is
a substantial fraction of the number of "yes"es then I'll stick with the
current pattern list approach.
