some questions on Tcl_ParseCommand (tclParse.c)

two...@gmail.com

Nov 20, 2018, 6:53:21 PM
I have two questions:

1. Is there a way to “pre-process” a command before parsing?
2. Is the parser called twice for every command that needs parsing?


----------------------------------------------------
I’ve been studying the Tcl parser, in particular Tcl_ParseCommand (tclParse.c). I was hoping there might be a way to “pre-process” a command and then submit a modified command for parsing and then execution and compiling. I’ve not had success with this.

I’ve done the following,

I renamed Tcl_ParseCommand to Tcl_ParseCommand0 and then created a wrapper (here’s some pseudocode; the full code is below):

Tcl_ParseCommand (… args …) {

    ret = Tcl_ParseCommand0(… same args as called with …);
    … some fprintf(stderr, …) debugging info on the parse results
    return ret;
}

This seems to work without any problems.

Next, I tried to allocate some memory with ckalloc, copy the input string into it, and call Tcl_ParseCommand0 on the copy. For this test, I don’t free the memory. This appeared to get into an infinite loop. I don’t think the missing free was the cause. I do not get any segmentation faults.

Could it be because the tokens generated by parsing store pointers back into the script text? In my case they pointed into my copy rather than into the original text the caller had passed in. Might there be code that looks at the original script and expects it not to move, and fails if it does?

Might there be a way to do this that is supported (or reasonable) in tcl?
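
One plausible explanation, offered as an assumption rather than a reading of the Tcl sources: a caller that loops over a script usually steps to the next command by adding commandSize to commandStart from the parse result. If those fields point into a private copy, that arithmetic no longer lands inside the caller's own buffer. The toy below illustrates the effect; ToyParse, toy_parse and toy_eval are hypothetical names invented for this sketch, not Tcl APIs, and the toy reports the mismatch instead of looping forever.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two parse-result fields that matter here. */
typedef struct {
    const char *commandStart;   /* first byte of the parsed command */
    int commandSize;            /* bytes in the command, newline included */
} ToyParse;

/* Toy parser: one command runs up to and including the next newline. */
static void toy_parse(const char *start, int numBytes, ToyParse *p)
{
    const char *nl = memchr(start, '\n', (size_t)numBytes);

    p->commandStart = start;
    p->commandSize = nl ? (int)(nl - start) + 1 : numBytes;
}

/* Toy caller: walks the script by jumping to commandStart + commandSize.
 * Returns the number of commands seen, or -1 when the parse result does
 * not refer to the caller's own buffer (the "parse a copy" case). */
static int toy_eval(const char *script, int numBytes, int parse_a_copy)
{
    char copy[128];
    const char *p = script;
    int bytesLeft = numBytes, count = 0;

    assert(numBytes >= 0 && numBytes < (int)sizeof copy);
    while (bytesLeft > 0) {
        ToyParse parse;
        const char *next;

        if (parse_a_copy) {
            memcpy(copy, p, (size_t)bytesLeft);
            toy_parse(copy, bytesLeft, &parse);
        } else {
            toy_parse(p, bytesLeft, &parse);
        }
        if (parse.commandStart != p) {
            return -1;          /* result points into other memory */
        }
        next = parse.commandStart + parse.commandSize;
        bytesLeft -= (int)(next - p);
        p = next;
        count++;
    }
    return count;
}
```

Calling toy_eval on a two-line script with parse_a_copy set to 0 counts both commands; with parse_a_copy set to 1 the very first parse result already refers to the copy, so the caller cannot safely advance.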

----------------------------------------------------
Is Tcl_ParseCommand always called twice?

What I’m seeing is that the first time through Tcl_ParseCommand the command has a newline at the end, and the second time the newline has been shaved off. I’m testing on Linux.

Does that seem right?

----------------------------------------------------

Below is the actual code I used.

For the parse dump, I have a static variable that selects onn (verbose dump), off, or ons (on with a short dump); that’s what the setting of xxx does. These are treated as invalid commands, but my wrapper sees them first and sets the static variable that way. I test using a terminal.

./tclsh

or

rlwrap -pred ./tclsh

--------------------------------------------------------------------------

int
Tcl_ParseCommand(
    Tcl_Interp *interp,         /* Interpreter to use for error reporting; if
                                 * NULL, then no error message is provided. */
    const char *start,          /* First character of string containing one or
                                 * more Tcl commands. */
    register int numBytes,      /* Total number of bytes in string. If < 0,
                                 * the script consists of all bytes up to the
                                 * first null character. */
    int nested,                 /* Non-zero means this is a nested command:
                                 * close bracket should be considered a
                                 * command terminator. If zero, then close
                                 * bracket has no special meaning. */
    register Tcl_Parse *parsePtr)
                                /* Structure to fill in with information about
                                 * the parsed command; any previous
                                 * information in the structure is ignored. */
{
    int r, n;
    char *buf;
    static int xxx = 0;         // start up with tracing off

    n = 100;                    // to limit the size of traces

    if (numBytes >= n || numBytes <= 0) {
        return Tcl_ParseCommand0(interp, start, numBytes, nested, parsePtr); // this works
    } else {
        buf = ckalloc(numBytes + 32);   // deliberately never freed in this test
        memcpy(buf, start, numBytes);   // was numBytes+2, which could read past the end of the script
        buf[numBytes] = 0;
        // use undefined commands to set dump flag
        if (buf[0] == 'o' && buf[1] == 'n' && buf[2] == 'n') { xxx = 2; }
        if (buf[0] == 'o' && buf[1] == 'f' && buf[2] == 'f') { xxx = 0; }
        if (buf[0] == 'o' && buf[1] == 'n' && buf[2] == 's') { xxx = 1; }

        //r = Tcl_ParseCommand0(interp, start, numBytes, nested, parsePtr); // this works
        r = Tcl_ParseCommand0(interp, buf, numBytes, nested, parsePtr);     // this fails

        if (numBytes > 1 && xxx > 0) {
            fprintf(stderr, "numBytes= %3d nest= %d ret= %d start= %8x <%s>\n",
                    numBytes, nested, r, (unsigned int)start, buf);
            if (xxx > 1) {
                pprint(parsePtr);       // verbose dump to stderr
            }
        }

        return r;
    }
}


stefan

Nov 21, 2018, 2:55:09 AM
> I was hoping there might be a way to “pre-process” a command and then submit a modified command for parsing and then execution and compiling. I’ve not had success with this.

I am not so sure what you are trying to achieve exactly? Why do you need a copy of the parse to become modified, why not walk the given parse and patch it (e.g., drop tokens etc.)?

In any case, you would have to make a deep copy of the parse, not just a shallow top-level memcpy. Look at, e.g., Tcl_FreeParse to see what needs to be duplicated. I am not aware of anything built in to achieve this, I am afraid.

Stefan

heinrichmartin

Nov 21, 2018, 5:42:56 AM
On Wednesday, November 21, 2018 at 8:55:09 AM UTC+1, stefan wrote:
> > I was hoping there might be a way to “pre-process” a command and then submit a modified command for parsing and then execution and compiling. I’ve not had success with this.
>
> I am not so sure what you are trying to achieve exactly? Why do you need a copy of the parse to become modified, why not walk the given parse and patch it (e.g., drop tokens etc.)?

Assuming you want to replace words in parens with expr (as discussed in the other thread).

Stefan's message in other words: Tcl_ParseCommand *is* the pre-processor you are looking for.

I'd try this way (big words from someone with little time, browsing tclParse.c only) (my reference is the source code from Tcl 8.6.9):
* line 348: insert an else if for '('
* write Tcl_ParseParens to do the job
** Grab the string like Tcl_ParseBraces, except: look for balanced parens, not braces
** fill parsePtr as if you parsed [expr {<that_string>}]
* line 577: insert the error message for "extra characters after close-parens"
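
The "look for balanced parens" step can be sketched on its own, outside tclParse.c. This is only an illustration under stated assumptions: find_close_paren is a hypothetical helper, not a Tcl function, and it treats a backslash as hiding the next byte, which may or may not match the rules a real Tcl_ParseParens would want.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Tcl.  src[0] must be '('.  Returns the
 * index of the matching ')' or -1 if the parens do not balance within
 * numBytes bytes.  A backslash hides the byte that follows it. */
static int find_close_paren(const char *src, int numBytes)
{
    int depth = 0;
    int i;

    assert(src != NULL && numBytes > 0 && src[0] == '(');
    for (i = 0; i < numBytes; i++) {
        switch (src[i]) {
        case '\\':
            i++;                /* skip the escaped byte */
            break;
        case '(':
            depth++;
            break;
        case ')':
            if (--depth == 0) {
                return i;       /* matching close paren found */
            }
            break;
        default:
            break;
        }
    }
    return -1;                  /* ran out of input while still nested */
}
```

For the word (1+2) the function returns 4, so the inner text would be the 3 bytes starting at src + 1; a return of -1 corresponds to the "missing close-paren" case.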

HTH
Martin

two...@gmail.com

Nov 21, 2018, 8:06:57 AM
On Wednesday, November 21, 2018 at 2:42:56 AM UTC-8, heinrichmartin wrote:
>
> Assuming you want to replace words in parens with expr (as discussed in the other thread).
>

Yes, I've been thinking about that.

So, just to refresh (if stefan is interested), a pair of ( )'s would be like a pair of { } but automatically handed to [expr] similar to how [if] does it.

Unlike for [if] this would happen for any command argument that began with '(' but not if otherwise embedded.

>
> I'd try this way (big words from someone with little time, browsing tclParse.c only) (my reference is the source code from Tcl 8.6.9):
> * line 348: insert an else if for '('

Ah, interesting. I tried modifying the previous test for both ( and { but that didn’t fly. I have time (retired), but get frustrated easily :)

> * write Tcl_ParseParens to do the job
> ** Grab the string like Tcl_ParseBraces, except: look for balanced parens, not braces

Hmmm. So say I clone Tcl_ParseBraces -> Tcl_ParseParens and swap any literal {} for (); do you think that would have a good chance of working? I see other places that have literal {}'s. Are they only in places where we wouldn't begin with a '('? Can I ignore those?

What about that table that describes char types? I see some references to TYPE_CLOSE_BRACK; might I not also need to add one for TYPE_CLOSE_PAREN at the line where it does,

terminators = TYPE_COMMAND_END | TYPE_CLOSE_BRACK;
(line 276 in my source)

> ** fill parsePtr as if you parsed [expr {<that_string>}]

Would that be just building 4 tokens as follows for say, (1+2)

set var (1+2)

would be like,

set var [expr {1+2}]

I traced,

set var [expr {1+2}]

and I see this; is that what I would need to do?

------------------
SIMPLE_WORD size = 4 numCom = 1 start = 009d15f1 <expr>
TEXT size = 4 numCom = 0 start = 009d15f1 <expr>
SIMPLE_WORD size = 5 numCom = 1 start = 009d15f6 <{1+2}>
TEXT size = 3 numCom = 0 start = 009d15f7 <1+2>
------------------

less the outer <...>

How would I get that 3rd one, which has the {}'s, since I'm in ()'s? Or would I just have the token contain the ()'s? I'm thinking that would mess something else up a bit downstream.

Also, what about the text "expr"?

Can tokens point to text other than what's in the original script? When I couldn't copy the string in Tcl_ParseCommand and have it work, I figured I can't point to just any text, such as static text strings.



> * line 577: insert the error message for "extra characters after close-parens"
>
Ok, I see that. So that would be an else-if just ahead, testing src[-1] for ')'.





two...@gmail.com

Nov 21, 2018, 1:17:03 PM
On Wednesday, November 21, 2018 at 5:06:57 AM UTC-8, two...@gmail.com wrote:

> Hmmm. So say I clone Tcl_ParseBraces -> Tcl_ParseParens , and swap any literals of {} for ()

To answer my own question, this does appear to work. Seemed too easy - waiting for some gotcha.

So, half way home. But I'm sure I would need help adding the tokens.

I also wish I could figure out how to format code and fixed font in this group postings, so I could show the output of my parser trace. It messes up all the work I did creating aligned columns (so my poor old eyes won't hurt reading it).

Alternatively, I also tried to create a new wiki page, and it errored out with some sort of failure getting a lock. There's a page for expr, but it's really long and I didn't want to put this stuff at the end. I wanted to post my parse result and token list tracer in case anyone else was interested.


And I still don't understand why it seems to parse the same thing multiple times. I hesitate to put my trace here, as it formats so badly.


heinrichmartin

Nov 21, 2018, 1:45:53 PM
On Wednesday, November 21, 2018 at 2:06:57 PM UTC+1, two...@gmail.com wrote:
> Would that be just building 4 tokens as follows for say, (1+2)
>
> set var (1+2)
>
> would be like,
>
> set var [expr {1+2}]
>
> I traced,
>
> set var [expr {1+2}]
>
> and I see this; is that what I would need to do?
>
> ------------------
> SIMPLE_WORD size = 4 numCom = 1 start = 009d15f1 <expr>
> TEXT size = 4 numCom = 0 start = 009d15f1 <expr>
> SIMPLE_WORD size = 5 numCom = 1 start = 009d15f6 <{1+2}>
> TEXT size = 3 numCom = 0 start = 009d15f7 <1+2>
> ------------------

It would be interesting to get the full tree, including set with nested [expr].

> How would I get that 3rd one, which has the {}'s since I'm in ()'s or would I just have the token contain the ()'s. I'm thinking that would mess something else up a bit downstream.

I'd try the 3rd pointing to the inner text only, like the 4th.

> Also, what about the text "expr" ?
>
> Can tokens point to text other than what's in the original script? When I couldn't copy the string in Tcl_ParseCommand and have it work, I figured I can't point to just any text, such as static text strings.

That's bad luck - this would have been my idea, i.e. constant struct Tcl_Token (one SIMPLE_WORD, one TEXT).

http://tcl.tk/man/tcl8.6/TclLib/ParseCmd.htm#M6

Great you are half way through :-)

Rich

Nov 21, 2018, 2:01:49 PM
two...@gmail.com wrote:
> I also wish I could figure out how to format code and fixed font in
> this group postings, so I could show the output of my parser trace.
> It messes up all the work I did creating aligned columns (so my poor
> old eyes won't hurt reading it).

How to do this involves:

1) Stop using google groups (it is simply the most awful UI for
accessing Usenet news ever created).

2) Instead, use a real usenet newsreader
(https://en.wikipedia.org/wiki/List_of_Usenet_newsreaders). A
text based client is all but guaranteed to use a fixed width font
(unless you go way out of your way to use a proportional font in
your terminal windows). Some of the graphical ones 'might' be able
to be configured for a fixed width font, but YMMV there.

3) Use Eternal September (https://www.eternal-september.org/) or AIOE
(https://news.aioe.org/) (or another paid Usenet provider if you
already have one or prefer a paid one)

4) Hard wrap your articles with newlines before you post them.

Then you can do aligned columns in usenet postings just fine
(preferably keep to about 72 characters wide per line, but even that
isn't a hard rule, just good netiquette).

two...@gmail.com

Nov 21, 2018, 3:06:33 PM
On Wednesday, November 21, 2018 at 11:01:49 AM UTC-8, Rich wrote:

> 1) Stop using google groups (it is simply the most awful UI for
> accessing Usenet news ever created).
>

I will look into that, but does this mean that others see the correct
formatting, while only I see it messed up using google groups?

If others see it misformatted, it does seem that one can copy/paste into
an editor, so below are a few traces. Sorry, but some lines are longer than
72 columns.

>It would be interesting to get the full tree, including set with nested [expr].

see below


First, this isn't working quite right


% set var (1+2)xxx
extra characters after close-brace
%

I'm not getting this error reported correctly, though I don't know why.


    if (src[-1] == '"') {
        if (interp != NULL) {
            Tcl_SetObjResult(interp, Tcl_NewStringObj(
                    "extra characters after close-quote", -1));
        }
        parsePtr->errorType = TCL_PARSE_QUOTE_EXTRA;
    } else if (src[-1] == '(') {
        if (interp != NULL) {
            Tcl_SetObjResult(interp, Tcl_NewStringObj(
                    "extra characters after close-paren", -1));
        }
        parsePtr->errorType = TCL_PARSE_QUOTE_EXTRA;
    } else {
        if (interp != NULL) {
            Tcl_SetObjResult(interp, Tcl_NewStringObj(
                    "extra characters after close-brace", -1));
        }
        parsePtr->errorType = TCL_PARSE_BRACE_EXTRA;
    }
    parsePtr->term = src;
    goto error;


==================================
Here are two parse traces, copy/paste to editor if formatting is off. Here goes....


(I suppress full trace when the interp arg is null.)


%
% set var [expr {1+2}]
numBytes= 12 nested= 1 ret= 0 start= 7fe331 <expr {1+2}]
>
--- suppress p-print, interp= 0 strlen= 12 <expr {1+2}]>

numBytes= 21 nested= 0 ret= 0 start= 7fe328 <set var [expr {1+2}]
>
--- suppress p-print, interp= 0 strlen= 21 <set var [expr {1+2}]
>

numBytes= 11 nested= 1 ret= 0 start= 7fe331 <expr {1+2}]>

------------------ parse block interp= 7927b8 parse= 7996e0
commandStart = e commandSize = 11
tokenPtr = 799718 staticTokens = 799718
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33c <>
string = 7fe331 <expr {1+2}]>
term = 5d <]>
------------------
-- tk @ 799718 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe331 <expr>
-- tk @ 799728 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe331 <expr>
-- tk @ 799738 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe336 <{1+2}>
-- tk @ 799748 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe337 <1+2>
------------------
numBytes= 20 nested= 0 ret= 0 start= 7fe328 <set var [expr {1+2}]>

------------------ parse block interp= 7927b8 parse= bfd3393c
commandStart = s commandSize = 20
tokenPtr = bfd33974 staticTokens = bfd33974
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33c <>
string = 7fe328 <set var [expr {1+2}]>
term = 00 <>
------------------
-- tk @ bfd33974 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe328 <set>
-- tk @ bfd33984 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 007fe328 <set>
-- tk @ bfd33994 ( 2) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe32c <var>
-- tk @ bfd339a4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe32c <var>
-- tk @ bfd339b4 ( 4) type= 1 WORD size= 12 numCom= 1 start= 007fe330 <[expr {1+2}]>
-- tk @ bfd339c4 ( 5) type= 16 COMMAND size= 12 numCom= 0 start= 007fe330 <[expr {1+2}]>
------------------
numBytes= 10 nested= 0 ret= 0 start= 7fe331 <expr {1+2}>

------------------ parse block interp= 7927b8 parse= bfd3347c
commandStart = e commandSize = 10
tokenPtr = bfd334b4 staticTokens = bfd334b4
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33b <]>
string = 7fe331 <expr {1+2}>
term = 5d <]>
------------------
-- tk @ bfd334b4 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe331 <expr>
-- tk @ bfd334c4 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe331 <expr>
-- tk @ bfd334d4 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe336 <{1+2}>
-- tk @ bfd334e4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe337 <1+2>
------------------
numBytes= 11 nested= 1 ret= 0 start= 7fe331 <expr {1+2}]>

------------------ parse block interp= 7927b8 parse= 7996e0
commandStart = e commandSize = 11
tokenPtr = 799718 staticTokens = 799718
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33c <>
string = 7fe331 <expr {1+2}]>
term = 5d <]>
------------------
-- tk @ 799718 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe331 <expr>
-- tk @ 799728 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe331 <expr>
-- tk @ 799738 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe336 <{1+2}>
-- tk @ 799748 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe337 <1+2>
------------------
numBytes= 20 nested= 0 ret= 0 start= 7fe328 <set var [expr {1+2}]>

------------------ parse block interp= 7927b8 parse= bfd3393c
commandStart = s commandSize = 20
tokenPtr = bfd33974 staticTokens = bfd33974
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33c <>
string = 7fe328 <set var [expr {1+2}]>
term = 00 <>
------------------
-- tk @ bfd33974 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe328 <set>
-- tk @ bfd33984 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 007fe328 <set>
-- tk @ bfd33994 ( 2) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe32c <var>
-- tk @ bfd339a4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe32c <var>
-- tk @ bfd339b4 ( 4) type= 1 WORD size= 12 numCom= 1 start= 007fe330 <[expr {1+2}]>
-- tk @ bfd339c4 ( 5) type= 16 COMMAND size= 12 numCom= 0 start= 007fe330 <[expr {1+2}]>
------------------
numBytes= 10 nested= 0 ret= 0 start= 7fe331 <expr {1+2}>

------------------ parse block interp= 7927b8 parse= bfd3347c
commandStart = e commandSize = 10
tokenPtr = bfd334b4 staticTokens = bfd334b4
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe33b <]>
string = 7fe331 <expr {1+2}>
term = 5d <]>
------------------
-- tk @ bfd334b4 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe331 <expr>
-- tk @ bfd334c4 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe331 <expr>
-- tk @ bfd334d4 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe336 <{1+2}>
-- tk @ bfd334e4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe337 <1+2>
------------------
3

=================================

Here's a trace, after implementing ()'s as suggested, but no tokens yet.
So, at this point, () work just like {} - with minimal testing only though


% set a [list (1+2) (3+4)]
numBytes= 18 nested= 1 ret= 0 start= 7fe42f <list (1+2) (3+4)]
>
--- suppress p-print, interp= 0 strlen= 18 <list (1+2) (3+4)]>

numBytes= 25 nested= 0 ret= 0 start= 7fe428 <set a [list (1+2) (3+4)]
>
--- suppress p-print, interp= 0 strlen= 25 <set a [list (1+2) (3+4)]
>

numBytes= 17 nested= 1 ret= 0 start= 7fe42f <list (1+2) (3+4)]>

------------------ parse block interp= 7927b8 parse= 7996e0
commandStart = l commandSize = 17
tokenPtr = 799718 staticTokens = 799718
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe440 <>
string = 7fe42f <list (1+2) (3+4)]>
term = 5d <]>
------------------
-- tk @ 799718 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe42f <list>
-- tk @ 799728 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe42f <list>
-- tk @ 799738 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe434 <(1+2)>
-- tk @ 799748 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe435 <1+2>
-- tk @ 799758 ( 4) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe43a <(3+4)>
-- tk @ 799768 ( 5) type= 4 TEXT size= 3 numCom= 0 start= 007fe43b <3+4>
------------------
numBytes= 24 nested= 0 ret= 0 start= 7fe428 <set a [list (1+2) (3+4)]>

------------------ parse block interp= 7927b8 parse= bfd3393c
commandStart = s commandSize = 24
tokenPtr = bfd33974 staticTokens = bfd33974
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe440 <>
string = 7fe428 <set a [list (1+2) (3+4)]>
term = 00 <>
------------------
-- tk @ bfd33974 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe428 <set>
-- tk @ bfd33984 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 007fe428 <set>
-- tk @ bfd33994 ( 2) type= 2 SIMPLE_WORD size= 1 numCom= 1 start= 007fe42c <a>
-- tk @ bfd339a4 ( 3) type= 4 TEXT size= 1 numCom= 0 start= 007fe42c <a>
-- tk @ bfd339b4 ( 4) type= 1 WORD size= 18 numCom= 1 start= 007fe42e <[list (1+2) (3+4)]>
-- tk @ bfd339c4 ( 5) type= 16 COMMAND size= 18 numCom= 0 start= 007fe42e <[list (1+2) (3+4)]>
------------------
numBytes= 16 nested= 0 ret= 0 start= 7fe42f <list (1+2) (3+4)>

------------------ parse block interp= 7927b8 parse= bfd3347c
commandStart = l commandSize = 16
tokenPtr = bfd334b4 staticTokens = bfd334b4
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe43f <]>
string = 7fe42f <list (1+2) (3+4)>
term = 5d <]>
------------------
-- tk @ bfd334b4 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe42f <list>
-- tk @ bfd334c4 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe42f <list>
-- tk @ bfd334d4 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe434 <(1+2)>
-- tk @ bfd334e4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe435 <1+2>
-- tk @ bfd334f4 ( 4) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe43a <(3+4)>
-- tk @ bfd33504 ( 5) type= 4 TEXT size= 3 numCom= 0 start= 007fe43b <3+4>
------------------
numBytes= 17 nested= 1 ret= 0 start= 7fe42f <list (1+2) (3+4)]>

------------------ parse block interp= 7927b8 parse= 7996e0
commandStart = l commandSize = 17
tokenPtr = 799718 staticTokens = 799718
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe440 <>
string = 7fe42f <list (1+2) (3+4)]>
term = 5d <]>
------------------
-- tk @ 799718 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe42f <list>
-- tk @ 799728 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe42f <list>
-- tk @ 799738 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe434 <(1+2)>
-- tk @ 799748 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe435 <1+2>
-- tk @ 799758 ( 4) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe43a <(3+4)>
-- tk @ 799768 ( 5) type= 4 TEXT size= 3 numCom= 0 start= 007fe43b <3+4>
------------------
numBytes= 24 nested= 0 ret= 0 start= 7fe428 <set a [list (1+2) (3+4)]>

------------------ parse block interp= 7927b8 parse= bfd3393c
commandStart = s commandSize = 24
tokenPtr = bfd33974 staticTokens = bfd33974
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe440 <>
string = 7fe428 <set a [list (1+2) (3+4)]>
term = 00 <>
------------------
-- tk @ bfd33974 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 007fe428 <set>
-- tk @ bfd33984 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 007fe428 <set>
-- tk @ bfd33994 ( 2) type= 2 SIMPLE_WORD size= 1 numCom= 1 start= 007fe42c <a>
-- tk @ bfd339a4 ( 3) type= 4 TEXT size= 1 numCom= 0 start= 007fe42c <a>
-- tk @ bfd339b4 ( 4) type= 1 WORD size= 18 numCom= 1 start= 007fe42e <[list (1+2) (3+4)]>
-- tk @ bfd339c4 ( 5) type= 16 COMMAND size= 18 numCom= 0 start= 007fe42e <[list (1+2) (3+4)]>
------------------
numBytes= 16 nested= 0 ret= 0 start= 7fe42f <list (1+2) (3+4)>

------------------ parse block interp= 7927b8 parse= bfd3347c
commandStart = l commandSize = 16
tokenPtr = bfd334b4 staticTokens = bfd334b4
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007927b8 incomplete = 0
end = 7fe43f <]>
string = 7fe42f <list (1+2) (3+4)>
term = 5d <]>
------------------
-- tk @ bfd334b4 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 007fe42f <list>
-- tk @ bfd334c4 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 007fe42f <list>
-- tk @ bfd334d4 ( 2) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe434 <(1+2)>
-- tk @ bfd334e4 ( 3) type= 4 TEXT size= 3 numCom= 0 start= 007fe435 <1+2>
-- tk @ bfd334f4 ( 4) type= 2 SIMPLE_WORD size= 5 numCom= 1 start= 007fe43a <(3+4)>
-- tk @ bfd33504 ( 5) type= 4 TEXT size= 3 numCom= 0 start= 007fe43b <3+4>
------------------
1+2 3+4
%





Rich

Nov 21, 2018, 3:20:17 PM
two...@gmail.com wrote:
> On Wednesday, November 21, 2018 at 11:01:49 AM UTC-8, Rich wrote:
>
>> 1) Stop using google groups (it is simply the most awful UI for
>> accessing Usenet news ever created).
>>
>
> I will look into that, but does this mean that others see the correct
> formatting, while only I see it messed up using google groups?

From the looks of your posting below, that answer appears to be yes
(provided that we 'others' are using a fixed width font).

But, GG has had a history of reformatting things (this often happens
for quoting, where they just can't seem to get it right), so there's
little guarantee that GG might continue to work, even if it seems to
work now.

> If others see it misformatted, it does seem that one can copy/paste
> into an editor, so below are a few traces. Sorry, but some lines are
> longer than 72 columns.
>
>>It would be interesting to get the full tree, including set with
>> nested [expr].
>
> see below
[long list of a lot of 'aligned' items deleted]

All of that looked to be aligned to me. Granted I don't know what the
original alignment was supposed to be, but things seemed to be neatly
aligned into columns that look like they might be intentional.

heinrichmartin

Nov 21, 2018, 3:47:12 PM
On Wednesday, November 21, 2018 at 9:06:33 PM UTC+1, two...@gmail.com wrote:
> % set var (1+2)xxx
> extra characters after close-brace
> %

> } else if (src[-1] == '(') {

Note that src points behind the parsed word by then - you are looking for ')'.
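
A small standalone sketch of that point, assuming only what is said above: once a word has been scanned, src sits one byte past it, so src[-1] is the closing delimiter. ExtraKind and classify_extra are hypothetical names for this illustration and do not exist in tclParse.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three "extra characters after ..." cases. */
typedef enum {
    EXTRA_AFTER_QUOTE,
    EXTRA_AFTER_PAREN,
    EXTRA_AFTER_BRACE
} ExtraKind;

/* src points at the first byte AFTER the parsed word, so src[-1] is the
 * character that closed the word: '"', ')' or (by default) '}'. */
static ExtraKind classify_extra(const char *src)
{
    assert(src != NULL);
    switch (src[-1]) {
    case '"':
        return EXTRA_AFTER_QUOTE;
    case ')':                   /* the close paren, not the open one */
        return EXTRA_AFTER_PAREN;
    default:
        return EXTRA_AFTER_BRACE;
    }
}
```

With the input (1+2)xxx the scan pointer ends up at offset 5, on the first x, and the byte before it is ')', which is why a test for '(' never matches and the close-brace branch is taken instead.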

stefan

Nov 21, 2018, 4:59:14 PM
> > Assuming you want to replace words in parens with expr (as discussed in the other thread).
> >
>
> Yes, I've been thinking about that.
>
> So, just to refresh (if stefan is interested), a pair of ( )'s would be like a pair of { } but automatically handed to [expr] similar to how [if] does it.

Did you have a look at "sugar" and its exemplary syntax macro in front of [expr]?

https://wiki.tcl-lang.org/page/Sugar+syntax+macros

Stefan

two...@gmail.com

Nov 21, 2018, 5:16:08 PM
On Wednesday, November 21, 2018 at 12:47:12 PM UTC-8, heinrichmartin wrote:

> > } else if (src[-1] == '(') {
>
> Note that src points behind the parsed word by then - you are looking for ')'.

Good catch!

I also discovered that the '(' open paren is classified as TYPE_NORMAL in tclCharTypeTable, and since I don't really understand how other code uses that shared table, I had to kind of kludge it with the following literal test for '(':

    while (1) {
        while (++src, --numBytes) {

            if (CHAR_TYPE(*src) != TYPE_NORMAL || (*src == '(')) {
                break;
            }
        }


I'm pretty sure that's a no-no, but without it, the scan wasn't handling nested ()'s properly.

two...@gmail.com

Nov 21, 2018, 5:23:41 PM
Well, blow me down, someone already did it! At least this is confirmation that what I desire would be useful.

I don't quite (yet) understand how that works, although it likely wouldn't be as fast as doing it in the parser. But I think I'll delay further development until I understand what that is doing.

Thanks for the link.

heinrichmartin

Nov 21, 2018, 7:16:08 PM
On Wednesday, November 21, 2018 at 9:06:33 PM UTC+1, two...@gmail.com wrote:
> Here are two parse traces, copy/paste to editor if formatting is off. Here goes....

Ok, it seems like there is no single file to implement it, because [expr {TEXT}] is longer than (TEXT).

The proper solution would be to define another type of token, I guess.
For now (and this is probably the end of my effort, too), I implemented a version that is leaking memory: It allocates a Tcl_Obj for the fake code [expr {TEXT}] that is never released.

https://pastebin.de/10960/

The second best implementation could be: xcheck where token->start is pointing and release it if necessary.
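
The length mismatch described above, where [expr {TEXT}] is longer than (TEXT), is why the replacement text needs its own storage. The sketch below is not the pastebin code; paren_word_to_expr is a hypothetical helper that uses plain malloc where a Tcl build would presumably use its own allocator, and it leaves ownership of the buffer to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the code from the pastebin link.  word points at
 * '(' and wordSize counts both parens.  Returns a malloc'd, NUL-terminated
 * "[expr {TEXT}]" string and stores its length in *outSize; the caller owns
 * the buffer and must free it.  Returns NULL if allocation fails. */
static char *paren_word_to_expr(const char *word, int wordSize, int *outSize)
{
    static const char prefix[] = "[expr {";     /* 7 bytes */
    static const char suffix[] = "}]";          /* 2 bytes */
    int innerSize, size;
    char *buf;

    assert(word != NULL && outSize != NULL);
    assert(wordSize >= 2 && word[0] == '(' && word[wordSize - 1] == ')');

    innerSize = wordSize - 2;                   /* TEXT without the parens */
    size = 7 + innerSize + 2;
    buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, prefix, 7);
    memcpy(buf + 7, word + 1, (size_t)innerSize);
    memcpy(buf + 7 + innerSize, suffix, 3);     /* copies the trailing NUL */
    *outSize = size;
    return buf;
}
```

For the 9-byte word ($x + $y) this yields the 16-byte string [expr {$x + $y}]. Any token that points into that buffer stays valid only as long as the buffer does, which is where the leak-versus-release question comes from.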

two...@gmail.com

Nov 21, 2018, 8:08:09 PM
On Wednesday, November 21, 2018 at 4:16:08 PM UTC-8, heinrichmartin wrote:
> On Wednesday, November 21, 2018 at 9:06:33 PM UTC+1, two...@gmail.com wrote:
> > Here are two parse traces, copy/paste to editor if formatting is off. Here goes....
>
> Ok, it seems like there is no single file to implement it, because [expr {TEXT}] is longer than (TEXT).
>
>
> https://pastebin.de/10960/

Thanks!

>
> The second best implementation could be: xcheck where token->start is pointing and release it if necessary.

xcheck? I've not heard of that.

Wow, 2 implementations in 1 day :)

If your code only allocates memory once, and not later after it is compiled, then it would seem to be ok. I don't know enough to really comment on that.

I will definitely check out your code. I've also been looking at sugar, and I have to say, it is SWEET!

I'm reminded of a quote I think came from Brian Kernighan:

"I'd rather write programs that write programs, than write programs".

The downside of sugar would seem to be complexity. I don't know if I'd want to be maintaining self modifying code, though I'm guilty as charged for doing it.

Here's my test, using the sugar::syntaxmacro sugarmath,


sugar::proc mytest arg {
set a $arg
set b ($a + 10)
return $b
}

and then,

() 6 % mytest 10
20
() 7 % lp mytest ;# my little proc dumper
---------------------
proc mytest {arg} {
set a $arg
set b [expr {$a + 10}]
return $b
}


So, perhaps a downside is that introspection sees only the expanded macro. But still, I love it.

I'll play some more with your code

Thanks all!

two...@gmail.com

Nov 21, 2018, 10:32:05 PM
On Wednesday, November 21, 2018 at 4:16:08 PM UTC-8, heinrichmartin wrote:

>I implemented a version that is leaking memory: It allocates a Tcl_Obj for the fake code [expr {TEXT}] that is never released.


I like your implementation. I would never have figured that out. Below is my trace and you can see the address of the generated text changes (still don't know why it runs the parser 4 times - maybe because I'm typing at a console)

Anyway, thanks for the stimulating collaboration.

% set x 10
10
% set y 20
20
% set z ($x + $y)
30
%

<turn on trace>

% set z ($x + $y)
numBytes= 16 nested= 0 ret= 0 start= 828aa0 <set z ($x + $y)
>
--- suppress p-print, interp= 0 strlen= 16 <set z ($x + $y)
>

numBytes= 15 nested= 0 ret= 0 start= 828aa0 <set z ($x + $y)>

------------------ parse block interp= 7bd7b8 parse= bfd87bac
commandStart = s commandSize = 15
tokenPtr = bfd87be4 staticTokens = bfd87be4
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007bd7b8 incomplete = 0
end = 828aaf <>
string = 828aa0 <set z ($x + $y)>
term = 00 <>
------------------
-- tk @ bfd87be4 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 00828aa0 <set>
-- tk @ bfd87bf4 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 00828aa0 <set>
-- tk @ bfd87c04 ( 2) type= 2 SIMPLE_WORD size= 1 numCom= 1 start= 00828aa4 <z>
-- tk @ bfd87c14 ( 3) type= 4 TEXT size= 1 numCom= 0 start= 00828aa4 <z>
-- tk @ bfd87c24 ( 4) type= 1 WORD size= 9 numCom= 1 start= 00828aa6 <($x + $y)>
-- tk @ bfd87c34 ( 5) type= 16 COMMAND size= 16 numCom= 0 start= 0082cd70 <[expr {$x + $y}]>
------------------

numBytes= 14 nested= 0 ret= 0 start= 82cd71 <expr {$x + $y}>

------------------ parse block interp= 7bd7b8 parse= bfd876ec
commandStart = e commandSize = 14
tokenPtr = bfd87724 staticTokens = bfd87724
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007bd7b8 incomplete = 0
end = 82cd7f <]>
string = 82cd71 <expr {$x + $y}>
term = 5d <]>
------------------
-- tk @ bfd87724 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 0082cd71 <expr>
-- tk @ bfd87734 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 0082cd71 <expr>
-- tk @ bfd87744 ( 2) type= 2 SIMPLE_WORD size= 9 numCom= 1 start= 0082cd76 <{$x + $y}>
-- tk @ bfd87754 ( 3) type= 4 TEXT size= 7 numCom= 0 start= 0082cd77 <$x + $y>
------------------

numBytes= 15 nested= 0 ret= 0 start= 828aa0 <set z ($x + $y)>

------------------ parse block interp= 7bd7b8 parse= bfd87bac
commandStart = s commandSize = 15
tokenPtr = bfd87be4 staticTokens = bfd87be4
numWords = 3 numTokens = 6
tokensAvailable = 20 errorType = 0
interp = 007bd7b8 incomplete = 0
end = 828aaf <>
string = 828aa0 <set z ($x + $y)>
term = 00 <>
------------------
-- tk @ bfd87be4 ( 0) type= 2 SIMPLE_WORD size= 3 numCom= 1 start= 00828aa0 <set>
-- tk @ bfd87bf4 ( 1) type= 4 TEXT size= 3 numCom= 0 start= 00828aa0 <set>
-- tk @ bfd87c04 ( 2) type= 2 SIMPLE_WORD size= 1 numCom= 1 start= 00828aa4 <z>
-- tk @ bfd87c14 ( 3) type= 4 TEXT size= 1 numCom= 0 start= 00828aa4 <z>
-- tk @ bfd87c24 ( 4) type= 1 WORD size= 9 numCom= 1 start= 00828aa6 <($x + $y)>
-- tk @ bfd87c34 ( 5) type= 16 COMMAND size= 16 numCom= 0 start= 008288e0 <[expr {$x + $y}]>
------------------

numBytes= 14 nested= 0 ret= 0 start= 8288e1 <expr {$x + $y}>

------------------ parse block interp= 7bd7b8 parse= bfd876ec
commandStart = e commandSize = 14
tokenPtr = bfd87724 staticTokens = bfd87724
numWords = 2 numTokens = 4
tokensAvailable = 20 errorType = 0
interp = 007bd7b8 incomplete = 0
end = 8288ef <]>
string = 8288e1 <expr {$x + $y}>
term = 5d <]>
------------------
-- tk @ bfd87724 ( 0) type= 2 SIMPLE_WORD size= 4 numCom= 1 start= 008288e1 <expr>
-- tk @ bfd87734 ( 1) type= 4 TEXT size= 4 numCom= 0 start= 008288e1 <expr>
-- tk @ bfd87744 ( 2) type= 2 SIMPLE_WORD size= 9 numCom= 1 start= 008288e6 <{$x + $y}>
-- tk @ bfd87754 ( 3) type= 4 TEXT size= 7 numCom= 0 start= 008288e7 <$x + $y>
------------------

30
%


heinrichmartin

Nov 23, 2018, 4:25:55 AM
On Thursday, November 22, 2018 at 2:08:09 AM UTC+1, two...@gmail.com wrote:
> > The second best implementation could be: xcheck where token->start is pointing and release it if necessary.
>
> xcheck? I've not heard of that.

Just saved a few chars of "crosscheck". And it's wrong in two ways:
1. Should be plain "check"; no crosscheck involved.
2. This would be a heuristic and this shouldn't be the way to go.

Instead, track in the Tcl_Token whether it owns the memory to which ->start points. But this requires rebuilding all libs involved.
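To make the ownership idea concrete, here is a minimal sketch in plain C. It is only an illustration under stated assumptions: the first four fields mirror Tcl_Token, but `OwnedToken`, the `ownsStart` flag, and the three helper functions are hypothetical names and do not exist in Tcl.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical token layout. type/start/size/numComponents mirror
 * Tcl_Token; ownsStart is the illustrative addition and is NOT part
 * of the real Tcl headers.
 */
typedef struct OwnedToken {
    int type;
    const char *start;
    int size;
    int numComponents;
    int ownsStart;      /* 1 if start was allocated for this token */
} OwnedToken;

/* Drop whatever the token points at, freeing it only if we own it. */
void owned_token_release(OwnedToken *tok)
{
    if (tok->ownsStart) {
        free((void *) tok->start);
    }
    tok->start = NULL;
    tok->size = 0;
    tok->ownsStart = 0;
}

/* Point the token at text owned by someone else (the original script). */
void owned_token_set_borrowed(OwnedToken *tok, const char *text, int size)
{
    owned_token_release(tok);
    tok->start = text;
    tok->size = size;
}

/* Point the token at a private copy, e.g. generated "[expr {...}]" text. */
int owned_token_set_copy(OwnedToken *tok, const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);

    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, len + 1);
    owned_token_release(tok);
    tok->start = copy;
    tok->size = (int) len;
    tok->ownsStart = 1;
    return 0;
}
```

With a flag like this, whoever frees the parse result could release generated text without guessing where ->start points. The catch remains the one noted above: every library that reads the struct would have to be rebuilt against the new layout.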

> If your code only allocates memory once, and not later after it is compiled, then it would seem to be ok.

Nope, one Tcl_Obj per parse :-(

> still don't know why it runs the parser 4 times - maybe because I'm typing at a console

One aspect of this could be that the bracket expressions are recursively parsed to find the words, but that info is dropped. It is therefore parsed again when it is eval'd or compiled.

I was quite surprised finding this in the code - and I still wonder whether this is right (in many ways).

set x [expr {1+2}] would tokenize the [expr {1+2}] in the first pass *just to match the correct close-bracket*. The caller must initiate a second pass on that TCL_TOKEN_COMMAND to receive the information.

two...@gmail.com

Nov 23, 2018, 12:04:47 PM
On Friday, November 23, 2018 at 1:25:55 AM UTC-8, heinrichmartin wrote:
>
> Instead, track in the Tcl_Token whether it owns the memory to which ->start points. But this requires rebuilding all libs involved.
>
>
> Nope, one Tcl_Obj per parse :-(
>

Further testing reveals a problem with lines that trigger errors,
which is likely something to do with how the error reporting tries
to spit out the line in error

I added a hack to turn on the () parsing, by setting a file
wide static paren_grouping, initially 0 and checking thus:

} else if (*src == '(' && paren_grouping) {


If I enter on() then my wrapper around Tcl_ParseCommand will set
the variable to 1 so as to turn on the feature

I also ran the code (with paren_grouping inited to 1) through

make test

and you'd be amazed how many times that set of test code starts
words with (, such as many of the regex tests.



Here's the crash,



[988]$ rlwrap -pred ./tclsh
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
can't read "b((xxxx)": no such variable
% on()
invalid command name "on()"
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
rlwrap: warning: tclsh crashed, killed by SIGSEGV.
rlwrap itself has not crashed, but for transparency,
it will now kill itself (without dumping core) with the same signal


warnings can be silenced by the --no-warnings (-n) option
Segmentation fault






Here's the same thing but wrapping in catch, and it didn't crash,
and even doesn't always crash on the faulty command


[988]$ rlwrap -pred ./tclsh
% on()
invalid command name "on()"
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% set err
-code 1 -level 0 -errorstack {INNER loadArrayStk} -errorcode {TCL LOOKUP VARNAME b} -errorinfo {can't read "b((xxxx)": no such variable
while executing
"expr { (1+2) * (3+4) * $b(\(xxxx) }"} -errorline 4
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% set err
-code 1 -level 0 -errorstack {INNER loadArrayStk} -errorcode {TCL LOOKUP VARNAME b} -errorinfo {can't read "b((xxxx)": no such variable
while executing
"expr { (1+2) * (3+4) * $b(\(xxxx) }"} -errorline 4
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% catch {set a ( (1+2) * (3+4) * $b(\(xxxx) ) } } err
1
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
can't read "b((xxxx)": no such variable
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
rlwrap: warning: tclsh crashed, killed by SIGSEGV.
rlwrap itself has not crashed, but for transparency,
it will now kill itself (without dumping core) with the same signal


warnings can be silenced by the --no-warnings (-n) option
Segmentation fault




Here's the crash run with gdb


[988]$ gdb ./tclsh
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
<...snip...>
Reading symbols from ./tclsh...done.
(gdb) r
Starting program: /home/.../Desktop/tcl8.6.8/unix/test/tclsh
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
% on()
invalid command name "on()"
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }

Program received signal SIGSEGV, Segmentation fault.
TclLogCommandInfo (interp=0x40d7b8, script=0x4793a8 "set a ( (1+2) * (3+4) * $b(\\(xxxx) )", ' ' <repeats 17 times>, "}",
command=0x479329 "expr { (1+2) * (3+4) * $b(\\(xxxx) }]", length=43, pc=0x42e377 "\017\067\001\005\006\004", tosPtr=0x41472c)
at /home/et/Desktop/tcl8.6.8/generic/tclNamesp.c:4905
4905 if (*p == '\n') {

(gdb) bt
#0 TclLogCommandInfo (interp=0x40d7b8, script=0x4793a8 "set a ( (1+2) * (3+4) * $b(\\(xxxx) )", ' ' <repeats 17 times>, "}",
command=0x479329 "expr { (1+2) * (3+4) * $b(\\(xxxx) }]", length=43, pc=0x42e377 "\017\067\001\005\006\004", tosPtr=0x41472c)
at /home/et/Desktop/tcl8.6.8/generic/tclNamesp.c:4905
#1 0xb7ed4007 in TEBCresume (data=0x44413c, interp=0x40d7b8, result=1) at /home/et/Desktop/tcl8.6.8/generic/tclExecute.c:8021
#2 0xb7e1c303 in TclNRRunCallbacks (interp=0x40d7b8, result=0, rootPtr=0x0) at /home/et/Desktop/tcl8.6.8/generic/tclBasic.c:4435
#3 0xb7e1e679 in TclEvalObjEx (interp=0x40d7b8, objPtr=0xf0, flags=131072, invoker=0x0, word=0)
at /home/et/Desktop/tcl8.6.8/generic/tclBasic.c:6001
#4 0xb7e1e61a in Tcl_EvalObjEx (interp=0x40d7b8, objPtr=0xf0, flags=131072) at /home/et/Desktop/tcl8.6.8/generic/tclBasic.c:5982
#5 0xb7ee0ace in Tcl_RecordAndEvalObj (interp=0x40d7b8, cmdPtr=0x442e90, flags=131072)
at /home/et/Desktop/tcl8.6.8/generic/tclHistory.c:190
#6 0xb7f0b40b in Tcl_MainEx (argc=-1, argv=0xbffff2e8, appInitProc=0x40086a <Tcl_AppInit>, interp=0x40d7b8)
at /home/et/Desktop/tcl8.6.8/generic/tclMain.c:538
#7 0x00400857 in main (argc=1, argv=0xbffff2e4) at /home/et/Desktop/tcl8.6.8/unix/tclAppInit.c:84


(gdb) info local
p = 0x489000 <error: Cannot access memory at address 0x489000>
iPtr = 0x40d7b8
overflow = 4383652
limit = 150
varPtr = 0x4
arrayPtr = 0xf


(gdb) info args
interp = 0x40d7b8
script = 0x4793a8 "set a ( (1+2) * (3+4) * $b(\\(xxxx) )", ' ' <repeats 17 times>, "}"
command = 0x479329 "expr { (1+2) * (3+4) * $b(\\(xxxx) }]"
length = 43
pc = 0x42e377 "\017\067\001\005\006\004"
tosPtr = 0x41472c


(gdb) list -10
4890 /*
4891 * Someone else has already logged error information for this command;
4892 * we shouldn't add anything more.
4893 */
4894
4895 return;
4896 }
4897
4898 if (command != NULL) {
4899 /*
(gdb)
4900 * Compute the line number where the error occurred.
4901 */
4902
4903 iPtr->errorLine = 1;
4904 for (p = script; p != command; p++) {
4905 ---> if (*p == '\n') {
4906 iPtr->errorLine++;
4907 }
4908 }
4909
(gdb)
4910 if (length < 0) {
4911 length = strlen(command);
4912 }
4913 overflow = (length > limit);
4914 Tcl_AppendObjToErrorInfo(interp, Tcl_ObjPrintf(
4915 "\n %s\n\"%.*s%s\"", ((iPtr->errorInfo == NULL)
4916 ? "while executing" : "invoked from within"),
4917 (overflow ? limit : length), command,
4918 (overflow ? "..." : "")));
4919
(gdb)

two...@gmail.com

Nov 23, 2018, 6:28:44 PM
On Friday, November 23, 2018 at 9:04:47 AM UTC-8, two...@gmail.com wrote:

> Further testing reveals a problem with lines that trigger errors,
> which is likely something to do with how the error reporting tries
> to spit out the line in error
>

Actually, after looking closer, it's the code that is trying to
figure out the line number, by scanning and counting newlines until
it goes from *script up to *command. This will crash tcl if script is higher
in memory than command, as it will go on forever until a segment viol.

This code in TclLogCommandInfo ASSUMES that the script and the command
are from the original text, which the change in () parsing no longer
guarantees.

Here's my quick hack at fixing it. I'm thinking this should be changed even
without using this modification. Seems like a crash waiting to happen in any case.

/*
* Compute the line number where the error occurred.
*/

iPtr->errorLine = 1;
if ( (unsigned int)script <= (unsigned int)command ) {
fprintf(stderr," go ahead scr=%x com=%x\n",(unsigned int)script ,(unsigned int)command );
for (p = script; p != command; p++) {
if (*p == '\n') {
iPtr->errorLine++;
}
}
} else {
fprintf(stderr," whoa, would have crashed scr=%x com=%x\n",(unsigned int)script ,(unsigned int)command );
}




Here's how it looks with my fprints in there...

[988]$ rlwrap -pred ./tclsh
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
go ahead scr=c228b8 com=c22aca
go ahead scr=c228b8 com=c22aca
go ahead scr=bec270 com=bec270
can't read "b((xxxx)": no such variable
% on()
go ahead scr=c324f8 com=c324f8
invalid command name "on()"
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
whoa, would have crashed scr=c39428 com=c393a9
can't read "b((xxxx)": no such variable
% set a ( (1+2) * (3+4) * $b(\(xxxx) ) }
whoa, would have crashed scr=c395a8 com=c0f489

heinrichmartin

Nov 26, 2018, 8:17:18 AM
On Saturday, November 24, 2018 at 12:28:44 AM UTC+1, two...@gmail.com wrote:
> Actually, after looking closer, it's the code that is trying to
> figure out the line number, by scanning and counting newlines until
> it goes from *script up to *command. This will crash tcl if script is higher
> in memory than command, as it will go on forever until a segment viol.

Yep, this busts my latest implementation on [set foo ( )].

> This code in TclLogCommandInfo ASSUMES that the script and the command
> are from the original text, which the change in () parsing no longer
> guarantees.

Tcl also expects zero-terminated strings everywhere DESPITE having size properties (and comments that "guarantee" to read at most this many chars).

The overall problem is maybe not the parser, but further processing (e.g. eval, compile, ...). Therefore, it is about the same effort to introduce a new token type.

> Here's my quick hack at fixing it. I'm thinking this should be changed even
> without using this modification. Seems like a crash waiting to happen in any case.
>
> /*
> * Compute the line number where the error occurred.
> */
>
> iPtr->errorLine = 1;
> if ( (unsigned int)script <= (unsigned int)command ) {

No need to cast. And not enough (for correct code), I guess. The two strings can be anywhere in memory. And again, I assume that there is no single location to fix this. Have you tried putting your code in a proc?

two...@gmail.com

Nov 26, 2018, 3:07:00 PM
On Monday, November 26, 2018 at 5:17:18 AM UTC-8, heinrichmartin wrote:

>
>The overall problem is maybe not the parser, but further processing (e.g. eval, compile, ...). Therefore, it is about the same effort to introduce a new token type.

My intention was to have a prototype that could be presented with a TIP, but I now think that this idea would be a hard sell to the TCL community.

Using your implementation allowed me to run it through the test suite, and while that suite probably leans towards corner cases, the sheer number of places where it starts words with an opening ( rather than following "brace everywhere" suggests that this change would have to be conditional anyway.

So, I’ve moved on to using the sugar package instead, as it solves the problem of compatibility. Sugar provides an alternative [proc] procedure that does a macro replacement of () -> [expr] as a sample case.

However, I’ve had to spend a few days with it since being just a sample it didn’t really handle all the things the tcl parser does, in particular, nested ()’s or quoted strings with \( etc.

I ended up front ending their sample procedure with some regsub’s to add spaces around ( and ) since their level counting wasn’t working if a word ended with say, )).

> The two strings can be anywhere in memory. And again, I assume that there is no single location to fix this

Yes, that does seem to be a real problem. In the worst case, the two strings might live in discontiguous memory segments with unreadable addresses in between. So, just testing for one being higher than the other is clearly not going to be good enough.
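One way around the address-ordering problem might be to bound the scan by the script itself rather than by where command happens to sit in memory. The sketch below is a standalone illustration, not a patch to TclLogCommandInfo: `error_line_for` is a hypothetical helper, and it assumes script is NUL-terminated (which, as noted earlier in the thread, Tcl tends to expect anyway).

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the 1-based line number of `command`
 * within `script`. If `command` does not point into `script`, the scan
 * stops at the script's terminating NUL and falls back to line 1
 * instead of running off into unrelated memory.
 *
 * Assumption: `script` is NUL-terminated.
 */
int error_line_for(const char *script, const char *command)
{
    int line = 1;
    const char *p;

    if (script == NULL || command == NULL) {
        return 1;
    }
    for (p = script; p != command; p++) {
        if (*p == '\0') {
            /* End of script reached without meeting command. */
            return 1;
        }
        if (*p == '\n') {
            line++;
        }
    }
    return line;
}
```

This never reads past the end of script, so it should not matter whether command lies above, below, or in a different segment entirely. It only avoids the crash, though; when the command text was generated rather than taken from the script, the reported line is just a fallback.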