My basic issue is I've got a master text file that has the text of
other filenames scattered in it. I'm trying to substitute the
contents of the other files into the place where the filename is in
the master text file. Everything is in the same directory.
The filename and the surrounding text that will need to be replaced is
this (quoted): "{@INPUTFILE example.txt}". The problem I'm having is
that I can't get it to substitute all of that text, all of the time.
Specifically if the file example.txt contains (quoted): "Example:
300-45823" then it won't replace it. If it has "{Example: 300-45823}"
then it will.
Here's the code and I'll explain more afterwards (There's a program
called Shorthand for Windows written in TCL and that's where some of
the application specific verbiage will be coming from.):
set file [open $filename r];
set data [read $file];
close $file;
foreach textspiel [regexp -all -inline \
        {((\{@INPUTFILE)\s+\w+(.txt\}))} $data] {
    set intspiel [string length $textspiel]
    if {$intspiel > 15} {
        set txtposition [string first ".txt" $textspiel];
        set textfilenameend [expr {$txtposition + 3}];
        set textfilename [string range $textspiel 11 $textfilenameend];
        set textfilenametrimmed [string trimleft $textfilename];
        # check whether the file exists
        if {[file exists $textfilenametrimmed]} {
            # read in the contents of the file; add them to the map
            set textfile [open $textfilenametrimmed];
            set contents [read $textfile];
            close $textfile;
            regsub -all {$textspiel} $data {$contents} data;
            sh_input msg "" "$textspiel AND $contents";
        } else {
            set filenotfound "{file not found}";
            regsub -all $textspiel $data $filenotfound data;
        }
    }
}
set file [open $filename w];
puts $file $data;
close $file;
As you can see, I did have to bandaid a few if's in there. I'd be
happy if anyone has any general suggestions on this code, too.
But, I think it must have to do with the brackets. It seems like when
the text file that's going to be substituted into the master file
contains bracketed text then the substitution goes forward. If not,
regexp finds it and sends it down through the code but regsub won't
substitute it.
Thanks in advance for your thoughts/suggestions.
Sincerely,
Mark
Starting out by trying to understand the requirements here. You have a
file whose contents includes sequences of the form:
{@INPUTFILE FOOBAR.txt}
and each of those sequences needs to be replaced by the contents of
the file with the given name? Assuming that's so, and that there's no
other quoting to do, then the method is this:
proc processTemplate string {
    # This is exactly the replacement to make a string [subst]-safe
    set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
    # Now convert the replacements to embedded commands
    regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[readFromFile \1]} s
    # Process all the substitutions
    return [subst $s]
}
# Simple read-a-file helper
proc readFromFile filename {
    set f [open $filename]
    set d [read $f]
    close $f
    return $d
    # Use this instead if you want recursive template processing:
    # return [processTemplate $d]
}
The use of [string map], [regsub] and [subst] is not as intuitive as
it ought to be. There probably ought to be a -eval or -command option
to [regsub] so that the rest of that stuff can be avoided, but it's
not been implemented yet (it's slightly tricky to make the syntax work
perfectly so that it doesn't clunk, so it's not so far had a high
enough priority for the people doing the Tcl implementation to work
on).
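To make the three calls a little more concrete, here is a minimal sketch that traces one string through them. The helper fakeRead is a hypothetical stand-in for readFromFile (so nothing needs to exist on disk), and the sample text is made up for illustration:

```tcl
# Hypothetical stand-in for readFromFile; it touches no files.
proc fakeRead name { return "<contents of $name>" }

# Sample text containing characters that [subst] would otherwise act on.
set text {Price: $5 [not a command] {@INPUTFILE a.txt} C:\temp}

# 1. Backslash-protect $, [ and \ so that [subst] leaves them alone.
set s [string map {$ \\$ \[ \\\[ \\ \\\\} $text]
# 2. Turn each directive into an embedded command.
regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[fakeRead \1]} s
# 3. Let [subst] run the embedded commands and strip the protection again.
set result [subst $s]
puts $result
```

Only the directive should change; the dollar sign, the square brackets and the backslash come out exactly as they went in.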
One thing to note about this code. It's a lot simpler than yours.
Tricks like this are why it is useful to ask here (or look on the
Wiki: http://wiki.tcl.tk) when you're having problems.
Donal.
[snip nice solution]
>One thing to note about this code. It's a lot simpler than yours.
>Tricks like this are why it is useful to ask here
I was going to post a somewhat different solution myself but
Donal got there first... but his [subst]-based solution raises
some really interesting questions for me as a sometime teacher
and trainer.
Using [subst] on data is obviously convenient and powerful,
but it has always troubled me somewhat. For example:
- Unlike just about everything else in Tcl, [subst] just
works the way it works and there's not much you can do
to modify its behaviour. That's fine if it does exactly
what you need, but I worry about flexibility. Stuff like
include-file insertion is quite likely to need detailed,
context-dependent intervention: for example, what should
happen if the last character of an included file is
(or is not) a line break?
- The preparatory wardance
set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
frightens me a lot. It's a piece of user code that mirrors
the operation of some Tcl internals. Am I alone in finding
that somewhat distasteful?
And finally, although the solution is neat and instructive,
its relationship to the original requirements is not obvious
to anyone who is not highly Tcl-savvy.
None of this is complaint or criticism. Rather, I guess,
it's an open invitation to help me readjust my attitudes :-)
--
Jonathan Bromley
It's magical. I can remember being thoroughly startled the first time I
saw that sort of thing going on too. :-) But it works, and is both fast
and safe.
> - Unlike just about everything else in Tcl, [subst] just
> works the way it works and there's not much you can do
> to modify its behaviour. That's fine if it does exactly
> what you need, but I worry about flexibility. Stuff like
> include-file insertion is quite likely to need detailed,
> context-dependent intervention: for example, what should
> happen if the last character of an included file is
> (or is not) a line break?
Well that's entirely up to how you go about writing both the [regsub] to
make the command substitution producing the string to process, and what
those command substitutions do. In this case, I'm using a very
simple-minded model; I'm sure you can come up with more sophisticated
ones. But in summary, there are three steps:
1. Defang; [string map] makes this easy.
2. Put in the interesting substitutions.
3. Splat through [subst].
You can reduce the amount of quoting needed in step #1 by passing extra
options in step #3 (e.g., I could have not quoted '$' characters if I'd
passed the -novariables option to [subst]). But it's easy enough to
handle all three cases.
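As a rough sketch of the -novariables variant mentioned above: the map below has no entry for '$', and [subst] is told not to touch variables. fakeRead is a hypothetical stand-in for readFromFile and the text is invented:

```tcl
# Hypothetical stand-in for readFromFile; it touches no files.
proc fakeRead name { return "<contents of $name>" }

set text {Cost: $5, see {@INPUTFILE a.txt}}

# Step 1: defang only [ and \ ; the $ is handled by -novariables below.
set s [string map {\[ \\\[ \\ \\\\} $text]
# Step 2: put in the interesting substitutions.
regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[fakeRead \1]} s
# Step 3: run [subst], with variable substitution switched off.
set result [subst -novariables $s]
puts $result
```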
> - The preparatory wardance
> set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
> frightens me a lot. It's a piece of user code that mirrors
> the operation of some Tcl internals. Am I alone in finding
> that somewhat distasteful?
OK, that just puts a backslash in front of all Tcl's in-double-quotes
metacharacters. Really. An alternative would have been:
regsub -all {[[\\$]} $string {\\&} s
But that's slower and just as magical. :-)
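A quick way to convince yourself that the two forms agree is to run both on the same (made-up) sample and compare; this is only a sanity-check sketch:

```tcl
set text {a $b [c] d\e}

# The [string map] form: explicit pairs for $, [ and \ .
set viaMap [string map {$ \\$ \[ \\\[ \\ \\\\} $text]

# The [regsub] form: put a backslash before any of [ \ $ .
regsub -all {[[\\$]} $text {\\&} viaRegsub

puts [string equal $viaMap $viaRegsub]
# Feeding either one through [subst] should give back the original text.
puts [string equal [subst $viaMap] $text]
```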
> And finally, although the solution is neat and instructive,
> its relationship to the original requirements is not obvious
> to anyone who is not highly Tcl-savvy.
We know it ought to be more elegant and obvious than this; it's on our
todo list. Maybe next year in Tcl 8.7...?
Donal.
>It's magical. I can remember being thoroughly startled the first time I
>saw that sort of thing going on too. :-) But it works, and is both fast
>and safe.
Understood.
> 1. Defang; [string map] makes this easy.
> 2. Put in the interesting substitutions.
> 3. Splat through [subst].
Nice summary, thanks.
>You can reduce the amount of quoting needed in step #1 by passing extra
>options in step #3 (e.g., I could have not quoted '$' characters if I'd
>passed the -novariables option to [subst]). But it's easy enough to
>handle all three cases.
Right, it seems pointless to do only some of them if one single,
simple recipe will handle the whole lot.
>> - The preparatory wardance
>> set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
>> frightens me a lot. It's a piece of user code that mirrors
>> the operation of some Tcl internals. Am I alone in finding
>> that somewhat distasteful?
>
>OK, that just puts a backslash in front of all Tcl's in-double-quotes
>metacharacters. Really.
Yes, I'm aware of that. But a beginner surely would have a hard
time being confident that the set was complete. So it could easily
degenerate into a piece of voodoo, handed down by cut'n'paste from
one project to another, until its original purpose was lost....
Actually I would have thought an encapsulation of that would be
a useful addition to the repertoire: [string unsubst] ??
>> And finally, although the solution is neat and instructive,
>> its relationship to the original requirements is not obvious
>> to anyone who is not highly Tcl-savvy.
>
>We know it ought to be more elegant and obvious than this; it's on our
>todo list. Maybe next year in Tcl 8.7...?
No, I wasn't criticising Tcl's facilities; I was questioning whether
it's good, especially for beginners or occasional users, to apply
techniques that are so many steps away from the original spec.
Even if it's a tad inefficient, pedestrian step-by-step implementation
of such requirements is sometimes a good investment for future
comprehensibility.
Thanks for the response.
--
Jonathan Bromley
That's correct. I really appreciate the input. As you might be able
to tell this isn't my day job. :)
I'll be able to give it a whirl later on and I'll send an even more
grateful thank you email then!
Thank you for your answer!
Take care,
Mark
> foreach textspiel [regexp -all -inline \
> {((\{@INPUTFILE)\s+\w+(.txt\}))} $data] {
That doesn't make sense. The regexp is returning
quadruplets of matches: the whole match, the outer ()
and the two inner (), so you should be skipping through
the list 4 at a time (foreach {a b c d} [regexp..).
I also see you have "." used where you intend "\."
to match a literal period, not any one character.
You aren't doing anything useful with the parentheses,
so I suspect you misunderstand their function. Instead
you should probably have used
foreach {textspiel textfilename} [regexp -all -inline \
        {\{@INPUTFILE\s+(\w+\.txt)\s*\}} $data] {
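A small sketch of walking the match list two items at a time. The sample data is invented, and the pattern here uses \s* before the closing brace so that a directive with no trailing whitespace still matches:

```tcl
set data {a {@INPUTFILE one.txt} b {@INPUTFILE   two.txt } c}

# With one capturing group, -inline returns pairs: whole match, filename.
set names {}
foreach {whole name} [regexp -all -inline \
        {\{@INPUTFILE\s+(\w+\.txt)\s*\}} $data] {
    lappend names $name
}
puts $names
```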
> But, I think it must have to do with the brackets. It seems like when
> the text file that's going to be substituted into the master file
> contains bracketed text then the substitution goes forward. If not,
> regexp finds it and sends it down through the code but regsub won't
> substitute it.
Yes, you suspect correctly (but your demonstration didn't
even get that far).
> regsub -all {$textspiel} $data {$contents} data;
Here you are using literal plain text $textspiel as a
regular expression meaning match the end of a line followed
by "textspiel". Of course you meant "$textspiel" or $textspiel
so that the value of the variable was used. But in that case
you are still using plain (uncontrolled) text as a regular
expression, which it is not. There are three solutions for
that:
1) Don't use regsub: use string map instead.
2) Put escapes (backslashes) in front of all special characters
in the $textspiel text, using string map. Check re_syntax to
make sure you cover every character that is special to regexp.
3) Tell regexp that the string is literal:
regsub "***=$textspiel" $data {$contents} data
Probably 1) is the best, but 3) is also convenient (but harder
to decipher when you see the program later).
(There is no reason to use -all because you are doing one
item at a time with the foreach loop.)
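For comparison, here is a sketch of options 1) and 3) side by side on invented data. It also shows one reason to lean towards [string map]: [regsub] still gives & and backslash a special meaning in the replacement text, even when the pattern is literal.

```tcl
set data      {before {@INPUTFILE example.txt} after}
set textspiel {{@INPUTFILE example.txt}}
set contents  {Example: 300-45823}

# Option 1: plain string replacement, nothing is a metacharacter.
set viaMap [string map [list $textspiel $contents] $data]

# Option 3: ***= makes the rest of the pattern a literal string.
regsub "***=$textspiel" $data $contents viaRegsub

# Replacement text containing & behaves differently in the two cases.
set ampersand {AT&T}
set mapAmp [string map [list $textspiel $ampersand] $data]
regsub "***=$textspiel" $data $ampersand regsubAmp

puts $viaMap
puts $viaRegsub
puts $mapAmp
puts $regsubAmp   ;# here & was replaced by the matched directive
```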
Donald Arseneau
OK, I've had some time to try to figure out what is happening here. I
didn't have enough time to get it fully working but that's because of
my data, I'm sure. I really, really appreciate ALL of the suggestions
and posts. This is really helpful! I'm basically trying to prepare
data in a TCL based program for use in a simple web app. This piece
has been holding me up for quite a while. I can't tell you how nice
it is to be so close to getting this part done!!!
I'll get another chance tonight (hopefully) to work on this.
#1 I figured I'd push my luck and ask if there's any simple way to
catch the problem (common in my data) that the file (that comes after
"@INPUTFILE") is missing. I was thinking that it'd be nice to
substitute in some little note in the text file. My previous code had
a little phrase that got put in there.
#2 You can tell I didn't get too far, but would this be the proper
implementation of the code you have written:
set file [open $filename r+]
set data [read $file]
processTemplate $data
puts $file $data
close $file
#3 I've got to get to my day job. I can't wait to get back to this,
though!
Take care,
Mark
I know you are on a track with [string map] followed by
[subst], which is good. But for posterity I had better
fix the bug above: I retained some other bogus braces
from your original. It should be
regsub "***=$textspiel" $data $contents data
Donald Arseneau
Close. If your intention is to append the processed text to the end of
the unprocessed text then what you're doing is perfect. If on the
other hand you intend to replace the unprocessed text with the
processed text then you should keep the result of processTemplate and
seek to the beginning of the file:
...
set data [processTemplate $data]
seek $file 0
puts $file $data
...
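Spelled out in full, the in-place rewrite might look like the sketch below. The scratch file name and the upper-casing processTemplate are hypothetical stand-ins so that it runs on its own. Two things to watch for: [puts] without -nonewline adds a newline on every run, and if the processed text were ever shorter than the original, the old tail would be left behind (reopening the file with "w" instead avoids that).

```tcl
# Hypothetical stand-in for the real processTemplate.
proc processTemplate s { return [string toupper $s] }

# Scratch file for the demonstration (the name is made up).
set filename [file join [pwd] rewrite_demo_[pid].txt]
set f [open $filename w]
puts -nonewline $f "hello world"
close $f

set file [open $filename r+]
set data [read $file]
set data [processTemplate $data]   ;# keep the processed text
seek $file 0                       ;# back to the start before writing
puts -nonewline $file $data        ;# no extra newline appended
close $file

# Read the file back to see what ended up in it, then clean up.
set f [open $filename]
set after [read $f]
close $f
file delete $filename
puts $after
```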
I proceeded with the code laid out by Donal Fellows, and it produced a
very nice file. Now I'm running into a few extra stinkers that I
hadn't noticed before.
Apparently, I have to deal with a few other varieties of input with
this program.
It's not catching the "@INPUTFILE" string when...
#1 ... there's more than one space between "@INPUTFILE" and the name
of the textfile.
#2 ... the filename has a space (or spaces) in it.
#3 ... the filename has a dash (or plus sign or other allowed
character) in it.
#4 ... the filename has backslashes in it (like a long path)
#5 ... combinations of #1 to #4
Basically in running this and going through the output on a few files,
I've learned that people are naming their files whatever is allowable
by Windows. I think in my earlier post in another thread I had
indicated that there was some limitation to what we'd be looking for.
Could anyone help me with this? I think the regular expression is
going to become much more interesting!
I'm sorry for being such a pain. Sorry for not looking at the
possible file names beforehand... I really am thrilled to finally
have a glimmer of light at the end of the tunnel.
Thank you everyone!
Mark
>Apparently, I have to deal with a few other varieties of input with
>this program.
That happens....
>
>It's not catching the "@INPUTFILE" string when...
>#1 ... there's more than one space between "@INPUTFILE" and the name
>of the textfile.
>
>#2 ... the filename has a space (or spaces) in it.
>
>#3 ... the filename has a dash (or plus sign or other allowed
>character) in it.
>
>#4 ... the filename has backslashes in it (like a long path)
>
>#5 ... combinations of #1 to #4
>
>Basically in running this and going through the output on a few files,
>I've learned that people are naming their files whatever is allowable
>by Windows. I think in my earlier post in another thread I had
>indicated that there was some limitation to what we'd be looking for.
>
>Could anyone help me with this? I think the regular expression is
>going to become much more interesting!
I suspect that's fairly easy. If I can make the assumption that a
filename may not contain a right curly brace }, this regular expr
should do it:
{\s*?\@INPUTFILE\s*?(\S.*?)\s*?}
One step at a time (view in monospaced font):
{            - opening brace
\s*?         - optional space before @
@INPUTFILE   - the keyword
\s+?         - any whitespace after keyword
(        )   - parens to capture the filename
               (same as before, right?)
  \S         - first char must be non-space
  .*?        - any other garbage in filename
\s*?         - optional trailing space,
               not part of the filename
}            - closing brace
You'll note that I've put query characters after every * or +
repetition operator. That's because I don't want the .* filename
match to capture a bazillion characters up to the final } in the
file - it's known as "lazy" matching and forces .* (etc) to match
the shortest possible acceptable string. There are other ways
to get the same effect, but lazy matching conveys the sense
quite nicely here. Tcl regexps have complicated rules about
mixing lazy and greedy in the same expression, but in this case
it's OK to use lazy everywhere, which keeps things simple.
In fact you then don't need to apply the ? lazy modifier
everywhere, but I usually do so as a reminder to myself.
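The difference between greedy and lazy matching can be seen on a line with two directives. This is only an illustrative sketch with invented data, and the braces that belong to the pattern are backslashed to keep them apart from Tcl's own quoting braces:

```tcl
set line {{@INPUTFILE a.txt} and {@INPUTFILE b.txt}}

# Greedy throughout: .* runs on to the LAST closing brace.
regexp {\{@INPUTFILE\s+(.*)\}} $line -> greedy

# Lazy throughout: .*? stops at the FIRST closing brace.
regexp {\{@INPUTFILE\s+?(.*?)\}} $line -> lazy

puts "greedy: $greedy"
puts "lazy:   $lazy"
```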
>I'm sorry for being such a pain. Sorry for not looking at the
>possible file names beforehand...
As you can imagine, I myself have absolutely no recollection
of ever starting to code without a truly complete
understanding of the requirements......... yeah, right :-)
--
Jonathan Bromley
> If I can make the assumption that a
> filename may not contain a right curly brace }
So then I go and try it, and discover that a Windows filename CAN
contain a right curly brace... arrrgh. So how are you supposed
to parse
{@INPUTFILE stupid}.txt}
???
Is there, for example, a guarantee that the directive stands
on a line of its own, and its closing brace is the last non-space
character on the line? In the absence of such a rule, I have no
idea how one would be supposed to handle arbitrary filenames.
Bah, humbug.
--
Jonathan Bromley
I hate to bother you, but I'm having a little bit of trouble with it
with the new changes.
The line in my code is:
regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} s
I think the expression seems to be returning only the first letter of
the filename, though.
There was a discrepancy between the stepwise description and the first
version (just a plus sign), but that really had the same effect (just
finding the first letter of the filename)
So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"
Is this a function of the lazy search, my data, or could it be
something to do with my tcl version? I'm using tcl 8.3.1.
Thank you for any help!
Merry Christmas,
Mark
It's because you've not told it to explicitly match the “{}” around the
rest of the substitution term. Changing the regular expression
invocation to this:
regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s \
{[readFromFile \1]} s
Should give satisfaction. Unless you've got one of those odd Windows
filenames with brace characters in, but you probably don't want anything
to do with those. :-)
(It's indeed caused by the lazy search; “.*?” can lazily match nothing
at all.)
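The effect is easy to reproduce with [regexp] on the sample directive. A small sketch comparing the pattern without and with the literal braces:

```tcl
set s {{@INPUTFILE staffpnp.txt}}

# Without the literal braces nothing forces the lazy .*? to go any further,
# so the capture stops after the first non-space character.
regexp {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s whole1 name1

# With \{ ... \} the match has to reach the closing brace.
regexp {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s whole2 name2

puts "without braces: $name1"
puts "with braces:    $name2"
```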
Donal.
>Thank you very much!
>
>I hate to bother you, but I'm having a little bit of trouble with it
>with the new changes.
>
>The line in my code is:
>
> regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} s
>
>I think the expression seems to be returning only the first letter of
>the filename, though.
Sheesh, I just *knew* that would happen.....
When I described the regular expression (RE), I hope the description
made it clear that its opening and closing braces should match
those same characters in the source text. In other words, they
really form a part of the RE you want to process. However, that
immediately caused an ambiguity that I didn't forestall.
When you put a RE into a typical Tcl command, it's common to
enclose the entire RE in braces because RE syntax contains
a lot of characters that would normally have special meaning
to Tcl - for example, square brackets - and it's usually wise
to prevent the Tcl parser from manipulating them. (Another,
possibly neater, method is carefully to put the required RE
into a Tcl variable, and use that as the argument in preference
to using a literal regular expression.)
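The variable approach might look like the following sketch. The variable names and the sample text are made up; the pattern is the one with explicit literal braces, and the replacement only builds the command text without running it:

```tcl
# The RE lives in a variable; the outer braces here are Tcl quoting only.
set directiveRE {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}}

set s {x {@INPUTFILE my file.txt} y}
set count [regsub -all $directiveRE $s {[readFromFile {\1}]} out]

puts $count
puts $out
```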
In our case, an additional pair of braces around the whole thing
would have done the job:
regsub -all {{\s*?\@INPUTFILE\s*?(\S.*?)\s*?}} $s ...
Donal showed you exactly that, but also with backslashes in
front of the braces that actually form part of the RE. This
is sensible but not strictly necessary. Braces have a special
meaning in REs, but when they appear in isolation (as in our RE)
they don't have this special meaning and don't really need
backslashes. There's a slightly embarrassing back-story to
this: the RE visualiser that I wrote a while back [*] doesn't
correctly deal with that situation, and gets a little
confused if you include isolated backslashes in a sample RE.
Blush.
[*] http://www.doulos.com/knowhow/tcltk/examples/trev/
Since I no longer work for that organisation, I'm not
easily able to fix the oversight..... oops.
>There was a discrepancy between the stepwise description and the first
>version (just a plus sign), but that really had the same effect (just
>finding the first letter of the filename)
yeah, \s+ vs. \s*
I don't think it would make much difference in practice.
>
>So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"
Far worse than that; in the form you used, it was looking only for
"@INPUTFILE...." and was taking no notice of the opening brace.
Happy matching in 2010,
--
Jonathan Bromley
First off, it's working now!
I did have to add some brackets of my own in order for it to be able
to open up the filenames it found with a space in the middle. So here's
what I ended up with:
proc processTemplate string {
    # This is exactly the replacement to make a string [subst]-safe
    set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
    # Now convert the replacements to embedded commands
    regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s \
        {[readFromFile {\1}]} s
    # Process all the substitutions
    return [subst $s]
}
# Simple read-a-file helper
proc readFromFile filename {
    if {[file exists $filename]} {
        set f [open $filename]
        set d [read $f]
        close $f
        return [processTemplate $d]
    } else {
        return
    }
}
and this is implemented with the following:
set file [open $filename r+];
set data [read $file];
set data [processTemplate $data];
seek $file 0;
puts $file $data;
close $file;
QUESTION #1: Is everyone OK with the brackets around the \1 in ...
{[readFromFile {\1}]}... ?
My computer seems to shudder and the hourglass thing does come up
while it's running. But the output looks ok.
QUESTION #2: Regarding my else statement in "proc readFromFile". Is
there a way to put some text into the master file that says something
generic like "file xxx was not located!"? Furthermore, is there a
better way to handle the issue of not finding the file?
Thank you everyone! I am so happy that it is working! My new
questions are minor issues, to be sure.
Merry Christmas,
Mark
As long as you don't have unbalanced {braces} in there, you'll be
fine. If they become an issue, you can add a bit more to the cleanup
step (the [string map]) to put a backslash in front of every space. Of
course, you then also need to change the regular expression...
If it's working, stop tinkering. :-)
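Why the braces matter for names with spaces can be seen with a throwaway command that just counts its arguments. countArgs is a hypothetical helper for this sketch only:

```tcl
# Hypothetical helper: reports how many arguments it was given.
proc countArgs args { return [llength $args] }

# Unbraced, the embedded command sees two words: "my" and "file.txt".
set unbraced [subst {[countArgs my file.txt]}]

# Braced, the whole filename arrives as a single argument.
set braced [subst {[countArgs {my file.txt}]}]

puts "unbraced: $unbraced argument(s)"
puts "braced:   $braced argument(s)"
```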
> QUESTION #2: Regarding my else statement in "proc readFromFile". Is
> there a way to put some text into the master file that says something
> generic like "file xxx was not located!"? Furthermore, is there a
> better way to handle the issue of not finding the file?
Sure. It's just a slightly more sophisticated version:
proc readFromFile {filename} {
    # Set the message up first
    set result "file \"$filename\" was not located!"
    catch {
        set f [open $filename]
        # Replaces that message with the contents unless the
        # [open] above failed.
        set result [read $f]
        close $f
    }
    return $result
}
That is a good approximation of what a full production implementation
will look like; there are a few other rarer failure modes possible,
e.g., permission denied, but chances are you'll not encounter them
anyway. And pretending that the file doesn't exist in those cases is
actually pretty good anyway. :-)
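Put together with processTemplate, the note ends up in the processed text in place of the directive. A sketch, assuming the file named here really is absent (the name is generated just for this example):

```tcl
proc processTemplate string {
    set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
    regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s \
        {[readFromFile {\1}]} s
    return [subst $s]
}

proc readFromFile {filename} {
    # Default to the "not located" note; overwritten if the read succeeds.
    set result "file \"$filename\" was not located!"
    catch {
        set f [open $filename]
        set result [read $f]
        close $f
    }
    return $result
}

# A filename that should not exist (assumption of this sketch).
set missing "no-such-file-[pid]-[clock seconds].txt"
set out [processTemplate "Intro: {@INPUTFILE $missing} :end"]
puts $out
```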
Now in Tcl 8.6 what I'd do is use the new [try] command to do the more
sophisticated handling, like this:
proc readFromFile {filename} {
    try {
        set f [open $filename]
        return [read $f]
    } trap {POSIX ENOENT} {} {
        return "file \"$filename\" was not located!"
    } trap {POSIX} msg {
        return "problem reading \"$filename\": $msg"
    } finally {
        if {[info exists f]} {close $f}
    }
}
You should be able to see what it's doing without much help more than
the [try] manual page <URL:http://www.tcl.tk/man/tcl8.6/TclCmd/try.htm>
Donal.