I’m writing Oracle PL/SQL Server Pages in a standard text editor. The
only thing is, I don’t like some of the syntax. For example:
Instead of writing: <%= blahblah %> I would like to write #blahblah#
And when I need to write # I would like to use ## (very much the same
as ColdFusion’s syntax).
There are a few other things I would like to change, but this would be
a good start. So could Tcl act as a preprocessor and change the syntax
of the text file before I issue another command which will load it
into the Oracle db?
So I would create the file a.psp and then execute a Tcl script which
would change it into b.psp, and then I could load and compile b.psp.
Is Tcl a good choice for this sort of thing? Are there already
scripts of this nature that I could use as an example?
It really depends on exactly what you want. If you're happy with
running a preprocessing step that transforms one file into another,
then most certainly Tcl can do it, and easily. I advise using a
different file extension for the input files because it's formally a
different (but closely related) language that they contain.
Here's a sample script, including unix #! magic; feel free to take and
adapt:
#!/usr/bin/env tclsh8.5
# How to do the string transform
proc transformContents {str} {
    # NB: Non-greedy RE; _important_!
    set str [regsub -all {#(.*?)#} $str {<%= \1 %>}]
    # Other transforms go here...
    return $str
}
# How to lift from a string to a file
proc transformFile {inFile outFile} {
    set f [open $inFile]
    set contents [read $f]
    close $f
    set transformed [transformContents $contents]
    set f [open $outFile w]
    puts -nonewline $f $transformed
    close $f
}
# Apply to all files listed on command line, with a simple method to
# generate output filenames from input ones.
foreach inFile $::argv {
    transformFile $inFile [file rootname $inFile].psp
}
Note that much of this code is stupid, and it's not very resilient
against errors, etc. But it should also be easy to see how it works.
If you're running on Windows, you can omit the #! stuff (but should
put it in a file with extension .tcl instead so the OS knows to
associate it with tclsh.exe).
If you're writing your own REs, the first thing to remember is to put
the regular expression in {curly brackets} to avoid confusion. The
second is to try out your regular expressions interactively with
"evil" input texts; it's by far the easiest way to get things right.
For example, I tested the above with this:
% set str "abc#def#ghi#jkl#mno"
abc#def#ghi#jkl#mno
% regsub -all {#(.*?)#} $str {<%= \1 %>}
abc<%= def %>ghi<%= jkl %>mno
Mere seconds to test, but allowed me to know I'd got it right. :-)
Donal.
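Donal's advice about trying "evil" inputs can be turned into a small regression table that is rerun after every change to the patterns. A minimal sketch, assuming a copy of the one-line transform from his script; checkCases and the sample cases are hypothetical, chosen for illustration:

```tcl
# Copy of the transform from Donal's script (no ## handling yet)
proc transformContents {str} {
    regsub -all {#(.*?)#} $str {<%= \1 %>}
}

# Hypothetical helper: run every input/expected pair, print mismatches,
# and return the number of failures
proc checkCases {cases} {
    set failures 0
    foreach {input expected} $cases {
        set got [transformContents $input]
        if {$got ne $expected} {
            puts "FAIL: [list $input] gave [list $got], expected [list $expected]"
            incr failures
        }
    }
    return $failures
}

# "Evil" inputs: adjacent tags, no tags, an unmatched #, and ##
set cases {
    "abc#def#ghi#jkl#mno"  "abc<%= def %>ghi<%= jkl %>mno"
    "#one##two#"           "<%= one %><%= two %>"
    "no tags here"         "no tags here"
    "a lone # stays"       "a lone # stays"
    "##"                   "<%=  %>"
}
puts "[checkCases $cases] failure(s)"
```

The last case records that ## currently turns into an empty expression tag, so any later change to how ## is handled shows up immediately as a failing row.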
> Is tcl a good choice for this sort of thing? Are there already
> scripts of this nature that I could use as an example?
Yes, definitely. The meat of what you're trying to do would only be a
couple of lines.
Here, for example, is the key line from Mr. Fellows's script, with a
line before and after to handle the ##-to-# conversion, reading from
standard input:
#!/usr/bin/tclsh
set foo [string map {## __hashy} [read stdin]]
regsub -all {#(.*?)#} $foo {<%= \1 %>} foo
puts [string map {__hashy #} $foo]
Test:
bing66:~ caj$ cat foo
##hello
#blahblah# #yakyak#
bing66:~ caj$ cat foo | ./convert.tcl
#hello
<%= blahblah %> <%= yakyak %>
bing66:~ caj$ cat foo
##hello
#blahblah# #yakyak# # inner ##hashmark #
bing66:~ caj$ cat foo | ./convert.tcl
#hello
<%= blahblah %> <%= yakyak %> <%= inner #hashmark %>
--S
The script is 99% doing what I’d hoped. Here’s the only place it’s
off:
Input:
This works: #xxxx# and this works: ##
The only thing that does not work is this: #one##two#
You need to tell us what output you see, and what output you expect,
for each of these inputs:
#xxxx#      -> I get <%= xxxx %>, I want <%= xxxx %> (correct)
##          -> I get ___, I want ___
#one##two#  -> I get ___, I want ___ (wrong)
Please fill in this table. Otherwise we can't know what you mean by
"it works" and "it doesn't work", respectively.
Christian
>
> This works: #xxxx# and this works: ##
> The only thing that does not work is this: #one##two#
Hiya,
If you want #one##two# to become <%= one %><%= two %>, then the script
is even shorter:
#!/usr/bin/tclsh
regsub -all {#(.*?)#} [read stdin] {<%= \1 %>} foo
puts [string map {<%=\ \ %> #} $foo]
Testing...
bash-3.2$ cat foo
##hello
#blahblah# #yakyak# #one##two#
bash-3.2$ cat foo | ./convert.tcl
#hello
<%= blahblah %> <%= yakyak %> <%= one %><%= two %>
Note that this new script cannot handle a ## tag inside a #blahblah#
tag.
Could the script be enhanced to transform an if statement?
Turn this: <cif x=1 > into this: <% if x=1 then %> (where
the x=1 changes for each statement)
Turn this: <celse> into <% else %>
Turn this </cif> into this: <% end if; %>
So instead of having to write:
<% if x = 1 then %>
do this
<% else %>
here
<% end if; %>
It could be written as:
<cif x =1 >
do this
<celse>
here
</cif>
I don’t want to wear out my welcome. These changes are very helpful
and this makes coding so much nicer. I greatly appreciate this.
Hiya,
There are two commands at work here. The simpler one is [string map],
which takes a list of literal before-after pairs. Any simple find-and-
replace job can be done by this command; two thirds of your if
statement problem could be handled this way:
set text [string map {<celse> "<% else %>" </cif> "<% end if; %>"} $text]
Notice that this matches strings exactly, so "< celse >" would not
match. You'd need the regsub command for that.
The regsub command uses Tcl's advanced regular expression (ARE) syntax,
which is described in detail at http://www.tcl.tk/man/tcl8.4/TclCmd/re_syntax.htm
(for Tcl 8.4). The line we took from Mr Fellows's script:
set str [regsub -all {#(.*?)#} $str {<%= \1 %>}]
...attempts to find a # followed by anything (wildcard .*) followed by
a #. The question mark tells the regexp command to find the smallest
match possible rather than the largest match possible, so that if we
try to match #one##two# it does not decide that "one##two" is the
middle part. The parentheses tell the command to set aside whatever
data is inside them for use later. The last argument is the string to
replace it with, and the \1 means "drop in the stuff from the first
pair of parentheses."
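To see what the question mark buys you, compare the greedy and non-greedy forms on the same test string Donal used. A minimal sketch; the variable names are arbitrary:

```tcl
set s "abc#def#ghi#jkl#mno"

# Greedy: .* grabs as much as it can, so the match runs to the LAST #
set greedy [regsub -all {#(.*)#} $s {<%= \1 %>}]

# Non-greedy: .*? stops at the first # that lets the match succeed
set lazy [regsub -all {#(.*?)#} $s {<%= \1 %>}]

# regexp with match variables: "->" receives the whole match, inner gets \1
regexp {#(.*?)#} $s -> inner

puts $greedy   ;# abc<%= def#ghi#jkl %>mno
puts $lazy     ;# abc<%= def %>ghi<%= jkl %>mno
puts $inner    ;# def
```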
Thus, you could implement cif like so, using [^>]* (anything but a >)
for the condition:
regsub -all {<\s*cif([^>]*)>} $str {<% if \1 then %>}
Note the \s*, which allows padding in your code. You might want to
use [regsub -all {<\s*celse\s*>} $str {<% else %>}], for example.
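One Tcl-specific trap is worth knowing as these patterns grow. According to Tcl's re_syntax manual, a branch takes the preference (longest or shortest) of its first quantified atom, so a leading greedy \s* makes the whole match prefer the longest string, and a later .*? no longer stops early. A bracket expression that cannot cross a > sidesteps the issue. A minimal sketch with a made-up sample tag string:

```tcl
set s "<cif x=1>yes<celse>no</cif>"

# \s* is the first quantifier and it is greedy, so the whole match prefers
# the longest string: (.*?) is stretched to the last > in the input
set quirky [regsub -all {<\s*cif(.*?)>} $s {<% if\1 then %>}]

# [^>]* cannot cross a >, so greediness no longer matters
set safe [regsub -all {<\s*cif([^>]*)>} $s {<% if\1 then %>}]

puts $quirky  ;# <% if x=1>yes<celse>no</cif then %>
puts $safe    ;# <% if x=1 then %>yes<celse>no</cif>
```

Patterns that start with the non-greedy atom, such as {#(.*?)#} or {<cif(.*?)>}, are not affected, because their first quantifier is the non-greedy one.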
I should warn you, however, that '>' and '<' might be bad delimiters
for an if statement, since those symbols may also appear in your if
expressions. Delimiters always have these sorts of annoying pitfalls,
and half the problem is finding a notation that doesn't burn you
later.
I notice that in your code so far, everything you write is lowercase.
Is that true in general? If so, you can always skip the funny symbols
completely and use uppercase words for all wildcards. For example:
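#!/usr/bin/tclsh
# Uppercase words are keywords or tags; everything else passes through
array set sub {
    IF   "<% if"
    THEN "then %>"
    ELSE "<% else %>"
    FI   "<% end if; %>"
}
set out {}
# Split the input into alternating runs of whitespace and non-whitespace
foreach a [regexp -inline -all {\s+|\S+} [read stdin]] {
    if {[string is upper $a]} {
        if {[info exists sub($a)]} {
            set a $sub($a)
        } else {
            set a "<%= [string tolower $a] %>"
        }
    }
    lappend out $a
}
puts [join $out {}]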
Testing:
bash-3.2$ cat foo
TAG tag TaG TAG
IF x>5 THEN
Here is a TAG
FI
IF 1 THEN we say HOORAY ELSE we say BOO FI
bash-3.2$ cat foo | ./convert.tcl
<%= tag %> tag TaG <%= tag %>
<% if x>5 then %>
Here is a <%= tag %>
<% end if; %>
<% if 1 then %> we say <%= hooray %> <% else %> we say <%= boo %> <% end if; %>
...the advantage of this is that your tags require no extra delimiters
at all, and TYPING out your CODE may be FASTER and EASIER than
#typing# out your #code# like <this>.
--S
Writing code in all caps is an idea I’ve never considered. I’m not
sure it would work, because sometimes I’ll need to send a capitalized
word to the browser or compare a string with caps. When I write
HTML/JS with embedded PL/SQL, if commands can be kept in <> tags with
few delimiting characters, the code flows and it’s easy to read and
write. Going with code in CAPS is a paradigm shift I’d never
considered. Pretty amazing concept.
The problem with the > delimiter could be solved with a rule: if a > is
inside single quotes, ignore it. This rule would fix it 100% of the
time, since the syntax is determined by Oracle’s PL/SQL and the > can
only be used inside a single-quoted string.
Here’s an example:
<table>
<tr><th>SERIAL #</th></tr>
<tr><td><cif x = 'AB>CD'> Type AB: #z.serial_nbr#<celse>Type: unknown</cif></td></tr>
</table>
I was thinking of possibly doing away with the “c” and just doing:
<if x='123'>123<else>unk</if>
Ultimately, it should be possible to handle many of the PL/SQL
commands, and few of them conflict with HTML keywords, e.g. case,
loop, while, assignment operators, etc.
proc transformContents {str} {
    regsub -all {#(.*?)#} $str {<%= \1 %>} str
    set str [string map {<%=\ \ %> # <celse> "<% else %>" </cif> "<% end if; %>"} $str]
    regsub -all {<cif(.*?)>} $str {<% if\1 then %>} str
    return $str
}
set transformed [transformContents [read stdin]]
puts $transformed
This:
The first test: ##<br>
Second test: #ct#<br>
<br>
<cif ct = 10>
ten
<celse>
not ten
</cif>
Transforms into:
The first test: #<br>
Second test: <%= ct %><br>
<br>
<% if ct = 10 then %>
ten
<% else %>
not ten
<% end if; %>
The one case that still fails is:
<cif x = 'aa>bb' >
I’m thinking of using the non-greedy RE to find x = 'aa>, and then, if
that string contains a ', applying another expression that looks past
the ' to find the >.
I’ve been trying to get a variation of this to work:
regsub -all {<cif(.*?<=')>} $str {<% if\1 then %>} str
I was trying to say: only match if \1 does not contain a single
quote. If that worked, I could then say: OR match \1 if there is a
single quote.
I’m not sure whether my syntax is wrong or Tcl’s regsub doesn’t
support positive and negative lookbehind.
Hi,
You don't need any abstruse regexp features to do this. Your <cif>
has two modes, one without any internal quotes, and one with a single
pair of single quotes that may contain a > mark. Each of these can be
expressed by a simple regular expression, so just use two regsubs
instead of one.
You will want to use bracket notation, e.g. replacing {.*?} (any
string of characters) with {[^'>]*} (any string of characters with no
a quote or > marks.) Note that if you explicitly forbid > marks in
the interior of your tag, you don't need the question mark anymore.
--S
u snpd 2 mch
Youlaterusedtoommuchshorthand
IOW post unintelligible
Prof. Craver's post was good English and made perfect sense.
Maybe your newsreader garbled it?
Here's part of his article with certain non-letters replaced:
" You will want to use bracket notation, e.g. replacing ... (any
" string of characters) with ... (any string of characters with no
" a quote or [greaterthan] marks.) Note that if you explicitly
" forbid [greaterthan] marks in the interior of your tag, you don't
" need the question mark anymore.
Ok, there was one superfluous "a" in it, but I guess that wasn't
what triggered your reply.
That would fail on: <if x = '4' and y = 2>
This may have fixed it:
regsub -all {<if([^'>]*)>} $str {<% if\1 then %>} str
regsub -all {<if(.*?'.*'.*)>} $str {<% if\1 then %>} str
I’m not sure whether this is correct for all variations, or whether
it’s the best way to write it, but so far it’s working.
You have no idea how helpful this is to me. Thank you once again.
That's sort of what I was thinking, although I would try:
<if([^'>]*'[^']*'[^'>]*)>
You have to be careful when using .*, because it might accidentally
run off and match all the text from the middle of one <if> to the
middle of another one. You need to be able to prove that this can't
happen. With this expression, we know that the first and third
wildcards can't run past the end of the <if> tag, because they can't
eat a '>' mark. We also know the second wildcard can't run past the
end of the <if> tag, because it can only read the interior of a
single-quoted expression.
This is assuming, of course, that the programmer only ever has 0 or 2
single-quote marks inside an <if> tag, and any extraneous '>' symbols
only appear in a single-quoted region.
--S
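That assumption is easy to check interactively, in the spirit of Donal's "evil input" advice. A minimal sketch running S's one-quoted-pair pattern over a few made-up tags:

```tcl
# S's pattern: an <if> tag whose interior holds exactly one quoted region
set onePair {<if([^'>]*'[^']*'[^'>]*)>}

set tags {
    "<if x = '4' and y = 2>"
    "<if x = 'a>b'>"
    "<if a=5>"
    "<if a='1' and b='2'>"
}
foreach tag $tags {
    if {[regexp $onePair $tag -> cond]} {
        puts "match:    $tag  -> condition {$cond}"
    } else {
        puts "no match: $tag"
    }
}
```

The first two tags match. The tag with no quotes and the one with two quoted regions do not, which is why the plain [^'>]* regsub is still needed for the former, and the latter needs a repeated group.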
Thank you.
The problem was a combination of lack of sleep and being a Tcl newbie.
e.g. <if x=1 and y='a' and z='er' and i='3<' or j='d' and k=4 >
I was thinking: match the "if" then anything up to the first quote,
then greedy match to the last quote then match the final >.
Is it easier to just read each character one by one in a loop and
process each character sequentially?
This won't work: the greedy match could run past the end of this tag
into the next if statement that contains a quote. You are working
quite hard to reach the limits of the RE language :)
Recall that an RE is a finite automaton. This means that it is a
network of "states" connected by arrows, each labelled with a
character. It reads a character from the input, decides which new
state to go to, and forgets everything it has seen so far. It then
proceeds with the next character. This means that an RE cannot parse
anything where it needs to remember more than its current state. For
example, it is not possible to count matching parentheses in a string
like {(-)([])}}: for this one needs to remember which parentheses have
been opened so far.
Now, in your case, it is still possible to use an RE; the automaton
must work like this: read <if, then anything up to the next ' or >,
then a quoted part (a ', anything which is not a ', and a closing '),
then maybe again anything which is neither ' nor >. This
quoted/non-quoted pair can repeat (0 or more times). Finally, we have
a >.
(chris) 67 % set s "<if x=1 and y='a' and z='er' and i='3>' or j='d' and k=4 >"
<if x=1 and y='a' and z='er' and i='3>' or j='d' and k=4 >
(chris) 68 % regexp -inline {<if([^'>]+)('[^']*'[^'>]*)*>} $s
{<if x=1 and y='a' and z='er' and i='3>' or j='d' and k=4 >} { x=1 and y=} {'d' and k=4 }
(chris) 69 % regexp -inline {<if(([^'>]+)('[^']*'[^'>]*)*)>} $s
{<if x=1 and y='a' and z='er' and i='3>' or j='d' and k=4 >} { x=1 and y='a' and z='er' and i='3>' or j='d' and k=4 } { x=1 and y=} {'d' and k=4 }
(chris) 70 %
If you now ask for backslash-escaped quotes, like 'a\'bcd', it will
become increasingly difficult, up to the point where it is easier to
split the string at ' and count angle brackets by hand, or to write a
real parser using some parser generator.
Christian
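For comparison, here is a minimal sketch of the hand-written, character-by-character approach asked about above, implementing the "ignore > inside single quotes" rule directly. tagEnd and convertIfs are hypothetical names; the sketch assumes every tag is written as "<if " with a space after if, and it does not handle backslash-escaped quotes:

```tcl
# Hypothetical helper: index of the > that closes the tag starting at
# $start, ignoring any > inside single quotes; -1 if the tag never closes
proc tagEnd {text start} {
    set inQuote 0
    set n [string length $text]
    for {set i $start} {$i < $n} {incr i} {
        set c [string index $text $i]
        if {$c eq "'"} {
            # A doubled '' (PL/SQL's escaped quote) just toggles twice
            set inQuote [expr {!$inQuote}]
        } elseif {$c eq ">" && !$inQuote} {
            return $i
        }
    }
    return -1
}

# Hypothetical converter: rewrite every <if ...> as <% if ... then %>
proc convertIfs {text} {
    set out ""
    set pos 0
    # "<if " (with the space) does not match <iframe or </if>
    while {[set open [string first "<if " $text $pos]] >= 0} {
        set close [tagEnd $text $open]
        if {$close < 0} break
        # Copy the text before the tag unchanged
        append out [string range $text $pos [expr {$open - 1}]]
        # The condition is everything between "<if" and the closing >
        set cond [string range $text [expr {$open + 3}] [expr {$close - 1}]]
        append out "<% if[string trimright $cond] then %>"
        set pos [expr {$close + 1}]
    }
    # Whatever is left (including an unclosed tag) is copied as-is
    append out [string range $text $pos end]
    return $out
}

puts [convertIfs "<if x=1 and i='3>' or k=4 >ten</if>"]
```

This trades the compactness of a single regsub for code that states the quoting rule explicitly; supporting backslash escapes would mean skipping the character after a backslash inside tagEnd.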
Damn me for an amateur: there's actually a very simple way to do the
whole thing in one command.
First, here's the expression:
{<if((?:[^'>]*'[^']*')*[^'>]*)>}
If I puts [regsub -all $expression [read stdin] {<% if\1 then %>}]...
bash-3.2$ cat test.txt
<if a=5>
<if a='5'>
<if a='5' && 'b>5' && c='foo' && d='bar'>
bash-3.2$ cat test.txt | ./regexp.tcl
<% if a=5 then %>
<% if a='5' then %>
<% if a='5' && 'b>5' && c='foo' && d='bar' then %>
The trick is this: (expression)* will match 0 or more occurrences of
expression, so we take the expression for matching blah='yak', and
wrap it up in a NON-CAPTURING set of parentheses. You will remember
that ordinary parentheses will cause the regsub command to capture
whatever is inside them, for use later. If you just want to use
parentheses as grouping symbols without the capturing effect, you use
"(?:" instead of "(".
So our overall expression is {<if( inner_expression )>}, and our inner
expression is
(?: quoty_expression )*[^'>]*
And the quoty expression is
[^'>]*'[^']*'
This will match 0 or more instances of blah='yak', followed by some
extra stuff (meaning it will also match <if> tags with no quotes.)
--S
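Putting the pieces of the thread together, here is a minimal sketch of a complete transformContents that handles ##, #expr#, and quote-aware <if>/<else>/</if> tags, using the <if> spelling proposed above. The \s added right after <if is an assumption, not part of S's expression; it keeps tags such as <iframe ...> from being rewritten:

```tcl
proc transformContents {str} {
    # 1. #expr# -> <%= expr %>. An empty tag (from ##) becomes "<%=  %>",
    #    which is mapped back to a literal # straight away.
    regsub -all {#(.*?)#} $str {<%= \1 %>} str
    set str [string map [list "<%=  %>" "#"] $str]

    # 2. <if cond> -> <% if cond then %>. The (?:...)* group skips over any
    #    number of single-quoted regions, which may contain > marks.
    #    The \s (an added assumption) keeps <iframe ...> from matching.
    regsub -all {<if(\s(?:[^'>]*'[^']*')*[^'>]*)>} $str {<% if\1 then %>} str

    # 3. The fixed tags are plain literal replacements.
    return [string map {<else> "<% else %>" </if> "<% end if; %>"} $str]
}

set input [join {
    "Serial ##<br>"
    "Value: #ct#<br>"
    "<if ct = 10 and s = 'a>b'>"
    "ten"
    "<else>"
    "not ten"
    "</if>"
} \n]
puts [transformContents $input]
```

Running the ## mapping immediately after the #expr# pass mirrors Mr. Craver's shorter script, so #one##two# still becomes two expression tags, while a ## standing alone becomes a literal #.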
You have a valid point though: regular expressions are insanely
impenetrable.
If you consider just how many data processing tasks can be solved by a
couple of regular expressions, you have to wonder why they don't comprise
half the source code on the planet. But then if you try to read or
write nontrivial regular expressions, it becomes obvious why they are
used so sparingly. Programming with regular expressions is about as
much fun as programming a Turing machine.
That being said, Tcl in particular has a fantastic and friendly
interface for regular expressions. Other languages will let you use
regular expressions for matching/substituting an input line. Tcl lets
you capture data and return lists of matches all in one command. Just
consider the following code:
foreach a [regexp -inline -all {\s+|\S+} [read stdin]] {
    # ... blah blah ...
}
This separates standard input into alternating space and non-space
components. I call this a "power split;" it's like [split], except it
records all of the space content so you can put the string back
together exactly the way it was.
This, for example, scrambles each word in a file while leaving the
spaces the same:
# Comparator that returns -1 or 1 at random, so lsort shuffles (crudely)
proc random args { expr {int(rand()*2)*2-1} }
proc scramble word {
    join [lsort -command random [split $word ""]] ""
}
set out {}
foreach a [regexp -inline -all {\s+|\S+} [read stdin]] {
    if {![string is space $a]} { set a [scramble $a] }
    lappend out $a
}
puts -nonewline [join $out ""]
Show me any other language where this is so concise.
--S
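The claim that the "power split" keeps every character is easy to check: joining the pieces with an empty separator should give back the input exactly. A minimal sketch with a made-up sample string:

```tcl
set text "  two  spaces\tand a tab\nnew line  "

# Alternating runs of whitespace and non-whitespace; with | the RE prefers
# the longest match, so each run is taken whole
set pieces [regexp -inline -all {\s+|\S+} $text]

puts [llength $pieces]                   ;# 15
puts [expr {[join $pieces ""] eq $text}] ;# 1 (the round trip is exact)
```

Plain [split $text] would instead break at every single whitespace character and drop the separators, so the original spacing could not be rebuilt.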
This is great. It even accepts: <if z = 'xxxx''yyyy'>
I'll use this for all boolean logic blocks.
(?: )* makes sense. It creates the group without capture.