set string "<%@ Page Language="vb" AutoEventWireup="false"
CodeBehind="AddContact.apsx.vb"
Inherits="MCSDWebAppxVB.AddContact %>"
What I'm currently using is a regexp with subexpressions to find this,
and I want to catch the beginning of the token and the end of the
token, as everything in between can be variable:
regexp -nocase {(\<\%\@ page) (\%\>)} $string result
-> 0
I know this is a primitive question, but I can't seem to get the second
sub-expression to follow the first subexpression at a variable distance.
Again, my regexp would be correct if it produced result =
"<%@ Page %>"
Thank you in advance for your help. I've looked through the literature
and groups on this site to no avail.
I hope the above is not your actual test code as it is invalid Tcl.
Since your string contains literal double quote chars you should quote
it with curly braces:
set string {<%@ Page Language="vb" AutoEventWireup="false"
CodeBehind="AddContact.apsx.vb"
Inherits="MCSDWebAppxVB.AddContact %>}
>
> What I'm currently using is a regexp with subexpressions to find this,
> and I want to catch the beginning of the token and the end of the
> token, as everything in between can be variable:
>
> regexp -nocase {(\<\%\@ page) (\%\>)} $string result
> -> 0
What this pattern lacks is any RE spec to match the
"everything in between". Also, <, % and @ don't need to be
escaped. Finally, it doesn't make much sense to capture the
value of the fixed beginning and ending strings, so ...
% regexp -nocase {<%@ page(.*)%>} $string => middle
1
% puts $middle
Language="vb" AutoEventWireup="false"
CodeBehind="AddContact.apsx.vb"
Inherits="MCSDWebAppxVB.AddContact
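To see the two patterns side by side, here is a small sketch. The variable names original and revised are mine, not from the post above, and the sample is the brace-quoted string shown earlier. It only illustrates why the first pattern returns 0: it demands "page", a single space, and then "%>" straight away.

```tcl
# Brace-quote the sample so the embedded double quotes stay literal.
set string {<%@ Page Language="vb" AutoEventWireup="false"
CodeBehind="AddContact.apsx.vb"
Inherits="MCSDWebAppxVB.AddContact %>}

# Original pattern: nothing between the two groups except one space,
# so it cannot span the attributes in the middle.
set original [regexp -nocase {(\<\%\@ page) (\%\>)} $string result]

# Revised pattern: (.*) absorbs everything between the fixed ends.
# In Tcl, "." also matches newlines unless -line or -linestop is given.
set revised [regexp -nocase {<%@ page(.*)%>} $string -> middle]

puts "original matched: $original, revised matched: $revised"
puts [string trim $middle]
```

The string trim call is only there because the capture keeps the whitespace on either side of the attributes.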
Does this help?
Roy
You're probably after this:
regexp -nocase {<%@\s*page\s*((?:(?!%>).)*)%>} $string \
wholeMatch interestingContent
How does it work? Simple. Firstly, none of <%@> are special chars to RE
by themselves. Secondly, the first part (up to the second \s*) matches
the start of the interesting chunk from your string, and there's a
literal %> at the end. Thirdly, there's a bunch of real RE magic in the
middle:
((?:(?!%>).)*)
Looking at that in detail, we see this in expanded form
( # Stash this sub-RE in a variable
(?: # Pure grouping of zero-width assertion and matcher
(?! %> ) . # any character, as long as it isn't the start of %>
) * # and as many of them as possible
) # That's All Folks!
The key to doing sane parsing is often the negative lookahead assertion.
It makes life *so* much easier!
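A rough sketch of where the lookahead earns its keep, assuming an input with more than one directive (the sample text and the variable names greedy, bounded and all are made up for illustration):

```tcl
# Hypothetical input: two directives on one line (not from the original post).
set text {<%@ Page A="1" %> some text <%@ Page B="2" %>}

# A plain greedy (.*) runs on to the LAST %> in the string.
regexp -nocase {<%@ page(.*)%>} $text -> greedy

# (?!%>). refuses to step onto a %>, so the match ends at the FIRST %>.
regexp -nocase {<%@\s*page\s*((?:(?!%>).)*)%>} $text -> bounded

puts "greedy:  [string trim $greedy]"
puts "bounded: [string trim $bounded]"

# -all -inline returns every match as a flat list: whole1 sub1 whole2 sub2 ...
set all [regexp -all -inline -nocase {<%@\s*page\s*((?:(?!%>).)*)%>} $text]
puts "directives found: [expr {[llength $all] / 2}]"
```

With a single directive in the string both patterns capture the same text; the difference only shows up once a second %> appears later on.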
Donal.
I have two cases I'm still trying to work with that are causing me some
trouble. Let me see if I can map it out for you; note that this is
written symbolically:
<[1]:[2] [ ]>
The <> at the beginning and the end are expected and searched for; the
expression is contiguous characters.
[1] is a contiguous expression of a plurality of characters allowing
alpha and numeric only.
[2] is a contiguous expression of a plurality of characters allowing
alpha and numeric only, followed by a required space, as indicated
above
[ ] is the same as the RE in your reply; the colon is also required.
I'm using regexp again, since I've been so successful with what you
showed me. Here's what doesn't seem to work:
regexp {<*[a-zA-Z0-9]:*[a-zA-Z0-9] ((?:(?!>).)*)>} $string output
It handles the variability of length in the contiguous first two parts
fine, but allows non-alphanumeric characters through.
Your help is as always greatly appreciated!
You've put the *s in the wrong places, and in any case, if you're
parsing XHTML, you'll want this:
set RE {(?i)<(/?)(?:([-a-z0-9]+):)?([-a-z0-9]+)\s*((?:(?!/?>).)*)(/?)>}
regexp $RE $string -> closer NSpfx localName attributes empty
Variable $closer is non-empty if you've got a close-tag.
Variable $NSpfx has the namespace prefix, if present. The meaning of
this is assigned through inheritable xmlns:... attributes... :^/
Variable $localName has the local name of the element.
Variable $attributes has all the extracted attributes, ready for
parsing. If you're being strict, it's an error for this variable and
$closer to have non-empty content at the same time.
Variable $empty is non-empty if you've got an element without content
and where you should not be looking for a close tag. It is an error for
both this and $closer to have non-empty content at the same time.
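As a sketch of both points, assuming some made-up sample tags: the first part moves the quantifiers in the earlier pattern onto the character classes (using + rather than *, since at least one character is required), and the second part runs the RE above over an open tag, a close tag and an empty element. The names loose, strict, bad, looseHit and strictHit are illustrative only.

```tcl
# Part 1: "<*" means "zero or more <" and "[a-zA-Z0-9]" matches just ONE
# character, so the loose pattern can match somewhere inside a bad tag.
set loose  {<*[a-zA-Z0-9]:*[a-zA-Z0-9] ((?:(?!>).)*)>}
set strict {<[a-zA-Z0-9]+:[a-zA-Z0-9]+ ((?:(?!>).)*)>}
set bad {<a$p:Label x>}   ;# hypothetical input with a stray "$"
set looseHit  [regexp $loose  $bad]
set strictHit [regexp $strict $bad]
puts "loose matched: $looseHit, strict matched: $strictHit"

# Part 2: the RE from this post applied to three hypothetical tags.
set RE {(?i)<(/?)(?:([-a-z0-9]+):)?([-a-z0-9]+)\s*((?:(?!/?>).)*)(/?)>}
foreach tag {{<asp:Label runat="server">} {</asp:Label>} {<br/>}} {
    if {[regexp $RE $tag -> closer NSpfx localName attributes empty]} {
        puts [list $tag closer=$closer prefix=$NSpfx name=$localName empty=$empty]
    }
}
```

Note that the strict pattern in part 1 still requires the colon and the space, as described in the question; the RE in part 2 makes the prefix optional.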
If you're really serious about parsing HTML or XML, use a proper HTML or
XML parser. There are enough tricky bits and pieces (non-trivial
entities are the classic examples, but there are many many more) that it
is worth using someone else's parser here.
Donal.