I would like to match opening and closing XML tags in a very simple
manner with regexp.
Given the following string:
% set text {<anc>s</anc><stw1><anc>M</anc></stw1>}
% regexp "<(anc)>(.*?)</anc>" $text dummy tag body
% set body
s
Fine.
Now I'm trying something similar to handle the attributes of the tags.
With the following string
% set text2 {<anc h="m">s</anc><stw1><anc n="1">M</anc></stw1>}
and
% regexp "<(anc)\\s+(.+?)>(.*?)</anc>" $text2 dummy tag attributes body
I get
% set attributes
h="m"
% set body
s</anc><stw1><anc n="1">M
Why doesn't the non-greedy operator work in this case?
What do I have to do to match the attributes and(!) the nearest closing
tag?
Thanks
Stefan
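
A likely explanation for the behaviour above, offered as a sketch rather than a definitive answer: in Tcl's advanced regular expressions a branch takes its overall preference (longest or shortest match) from its first quantified atom. In the second pattern that atom is the greedy \s+, so the match as a whole runs to the last </anc>, even though the later .+? and .*? are written as non-greedy. Writing the first quantifier as \s+? should make the whole pattern prefer the shortest match. The variable names attributes2 and body2 are only illustrative.

```tcl
set text2 {<anc h="m">s</anc><stw1><anc n="1">M</anc></stw1>}

# Original pattern: the first quantifier (\s+) is greedy, so the match
# as a whole prefers the longest candidate.
regexp {<(anc)\s+(.+?)>(.*?)</anc>} $text2 -> tag attributes body
puts $body                       ;# s</anc><stw1><anc n="1">M

# Same pattern with a non-greedy first quantifier (\s+?): the match as
# a whole now prefers the shortest candidate.
regexp {<(anc)\s+?(.+?)>(.*?)</anc>} $text2 -> tag attributes2 body2
puts "$attributes2 | $body2"     ;# h="m" | s
```

This only changes which closing tag is chosen. It does not make the pattern cope with nested anc elements.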
Stefan,
When you get this worked out to your satisfaction, could you add it to
the Regular Expression Examples page on the wiki?
Thanks,
Bob
--
Bob Techentin techenti...@mayo.edu
Mayo Foundation (507) 284-2702
Rochester MN, 55905 USA http://www.mayo.edu/sppdg/sppdg_home_page.html
Some words of wisdom from Steve Ball:
"My characterisation of using REs for parsing XML is that it
is like performing brain surgery with a chainsaw: you get the
job done, but you have to scrape lots of important bits off
the wall and put them back in where they belong."
If you *still* want to do things this way, take
a look at the grammar productions in the XML specification:
<URL: http://www.w3.org/TR/REC-xml >
The relevant part of the grammar forms a regular language;
translating it into a regexp should be straightforward.
--Joe English
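
As a rough illustration of that translation, the start-tag productions (STag, Attribute, AttValue, Name, Eq) can be written as small regexp fragments and glued together. This is a simplified sketch, not the full grammar: it assumes ASCII-only names, does not validate entity references inside attribute values, and ignores empty-element tags. The Tcl variable names simply mirror the production names.

```tcl
# Simplified fragments, each named after an XML grammar production.
# Assumptions: ASCII-only names; "&" references are not validated.
set Name      {[A-Za-z_:][-A-Za-z0-9_:.]*}
set Eq        {\s*=\s*}
set AttValue  {(?:"[^<"]*"|'[^<']*')}
set Attribute "(?:$Name$Eq$AttValue)"

# STag ::= '<' Name (S Attribute)* S? '>'
set STag      "<($Name)((?:\\s+$Attribute)*)\\s*>"

set tagText {<anc h="m" n='1'>}
regexp $STag $tagText -> name attrs
puts $name     ;# anc
puts $attrs    ;# (leading space) h="m" n='1'
```

Keeping each production in its own variable makes it easier to compare the fragments against the specification one by one.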
Ah, indeed!
> If you *still* want to do things this way, take
> a look at the grammar productions in the XML specification:
>
> <URL: http://www.w3.org/TR/REC-xml >
>
> The relevant part of the grammar forms a regular language;
> translating it into a regexp should be straightforward.
Yep - that's more or less what I've done. It's not always
straightforward, though.
The Tcl-only parser in TclXML is just a big regexp engine.
Stefan, are you *really* sure you want to reinvent the wheel?
> What have I got to do to match the attributes and(!) the nearest
> closing tag?
Big gotcha here - '>' is permitted in an attribute value.
So trying to match all text between the element type name and
the '>' character won't work in general.
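
A small demonstration of the gotcha, as a sketch only. The sample string text3 and the variable names are made up for illustration. The first pattern stops the attribute list at the first '>', which here sits inside a quoted value. The second treats a quoted string as a single unit, so a '>' inside quotes no longer ends the tag.

```tcl
set text3 {<anc h="a>b">s</anc>}

# Naive: the attribute text is "anything up to the first >".
regexp {<(anc)\s+?(.+?)>(.*?)</anc>} $text3 -> tag attrs body
puts "$attrs | $body"      ;# h="a | b">s

# Quote-aware: each step consumes either one unquoted character that
# is not > or a quote, or one complete quoted string.
set quoted {<(anc)\s+?((?:[^>"']|"[^"]*"|'[^']*')+?)>(.*?)</anc>}
regexp $quoted $text3 -> tag attrs2 body2
puts "$attrs2 | $body2"    ;# h="a>b" | s
```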
Cheers,
Steve
--
Steve Ball | waX Me Lyrical XML Editor | Training & Seminars
Zveno Pty Ltd | Web Tcl Complete | XML XSL
http://www.zveno.com/ | TclXML TclDOM | Tcl, Web Development
Steve...@zveno.com +---------------------------+---------------------
Ph. +61 2 6242 4099 | Mobile (0413) 594 462 | Fax +61 2 6242 4099