I use a tokenizing routine for HTML syntax highlighting.
To work 100% it has to deal with possible errors, like a
missing end bracket (in which case one runs into < before
the tag ends), but aside from that it's pretty basic: Walk
the string from < to >. A space marks the start of an
attribute and/or the end of an attribute value. > also marks
the end of an attribute value, but the character before it
must be checked for an XML or "XHTML" tag ending (/>).
= marks the start of the attribute value. While walking
the string a boolean value, InQuote, is also needed.
You don't want to count = or " " within quotes as
part of the tag format.
A rough idea:
(This is "air code". The original is part of a routine that builds
a RichText string from plain text, so I can't paste it as-is. Note
that you'll want to use Trim on values, too, because using
extra spaces is not an error in HTML but is a common error
in typing.)
InQuote = False   '-- True while inside a quoted attribute value
GotSpace = False  '-- True after an unquoted " " and before the next =
Q2 = Chr(34)
EqualPt = 0       '-- offset of the last unquoted =
SpacePt = 0       '-- offset of the last unquoted " "
For i = 1 To Len(s)
  s1 = Mid(s, i, 1)
  Select Case s1
    Case Q2
      InQuote = Not InQuote
    Case " "
      If InQuote = False Then
        '-- if EqualPt > SpacePt, retrieve attribute value
        '-- from EqualPt + 1 to i - 1
        GotSpace = True
        SpacePt = i
      End If
    Case "<"
      '-- start parsing tag here. If a tag is still open,
      '-- its > was left out.
    Case ">"
      If InQuote = False Then
        '-- end parsing tag here
      End If
    Case "="
      If InQuote = False And GotSpace = True Then
        '-- retrieve attribute name from SpacePt + 1 to i - 1
        EqualPt = i
        GotSpace = False
      End If
    Case Else
      '-- ordinary character, nothing to do
  End Select
Next
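For anyone who wants something to run directly, here is a rough
Python sketch of the same walk. It is not the original routine. The
function names (find_tag_end, parse_tag) are made up for illustration,
and it assumes two things the outline above does not say: single
quotes are treated as quotes too, and attribute names are lower-cased.

```python
def find_tag_end(s, start):
    """Return (end, closed): index of the terminator and whether it was '>'."""
    quote = ""  # the open quote character, or "" when outside quotes
    for i in range(start + 1, len(s)):
        ch = s[i]
        if quote:
            if ch == quote:
                quote = ""
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return i, True
        elif ch == "<":  # ran into the next tag: the '>' was left out
            return i, False
    return len(s), False


def parse_tag(s, start=0):
    """Parse the tag beginning at s[start] == '<' into (name, attrs, end, closed)."""
    end, closed = find_tag_end(s, start)
    body = s[start + 1:end].strip()
    if body.endswith("/"):  # XML / XHTML style ending such as <br />
        body = body[:-1].rstrip()
    n = len(body)
    i = 0
    while i < n and not body[i].isspace():
        i += 1
    name = body[:i]
    attrs = {}
    while i < n:
        while i < n and body[i].isspace():
            i += 1
        j = i
        while j < n and not body[j].isspace() and body[j] != "=":
            j += 1
        attr = body[i:j]
        while j < n and body[j].isspace():  # extra spaces before '=' are allowed
            j += 1
        value = ""
        if j < n and body[j] == "=":
            j += 1
            while j < n and body[j].isspace():
                j += 1
            if j < n and body[j] in "\"'":
                k = body.find(body[j], j + 1)
                if k == -1:  # unterminated quote: take the rest
                    k = n
                value = body[j + 1:k]
                j = k + 1
            else:
                k = j
                while k < n and not body[k].isspace():
                    k += 1
                value = body[j:k]
                j = k
        if attr:
            attrs[attr.lower()] = value.strip()  # the Trim step
        i = j
    return name, attrs, end, closed


print(parse_tag('<param name="movie" value = "a b=c.swf" />'))
print(parse_tag('<b class=x<i>'))
```

The second call is the missing-bracket case: the walk stops at the
next < and reports closed as False, so a highlighter can flag the tag
instead of swallowing the rest of the page.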
If you need to identify tags, that's easily done by using
a boolean like AfterLT, which becomes true after < and
false after " ". With an added system of arrays or even
a class it becomes fairly simple to organize the tags and
their values. If you only need to find "<PARAM" and then
retrieve the value, that won't be necessary.
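The simple "<PARAM" case might look something like the Python sketch
below. The names attr_value and param_values are hypothetical, and it
makes a simplifying assumption: no > appears inside a quoted value, so
the end of each tag is found with a plain search.

```python
import re


def attr_value(tag, attr):
    """Return the value of attr inside one tag string, or None if absent."""
    quote = ""  # the open quote character, or "" when outside quotes
    n = len(tag)
    i = 0
    while i < n:
        ch = tag[i]
        if quote:
            if ch == quote:
                quote = ""
        elif ch in "\"'":
            quote = ch
        elif ch.isspace() and tag[i + 1:i + 1 + len(attr)].lower() == attr:
            j = i + 1 + len(attr)
            while j < n and tag[j].isspace():
                j += 1
            if j < n and tag[j] == "=":
                j += 1
                while j < n and tag[j].isspace():
                    j += 1
                if j < n and tag[j] in "\"'":
                    k = tag.find(tag[j], j + 1)
                    if k == -1:  # unterminated quote: take the rest
                        k = n
                    return tag[j + 1:k].strip()
                k = j
                while k < n and not tag[k].isspace():
                    k += 1
                return tag[j:k]
        i += 1
    return None


def param_values(html):
    """Return a (name, value) pair for every <param ...> tag in html."""
    pairs = []
    for m in re.finditer(r"<param\b", html, re.IGNORECASE):
        end = html.find(">", m.start())  # assumes no '>' inside quoted values
        if end == -1:
            end = len(html)
        tag = html[m.start():end]
        pairs.append((attr_value(tag, "name"), attr_value(tag, "value")))
    return pairs


html = ('<object><PARAM NAME="movie" VALUE = " clip.swf ">'
        '<param name=quality value=high></object>')
print(param_values(html))
```

No arrays or classes are kept here. Each tag is sliced out, searched
for the one attribute wanted, and then dropped.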
--
<cmatt...@gmail.com> wrote in message
news:96dd07bb-d94c-4f2c...@googlegroups.com...