Assistance with Stripping Value from String

cmatt...@gmail.com

Mar 11, 2013, 11:16:58 PM

All,

I have been racking my brain and searching for techniques for this one, but either it's so easy I can't see it or it's too complex for my skill set.

Basically I am reading a configuration file and searching for a specific term to get that specific line into a variable for me to work with.

All is good in that part.
It's once I have this string that I cannot think of a way to parse out the one piece of information I ultimately need.

MyString = <PARAM username="joes...@us.mycompany.com"></PARAM>

All I need is the email address inside the quotes.
The length of the line may vary, and the line may contain extra whitespace due to formatting issues, so I cannot count over from a fixed position without risking undesirable data.

I imagine I might need to do something with a regular expression but I have no experience in creating the search pattern.

Thanks for any assistance,
Clay
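A minimal VBScript sketch of the regular-expression route, assuming the attribute is always spelled `username` and is the first attribute after `<PARAM` (both are assumptions; the helper name GetUsername is hypothetical). The `\s+` and `\s*` pieces absorb the extra whitespace described above, and the capture group `([^"]*)` takes everything between the quotes, so the length of the line does not matter.

```vbscript
Option Explicit

' Hypothetical helper: returns the quoted value of username="..." on a line,
' or "" when the line does not contain it.
Function GetUsername(line)
    Dim re, matches
    Set re = New RegExp
    ' <PARAM, one or more whitespace chars, username, optional spaces,
    ' "=", optional spaces, then the quoted value captured by ([^"]*).
    ' Inside a VBScript string literal, "" stands for one quote character.
    re.Pattern = "<PARAM\s+username\s*=\s*""([^""]*)"""
    re.IgnoreCase = True
    Set matches = re.Execute(line)
    If matches.Count > 0 Then
        GetUsername = matches(0).SubMatches(0)   ' text inside the quotes
    Else
        GetUsername = ""
    End If
End Function

WScript.Echo GetUsername("<PARAM   username = ""joes...@us.mycompany.com""></PARAM>")
```

Because the pattern names the attribute explicitly, a line where "username" appears only inside some other attribute's value does not match.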

Todd Vargo

Mar 12, 2013, 12:43:23 AM

Assuming that you can rely on the "PARAM username=" part being
consistent, you can use InStr to locate it. Of course you first need to
correct the syntax, at least for testing it anyway.

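MyString = "<PARAM username=" & Chr(34) & _
    "joes...@us.mycompany.com" & Chr(34) & "></PARAM>"

' "<PARAM username=" plus the opening quote is 17 characters, so adding
' 17 lands on the first character of the address. InStr returns 0 when
' the text is not found, which leaves EmailStart at exactly 17.
EmailStart = InStr(MyString, "<PARAM username=" & Chr(34)) + 17

email = ""
If EmailStart > 17 Then
    ' Look for the closing quote, starting at the address itself
    EmailEnd = InStr(EmailStart, MyString, Chr(34))
    If EmailEnd > 0 Then email = Mid(MyString, EmailStart, EmailEnd - EmailStart)
End If

MsgBox email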

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

Thomas Langer

Mar 12, 2013, 12:47:32 AM

You can use the Split function twice: the first time with = as the
delimiter, the second time with " as the delimiter.

For usage of the Split function,
see http://msdn.microsoft.com/en-us/library/0764e5w5.aspx
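A sketch of the two-pass Split approach in VBScript. GetQuotedAfterEquals is a hypothetical name; the sketch assumes the line has no "=" before the one that introduces the value and no "=" inside the address itself (either would shift what ends up in the second array element).

```vbscript
Option Explicit

' Hypothetical helper: pass 1 splits on "=", pass 2 splits the
' remainder on the double quote and keeps the first quoted piece.
Function GetQuotedAfterEquals(s)
    Dim parts, quoted
    GetQuotedAfterEquals = ""
    parts = Split(s, "=")
    If UBound(parts) < 1 Then Exit Function     ' no "=" on the line
    ' parts(1) now looks like:  "joes...@us.mycompany.com"></PARAM>
    quoted = Split(parts(1), Chr(34))
    If UBound(quoted) < 1 Then Exit Function    ' no quote after the "="
    GetQuotedAfterEquals = quoted(1)             ' text between the first pair of quotes
End Function

Dim MyString
MyString = "<PARAM username=" & Chr(34) & _
    "joes...@us.mycompany.com" & Chr(34) & "></PARAM>"
WScript.Echo GetQuotedAfterEquals(MyString)
```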

--
Thomas Langer

Please reply to the newsgroup only.
For direct mail, remove "spam" from the address.

R.Wieser

Mar 12, 2013, 7:33:53 AM

Thomas,

> You can use split-function twice,

Once will be enough. The string(s) will then be in the array at the
odd-numbered indices; in the example it's a single string at index 1.

Of course, if the property name itself also needs to be extracted, then all
bets are off. :-)
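To illustrate the odd-index rule, a hypothetical QuotedValues helper that collects every quoted string on a line after a single Split on the quote character. It assumes values never contain an embedded (escaped) quote; an unclosed trailing quote is simply ignored.

```vbscript
Option Explicit

' Hypothetical helper: returns an array of every double-quoted value.
' After one Split on the quote, the quoted values sit at indices
' 1, 3, 5, ...; the even indices hold the text around them.
Function QuotedValues(s)
    Dim parts, result, i, n
    parts = Split(s, Chr(34))
    n = UBound(parts) \ 2              ' number of complete quoted values
    If n = 0 Then
        QuotedValues = Array()         ' empty array: nothing was quoted
        Exit Function
    End If
    ReDim result(n - 1)
    For i = 1 To 2 * n - 1 Step 2
        result((i - 1) \ 2) = parts(i)
    Next
    QuotedValues = result
End Function

Dim v
For Each v In QuotedValues("<PARAM username=" & Chr(34) & "joes...@us.mycompany.com" & Chr(34) & "></PARAM>")
    WScript.Echo v
Next
```

For the sample line the result holds one element, the address; for a line such as `<PARAM name="a" value="b">` it holds "a" and "b".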

Regards,
Rudy Wieser

-- Original message:
Thomas Langer <Spa...@langer-online.net> wrote in message
khmc14$cv$1...@news.albasani.net...

cmatt...@gmail.com

Mar 12, 2013, 8:15:10 AM

Thank you all for your input.

I may end up using the Split function on the quotes, since I know those will always be at the beginning and end of the value I want to extract.

I am also playing with this regular expression pattern someone wrote to extract an email address from a body of text.

Reg.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

I am not 100% familiar with patterns so I want to see why and how it works before including it in my production code.
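A sketch of how that pattern can be used with VBScript's built-in RegExp object (FirstEmail is a hypothetical name). Read left to right, the pattern is: `\b` a word boundary; `[A-Z0-9._%+-]+` one or more local-part characters; `@`; `[A-Z0-9.-]+` the domain labels; `\.` a literal dot; `[A-Z]{2,4}` a two- to four-letter top-level domain; and a closing `\b`. Because the classes list only upper-case letters, IgnoreCase has to be True. The pattern is an approximation: it rejects top-level domains longer than four letters and accepts some strings (for example consecutive dots) that are not valid addresses.

```vbscript
Option Explicit

' Hypothetical helper: returns the first substring matching the posted
' email pattern, or "" if there is none.
Function FirstEmail(text)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
    re.IgnoreCase = True    ' classes list only A-Z, so lower case needs this
    re.Global = False       ' only the first match is needed
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        FirstEmail = matches(0).Value
    Else
        FirstEmail = ""
    End If
End Function

WScript.Echo FirstEmail("<PARAM username=" & Chr(34) & "joes...@us.mycompany.com" & Chr(34) & "></PARAM>")
```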

Thanks again.

Dave "Crash" Dummy

Mar 12, 2013, 9:27:36 AM

email=split(MyString,chr(34))(1)
--
Crash

Life is short. Eat dessert first.

Mayayana

Mar 12, 2013, 10:10:13 AM

I use a tokenizing routine for HTML syntax highlight
coloring. In order to work 100% it has to deal with possible
errors like skipping the end bracket (in which case one
runs into < before the tag ends) but aside from that it's
pretty basic: Walk the string from < to >. A space marks
the start of an attribute and/or the end of an attribute
value. > also marks the end of an attribute value, but must
be checked for XML or "XHTML" tag ending.
= marks the start of the attribute value. While walking
the string a boolean value of InQuote is also needed.
You don't want to count = or " " within quotes as
part of the tag format.

A rough idea:

(This is "air code". The original is part of a routine that builds
a RichText string from plain text, so I can't paste it as-is. Note
that you'll want to use Trim on values, too, because using
extra spaces is not an error in HTML but is a common error
in typing.)

' s holds the text being parsed
InQuote = False  '-- inside quotes
GotSpace = False '-- current point is after " " and before =
Q2 = Chr(34)
EqualPt = 0      '-- offset of last =
SpacePt = 0      '-- offset of last " "

For i = 1 To Len(s)
    s1 = Mid(s, i, 1)
    Select Case s1
        Case Q2
            InQuote = Not InQuote
        Case " "
            If InQuote = False Then
                '-- retrieve attribute value from EqualPt to here
                GotSpace = True
                SpacePt = i
            End If
        Case "<"
            '-- start parsing tag here
        Case ">"
            '-- end parsing tag here if InQuote = False
        Case "="
            If InQuote = False Then
                EqualPt = i
                If GotSpace = True Then
                    '-- retrieve attribute name from SpacePt up to here
                    GotSpace = False
                End If
            End If
        Case Else
            '-- ordinary character: nothing to do
    End Select
Next

If you need to identify tags, that's easily done by using
a boolean like AfterGT, which becomes true after < and
false after " ". With an added system of arrays or even
a class it becomes fairly simple to organize the tags and
their values. If you only need to find "<PARAM" and then
retrieve the value that won't be necessary.
--
--
<cmatt...@gmail.com> wrote in message
news:96dd07bb-d94c-4f2c...@googlegroups.com...

GS

Mar 12, 2013, 9:53:54 PM

Thomas Langer wrote :

> You can use split-function twice, first time with = as delimiter, second time
> with " as delimiter

Why twice? Why not use " as the delimiter...

vArr = Split(MyString, Chr(34))

sMailTo = vArr(1) '//Split always returns a zero-based array

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion

Mayayana

Mar 13, 2013, 8:53:56 AM

| Why twice? Why not use " as the delimiter...
|
| vArr = Split(MyString, Chr(34))
|

That works fine on the specific example, but
I wondered whether the text can be depended
upon to be exactly as posted. I also wondered
whether the OP had thought of that. PARAM can
have multiple attributes in HTML. This is not valid
HTML, so I'm guessing it's custom (and strangely
designed) XML, in which case there's no way to
guess the possible variations in the string. XML
could also have multiple attributes. If his specific
search term is "username", how does he know for
sure that "username" is not present in an attribute
value? For instance:

<PARAM registereduser="username">

If he finds that line first he won't get an email
address with the method of just splitting on quotes.

For that matter, if it's not being parsed as XML then
why format it as XML? It would be much easier and
more efficient to format it INI-style:

[Settings]
username=joes...@us.mycompany.com
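If the file were laid out INI-style as suggested, each line could be handled with a single Split. A minimal sketch (IniValue is a hypothetical name); passing 2 as Split's count argument stops after the first "=", so any later "=" characters stay inside the value.

```vbscript
Option Explicit

' Hypothetical helper: returns the value of key from a "key=value" line,
' or "" if the line holds a different key (or is a [Section] header).
Function IniValue(line, key)
    Dim parts
    IniValue = ""
    parts = Split(line, "=", 2)        ' at most two pieces: name and value
    If UBound(parts) = 1 Then
        If StrComp(Trim(parts(0)), key, vbTextCompare) = 0 Then
            IniValue = Trim(parts(1))
        End If
    End If
End Function

WScript.Echo IniValue("username=joes...@us.mycompany.com", "username")
```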

I don't think there's any way to know the best
parsing approach without knowing what the OP
is doing and why his XML seems to be faulty. It's
valid to put the value as an attribute, but it defeats
the purpose of XML. It looks to me like someone
mixed up HTML with XML and needs to redesign the
config. file.

Dr J R Stockton

Mar 13, 2013, 4:57:16 PM

In microsoft.public.scripting.vbscript message <388882d8-cee6-4c03-b980-
dc7aef...@googlegroups.com>, Tue, 12 Mar 2013 05:15:10,
cmatt...@gmail.com posted:

>
>I also am playing with this Regular Expression pattern someone made to extract an email address from a body of text.
>
>Reg.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
>

Before trying to find an e-mail address by pattern, you should first
find out what patterns and what characters are allowed for E-mail
addresses in your region of interest. Attempting to make deductions
from a few samples is unreliable.

--
(c) John Stockton, nr London, UK. E-mail, see Home Page. Turnpike v6.05.
Website <http://www.merlyn.demon.co.uk/> - w. FAQish topics, links, acronyms
PAS EXE etc. : <http://www.merlyn.demon.co.uk/programs/> - see in 00index.htm
Dates - miscdate.htm estrdate.htm js-dates.htm pas-time.htm critdate.htm etc.

GS

Mar 13, 2013, 7:57:05 PM

Mayayana pretended :

All good points, Joe! I agree...

cmatt...@gmail.com

Mar 18, 2013, 4:48:03 PM

Thank you all again for all your input and ideas.

The data I am working with is not XML, HTML, or INI, and in this instance I don't care.
It is a configuration file for a VPN client with only about 300 lines, so I read it as plain text, line by line, locate the PARAM username line, and rip out the username within the quotes.

It just so happens that all the usernames are formatted like email addresses:
j...@countrycode.companyname.com
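A minimal sketch of that line-by-line approach using the FileSystemObject (ReadUsername is a hypothetical name, and the path in the usage line is a placeholder, not the VPN client's real location). It stops at the first line containing both "<PARAM" and "username" and returns the text between the first pair of quotes, so it assumes the username is the first quoted value on that line.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: returns the username from the config file at path,
' or "" if the file or the PARAM username line is missing.
Function ReadUsername(path)
    Dim fso, ts, line, parts
    ReadUsername = ""
    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FileExists(path) Then Exit Function
    Set ts = fso.OpenTextFile(path, ForReading)
    Do Until ts.AtEndOfStream
        line = ts.ReadLine
        ' Case-insensitive searches, so spacing and case variations still match
        If InStr(1, line, "<PARAM", vbTextCompare) > 0 And _
           InStr(1, line, "username", vbTextCompare) > 0 Then
            parts = Split(line, Chr(34))
            If UBound(parts) >= 1 Then ReadUsername = parts(1)
            Exit Do                         ' first matching line wins
        End If
    Loop
    ts.Close
End Function

WScript.Echo ReadUsername("C:\path\to\client.cfg")
```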

Thanks again,

Clay

Todd Vargo

Mar 19, 2013, 5:50:53 PM

So did that REG pattern that you posted do what you needed?