
Regex Help


Ezra Zygmuntowicz

Aug 22, 2005, 5:25:43 PM
Hey Guys-
I have a regex problem that I am not sure how to tackle. I am
parsing some classified ads in order to format them for display
online. I have most of the parsing done, but I need help with the
final step. The file has one ad per line, and a line looks like this:

<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irrigation, horse barn. $122,000. 509-697-6519<endad>

Now, I have already parsed everything to get it to this state, but
what I need to do next is count 50 chars after the <begad:11559303>
tag and insert </ftditm> there.
The tricky part is that if the 50 chars end in the middle of a word,
the </ftditm> has to go after the rest of that word. So I need a way
to match at least 50 chars, plus the rest of the current word if the
50th char lands in the middle of a word.
So for this particular ad, the first 50 chars run to here:
<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irri #<= 50 chars ends here# gation, horse barn. $122,000.
509-697-6519<endad>
So it ends in the middle of the word "irrigation", and I need the
match to consume the whole word.
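The 50-character count can be checked directly in Ruby. This sketch assumes the sample ad is really one line and that the mail client wrapped it at the single space after "bath,"; the variable names are only for illustration.

```ruby
# The sample ad, rejoined onto one line (assumes the wrap after "bath,"
# was a single space).
ad = '<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, irrigation, horse barn. $122,000. 509-697-6519<endad>'

# Drop everything up to and including the <begad:...> tag, leaving the ad text.
body = ad.sub(/\A.*?<begad:[^>]+>/, '')

first50 = body[0, 50]   # the first 50 characters of the ad text
rest    = body[50..-1]  # everything after them

puts first50  # ends with "irri"
puts rest     # starts with "gation"
```

The split falls between "irri" and "gation", which is exactly the case the closing tag has to be pushed past.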

Any help is much appreciated-
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
ez...@yakima-herald.com

David A. Black

Aug 22, 2005, 5:34:42 PM
Hi --

Here's one idea:

str.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")
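One way to see what the substitution does is to run it on the sample ad (rejoined onto one line, assuming the wrap after "bath," was a single space). The second call uses a made-up short ad, not from the thread, to show one edge case that may be worth testing on real data.

```ruby
ad = '<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, irrigation, horse barn. $122,000. 509-697-6519<endad>'

# The substitution from the post, unchanged. Piece by piece:
#   <begad:[^>]+>  the tag itself
#   .{1,50}        greedily take up to 50 characters after it
#   .*?\b          then lazily take more, stopping at the first word
#                  boundary, so a word cut off at character 50 is finished
result = ad.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")
puts result
# <ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, irrigation</ftditm>, horse barn. $122,000. 509-697-6519<endad>

# Hypothetical short ad: "." also matches the characters of <endad>,
# so for text shorter than 50 chars the tag can land inside <endad>.
short = '<ftditm><begad:1>Cat for sale.<endad>'
short_result = short.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")
puts short_result
# <ftditm><begad:1>Cat for sale.<endad</ftditm>>
```

For the sample ad the tag lands right after "irrigation". The short case happens because, once the whole line is consumed, the engine backs up to the boundary between "d" and ">".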


David
--
David A. Black
dbl...@wobblini.net


John Halderman

Aug 22, 2005, 5:46:01 PM
Seems to me that you're trying to do too much with one regular expression. I
would just grab the content between your tags and then trim that down to 50
characters and reassemble it afterwards.
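The grab, trim and reassemble idea can be sketched in plain Ruby. This is a sketch under assumptions: `close_after` is a hypothetical helper name, each ad is one line ending in `<endad>` (optionally followed by a newline), and the cut goes at the first word boundary at or after character 50 of the ad text. Because only the text between the tags is counted, an ad shorter than 50 characters simply gets `</ftditm>` right before `<endad>`.

```ruby
# Hypothetical helper: insert </ftditm> after at least `limit` characters
# of ad text, extended to the end of the current word.
def close_after(line, limit = 50)
  # head: everything through <begad:...>, body: the ad text,
  # tail: <endad> plus any trailing newline
  m = line.match(/\A(.*?<begad:[^>]+>)(.*?)(<endad>\s*)\z/)
  return line unless m # leave lines that don't look like an ad alone

  head, body, tail = m[1], m[2], m[3]

  # Short ad: close the tag at the end of the text. Otherwise find the
  # first word boundary at or after `limit`; String#index still sees the
  # character before the offset, so \b is not matched mid-word.
  cut = if body.length <= limit
          body.length
        else
          body.index(/\b/, limit) || body.length
        end

  head + body[0, cut] + '</ftditm>' + body[cut..-1] + tail
end
```

To process a whole file (the name `ads.txt` is made up), something like `File.foreach('ads.txt') { |line| print close_after(line) }` would do; `print` rather than `puts` because each line keeps its own newline.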

-j

David A. Black

Aug 22, 2005, 5:53:52 PM
Hi --

On Tue, 23 Aug 2005, John Halderman wrote:

> Seems to me that you're trying to do too much with one regular expression. I
> would just grab the content between your tags and then trim that down to 50
> characters and reassemble it afterwards.

I'm not sure what you mean by "too much". I think the substitution I
suggested does what Ezra said he needed. Is there an error in it?


David

Ezra Zygmuntowicz

Aug 22, 2005, 6:27:46 PM
David-
Thanks, the regex you posted works great. I had considered just
trimming the text inside the tags and then extending it to the end of
the current word, but I figured there would be a regex that could do
it all at once.

Thanks Dave-
Ezra

-Ezra Zygmuntowicz
