Regex help

Ezra Zygmuntowicz

Jun 10, 2005, 7:54:18 PM

Hey there list
I hope you geniuses can help me with a little regex and date magic.
Say that I have a string like this:

"SPT_051205.jpg"

How could I strip out the numbers that stand for the date and convert
it to look like this:

"May 12 2005"

Any suggestions are much appreciated!
Thanks-
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
ez...@yakima-herald.com

Nikolai Weibull

Jun 10, 2005, 8:21:52 PM

Ezra Zygmuntowicz wrote:

> Say that I have a string like this:
>
> "SPT_051205.jpg"
>
> How could I strip out the numbers that stand for the date and convert
> it to look like this:
>
> "May 12 2005"
>

"SPT_051205.jpg".scan(/(\d\d)(\d\d)(\d\d)/) do |a|
  puts Time.mktime(a[2].to_i + 2000, a[0].to_i, a[1].to_i).strftime("%b %d %Y")
end

Enjoy,
nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
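
As an alternative to splitting the digits by hand, here is a minimal sketch that lets Date.strptime do the field parsing. The method name date_label is hypothetical, and the sketch assumes the filename always carries six digits in month-day-year order, as in Ezra's example.

```ruby
require 'date'

# Hypothetical helper: pull the first run of six digits out of a filename
# and format it as "Mon DD YYYY". Assumes the digits are in MMDDYY order.
def date_label(filename)
  digits = filename[/\d{6}/]           # "051205", or nil if no digits found
  return nil unless digits
  Date.strptime(digits, "%m%d%y").strftime("%b %d %Y")
rescue ArgumentError
  nil                                  # the digits did not form a valid date
end

puts date_label("SPT_051205.jpg")
```

Returning nil for a missing or impossible date is a design choice of this sketch; raising an error may suit a batch job better.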

Pete Elmore

Jun 10, 2005, 8:28:59 PM

Ezra Zygmuntowicz wrote:
> "SPT_051205.jpg"
>
> How could I strip out the numbers that stand for the date and convert
> it to look like this:
>
> "May 12 2005"

#!/usr/bin/env ruby
require 'date'
def date_from_special_string(s)
  /(\d\d)(\d\d)(\d\d)/.match s
  month = $1
  day = $2
  year = "20#{$3}"
  Date::parse("#{month}/#{day}/#{year}").strftime("%B %d %G")
end

puts date_from_special_string "SPT_051205.jpg"

There was a little ambiguity in the example (Was the month the first 05,
or was that the year?), so you may need to swap the $1 and $3.

Hope that was helpful.

Pete
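
The field-order question Pete raises can be made explicit in code. The following is only a sketch: date_from_fields and its order argument are hypothetical names, and it assumes every date falls in the 2000s. It builds the date with Date.new rather than Date.parse, so the result does not depend on how a slash-separated string happens to be interpreted.

```ruby
require 'date'

# Hypothetical helper: label the three two-digit fields explicitly so the
# field order is visible and easy to change.
def date_from_fields(s, order = [:month, :day, :year])
  m = /(\d\d)(\d\d)(\d\d)/.match(s)
  return nil unless m
  fields = Hash[order.zip(m.captures.map(&:to_i))]
  year = 2000 + fields[:year]          # assumption: all dates are 20xx
  return nil unless Date.valid_date?(year, fields[:month], fields[:day])
  Date.new(year, fields[:month], fields[:day]).strftime("%B %d %Y")
end

puts date_from_fields("SPT_051205.jpg")                         # month first
puts date_from_fields("SPT_051205.jpg", [:year, :month, :day])  # year first
```

With this sample, reading the digits as year-day-month gives the same date as month-day-year, because the first and last fields are both 05. That is why the second call uses year-month-day to show a different result.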

Ezra Zygmuntowicz

Jun 10, 2005, 8:34:24 PM

On Jun 10, 2005, at 5:21 PM, Nikolai Weibull wrote:

> Ezra Zygmuntowicz wrote:
>
>
>> Say that I have a string like this:
>>
>> "SPT_051205.jpg"
>>
>> How could I strip out the numbers that stand for the date and convert
>> it to look like this:
>>
>> "May 12 2005"
>>
>>
>
> "SPT_051205.jpg".scan(/(\d\d)(\d\d)(\d\d)/) do |a|
>   puts Time.mktime(a[2].to_i + 2000, a[0].to_i, a[1].to_i).strftime("%b %d %Y")
> end
>
> Enjoy,
> nikolai
>

Thanks so much, Nikolai, it works perfectly!
-Ezra

-Ezra Zygmuntowicz

Ezra Zygmuntowicz

Jun 10, 2005, 8:36:46 PM

Thanks Pete, that helps a lot as well. I guess I should have picked a
better example too. But you got it right: the first 05 is the month
and the last one is the year. I'll take care to use an unambiguous
sample next time.

Thanks!

Nikolai Weibull

Jun 10, 2005, 8:39:46 PM

Pete Elmore wrote:

> Ezra Zygmuntowicz wrote:

> > "SPT_051205.jpg"

> Date::parse("#{month}/#{day}/#{year}").strftime("%B %d %G")

Are you sure %G is what you want here? I’m betting that the dates
generated for the names of these JPEGs aren’t conforming to ISO 8601 and
timezone settings. Or perhaps I’m misunderstanding the usefulness of
%G. Also, Ezra, the use of the fifth month (May) was ambiguous, as it’s
unclear whether you want the full month name (%B) or its three-letter
abbreviation (%b).

> There was a little ambiguity in the example (Was the month the first
> 05, or was that the year?), so you may need to swap the $1 and $3.

The <year><day><month> template is definitely one of the weirder ones,
although <month><day><year> is certainly a bit weird as well :-),
nikolai
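
To make the difference between these strftime directives concrete, here is a small sketch. The dates are chosen purely for illustration: January 1, 2005 is a Saturday that belongs to the last ISO week of 2004, so %Y and %G disagree on it.

```ruby
require 'date'

new_year = Date.new(2005, 1, 1)        # a Saturday in ISO week 53 of 2004
calendar_year = new_year.strftime("%Y") # calendar year
iso_week_year = new_year.strftime("%G") # ISO 8601 week-based year
puts "%Y gives #{calendar_year}, %G gives #{iso_week_year}"

june = Date.new(2005, 6, 10)
short_name = june.strftime("%b")       # abbreviated month name
full_name  = june.strftime("%B")       # full month name
puts "%b gives #{short_name}, %B gives #{full_name}"
```

For filenames that encode a plain calendar date, %Y is the safer choice; %G only matters when working with ISO week numbers.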

Ezra Zygmuntowicz

Jun 15, 2005, 5:44:50 PM

Hello list!
Could someone help me do a little regex conversion? I've got a
few perl compatible regexes from a php script I am trying to port to
ruby but I need a little help. Here are the php functions:

$buffer = preg_replace("#(?<!\"|http:\/\/)www\.(?:[a-zA-Z0-9\-]+\.)*
[a-zA-Z]{2,4}(?:/[^ \n\r\"\'<]+)?#", "http://$0", $buffer);
$buffer = preg_replace("#(?<!\"|href=|href\s=\s|href=\s|href\s=)
(?:http:\/\/|https:\/\/|ftp:\/\/)(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}
(?::[0-9]+)?(?:/[^ \n\r\"\'<]+)?#", "<a href=\"$0\" target=\"_blank\">
$0</a>", $buffer);
$buffer = preg_replace("#(?<=[\n ])([a-z0-9\-_.]+?)@([^,< \n\r]+)#i",
"<a href=\"mailto:$0\">$0</a>", $buffer);

Can someone please help me get these into a format that ruby will
like? I know I will end up using gsub! to do the sub but these regexes
don't parse correctly in ruby and I am not sure of the rules I need
to follow to make ruby happy. Help is much appreciated.
Thanks-

Chris Eidhof

Jun 15, 2005, 5:52:56 PM

On Thu, Jun 16, 2005 at 06:44:50AM +0900, Ezra Zygmuntowicz wrote:
> Hello list!
> Could someone help me do a little regex conversion? I've got a
> few perl compatible regexes from a php script I am trying to port to
> ruby but I need a little help. Here are the php functions:

> $buffer = preg_replace("#(?<!\"|http:\/\/)www\.(?:[a-zA-Z0-9\-]+\.)*
> [a-zA-Z]{2,4}(?:/[^ \n\r\"\'<]+)?#", "http://$0", $buffer);
> $buffer = preg_replace("#(?<!\"|href=|href\s=\s|href=\s|href\s=)
> (?:http:\/\/|https:\/\/|ftp:\/\/)(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}
> (?::[0-9]+)?(?:/[^ \n\r\"\'<]+)?#", "<a href=\"$0\" target=\"_blank\">
> $0</a>", $buffer);
> $buffer = preg_replace("#(?<=[\n ])([a-z0-9\-_.]+?)@([^,< \n\r]+)#i",
> "<a href=\"mailto:$0\">$0</a>", $buffer);

I'm willing to help, but could you give a little more detail on what
the regexen should do?
--
Best regards,
Chris Eidhof

Ezra Zygmuntowicz

Jun 15, 2005, 6:26:10 PM

Thanks Chris-
I was able to hack these out and get them to work in ruby. They
just do some formatting and conversion of some hyperlinks and ftp
links. It was the (?....) grouping that was messing things up a bit.
Thanks all the same though!
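
For readers unfamiliar with the (?...) constructs in those patterns, here is a small sketch of the kinds that appear: the non-capturing group (?:...), the negative lookbehind (?<!...), and the positive lookbehind (?<=...). The sample strings are made up. As far as I know, lookbehind needs the Oniguruma/Onigmo regex engine that became standard with Ruby 1.9, which may explain why the patterns did not parse in the Ruby of the time.

```ruby
# (?:...) groups without capturing; (...) captures into group 1, 2, ...
host = "ftp://example.com"[/(?:https?|ftp):\/\/(\w+)/, 1]
puts host                              # the scheme group is not captured

# (?<!...) is a negative lookbehind: match "www." only when it is NOT
# directly preceded by "http://".
text = "see www.example.com and http://www.example.org"
bare = text.scan(/(?<!http:\/\/)www\.[a-z.]+/)
puts bare.inspect

# (?<=...) is a positive lookbehind: the preceding text must be there,
# but it is not part of the match.
addr = "mail bob@example.com"[/(?<=\s)\S+@\S+/]
puts addr
```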

Nikolai Weibull

Jun 15, 2005, 6:36:39 PM

Ezra Zygmuntowicz wrote:

> Could someone help me do a little regex conversion? I've got a
> few perl compatible regexes from a php script I am trying to port to
> ruby but I need a little help. Here are the php functions:
>
> $buffer = preg_replace("#(?<!\"|http:\/\/)www\.(?:[a-zA-Z0-9\-]+\.)*
> [a-zA-Z]{2,4}(?:/[^ \n\r\"\'<]+)?#", "http://$0", $buffer);
> $buffer = preg_replace("#(?<!\"|href=|href\s=\s|href=\s|href\s=)
> (?:http:\/\/|https:\/\/|ftp:\/\/)(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}
> (?::[0-9]+)?(?:/[^ \n\r\"\'<]+)?#", "<a href=\"$0\" target=\"_blank\">
> $0</a>", $buffer);
> $buffer = preg_replace("#(?<=[\n ])([a-z0-9\-_.]+?)@([^,< \n\r]+)#i",
> "<a href=\"mailto:$0\">$0</a>", $buffer);

OK, this wins my newly instated prize for _worst regexes ever_. Inefficient,
inconclusive, inconsistent, and just plain wrong. I really hope you
don’t have to work with a lot of code like this.

Nonetheless, here’s my solution:

domain_part = /(?:[[:alnum:]\-]+\.)/
tld = /[[:alpha:]]{2,4}/
buffer.gsub!(/(?<!"|http:\/\/)www\.#{domain_part}*#{tld}/, 'http://\0')
buffer.gsub!(/(?<!"|href=|href\s=\s|href=\s|href\s=)
              (?:https?|ftp):\/\/#{domain_part}+#{tld}
              (?::\d+)?(?:\/[^\s"'<]+)?/x,
             '<a href="\0" target="_blank">\0</a>')
buffer.gsub!(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i,
             '<a href="mailto:\0">\0</a>')

Totally untested, but at least it’s somewhat easier to understand and a
bit more correct. There are better ways to extract URLs and email
addresses from an input than this, mind you,
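
Since the code above is described as untested, here is a hedged sketch that wraps the same three substitutions in a method and runs them on a made-up sample. The method name linkify, the constant names, and the sample text are assumptions for illustration, and the sketch assumes Ruby 1.9 or later for lookbehind support. It uses gsub rather than gsub! so the input string is left untouched.

```ruby
DOMAIN_PART = /(?:[[:alnum:]\-]+\.)/
TLD         = /[[:alpha:]]{2,4}/

# Hypothetical wrapper around the three substitutions from the post.
def linkify(text)
  # 1. Prefix bare "www." hosts with "http://".
  out = text.gsub(/(?<!"|http:\/\/)www\.#{DOMAIN_PART}*#{TLD}/, 'http://\0')
  # 2. Wrap URLs in anchors unless they already sit inside an href or quotes.
  out = out.gsub(/(?<!"|href=|href\s=\s|href=\s|href\s=)
                  (?:https?|ftp):\/\/#{DOMAIN_PART}+#{TLD}
                  (?::\d+)?(?:\/[^\s"'<]+)?/x,
                 '<a href="\0" target="_blank">\0</a>')
  # 3. Turn whitespace-preceded email addresses into mailto links.
  out.gsub(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i, '<a href="mailto:\0">\0</a>')
end

sample = "Visit www.example.com or mail bob@example.com today"
puts linkify(sample)
```

The order matters: step 1 must run before step 2 so that the newly prefixed www hosts are picked up as URLs.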

Ezra Zygmuntowicz

Jun 15, 2005, 6:46:05 PM

Nikolai-
Thank you. I have inherited a ton of NASTY php code like this at
the newspaper I work at. I am rewriting it all in rails and ruby cgi
scripts. But the guy who wrote this stuff is no longer here and I
think he liked making his code as obfuscated as possible in order to
keep his job secure. I am by no means a regex master so digesting
volumes of stuff like this hurts my head. Thank you for the help.

Martin DeMello

Jun 16, 2005, 2:44:03 AM

Ezra Zygmuntowicz <ez...@yakima-herald.com> wrote:
>
> Nikolai-
> Thank you. I have inherited a ton of NASTY php code like this at
> the newspaper I work at. I am rewriting it all in rails and ruby cgi
> scripts. But the guy who wrote this stuff is no longer here and I
> think he liked making his code as obfuscated as possible in order to
> keep his job secure. I am by no means a regex master so digesting
> volumes of stuff like this hurts my head. Thank you for the help.

http://www.weitz.de/regex-coach/ is a nice way to interactively test
regexps as you develop them.

martin

Ezra Zygmuntowicz

Jun 16, 2005, 3:31:15 AM

Martin-
Thank you for the link! That is exactly the tool I needed. I
really appreciate it.

-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
ez...@yakima-herald.com
509-577-7732