
help with regex


George Mpouras

Jun 19, 2013, 8:00:18 AM
I need to discover all possible field names in a key/value file.
The properties of the file are unknown, so I have to be a little creative.
Values can optionally contain whitespace inside "...".
Each key/value pair is separated from the next pair by a space.
Do you think the following is OK?



#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    chomp;

    while ( /([^=]+)=("[^"]+"|\S+)/g ) {
        my ($key, $val) = ($1, $2);
        $val =~ s/^["\s]*(.*?)["\s]*$/$1/;
        print "*$key* *$val*\n";
    }

    print "--------\n";
}


__DATA__
f1=hello f2= f3="foo" f4="hello world"
f6="day" f7="day & night" f8=100

Tim McDaniel

Jun 20, 2013, 12:08:08 AM
In article <kps6f4$15u3$1...@news.ntua.gr>,
George Mpouras <nospam.gravital...@hotmail.noads.com> wrote:
>I must discover all possible field names of a key/value file.
>The properties of the file are unknown

If by "properties" you mean the layout, format, et cetera,
then how can anyone advise you on a proper way to parse it
when neither you nor we know what's valid?

--
Tim McDaniel, tm...@panix.com

Peter Gordon

Jun 20, 2013, 5:52:11 AM
George Mpouras <nospam.gravital...@hotmail.noads.com> wrote
in news:kps6f4$15u3$1...@news.ntua.gr:
You don't say if all possible sequences are included in the data.
If they are, the code below decodes it.

#!/usr/bin/perl -w
use strict;
use 5.14.0;
my %lines;
while ( <DATA> ) {
    chomp;
    last if /^$/;    # Catch blank lines at end of data.
    while ( /(f\d+)=(.*?)(?: f\d+=|$)/ ) {
        my $key   = $1;
        my $value = $2;
        # Strip the key/value pair off the string; \Q...\E keeps any
        # regex metacharacters in the value literal.
        s/\Q$key=$value\E(.*)/$1/;
        $value =~ s/"(.*)"/$1/;    # Strip off any "
        $lines{$key} = $value;
    }
}
say "The Hash";
foreach my $key ( sort keys %lines ) {
    say "$key: $lines{$key}";
}
__DATA__
f1=hello f2= f3="foo" f4="hello world"
f6="day" f7="day & night" f8=100

Martijn Lievaart

Jun 20, 2013, 10:33:35 AM
On Thu, 20 Jun 2013 09:52:11 +0000, Peter Gordon wrote:

> You don't say if all possible sequences are included in the data.
> If they are, the code below decodes it.
>
> #!/usr/bin/perl -w
> use strict;
> use 5.14.0;
> my %lines;
> while( <DATA> ) {
> chomp;
> last if /^$/; # Catch blank lines at end of data.
> while ( /(f\d+)=(.*?)(?: f\d+=|$)/ ) {
> my $key = $1;
> my $value = $2;
> s/$key=$value(.*)/$1/; # Strip the key/value pair off the string.
> $value =~ s/"(.*)"/$1/; # Strip off any "
> $lines{$key} = $value;
> }
> }
> say "The Hash";
> foreach my $key (sort keys %lines ) {
> say "$key: $lines{$key}";
> }
> __DATA__
> f1=hello f2= f3="foo" f4="hello world"
> f6="day" f7="day & night" f8=100

I'd do:

#!/usr/bin/perl -w
use strict;
use 5.14.0;
my %lines;
while ( <DATA> ) {
    chomp;
    next if /^$/;    # Skip blank lines
    while ( s/ \s* (f\d+) = (?: (".*?" | \w* ) \s* ) //x ) {
        $lines{$1} = $2;
    }
    die "'$1'" if /(..*)/;
}
say "The Hash";
foreach my $key ( sort keys %lines ) {
    say "$key: $lines{$key}";
}
__DATA__
f1=hello f2= f3="foo" f4="hello world"
f6="day" f7="day & night" f8=100

What it doesn't handle is escaped quotes such as f99="\"", or a lot of
other useful stuff. Does anyone know a module for this?

M4
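
One possible answer to the module question, offered as a sketch rather than
a tested recommendation for this data: the core module Text::ParseWords
splits a line on a delimiter while honouring quotes and backslash escapes.
The helper name pairs_from_line and the sample line are made up for
illustration. It assumes every token has the form key=value and that
unquoted values contain no stray apostrophes, since Text::ParseWords treats
single quotes as quoting characters too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: turn one line into a key => value hash, letting
# Text::ParseWords deal with double quotes and backslash escapes.
sub pairs_from_line {
    my ($line) = @_;
    my %pairs;
    # Second argument 0: quotes and escaping backslashes are removed
    # from the returned tokens.
    for my $token ( quotewords( '\s+', 0, $line ) ) {
        # Leading or trailing whitespace can yield empty/undef tokens.
        next unless defined $token && length $token;
        my ( $key, $value ) = split /=/, $token, 2;
        next unless defined $value;    # no '=' in token: ignored in this sketch
        $pairs{$key} = $value;
    }
    return %pairs;
}

my %h = pairs_from_line('f4="hello world" f99="\"" f2=');
print "$_: $h{$_}\n" for sort keys %h;
```

With the sample line, f99 ends up holding a single double quote and f2 an
empty string.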

George Mpouras

Jun 20, 2013, 4:29:00 PM
Key names can be any string without spaces, not just f\d+.
f100 etc. was only an example, so the regex
/(f\d+)=(.*?)(?: f\d+=|$)/
does not match correctly.


George Mpouras

Jun 20, 2013, 4:30:00 PM
Key names can be any string with no spaces, not only f\d+.

George Mpouras

Jun 20, 2013, 4:34:23 PM
You are correct, the specs are loose.
Lines hold multiple key/value pairs separated by a space.
Keys do not contain spaces.
Values may contain spaces inside double quotes.
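
The loose spec above can be sketched as a single global match that only
collects field names. This is a minimal sketch under extra assumptions the
thread does not confirm: pairs are separated by one or more whitespace
characters, there is no whitespace around "=", keys contain no "=", and
quoted values contain no embedded double quotes. The helper name
parse_pairs is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [key, value] pairs found on one line.
# Assumes no whitespace around '=' and no escaped quotes inside values.
sub parse_pairs {
    my ($line) = @_;
    my @pairs;
    # Key: run of non-space, non-'=' characters.
    # Value: either "..." (captured without the quotes) or a possibly
    # empty run of non-space characters.
    while ( $line =~ / ([^\s=]+) = (?: "([^"]*)" | (\S*) ) /gx ) {
        my $value = defined $2 ? $2 : $3;
        push @pairs, [ $1, $value ];
    }
    return @pairs;
}

my %seen;    # field name => number of occurrences
for my $line ( 'f1=hello f2= f3="foo" f4="hello world"',
               'f6="day" f7="day & night" f8=100' ) {
    $seen{ $_->[0] }++ for parse_pairs($line);
}
print "$_\n" for sort keys %seen;
```

On the sample data from the first post this prints the field names f1, f2,
f3, f4, f6, f7 and f8, one per line, including f2 with its empty value.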

Tim McDaniel

Jun 20, 2013, 4:47:31 PM
In article <kpvome$ck5$1...@news.ntua.gr>,
George Mpouras <nospam.gravit...@spamno.hotmail.anispam.com.nospam> wrote:
>key names can be whatever string with no spaces not f\d+

The examples were the only spec you gave, so you can understand why
people coded to it.

>f100 etc was an example

An example provided by the teacher of the class?

Settings separated by space: do you mean one space or one or more
characters of whitespace?
Is there always an "="? That is, you can't have "foo bar"; it must be
"foo= bar="?
Can there be whitespace around "=", as in "foo = bar"?
Can there be leading whitespace and/or trailing whitespace on the line?

--
Tim McDaniel, tm...@panix.com

Martijn Lievaart

Jun 20, 2013, 5:00:17 PM
On Thu, 20 Jun 2013 16:33:35 +0200, Martijn Lievaart wrote:

> while (s/ \s* (f\d+) = (?: (".*?" | \w* ) \s* ) //x ) {

Hummm...

while (s/ \s* (f\d+) = (".*?" | \w* ) \s* //x ) {

Does the job just as well. Dunno what I was smoking.

M4


Martijn Lievaart

Jun 20, 2013, 5:01:20 PM
On Thu, 20 Jun 2013 23:30:00 +0300, George Mpouras wrote:

> key names can be whatever string with no spaces not only f\d+

Replace f\d+ with \w+ or [^ ]+

M4

George Mpouras

Jun 20, 2013, 5:57:26 PM

> An example provided by the teacher of the class?

!!!!!!!!!!!!!!!!!!!!!!!!!!
