Confused about Schwartz idiom utilizing map & split

weston

Mar 3, 2006, 6:53:01 PM
In an article on Stonehenge.com on using libxml2 to strip html from a
document, I came across a part of the listing that I'm having trouble
understanding. Randall apparently creates a hash of approved tags and
their attributes with these lines:

=9= my %PERMITTED =
=10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
=11= split /\n/, <<'END';
=12= a href name target class title
=13= b
=14= big
=15= blockquote class
....
=49= END

(See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )

I keep trying to parse line 10 in my head and am not getting a lot of
mental traction in really understanding how this works. Anybody want to
help?

Dr.Ruud

Mar 3, 2006, 7:21:46 PM
weston schreef:

Maybe this helps:

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;

my %PERMITTED =
  map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
  split /\n/, <<'END';
a href name target class title
b
big
blockquote class
...
END

print Data::Dumper->Dump( [\%PERMITTED]
, [qw(%PERMITTED)]
), "\n";

--
Affijn, Ruud

"Gewoon is een tijger."

Tad McClellan

Mar 3, 2006, 7:37:16 PM
weston <notsew-reversePrec...@canncentral.org> wrote:

> In an article on Stonehenge.com on using libxml2 to strip html from a
> document, I came across a part of the listing that I'm having trouble
> understanding. Randall apparently creates a hash of approved tags and
> their attributes with these lines:

> =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }

> I keep trying to parse line 10 in my head and am not getting a lot of
> mental traction in really understanding how this works. Anybody want to
> help?


Does this help?

------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my %PERMITTED =
  map { my($k, @v) = split;    # 1st space-sep'd field is tag, rest are its attrs
        ($k, {map {$_, 1} @v}) # a 2-element list. 1st is tag,
                               # 2nd is a hash-ref with keys as attr names,
                               # and values set to one
      }
  split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

print Dumper \%PERMITTED;
------------------------------


Or maybe it would help to "unroll" the maps into foreachs:

------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my %PERMITTED;

foreach (split /\n/, <<'END')
a href name target class title
b
big
blockquote class
END
{
    my($k, @v) = split;
    my %h;
    foreach ( @v ) {    # "unroll" {map {$_, 1} @v}
        $h{$_} = 1;
    }
    $PERMITTED{$k} = \%h;
}

print Dumper \%PERMITTED;
------------------------------


--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas

Randal L. Schwartz

Mar 3, 2006, 7:36:27 PM
to weston
>>>>> "weston" == weston <wes...@canncentral.org> writes:

weston> In an article on Stonehenge.com on using libxml2 to strip html from a
weston> document, I came across a part of the listing that I'm having trouble
weston> understanding. Randall apparently creates a hash of approved tags and
weston> their attributes with these lines:

weston> =9= my %PERMITTED =
weston> =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
weston> =11= split /\n/, <<'END';
weston> =12= a href name target class title
weston> =13= b
weston> =14= big
weston> =15= blockquote class
weston> ....
weston> =49= END

weston> (See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )

weston> I keep trying to parse line 10 in my head and am not getting a lot of
weston> mental traction in really understanding how this works. Anybody want to
weston> help?

Heh.

The split on line 11 creates elements like:

"a href name target class title",
"b",
"big",
"blockquote class",

etc. The map on the beginning of line 10 sets $_ equal to each of those,
and looks for a list-valued return from the block.

The split in the middle of line 10 breaks each of those elements listed above
into a list, and assigns the first to $k, and any remaining ones to @v.

The second map on line 10 converts @v to a list of elements of @v alternating
with the value "1", and then turns that into a hashref, so that @v becomes
keys, with values 1. That hashref is then added along with $k to be
two values that eventually contribute to %PERMITTED.
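
The steps described above can be traced for a single line of the tag list. This is only a sketch: the intermediate variables $line, @pairs and $attrs are introduced here for illustration and do not appear in the article's listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One element as produced by the split on line 11.
my $line = 'a href name target class title';

# The split in the middle of line 10: first word is the tag,
# the remaining words are its attributes.
my ($k, @v) = split ' ', $line;

# The inner map: each attribute is followed by the value 1.
my @pairs = map { $_, 1 } @v;

# The braces around the inner map turn that flat list into a hashref.
my $attrs = { @pairs };

# $k and $attrs are the two values this line contributes to %PERMITTED.
print "$k: ", join(', ', sort keys %$attrs), "\n";
# prints "a: class, href, name, target, title"
```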

But didn't I say all this in the article? :-)

print "Just another Perl hacker,"; # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Christian Winter

Mar 3, 2006, 7:55:43 PM
Dr.Ruud schrieb:
> weston schreef:
>
[...snipped...]

>>
>> =9= my %PERMITTED =
>> =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
>> =11= split /\n/, <<'END';
>> =12= a href name target class title
>> =13= b
>> =14= big
>> =15= blockquote class
>> ....
>> =49= END
>>
>>(See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )
>>
>>I keep trying to parse line 10 in my head and am not getting a lot of
>>mental traction in really understanding how this works. Anybody want
>>to help?
>
>
> Maybe this helps:
>
> #!/usr/bin/perl
> use strict; use warnings;
> use Data::Dumper;
>
> my %PERMITTED =
> map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
> split /\n/, <<'END';

I was about to suggest the same, but maybe the list vs. hash
notation could help clarify it a bit too:

    map {
        my ($k, @v) = split;
        (
            $k => {    # hashref!!
                map { $_ => 1 } @v
            }
        )
    } split /\n/, <<'END';

> a href name target class title
> b
> big
> blockquote class
> ...
> END

It's in fact one of those expressions that can be confusing
because of the dual meaning of curly brackets (block/hashref).
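
A small sketch of the two readings side by side may help. The perlfunc entry for map describes how Perl has to guess whether an opening brace starts a block or an anonymous hash, and mentions a leading "+" or ";" as ways to force one reading. The variable names below are made up for illustration.

```perl
use strict;
use warnings;

my @attrs = qw(href name target);

# Braces as a BLOCK: map runs the code once per element and
# collects the returned pairs into one flat list.
my %set = map { $_ => 1 } @attrs;

# Outer braces as an anonymous hash constructor: one hashref results.
my $set_ref = { map { $_ => 1 } @attrs };

# A leading "+" forces the hash-constructor reading, so map returns
# one hashref per element (this is the EXPR form, hence the comma).
my @refs = map +{ name => $_ }, @attrs;

# A leading ";" forces the block reading where Perl might guess wrong.
my %lower = map {; "\L$_" => 1 } qw(HREF Name);

print scalar(keys %set), " keys, ", ref($set_ref), ", ",
      scalar(@refs), " refs\n";
```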

-Chris

Anno Siegel

Mar 3, 2006, 8:01:31 PM
weston <notsew-reversePrec...@canncentral.org> wrote in comp.lang.perl.misc:

> In an article on Stonehenge.com on using libxml2 to strip html from a
> document, I came across a part of the listing that I'm having trouble
> understanding. Randall apparently creates a hash of approved tags and

Who is this Randall you speak of?

> their attributes with these lines:

Randal's code constructs a hash of hashes. The first word in each data
line is a primary key. The rest of the words in each line (if any)
become the keys of an inner hash, all with the value 1. Presumably
the inner hash represents a set of whatever, associated with the primary
key.
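
Assuming the inner hash is meant as a set of attributes allowed on each tag, a lookup against the structure might look like the following sketch. The helper is_permitted is hypothetical and not taken from the article.

```perl
use strict;
use warnings;

my %PERMITTED =
  map { my ($k, @v) = split; ($k, { map { $_, 1 } @v }) }
  split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

# Hypothetical helper: true if $attr is allowed on $tag.
# Checking the outer key first avoids autovivifying unknown tags.
sub is_permitted {
    my ($tag, $attr) = @_;
    return exists $PERMITTED{$tag} && exists $PERMITTED{$tag}{$attr};
}

for my $pair (['a', 'href'], ['b', 'href'], ['script', 'src']) {
    my ($tag, $attr) = @$pair;
    my $verdict = is_permitted($tag, $attr) ? 'permitted' : 'not permitted';
    print "$tag/$attr: $verdict\n";
}
```

A tag with no attributes, such as b, still gets an (empty) inner hash, so "is this tag known at all" and "is this attribute allowed" are separate questions.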

> =9= my %PERMITTED =
> =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
> =11= split /\n/, <<'END';
> =12= a href name target class title
> =13= b
> =14= big
> =15= blockquote class
> ....
> =49= END

How does it do that? Rewriting the code with fewer map's and more
variable names may help. (untested)

my @lines = split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

my %PERMITTED;

for my $line ( @lines ) {
    my ($primary_key, @words) = split ' ', $line; # ($k, @v) in the original code
    # build wordlist
    my @wordlist; # alternating one word and one 1 (for hash initialization)
    for my $word ( @words ) {
        push @wordlist, ( $word => 1);
    }
    # build a hash out of @wordlist and assign it to its place
    $PERMITTED{ $primary_key} = { @wordlist};
}

> I keep trying to parse line 10 in my head and am not getting a lot of
> mental traction in really understanding how this works. Anybody want to
> help?

Line 10 does basically what the (outer) for-loop does in my code. The
inner for-loop does the job of the nested map.

Randal's code is that of a fluent speaker of Perl. Its parts (the two map's)
are two well-known idioms for hash-building. Applied together, they may
look like a mess, but once you recognize the pattern of each their
interaction becomes clear too.
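
Seen in isolation, the two idioms might look like this. The data and the names %is_vowel and %port_for are invented for the example.

```perl
use strict;
use warnings;

# Idiom 1: turn a list into a lookup set (every key maps to 1).
my %is_vowel = map { $_ => 1 } qw(a e i o u);

# Idiom 2: turn records into key/value pairs, one pair per record.
my %port_for =
  map { my ($name, $port) = split; ($name => $port) }
  'http 80', 'ssh 22';

print "e is a vowel\n" if $is_vowel{e};
print "ssh uses port $port_for{ssh}\n";
```

In the article's line 10, idiom 1 sits inside idiom 2: the value of each pair is itself a set built with map.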

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

Dr.Ruud

Mar 4, 2006, 6:27:31 AM
Tad McClellan schreef:

> print Dumper \%PERMITTED;


Alternative:

print Data::Dumper->Dump( [\%var], ['%var'] );
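
For comparison, the Data::Dumper documentation describes a name beginning with "*" as producing the dereferenced form of the value. A minimal sketch of that variant, with a cut-down hash made up for the example:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order for readable output

my %PERMITTED = (b => {}, blockquote => { class => 1 });

# With the name '*PERMITTED' and a hash reference as the value,
# Dump writes "%PERMITTED = ( ... );" instead of "$VAR1 = { ... };".
my $text = Data::Dumper->Dump([\%PERMITTED], ['*PERMITTED']);
print $text;
```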
