HTML tables

Matija Papec

May 24, 2003, 8:06:02 AM

I want to remove some rows from a table, depending on $filter:

$filter = qr/one|two|four/;
$simpletable =~ s{(<tr.+?</tr>)}{
$1 =~ /$filter/ ? '' : $1;
}iges;

Is there a more efficient way to do the same thing, using Perl only? (I don't
like the idea of capturing $1 and then substituting the same content back.)

--
Matija

Andras Malatinszky

May 24, 2003, 9:01:45 AM

Matija Papec wrote:

Would

$simpletable=~s[<tr.+?$filter.+?</tr>][]iges;

work?

There are modules on CPAN for parsing HTML, and I've often seen the
advice here to use those modules rather than roll your own.

Matija Papec

May 25, 2003, 9:57:09 AM

Andras Malatinszky <nob...@dev.null> wrote:
>> $1 =~ /$filter/ ? '' : $1;
>> }iges;
>>
>> Is there a more efficient way to do the same thing, using Perl only? (I don't
>> like the idea of capturing $1 and then substituting the same content back.)
>
>
>Would
>
>$simpletable=~s[<tr.+?$filter.+?</tr>][]iges;
>
>work?

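Not quite; the match would always start at the first '<tr' in $simpletable,
and .+? would then run across row boundaries until it finds $filter, so the
rows in front of the matching one get deleted too.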
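
To make the problem concrete, here is a minimal sketch in core Perl only. The sample rows and the variable names $table, $naive and $tempered are made up for illustration, and the second pattern is just one possible workaround (a lookahead that stops each dot from stepping over a '</tr>'), not something anyone in this thread proposed. It assumes rows are not nested and that $filter should only be tested against the text of a single row; it will still match inside tag attributes, so it is no substitute for a real parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filter = qr/one|two|four/;

# Hypothetical sample data: four single-cell rows.
my $table = join '', map { "<tr><td>$_</td></tr>" } qw(zero one three four);

# Suggested pattern (the /e flag is dropped, since the replacement is empty).
# The first .+? is free to cross </tr>, so it swallows unrelated rows.
(my $naive = $table) =~ s{<tr.+?$filter.+?</tr>}{}gis;

# Workaround sketch: a dot is only consumed when it does not begin '</tr>',
# so a match can never extend past the end of the row it started in.
(my $tempered = $table)
    =~ s{<tr\b(?:(?!</tr>).)*?$filter(?:(?!</tr>).)*</tr>}{}gis;

print "naive:    '$naive'\n";
print "tempered: '$tempered'\n";
```

With this data the first substitution leaves an empty string (the 'zero' and 'three' rows are lost along with the filtered ones), while the second keeps exactly the 'zero' and 'three' rows.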

>There are modules on CPAN for parsing HTML, and I've often seen the
>advice here to use those modules rather than roll your own.

I'm not in a position to use additional modules, but I'll take a look at CPAN.
Do you have a favorite module?

--
Matija

Ron Savage

May 27, 2003, 6:10:59 AM

Hi Matija

See below.

"Matija Papec" <mpa...@yahoo.com> wrote in message
news:4ti1dvs7nor9ktcsr...@4ax.com...

Ron Savage

May 27, 2003, 6:14:24 AM

Hi Matija

Ignore previous post - that was just Outlook Express and its
auto-post-before-I-finished-typing option :-(.

See below.

"Matija Papec" <mpa...@yahoo.com> wrote in message
news:4ti1dvs7nor9ktcsr...@4ax.com...

HTML::TreeBuilder is a fine module.

Tested data and code:
-----><8-----
<html>
<head>
<title>Tutorial for HTML::TreeBuilder</title>
</head>
<body>
<h1 align = 'center'>Tutorial for HTML::TreeBuilder</h1>

<table align = 'center' border = '1'>
<tr>
<th>Outer table has 1 row with 2 columns</th>
<td>
<table border = '1'>
<tr>
<th colspan = '2'>Inner table has 3 rows with 2 columns</th>
</tr>
<tr>
<th>Row Two/Column One</th><td>Row Two/Column Two</td>
</tr>
<tr>
<th>Row Three/Column One</th><td>Row Three/Column Two</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
-----><8-----

-----><8-----
#!/usr/bin/perl
#
# Name:
# test-html-treebuilder.pl.
#
# Author:
# Ron Savage
# http://savage.net.au/index.html.

use strict;
use warnings;

use HTML::TreeBuilder;

# -----------------------------------------------

sub find_nested_content
{
    my($root) = @_;

    print "Looking for nested content. \n";
    print "\n";

    my($first_tr, $last_tr);

    for ($root -> look_down(
        sub
        {
            # Find the 1st & last <tr>s, so we can report them.

            return 0 if ($_[0] -> tag() ne 'tr');

            (! $first_tr) && ($first_tr = $_[0]);

            $last_tr = $_[0];

            return 0;
        }))
    {
    }

    print "Text of 1st tr in nested table: ", $first_tr -> as_text(), ". \n" if ($first_tr);
    print "\n";
    print "Text of last tr in nested table: ", $last_tr -> as_text(), ". \n" if ($last_tr);
    print "\n";

} # End of find_nested_content.

# -----------------------------------------------

sub find_nested_table
{
    my($root) = @_;

    print "Looking for nested table. \n";
    print "\n";

    # Find the 1st <table>, because I know it contains a <td>.

    my($nested_table);

    $root -> look_down(_tag => 'table',
        sub
        {
            # Find the 1st <td>, because I know it contains the nested <table>.

            $nested_table = $_[0] -> look_down(_tag => 'td',
                sub
                {
                    # Find the nested <table>.

                    return $_[0] -> look_down(_tag => 'table');
                });

            return $nested_table;
        });

    if ($nested_table)
    {
        print "Found nested table. \n";
    }
    else
    {
        print "Did not find nested table. \n";
    }

    print "\n";

    $nested_table;

} # End of find_nested_table.

# -----------------------------------------------

my($file_name) = '/temp/test-html-treebuilder.html';
my($root) = HTML::TreeBuilder -> new();

$root -> parse_file($file_name) || die("Can't parse $file_name");

my($nested_table) = find_nested_table($root);

find_nested_content($nested_table) if ($nested_table);

$root -> delete();
-----><8-----

Matija Papec

May 28, 2003, 1:31:36 PM

"Ron Savage" <r...@savage.net.au> wrote:
>> I'm not in a position to use additional modules, but I'll take a look at CPAN.
>> Do you have a favorite module?
>
>HTML::TreeBuilder is a fine module.

Looks interesting. Why do you use an empty foreach loop in find_nested_content,
and why does the nearby sub always return zero?

--
Matija