Mojo::URL Bug?? to_abs returns incorrect results.

perl_Is_Best

May 17, 2013, 4:13:25 PM
to mojol...@googlegroups.com
Hello All,

I am running into an issue where Mojo::URL's to_abs returns an incorrect absolute URL. Can someone help me with a workaround, if one exists?

Below are the results, followed by the code that reproduces the possible bug.


Results are:
http://www.mellanox.com/page/page/rss
They should be:
http://www.mellanox.com/page/rss

#!/usr/bin/perl
use 5.010;
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings qw(all);
use Data::Dumper;

use Mojo::UserAgent;

# Page to scan for links
my $linkUrl = "http://www.mellanox.com/page/press_releases";
my $ua = Mojo::UserAgent->new(max_redirects => 2)->detect_proxy;
my $tx = $ua->get($linkUrl);

for my $e ($tx->res->dom('a[href]')->each) {

    my $link = Mojo::URL->new($e->{href});
    next if 'Mojo::URL' ne ref $link;

    $link = $link->to_abs($tx->req->url)->fragment(undef);
    next unless grep { $link->protocol eq $_ } qw(http https);
    if ($link->to_string =~ /rss/ )
    {
        print $link->to_abs;
        print "\n";
    }
}

s...@bykov.odessa.ua

May 17, 2013, 4:52:11 PM
to mojol...@googlegroups.com
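You forgot about the <base> HTML tag on the 4th line of the page's HTML. Relative links on that page are resolved against the <base> href, not against the URL you requested.
The example below isn't fully correct, because you would also need to check for a leading '/', but it works for your URL.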

#!/usr/bin/env perl
use Mojo::Base -strict;

use Mojo::UserAgent;



my $linkUrl = "http://www.mellanox.com/page/press_releases";
my $ua      = Mojo::UserAgent->new(max_redirects => 2)->detect_proxy;
my $tx      = $ua->get($linkUrl);
my $res     = $tx->success or die "ERROR: " . $tx->error;
my $dom     = $res->dom;

# <base href="http://www.mellanox.com/" />
my $base = $dom->at('base');
my $base_url = $base && Mojo::URL->new($base->{href});


for my $e ($dom->find('a[href]')->each) {

  my $link = Mojo::URL->new($e->{href});

  # Resolve against <base href> when the page has one, else against the request URL
  $link = $link->to_abs($base_url || $tx->req->url)->fragment(undef);

  next unless grep { $link->protocol eq $_ } qw(http https);

  if ($link->to_string =~ /rss/) {
    say $link->to_abs;
  }
}
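
To see where the extra "page/" comes from, here is a minimal sketch that resolves one relative link against both candidate bases, without fetching anything. The href value 'page/rss' is an assumption inferred from the results reported above, not copied from the page itself.

```perl
#!/usr/bin/env perl
use Mojo::Base -strict;

use Mojo::URL;

# Assumption: the page links to the feed with a relative href like this one
my $href = 'page/rss';

my $page_url = Mojo::URL->new('http://www.mellanox.com/page/press_releases');
my $base_url = Mojo::URL->new('http://www.mellanox.com/');

# Against the request URL: only "press_releases" is replaced, "/page/" stays
my $from_page = Mojo::URL->new($href)->to_abs($page_url);

# Against <base href>: the relative path starts from the site root
my $from_base = Mojo::URL->new($href)->to_abs($base_url);

say $from_page;    # http://www.mellanox.com/page/page/rss
say $from_base;    # http://www.mellanox.com/page/rss
```

So to_abs is doing what it is asked to do in both cases; the difference is only which URL is passed in as the base.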


perl_Is_Best

May 17, 2013, 11:04:30 PM
to mojol...@googlegroups.com
Thanks so much for the help. I spent so much time on this issue and could not figure it out.