Please give me a less trivial example of non-blocking Mojo::UserAgent

Alex Povolotsky

Nov 12, 2016, 4:47:57 AM
to Mojolicious
Hello

The available examples of Mojo::UserAgent are limited to fetching a fixed set of files.

But I need something a bit more complex: I need to crawl a site, reading pages, parsing them, and following their links, using a non-blocking UA with, say, 4 downloads at a time: no more and, if possible, no fewer.

Can someone give me a good example?

Alex

Scott Wiersdorf

Nov 12, 2016, 10:25:53 AM
to Mojolicious
Here is a complete working example you can run:

#!/usr/bin/env perl
use Mojolicious::Lite;

get '/random-urls' => sub {
    my $c = shift;
    $c->render_later;

    $c->delay(
        sub {  ## first step
            my $delay = shift;

            $c->ua->get('https://www.random.org/bytes/',        $delay->begin);
            $c->ua->get('https://www.random.org/integer-sets/', $delay->begin);
            $c->ua->get('https://www.random.org/strings/',      $delay->begin);
            $c->ua->get('https://www.random.org/audio-noise/',  $delay->begin);
        },

        sub {  ## second step
            my $delay = shift;

            for my $dom (map { $_->res->dom } @_) {
                say STDERR $dom->at('title')->text;
            }

            $c->render(text => "Got all the links");
        }
    );
};

app->start;

When the results finally all come back from the first step, the second step will print out the page titles to STDERR, and return to the client "Got all the links".

Scott

Alexbyk (subscriptions)

Nov 12, 2016, 10:55:19 AM
to mojol...@googlegroups.com

Here is how you can fetch many URLs simultaneously, check that every response is 200 OK, and print the title tags (or catch an exception if something goes wrong).

If you're not familiar with promises, here is excellent documentation: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise

# cpanm Mojo::Pua

use Evo 'Mojo::Pua want_code; Mojo::Promise all';

my $ua   = Mojo::Pua->new;
my @urls = qw(http://alexbyk.com https://metacpan.org https://www.perl.org);

all(map { $ua->get($_)->then(want_code 200)->then(sub { shift->dom->at('title') }) } @urls)
  ->spread(sub(@titles) { say $_ for @titles })
  ->catch(sub($e) { warn $e })->finally(sub { Mojo::IOLoop->stop });

Mojo::IOLoop->start;

Throttling requests is a little bit more complex.

--
You received this message because you are subscribed to the Google Groups "Mojolicious" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mojolicious...@googlegroups.com.
To post to this group, send email to mojol...@googlegroups.com.
Visit this group at https://groups.google.com/group/mojolicious.
For more options, visit https://groups.google.com/d/optout.

Alex Povolotsky

Nov 14, 2016, 7:41:52 AM
to mojol...@googlegroups.com
Looks like I was unclear. I do not need only to extract all the links; I must continue fetching and parsing them, no more than N at a time and, if possible, no fewer. I need some kind of mirroring crawler.


Alexander Karelas

Nov 14, 2016, 7:47:01 AM
to mojol...@googlegroups.com

I'm not an expert, but what you're asking for looks like a job for Minion: https://metacpan.org/pod/Minion


Alex Povolotsky

Nov 14, 2016, 7:53:50 AM
to mojol...@googlegroups.com
I do not need an extra job manager with forks; I need to use Mojo's event-based non-blocking I/O.


Scott Wiersdorf

Nov 14, 2016, 10:46:45 AM
to Mojolicious
I agree with Alexander Karelas: your question is a textbook job queue application.

It is possible to solve this using only non-blocking user agents, but in either case you'll have to write some kind of controller that keeps track of "in-flight" requests, a queue of pending requests, and something to manage the queue. It's not a trivial application to do it well—you're essentially implementing Minion. This is likely not the right forum to ask for someone to write this for you.
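A minimal sketch of the kind of controller described above, using only Mojo::UserAgent and Mojo::IOLoop. It keeps a queue of pending URLs, a counter of in-flight requests, and a refill step that runs whenever a request finishes. The helper name crawl, its argument order, and the same-host filter are assumptions made for illustration; they come from neither Mojolicious nor the example mentioned later in the thread. It is written as a standalone script and stops the event loop when it is done, so it is not meant to be called from inside a running web application.

```perl
use strict;
use warnings;
use Mojo::UserAgent;
use Mojo::IOLoop;
use Mojo::URL;

# Hypothetical helper (not part of Mojolicious): crawl a single host starting
# at $start, keeping at most $max requests in flight. $on_page is called with
# ($url, $tx) for every page that comes back with status 200.
sub crawl {
    my ($ua, $start, $max, $on_page) = @_;

    my $first  = Mojo::URL->new($start);
    my $host   = $first->host // '';
    my @queue  = ($first);           # URLs waiting to be fetched
    my %seen   = ("$first" => 1);    # URLs already queued or fetched
    my $active = 0;                  # requests currently in flight

    my $fill;
    $fill = sub {
        # Top up to $max in-flight requests while work is pending
        while ($active < $max && @queue) {
            my $url = shift @queue;
            $active++;
            $ua->get($url => sub {
                my ($ua, $tx) = @_;
                $active--;
                if (($tx->res->code // 0) == 200) {
                    $on_page->($url, $tx);
                    my $base = $tx->req->url;
                    for my $href ($tx->res->dom->find('a[href]')->map(attr => 'href')->each) {
                        # Resolve relative links and drop the #fragment
                        my $link = Mojo::URL->new($href)->to_abs($base)->fragment(undef);
                        next unless $link->protocol =~ /^https?$/;
                        next unless ($link->host // '') eq $host;    # stay on one host
                        next if $seen{"$link"}++;
                        push @queue, $link;
                    }
                }
                $fill->();    # a slot just freed up, so refill
            });
        }
        # Nothing in flight and nothing pending: the crawl is finished
        Mojo::IOLoop->stop unless $active || @queue;
    };

    $fill->();
    Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
}
```

A call such as crawl(Mojo::UserAgent->new(max_redirects => 3), 'https://example.com/', 4, sub { my ($url, $tx) = @_; say $url }) would keep up to four downloads running until no unseen same-host links remain (example.com is a placeholder). The sketch limits parallelism only: it does no rate limiting, does not read robots.txt, and does not recover if $on_page dies.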

Scott

Alex Povolotsky

Nov 15, 2016, 6:04:17 PM
to mojol...@googlegroups.com
I've found the exact example I needed. Unfortunately, its comments and description are in Russian. I did not need anything like Minion.




Stefan Adams

Nov 15, 2016, 7:13:42 PM
to mojolicious

On Tue, Nov 15, 2016 at 5:04 PM, Alex Povolotsky <tar...@gmail.com> wrote:
I've found the exact example I needed. Unfortunately, it is commented and described in Russian. I did not need nothing like minion

Can you share the example?

Alexbyk (subscriptions)

Nov 16, 2016, 3:35:18 AM
to mojol...@googlegroups.com

You are asking too abstract a question and trying to get too specific an answer.

If you want an example that works for you, provide details: which sites you want to parse, where they come from, why you need to throttle the download queue, what exactly you are trying to limit (max connections at a time, max requests per unit of time), and how (per link, per host, per IP, per subnet).

Joel Berger

Nov 16, 2016, 11:31:38 AM
to Mojolicious
Hello, I have written this exact use-case example in a gist. I don't typically share it (and haven't put it onto CPAN) because I worry that people might abuse it. While it has parallelism limiting, it does not have rate limiting, and it doesn't check robots.txt. You need to comply with user agreements. That said, here it is; use it RESPONSIBLY.

Alex Povolotsky

Nov 16, 2016, 3:36:17 PM
to mojol...@googlegroups.com
Thanks a lot, it's an excellent example. 
