Running pyspider


En Ware

Dec 5, 2018, 10:03:09 AM
to pyspider-users
Hello,

I used to run Scrapy, but I found out I need message queues, and I also like what pyspider has to offer. Here are my questions.

1. When I try to run the IMDB tutorial, I click "run", which gives me a follow. Then I click play and the screen goes grey, but nothing happens. I am running on localhost, port 5000, and it almost looks like it's running in a loop. What am I doing wrong?

I am using a virtualenv with PyPy 3.5. Can I not use PyPy 3.5 for my Python install?

I also see that the last commit was 6 days ago, changing Travis to Python 3.5, so I am assuming the project is still active?

2. Is there a forum or IRC channel where I can talk about pyspider, rather than just the Google Group?

I look forward to running pyspider.

- nixfreak

En Ware

Dec 5, 2018, 10:17:16 AM
to pyspider-users
OK, so I answered my own question: PyPy 3.5.3 doesn't work with pyspider, at least not when you're using the WebUI. I installed 3.6.3 instead in a brand new virtualenv, and with the same tutorial it loaded right up.
Awesome!

En Ware

Dec 5, 2018, 12:07:27 PM
to pyspider-users
Posting some code; I'm just trying to scrape IMDB.

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-12-05 09:39:49
# Project: imdb_tutorial

from pyspider.libs.base_handler import *
import re

class Handler(BaseHandler):
    crawl_config = {
    }

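    @every(minutes=24 * 60)
    def on_start(self):
        # NOTE: the start URL was lost from the original post. Replace the
        # placeholder below with the listing URL you want to crawl.
        self.crawl('START_URL_PLACEHOLDER', callback=self.index_page)
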
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            # Raw string so \d is not treated as a string escape; dots escaped.
            if re.match(r"http://www\.imdb\.com/title/tt\d+/$", each.attr.href):
                self.crawl(each.attr.href, callback=self.detail_page)
        self.crawl(response.doc('.next-page').attr.href, callback=self.index_page)
        self.crawl(response.doc('.prev-page').attr.href, callback=self.index_page)
                
    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('.lister-item-header a').text(),
            "date" : response.doc('.text-muted').text(),
        }

****************************
So right now I am able to see the next-page, prev-page, and index page links, but I am not able to extract that information.
I'm using response.doc('.lister-item-header a').text()

Can someone tell me what I am doing wrong?
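
One thing worth checking in the handler above is whether any link actually passes the re.match filter in index_page, since only matching links are handed to detail_page. A minimal sketch, using the same pattern written as a raw string with the dots escaped. The sample hrefs are invented for illustration and are not taken from a real IMDB page:

```python
import re

# Pattern from index_page, as a raw string with the dots escaped.
TITLE_URL = re.compile(r"http://www\.imdb\.com/title/tt\d+/$")

# Hypothetical hrefs, made up for this example only.
samples = [
    "http://www.imdb.com/title/tt0111161/",
    "http://www.imdb.com/title/tt0111161/?ref_=adv_li_tt",
    "https://www.imdb.com/title/tt0111161/",
    "http://www.imdb.com/name/nm0000151/",
]


def is_title_url(href):
    """Return True when href would be crawled as a detail page."""
    return TITLE_URL.match(href) is not None


# Keep only the hrefs that the filter in index_page would accept.
matches = [href for href in samples if is_title_url(href)]
print(matches)
```

Because the pattern is anchored with `$` and spells out `http://`, a link with a query string or an `https://` scheme is skipped. If the real page uses such links, detail_page would never be called, so it may be worth printing each.attr.href while debugging.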




Roy Binux

Dec 5, 2018, 1:05:49 PM
to En Ware, pyspider-users
You are using a selector for the index page to extract data from a detail page.
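
To illustrate the point, here is a rough sketch using only the standard library as a stand-in for response.doc('.lister-item-header a'). Both HTML snippets are invented for this example (in particular the detail-page markup and its class names are hypothetical), and HeaderLinkText and header_titles are illustration-only helpers, not part of pyspider:

```python
from html.parser import HTMLParser


class HeaderLinkText(HTMLParser):
    """Collect the text of <a> tags inside class="lister-item-header" elements."""

    def __init__(self):
        super().__init__()
        self.header_tag = None  # tag name of the header element we are inside
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.header_tag is None and "lister-item-header" in classes:
            self.header_tag = tag
        elif self.header_tag is not None and tag == "a":
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == self.header_tag:
            # Simplification: assumes the header tag is not nested in itself.
            self.header_tag = None

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


# Hypothetical markup for a listing (index) page.
LISTING_HTML = """
<div class="lister-item">
  <h3 class="lister-item-header"><a href="/title/tt0000001/">First Film</a></h3>
  <p class="text-muted">2018</p>
</div>
<div class="lister-item">
  <h3 class="lister-item-header"><a href="/title/tt0000002/">Second Film</a></h3>
  <p class="text-muted">2017</p>
</div>
"""

# Hypothetical markup for a detail page: no lister-item-header anywhere.
DETAIL_HTML = """
<div class="title-block"><h1>First Film</h1><span class="year">2018</span></div>
"""


def header_titles(html):
    """Return the link texts that the listing selector would find in html."""
    parser = HeaderLinkText()
    parser.feed(html)
    parser.close()
    return parser.titles


listing_titles = header_titles(LISTING_HTML)
detail_titles = header_titles(DETAIL_HTML)
print(listing_titles)
print(detail_titles)
```

The listing markup yields the titles, while the detail markup yields nothing, which is why the same selector comes back empty in detail_page. In the handler, that suggests either extracting those fields in index_page, or giving detail_page selectors that match the detail page's own markup.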
