pyspider-users

Welcome to the user group of https://github.com/binux/pyspider.

pyspider is a powerful spider (web crawler) system in Python. You can try it here.

Documentation and tutorial: http://docs.pyspider.org/

Jul 2

Collaboration Opportunity Link Exchange

Dear Team at googlegroups.com, I hope this email finds you well. I am Shao Wei-Ling, an Outreach

Michael Cohen, Shubham Biswas (2 messages)

9/23/22

Using PySpider to scrape wiki

It may work if done correctly, using MySQL and DATA On Friday, October 23, 2020 at 8:42:34 PM UTC

ecom4...@gmail.com, … Siroj bobojonov (4 messages)

5/31/22

Details page giving this FAILED error

Hi, I get an error with pyspider at (from pyspider.libs.base_handler import *): ModuleNotFoundError:

ecom4...@gmail.com (4 messages)

8/11/20

Can I recover data from scraped tasks?

Hallelujah! This one worked: https://support.oneidentity.com/kb/310376/sqlite3-database-gets-

a9106...@gmail.com

4/23/20

What is the group setting for?

From the documentation I only know that setting group to delete deletes a project. Does it have any other purpose?

manbuheiniu, … ecom4...@gmail.com (17 messages)

3/26/20

Slow processing speed

OK. That is my next challenge: working on proxies. Would that be useful with RabbitMQ? Or some

HD, … Roy Binux (12 messages)

3/16/20

Interact with pyspider via other program / command line

Send the URL to the on_message callback and have self.crawl submit the task. That's it. You don't need to

11/12/19

nginx setup for demo deployment

Hi Roy, I would like to use pyspider in production with a deployment similar to the demo: http:/

mat...@starpos.se

9/26/19

How do I store the actual text of links together with the detail_page data?

Let's say I want to scrape news.ycombinator.com for new articles. Quite easily, I would get a link
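In pyspider, the usual way to carry data from an index page to its detail page is the save argument of self.crawl, which the callback reads back as response.save. The sketch below only illustrates the extraction half, pairing each link's href with its visible text, using Python's standard html.parser so it runs without pyspider. The names links_with_text, LinkTextParser and link_text are hypothetical.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> element currently open
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so nested tags produce clean text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def links_with_text(html):
    """Return dicts shaped like a hypothetical save payload for each link."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return [{"url": href, "link_text": text} for href, text in parser.links]


if __name__ == "__main__":
    sample = '<a href="https://example.com/a">First <b>story</b></a> <a href="/b">Second</a>'
    for item in links_with_text(sample):
        print(item)
```

Inside a pyspider handler the same idea is usually written with response.doc, for example iterating response.doc('a[href^="http"]').items() and calling self.crawl(each.attr.href, callback=self.detail_page, save={'link_text': each.text()}); detail_page can then return response.save['link_text'] alongside its own fields.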

bind...@gmail.com, Roy Binux (3 messages)

5/7/19

How do I set the status to "FAILED" under some condition?

It works! Thanks! On Saturday, May 4, 2019 at 9:29:07 PM UTC+8, Roy Binux wrote: Insert data in

bind...@gmail.com

5/4/19

Running is invalid when using config.json

Hello, I am trying to build my result-worker, but it doesn't work. When I click "run", it just

lee...@gmail.com, bind...@gmail.com (2 messages)

5/4/19

How do I return the task ID?

You may try print(self.task['taskid']). On Thursday, April 25, 2019 at 9:41:01 PM UTC+8, lee.
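According to the pyspider documentation for self.crawl, the default taskid is the MD5 hash of the task URL, and a handler can change this by overriding get_taskid(self, task). The minimal sketch below recomputes that default with hashlib, which can help match ids outside pyspider. The function name default_taskid is hypothetical, and hashing the URL as UTF-8 with no normalisation is an assumption.

```python
import hashlib


def default_taskid(url):
    """Recompute pyspider's documented default task id: MD5 of the URL.

    Assumption: the URL is hashed as UTF-8 text with no normalisation.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(default_taskid("http://example.com/"))
```

If a project overrides get_taskid (for example to include POST data in the id), this default no longer applies and self.task['taskid'] is the value to trust.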

lee...@gmail.com

4/25/19

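PySpider: how to merge the internal pages of a long article

The article is very long and split into several pages. How can I crawl every page and then combine them into one complete article? For example, a comic chapter has many pages that all belong to the same episode, and after crawling they should be grouped into a single page. The URLs usually look like this: https://www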
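One common approach is to crawl every page of a chapter, tag each result with its chapter and page number, and merge the pieces afterwards; inside pyspider, the chapter id could be carried to each page's callback through the save argument of self.crawl. The sketch below shows only the merge step in plain Python. The field names chapter, page and content and the function merge_chapters are hypothetical, not part of pyspider's result format.

```python
from collections import defaultdict


def merge_chapters(pages):
    """Group crawled pages by chapter and join them in page order.

    `pages` is an iterable of dicts with hypothetical keys 'chapter',
    'page' (an int) and 'content'; pages may arrive in any order.
    """
    by_chapter = defaultdict(list)
    for p in pages:
        by_chapter[p["chapter"]].append((p["page"], p["content"]))
    # Sort each chapter's pieces by page number before joining.
    return {
        chapter: "\n".join(content for _, content in sorted(parts, key=lambda x: x[0]))
        for chapter, parts in by_chapter.items()
    }


if __name__ == "__main__":
    crawled = [
        {"chapter": "ch1", "page": 2, "content": "second half"},
        {"chapter": "ch1", "page": 1, "content": "first half"},
    ]
    print(merge_chapters(crawled)["ch1"])
```

Merging after the crawl avoids depending on the order in which pages are fetched; the alternative of chaining pages one by one and accumulating text in save works too, but a single failed page then stalls the whole chapter.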

Alex W, … Reid Du (3 messages)

4/21/19

MySQL access denied with the correct username and password in pyspider

This problem comes from MySQL 5.7. You have two solutions: 1) use MySQL <= 5.6, 2) add data

En Ware, Roy Binux (2 messages)

12/5/18

Demo links (Dead) not working

demo.pyspider.org is not running On Wed, 5 Dec 2018, 11:37 En Ware, <nixf...@gmail.com> wrote:

En Ware, Roy Binux (4 messages)

12/5/18

Running pyspider

You are using an index-page selector to extract a detail page. On Wed, 5 Dec 2018, 09:07 En Ware,

Nick Gilmour, Roy Binux (6 messages)

3/20/18

Communication with other software components & post-processing results

Great, thanks! On Tue, Mar 20, 2018 at 6:31 PM, Roy Binux <r...@binux.me> wrote: Read source

Nick Gilmour, Roy Binux (2 messages)

3/1/18

How to save logs with Sentry?

Try https://docs.sentry.io/clients/python/integrations/logging/#setup "Another option is to use

Nick Gilmour, Roy Binux (5 messages)

2/23/18

result in quotes - can't query pyspider results

I meant this one: resultdb = connect_database("sqlite+resultdb:////home/user/data/result.db
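If the "quotes" are the JSON encoding of each result (an assumption: pyspider's SQL result backends appear to store the result dict as JSON text), then a plain SQL query returns that encoded string, and decoding it with json.loads gives the original structure back. A minimal sketch with Python's sqlite3 and json modules; the function name load_results, the table name and the url/result column layout are assumptions to check against the actual result.db before use.

```python
import json
import sqlite3


def load_results(conn, table):
    """Yield (url, decoded_result) rows from a results table.

    Assumption: the table has `url` and `result` columns and `result`
    holds JSON text; inspect the real schema of your result.db first.
    """
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    for url, raw in conn.execute(f"SELECT url, result FROM {quoted}"):
        yield url, json.loads(raw)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name and schema, for demonstration only.
    conn.execute('CREATE TABLE "my_project" (taskid TEXT PRIMARY KEY, url TEXT, result TEXT, updatetime REAL)')
    conn.execute(
        'INSERT INTO "my_project" VALUES (?, ?, ?, ?)',
        ("abc", "http://example.com/", json.dumps({"title": "Example"}), 0.0),
    )
    for url, result in load_results(conn, "my_project"):
        print(url, result["title"])
```

Against a real file, sqlite3.connect would be given the path of result.db instead of ":memory:", and the table name found with a query on sqlite_master.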

2/18/18

Getting joined text with pyquery

Hi all, I have the following issue with pyquery in pyspider: when I use pyquery in pyspider to get

zenghailo...@gmail.com

1/29/18

Exception: HTTP 599: Protocol "https" not supported or disabled in libcurl

I installed pyspider on a Mac. After starting it, HTTP requests work normally, but HTTPS requests this

1/28/18

How is robots.txt supported?

Hi all, As far as I have seen here: https://github.com/binux/pyspider/issues/218 and here: https://
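Whatever pyspider itself does with robots.txt (see the linked GitHub issue), a handler can check the rules on its own before calling self.crawl. A minimal sketch using the standard library's urllib.robotparser; fetching robots.txt, caching one checker per host, and the default user-agent string "pyspider" are assumptions left outside the sketch, and make_robots_checker is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt, user_agent="pyspider"):
    """Return a function that says whether `user_agent` may fetch a URL.

    `robots_txt` is the raw text of a site's robots.txt, fetched beforehand;
    the default user-agent string is an arbitrary assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    allowed = make_robots_checker("User-agent: *\nDisallow: /private/\n")
    print(allowed("http://example.com/private/page"))
```

In a handler, the robots.txt text could itself be fetched with self.crawl into a dedicated callback, and later self.crawl calls skipped for any URL the checker rejects.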

yyi...@gmail.com

1/15/18

Cron job only reloads the index page, not the detail page

Hi, I have written some code that tries to insert the data into MongoDB. It works fine when the first run

Nick Gilmour, Roy Binux (5 messages)

12/24/17

Importing another project?

OK, many thanks! On Sun, Dec 24, 2017 at 10:15 PM, Roy Binux <r...@binux.me> wrote: Yes, your

Nick Gilmour, Roy Binux (5 messages)

12/24/17

Questions about crawled links

OK, many thanks. I'll try... On Sun, Dec 24, 2017 at 10:19 PM, Roy Binux <r...@binux.me>

tjluoz...@gmail.com

10/15/17

How does pyspider implement the CSS helper in the embedded browser?

10/10/17

Fwd: US Congress hearing of maan alsaan money laundering: the Congress money-laundering case of billionaire Maan Al-Sanea

YouTube videos of US Congress money laundering hearing of Saudi Billionaire " Maan Al sanea

zev...@hotmail.com, Roy Binux (3 messages)

10/9/17

How can I recrawl only failed tasks?

I wanted to override on_result so that when detail_page fails to crawl, the program can export the URL

www...@gmail.com, Roy Binux (3 messages)

9/22/17

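Can pyspider continuously crawl a single page?

Thanks for the hint. I'll consider implementing it directly with phantomjs; or do you have any suggestions? I'm also reading the pyspider fetcher code to see whether I can add the feature myself. If you could give me some pointers, I'd be even more grateful. On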

Nick Gilmour, Roy Binux (3 messages)

6/14/17

No results with middle function

Are you actually crawling same URL? On Mon, 12 Jun 2017, 09:27 Nick Gilmour, <nicke...@gmail.com
