pyspider-users
1–30 of 162
Welcome to the user group of https://github.com/binux/pyspider.
pyspider is a powerful spider (web crawler) system in Python. You can try it here.
Documentation and tutorial: http://docs.pyspider.org/
Shao Wei-Ling
Jul 2
Collaboration Opportunity Link Exchange
Dear Team at googlegroups.com, I hope this email finds you well. I am Shao Wei-Ling, an Outreach
Michael Cohen, Shubham Biswas
2
9/23/22
Using PySpider to scrape wiki
It may work if done correctly, using MySQL and DATA. On Friday, October 23, 2020 at 8:42:34 PM UTC
ecom4...@gmail.com, … Siroj bobojonov
4
5/31/22
Details page giving this FAILED error
Hi, I have an error with pyspider in (from pyspider.libs.base_handler import *), ModuleNotFoundError:
ecom4...@gmail.com
4
8/11/20
Can I recover data from scraped tasks?
Hallelujah! This one worked: https://support.oneidentity.com/kb/310376/sqlite3-database-gets-
a9106...@gmail.com
4/23/20
Q&A
What is the function of group?
From the documentation I only know that setting group to delete removes the project. Does it have any other purpose?
manbuheiniu, … ecom4...@gmail.com
17
3/26/20
Slow processing speed
Ok. That is my next challenge... working on proxies... Would that be useful with RabbitMQ? Or some
HD, … Roy Binux
12
3/16/20
Interact with pyspider via other program / command line
Send the URL to the on_message callback, and self.crawl submits the task. That's it. You don't need to
Nick Gilmour
11/12/19
nginx setup for demo deployment
Hi Roy, I would like to use pyspider in production with a deployment similar to the demo: http:/
mat...@starpos.se
9/26/19
How do I store the actual text of links together with the detail_page data?
Let's say I want to scrape news.ycombinator.com for new articles. Quite easy I would get a link
bind...@gmail.com, Roy Binux
3
5/7/19
How to set status to "FAILED" when some condition is met?
It works! Thanks! On Saturday, May 4, 2019 at 9:29:07 PM UTC+8, Roy Binux wrote: Insert data in
bind...@gmail.com
5/4/19
Q&A
Running is Invalid when using config.json
Hello, I am trying to build my result-worker but it doesn't work. When I click "run", it just
lee...@gmail.com, bind...@gmail.com
2
5/4/19
How to return the task ID
You may try print(self.task['taskid']). On Thursday, April 25, 2019 at 9:41:01 PM UTC+8, lee.
lee...@gmail.com
4/25/19
PySpider: how to merge the internal pages of a long article
The article is long and split into several internal pages. How can I crawl every page and then combine them into one complete article? For example, one chapter of a comic has many pages that all belong to the same chapter, and after crawling they should be grouped into a single page. The URL usually has this format: https://www
Alex W, … Reid Du
3
4/21/19
Q&A
Mysql Access denied with right username and password in pyspider
This problem comes from MySQL 5.7. There are two solutions: 1) use MySQL <= 5.6 2) add data
En Ware, Roy Binux
2
12/5/18
Demo links (Dead) not working
demo.pyspider.org is not running On Wed, 5 Dec 2018, 11:37 En Ware, <nixf...@gmail.com> wrote:
En Ware, Roy Binux
4
12/5/18
Running pyspider
You are using an index-page selector to extract a detail page. On Wed, 5 Dec 2018, 09:07 En Ware,
Nick Gilmour, Roy Binux
6
3/20/18
Communication with other software components & post-processing results
Great, thanks! On Tue, Mar 20, 2018 at 6:31 PM, Roy Binux <r...@binux.me> wrote: Read source
Nick Gilmour, Roy Binux
2
3/1/18
How to save logs with Sentry?
Try https://docs.sentry.io/clients/python/integrations/logging/#setup "Another option is to use
Nick Gilmour, Roy Binux
5
2/23/18
result in quotes - can't query pyspider results
I meant this one: resultdb = connect_database("sqlite+resultdb:////home/user/data/result.db
Nick Gilmour
2/18/18
Getting joined text with pyquery
Hi all, I have the following issue with pyquery in pyspider: when I use pyquery in pyspider to get
zenghailo...@gmail.com
1/29/18
Exception: HTTP 599: Protocol "https" not supported or disabled in libcurl
I installed pyspider on a Mac, and HTTP requests work normally after starting it, but HTTPS requests this
Nick Gilmour
1/28/18
How Is robots.txt supported?
Hi all, As far as I have seen here: https://github.com/binux/pyspider/issues/218 and here: https://
yyi...@gmail.com
1/15/18
Cronjob only reloads the index page but does not reload the detail page
Hi, I have written some code and am trying to insert the data into MongoDB. It is fine on the first run
Nick Gilmour, Roy Binux
5
12/24/17
Importing another project?
OK, many thanks! On Sun, Dec 24, 2017 at 10:15 PM, Roy Binux <r...@binux.me> wrote: Yes, your
Nick Gilmour, Roy Binux
5
12/24/17
Questions about crawled links
OK, many thanks. I'll try... On Sun, Dec 24, 2017 at 10:19 PM, Roy Binux <r...@binux.me>
tjluoz...@gmail.com
10/15/17
How does pyspider implement the CSS helper in its embedded browser?
Kasem A
3
10/10/17
Fwd: Us congress hearing of maan alsaan Money laundry (Congress money laundering case of billionaire Maan Al-Sanea)
YouTube videos of US Congress money laundering hearing of Saudi Billionaire " Maan Al sanea
zev...@hotmail.com, Roy Binux
3
10/9/17
How can I recrawl failed task only?
I wanted to override on_result so that when detail_page fails to crawl, the program can export the URL
www...@gmail.com, Roy Binux
3
9/22/17
Can pyspider crawl a single page continuously?
Thanks for the tip. I will consider implementing it directly with phantomjs, or do you have any suggestions? I am also reading the pyspider fetcher code to see whether I can add the feature myself; I would be even more grateful if you could give me some pointers. On
Nick Gilmour, Roy Binux
3
6/14/17
No results with middle function
Are you actually crawling the same URL? On Mon, 12 Jun 2017, 09:27 Nick Gilmour, <nicke...@gmail.com