Check the logs.
--
You received this message because you are subscribed to the Google Groups "pyspider-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyspider-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pyspider-users/10640a13-f51e-4349-84c1-1276275db1ec%40googlegroups.com.
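It has nothing to do with "updated"; the project is already reported as updated at startup. How did you deploy it?
Show me the startup commands you used for the deployment, and how many instances of each component are running.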
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# vim: set et sw=4 ts=4 sts=4 ff=unix fenc=utf8:
# Created on 2014-12-12 11:26:00

from pyspider.libs.base_handler import *
import re
import json
import time
from pyquery import PyQuery


class Handler(BaseHandler):
    '''
    this is a sample handler
    '''
    crawl_config = {
        "headers": {
            "User-Agent": "BaiDuSpider",
        }
    }

    def on_start(self):
        self.crawl('http://blog.chinaunix.net/', callback=self.index_page)

    @every(60)
    def cronjob(self):
        self.on_start()

    def index_page(self, response):
        for each in response.doc('A').items():
            searchObj2 = re.search( r'(http://blog.chinaunix.net/uid-(\d*)-id-(\d*).html$)', each.attr.href, re.M|re.I)
            if searchObj2:
                self.crawl(each.attr.href, callback=self.detail_page,priority=9,age=60*60*24*2)
            searchObj = re.search( r'(http://blog.chinaunix.net/uid/(\d*).html$)|(http://blog.chinaunix.net/(.*).html$)', each.attr.href, re.M|re.I)
            if searchObj:
                self.crawl(each.attr.href, callback=self.index_page,age=60*60*24*2)

    def detail_page(self, response):
        text = response.doc('div.Blog_con2_1').text()
        yd=re.match( u'.*阅读\((\d+)\)',text)
        hf=re.match( u'.*评论\((\d+)\)',text)
        if yd:
            rds = yd.group(1)
        else:
            rds = 0
        if hf:
            hfs = hf.group(1)
        else:
            hfs = 0
        return {
            "cid": self.task['taskid'],
            "url": response.url,
            "title": response.doc('div.Blog_tit4>a').text(),
            "posttime": response.doc('div.Blog_tit4>em').text(),
            #"content": response.doc('div.Blog_wz1').text(),
            "comment": hfs,
            "reads": rds,
            #"sitename": "ChinaUnix博客",
            #"siteurl": "blog.chinaunix.net",
            #"ctime":time.time()
        }
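The URL routing in index_page and the count extraction in detail_page can be checked outside pyspider. The following is only a sketch under assumptions: the sample URLs and page text in the usage note are made up, the dots in the patterns are escaped (the posted code leaves them unescaped), and re.search replaces the leading ".*" so that counts are still found when the text contains newlines. Note that in the handler as posted, a detail URL also appears to match the second, broader pattern, so both crawl calls would fire for it; the sketch checks the detail pattern first.

```python
import re

# Patterns modeled on the handler above. Escaping the dots is a change
# from the posted code and reflects an assumption about the intent.
DETAIL_RE = re.compile(r'http://blog\.chinaunix\.net/uid-(\d+)-id-(\d+)\.html$', re.I)
INDEX_RE = re.compile(r'http://blog\.chinaunix\.net/.*\.html$', re.I)
READS_RE = re.compile(r'阅读\((\d+)\)')      # "reads(N)" on the page
COMMENTS_RE = re.compile(r'评论\((\d+)\)')   # "comments(N)" on the page


def classify(url):
    """Return which callback a link would be routed to."""
    if DETAIL_RE.search(url):
        return 'detail'
    if INDEX_RE.search(url):
        return 'index'
    return 'skip'


def extract_counts(text):
    """Pull read and comment counts out of page text as integers."""
    reads = READS_RE.search(text)
    comments = COMMENTS_RE.search(text)
    return {
        'reads': int(reads.group(1)) if reads else 0,
        'comment': int(comments.group(1)) if comments else 0,
    }
```

For example, classify('http://blog.chinaunix.net/uid-123-id-456.html') gives 'detail', and extract_counts always returns integers, whereas the posted handler returns a string when the pattern matches and the integer 0 otherwise.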
The "one" mode is completely different. It is meant only for debugging; it is not a "separate deployment". The setup described at http://docs.pyspider.org/en/latest/Deployment/ is the recommended deployment for production.
I asked you to deploy the components separately. Did you deploy it that way?
Errors from the processor component:
[D 150204 06:00:00 project_module:124] project: cnblog updated.
[E 150204 06:00:00 base_connection:299] Socket Error on fd 7: 104
[W 150204 06:00:00 base_connection:160] Socket closed when connection was open
[E 150204 06:00:00 rabbitmq:39] RabbitMQ error: The AMQP connection was closed, reconnect.
[I 150204 06:00:00 base_connection:179] Connecting to 10.129.82.172:5672
[I 150204 06:00:00 processor:153] process cnblog:_on_cronjob data:,_on_cronjob -> [200] len:11 -> result:None fol:1 msg:0 err:None
[E 150204 07:00:00 base_connection:299] Socket Error on fd 7: 104
[W 150204 07:00:00 base_connection:160] Socket closed when connection was open
[E 150204 07:00:00 rabbitmq:39] RabbitMQ error: The AMQP connection was closed, reconnect.
[I 150204 07:00:00 base_connection:179] Connecting to 10.129.82.172:5672
[I 150204 07:00:00 processor:153] process cnblog:_on_cronjob data:,_on_cronjob -> [200] len:11 -> result:None fol:1 msg:0 err:None
[D 150204 08:00:00 project_module:124] project: cnblog updated.
[E 150204 08:00:00 base_connection:299] Socket Error on fd 7: 104
[W 150204 08:00:00 base_connection:160] Socket closed when connection was open
[E 150204 08:00:00 rabbitmq:39] RabbitMQ error: The AMQP connection was closed, reconnect.
[I 150204 08:00:00 base_connection:179] Connecting to 10.129.82.172:5672
[I 150204 08:00:00 processor:153] process cnblog:_on_cronjob data:,_on_cronjob -> [200] len:11 -> result:None fol:1 msg:0 err:None
[E 150204 08:22:05 base_connection:299] Socket Error on fd 6: 104
[W 150204 08:22:05 base_connection:160] Socket closed when connection was open
[E 150204 08:22:05 rabbitmq:39] RabbitMQ error: The AMQP connection was closed, reconnect.
[I 150204 08:22:05 base_connection:179] Connecting to 10.129.82.172:5672
[I 150204 08:22:05 processor:153] process cnblog:on_start data:,on_start -> [200] len:8 -> result:None fol:1 msg:0 err:None
Errors from the scheduler component:
[I 150204 08:29:43 task_queue:151] [processing: retry] 46295bb3c582484caae1f03736114fac
[I 150204 08:29:44 task_queue:151] [processing: retry] d9ad73b4e68653f0f172ac81e536b99b
[I 150204 08:29:45 task_queue:151] [processing: retry] 2d4ba4af7e5f25afce0314b526f8ffc9
[I 150204 08:29:47 task_queue:151] [processing: retry] f5cc10591d19c04df9aee8658ce2c852
[I 150204 08:29:48 task_queue:151] [processing: retry] 840aef41f8e2ed033c49eda1a70d20b8
[I 150204 08:29:49 task_queue:151] [processing: retry] 4e422573552d37ef898c2765f2335394
[I 150204 08:29:49 task_queue:151] [processing: retry] 4987016b312bbc8945b0b20614889976
[I 150204 08:29:50 task_queue:151] [processing: retry] 9d063720f81a0232a8695c6bc9b2e3e3
[I 150204 08:29:53 task_queue:151] [processing: retry] a9568138dc694a8f5761864e075eedcb
[I 150204 08:29:54 task_queue:151] [processing: retry] 6570e021436da9fdf6dbc2a49a1719e5
[I 150204 08:29:55 task_queue:151] [processing: retry] 96de45b4cff0c2f162f005f38d19b667
[I 150204 08:29:56 task_queue:151] [processing: retry] 9e3edb3e75263ca4fbe91877a3edb5bc
The logs of the other components look normal; they are just waiting.
RabbitMQ service status: every queue shows a count of 0, and no data is entering or leaving the queues.
A separate deployment cannot use the built-in queue. Follow the documentation to deploy: http://docs.pyspider.org/en/latest/Deployment/
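As a rough illustration of what that means in practice, the sketch below builds a shared config with an external message queue and lists one startup command per component. It is written from memory of the Deployment page, so treat the key names and connection-string formats as assumptions to verify against the docs; the host names, credentials, and database names are placeholders, not values from this thread.

```python
import json

# Placeholder connection strings (assumed formats, hypothetical hosts
# and credentials). Every component reads the same file.
config = {
    "taskdb": "mysql+taskdb://user:password@db-host:3306/taskdb",
    "projectdb": "mysql+projectdb://user:password@db-host:3306/projectdb",
    "resultdb": "mysql+resultdb://user:password@db-host:3306/resultdb",
    # External queue shared by all components (RabbitMQ over AMQP),
    # used in place of the built-in in-process queue.
    "message_queue": "amqp://user:password@mq-host:5672/%2F",
}

# Each component runs as its own process. Only one scheduler instance
# should run; the other components can run several instances.
components = ["scheduler", "fetcher", "processor", "result_worker", "webui"]


def render_config():
    """Return the config as JSON text, ready to save as config.json."""
    return json.dumps(config, indent=2)


def startup_commands(config_path="config.json"):
    """Return one pyspider command line per component."""
    return ["pyspider -c {} {}".format(config_path, name) for name in components]


if __name__ == "__main__":
    print(render_config())
    for command in startup_commands():
        print(command)
```

Posting the printed command lines together with the number of instances started for each component would answer the question asked earlier in the thread.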
processing: retry. I'll keep watching it for a while.
On Wednesday, February 4, 2015 at 9:41:07 AM UTC+8, Roy Binux wrote: ...
Post the scheduler log from further back; its timestamps don't line up with the processor log.
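One way to check whether two log excerpts cover the same period is to parse the timestamp prefix of each line and compare the ranges. This is a minimal sketch, assuming the "[LEVEL YYMMDD HH:MM:SS module:line]" prefix seen in the logs above; the function names are illustrative and not part of pyspider.

```python
import re
from datetime import datetime

# Matches the prefix of lines such as
# "[I 150204 08:29:43 task_queue:151] [processing: retry] ..."
LOG_RE = re.compile(r'^\[([DIWE]) (\d{6} \d{2}:\d{2}:\d{2}) ([\w.]+):(\d+)\]')


def time_range(lines):
    """Return (first, last) timestamps found in the lines, or None."""
    stamps = []
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(2), '%y%m%d %H:%M:%S'))
    return (min(stamps), max(stamps)) if stamps else None


def overlaps(a, b):
    """True when two (start, end) ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


# First and last lines of each excerpt posted in this thread.
processor_log = [
    "[D 150204 06:00:00 project_module:124] project: cnblog updated.",
    "[I 150204 08:22:05 processor:153] process cnblog:on_start data:,on_start -> [200] len:8 -> result:None fol:1 msg:0 err:None",
]
scheduler_log = [
    "[I 150204 08:29:43 task_queue:151] [processing: retry] 46295bb3c582484caae1f03736114fac",
    "[I 150204 08:29:56 task_queue:151] [processing: retry] 9e3edb3e75263ca4fbe91877a3edb5bc",
]

if __name__ == "__main__":
    print(overlaps(time_range(processor_log), time_range(scheduler_log)))
```

With the excerpts posted here, the processor log ends at 08:22:05 and the scheduler log starts at 08:29:43, so the ranges do not overlap, which is why an earlier part of the scheduler log is needed.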