Can't control what scrapy logs to stdout

Hartley Brody

Sep 5, 2014, 10:57:56 AM
to scrapy...@googlegroups.com
I'm running Scrapy as a cron job, so all output sent to stdout gets emailed to me at the end of the day -- currently dozens of MB worth. Most of the log lines are INFO messages that I'm trying to suppress, but I still want WARNING, ERROR, and CRITICAL printed to stdout so that those still get emailed to me.

I know about the logging settings, and am currently using:

```
LOG_LEVEL = 'WARNING'
LOG_FILE = '/path/to/scrapy.log'
LOG_STDOUT = False
```

in my `settings.py`. These settings seem to be doing the right thing for the log *file* -- only the right messages are logged there -- but I'm still seeing everything (including INFO) printed to stdout. I've also tried running the scraper with the `scrapy crawl <spider> -L WARNING` flag, but I'm still seeing INFO messages on stdout.

Is there a setting I'm missing somewhere that controls what gets sent to stdout? I don't want to pipe the output to /dev/null, since I still want WARNING and up sent to stdout, but I don't see another way to do this.

Nicolás Alejandro Ramírez Quiros

Sep 5, 2014, 3:12:04 PM
to scrapy...@googlegroups.com
Review your code again because the settings are working fine.
https://gist.github.com/nramirezuy/e75d8c041b07a8edb44f
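A minimal sketch of that kind of test (a guess at the shape of the gist, not its actual contents): a bare spider using the pre-1.0 `scrapy.log` API. With `LOG_LEVEL = 'WARNING'` in `settings.py`, the INFO call should be suppressed while the WARNING call still shows up:

```
from scrapy import log
from scrapy.spider import Spider


class DemoSpider(Spider):
    # spider name and URL are placeholders for illustration
    name = "demo"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # suppressed when LOG_LEVEL = 'WARNING'
        log.msg("Parsed {0}".format(response.url), level=log.INFO)
        # still emitted when LOG_LEVEL = 'WARNING'
        log.msg("Something looks off on {0}".format(response.url), level=log.WARNING)
```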

Hartley Brody

Sep 5, 2014, 4:45:27 PM
to scrapy...@googlegroups.com
Tried the `-s` flag, but I'm still seeing INFO log lines:


```
$> scrapy crawl detail -s LOG_LEVEL=WARNING
2014-09-05 16:40:46-0400 [scrapy] INFO: Scrapy 0.24.4 started (bot: detail)
2014-09-05 16:40:46-0400 [scrapy] INFO: Optional features available: ssl, http11
2014-09-05 16:40:46-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'crawler.spiders', 'LOG_LEVEL': 'WARNING', 'SPIDER_MODULES': ['crawler.spiders'], 'BOT_NAME': 'chrome_store_crawler', 'USER_AGENT': '...', 'DOWNLOAD_DELAY': 0.3}
2014-09-05 16:40:47-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-09-05 16:40:48-0400 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-09-05 16:40:48-0400 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-09-05 16:40:48-0400 [scrapy] INFO: Enabled item pipelines: CsvExporterPipeline
2014-09-05 16:40:48-0400 [detail] INFO: Spider opened

......
```

Are there any settings that could conflict with this? I'm running Scrapy v0.24.4.

Hartley Brody

Sep 5, 2014, 4:51:32 PM
to scrapy...@googlegroups.com
Here's a line where I'm writing a message to the log:

```
log.msg("Parsing sitemap: {0}".format(response.url), level=log.INFO)
```

Hartley Brody

Sep 5, 2014, 5:05:59 PM
to scrapy...@googlegroups.com
I think I've found the issue. 

I'm logging from within a spider and had called `log.start()` in the `__init__` method of my Spider, as recommended in the docs.

When I removed that line, the logging behaved as expected and the LOG_LEVEL setting was honored.

Seems like calling `log.start()` overrides the settings. I'll file a bug on the project.

You can test for this by taking the code you included above and adding `log.start()` to the `__init__` method of your spider.
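For context, the offending pattern looked roughly like this (a simplified sketch, not the exact spider code; the class name is illustrative):

```
from scrapy import log
from scrapy.spider import Spider


class DetailSpider(Spider):
    name = "detail"

    def __init__(self, *args, **kwargs):
        super(DetailSpider, self).__init__(*args, **kwargs)
        # removing this call makes LOG_LEVEL behave as expected;
        # with it, INFO lines leak back onto stdout
        log.start()

    def parse(self, response):
        log.msg("Parsing sitemap: {0}".format(response.url), level=log.INFO)
```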