Mandatory elem thumbnail missing!


Cristian Godoy

Nov 29, 2016, 4:14:25 PM
to django-dynamic-scraper
Hello Holger,
I am running into the following problem when downloading images: every item is dropped with "Mandatory elem thumbnail missing!".
Screenshots of the configuration and code are attached.
Thank you very much in advance.

cristiang@cristiang-MS-7592 ~/django-projects/django-dynamic-scraper/example_project $ scrapy crawl article_spider -a id=1 -a do_action=yes
2016-11-29 14:59:55 [scrapy] INFO: Scrapy 1.1.3 started (bot: open_news)
2016-11-29 14:59:55 [scrapy] INFO: Overridden settings: {'LOG_STDOUT': True, 'SPIDER_MODULES': [u'dynamic_scraper.spiders', u'open_news.scraper'], 'USER_AGENT': u'open_news/1.0', 'BOT_NAME': u'open_news'}
2016-11-29 14:59:55 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-11-29 14:59:55 [root] INFO: Django settings used: example_project.settings
2016-11-29 14:59:55 [root] INFO: Use THUMBS images store format (save only the thumbnail images)
2016-11-29 14:59:55 [root] INFO: Runtime config: do_action True
2016-11-29 14:59:55 [root] INFO: Spider for NewsWebsite "Wikinews" (1) initialized.
2016-11-29 14:59:55 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-11-29 14:59:55 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-11-29 14:59:55 [scrapy] INFO: Enabled item pipelines:
[u'dynamic_scraper.pipelines.DjangoImagesPipeline',
 u'dynamic_scraper.pipelines.ValidationPipeline',
 u'open_news.scraper.pipelines.DjangoWriterPipeline']
2016-11-29 14:59:55 [scrapy] INFO: Spider opened
2016-11-29 14:59:55 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-29 14:59:55 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-29 14:59:56 [scrapy] DEBUG: Redirecting (301) to <GET https://en.wikinews.org/wiki/Main_Page> from <GET http://en.wikinews.org/wiki/Main_Page>
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (200) <GET https://en.wikinews.org/wiki/Main_Page> (referer: None)
2016-11-29 14:59:57 [root] INFO: Starting to crawl item 1 from page 1.
2016-11-29 14:59:57 [root] DEBUG: url                 'http://en.wikinews.org/wiki/Judge_jails_%27monstrous%27_London_serial_killer_Stephen_Port'
2016-11-29 14:59:57 [root] DEBUG: description         'More than a year after he was first charged, a judge on Friday sentenced London serial killer Stephen Port to life imprisonment without parole for four murders and a host of poisoning and sexual offences, calling him "wicked and monstrous". Port was convicted of the murders on Wednesday.'
2016-11-29 14:59:57 [root] DEBUG: title               'Judge jails 'monstrous' London serial killer Stephen Port'
2016-11-29 14:59:57 [root] DEBUG: thumbnail           'http://upload.wikimedia.org/wikinews/en/thumb/1/1a/StephenPortMugshot.jpg/100px-StephenPortMugshot.jpg'
2016-11-29 14:59:57 [root] INFO: Starting to crawl item 2 from page 1.
2016-11-29 14:59:57 [root] DEBUG: url                 'http://en.wikinews.org/wiki/Telegram_introduces_blogging_and_instant_view_features'
2016-11-29 14:59:57 [root] DEBUG: description         'Instant messaging application Telegram announced publishing service'
2016-11-29 14:59:57 [root] DEBUG: title               'Telegram introduces blogging and instant view features'
2016-11-29 14:59:57 [root] DEBUG: thumbnail           'http://upload.wikimedia.org/wikinews/en/thumb/e/e4/Instant_view.png/56px-Instant_view.png'
2016-11-29 14:59:57 [root] INFO: Starting to crawl item 3 from page 1.
2016-11-29 14:59:57 [root] DEBUG: url                 'http://en.wikinews.org/wiki/Gerrard_announces_retirement_from_professional_football'
2016-11-29 14:59:57 [root] DEBUG: description         'On Thursday, former English and Liverpool F.C. football captain Steven Gerrard announced retirement from professional football.'
2016-11-29 14:59:57 [root] DEBUG: title               'Gerrard announces retirement from professional football'
2016-11-29 14:59:57 [root] DEBUG: thumbnail           'http://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/Gerrard_celebrates_his_second_goal_v_Everton.jpg/100px-Gerrard_celebrates_his_second_goal_v_Everton.jpg'
2016-11-29 14:59:57 [root] INFO: Starting to crawl item 4 from page 1.
2016-11-29 14:59:57 [root] DEBUG: url                 'http://en.wikinews.org/wiki/Mosque_vandalized_near_Seattle,_Washington'
2016-11-29 14:59:57 [root] DEBUG: description         'A mosque in Redmond, Washington, US, was vandalised.'
2016-11-29 14:59:57 [root] DEBUG: title               'Mosque vandalized near Seattle, Washington'
2016-11-29 14:59:57 [root] DEBUG: thumbnail           'http://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg/100px-Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg'
2016-11-29 14:59:57 [root] INFO: Starting to crawl item 5 from page 1.
2016-11-29 14:59:57 [root] DEBUG: url                 'http://en.wikinews.org/wiki/Hoene%C3%9F_re-elected_as_FC_Bayern_president'
2016-11-29 14:59:57 [root] DEBUG: description         'On Friday, Uli Hoeneß was re-elected as FC Bayern Munich's president at the club's Annual General Meeting, the German football club announced.'
2016-11-29 14:59:57 [root] DEBUG: title               'Hoeneß re-elected as FC Bayern president'
2016-11-29 14:59:57 [root] DEBUG: thumbnail           'http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Uli_Hoene%C3%9F_2503.jpg/85px-Uli_Hoene%C3%9F_2503.jpg'
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (301) <GET http://upload.wikimedia.org/wikinews/en/thumb/1/1a/StephenPortMugshot.jpg/100px-StephenPortMugshot.jpg> (referer: None)
2016-11-29 14:59:57 [scrapy] WARNING: File (code: 301): Error downloading file from <GET http://upload.wikimedia.org/wikinews/en/thumb/1/1a/StephenPortMugshot.jpg/100px-StephenPortMugshot.jpg> referred in <None>
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (301) <GET http://upload.wikimedia.org/wikinews/en/thumb/e/e4/Instant_view.png/56px-Instant_view.png> (referer: None)
2016-11-29 14:59:57 [scrapy] WARNING: File (code: 301): Error downloading file from <GET http://upload.wikimedia.org/wikinews/en/thumb/e/e4/Instant_view.png/56px-Instant_view.png> referred in <None>
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (301) <GET http://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/Gerrard_celebrates_his_second_goal_v_Everton.jpg/100px-Gerrard_celebrates_his_second_goal_v_Everton.jpg> (referer: None)
2016-11-29 14:59:57 [scrapy] WARNING: File (code: 301): Error downloading file from <GET http://upload.wikimedia.org/wikipedia/commons/thumb/5/5d/Gerrard_celebrates_his_second_goal_v_Everton.jpg/100px-Gerrard_celebrates_his_second_goal_v_Everton.jpg> referred in <None>
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (301) <GET http://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg/100px-Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg> (referer: None)
2016-11-29 14:59:57 [scrapy] WARNING: File (code: 301): Error downloading file from <GET http://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg/100px-Vandalism_at_Muslim_Association_of_Puget_Sound_02.jpg> referred in <None>
2016-11-29 14:59:57 [root] ERROR: Mandatory elem thumbnail missing!
2016-11-29 14:59:57 [scrapy] WARNING: Dropped:
{u'description': u'More than a year after he was first charged, a judge on Friday sentenced London serial killer Stephen Port to life imprisonment without parole for four murders and a host of poisoning and sexual offences, calling him "wicked and monstrous". Port was convicted of the murders on Wednesday.',
 u'thumbnail': None,
 u'title': u"Judge jails 'monstrous' London serial killer Stephen Port",
 u'url': u'http://en.wikinews.org/wiki/Judge_jails_%27monstrous%27_London_serial_killer_Stephen_Port'}
2016-11-29 14:59:57 [scrapy] DEBUG: Crawled (301) <GET http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Uli_Hoene%C3%9F_2503.jpg/85px-Uli_Hoene%C3%9F_2503.jpg> (referer: None)
2016-11-29 14:59:57 [scrapy] WARNING: File (code: 301): Error downloading file from <GET http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Uli_Hoene%C3%9F_2503.jpg/85px-Uli_Hoene%C3%9F_2503.jpg> referred in <None>
2016-11-29 14:59:57 [root] ERROR: Mandatory elem thumbnail missing!
2016-11-29 14:59:57 [scrapy] WARNING: Dropped:
{u'description': u'Instant messaging application Telegram announced publishing service',
 u'thumbnail': None,
 u'title': u'Telegram introduces blogging and instant view features',
 u'url': u'http://en.wikinews.org/wiki/Telegram_introduces_blogging_and_instant_view_features'}
2016-11-29 14:59:57 [root] ERROR: Mandatory elem thumbnail missing!
2016-11-29 14:59:57 [scrapy] WARNING: Dropped:
{u'description': u'On Thursday, former English and Liverpool F.C. football captain Steven Gerrard announced retirement from professional football.',
 u'thumbnail': None,
 u'title': u'Gerrard announces retirement from professional football',
 u'url': u'http://en.wikinews.org/wiki/Gerrard_announces_retirement_from_professional_football'}
2016-11-29 14:59:57 [root] ERROR: Mandatory elem thumbnail missing!
2016-11-29 14:59:57 [scrapy] WARNING: Dropped:
{u'description': u'A mosque in Redmond, Washington, US, was vandalised.',
 u'thumbnail': None,
 u'title': u'Mosque vandalized near Seattle, Washington',
 u'url': u'http://en.wikinews.org/wiki/Mosque_vandalized_near_Seattle,_Washington'}
2016-11-29 14:59:57 [root] ERROR: Mandatory elem thumbnail missing!
2016-11-29 14:59:57 [scrapy] WARNING: Dropped:
{u'description': u"On Friday, Uli Hoene\xdf was re-elected as FC Bayern Munich's president at the club's Annual General Meeting, the German football club announced.",
 u'thumbnail': None,
 u'title': u'Hoene\xdf re-elected as FC Bayern president',
 u'url': u'http://en.wikinews.org/wiki/Hoene%C3%9F_re-elected_as_FC_Bayern_president'}
2016-11-29 14:59:57 [scrapy] INFO: Closing spider (finished)
2016-11-29 14:59:57 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1933,
 'downloader/request_count': 7,
 'downloader/request_method_count/GET': 7,
 'downloader/response_bytes': 21964,
 'downloader/response_count': 7,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/301': 6,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 11, 29, 20, 59, 57, 973962),
 'item_dropped_count': 5,
 'item_dropped_reasons_count/DropItem': 5,
 'log_count/DEBUG': 28,
 'log_count/ERROR': 5,
 'log_count/INFO': 16,
 'log_count/WARNING': 10,
 'response_received_count': 6,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 11, 29, 20, 59, 55, 568398)}
2016-11-29 14:59:57 [scrapy] INFO: Spider closed (finished)
Captura de pantalla de 2016-11-29 18-13-28.png
Captura de pantalla de 2016-11-29 18-13-15.png
Captura de pantalla de 2016-11-29 18-12-58.png
Captura de pantalla de 2016-11-29 18-12-45.png

Cristian Godoy

Nov 30, 2016, 12:24:05 PM
to django-dynamic-scraper
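The problem was that I had set the pre_url to http while Wikinews uses https. The image requests were therefore answered with 301 redirects (see the log above), the downloads failed, and the thumbnail field ended up empty.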
Thank you anyway.
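
For anyone who runs into the same 301 warnings, here is a minimal sketch of the idea behind the fix: make sure the image URL already uses https before it reaches the images pipeline. The helper name force_https is hypothetical and is not part of django-dynamic-scraper or Scrapy; in my case the actual fix was simply changing the pre_url value in the scraper configuration. The sketch uses Python 3's urllib.parse, whereas the log above comes from a Python 2 setup.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme upgraded to https.

    Hypothetical helper for illustration only. URLs that already use
    https (or any other scheme) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# One of the thumbnail URLs from the log above.
thumb = ("http://upload.wikimedia.org/wikinews/en/thumb/e/e4/"
         "Instant_view.png/56px-Instant_view.png")
print(force_https(thumb))
```

With an https URL the server has no reason to answer with a redirect. As far as I know, Scrapy 1.4 and later also offer a MEDIA_ALLOW_REDIRECTS setting that lets the media pipelines follow redirects, but that setting is not available in the Scrapy 1.1.3 used here.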