Hi all,
First, the important thing:
I searched for a long time for a scraper that works for me (I am a scraping newbie with no coding know-how), and eventually I found Portia. It is great, and thanks for your work!
Now to my problem:
I am currently working with Portia 16.02 in a Vagrant environment (on Windows 10). I created a spider for a small private project.
The spider uses a results page as its start page: a flight route between OSL (Oslo) and LAX (Los Angeles) on a specific date on expedia.de (a JavaScript-heavy site).
I annotated the page and the sample looks great (I added a second start page, and that sample also looks fine). So I ran portiacrawl in the virtual machine to check the export:
/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py:115: ScrapyDeprecationWarning: SPIDER_MANAGER_CLASS option is deprecated. Please use SPIDER_LOADER_CLASS.
self.spider_loader = _get_spider_loader(settings)
2016-02-26 14:12:10 [scrapy] INFO: Scrapy 1.0.0 started (bot: scrapybot)
2016-02-26 14:12:10 [scrapy] INFO: Optional features available: ssl, http11
2016-02-26 14:12:10 [scrapy] INFO: Overridden settings: {'FEED_FORMAT': 'xml', 'FEED_URI': 'test.xml', 'DUPEFILTER_CLASS': 'scrapyjs.SplashAwareDupeFilter'}
2016-02-26 14:12:10 [scrapy] INFO: Enabled extensions: CloseSpider, FeedExporter, TelnetConsole, LogStats, CoreStats, SpiderState
2016-02-26 14:12:13 [py.warnings] WARNING: /usr/local/lib/python2.7/dist-packages/scrapyjs/middleware.py:8: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
from scrapy import log
2016-02-26 14:12:13 [py.warnings] WARNING: /usr/local/lib/python2.7/dist-packages/scrapyjs/dupefilter.py:8: ScrapyDeprecationWarning: Module `scrapy.dupefilter` is deprecated, use `scrapy.dupefilters` instead
from scrapy.dupefilter import RFPDupeFilter
2016-02-26 14:12:13 [py.warnings] WARNING: /usr/local/lib/python2.7/dist-packages/scrapyjs/cache.py:11: ScrapyDeprecationWarning: Module `scrapy.contrib.httpcache` is deprecated, use `scrapy.extensions.httpcache` instead
from scrapy.contrib.httpcache import FilesystemCacheStorage
2016-02-26 14:12:13 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, PageActionsMiddleware, CookiesMiddleware, SlybotJsMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-02-26 14:12:13 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-02-26 14:12:13 [scrapy] INFO: Enabled item pipelines: DupeFilterPipeline
2016-02-26 14:12:13 [scrapy] INFO: Spider opened
2016-02-26 14:12:13 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-02-26 14:12:13 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-02-26 14:12:14 [scrapy] DEBUG: Redirecting (301) to <GET https://www.expedia.de/Flight-SearchResults?inpPackageType=FLIGHT_ONLY&inpInfants=2&inpFlightClass=2&inpDepartureDates=17.03.2016&inpDepartureDates=24.03.2016&inpDepartureTimes=362&inpDepartureTimes=362&inpFlightRouteType=3&inpHotelRoomCount=1&inpFlightAirlinePreference=&inpAdultCounts=1&inpIsNonstopOnly=N&intcp=0&inpChildCounts=0&action=FlightSearchResults%40searchFlights&inpSeniorCounts=0&inpRefundableFlightsOnly=N&inpSortType=0&inpDepartureLocations=Oslo%2C+Norwegen+(OSL-Alle+Flugh%C3%A4fen)&inpDepartureLoc&inttkn=NfXCUxEImb3egksh> from <GET https://goo.gl/6u5fEm>
2016-02-26 14:12:14 [scrapy] DEBUG: Crawled (200) <GET https://www.expedia.de/Flight-SearchResults?inpPackageType=FLIGHT_ONLY&inpInfants=2&inpFlightClass=2&inpDepartureDates=17.03.2016&inpDepartureDates=24.03.2016&inpDepartureTimes=362&inpDepartureTimes=362&inpFlightRouteType=3&inpHotelRoomCount=1&inpFlightAirlinePreference=&inpAdultCounts=1&inpIsNonstopOnly=N&intcp=0&inpChildCounts=0&action=FlightSearchResults%40searchFlights&inpSeniorCounts=0&inpRefundableFlightsOnly=N&inpSortType=0&inpDepartureLocations=Oslo%2C+Norwegen+(OSL-Alle+Flugh%C3%A4fen)&inpDepartureLoc&inttkn=NfXCUxEImb3egksh> (referer: None)
2016-02-26 14:12:15 [scrapy] DEBUG: Crawled (200) <GET https://www.expedia.de/Flight-SearchResults?inpPackageType=FLIGHT_ONLY&inpInfants=2&inpFlightClass=2&inpDepartureDates=17.03.2016&inpDepartureDates=24.03.2016&inpDepartureTimes=362&inpDepartureTimes=362&inpFlightRouteType=3&inpHotelRoomCount=1&inpFlightAirlinePreference=&inpAdultCounts=1&inpIsNonstopOnly=N&intcp=0&inpChildCounts=0&action=FlightSearchResults%40searchFlights&inpSeniorCounts=0&inpRefundableFlightsOnly=N&inpSortType=0&inpDepartureLocations=Oslo%2C+Norwegen+%28OSL-Alle+Flugh%C3%A4fen%29&inpDepartureLocations=Los+Angeles%2C+CA%2C+USA+%28LAX-Los+Angeles+Intl.%29&inpArrivalLocations=Los+Angeles%2C+CA%2C+USA+%28LAX-Los+Angeles+Intl.%29&inpArrivalLocations=Oslo%2C+Norwegen+%28OSL-Alle+Flugh%C3%A4fen%29&inttkn=mrRyD4CXEnw3zhjf> (referer: None)
2016-02-26 14:12:15 [scrapy] INFO: Closing spider (finished)
2016-02-26 14:12:15 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1869,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 118471,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/301': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 2, 26, 14, 12, 15, 963588),
'log_count/DEBUG': 4,
'log_count/INFO': 7,
'log_count/WARNING': 3,
'response_received_count': 2,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2016, 2, 26, 14, 12, 13, 657551)}
There are a few warnings, which seem okay. After that I checked the export file, but it is empty. I tested again with an explicitly specified XML file as the output file, but that file is also empty.
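One thing I noticed in the stats dump: as far as I understand, Scrapy adds an `item_scraped_count` entry to the stats once at least one item has been scraped, and that entry is missing here. Below is a minimal sketch of that check, assuming this reading of the stats is right. The `summarize` helper is my own name for illustration, and the dictionary is just a subset of the values copied from the log above.

```python
# Subset of the stats copied from the log above.
stats = {
    "downloader/response_status_count/200": 2,
    "downloader/response_status_count/301": 1,
    "response_received_count": 2,
    "finish_reason": "finished",
}


def summarize(stats):
    """Return (pages received, items scraped) from a Scrapy stats dict.

    Assumption: a missing 'item_scraped_count' key means no item was scraped.
    """
    pages = stats.get("response_received_count", 0)
    items = stats.get("item_scraped_count", 0)
    return pages, items


pages, items = summarize(stats)
print("pages received: %d, items scraped: %d" % (pages, items))
```

If that is correct, the pages were downloaded but no items were extracted from them, which would explain the empty file.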
Does anybody have an idea why I get no export?
Thanks!
Regards
Timo