Hello everyone!
I’m pleased to announce that we have finally reached our very first major release of Scrapy. This important milestone comes with a lot of changes aimed at Scrapy’s stability as a project, such as improved documentation, multiple bug fixes and broad code refactoring, as well as a few long-awaited features and enhancements.
Scrapy 1.0 is the result of several months of combined effort, made possible by all the developers and users who contributed to this project. We especially want to thank all the collaborators who took part in this release; we’re really grateful for such amazing work from all of you.
Here’s a quick look at some of the newly introduced utilities you may find useful:
```python
import scrapy


class MySpider(scrapy.Spider):
    # …

    # Per-spider settings that override the project-wide ones
    custom_settings = {
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    }

    def parse(self, response):
        # Follow every post link found inside an <h2> heading
        for href in response.xpath('//h2/a/@href').extract():
            # Turn a possibly relative link into an absolute URL
            full_url = response.urljoin(href)
            yield scrapy.Request(full_url, callback=self.parse_post)

    def parse_post(self, response):
        # Scraped items can be returned as plain dicts
        yield {
            'title': response.xpath('//h1').extract_first(),
            'body': response.xpath('//div[@class="content"]').extract_first(),
        }
```

This example shows how to use the `urljoin`, `extract_first` and `custom_settings` features, as well as the new support for returning plain dictionaries from your spiders. Make sure to check the latest Release Notes for the complete changelog.
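If you are wondering what `urljoin` and `extract_first` save you from writing, the standalone sketch below mimics their behaviour with the standard library only. It is not Scrapy’s implementation: `urljoin_like` and `extract_first_like` are hypothetical helper names. It resolves a link against the page URL (the base Scrapy uses when there is no `<base>` tag) and returns `None` instead of raising `IndexError` when a selector matched nothing. It runs on both Python 2.7, which Scrapy 1.0 targets, and Python 3.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def urljoin_like(base_url, href):
    # Resolve a possibly relative link against the page URL,
    # which is what response.urljoin(href) saves you from writing.
    return urljoin(base_url, href)


def extract_first_like(results):
    # Return the first extracted value, or None when nothing matched,
    # instead of raising IndexError the way results[0] would.
    return results[0] if results else None


page_url = 'http://example.com/blog/index.html'
print(urljoin_like(page_url, 'posts/scrapy-1-0.html'))
# http://example.com/blog/posts/scrapy-1-0.html
print(urljoin_like(page_url, '/about'))
# http://example.com/about
print(extract_first_like([]))
# None
```

In a real spider you would simply call `response.urljoin(href)` and `response.xpath(...).extract_first()` as in the example above.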
Scrapy 1.0.0rc1 is already available on PyPI. With this pre-release we’ll try to catch as many last-minute issues as we can before rolling out the official release next week. We encourage everyone to try it out!
Install or upgrade Scrapy to the new release candidate with pip by running this command:
```
$ pip install Scrapy==1.0.0rc1
```

Until the official 1.0 release next week, pip will keep installing the latest stable release (0.24.6) unless you explicitly specify otherwise.