Scrapy 1.4.0 is out


Paul Tremberth

May 23, 2017, 9:56:08 AM
to scrapy-users

Hello Scrapy users,

we released Scrapy 1.4.0 last Thursday and we hope you will like it.

It brings a bunch of bug fixes but also a handful of new features.

response.follow: the new kid in town

Check out the new response.follow shortcut method for building Request objects in your callbacks.

It is now the recommended way to do that: it is shorter to write, and it takes care of relative URLs and the response encoding for you.

So, instead of:

    for href in response.css('a::attr(href)').extract():
        url = response.urljoin(href)
        yield scrapy.Request(url, self.parse, encoding=response.encoding)

you can now write this:

    for a in response.css('a'):
        yield response.follow(a, self.parse)

FTP in Python 3

Scrapy finally supports FTP in Python 3, and even adds support for anonymous FTP sessions.

Just make sure you are using at least Twisted 17.1.

Link extractors

Link extractors also got some love regarding leading and trailing whitespace.

Their behavior is now much closer to what your regular desktop browser does when following hyperlinks.

Oh, and we disabled the default canonicalization of URLs for extracted links.

It was causing more trouble for users than anything.
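To give an idea of the whitespace handling mentioned above, here is a minimal sketch using only the standard library. It is not Scrapy's actual implementation; the helper name resolve_href and the example URLs are made up for illustration. It strips the whitespace characters that HTML5 ignores around an href value before resolving it against the page URL, which is roughly what a browser does:

```python
from urllib.parse import urljoin

# Whitespace that HTML5 strips around attribute URLs:
# space, tab, line feed, carriage return, form feed.
HTML5_WHITESPACE = " \t\n\r\x0c"


def resolve_href(base_url, href):
    """Hypothetical helper: strip surrounding whitespace, then resolve."""
    return urljoin(base_url, href.strip(HTML5_WHITESPACE))


# An href as it might appear in sloppy markup: <a href="  page2.html\n">
print(resolve_href("http://example.com/blog/", "  page2.html\n"))
```

Without the strip step, the stray spaces and newline could end up inside the extracted URL.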

Referrer policy

Handling of the “Referer” HTTP header is now driven by a customizable Referrer Policy, as defined by the W3C.

Check out the details and security implications in the dedicated docs section.
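As a rough sketch, picking a policy should come down to one line in your project settings. The fragment below assumes the setting is named REFERRER_POLICY and accepts the standard W3C policy names; "same-origin" is only an example value, so please confirm both against the docs before relying on it:

```python
# settings.py (fragment)
# Assumption: REFERRER_POLICY takes a W3C policy name such as "same-origin".
# With this value, the Referer header is only sent to pages on the same origin.
REFERRER_POLICY = "same-origin"
```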

Pretty-printing your items

Scrapy 1.4 also has a new option for pretty-printing items when you export to JSON or XML.

By default, each item is still written on its own line. But you can also get a more human-readable output by setting FEED_EXPORT_INDENT to a non-negative value.

To get a pretty-printed JSON with an indentation of two spaces, you run:

$ scrapy crawl yourspider -o items.json -s FEED_EXPORT_INDENT=2
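If you are wondering what that looks like, here is a small illustration using Python's standard json module rather than Scrapy's own exporter. The item below is invented for the example; the point is only the difference between compact and indented output:

```python
import json

# Hypothetical item, standing in for whatever your spider yields.
item = {"title": "First post", "url": "http://example.com/1"}

# Compact form: the whole item fits on a single line.
compact = json.dumps(item)

# Indented form: one key per line, nested by two spaces.
pretty = json.dumps(item, indent=2)

print(compact)
print(pretty)
```

Both forms hold the same data; the indented one is simply easier to read and to diff, at the cost of a larger file.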

We recommend that all users update Scrapy to version 1.4.0.

Pip users:

$ pip install --upgrade scrapy

Conda users:

$ conda install -c conda-forge scrapy=1.4.0

Check out the release notes for the full changelog.

Happy scraping!

/Paul, for the Scrapy team