Hello Shankar,
It's great to know you're looking forward to participating in GSoC 2017!
For the HTTP/1.1 downloader project, I suggest you start by getting familiar with how Scrapy uses Twisted in [1].
Scrapy uses Twisted's Agent [2] and customizes it to handle proxies with the CONNECT method, TLS connections that skip peer certificate verification or use a specific TLS method, etc.
Re-reading the project description [3], I noticed that parts of it are out of date.
In particular, Scrapy no longer ships Twisted code in scrapy.xlib.tx.
Also, you can check the open HTTP-related issues on GitHub.
For example, there's an old ticket about handling responses without a reason phrase [4].
Here is a recent pull request to Twisted [5] by Rolango, one of the Scrapy contributors, that makes it possible to customize the HTTP client parser.
This could be a prerequisite for the GSoC project.
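To make the reason-phrase issue mentioned above concrete, here is a minimal sketch of a lenient status-line parser. It is only an illustration in plain Python: the function name parse_status_line is hypothetical, and this is not how Twisted's or Scrapy's parser is actually written.

```python
def parse_status_line(line: bytes):
    """Split an HTTP/1.x status line into (version, status, reason).

    Hypothetical helper for illustration only. It tolerates a missing
    reason phrase, e.g. b"HTTP/1.1 200" instead of b"HTTP/1.1 200 OK".
    """
    # Split at most twice so that spaces inside the reason phrase survive.
    parts = line.rstrip(b"\r\n").split(b" ", 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError("malformed status line: %r" % (line,))
    version = parts[0]
    status = int(parts[1].decode("ascii"))
    # A strict parser would reject the two-field form; here it maps to b"".
    reason = parts[2] if len(parts) == 3 else b""
    return version, status, reason
```

A server that answers with just "HTTP/1.1 200" then yields an empty reason phrase instead of an error, which is the kind of leniency a customizable client parser would allow.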
I would also say that Scrapy's HTTP/1.1 download handler needs more thorough tests covering the various good and bad practices of web servers, especially for HTTP proxies and TLS connections.
Just to name a few:
- servers that never respond
- servers that send fewer bytes than advertised [6]
- servers that are very slow or throttle heavily
Some of these tests are already implemented, but others are incomplete or not very robust (see [7]).
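As an illustration of the "fewer bytes than advertised" case from the list above, here is a rough sketch of a misbehaving server and a client that hits it. It uses only the Python standard library (socket, threading, http.client) rather than Twisted or Scrapy's own test helpers, and the function names and byte counts are made up for the example.

```python
import http.client
import socket
import threading


def serve_truncated_once(listener, advertised=100, sent=10):
    """Accept one connection, advertise `advertised` bytes, send only `sent`."""
    conn, _ = listener.accept()
    with conn:
        # Read the whole request head before answering.
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Length: %d\r\n"
            "Connection: close\r\n\r\n" % advertised
        ).encode("ascii")
        conn.sendall(head + b"x" * sent)
    # Leaving the `with` block closes the socket before the body is complete.


def fetch_from_truncating_server():
    """Return (bytes_received, bytes_still_expected) as seen by the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)  # do not wait forever if the client never connects
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    thread = threading.Thread(
        target=serve_truncated_once, args=(listener,), daemon=True
    )
    thread.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        try:
            response.read()
        except http.client.IncompleteRead as exc:
            # The client noticed the body was shorter than Content-Length.
            return len(exc.partial), exc.expected
        return None
    finally:
        client.close()
        thread.join()
        listener.close()
```

A real test for the download handler would point Scrapy at such a server and check how the truncated response is reported, but the server side can stay about this small.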
Finally, for bonus points, it would be great to see how far Scrapy is from supporting an HTTP/2 client (see [8]).
Hope this helps,
Paul.