help in contributing for GSOC2017


SHANKAR JHA

Feb 25, 2017, 4:58:12 PM
to scrapy-users
Hi Scrapians,

I am studying computer science and am new to GSoC. I have always wanted to contribute to an open source project.

I just found the "new HTTP/1.1 downloader handler" idea in the Scrapy project ideas and I am very excited to work on it.
I am good at Python and HTTP but have never worked with Twisted.

Can anyone help me and point me in the right direction so that I can start working on my project?

With Due Regards,
Shankar Jha




Paul Tremberth

Feb 27, 2017, 6:47:58 AM
to scrapy...@googlegroups.com
Hello Shankar,

It's great to know you're looking forward to participating in GSoC 2017!
For the HTTP/1.1 downloader project, I suggest that you get familiar with how Scrapy uses Twisted in [1]

Scrapy uses Twisted Agent [2] and customizes it to handle proxies with the CONNECT method, TLS connections without verifying peer certificates or using a specific TLS method, etc.

Note that, on re-reading it, some of the project description [3] is out of date.
In particular, Scrapy no longer ships Twisted code in scrapy.xlib.tx

Also, you can check the open issues around HTTP in GitHub.
For example, there's an old ticket about handling responses without a reason phrase [4]
Here is a recent Pull Request to Twisted [5] by one of the contributors to Scrapy, namely Rolando, to be able to customize the HTTP client parser.
This could be a pre-requisite for the GSoC project.

I would also say that Scrapy HTTP/1.1 download handler needs more thorough tests, with all the various good and bad practices from web servers, especially for HTTP proxies and TLS connections.
Just to name a few:
- servers that never respond
- servers that send fewer bytes than advertised [6]
- servers that are very slow or throttle heavily
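
To illustrate the "fewer bytes than advertised" case, here is a minimal sketch using only the Python standard library. It is not Scrapy's or Twisted's test code; the function names are hypothetical. A throwaway local server claims a Content-Length of 100 but sends only 10 body bytes before closing, and the client is expected to notice the truncation.

```python
import http.client
import socket
import threading


def serve_short_body(listener):
    """Accept one connection and send fewer body bytes than Content-Length claims."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # read and ignore the request
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 100\r\n"
            b"Connection: close\r\n"
            b"\r\n"
            b"only 10 b."  # 10 bytes, although 100 were advertised
        )


def fetch_truncated_body():
    """Return the partial body if the client detects truncation, else None."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_short_body, args=(listener,), daemon=True)
    server.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        try:
            response.read()
        except http.client.IncompleteRead as exc:
            return exc.partial  # bytes received before the connection closed
        return None
    finally:
        client.close()
        server.join(timeout=5)
        listener.close()
```

A real test for the download handler would assert on Scrapy's own error or partial-response handling rather than on `http.client.IncompleteRead`.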

Some of these tests are already implemented, some of them are less robust or incomplete (see [7])

Finally, as bonus points, it would be great to see how far Scrapy is from supporting an HTTP/2 client (see [8])

Hope this helps,
Paul.



--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users+unsubscribe@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Anshul Malik

Mar 4, 2017, 4:59:00 AM
to scrapy-users
Hey there!

I am Anshul, studying Computer Engineering in India. I am also planning to work on the "New HTTP/1.1 download handler" project.
I have already read the guidelines mentioned by Paul, so I am now working on outlining a plan.

Is there anything else I need to know, or anything it would be best for me to check out?

Thanks
Anshul

Paul Tremberth

Mar 6, 2017, 11:00:38 AM
to scrapy-users
Hi Anshul,
hi Shankar

Twisted recently fixed one of the issues Scrapy had with the standard HTTP/1.1 downloader,
namely to handle HTTP responses without a reason phrase: https://github.com/twisted/twisted/pull/723
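
To make the problem concrete, here is a rough sketch of a status-line parser that tolerates a missing reason phrase. It is an illustration only, not Twisted's actual parser, and `parse_status_line` is a hypothetical name.

```python
def parse_status_line(line):
    """Split an HTTP/1.x status line into (version, status code, reason phrase).

    The reason phrase is optional here: b"HTTP/1.1 200" is accepted and
    yields an empty reason instead of raising an error.
    """
    parts = line.rstrip(b"\r\n").split(b" ", 2)
    if len(parts) < 2:
        raise ValueError("malformed status line: %r" % (line,))
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) == 3 else b""
    return version, code, reason
```

A parser that insists on three space-separated fields rejects a line such as `HTTP/1.1 200`, which some servers do send.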

Rolando also fixed a couple of issues related to HTTP responses:

The tests have also been improved a bit for these cases.

There is still, for example, an issue with HTTP 100 responses (https://github.com/scrapy/scrapy/issues/544)
but it also looks like the new HTTP/1.1 downloader idea is losing some of its substance.
Maybe the focus for GSoC should be on HTTP/2 support. I'm not sure.

Regards,
Paul.

Anshul Malik

Mar 6, 2017, 1:20:50 PM
to scrapy...@googlegroups.com
Hi Paul,

So can we think of HTTP/2 support for scrapy as a GSoC project?

Thanks


Paul Tremberth

Mar 6, 2017, 1:27:35 PM
to scrapy...@googlegroups.com
Hi Anshul,

Indeed you can present a GSoC application around adding HTTP/2 support to Scrapy.

The ideas page that is currently online is just a list of proposals,
and you can certainly come up with another idea altogether.

Don't hesitate to interact here or open an issue on GitHub to discuss design options.

Best,
Paul.


Atul Krishna

Mar 10, 2017, 5:49:22 AM
to scrapy-users
Hi Paul,
This is Atul Krishna, an undergraduate in the computer science department at MAKAUT, India. I am also interested in the "new HTTP/1.1 downloader handler" idea from the Scrapy project ideas, and I would love to work on "adding HTTP/2 support to Scrapy". Since I am a newbie, it would be great if you could guide me on how to start with HTTP/2 support for Scrapy.
Regards,
Atul Krishna

Paul Tremberth

Mar 10, 2017, 6:07:24 AM
to scrapy...@googlegroups.com
Hey Atul,

I'm afraid I haven't looked at HTTP/2 support in Twisted at all (nor am I familiar enough with HTTP/2 in general).
Here's the open issue on GitHub for scrapy: https://github.com/scrapy/scrapy/issues/1854

There's an example from hyper-h2 showing how to use the library with Twisted:

You can have a look at how download handlers are implemented in Scrapy to see how that would fit in:
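
For orientation, Scrapy selects a download handler per URL scheme through the `DOWNLOAD_HANDLERS` setting. The fragment below is a sketch of how an experimental handler might be wired into a project's settings; the class path `myproject.handlers.Http2DownloadHandler` is hypothetical and no such class ships with Scrapy.

```python
# settings.py (sketch)
# DOWNLOAD_HANDLERS maps a URL scheme to the import path of a handler class.
# The path below is a placeholder for a handler you would write yourself.
DOWNLOAD_HANDLERS = {
    "https": "myproject.handlers.Http2DownloadHandler",
}
```

Schemes not listed here keep using Scrapy's default handlers, so an experimental handler can be tried on one scheme without touching the others.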

Hope this helps,
Paul.

