Hi,

I'm working towards adding Python 3 support to Scrapy. I went through a lot of blogs and projects related to adding Python 3 support and found that Twisted is also currently working on a version that is source-compatible with Python 2.6, Python 2.7, and Python 3.3 [1]. There are various tools like "2to3" that read Python 2.x source code and apply a series of fixers to transform it into valid Python 3.x code, although they are more helpful for those who are porting to Python 3 than for those adding support for it alongside Python 2.

Currently, I'm working on a plan for how all this should be carried out and how much time each part of Scrapy would take. I'm also reading through [2] to see what changes are required.

I also had some questions:

1. Why don't we completely port Scrapy to Python 3 rather than adding support for it? Would it be too much for a GSoC project? It would likely result in cleaner code compared to adding support.
2. Is it recommended to use tools like 2to3 to convert the code? The Twisted page [1] says not to use the tool, whereas various projects and the website [2] recommend its use.
The recommended way is to use the "six" Python module. Some parts of Scrapy are already ported to Python 3 - see e.g. https://travis-ci.org/scrapy/scrapy/jobs/54761340 - 235 tests pass in Python 3.3. To get started, try cloning Scrapy and running some tests using tox (as described in the docs).
You can also check the https://github.com/scrapy/scrapy/blob/master/tests/py3-ignores.txt file - try uncommenting something and run the tests again to see what's not ported. We can't rely only on tests when porting, but they are a good start.
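As a rough illustration of the "six" style of writing code that runs unchanged on Python 2 and Python 3, here is a minimal sketch. It deliberately uses only the standard library and re-creates a few names that six provides (`PY2`, `text_type`, `binary_type`). The helpers `to_text` and `to_bytes` are hypothetical names chosen for this example, not Scrapy or six APIs.

```python
import sys

# six exposes a similar flag and type aliases; they are re-created here
# so that the sketch has no third-party dependency.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it if it is a byte string."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it if it is a text string."""
    if isinstance(value, binary_type):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)
```

The point of the aliases is that the rest of the code never mentions `unicode` or `bytes` directly, so the same source file imports cleanly on both interpreters.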
This URL encoding thing is where we stopped. Without having a solid solution we can't port scrapy.Request, and without scrapy.Request most other Scrapy components don't work.
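To make the URL encoding problem more concrete, here is a small sketch of the kind of helper involved. It is not the approach used by Scrapy or w3lib. `safe_url` is a hypothetical name, and the sketch assumes the encoding of the URL is known (UTF-8 by default) and ignores internationalized host names.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Characters that carry URL structure, plus "%" so that existing
# percent-escapes are not encoded a second time.
_SAFE_CHARS = "/:?&=#%@+,;!$'()*~[]"


def safe_url(url, encoding="utf-8"):
    """Percent-encode non-ASCII characters in `url`.

    Accepts either bytes or text and returns the native `str` type
    on both Python 2 and Python 3.
    """
    if isinstance(url, bytes):
        url = url.decode(encoding)
    # Encode to bytes first so quote() receives the same input on both versions.
    return quote(url.encode(encoding), safe=_SAFE_CHARS)
```

The difficulty hinted at in the thread is that a URL may arrive as bytes in an unknown encoding or as text, and the two Python versions disagree on which one the `str` type is. A helper like this has to pick one convention and apply it consistently.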
Sir,

I have learned the differences between Python 2 and Python 3. I have created a Google doc (https://docs.google.com/document/d/1xf7OtuyB5b6npCOLalZ-yjPZEcoKNb19iimfElyDino/edit) listing the common porting errors I could find after going through various blogs and projects, along with their corresponding syntax corrections. You can add suggestions or anything I have missed by going to the link and editing it directly. Do tell me if you find something wrong with the approach.

> The recommended way is to use the "six" Python module. Some parts of Scrapy are already ported to Python 3 - see e.g. https://travis-ci.org/scrapy/scrapy/jobs/54761340 - 235 tests pass in Python 3.3. To get started, try cloning Scrapy and running some tests using tox (as described in the docs).

I got some errors while setting up Scrapy and found out that I had to install libssl-dev, libffi-dev, python-dev and libxml2-dev, as mentioned on (http://stackoverflow.com/questions/17611324/error-when-installing-scrapy-on-ubuntu-13-04).
Shouldn't these be added to the Scrapy requirements? Should I create an issue for this? I'm currently working on Ubuntu 14.04.
> You can also check the https://github.com/scrapy/scrapy/blob/master/tests/py3-ignores.txt file - try uncommenting something and run the tests again to see what's not ported. We can't rely only on tests when porting, but they are a good start.

This is great! It would really help me in planning my strategy.

> This URL encoding thing is where we stopped. Without having a solid solution we can't port scrapy.Request, and without scrapy.Request most other Scrapy components don't work.

Handling binary data is the trickiest issue that people face in supporting both Python 2 and Python 3. So the first thing to do would be to find the best solution for URL encoding. Only then would we be able to port the other Scrapy components.
So I should first take a look at the w3lib project.

As quoted in the book (http://python3porting.com/strategies.html#python-2-and-python-3-without-conversion):

"My recommendation for the development workflow if you want to support Python 3 without using 2to3 is to run 2to3 on the code once and then fix it up until it works on Python 3. Only then introduce Python 2 support into the Python 3 code, using six where needed. Add support for Python 2.7 first, and then Python 2.6. Doing it this way can sometimes result in a very quick and painless process."

Is this the recommended method?
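For illustration, the end result of the workflow quoted above might look like the following on a very small module: the code is written for Python 3 first, and Python 2 support is then added back as a fallback. This is only a sketch. `count_hosts` is a hypothetical function invented for the example, and the fallback import is written by hand rather than through six.

```python
try:
    from urllib.parse import urlparse  # Python 3 location of the function
except ImportError:
    from urlparse import urlparse  # Python 2 fallback, added back afterwards


def count_hosts(urls):
    """Count how many of the given URLs point at each host."""
    counts = {}
    for url in urls:
        host = urlparse(url).netloc
        counts[host] = counts.get(host, 0) + 1
    # dict.items() exists on both versions; dict.iteritems() is Python 2 only.
    return sorted(counts.items())
```

Keeping the Python 3 spelling as the primary path means the fallback branches can simply be deleted once Python 2 support is dropped.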