Start by looking at the examples.
Understand the basics of how a scraper works, then try to understand what the more complex scrapers do, like the SmartScraper.
Once you've got the basics, there are some pending items we need to add.
Just pick one, or propose your own ideas:
- Parsing/following of robots.txt
- Workaround for noscript tags
- WebDriver support for client-side rendered pages
- Better redirect (302) handling
- Metadata extraction from flv objects
- Support for any file format or db format you can think of that is not currently supported
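
As a starting point for the robots.txt item, here is a minimal sketch of how a URL could be checked against robots.txt rules before it is fetched. It uses only `urllib.robotparser` from the Python 3 standard library and is not part of Crawley's current API: the helper name `build_robots_checker`, the `crawley` user agent string and the sample rules are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="crawley"):
    """Return a function that tells whether a URL may be fetched.

    `robots_txt` is the raw text of a robots.txt file. The default
    `user_agent` value is a hypothetical name, not one Crawley defines.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())

    def is_allowed(url):
        return parser.can_fetch(user_agent, url)

    return is_allowed


# Sample rules, made up for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

is_allowed = build_robots_checker(SAMPLE_ROBOTS)
print(is_allowed("http://example.com/index.html"))
print(is_allowed("http://example.com/private/data.html"))
```

A real implementation would also need to download and cache robots.txt per host, and decide where in the crawl loop to apply the check; both are left open here.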
I think that's about it. If you have an idea to improve Crawley that is not listed above, please feel free to comment on it; I'm sure it will be welcomed with open arms.
If you want to contact me more directly to ask about anything, this email address is also my Gtalk and MSN account, and you can find me on Skype: kroma.harry