You can consider running multiple spiders in parallel for large websites, but be aware that crawling at the speed of light can be considered a (D)DoS attack against the website.
scrapy-redis (https://github.com/darkrho/scrapy-redis) can help you distribute the request queue across multiple spider processes. This assumes your bottleneck is bandwidth and that you run each spider on a different host. If your bottleneck is the CPU, then you need to add either more processing power or more hosts.
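As a rough sketch of what that setup looks like (the exact setting names depend on the scrapy-redis version you install, and the host name here is just a placeholder, so treat this as an outline rather than the definitive configuration):

```python
# settings.py -- point all spider processes at one shared Redis queue.
# Setting names follow the scrapy-redis README; double-check them against
# the version you actually install.

# Redis-backed scheduler and dupefilter so every process shares the same
# request queue and the same set of seen-request fingerprints.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue in Redis between runs so the crawl can be stopped/resumed.
SCHEDULER_PERSIST = True

# Every host connects to the same Redis instance (placeholder host name).
REDIS_URL = "redis://my-redis-host:6379"
```

```python
# myspider.py -- the same spider runs on every host; it reads its start
# URLs from a Redis list instead of hard-coding them.
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = "myspider"
    redis_key = "myspider:start_urls"  # Redis list the spider pops URLs from

    def parse(self, response):
        # ... extract items / follow links as usual ...
        yield {"url": response.url}
```

You then push start URLs into Redis (for example with `redis-cli lpush myspider:start_urls http://example.com/`), and every process running the spider pulls work from that same queue.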
Also, if your bottleneck is the CPU, you might want to fine-tune the parsing: for example, use lxml directly or an lxml-based link extractor instead of the SgmlLinkExtractor, avoid building lists (from link extractors, selectors) and use generators instead, and so on. A sketch of that idea follows.
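For illustration, here is a minimal sketch of parsing with lxml and yielding requests lazily; the spider name and start URL are placeholders, and it assumes plain HTML responses:

```python
# Parse the response once with lxml and yield Requests from a generator,
# instead of materializing a full list of links via a link extractor.
import lxml.html
from scrapy import Spider
from scrapy.http import Request

class ExampleSpider(Spider):
    name = "example"
    start_urls = ["http://example.com/"]  # placeholder

    def parse(self, response):
        doc = lxml.html.fromstring(response.body, base_url=response.url)
        doc.make_links_absolute(response.url)
        # iterlinks() is a generator over (element, attribute, url, pos)
        # tuples, so no intermediate list of links is built in memory.
        for element, attribute, url, pos in doc.iterlinks():
            if attribute == "href":
                yield Request(url, callback=self.parse)
```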