1. The "chunked body too large" message is a safety check that protects you against a server sending a larger file than you're willing to handle. To download large files with SimpleAsyncHTTPClient, you must pass `max_body_size` to the AsyncHTTPClient constructor. Be aware of the magic that allows AsyncHTTPClient objects to be reused: you may need to pass `force_instance=True` to disable this magic, or ensure that you only create HTTP clients in one place.
2. TCP is a byte-stream protocol; chunk boundaries on the server do not necessarily correspond to chunk boundaries on the client. There are several levels of "chunking" here: your code reads 1MB from the file at a time and passes it to the IOStream via `self.write()`. The IOStream then sends it to the socket in smaller chunks (typically around 100KB). The network breaks those up into packets (typically about 1500 bytes). On the other side, the client reassembles packets into larger chunks (up to 64KB, and this is not currently configurable).
3. Why does the HTTP client's chunk size matter to you? You can take multiple 64KB chunks and assemble them into 1MB chunks if you want. This would be a little more convenient if the HTTP client's chunk size were configurable, but the performance difference should be small.
I don't understand your question about multithread or multistream. HTTP uses a single stream per request, and Tornado is a single-threaded framework.
4. The client's 64KB limit is an upper bound on chunk size. It may give you smaller chunks if that's what's in the network buffer. This is one reason why increasing the client's chunk size may not do what you want.
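
To make points 1 and 3 concrete, here is a minimal sketch rather than a definitive implementation. The `Rebuffer` class, the `download` function, the 10GB limit, and the one-hour timeout are all illustrative choices, not anything from Tornado or from your code. It assumes the default SimpleAsyncHTTPClient implementation is in use (a different configured implementation may not accept `max_body_size`).

```python
ONE_MB = 1024 * 1024


class Rebuffer:
    """Hypothetical helper: collects chunks of any size, emits fixed-size blocks."""

    def __init__(self, block_size, on_block):
        self.block_size = block_size
        self.on_block = on_block
        self._buf = bytearray()

    def feed(self, chunk):
        # The client may hand us any chunk size up to its internal limit.
        self._buf.extend(chunk)
        while len(self._buf) >= self.block_size:
            self.on_block(bytes(self._buf[:self.block_size]))
            del self._buf[:self.block_size]

    def flush(self):
        # Emit the final, possibly short, block once the response is done.
        if self._buf:
            self.on_block(bytes(self._buf))
            self._buf.clear()


async def download(url, path, max_body_size=10 * 1024 ** 3):
    # Imported here so Rebuffer stays usable without Tornado installed.
    from tornado.httpclient import AsyncHTTPClient

    # force_instance=True bypasses the instance-reuse magic, so the
    # max_body_size given here is the one this client actually uses.
    client = AsyncHTTPClient(force_instance=True, max_body_size=max_body_size)
    try:
        with open(path, "wb") as out:
            rebuffer = Rebuffer(ONE_MB, out.write)
            # The default request_timeout is 20 seconds, which a large
            # download can easily exceed; one hour is an arbitrary choice.
            await client.fetch(url, streaming_callback=rebuffer.feed,
                               request_timeout=3600)
            rebuffer.flush()
    finally:
        client.close()
```

You would run this with something like `asyncio.run(download(url, path))`. Because `Rebuffer` only counts bytes, it does not matter whether the client delivers full 64KB chunks or smaller ones.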