Piotr,
Thanks for your comments! You make good points about how this design handles large files. Streaming bytes to file-like objects would more easily allow optimizations for large files or heavily-appended files, for the reasons you state.
This system was originally designed to hold course assets, which are currently always saved in a single transaction and are on the order of tens of MB. Also, I'd hoped to evolve the BlobStore into a web service instead of a Python module (if possible) to make it more useful across the edX platform and its various non-platform components - so I'd been avoiding returning Python constructs in the API. Both of these reasons shaped the API above.
But, first, we'll implement a Python API. And I see the value in passing back a file-like Python object to enable streaming writes/reads, which gives the design flexibility in dealing with uploads/downloads of large files.
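To make the file-like idea concrete, here's a rough sketch of what it could look like. All names in it (`LocalBlobStore`, `open_for_write`, `open_for_read`, `stream_round_trip`) are made up for illustration and are not part of the proposed API; it also assumes a local directory as the backing store.

```python
import io
import shutil
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


class LocalBlobStore:
    """Hypothetical store that hands back file-like objects instead of bytes."""

    def __init__(self, root):
        self.root = Path(root)

    def open_for_write(self, blob_id):
        # The caller streams chunks into the returned binary file object.
        return open(self.root / blob_id, "wb")

    def open_for_read(self, blob_id):
        # The caller reads back as much or as little as it needs.
        return open(self.root / blob_id, "rb")


def stream_round_trip(payload):
    """Write payload through the store in chunks, then count the bytes read back."""
    with tempfile.TemporaryDirectory() as root:
        store = LocalBlobStore(root)
        with store.open_for_write("asset.bin") as dst:
            shutil.copyfileobj(io.BytesIO(payload), dst, CHUNK_SIZE)
        with store.open_for_read("asset.bin") as src:
            return sum(len(chunk) for chunk in iter(lambda: src.read(CHUNK_SIZE), b""))
```

Neither side ever holds more than one chunk in memory, which is the flexibility that matters for large uploads/downloads.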
I'd still rather not use pyfilesystem directly - here's why: I'd prefer to encapsulate the usage of the file's metadata (content-type, original directory, hash value, etc.) into each storage implementation. Some storage impls will use the metadata fully, by storing the file under a directory structure based on its metadata - and possibly even save the metadata in a DB or KVS for querying, easy retrieval, and maximum introspection. Some storage impls won't use the metadata at all - or will only use one piece of metadata. The logic which decides how blobs are stored seems to belong down at a lower level, close to the actual storage. There's nothing that would prevent pyfs from being used down at the storage impl level - or boto or any other existing Python-based storage interface.
I'm definitely not proposing that directory structures be removed from all asset storage - they are important to some storage impls and I expect that most impls will use them. I'm just proposing that directory-creating/choosing and blob-location logic be pushed down out of platform and into the storage impls.
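As a rough illustration of pushing the location logic down, here's a sketch with two storage impls: one that uses the metadata to build a directory structure and one that ignores it. The class names, the `location_for` method, and the metadata keys are all assumptions for the example, not a proposed interface.

```python
import hashlib
import posixpath
from abc import ABC, abstractmethod


class BlobStorage(ABC):
    """Hypothetical base: the platform passes metadata, the impl picks the location."""

    @abstractmethod
    def location_for(self, data, metadata):
        ...


class DirectoryStorage(BlobStorage):
    """Uses the metadata: content type and original directory shape the path."""

    def location_for(self, data, metadata):
        content_type = metadata.get("content_type", "application/octet-stream")
        original_dir = metadata.get("original_directory", "").strip("/")
        name = metadata.get("filename", "blob")
        return posixpath.join(content_type.replace("/", "_"), original_dir, name)


class HashStorage(BlobStorage):
    """Ignores the metadata: the location comes from the content hash alone."""

    def location_for(self, data, metadata):
        digest = hashlib.sha256(data).hexdigest()
        return posixpath.join(digest[:2], digest)
```

The platform would call `location_for` (or just a save method that uses it internally) the same way for both, without knowing or caring whether a directory structure exists underneath.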
For asset cleanup, any asset deletion originating in the store itself will also need to be mirrored in the platform, since we'll be storing the course asset BlobLocators in the course's modulestore. I've always viewed those types of cleanups as originating from the platform itself - though I can imagine situations where storage-originated removals will happen (mistakes, storage failures).
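A minimal sketch of the two cleanup directions, under heavy assumptions: the store is modeled as a plain dict keyed by locator string, and `CourseAssets` is a made-up stand-in for the locators a course keeps in its modulestore. None of these names exist in the platform.

```python
class CourseAssets:
    """Hypothetical stand-in for the BlobLocators a course holds in its modulestore."""

    def __init__(self, locators):
        self.locators = set(locators)


def delete_asset(course, store, locator):
    """Platform-originated cleanup: drop the blob, then the course's reference."""
    store.pop(locator, None)
    course.locators.discard(locator)


def dangling_locators(course, store):
    """Locators the course still references but the store no longer holds."""
    return sorted(course.locators - set(store))
```

The second function covers the storage-originated case: after a mistake or storage failure, the platform would need some check like it to find references that no longer resolve.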