This looks very nice!
There are 2 things that immediately popped up, but I think both are covered. Also please excuse the unstructured mess that follows :)
- Allow ranged requests (with an offset and size). This is mentioned in the docs but probably not yet in the fetch_attr struct. One question there would be how such a byte range would be represented in the IndexedDB cache and the local file system. I guess a byte range downloaded from the HTTP server would also only update a byte range in the local version of the file?
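To make the offset/size idea concrete, here is a minimal sketch of how such a pair maps onto a standard HTTP `Range` header value. The function name `format_range_header` is made up for illustration and is not part of the fetch API; the only fixed fact is that HTTP byte ranges are inclusive on both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the value of an HTTP Range header for
   `size` bytes starting at `offset`. HTTP byte ranges are inclusive on
   both ends, so the last byte position is offset + size - 1.
   Returns the number of characters written, or -1 if the buffer is
   too small. */
int format_range_header(char *buf, size_t buf_size,
                        unsigned long long offset,
                        unsigned long long size)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, buf_size, "bytes=%llu-%llu",
                     offset, offset + size - 1);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : n;
}
```

For example, offset 4096 with size 4096 yields `bytes=4096-8191`, which would be sent as `Range: bytes=4096-8191`.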
The main problems and technical challenges we are seeing in our games all revolve around cache-control and CDNs. For one, we need 'content version ids'. All the expiration-time-based caching mechanisms are fairly useless to us, because a specific game version (client and server versions) expects data files exactly matching a specific version.
What we usually do for content-versioning is this:
- set the cache-control max-age to some time far in the future, so that all expiration-time-based cache heuristics are (hopefully) disabled
- have some sort of version id for each data file (usually a content hash or a simple version number), and add this to the URL somehow. One way is to add the version id as a URL parameter (e.g. ?id=xxx), another is to make the version id part of the file name
We are using an MD5 hash of the file content as version-id (hence content-id). Other options would be an incremental version number string, or the ETag provided by the web server.
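The URL-parameter variant could be sketched roughly like this. The function name `build_versioned_url` is made up, the parameter name `id` is taken from the example above, and the content-id is assumed to be URL-safe already (e.g. a hex string), so no escaping is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append a version id to a URL as the query
   parameter "id". Uses '&' if the URL already has a query string,
   otherwise '?'. Assumes content_id needs no URL escaping.
   Returns the length of the result, or -1 if it does not fit. */
int build_versioned_url(char *out, size_t out_size,
                        const char *url, const char *content_id)
{
    assert(out != NULL && url != NULL && content_id != NULL);
    const char sep = strchr(url, '?') ? '&' : '?';
    int n = snprintf(out, out_size, "%s%cid=%s", url, sep, content_id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Two versions of the same file then end up under two different URLs, so any cache in between treats them as unrelated resources.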
- the content-id could be used as ETag to let the web server know about the file version the client already has (however, I haven't worked with ETags yet, but plan to look into this, so my understanding may be wrong)
- the content-id should be associated with the file content that's already in the IndexedDB or even in memory-filesystem (the memory-filesystem would then also act as some sort of 1st-level-cache), it would be cool to query the IndexedDB backing store by filename *and* content-id (basically do you already have *this* file with *this* specific version)
In general we use our MD5-hash content-id for all sorts of things in our own HTTP filesystem that we have layered on top of HTTP requests:
- replace all expiration-time-based cache-control things with explicit versioning via content-id (we add the content-id as a URL parameter, so that the same file with 2 different content-ids appears as different 'things' to any caching layer, e.g. CDNs)
- use the content-id in our own local cache implementation to decide whether the client already has the right version in the local cache, in this case we don't need to do an HTTP request at all
- when a file is downloaded or loaded from cache, compute the actual MD5 hash and compare it to the expected hash to check whether the download or cache content is corrupted (or more likely has been tampered with)
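The last two points boil down to two small checks, sketched below. Big caveat: to keep the example short and self-contained, the hash here is 32-bit FNV-1a as a stand-in, NOT the MD5 hash we actually use, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in content hash: 32-bit FNV-1a. This is NOT MD5 and not
   suitable for tamper detection; it only keeps the sketch short. */
uint32_t content_hash(const void *data, size_t size)
{
    assert(data != NULL || size == 0);
    const unsigned char *bytes = (const unsigned char *)data;
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}

/* True if the cached copy already has the expected version, in which
   case no HTTP request is needed. cached_id is NULL when the file is
   not in the local cache at all. */
bool cache_is_current(const uint32_t *cached_id, uint32_t expected_id)
{
    return cached_id != NULL && *cached_id == expected_id;
}

/* True if the bytes that were downloaded (or loaded from the cache)
   actually hash to the expected content-id. */
bool content_is_intact(const void *data, size_t size, uint32_t expected_id)
{
    return content_hash(data, size) == expected_id;
}
```

The flow would then be: call `cache_is_current()` first to decide whether to hit the network at all, and run `content_is_intact()` on whatever data comes back, no matter whether it came from the cache or from the server.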
Here are a couple of (not really thought-through) ideas for how such a content-id could be integrated with the fetch API:
- add a general 'ETag' behaviour flag in the fetch_attr_t flags field. If the flag is set, the ETag coming from the HTTP response would be used as content-id from here on; otherwise, a custom content-id string can be provided through a new 'const char*' in the fetch_attr_t struct
- the content-id value should always be available / associated-with the actual data, for instance when the data is stored in the IndexedDB, the content-id should be stored with it, or the content-id should be available to user-callbacks (e.g. in the emscripten_fetch_t struct)
Provide a small set of 'fetch_cache' functions to query the IndexedDB (and in-memory-filesystem?), for instance (omitting the emscripten_ here):
fetch_cache_exists("name", "content-id"): check if an entry in the cache with matching content-id exists; if it exists, I don't need to do an HTTP request to retrieve it. The content-id could be optional, in which case just the item's existence is checked
fetch_cache_get_contentid("name"): if the item exists in the local cache, get its content id
fetch_cache_invalidate("name", "content-id"): invalidate/delete an item in the caching layer, again, if content-id is not provided, only the name is checked
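To pin down the semantics I have in mind, here is a toy in-memory model of those three functions. None of this exists in emscripten; the names are the ones proposed above, `fetch_cache_put` is an extra helper added only so the model can be filled, and passing NULL stands for "content-id not provided". A real version on top of IndexedDB would most likely have to be asynchronous, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX_ENTRIES 16
#define CACHE_MAX_STR 64

typedef struct {
    bool used;
    char name[CACHE_MAX_STR];
    char content_id[CACHE_MAX_STR];
} cache_entry;

static cache_entry g_cache[CACHE_MAX_ENTRIES];

static cache_entry *find_entry(const char *name)
{
    for (size_t i = 0; i < CACHE_MAX_ENTRIES; i++) {
        if (g_cache[i].used && strcmp(g_cache[i].name, name) == 0)
            return &g_cache[i];
    }
    return NULL;
}

/* Helper (not in the proposal): record that `name` is cached with
   `content_id`, replacing any older version of the same name. */
bool fetch_cache_put(const char *name, const char *content_id)
{
    assert(name != NULL && content_id != NULL);
    if (strlen(name) >= CACHE_MAX_STR || strlen(content_id) >= CACHE_MAX_STR)
        return false;
    cache_entry *e = find_entry(name);
    for (size_t i = 0; e == NULL && i < CACHE_MAX_ENTRIES; i++) {
        if (!g_cache[i].used)
            e = &g_cache[i];
    }
    if (e == NULL)
        return false;                     /* table is full */
    e->used = true;
    strcpy(e->name, name);
    strcpy(e->content_id, content_id);
    return true;
}

/* With content_id == NULL only the name is checked. */
bool fetch_cache_exists(const char *name, const char *content_id)
{
    assert(name != NULL);
    const cache_entry *e = find_entry(name);
    if (e == NULL)
        return false;
    return content_id == NULL || strcmp(e->content_id, content_id) == 0;
}

/* Returns the stored content-id, or NULL if the item is not cached. */
const char *fetch_cache_get_contentid(const char *name)
{
    assert(name != NULL);
    const cache_entry *e = find_entry(name);
    return e ? e->content_id : NULL;
}

/* Deletes the item; with a non-NULL content_id only if it matches.
   Returns true if something was deleted. */
bool fetch_cache_invalidate(const char *name, const char *content_id)
{
    assert(name != NULL);
    cache_entry *e = find_entry(name);
    if (e == NULL)
        return false;
    if (content_id != NULL && strcmp(e->content_id, content_id) != 0)
        return false;
    e->used = false;
    return true;
}
```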
Ok, so that's about content-ids, what follows now is a brain dump about a way to load data from web servers, but without having implemented that yet:
There are currently 2 popular ways to load data from web servers in games:
1) as asset bundles: advantage is better compressibility, much less protocol overhead during download (HTTP request/response header overhead), much more efficient storage on local device, disadvantage: if one byte in a bundle changes, it must be completely redownloaded, and bundles usually need to be downloaded before a game or level is started, they are not useful for granular on-demand-streaming
2) as unique files: advantages are much less overhead when files are streamed on demand during game play (if a specific texture is needed at one point during gaming, only that texture is downloaded, not the whole bundle containing that texture), and the same with versioning (if one specific file has been updated in a new game version, only that one file needs to be downloaded), disadvantage: much higher protocol-overhead, at least with HTTP (much better with HTTP2), usually less compressibility, and less efficient local storage
What I would *like* to try out at some point is some sort of block-based on-demand-paging all the way from the web server to the local cache (or even to the in-memory representation). HTTP downloads would work as range-based HTTP requests on blocks, or consecutive ranges of blocks, and the job of the 'HTTP filesystem' would be to gradually bring the local mirror of the 'page file' to the same state as the remote mirror, and only update blocks that are out-of-date.
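The core of that idea, finding the out-of-date blocks and merging neighbours into consecutive ranges, could look roughly like the sketch below. Again I haven't implemented any of this; it assumes each block has some per-block version id (here simply a `uint32_t`) that is known for both the local and the remote mirror, and the names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t first_block;   /* index of the first stale block */
    size_t num_blocks;    /* number of consecutive stale blocks */
} block_range;

/* Compare per-block ids of the local and remote mirror and collect
   runs of consecutive out-of-date blocks, so that each run can be
   fetched with a single ranged request. Writes at most max_out ranges
   and returns how many were written. */
size_t find_stale_ranges(const uint32_t *local_ids,
                         const uint32_t *remote_ids,
                         size_t num_blocks,
                         block_range *out, size_t max_out)
{
    assert(local_ids != NULL && remote_ids != NULL && out != NULL);
    size_t count = 0;
    size_t i = 0;
    while (i < num_blocks && count < max_out) {
        if (local_ids[i] == remote_ids[i]) {
            i++;                          /* block is up to date */
            continue;
        }
        const size_t start = i;
        while (i < num_blocks && local_ids[i] != remote_ids[i])
            i++;                          /* extend the stale run */
        out[count].first_block = start;
        out[count].num_blocks = i - start;
        count++;
    }
    return count;
}
```

With a fixed block size, each range then covers the bytes from `first_block * block_size` up to and including `(first_block + num_blocks) * block_size - 1`, which is exactly what a ranged HTTP request needs.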
However I would first try this on a native platform where I would have direct filesystem access. I'm not sure if this idea collides with a browser's own cache implementation.
Generally I think emscripten's fetch API should remain as general-purpose as possible, without any of those ideas 'baked in', but it should enable the implementation of such ideas. I think associating a content-id with a data item is still within the area of being 'general purpose' as long as no assumptions are made about what the content-id actually is.
Ok, that's all I can think of so far, apologies for the unstructured brain dump :)
Cheers,
-Floh.