This way you could also upload from multiple clients or in multiple threads, each getting different chunks to work on, and you wouldn't have to bother with the actual order of the chunks.
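Just to illustrate what I mean, here is a minimal client sketch
(Python; the URL, the content type and the zero-length completing PUT
are my assumptions, not something the spec guarantees) that uploads
chunks in parallel via "?value:<range>":

    import concurrent.futures
    import requests

    OBJECT_URL = "http://cdmi.example.com/container/object"  # hypothetical
    CHUNK = 1024 * 1024  # 1 MiB per chunk

    def put_chunk(data: bytes, offset: int) -> None:
        end = offset + len(data) - 1  # CDMI byte ranges are inclusive
        requests.put(
            f"{OBJECT_URL}?value:{offset}-{end}",
            headers={"X-CDMI-Partial": "true",
                     "Content-Type": "application/octet-stream"},
            data=data,
        ).raise_for_status()

    def upload(payload: bytes) -> None:
        offsets = range(0, len(payload), CHUNK)
        # Chunks may complete in any order - even last-first.
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(put_chunk, payload[o:o + CHUNK], o)
                       for o in offsets]
            for f in futures:
                f.result()
        # Whether a zero-length PUT is a valid completion signal is an
        # open question - it is just one conceivable way to flip the flag.
        requests.put(OBJECT_URL, headers={"X-CDMI-Partial": "false"},
                     data=b"").raise_for_status()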
However, there are still some open questions:
The first problem, from my perspective, is that you cannot specify the target size of the object before it has been uploaded completely (or at least until the very last chunk has been sent). Also, I'm not sure how available implementations handle that approach (e.g. if you start with the last chunk first - to pre-initialize the storage in the backend or to make sure that there is enough space for the object).
Another potential problem is an offset that lies far beyond the object's current size: e.g. you initially uploaded a 4 MiB object and now you upload an update chunk with "?value:2199023255552-2199023779839" (offset = 2.0 TiB, chunk size = 512 KiB, inclusive range). Will the object be "filled" with zeros in the backend until the offset is reached (which could be quite slow in this case), or does the server deny any update with an offset bigger than "cdmi_size", which would effectively mean that my approach will not work?
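On the wire, that update would look roughly like this (a sketch; the
URL and the candidate status codes are my guesses, not from the spec):

    import requests

    # 512 KiB written at a 2 TiB offset into a 4 MiB object. Does the
    # server zero-fill the ~2 TiB gap, or reject the request outright?
    resp = requests.put(
        "http://cdmi.example.com/container/object"  # hypothetical
        "?value:2199023255552-2199023779839",       # inclusive, 524288 bytes
        headers={"X-CDMI-Partial": "true",
                 "Content-Type": "application/octet-stream"},
        data=b"\x00" * 524288,
    )
    print(resp.status_code)  # success if zero-filled; some 4xx if denied?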
What I am also missing is the "logical size" of an object. I am a bit puzzled, because "cdmi_size" is determined by the storage backend and thus may not represent the object's actual size (deduplication? sparse files? opaque compression?). Section 16.3 says:
The number of bytes consumed by the object. This storage system metadata item is computed by the storage system, and any attempts to set or modify it will be ignored.
...whereas Section 8.4.6 Table 16 describes for "valuerange":
The cdmi_size storage system metadata of the data object shall always indicate the complete size of the object, including zero-filled gaps.
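A local filesystem analogy (a sketch; it assumes a POSIX system and a
filesystem that supports sparse files) shows how "bytes consumed" and
"complete size including zero-filled gaps" can disagree by orders of
magnitude:

    import os

    path = "/tmp/sparse-demo"
    with open(path, "wb") as f:
        f.seek(2 * 1024**4)     # seek to a 2 TiB offset
        f.write(b"x" * 524288)  # write 512 KiB; the gap stays unallocated

    st = os.stat(path)
    print("logical size:", st.st_size)            # ~2 TiB + 512 KiB
    print("bytes consumed:", st.st_blocks * 512)  # typically only ~512 KiB
    os.remove(path)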
It would be nice to hear some opinions about this from implementors (Scality? NetApp?).
Thanx,
Ancoron
Hi David,
thanx for the update on this and the link to the proposed extension.
However, some questions arise immediately:
1.) Is the "range=<byte-range>" only allowed in the first (initiating)
request?
2.) Why are "[ true | false ] | upload-id=<upload-id>" mutually
exclusive?
3.) What about the relationship between "?value:<range>" and
"range=<byte-range>"?
4.) What about aborting a partial upload?
So, effectively this all means the following:
The client has to do things very differently if it wants to use this
extension and encounters a server that understands it. Basically, even
the starting and ending phases of an upload are completely different,
not only the upload itself:
Current:
1.) "X-CDMI-Partial: true"
2.) ...upload data...
3.) "X-CDMI-Partial: false"
New:
1.) "X-CDMI-Partial: upload-id=1234; range=3145729-4194304"
2.) ...upload data...
3.) ???
So, just consider that I have a 4 MiB upload which I want to split into
1 MiB chunks across 4 connections. Now who guarantees me that the
connection with the range "3145728-4194303" will be the slowest one? In
other words, to ensure a consistent upload I would have to hold back
the last chunk until all the others are finished (see the sketch
below), which is usually not what you want on the client side. Just
too much logic involved.
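To make that extra logic concrete, here is a sketch (the header syntax
is taken from the proposal as I read it; the URL and helper names are
made up) of what a 4-connection client would be forced to do:

    import concurrent.futures
    import requests

    OBJECT_URL = "http://cdmi.example.com/container/object"  # hypothetical

    def put_chunk(data: bytes, offset: int, upload_id: int) -> None:
        end = offset + len(data) - 1
        requests.put(
            f"{OBJECT_URL}?value:{offset}-{end}",
            headers={"X-CDMI-Partial":
                     f"upload-id={upload_id}; range={offset}-{end}"},
            data=data,
        ).raise_for_status()

    def upload(payload: bytes, upload_id: int = 1234) -> None:
        chunk = len(payload) // 4  # assumes an evenly divisible payload
        parts = [(payload[o:o + chunk], o)
                 for o in range(0, len(payload), chunk)]
        *first, last = parts
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
            for f in [pool.submit(put_chunk, d, o, upload_id)
                      for d, o in first]:
                f.result()          # wait for every other chunk...
        put_chunk(*last, upload_id) # ...only then send the completing range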
Now go one step further and consider a client library that is a bit
more sophisticated and calculates the "?value:<range>" to use quite
dynamically. This scenario is simply impossible with either of the new
"completion conditions".
So why not just reuse the current start and end conditions and simply
extend them:
I would rather leave the existing simple true/false approach untouched
and add a couple of headers: