Hi Pieter,
Regardless, you're right that instead of interleaving the tile sizes with the tile data, you could also put all of them before the actual tile data. I don't think this matters much in terms of complexity. In both cases, the offsets are directly readable:
offset[0] = 0;
for (tile = 1; tile < n_tiles; tile++)
    offset[tile] = offset[tile - 1] + read_tile_size(tile - 1);
where read_tile_size(n) is either le32(&data_start[n * 4]) (your approach) or le32(&data_start[offset[n]]) (current approach). The reason we went with the current, interleaved approach is so that encoders can start streaming data before they have finished encoding the full frame (particularly with row tiling enabled). They don't need to know the size of every tile before the header partition is finished, because the size of each tile is just part of its own tile data, not of the header partition. With that design, end-to-end latency can theoretically be improved by (n-1)/(n*fps) seconds on the encoder side, where n is the number of row tiles. For example, with n = 4 at 30 fps, that is 3/120 s = 25 ms. This doesn't affect the decoder.
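To make the two layouts concrete, here is a rough C sketch of the offset computation for each. It is only an illustration, not the actual decoder code: the function names offsets_sizes_first() and offsets_interleaved() are made up for this mail, le32() is assumed to read a 32-bit little-endian value, and each size is assumed to count payload bytes only. Unlike the pseudocode above, the sketch spells out the 4-byte size fields explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value (assumed meaning of le32() in the text). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Sizes up front: [size0][size1]...[sizeN-1][tile0][tile1]...
 * offset[] receives the position of each tile's payload,
 * relative to data_start. Hypothetical helper. */
static void offsets_sizes_first(const uint8_t *data_start, size_t n_tiles,
                                size_t *offset)
{
    assert(n_tiles > 0);
    offset[0] = 4 * n_tiles; /* payloads begin right after the size table */
    for (size_t tile = 1; tile < n_tiles; tile++)
        offset[tile] = offset[tile - 1] + le32(&data_start[(tile - 1) * 4]);
}

/* Interleaved: [size0][tile0][size1][tile1]...
 * offset[] receives the position of each tile's size field;
 * the payload starts 4 bytes after it. Hypothetical helper. */
static void offsets_interleaved(const uint8_t *data_start, size_t n_tiles,
                                size_t *offset)
{
    assert(n_tiles > 0);
    offset[0] = 0;
    for (size_t tile = 1; tile < n_tiles; tile++)
        offset[tile] = offset[tile - 1] + 4 + le32(&data_start[offset[tile - 1]]);
}
```

Both loops are a single pass over n_tiles size fields, which is why I don't see a complexity difference for the decoder. The difference is on the encoder side: in the first layout the size table can't be written until every tile is finished, while in the interleaved layout each tile can be sent as soon as its own size is known.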
HTH,
Ronald