agreed. built in "thumbnailing" and "zoom handling" is a must as a
"capability" of the format if it is to supplant jpeg. it's a problem
i've recently had - i can work around it and do, to an extent, things
with jpeg like decode only lower order freq to get "zoom out for free
on decode" but this requires walking a lot of IO data first (ie the
whole file) to get the lower order freqs. a mip-map-like set of zoom
levels "pre-stored in the format" would massively help. and yes - this
will bloat up
files again. in addition the need to be able to address images as
tiles - ie be able to have a tile index and be able to instantly skip
in the file TO a specific region OF a specific zoom level without
having to decode/decompress the rest of the file up until that point
imho is a must. if all this amounts to is "hey - we have better
compression" i think it's moot. in fact if i lose the ability to easily
skip low order freqs on decode as above, then webp is a net loss for me
compared to jpeg, where generating lower-res versions on the fly on
decode is actually faster than decoding the full res.
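for reference, this is roughly the jpeg trick i mean - a minimal sketch
using libjpeg's scale_num/scale_denom to get a 1/8 scale decode (error
handling stripped, purely illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* decode a jpeg at 1/8 scale - at this scale libjpeg only needs the
 * low order (dc) coefficient per block, so it's much faster than a
 * full decode, but you still have to read and entropy-decode the
 * whole file to get there */
static unsigned char *
decode_jpeg_eighth(const char *path, int *w, int *h)
{
   struct jpeg_decompress_struct cinfo;
   struct jpeg_error_mgr jerr;
   unsigned char *pixels;
   FILE *f = fopen(path, "rb");

   if (!f) return NULL;
   cinfo.err = jpeg_std_error(&jerr);
   jpeg_create_decompress(&cinfo);
   jpeg_stdio_src(&cinfo, f);
   jpeg_read_header(&cinfo, TRUE);
   cinfo.scale_num = 1;   /* ask for 1/8 of the original size */
   cinfo.scale_denom = 8;
   jpeg_start_decompress(&cinfo);
   *w = cinfo.output_width;
   *h = cinfo.output_height;
   pixels = malloc((size_t)(*w) * (*h) * cinfo.output_components);
   while (cinfo.output_scanline < cinfo.output_height) {
      JSAMPROW row = pixels +
         (size_t)cinfo.output_scanline * (*w) * cinfo.output_components;
      jpeg_read_scanlines(&cinfo, &row, 1);
   }
   jpeg_finish_decompress(&cinfo);
   jpeg_destroy_decompress(&cinfo);
   fclose(f);
   return pixels;
}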
so to summarise:
1. the webp chunk has info about the whole target image, along with a
list of zoom levels supported in the file and which sub-chunks
(offsets maybe, for speed in seeking to them) in the image chunk
contain which zoom level.
2. each zoom level (the original image itself is just another zoom
level) contains information about its own image size at that zoom
level, plus the size of tiles and an array of tile info chunks (which
maybe contain an offset to the tile data chunk below).
3. each tile chunk also contains information on its size, location and
the final image data.
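to make that concrete, a rough sketch of what 1-3 above could look
like as structs (all names and field sizes here are made up, purely
illustrative - not a real webp/riff chunk definition):

#include <stdint.h>

struct zoom_level_entry {    /* one per zoom level, in the header chunk */
   uint32_t level;           /* 0 = original, 1 = 1/2, 2 = 1/4, ... */
   uint32_t width, height;   /* image size at this zoom level */
   uint64_t offset;          /* byte offset of the zoom level chunk */
   uint64_t size;            /* its size, so a client can range-fetch it */
};

struct image_header_chunk {  /* point 1: info about the whole image */
   uint32_t full_width, full_height;
   uint32_t num_levels;
   struct zoom_level_entry levels[]; /* which levels exist + where they live */
};

struct tile_entry {          /* index entry for one tile */
   uint32_t x, y;            /* tile position within the level */
   uint64_t offset;          /* byte offset of the tile chunk */
   uint32_t size;            /* compressed size of that tile */
};

struct zoom_level_chunk {    /* point 2: one per zoom level */
   uint32_t width, height;   /* this level's image size */
   uint32_t tile_w, tile_h;  /* tile dimensions */
   uint32_t num_tiles;
   struct tile_entry tiles[];
};

struct tile_chunk {          /* point 3: size, location + the actual data */
   uint32_t x, y, w, h;
   uint32_t data_size;
   uint8_t  data[];          /* compressed tile image data */
};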
yes - this will make the files bigger, but it will make them markedly
more USEFUL than jpeg, and MUCH more efficient to download just the
region of the image file that contains the zoom size you need for
display (or for local files just seek to that zoom level). also for
panning around them you can just load/decode the tiles you need from
that zoom level. to make this work for dumber apps, stick the original
image as the first bit of data in the file so they can just download
until they have the entire original image chunk then close the
connection/download and throw away the extra zoom levels that come
after the original. those smart enough fetch just the header to
determine what is available, then "seek" to the zoom level they need,
ultimately downloading only what they need. in this scenario you can't
just compare file sizes to show how much better you are. you have to
show "load times" and "total data xfer" differences.
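the smart client side is basically this, reusing the structs above
(pick_level and http_range_get are made-up names for whatever the
client actually does, and levels[0] is assumed to be the original):

/* hypothetical smart client: pick the smallest zoom level that still
 * covers the display size, then only fetch that level's byte range.
 * a dumb client just reads from the start and stops once it has the
 * whole original image chunk. */
static const struct zoom_level_entry *
pick_level(const struct image_header_chunk *hdr,
           uint32_t disp_w, uint32_t disp_h)
{
   const struct zoom_level_entry *best = &hdr->levels[0]; /* original */

   for (uint32_t i = 0; i < hdr->num_levels; i++) {
      const struct zoom_level_entry *l = &hdr->levels[i];

      /* smaller than what we have, but still >= the display size */
      if (l->width >= disp_w && l->height >= disp_h &&
          (uint64_t)l->width * l->height <
          (uint64_t)best->width * best->height)
         best = l;
   }
   return best;
}

/* then e.g.:
 *   const struct zoom_level_entry *l = pick_level(hdr, disp_w, disp_h);
 *   http_range_get(url, l->offset, l->size, buf);  <- stand-in function
 */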
a lot of the above will need to do several images in parallel to be
really more efficient, and for local images the download and round
trips are fairly moot. for very small images, where skipping the
download of unneeded zoom levels is a net loss because the extra round
trips and latency kill any gains, the client can determine what to do
based on the header and do the cut-off on download as above.
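ie something along these lines - the inputs are whatever the client
measures or guesses, not anything i've benchmarked:

/* rough cut-off heuristic: only bother with range requests if the
 * transfer time saved by skipping unneeded zoom levels outweighs the
 * cost of the extra round trips. */
static int
use_range_requests(uint64_t file_size, uint64_t bytes_needed,
                   double rtt_secs, double bytes_per_sec,
                   int extra_round_trips)
{
   double saved = (double)(file_size - bytes_needed) / bytes_per_sec;
   double cost  = rtt_secs * (double)extra_round_trips;

   return saved > cost; /* 1: seek/range-fetch, 0: just grab the file */
}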