> If it's on disk, i can lseek() over it without ever reading data I'm not
> interested in.
> If it's coming in from some stream, it's either in an input buffer and I can
> skip
> over a known number of bytes, or I could instruct the reader to discard data
> early on.
I see. This approach is useful when the data is stored on disk or is
otherwise seekable. However, just fast-forwarding through the data with
seek alone is of little use. If your data is structured, there will be
IDs, keys, or some other unique attribute that distinguishes one value
from the others; if there is none, you will probably want to invent one.
Either way, in your application you will never operate on raw file
offsets — you will want something meaningful. This leads you to build an
index of your data. It could be a separate file, itself UBJSON, holding
a single object container that maps the meaningful value (as the key)
to the data offset (as the value). Such an index lets you not only scan
UBJSON data faster, but also jump straight to any point of interest.
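As a rough sketch of how such an index could be built in one sequential
pass — assuming the same flat array-of-int8-arrays subset as in the
example below (a real indexer would need a full UBJSON parser), keys
'1', '2', ... are my arbitrary choice here:

```python
import io

def index_top_level_arrays(stream):
    """One pass over a flat UBJSON array of int8 arrays: record the
    byte offset of each child array, keyed '1', '2', ... (hypothetical
    key scheme; only this small subset of UBJSON is handled)."""
    assert stream.read(1) == b'['          # outer array marker
    index, n = {}, 0
    while True:
        offset = stream.tell()
        marker = stream.read(1)
        if marker == b']':                 # end of outer array
            return index
        assert marker == b'[', 'only nested arrays handled in this sketch'
        n += 1
        index[str(n)] = offset
        # skip over the child array without decoding its values
        while stream.read(1) != b']':      # 'i' marker or closing ']'
            stream.read(1)                 # consume the int8 payload byte

idx = index_top_level_arrays(io.BytesIO(
    b'[[i\x01i\x02i\x03][i\x04i\x05i\x06][i\x07i\x08i\x0b]]'))
print(idx)   # {'1': 1, '2': 9, '3': 17}
```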
Let's say you have the following UBJSON data:
[[i\x01i\x02i\x03][i\x04i\x05i\x06][i\x07i\x08i\x0b]]
After the first read, you build an index that points to each of the arrays:
{C1i\x01C2i\x09C3i\x11}
which, after decoding, gives you {'1': 1, '2': 9, '3': 17}:
first array at offset 1
second array at offset 9
last array at offset 17
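To make the jump concrete, here is a minimal sketch (assuming the same
int8-array subset; `io.BytesIO` stands in for an open file on disk,
where `seek()` would translate to `lseek()`):

```python
import io

# The example payload from above: b'[' is the array marker,
# b'i' the int8 marker, each 'i' followed by one value byte.
data = b'[[i\x01i\x02i\x03][i\x04i\x05i\x06][i\x07i\x08i\x0b]]'

# The index built on first read: meaningful key -> byte offset.
index = {'1': 1, '2': 9, '3': 17}

def read_int8_array(stream, offset):
    """Seek to offset and decode one flat [i ... ] array of int8 values."""
    stream.seek(offset)
    assert stream.read(1) == b'['
    values = []
    while True:
        marker = stream.read(1)
        if marker == b']':
            return values
        assert marker == b'i'
        values.append(int.from_bytes(stream.read(1), 'big', signed=True))

stream = io.BytesIO(data)                   # stands in for a file on disk
print(read_int8_array(stream, index['2']))  # jump straight to the second array
```

This prints [4, 5, 6] without ever decoding the first array.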
So you can jump to any array at any time using a meaningful key,
without reading the whole file from the start, and with the ability to
skip large containers entirely. The more structured and complex your
data is, the more you gain. However, indexes are out of scope of the
UBJSON format, so feel free to design your own that fits your needs. (:
Hope this helps.
--
,,,^..^,,,