heuristics for d/l and seeking into http-served files


Sebastien Binet

Mar 1, 2022, 5:55:19 AM
to golang-nuts
hi there,

I am trying to give access to files over http(s), exposing the resource
as an io.ReaderAt (using HTTP byte-range requests, on servers that
advertise Accept-Ranges):

- https://github.com/go-hep/hep/compare/main...sbinet-hep:groot-httpio-pread-cache?expand=1

despite my "best" attempts (implementing a parallel reader-at that
fetches byte ranges concurrently (see preader.go), and a local cache
file that is fed while the preader fetches remote blocks), I can't match
the performance of simply downloading the whole file locally and reading
that.

I could of course just do that (download the whole file and use it
locally), as that is what's faster when the user application reads
through the whole file, but the application may only want to look at the
header+footer data of the file and be done with it.
it seems wasteful to download 100 MB, 1 GB or 4 GB worth of (CERN/LHC)
data just to display the metadata of the file.

are there some heuristics for how to scale buffer sizes, chunk sizes and
the number of goroutines when reading remote files (over http(s)) from
the client side?
I guess those heuristics would depend on the file size?
Any other obvious thing I might have missed?

(in the meantime, I guess I'll default each "ReadAt" request to a
minimal buffer size, say 16 MiB (16*1024*1024 bytes), to minimize the
number of requests when crawling through a file)

-s