getting the size of an io.ReaderAt?

Jochen Voss

Mar 31, 2023, 4:30:48 PM
to golang-nuts
Dear all,

I am trying to get the size of an io.ReaderAt, i.e. the offset after which no more data can be read.  I have some working (?) code (https://go.dev/play/p/wTouYbaJ7RG, also reproduced below), but I am not sure whether what I do is correct and the best way to do this.  Some questions:
  • Does the standard library provide a way to get the size of an io.ReaderAt?
  • Is my code correct?
  • Is there a better way to do this?
  • If a type provides not only a ReadAt() method but also a Size() method, would it be safe to assume that Size() returns how many bytes are accessible via ReadAt()?  This seems to work for bytes.Reader and strings.Reader, but there may be types out there where Size() does something different?
  • If ReadAt(p, x) returns io.EOF for an offset x, is it then guaranteed that ReadAt(p, y) also returns io.EOF for all y > x?  Or could there be different errors, or "files with holes", or whatnot?
  • Which types in the standard library provide ReadAt methods?  I know of os.File, strings.Reader, and bytes.Reader.  Any others?
For context: this is for reading the cross-reference table of PDF files, which has to be located by following a convoluted route that starts at the end of the file.

Many thanks,
Jochen

import (
	"bytes"
	"io"
	"os"
	"strings"
)

// getSize returns the number of bytes that can be read from r via ReadAt.
func getSize(r io.ReaderAt) (int64, error) {
	// Fast paths for types whose size is known directly.
	if f, ok := r.(*os.File); ok {
		fi, err := f.Stat()
		if err != nil {
			return 0, err
		}
		return fi.Size(), nil
	}
	if b, ok := r.(*bytes.Reader); ok {
		return b.Size(), nil
	}
	if s, ok := r.(*strings.Reader); ok {
		return s.Size(), nil
	}

	// A short read at offset 0 already reveals the size.
	buf := make([]byte, 1024)
	n, err := r.ReadAt(buf, 0)
	if err == io.EOF {
		return int64(n), nil
	} else if err != nil {
		return 0, err
	}

	lowerBound := int64(n) // all bytes before lowerBound are known to be present
	var upperBound int64   // the byte at upperBound-1 is known to be missing

	// Probes check n before err: ReadAt may return io.EOF together with a
	// full buffer when it reads the very last byte of the input.

	// Exponential search: double the bound until a byte is missing.
	for {
		test := 2 * lowerBound
		n, err := r.ReadAt(buf[:1], test-1)
		if n == 1 {
			lowerBound = test
		} else if err == io.EOF {
			upperBound = test
			break
		} else {
			return 0, err
		}
	}

	// Binary search for the first missing byte between the bounds.
	for lowerBound+1 < upperBound {
		test := (lowerBound + upperBound + 1) / 2
		n, err := r.ReadAt(buf[:1], test-1)
		if n == 1 {
			lowerBound = test
		} else if err == io.EOF {
			upperBound = test
		} else {
			return 0, err
		}
	}
	return lowerBound, nil
}

Bruno Albuquerque

Mar 31, 2023, 4:38:04 PM
to Jochen Voss, golang-nuts
Not a direct answer to your question (your code looks like a reasonable implementation, modulo any bugs I did not notice), but: aren't you over-engineering things? If the source is a PDF file, why not also pass the size to the function instead of doing all this work to get it? At some point you have to open the file, right? Checking the size of a file is trivial (it is just a stat call), and then you do not need the probing code at all.


Jochen Voss

Apr 1, 2023, 4:58:28 AM
to golang-nuts
Dear Bruno,

Thanks for your answer.  Originally I had a separate size argument for the function which opens a new reader, something like this:

    func NewReader(data io.ReaderAt, size int64, opt *ReaderOptions) (*Reader, error)

But this makes it a bit annoying to create a reader for a file, because you always need an additional Stat() call beforehand, and it also looks a bit odd, I thought.  So my idea was to move the size check into the function and to simplify the API:

    func NewReader(data io.ReaderAt, opt *ReaderOptions) (*Reader, error)

This way the library does the work once, rather than every caller having to do this separately.

All the best,
Jochen

Brian Candler

Apr 1, 2023, 5:51:11 AM
to golang-nuts
Could you make your function accept some other interface? e.g.

type SizeReaderAt interface {
    io.ReaderAt
    Size() int64
}

Values of this type still satisfy io.ReaderAt wherever one is required.

You could also use the existing io.SectionReader, which has Read(), ReadAt(), Seek() and Size() methods.

Another option is to define your own interface:
 
type StatReaderAt interface {
    io.ReaderAt
    Stat() (os.FileInfo, error)
}

This interface is already satisfied if the caller passes a concrete *os.File. If they don't, however, the caller has a bit of work to do: wrap their value so that it carries a Stat() method returning a value that satisfies os.FileInfo (which has 6 methods, although 5 of those could be dummies).

It's unfortunate that *os.File doesn't automatically satisfy the SizeReaderAt interface above, as io.SectionReader does (File doesn't have a Size() method). But I guess you could make a type which embeds *os.File and adds a Size() method that calls Stat() and returns the resulting FileInfo's Size().

Brian Candler

Apr 1, 2023, 9:58:13 AM
to golang-nuts
Or the caller can wrap the object with io.NewSectionReader(f, 0, size), where size is the number of bytes to expose (for a file, the Size() reported by Stat()).