x/image/riff optimisation

111 views
Skip to first unread message

Alex Crane

unread,
Jan 25, 2021, 11:52:21 AM1/25/21
to golang-nuts
Hello,

I have recently been using the x/image/riff library to inspect some large RIFF files - typically I am interested in all chunks except the data chunk. I have found with large files the performance is quite poor. Looking into the code it does a `got, z.err = io.Copy(ioutil.Discard, z.chunkReader)` to get the Reader into the right place if the previous chunk data has not been fully read/drained. For an *os.File, this can understandably be very slow if the chunk data is large. I have found if I do a type assertion for io.ReadSeeker on the underlying Reader and Seek instead, in that case the performance is dramatically improved (a 99% improvement in CPU time in a benchmark with a 128MB file). There can also be an optimisation to recognise when a chunk is the last chunk and therefore not need to drain it (can be significant for many media RIFF types which tend to put the large data chunk last) before returning io.EOF, but with the Seek optimisation this is less significant.

I can attach the diff. It's rather small. Is this something that is worth contribution? I'd rather not maintain a branch, though this would be my first contribution, so require some setup I'd rather not do if this isn't considered worth contributing.

Thanks,
Alex

Ian Lance Taylor

unread,
Jan 25, 2021, 6:12:18 PM1/25/21
to Alex Crane, golang-nuts, Nigel Tao
On Mon, Jan 25, 2021 at 8:52 AM Alex Crane <alex.c...@gmail.com> wrote:
>
> I have recently been using the x/image/riff library to inspect some large RIFF files - typically I am interested in all chunks except the data chunk. I have found with large files the performance is quite poor. Looking into the code it does a `got, z.err = io.Copy(ioutil.Discard, z.chunkReader)` to get the Reader into the right place if the previous chunk data has not been fully read/drained. For an *os.File, this can understandably be very slow if the chunk data is large. I have found if I do a type assertion for io.ReadSeeker on the underlying Reader and Seek instead, in that case the performance is dramatically improved (a 99% improvement in CPU time in a benchmark with a 128MB file). There can also be an optimisation to recognise when a chunk is the last chunk and therefore not need to drain it (can be significant for many media RIFF types which tend to put the large data chunk last) before returning io.EOF, but with the Seek optimisation this is less significant.
>
> I can attach the diff. It's rather small. Is this something that is worth contribution? I'd rather not maintain a branch, though this would be my first contribution, so require some setup I'd rather not do if this isn't considered worth contributing.

Sounds like a reasonable patch to me. Please don't send a diff.
Please use a GitHub pull request or a Gerrit change as described at
https://golang.org/doc/contribute.html. Thanks.

Ian
Reply all
Reply to author
Forward
0 new messages