[go] image/jpeg: improve handling of JPEG restart markers in non-ideal cases

112 views
Skip to first unread message

Gopher Robot (Gerrit)

unread,
Feb 3, 2022, 5:53:47 AM2/3/22
to Seamus Allan, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://golang.org/doc/contribute.html#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://golang.org/s/release
for more details.

View Change

    To view, visit change 382754. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    Gerrit-Change-Number: 382754
    Gerrit-PatchSet: 1
    Gerrit-Owner: Seamus Allan <seamus...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-Comment-Date: Thu, 03 Feb 2022 10:53:42 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: No
    Gerrit-MessageType: comment

    Seamus Allan (Gerrit)

    unread,
    Feb 3, 2022, 2:16:16 PM2/3/22
    to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

    Seamus Allan has uploaded this change for review.

    View Change

    image/jpeg: improve handling of JPEG restart markers in non-ideal cases

    Improved handling of restart (RST) markers in JPEG data. Particularly addressing issues around spurious data before a marker, but also improving robust handling of unexpected markers, following the philosophy used in Lua libjpeg.

    Fixes #40130 #41875

    Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    ---
    M src/image/jpeg/reader_test.go
    M src/image/jpeg/scan.go
    A src/image/testdata/jpeg-bad-rst-marker-001.jpeg
    3 files changed, 117 insertions(+), 26 deletions(-)

    diff --git a/src/image/jpeg/reader_test.go b/src/image/jpeg/reader_test.go
    index bf07fad..9f94c16 100644
    --- a/src/image/jpeg/reader_test.go
    +++ b/src/image/jpeg/reader_test.go
    @@ -491,6 +491,21 @@
    }
    }

    +func TestBadRestartMarker(t *testing.T) {
    + // Image sourced from: https://github.com/photoprism/photoprism/issues/1673
    + // Referenced here: https://go-review.googlesource.com/c/go/+/260837
    +
    + f, err := os.Open("../testdata/jpeg-bad-rst-marker-001.jpeg")
    + if err != nil {
    + t.Fatalf("Open: %v", err)
    + }
    +
    + defer f.Close()
    + if _, err = Decode(f); err != nil {
    + t.Fatalf("Decode: %v", err)
    + }
    +}
    +
    func benchmarkDecode(b *testing.B, filename string) {
    data, err := os.ReadFile(filename)
    if err != nil {
    diff --git a/src/image/jpeg/scan.go b/src/image/jpeg/scan.go
    index 94f3d3a..5192c23 100644
    --- a/src/image/jpeg/scan.go
    +++ b/src/image/jpeg/scan.go
    @@ -5,6 +5,7 @@
    package jpeg

    import (
    + "fmt"
    "image"
    )

    @@ -305,37 +306,91 @@
    } // for i
    mcu++
    if d.ri > 0 && mcu%d.ri == 0 && mcu < mxx*myy {
    - // A more sophisticated decoder could use RST[0-7] markers to resynchronize from corrupt input,
    - // but this one assumes well-formed input, and hence the restart marker follows immediately.
    - if err := d.readFull(d.tmp[:2]); err != nil {
    - return err
    - }
    + discardedBytes := 0
    + var err error

    - // Section F.1.2.3 says that "Byte alignment of markers is
    - // achieved by padding incomplete bytes with 1-bits. If padding
    - // with 1-bits creates a X’FF’ value, a zero byte is stuffed
    - // before adding the marker."
    - //
    - // Seeing "\xff\x00" here is not spec compliant, as we are not
    - // expecting an *incomplete* byte (that needed padding). Still,
    - // some real world encoders (see golang.org/issue/28717) insert
    - // it, so we accept it and re-try the 2 byte read.
    - //
    - // libjpeg issues a warning (but not an error) for this:
    - // https://github.com/LuaDist/libjpeg/blob/6c0fcb8ddee365e7abc4d332662b06900612e923/jdmarker.c#L1041-L1046
    - if d.tmp[0] == 0xff && d.tmp[1] == 0x00 {
    - if err := d.readFull(d.tmp[:2]); err != nil {
    + // Find the next restart marker. It should be the first byte, but that's
    + // not always the case. This loop scans through the bytes until the right
    + // marker is found, or at least an acceptable marker is found (leaving
    + // the rest up to the entropy decoder)
    + // Based on the logic here: https://github.com/LuaDist/libjpeg/blob/6c0fcb8ddee365e7abc4d332662b06900612e923/jdmarker.c#L1293-L1340
    + for {
    + //Try to read 1 byte
    + d.tmp[0], err = d.readByte()
    + if err != nil {
    return err
    }
    +
    + // Discard non 0xff (if we're at a marker there shouldn't be any)
    + // Keep track of this spurious data to warn the user later (if
    + // they're interested...)
    + for d.tmp[0] != 0xff {
    + discardedBytes++
    + d.tmp[0], err = d.readByte()
    + if err != nil {
    + return err
    + }
    + }
    +
    + // Now discard duplicate 0xffs (padding bytes)
    + for d.tmp[0] == 0xff {
    + d.tmp[0], err = d.readByte()
    + if err != nil {
    + return err
    + }
    + }
    +
    + // Check for stuffed zero... if yes start again
    + // Section F.1.2.3 says that "Byte alignment of markers is
    + // achieved by padding incomplete bytes with 1-bits. If padding
    + // with 1-bits creates a X’FF’ value, a zero byte is stuffed
    + // before adding the marker."
    + //
    + // Seeing "\xff\x00" here is not spec compliant, as we are not
    + // expecting an *incomplete* byte (that needed padding). Still,
    + // some real world encoders (see golang.org/issue/28717) insert
    + // it, so we accept it and re-try the 2 byte read.
    + //
    + // libjpeg issues a warning (but not an error) for this:
    + // https://github.com/LuaDist/libjpeg/blob/6c0fcb8ddee365e7abc4d332662b06900612e923/jdmarker.c#L1041-L1046
    + if d.tmp[0] == 0x00 {
    + discardedBytes += 2
    + continue
    + }
    +
    + // By here we have a marker of some sort... 0xff 0x__
    + // Warn the user if the JPEG had data in a weird place
    + if discardedBytes > 0 {
    + fmt.Println("Warning JPEG extraneous data: ", discardedBytes)
    + }
    +
    + // Best case: it's the correct Restart Marker
    + // If it's what we're looking for, increment and break out to reset Huffman
    + if d.tmp[0] == expectedRST {
    + expectedRST++
    + if expectedRST == rst7Marker+1 {
    + expectedRST = rst0Marker
    + }
    + break
    + }
    +
    + // Next case: The Restart Marker is within 2 of what we were hoping for
    + // Apparently the thing to do is keep scanning until we find another marker.
    + if d.Abs(int(d.tmp[0])-int(expectedRST)) < 3 {
    + continue
    + }
    +
    + // Next Case: It's a valid Marker, but not a restart marker
    + // In this case switch back to the entropy decoder and let it handle the marker
    + if (d.tmp[0] >= sof0Marker && d.tmp[0] < rst0Marker) || (d.tmp[0] > rst7Marker) {
    + fmt.Println("Warning JPEG found valid non-RST Marker in wrong place")
    + break
    + }
    +
    + // Finally if it's invalid (most likely < 0xC0/sof0Marker random data)
    + // go back and strip the data and keep searching for somthing valid.
    }

    - if d.tmp[0] != 0xff || d.tmp[1] != expectedRST {
    - return FormatError("bad RST marker")
    - }
    - expectedRST++
    - if expectedRST == rst7Marker+1 {
    - expectedRST = rst0Marker
    - }
    // Reset the Huffman decoder.
    d.bits = bits{}
    // Reset the DC components, as per section F.2.1.3.1.
    @@ -521,3 +576,11 @@
    }
    return nil
    }
    +
    +// Quick function to save a bunch of additional includes to cast bytes to float64
    +func (d *decoder) Abs(x int) int {
    + if x < 0 {
    + return -x
    + }
    + return x
    +}
    diff --git a/src/image/testdata/jpeg-bad-rst-marker-001.jpeg b/src/image/testdata/jpeg-bad-rst-marker-001.jpeg
    new file mode 100644
    index 0000000..a0305fb
    --- /dev/null
    +++ b/src/image/testdata/jpeg-bad-rst-marker-001.jpeg
    Binary files differ

    To view, visit change 382754. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    Gerrit-Change-Number: 382754
    Gerrit-PatchSet: 1
    Gerrit-Owner: Seamus Allan <seamus...@gmail.com>
    Gerrit-MessageType: newchange

    Dmitri Shuralyov (Gerrit)

    unread,
    Mar 5, 2022, 1:02:14 PM3/5/22
    to Seamus Allan, goph...@pubsubhelper.golang.org, Dmitri Shuralyov, Ian Lance Taylor, Brad Fitzpatrick, Nigel Tao, Rob Pike, Gopher Robot, golang-co...@googlegroups.com

    Attention is currently required from: Seamus Allan, Nigel Tao.

    Patch set 1:Run-TryBot +1Trust +1

    View Change

    4 comments:

    • Commit Message:

      • Patch Set #1, Line 11: Fixes #40130 #41875

        Please add line breaks to line 9 so it fits more comfortably in contexts where automatic wrapping isn't applied, as shown at https://go.dev/doc/contribute#commit_messages.

        Write this as two "Fixes" lines, one for each issue:

        ```
        Fixes #40130
        Fixes #41875
        ```

        Otherwise the second issue would not be recognized.

    • Patchset:

      • Patch Set #1:

        Just leaving a few comments on high-level things I've spotted in case it's helpful, but I'm not familiar with JPEG decoding to have comments on those details.

    • File src/image/jpeg/scan.go:

      • Patch Set #1, Line 364: fmt.Println("Warning JPEG extraneous data: ", discardedBytes)

        This JPEG decoder should not print to stdout if there are warnings, since there's no API for users to turn that off when it's not desirable. It should either ignore them, or return it as a fatal error, whichever is more appropriate.

    • File src/image/testdata/jpeg-bad-rst-marker-001.jpeg:

      • Patch Set #1:

        This is a 4 MB file, which is quite large. How hard would it be to try to reproduce the issue with something smaller? If this is a tricky edge case to trigger with a smaller image, then we might need to consider other approaches, or accept 4 MB.

    To view, visit change 382754. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    Gerrit-Change-Number: 382754
    Gerrit-PatchSet: 1
    Gerrit-Owner: Seamus Allan <seamus...@gmail.com>
    Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
    Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
    Gerrit-CC: Brad Fitzpatrick <brad...@golang.org>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Rob Pike <r...@golang.org>
    Gerrit-Attention: Seamus Allan <seamus...@gmail.com>
    Gerrit-Attention: Nigel Tao <nige...@golang.org>
    Gerrit-Comment-Date: Sat, 05 Mar 2022 18:02:10 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    Gerrit-MessageType: comment

    Dmitri Shuralyov (Gerrit)

    unread,
    Mar 5, 2022, 1:08:20 PM3/5/22
    to Seamus Allan, goph...@pubsubhelper.golang.org, Dmitri Shuralyov, Ian Lance Taylor, Brad Fitzpatrick, Nigel Tao, Rob Pike, Gopher Robot, golang-co...@googlegroups.com

    Attention is currently required from: Seamus Allan, Nigel Tao.

    View Change

    1 comment:

    • Commit Message:

      • Please add line breaks to line 9 so it fits more comfortably in contexts where automatic wrapping is […]

        Oh, #41875 is a PR, not an issue. In that case there's no need to include it in the Fixes lines. You can add a sentence to the commit message body like "This replaces the fix in CL 260837." instead.

    To view, visit change 382754. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    Gerrit-Change-Number: 382754
    Gerrit-PatchSet: 1
    Gerrit-Owner: Seamus Allan <seamus...@gmail.com>
    Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
    Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
    Gerrit-CC: Brad Fitzpatrick <brad...@golang.org>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Rob Pike <r...@golang.org>
    Gerrit-Attention: Seamus Allan <seamus...@gmail.com>
    Gerrit-Attention: Nigel Tao <nige...@golang.org>
    Gerrit-Comment-Date: Sat, 05 Mar 2022 18:08:16 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Dmitri Shuralyov <dmit...@golang.org>
    Gerrit-MessageType: comment

    Nigel Tao (Gerrit)

    unread,
    Mar 18, 2024, 8:01:46 AM3/18/24
    to Seamus Allan, goph...@pubsubhelper.golang.org, Gopher Robot, Dmitri Shuralyov, Ian Lance Taylor, Brad Fitzpatrick, Rob Pike, golang-co...@googlegroups.com
    Attention needed from Seamus Allan

    Nigel Tao added 2 comments

    File src/image/jpeg/scan.go
    Line 379, Patchset 1 (Latest): if d.Abs(int(d.tmp[0])-int(expectedRST)) < 3 {
    Nigel Tao . unresolved

    Abs isn't the right calculation. For example, RST0 and RST7 marker values are 0xD0 and 0xD7 and are "within 2" of each other, conceptually, but the absolute value of the difference is 7, which fails this "less than 3" test.

    Line 580, Patchset 1 (Latest):// Quick function to save a bunch of additional includes to cast bytes to float64
    Nigel Tao . unresolved

    Per https://go.dev/wiki/CodeReviewComments#comment-sentences

    Comments should begin with the name of the thing being described and end in a period:

    Thus:

    // Abs is a quick function to etc.

    But, also, you don't need the `d *decoder` receiver. And you don't need to capitalize `Abs`, as there's no need to export it outside of this package.

    But, also also, it's not the right calculation: see above.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Seamus Allan
    Submit Requirements:
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedLegacy-TryBots-Pass
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    • requirement is not satisfiedTryBots-Pass
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: Ibbbdb7f4c2217c46dc2789d5799fd0226babdc08
    Gerrit-Change-Number: 382754
    Gerrit-PatchSet: 1
    Gerrit-Owner: Seamus Allan <seamus...@gmail.com>
    Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
    Gerrit-Reviewer: Gopher Robot <go...@golang.org>
    Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
    Gerrit-CC: Brad Fitzpatrick <brad...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Rob Pike <r...@golang.org>
    Gerrit-Attention: Seamus Allan <seamus...@gmail.com>
    Gerrit-Comment-Date: Mon, 18 Mar 2024 12:01:38 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    unsatisfied_requirement
    open
    diffy
    Reply all
    Reply to author
    Forward
    0 new messages