[PATCH] kunit: tool: continue past invalid utf-8 output

Daniel Latypov

unread,

Oct 8, 2021, 5:08:02 PM10/8/21

to brendan...@google.com, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org, Daniel Latypov

kunit.py currently crashes and fails to parse kernel output if it's not
fully valid utf-8.

This can come from memory corruption or or just inadvertently printing
out binary data as strings.

E.g. adding this line into a kunit test
pr_info("\x80")
will cause this exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte

We can tell Python how to handle errors, see
https://docs.python.org/3/library/codecs.html#error-handlers

Unfortunately, it doesn't seem like there's a way to specify this in
just one location, so we need to repeat ourselves quite a bit.

Specify `errors='backslashreplace'` so we instead:
* print out the offending byte as '\x80'
* try and continue parsing the output.
* as long as the TAP lines themselves are valid, we're fine.

Signed-off-by: Daniel Latypov <dlat...@google.com>
---
tools/testing/kunit/kunit.py | 3 ++-
tools/testing/kunit/kunit_kernel.py | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 9c9ed4071e9e..28ae096d4b53 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -457,9 +457,10 @@ def main(argv, linux=None):
sys.exit(1)
elif cli_args.subcommand == 'parse':
if cli_args.file == None:
+ sys.stdin.reconfigure(errors='backslashreplace')
kunit_output = sys.stdin
else:
- with open(cli_args.file, 'r') as f:
+ with open(cli_args.file, 'r', errors='backslashreplace') as f:
kunit_output = f.read().splitlines()
request = KunitParseRequest(cli_args.raw_output,
None,
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index faa6320e900e..f08c6c36a947 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
- text=True, shell=True)
+ text=True, shell=True, errors='backslashreplace')

class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
"""An abstraction over command line operations performed on a source tree."""
@@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
- text=True)
+ text=True, errors='backslashreplace')

def get_kconfig_path(build_dir) -> str:
return get_file_path(build_dir, KCONFIG_PATH)

base-commit: a032094fc1ed17070df01de4a7883da7bb8d5741
--
2.33.0.882.g93a45727a2-goog

Brendan Higgins

unread,

Oct 8, 2021, 5:15:38 PM10/8/21

to Daniel Latypov, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org

On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov <dlat...@google.com> wrote:
>
> kunit.py currently crashes and fails to parse kernel output if it's not
> fully valid utf-8.
>
> This can come from memory corruption or or just inadvertently printing
> out binary data as strings.
>
> E.g. adding this line into a kunit test
> pr_info("\x80")
> will cause this exception
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
>
> We can tell Python how to handle errors, see
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Unfortunately, it doesn't seem like there's a way to specify this in
> just one location, so we need to repeat ourselves quite a bit.
>
> Specify `errors='backslashreplace'` so we instead:
> * print out the offending byte as '\x80'
> * try and continue parsing the output.
> * as long as the TAP lines themselves are valid, we're fine.
>
> Signed-off-by: Daniel Latypov <dlat...@google.com>

Thanks for fixing this!

Reviewed-by: Brendan Higgins <brendan...@google.com>

Daniel Latypov

unread,

Oct 8, 2021, 7:51:41 PM10/8/21

to brendan...@google.com, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org

On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov <dlat...@google.com> wrote:
>

Ugh, pytype doesn't like this even though it's valid.
I can squash the error with
sys.stdin.reconfigure(errors='backslashreplace') # pytype:
disable=attribute-error

I had wanted us to avoid having anything specific to pytype in the code.
But mypy (the more common typechecker iirc) hasn't been smart enough
to typecheck our code since the QEMU support landed.

If we don't add this directive, both typecheckers will report at least
one spurious warning.
Should I go ahead and add it, Brendan/David?

Daniel Latypov

unread,

Oct 13, 2021, 12:52:09 PM10/13/21

to brendan...@google.com, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org

Friendly ping.
Should we go ahead and add "# pytype: disable=attribute-error" here?

Daniel Latypov

unread,

Oct 20, 2021, 7:21:26 PM10/20/21

to brendan...@google.com, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org, Daniel Latypov

kunit.py currently crashes and fails to parse kernel output if it's not
fully valid utf-8.

This can come from memory corruption or or just inadvertently printing
out binary data as strings.

E.g. adding this line into a kunit test
pr_info("\x80")
will cause this exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte

We can tell Python how to handle errors, see
https://docs.python.org/3/library/codecs.html#error-handlers

Unfortunately, it doesn't seem like there's a way to specify this in
just one location, so we need to repeat ourselves quite a bit.

Specify `errors='backslashreplace'` so we instead:
* print out the offending byte as '\x80'
* try and continue parsing the output.
* as long as the TAP lines themselves are valid, we're fine.

Signed-off-by: Daniel Latypov <dlat...@google.com>

Reviewed-by: Brendan Higgins <brendan...@google.com>
---
v1 -> v2: add comment to silence erroneous pytype error

---
tools/testing/kunit/kunit.py | 3 ++-
tools/testing/kunit/kunit_kernel.py | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py

index e1dd3180f0d1..68e6f461c758 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -477,9 +477,10 @@ def main(argv, linux=None):

sys.exit(1)
elif cli_args.subcommand == 'parse':
if cli_args.file == None:

+ sys.stdin.reconfigure(errors='backslashreplace') # pytype: disable=attribute-error

base-commit: 63b136c634a2bdffd78795bc33ac2d488152ffe8
--
2.33.0.1079.g6e70778dc9-goog

Daniel Latypov

unread,

Oct 20, 2021, 7:22:26 PM10/20/21

to brendan...@google.com, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org

I've sent out a v2 with this:
https://lore.kernel.org/linux-kselftest/20211020232121.1...@google.com

David Gow

unread,

Oct 20, 2021, 10:32:44 PM10/20/21

to Daniel Latypov, Brendan Higgins, Linux Kernel Mailing List, KUnit Development, open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan

On Thu, Oct 21, 2021 at 7:21 AM 'Daniel Latypov' via KUnit Development
<kuni...@googlegroups.com> wrote:
>
> kunit.py currently crashes and fails to parse kernel output if it's not
> fully valid utf-8.
>
> This can come from memory corruption or or just inadvertently printing
> out binary data as strings.
>
> E.g. adding this line into a kunit test
> pr_info("\x80")
> will cause this exception
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
>
> We can tell Python how to handle errors, see
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Unfortunately, it doesn't seem like there's a way to specify this in
> just one location, so we need to repeat ourselves quite a bit.
>
> Specify `errors='backslashreplace'` so we instead:
> * print out the offending byte as '\x80'
> * try and continue parsing the output.
> * as long as the TAP lines themselves are valid, we're fine.
>
> Signed-off-by: Daniel Latypov <dlat...@google.com>
> Reviewed-by: Brendan Higgins <brendan...@google.com>
> ---
> v1 -> v2: add comment to silence erroneous pytype error
> ---

Thanks. I've tested this, and it works well for me. I don't mind the
pytype comment, even though I don't use pytype, so I'm glad it's
there.

Tested-by: David Gow <davi...@google.com>

Cheers,
-- David

> --
> You received this message because you are subscribed to the Google Groups "KUnit Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kunit-dev+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kunit-dev/20211020232121.1748376-1-dlatypov%40google.com.

Brendan Higgins

unread,

Oct 25, 2021, 5:29:20 PM10/25/21

to Daniel Latypov, davi...@google.com, linux-...@vger.kernel.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, sk...@linuxfoundation.org

Sorry, missed this.

Yeah, I am fine with disabling the type checkers if they fail to
understand valid code.

Reply all

Reply to author

Forward