stg import regressions


Peter Grayson

Jan 24, 2018, 11:27:24 AM
to Catalin Marinas, st...@googlegroups.com
Hi Catalin,

I want to address the stgit issue that found its way to the LKML
yesterday. For reference:

https://lkml.org/lkml/2018/1/23/832

My main question is whether this regression still exists now that
PR 10 has been merged (a few days ago):

https://github.com/ctmarinas/stgit/pull/10

I am not familiar enough with the workflow that led to this issue to
make a competent attempt at reproducing the problem. Since you are more
familiar with the email-based workflows used for Linux kernel
development, I was hoping you might be able to comment on whether you
believe this still exists, and if so, perhaps help guide me to a
reproducible case.

I am committed to getting this issue resolved.

For reference, this is the pull request email that, when imported with
`stg import -m`, led to the problem:

https://www.spinics.net/lists/linux-pci/msg68379.html

And this git commit shows the problem. Note that the author field has
[ugly] encoded words (as described in RFC 2047).

Thanks in advance for any help you can offer.

Pete

Peter Grayson

Jan 24, 2018, 2:26:28 PM
to Catalin Marinas, st...@googlegroups.com
Update:

I now have the relevant mbox file and can reproduce the issue.

Linus helpfully identified two nuances in the offending mbox file.

First, the encoded name in the From header is quoted:

From: "=?UTF-8?q?Christian=20K=C3=B6nig?=" <ckoenig.lei...@gmail.com>

Second, although the Content-Type header indicates that the body is
UTF-8, there is a stray latin-1 "ö" character in the body.

There are different failing behaviors in the v0.18 release vs. git
master.

v0.18 imports successfully if the quotes are removed from the From
header. The stray latin-1 character is apparently not a problem.

git master imports if the stray latin-1 character is removed from the
body. However, there is a python2 vs python3 difference w.r.t. whether
the imported patch's author retains the encoded words:

- python3: quotes are okay, author ends up correct
- python2: quotes not okay, author ends up incorrect

Author: =?UTF-8?q?Christian=20K=C3=B6nig?= <ckoenig.lei...@gmail.com>

So it appears that python2's email library treats encoded words inside
quotes literally, whereas python3's email library unwraps the quotes. I
have not yet dug into the RFCs to figure out which behavior is correct,
but, in this case at least, the python3 behavior seems more desirable.
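
To illustrate the desired result, here is a minimal sketch (assuming
Python 3's standard email package) that strips the quotes from the
display name first and then decodes the encoded words. `decode_author`
is a hypothetical helper for illustration, not an stgit function, and
the address is a placeholder rather than the one in the mbox file.

```python
from email.header import decode_header
from email.utils import parseaddr


def decode_author(from_header):
    """Return (name, address) with RFC 2047 encoded words in the name decoded."""
    # parseaddr drops the surrounding double quotes from the display name.
    name, address = parseaddr(from_header)
    parts = []
    for chunk, charset in decode_header(name):
        if isinstance(chunk, bytes):
            # Assumption: chunks without a declared charset are treated as ASCII.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts), address


# author@example.com is a placeholder address (assumption for this sketch).
header = '"=?UTF-8?q?Christian=20K=C3=B6nig?=" <author@example.com>'
print(decode_author(header))
```

Stripping the quotes before decoding sidesteps the question of whether
an encoded word inside a quoted string should be decoded at all, which
is the point where the two library versions appear to differ.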

W.r.t. the stray latin-1 character, I believe that v0.18 works because
it uses the python2 email library in such a manner that email bodies are
treated as a stream of bytes, i.e. never encoded or decoded. On git
master we now attempt to decode email bodies and are thus exposed to
mis-encoded emails.
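
As a rough sketch of one way to tolerate such a body (an assumption on
my part, not necessarily the fix stgit will end up with), decoding can
fall back to latin-1 when the declared charset fails. `decode_body` is
a hypothetical helper name and the sample body is made up.

```python
def decode_body(raw, declared="utf-8"):
    """Decode a message body, tolerating bytes that do not match the declared charset."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this fallback cannot fail.
        return raw.decode("latin-1")


# Made-up body: declared UTF-8, but the "ö" is a single latin-1 byte (0xF6).
body = b"Signed-off-by: Christian K\xf6nig\n"
print(decode_body(body))
```

The trade-off is that the fallback applies to the whole body, so any
valid UTF-8 sequences elsewhere in the same message would be rendered
incorrectly once the fallback kicks in.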

I believe the goal for stgit is to import this mbox file as-is, even if
the mbox file is incorrect in the ways noted. I will be attempting to
modify stgit to that end.

Thanks,
Pete