On 2026/3/21 08:31, David Sterba wrote:
> On Thu, Mar 19, 2026 at 07:26:34PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2026/3/19 16:04, Daniel J Blueman wrote:
>>> When booting Linux 7.0-rc4 on a Qualcomm Snapdragon X1 with KASAN
>>> software tagging with a BTRFS filesystem, we see:
>>>
>>> BUG: KASAN: invalid-access in xxh64_update (lib/xxhash.c:143 lib/xxhash.c:283)
>>> Read of size 8 at addr 7bff000804fe1000 by task kworker/u49:2/138
>>> Pointer tag: [7b], memory tag: [b2]
>>>
>>> CPU: 0 UID: 0 PID: 138 Comm: kworker/u49:2 Not tainted 7.0.0-rc4+ #34 PREEMPTLAZY
>>> Hardware name: LENOVO 83ED/LNVNB161216, BIOS NHCN60WW 09/11/2025
>>> Workqueue: btrfs-endio-meta simple_end_io_work
>>> Call trace:
>>> show_stack (arch/arm64/kernel/stacktrace.c:501) (C)
>>> dump_stack_lvl (lib/dump_stack.c:122)
>>> print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
>>> kasan_report (mm/kasan/report.c:597)
>>> kasan_check_range (mm/kasan/sw_tags.c:86 (discriminator 1))
>>> __hwasan_loadN_noabort (mm/kasan/sw_tags.c:158)
>>> xxh64_update (lib/xxhash.c:143 lib/xxhash.c:283)
>>> btrfs_csum_update (fs/btrfs/fs.c:106)
>>> csum_tree_block (fs/btrfs/disk-io.c:103 (discriminator 3))
>>> btrfs_validate_extent_buffer (fs/btrfs/disk-io.c:389)
>>> end_bbio_meta_read (fs/btrfs/extent_io.c:3853 (discriminator 1))
>>> btrfs_bio_end_io (fs/btrfs/bio.c:152)
>>> simple_end_io_work (fs/btrfs/bio.c:388)
>>> process_one_work (./arch/arm64/include/asm/jump_label.h:36 ./include/trace/events/workqueue.h:110 kernel/workqueue.c:3281)
>>> worker_thread (kernel/workqueue.c:3353 (discriminator 2) kernel/workqueue.c:3440 (discriminator 2))
>>> kthread (kernel/kthread.c:436)
>>> ret_from_fork (arch/arm64/kernel/entry.S:861)
>>>
>>> The buggy address belongs to the physical page:
>>> page: refcount:3 mapcount:0 mapping:f1ff00080055dee8 index:0x2467bd pfn:0x884fe1
>>> memcg:51ff000800e68ec0 aops:btree_aops ino:1
>>> flags: 0x9340000000004000(private|zone=2|kasantag=0x4d)
>>> raw: 9340000000004000 0000000000000000 dead000000000122 f1ff00080055dee8
>>> raw: 00000000002467bd 43ff00081d0cc6f0 00000003ffffffff 51ff000800e68ec0
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>> ffff000804fe0e00: 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b
>>> ffff000804fe0f00: 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b 7b
>>> >ffff000804fe1000: b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2
>>> ^
>>> ffff000804fe1100: b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2
>>> ffff000804fe1200: b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2 b2
>>>
>>> This occurs as contiguous pages may have different KASAN tags in the upper address
>>> bits, leading to a tag mismatch if linear addressing is used.
>>>
>>> Fix this by treating them as discontiguous.
>>>
>>> Signed-off-by: Daniel J Blueman <dan...@quora.org>
>>> Fixes: 397239ed6a6c ("btrfs: allow extent buffer helpers to skip cross-page handling")
>>>
>>> ---
>>> fs/btrfs/extent_io.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 5f97a3d2a8d7..e2b241fb6c0e 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -3517,8 +3517,16 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>> * At this stage, either we allocated a large folio, thus @i
>>> * would only be 0, or we fall back to per-page allocation.
>>> */
>>> - if (i && folio_page(eb->folios[i - 1], 0) + 1 != folio_page(folio, 0))
>>> - page_contig = false;
>>> + if (i > 0) {
>>> + struct page *prev = folio_page(eb->folios[i - 1], 0);
>>> + struct page *curr = folio_page(folio, 0);
>>> +
>>> + /*
>>> + * Contiguous pages may have different tags; can't be treated as contiguous
>>> + */
>>> + if (curr != prev + 1 || page_kasan_tag(curr) != page_kasan_tag(prev))
>>> + page_contig = false;
>>
>> I am not a fan of this solution.
>>
>> Although it doesn't affect end users who don't have KASAN software
>> tagging enabled, I don't see what we really gain from the different tags.
>>
>> I mean, all those pages are already physically contiguous, so why
>> can't we access the range in one go?
>>
>> Maybe it will be better to set all pages with the same random tag if
>> page_contig is true?
>
> I don't know if there's an interface for changing the tags, but adding
> one condition that enables a sanitizer to work on some platform does not
> sound like a terrible thing. The contiguous-page handling on our side is
> an optimization, so it's a special case. I'd rather adapt to the
> sanitizers than let people ignore a warning or have to read that this
> particular warning is harmless.

There is such an interface, page_kasan_tag_set()/page_kasan_tag_reset(),
and it is already used inside MM.

And the deeper problem is: if this is a false alert, shouldn't we fix
the sanitizer?

Especially since in this case I don't see any problem with accessing
properly allocated and physically adjacent pages.

If this really is a problem, I think a lot of bio accesses are also
going to cause problems, as one bvec can cover multiple physically
adjacent pages, and if they have different tags then drivers copying a
large bvec should hit the same tag mismatch.

So to the reporter/KASAN people: what is the problem with accessing
memory with different tags in the first place?

Thanks,
Qu