Hello,
This is a follow-up to a discussion that took place in the Kernel Sanitizers
Office Hours (IIRC) at this year's Plumbers Event in Vienna.
I had asked about how KCSAN could detect races due to DMA[1] and Arnd
suggested that we could use KASAN to detect the CPU accessing buffers that
it doesn't have ownership of. I mentioned having implemented[2] this exact scheme
in the barebox bootloader's KASAN support and promised to type up an email
about it to help getting a similar functionality into the kernel, but first
some context:
The streaming DMA API is used to annotate ownership transfer of buffers in
memory shared between the kernel and a DMA-capable device.
The relevant kernel documentation is:
https://www.kernel.org/doc/Documentation/DMA-API.txt
https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt
But I'll give a quick recap. There are four key operations:
- dma_map_single() moves a range of memory from CPU to device ownership
- dma_sync_single_for_cpu() can be called on all or a subset of the range
mapped by dma_map_single() to move ownership back to the CPU
- dma_sync_single_for_device() moves back all oo a subset of the range
mapped by dma_map_single() to the ownership of the device
- dma_unmap_single() gives back ownership of a range of memory to the CPU
It's a bug for the CPU or the device to access a streaming DMA mapping while
it's owned by the other side. On many systems, that bug will manifest itself
as memory corruption due to loss of cache coherence.
To make it easier to spot some misuses of the API, the kernel has a
CONFIG_DMA_API_DEBUG feature, which will run sanity checks when using the API.
It can't however detect if a memory access happens to a buffer while it's
owned by other side, which is where KASAN can come in by having CONFIG_DMA_API_DEBUG
record ownership information into the KASAN shadow memory.
That way accessing a device mapped buffer before sync'ing it to the CPU is
detected like KASAN would detect a use-after-free. When the ownership is moved
back to the CPU, the memory is unpoisoned and such an access would be allowed
again.
I had implemented this scheme[3] in the barebox bootloader and it works ok:
BUG: KASAN: dma-mapped-to-device in eqos_send+0xdc/0x1a8
Read of size 4 at addr 0000000040419f00
Call trace:
[<7fbd4980>] (unwind_backtrace+0x0/0xb0) from [<7fbd4a40>] (dump_stack+0x10/0x18)
[<7fbd4a40>] (dump_stack+0x10/0x18) from [<7fba2360>] (kasan_report+0x11c/0x290)
[<7fba2360>] (kasan_report+0x11c/0x290) from [<7fba1f44>] (__asan_load4+0x54/0xb8)
[<7fba1f44>] (__asan_load4+0x54/0xb8) from [<7fb2e52c>] (eqos_send+0xdc/0x1a8)
[<7fb2e52c>] (eqos_send+0xdc/0x1a8) from [<7fbb6544>] (eth_send+0x154/0x16c)
[<7fbb6544>] (eth_send+0x154/0x16c) from [<7fbb7114>] (net_ip_send+0xe8/0xf8)
[<7fbb7114>] (net_ip_send+0xe8/0xf8) from [<7fbb7d10>] (net_udp_send+0x68/0x78)
The aforementioned barebox functionality goes a step further and also used
the shadow memory information to detect repeated syncs without an ownership
change. While this is not a bug, my impression is that this is unnecessary
overhead and a diagnostic could help correct a developer's misunderstanding
of the API.
I hope to kick off a discussion about this with my mail here and perhaps even
motivate someone else to port it over or reimplement it. :D
[1]: when CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN is enabled
[2]:
https://lore.barebox.org/barebox/20240910114832.2...@pengutronix.de/
[3]:
https://github.com/barebox/barebox/blob/master/drivers/dma/debug.c
Cheers,
Ahmad
--
Pengutronix e.K. | |
Steuerwalder Str. 21 |
http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax:
+49-5121-206917-5555 |