Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] ACPI: Implement overriding of arbitrary ACPI tables via initrd

315 views
Skip to first unread message

Thomas Renninger

unread,
Mar 23, 2012, 10:40:02 AM3/23/12
to
Details can be found in:
Documentation/acpi/initrd_table_override.txt

Additional dmesg output of a booted system with
FACP (FADT), DSDT and SSDT (the 9th dynamically loaded one)
tables overridden (with ### marked comments):

### ACPI tables found glued to initrd
DSDT ACPI table found in initrd - size: 16234
FACP ACPI table found in initrd - size: 116
SSDT ACPI table found in initrd - size: 334
### Re-printed e820 map via e820_update() with additionally created
### ACPI data section at 0xcff55000 where the ACPI tables passed via
### initrd where copied to
modified physical RAM map:
...
### New ACPI data section:
modified: 00000000cff55000 - 00000000cff5912c (ACPI data)
### BIOS e820 provided ACPI data section:
modified: 00000000cff60000 - 00000000cff69000 (ACPI data)
...
### Total size of all ACPI tables glued to initrd
### The address is initrd_start which gets updated to
### initrd_start = initrd_start + "size of all ACPI tables glued to initrd"
Found acpi tables of size: 16684 at 0xffff8800374c4000

Disabling lock debugging due to kernel taint
### initrd provided FACP and DSDT tables are used instead of BIOS provided ones
ACPI: FACP @ 0x00000000cff68dd8 Phys table override, replaced with:
ACPI: FACP 00000000cff58f6a 00074 (v01 INTEL TUMWATER 06040000 PTL 00000003)
ACPI: DSDT @ 0x00000000cff649d4 Phys table override, replaced with:
ACPI: DSDT 00000000cff55000 04404 (v01 Intel BLAKFORD 06040000 MSFT 0100000E)
...
### Much later, the 9th (/sys/firmware/acpi/table/dynamic/SSDT9) dynamically
### loaded ACPI table matches and gets overridden:
ACPI: SSDT @ 0x00000000cff64824 Phys table override, replaced with:
ACPI: SSDT 00000000cff58fde 0014E (v01 PmRef Cpu7Ist 00003000 INTL 20110316)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT (null) 0014E (v01 PmRef Cpu7Ist 00003000 INTL 20110316)
...

If the initrd does not start with a valid ACPI table signature or the ACPI
table's checksum is wrong, there is no functional change.

Signed-off-by: Thomas Renninger <tr...@suse.de>
CC: linux...@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: x...@kernel.org
CC: Lin Ming <ming....@intel.com>
CC: le...@kernel.org
CC: robert...@intel.com
CC: h...@zytor.com
---
Documentation/acpi/initrd_table_override.txt | 110 +++++++++++++++++++
arch/x86/kernel/setup.c | 18 ++-
arch/x86/mm/init.c | 6 +
drivers/acpi/Kconfig | 10 ++
drivers/acpi/osl.c | 152 +++++++++++++++++++++++++-
include/linux/acpi.h | 4 +
include/linux/initrd.h | 3 +
7 files changed, 294 insertions(+), 9 deletions(-)
create mode 100644 Documentation/acpi/initrd_table_override.txt

diff --git a/Documentation/acpi/initrd_table_override.txt b/Documentation/acpi/initrd_table_override.txt
new file mode 100644
index 0000000..7b29d5f
--- /dev/null
+++ b/Documentation/acpi/initrd_table_override.txt
@@ -0,0 +1,110 @@
+Overriding ACPI tables via initrd
+=================================
+
+1) Introduction (What is this about)
+2) What is this for
+3) How does it work
+4) References (Where to retrieve userspace tools)
+
+1) What is this about
+---------------------
+
+If ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible to
+override nearly any ACPI table provided by the BIOS with an instrumented,
+modified one.
+
+Up to 10 arbitrary ACPI tables can be passed.
+For a full list of ACPI tables that can be overridden, take a look at
+the char *table_sigs[MAX_ACPI_SIGNATURE]; definition in drivers/acpi/osl.c
+All ACPI tables iasl (Intel's ACPI compiler and disassembler) knows should
+be overridable, except:
+ - ACPI_SIG_RSDP (has a signature of 6 bytes)
+ - ACPI_SIG_FACS (does not have an ordinary ACPI table header)
+Both could get implemented as well.
+
+
+2) What is this for
+-------------------
+
+Please keep in mind that this is a debug option.
+ACPI tables should not get overridden for productive use.
+If BIOS ACPI tables are overridden the kernel will get tainted with the
+TAINT_OVERRIDDEN_ACPI_TABLE flag.
+Complain to your platform/BIOS vendor if you find a bug which is that sever
+that a workaround is not accepted in the Linus kernel.
+
+Still, it can and should be enabled in any kernel, because:
+ - There is no functional change with not instrumented initrds
+ - It provides a powerful feature to easily debug and test ACPI BIOS table
+ compatibility with the Linux kernel.
+
+Until now it was only possible to override the DSDT by compiling it into
+the kernel. This is a nightmare when trying to work on ACPI related bugs
+and a lot bugs got stuck because of that.
+Even for people with enough kernel knowledge, building a kernel to try out
+things is very time consuming. Also people may have to browse and modify the
+ACPI interpreter code to find a possible BIOS bug. With this feature, people
+can correct the ACPI tables and try out quickly whether this is the root cause
+that needs to get addressed in the kernel.
+
+This could even ease up testing for BIOS providers who could flush their BIOS
+to test, but overriding table via initrd is much easier and quicker.
+For example one could prepare different initrds overriding NUMA tables with
+different affinity settings. Set up a script, let the machine reboot and
+run tests over night and one can get a picture how these settings influence
+the Linux kernel and which values are best.
+
+People can instrument the dynamic ACPI (ASL) code (for example with debug
+statements showing up in syslog when the ACPI code is processed, etc.),
+to better understand BIOS to OS interfaces, to hunt down ACPI BIOS code related
+bugs quickly or to easier develop ACPI based drivers.
+
+Intstrumenting ACPI code in SSDTs is now much easier. Before, one had to copy
+all SSDTs into the DSDT to compile it into the kernel for testing
+(because only DSDT could get overridden). That's what the acpi_no_auto_ssdt
+boot param is for: the BIOS provided SSDTs are ignored and all have to get
+copied into the DSDT, complicated and time consuming.
+
+Much more use cases, depending on which ACPI parts you are working on...
+
+
+3) How does it work
+-------------------
+
+# Extract the machine's ACPI tables:
+acpidump >acpidump
+acpixtract -a acpidump
+# Disassemble, modify and recompile them:
+iasl -d *.dat
+# For example add this statement into a _PRT (PCI Routing Table) function
+# of the DSDT:
+Store("Hello World", debug)
+iasl -sa *.dsl
+# glue them together with the initrd. ACPI tables go first, original initrd
+# goes on top:
+cat TBL1.dat >>instrumented_initrd
+cat TBL2.dat >>instrumented_initrd
+cat TBL3.dat >>instrumented_initrd
+cat /boot/initrd >>instrumented_initrd
+# reboot with increased acpi debug level, e.g. boot params:
+acpi.debug_level=0x2 acpi.debug_layer=0xFFFFFFFF
+# and check your syslog:
+[ 1.268089] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
+[ 1.272091] [ACPI Debug] String [0x0B] "HELLO WORLD"
+
+iasl is able to disassemble and recompile quite a lot different,
+also static ACPI tables.
+
+4) Where to retrieve userspace tools
+------------------------------------
+
+iasl and acpixtract are part of Intel's ACPICA project:
+http://acpica.org/
+and should be packaged by distributions (for example in the acpica package
+on SUSE).
+
+acpidump can be found in Len Browns pmtools:
+ftp://kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools/acpidump
+This tool is also part of the acpica package on SUSE.
+Alternatively used ACPI tables can be retrieved via sysfs in latest kernels:
+/sys/firmware/acpi/tables
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7d5099..5a5a33f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -412,12 +412,20 @@ static void __init reserve_initrd(void)
*/
initrd_start = ramdisk_image + PAGE_OFFSET;
initrd_end = initrd_start + ramdisk_size;
- return;
+ } else {
+ relocate_initrd();
+ memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
}
-
- relocate_initrd();
-
- memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+ acpi_initrd_offset = acpi_initrd_table_override((void *)initrd_start,
+ (void *)initrd_end);
+ if (!acpi_initrd_offset)
+ return;
+ printk(KERN_INFO "Found acpi tables of size: %lu at 0x%lx\n",
+ acpi_initrd_offset, initrd_start);
+ initrd_start += acpi_initrd_offset;
+ return;
+#endif
}
#else
static void __init reserve_initrd(void)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 6cabf65..8df3fda 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -391,6 +391,12 @@ void free_initrd_mem(unsigned long start, unsigned long end)
* - relocate_initrd()
* So here We can do PAGE_ALIGN() safely to get partial page to be freed
*/
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+ if (acpi_initrd_offset)
+ free_init_pages("initrd memory", start - acpi_initrd_offset,
+ PAGE_ALIGN(end));
+ else
+#endif
free_init_pages("initrd memory", start, PAGE_ALIGN(end));
}
#endif
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 7556913..7113327 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -261,6 +261,16 @@ config ACPI_CUSTOM_DSDT
bool
default ACPI_CUSTOM_DSDT_FILE != ""

+config ACPI_INITRD_TABLE_OVERRIDE
+ bool
+ depends on X86
+ default y
+ help
+ This option provides functionality to override arbitrary ACPI tables
+ via initrd. No functional change if no ACPI tables are glued to the
+ initrd, therefore it's safe to say Y.
+ See Documentation/acpi/initrd_table_override.txt for details
+
config ACPI_BLACKLIST_YEAR
int "Disable ACPI for systems before Jan 1st this year" if X86_32
default 0
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index ad92131..a78b6f6 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -45,6 +45,7 @@
#include <linux/list.h>
#include <linux/jiffies.h>
#include <linux/semaphore.h>
+#include <linux/memblock.h>

#include <asm/io.h>
#include <asm/uaccess.h>
@@ -534,6 +535,107 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
return AE_OK;
}

+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+#include <asm/e820.h>
+
+#define ACPI_OVERRIDE_TABLES 10
+
+static unsigned long acpi_table_override_offset[ACPI_OVERRIDE_TABLES];
+static u64 acpi_tables_inram;
+
+unsigned long acpi_initrd_offset;
+
+/* Copied from acpica/tbutils.c:acpi_tb_checksum() */
+u8 __init acpi_table_checksum(u8 *buffer, u32 length)
+{
+ u8 sum = 0;
+ u8 *end = buffer + length;
+
+ while (buffer < end)
+ sum = (u8) (sum + *(buffer++));
+ return sum;
+}
+
+/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+#define MAX_ACPI_SIGNATURE 35
+static const char *table_sigs[MAX_ACPI_SIGNATURE] = {
+ ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+ ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+ ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+ ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+ ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+ ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+ ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+ ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+ ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT };
+
+int __init acpi_initrd_table_override(void *start_addr, void *end_addr)
+{
+ int table_nr, sig;
+ unsigned long offset = 0, max_len = end_addr - start_addr;
+ char *p;
+
+ for (table_nr = 0; table_nr < ACPI_OVERRIDE_TABLES; table_nr++) {
+ struct acpi_table_header *table;
+ if (max_len < offset + sizeof(struct acpi_table_header)) {
+ WARN_ON(1);
+ return 0;
+ }
+ table = start_addr + offset;
+
+ for (sig = 0; sig < MAX_ACPI_SIGNATURE; sig++)
+ if (!memcmp(table->signature, table_sigs[sig], 4))
+ break;
+
+ if (sig >= MAX_ACPI_SIGNATURE)
+ break;
+
+ if (max_len < offset + table->length) {
+ WARN_ON(1);
+ return 0;
+ }
+
+ if (acpi_table_checksum(start_addr + offset, table->length)) {
+ WARN(1, "%4.4s has invalid checksum\n",
+ table->signature);
+ continue;
+ }
+ printk(KERN_INFO "%4.4s ACPI table found in initrd"
+ " - size: %d\n", table->signature, table->length);
+
+ offset += table->length;
+ acpi_table_override_offset[table_nr] = offset;
+ }
+ if (!offset)
+ return 0;
+
+ acpi_tables_inram =
+ memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
+ offset, PAGE_SIZE);
+ if (!acpi_tables_inram)
+ panic("Cannot find place for ACPI override tables\n");
+
+ /*
+ * Only calling e820_add_reserve does not work and the
+ * tables are invalid (memory got used) later.
+ * memblock_x86_reserve_range works as expected and the tables
+ * won't get modified. But it's not enough because ioremap will
+ * complain later (used by acpi_os_map_memory) that the pages
+ * that should get mapped are not marked "reserved".
+ * Both memblock_x86_reserve_range and e820_add_region works fine.
+ */
+ memblock_reserve(acpi_tables_inram, acpi_tables_inram + offset);
+ e820_add_region(acpi_tables_inram, offset, E820_ACPI);
+ update_e820();
+
+ p = early_ioremap(acpi_tables_inram, offset);
+ memcpy(p, start_addr, offset);
+ early_iounmap(p, offset);
+ return offset;
+}
+
+#endif
+
acpi_status
acpi_os_table_override(struct acpi_table_header * existing_table,
struct acpi_table_header ** new_table)
@@ -559,12 +661,54 @@ acpi_os_table_override(struct acpi_table_header * existing_table,

acpi_status
acpi_os_physical_table_override(struct acpi_table_header *existing_table,
- acpi_physical_address * new_address,
- u32 *new_table_length)
+ acpi_physical_address *address,
+ u32 *table_length)
{
- return AE_SUPPORT;
-}
+#ifndef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+ *table_length = 0;
+ *address = 0;
+ return AE_OK;
+#endif
+
+ int table_nr = 0;
+ *table_length = 0;
+ *address = 0;
+ for (; table_nr < ACPI_OVERRIDE_TABLES &&
+ acpi_table_override_offset[table_nr]; table_nr++) {
+ int table_offset;
+ int table_len;
+ struct acpi_table_header *table;
+
+ if (table_nr == 0)
+ table_offset = 0;
+ else
+ table_offset = acpi_table_override_offset[table_nr - 1];

+ table_len = acpi_table_override_offset[table_nr] - table_offset;
+
+ table = acpi_os_map_memory(acpi_tables_inram + table_offset,
+ table_len);
+
+ if (memcmp(existing_table->signature, table->signature, 4)) {
+ acpi_os_unmap_memory(table, table_len);
+ continue;
+ }
+
+ /* Only override tables with matching oem id */
+ if (memcmp(table->oem_table_id, existing_table->oem_table_id,
+ ACPI_OEM_TABLE_ID_SIZE)) {
+ acpi_os_unmap_memory(table, table_len);
+ continue;
+ }
+
+ acpi_os_unmap_memory(table, table_len);
+ *address = acpi_tables_inram + table_offset;
+ *table_length = table_len;
+ add_taint(TAINT_OVERRIDDEN_ACPI_TABLE);
+ break;
+ }
+ return AE_OK;
+}

static irqreturn_t acpi_irq(int irq, void *dev_id)
{
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 104eda7..12bda6a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -76,6 +76,10 @@ typedef int (*acpi_table_handler) (struct acpi_table_header *table);

typedef int (*acpi_table_entry_handler) (struct acpi_subtable_header *header, const unsigned long end);

+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+int __init acpi_initrd_table_override(void *start_addr, void *end_addr);
+#endif
+
char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
void __acpi_unmap_table(char *map, unsigned long size);
int early_acpi_boot_init(void);
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 55289d2..c7694d9 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -16,5 +16,8 @@ extern int initrd_below_start_ok;
/* free_initrd_mem always gets called with the next two as arguments.. */
extern unsigned long initrd_start, initrd_end;
extern void free_initrd_mem(unsigned long, unsigned long);
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+extern unsigned long acpi_initrd_offset;
+#endif

extern unsigned int real_root_dev;
--
1.7.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Thomas Renninger

unread,
Mar 23, 2012, 12:00:01 PM3/23/12
to
Hi,

On Friday, March 23, 2012 03:29:44 PM Thomas Renninger wrote:
> Details can be found in:
> Documentation/acpi/initrd_table_override.txt
...

I should have mentioned that this work depends on ACPICA changes
which have been applied in Len's ACPI next tree only/recently.
For example:
commit f7b004a17c9183f023796dea0d70284684ec000d
Author: Bob Moore <robert...@intel.com>
Date: Tue Feb 14 18:31:56 2012 +0800

ACPICA: Add acpi_os_physical_table_override interface

Thus this patch is intended to get merged through the ACPI
tree.

The patch is (in similar form, before some additional ACPICA
enhancements by Lin and Bob) in productive 12.1 openSUSE kernel.

Thanks,

Thomas

H. Peter Anvin

unread,
Mar 23, 2012, 4:10:02 PM3/23/12
to
On 03/23/2012 07:29 AM, Thomas Renninger wrote:
> Details can be found in:
> Documentation/acpi/initrd_table_override.txt

I did not see in this any discussion about how the data format of the
initrd/initramfs gets affected. There are some other things too
(microcode updates, for example) which also would like to get initramfs
data early, and maybe we need to think about how to containerize this
properly.

-hpa

Yinghai Lu

unread,
Mar 23, 2012, 5:00:02 PM3/23/12
to
great. that is very good feature for development and debug.

will not need to ask for testbios ...
table already get copied to allocated ram, So you don't need to keep
the portion in initrd range.
please don't use max_low_pfn_mapped. that will leave another blocker
to allocate kdump block with 512M.

could use max_low_pfn, and use io_remap to access it for copying purpose.



> +       if (!acpi_tables_inram)
> +               panic("Cannot find place for ACPI override tables\n");
> +
> +       /*
> +        * Only calling e820_add_reserve does not work and the
> +        * tables are invalid (memory got used) later.
> +        * memblock_x86_reserve_range works as expected and the tables
> +        * won't get modified. But it's not enough because ioremap will
> +        * complain later (used by acpi_os_map_memory) that the pages
> +        * that should get mapped are not marked "reserved".
> +        * Both memblock_x86_reserve_range and e820_add_region works fine.
> +        */
> +       memblock_reserve(acpi_tables_inram, acpi_tables_inram + offset);
> +       e820_add_region(acpi_tables_inram, offset, E820_ACPI);
> +       update_e820();

could use __weak function to avoid use e820 related calling here in osl.c
maybe could have dmi checking for more strict checking.

also would help if have one boot command that could skip overriding.
if you have
#else
static inline int __init acpi_initrd_table_override(void *start_addr,
void *end_addr) { return 0 }
#endif

here, you can avoid one #ifdef in .c

Thanks

Yinghai

Yinghai Lu

unread,
Mar 23, 2012, 8:20:01 PM3/23/12
to
On Fri, Mar 23, 2012 at 1:54 PM, Yinghai Lu <yin...@kernel.org> wrote:
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -391,6 +391,12 @@ void free_initrd_mem(unsigned long start, unsigned long end)
>>         *   - relocate_initrd()
>>         * So here We can do PAGE_ALIGN() safely to get partial page to be freed
>>         */
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +       if (acpi_initrd_offset)
>> +               free_init_pages("initrd memory", start - acpi_initrd_offset,
>> +                               PAGE_ALIGN(end));
>> +       else
>> +#endif
>
> table already get copied to allocated ram, So you don't need to keep
> the portion in initrd range.
>
>>        free_init_pages("initrd memory", start, PAGE_ALIGN(end));
>>  }

oh, i missed it. that is offset

change following would be more clean?

+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+ /* offset back to free acpi tables */
+ if (acpi_initrd_offset)
+ start -= acpi_initrd_offset,
+#endif

Thomas Renninger

unread,
Mar 23, 2012, 9:30:01 PM3/23/12
to
On Saturday 24 March 2012 02:05:18 Yinghai Lu wrote:
> On Fri, Mar 23, 2012 at 1:54 PM, Yinghai Lu <yin...@kernel.org> wrote:
> updated it a little bit to remove 2 #ifdef. please check attached.

Plus getting rid of the e820 stuff in drivers/acpi/osl.c
Thanks, nice cleanups.

Thomas

Thomas Renninger

unread,
Mar 23, 2012, 9:30:01 PM3/23/12
to
On Friday 23 March 2012 21:54:12 Yinghai Lu wrote:
> On Fri, Mar 23, 2012 at 7:29 AM, Thomas Renninger <tr...@suse.de> wrote:
...

> great. that is very good feature for development and debug.
Thanks.
...

> > + int table_nr = 0;
> > + *table_length = 0;
> > + *address = 0;
> > + for (; table_nr < ACPI_OVERRIDE_TABLES &&
> > + acpi_table_override_offset[table_nr]; table_nr++) {
> > + int table_offset;
> > + int table_len;
> > + struct acpi_table_header *table;
> > +
> > + if (table_nr == 0)
> > + table_offset = 0;
> > + else
> > + table_offset = acpi_table_override_offset[table_nr - 1];
> >
> > + table_len = acpi_table_override_offset[table_nr] - table_offset;
> > +
> > + table = acpi_os_map_memory(acpi_tables_inram + table_offset,
> > + table_len);
> > +
>
> maybe could have dmi checking for more strict checking.
I do not understand what should get checked?
Hm, I guess you mean if a general table is always added that is distributed
on different platforms and you want to white/blacklist machines to
explicitly load/not load the table?
This must not happen.
The tables are always platform specific and you provide tables for
a specific machine only for debugging purposes.

> also would help if have one boot command that could skip overriding.
Same. Both should not be needed.

Or you have a use-case in mind I cannot not think of...

Thomas

Thomas Renninger

unread,
Mar 23, 2012, 9:50:02 PM3/23/12
to
On Friday 23 March 2012 21:05:46 H. Peter Anvin wrote:
> On 03/23/2012 07:29 AM, Thomas Renninger wrote:
> > Details can be found in:
> > Documentation/acpi/initrd_table_override.txt
>
> I did not see in this any discussion about how the data format of the
> initrd/initramfs gets affected. There are some other things too
> (microcode updates, for example)
Interesting.

> which also would like to get initramfs
> data early, and maybe we need to think about how to containerize this
> properly.
Sounds as if this would get a bigger discussion...

I won't be able to come up with a detailed suggestion for such a general
initrd format change (that's what you suggest?).
I'd be interested to be put to CC and join the discussion, though.
This patch only slightly touches x86 initrd specifics:
(with Yinhai additions even 3 lines less):
arch/x86/kernel/setup.c | 15 ++
arch/x86/mm/init.c | 4
No general initrd code is touched at all, all the rest sits in
drivers/acpi/

If there is any initrd change this could easily be adopted.
Would be great to see this one pushed into 3.4 before a possibly long
taking discussion about bigger initrd layout changes.

Thanks,

Thomas

H. Peter Anvin

unread,
Mar 23, 2012, 10:10:02 PM3/23/12
to
On 03/23/2012 06:42 PM, Thomas Renninger wrote:
>
> If there is any initrd change this could easily be adopted.
> Would be great to see this one pushed into 3.4 before a possibly long
> taking discussion about bigger initrd layout changes.
>

This should have been in linux-next before the merge window started, and
certainly "pushing it upstream before a possibly long talking discussion
about bigger initrd layout changes" is *definitely* putting the cart
before the horse ... almost nothing matters as much as avoiding
introducing a new protocol that we need to keep stable.

Sorry, it doesn't work that way.

I'm not too happy about the idea of using "naked" ACPI headers as the
sole discriminator, because they lack a strong magic. Furthermore, I
really don't want to see all the potential early-initrd users invent
different schemes for encapsulation, which has the potential of
cross-infection (early microcode update, for example, would have to be
invoked before ACPI, and needing that code to understand ACPI table
format is a nonstarter.)

I see two realistic options:

1. We use cpio encapsulation for everything, with a special namespace
for items used directly by the kernel, e.g. "kernel/".

+ Simple, existing tools can pick apart
- May lead people to believe that the early-initrd portion can be
compressed like the "normal" initrd portion, leading to strange
problems.

2. We create a new simple header (just a magic number, an identifier
for the type of data, and a length) for each of the early-initrd
objects:

struct early_initrd_header {
u64 magic; /* 0xa5a46ce422d8f5a1 */
u32 type; /* 1 = file data, 2 = ACPI, 3 = microcode... */
u32 length; /* Length of data object */
};

XXX: Should we make this a defined endianness (presumably
bigendian), or use host-endianness? I would guess the former might
be better...

Either of these allow one piece of code to quickly bypass bits that
doesn't belong to it.

Thoughts?

-hpa

Thomas Renninger

unread,
Mar 23, 2012, 11:10:01 PM3/23/12
to
On Saturday 24 March 2012 03:01:28 H. Peter Anvin wrote:
> On 03/23/2012 06:42 PM, Thomas Renninger wrote:
> >
> > If there is any initrd change this could easily be adopted.
> > Would be great to see this one pushed into 3.4 before a possibly long
> > taking discussion about bigger initrd layout changes.
> >
>
> This should have been in linux-next before the merge window started, and
> certainly "pushing it upstream before a possibly long talking discussion
> about bigger initrd layout changes" is *definitely* putting the cart
> before the horse ... almost nothing matters as much as avoiding
> introducing a new protocol that we need to keep stable.
>
> Sorry, it doesn't work that way.
I guess this would not be the first time a good feature has been added,
knowing that a more general API will be build around it later.
10 lines are really easy to adopt.

> I'm not too happy about the idea of using "naked" ACPI headers as the
> sole discriminator, because they lack a strong magic. Furthermore, I
> really don't want to see all the potential early-initrd users invent
> different schemes for encapsulation, which has the potential of
> cross-infection (early microcode update, for example, would have to be
> invoked before ACPI, and needing that code to understand ACPI table
> format is a nonstarter.)
>
> I see two realistic options:
>
> 1. We use cpio encapsulation for everything, with a special namespace
> for items used directly by the kernel, e.g. "kernel/".
>
> + Simple, existing tools can pick apart
> - May lead people to believe that the early-initrd portion can be
> compressed like the "normal" initrd portion, leading to strange
> problems.
Can this be implemented without using dynamic memory allocations?
If not, it's not suitable for early APCI overriding.

>
> 2. We create a new simple header (just a magic number, an identifier
> for the type of data, and a length) for each of the early-initrd
> objects:
>
> struct early_initrd_header {
> u64 magic; /* 0xa5a46ce422d8f5a1 */
> u32 type; /* 1 = file data, 2 = ACPI, 3 = microcode... */
> u32 length; /* Length of data object */
> };
>
> XXX: Should we make this a defined endianness (presumably
> bigendian), or use host-endianness? I would guess the former might
> be better...
>
> Either of these allow one piece of code to quickly bypass bits that
> doesn't belong to it.
>
> Thoughts?
Whatabout Multiboot(2):
http://en.wikipedia.org/wiki/Multiboot_Specification
http://www.gnu.org/software/grub/manual/multiboot/multiboot.html

They do it similar to 2, but without specifying the data of the
files like you do with type.

It's a bigger thing, but I could imagine there are some guys who would
be willing to give it a try. I remember I googled a multiboot patch
which got submitted to lkml long time ago.

It has it pros and cons, but should show up in such a discussion.

Thomas

H. Peter Anvin

unread,
Mar 24, 2012, 12:50:02 AM3/24/12
to
On 03/23/2012 08:02 PM, Thomas Renninger wrote:
> Whatabout Multiboot(2):
> http://en.wikipedia.org/wiki/Multiboot_Specification
> http://www.gnu.org/software/grub/manual/multiboot/multiboot.html
>
> They do it similar to 2, but without specifying the data of the
> files like you do with type.
>
> It's a bigger thing, but I could imagine there are some guys who would
> be willing to give it a try. I remember I googled a multiboot patch
> which got submitted to lkml long time ago.
>
> It has it pros and cons, but should show up in such a discussion.
>

Unfortunately Multiboot is defective in way too many ways, beginning
with its insistence of entering past the firmware point of no return,
which means the bootloader has to do a lot of work (this has been proven
to be a nonworkable solution) and no support for relocation.

So that is not an option.

-hpa

H. Peter Anvin

unread,
Mar 24, 2012, 12:50:02 AM3/24/12
to
On 03/23/2012 08:02 PM, Thomas Renninger wrote:
>>
>> Sorry, it doesn't work that way.
> I guess this would not be the first time a good feature has been added,
> knowing that a more general API will be build around it later.
> 10 lines are really easy to adopt.
>

No, what you want is to add a new interface, which is going to have to
be maintained indefinitely. That's the problem.

>> I see two realistic options:
>>
>> 1. We use cpio encapsulation for everything, with a special namespace
>> for items used directly by the kernel, e.g. "kernel/".
>>
>> + Simple, existing tools can pick apart
>> - May lead people to believe that the early-initrd portion can be
>> compressed like the "normal" initrd portion, leading to strange
>> problems.
> Can this be implemented without using dynamic memory allocations?
> If not, it's not suitable for early APCI overriding.

Yes it can, as long as it is uncompressed. It just becomes a matter of
walking the cpio headers which is effectively a linked list.

It is more code than the simple header option, though.

By the way, if "relying on the bootloader" was an option in any way then
we would already have a solution in the form of the kernel data linked
list. Unfortunately to the best of my knowledge not a single bootloader
provides support for it.

-hpa

Yinghai Lu

unread,
Mar 24, 2012, 12:50:02 AM3/24/12
to
On Fri, Mar 23, 2012 at 6:26 PM, Thomas Renninger <tr...@suse.de> wrote:
>> maybe could have dmi checking for more strict checking.
> I do not understand what should get checked?
> Hm, I guess you mean if a general table is always added that is distributed
> on different platforms and you want to white/blacklist machines to
> explicitly load/not load the table?
> This must not happen.
> The tables are always platform specific and you provide tables for
> a specific machine only for debugging purposes.
>
>> also would help if have one boot command that could skip overriding.
> Same. Both should not be needed.
>
> Or you have a use-case in mind I cannot not think of...

For known broken system, could be old, or vendor does not want to
provide updated firmware anymore.
if the user could boot system with debug parameter
like "noapic maxcpus=1...."
then have user space util to extract acpi tables, and patch them and
repack them into initrd.
next boot will have system running with optimal ...

If we could do that, we could avoid or kill workaround and quirks in the kernel.

So we may need some strict check or skip when you are moving disk
around or update different bios later.

Thanks

Yinghai

H. Peter Anvin

unread,
Mar 24, 2012, 1:00:02 AM3/24/12
to
On 03/23/2012 09:50 PM, Yinghai Lu wrote:
> On Fri, Mar 23, 2012 at 9:43 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>> Can this be implemented without using dynamic memory allocations?
>>> If not, it's not suitable for early APCI overriding.
>>
>> Yes it can, as long as it is uncompressed. It just becomes a matter of
>> walking the cpio headers which is effectively a linked list.
>
> During checking in reserve_inirtd() (BTW, the name of the function is outdated).
> memblock is ready.
> So dynamical memory allocation is there already.
>
> also could put that headers in the tail, so old intird utilities could
> work as before.
>

We could; I'm not sure if that would be an improvement or not.

However, for the microcode update stuff this needs to happen long before
memblock exists, so it's not really an option. However, it's not a
problem, as the whole initrd blob needs to be reserved for that duration.

Yinghai Lu

unread,
Mar 24, 2012, 1:00:02 AM3/24/12
to
On Fri, Mar 23, 2012 at 9:43 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> Can this be implemented without using dynamic memory allocations?
>> If not, it's not suitable for early APCI overriding.
>
> Yes it can, as long as it is uncompressed.  It just becomes a matter of
> walking the cpio headers which is effectively a linked list.

During checking in reserve_inirtd() (BTW, the name of the function is outdated).
memblock is ready.
So dynamical memory allocation is there already.

also could put that headers in the tail, so old intird utilities could
work as before.

Yinghai

Borislav Petkov

unread,
Mar 24, 2012, 5:30:01 AM3/24/12
to
On Fri, Mar 23, 2012 at 09:43:09PM -0700, H. Peter Anvin wrote:
> By the way, if "relying on the bootloader" was an option in any way then
> we would already have a solution in the form of the kernel data linked
> list. Unfortunately to the best of my knowledge not a single bootloader
> provides support for it.

... and that's unfortunate because if we had that support, we wouldn't
need to redo the initrd each time microcode or BIOS tables change but
simply point the boot loader to the updated images.

:-(

Btw, hpa, didn't you have someone working on early microcode loading,
any results there?

Thanks.

--
Regards/Gruss,
Boris.

Konrad Rzeszutek Wilk

unread,
Mar 24, 2012, 2:50:02 PM3/24/12
to
> 2. We create a new simple header (just a magic number, an identifier
> for the type of data, and a length) for each of the early-initrd
> objects:
>
> struct early_initrd_header {
> u64 magic; /* 0xa5a46ce422d8f5a1 */

Probably should also have:
u32 version;

in case we decide to expand this structure in the future, and:
> u32 type; /* 1 = file data, 2 = ACPI, 3 = microcode... */
> u32 length; /* Length of data object */
> };

and encapsulate the whole thing in a 4K union?

>
> XXX: Should we make this a defined endianness (presumably
> bigendian), or use host-endianness? I would guess the former might
> be better...

Perhaps the header should be in big-endian (that is the same as the network
byte order, right?) and each sub-type can define its own endian?

The sub-types could be:
struct microcode_type {
u64 magic;
u32 version;
#define BIG_ENDIAN 1<<1
#define LITTLE_ENDIAN 1<<2
u32 flags;
}

and that could be attached to the end of the 'early_initrd_header' ?

>
> Either of these allow one piece of code to quickly bypass bits that
> doesn't belong to it.
>
> Thoughts?
>
> -hpa
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in

H. Peter Anvin

unread,
Mar 24, 2012, 3:00:02 PM3/24/12
to
[Adding Harald Hoyer, maintainer of dracut. Harald, we're discussing
adding components to the initrd blob which is used during early boot.
We have at least two such users identified: microcode updates and ACPI
override. The two proposals on the table is either to stick with cpio
format and using a special namespace (e.g. kernel/) or using a new
header for these early objects that would be skipped by the initramfs
decoder.]

On 03/24/2012 02:24 AM, Borislav Petkov wrote:
> On Fri, Mar 23, 2012 at 09:43:09PM -0700, H. Peter Anvin wrote:
>> By the way, if "relying on the bootloader" was an option in any way then
>> we would already have a solution in the form of the kernel data linked
>> list. Unfortunately to the best of my knowledge not a single bootloader
>> provides support for it.
>
> ... and that's unfortunate because if we had that support, we wouldn't
> need to redo the initrd each time microcode or BIOS tables change but
> simply point the boot loader to the updated images.
>
> :-(
>
> Btw, hpa, didn't you have someone working on early microcode loading,
> any results there?
>

Yes... unfortunately between the kernel.org disaster and having a baby I
haven't had enough to time to get that work out of the freezer, which is
unfortunate in the extreme.

On the other hand, perhaps it is a good thing we didn't end up settling
on a protocol which was too narrow purpose for that one, too.

This is my current thinking, and please comment on this so we don't
spend a bunch of time bikeshedding this one... I'd like to come up with
something that we can implement and move on.

I have to say I'm personally leaning in the direction of just using a
special namespace and still encode things as cpio. However, in order
for that to work we need to make sure that the invariant "early kernel
objects come before any compressed blob" is maintained at all times,
which might be sensitive. However, it has serious advantages both from
a bootloader UI point of view (names are ASCII strings) and from a tools
point of view (it's just cpio, still.)

The main downside is that the cpio header is mostly in ASCII encoded
hexadecimal, which adds to the amount of code that needs to be ported
into "special" environments... not a problem for the ACPI converter but
for the early microcode it matters.

I might try to throw a minimal walker together and see how much code it
is, unless someone beats me to it ;)

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

H. Peter Anvin

unread,
Mar 24, 2012, 3:20:02 PM3/24/12
to
On 03/24/2012 11:42 AM, Konrad Rzeszutek Wilk wrote:
>
> Probably should also have:
> u32 version;
>
> in case we decide to expand this structure in the future, and:
>> u32 type; /* 1 = file data, 2 = ACPI, 3 = microcode... */
>> u32 length; /* Length of data object */
>> };
>
> and encapsulate the whole thing in a 4K union?
>

For "version" you'd have to define what happens if you see a version
number you don't recognize, and why that is in any way better than
changing the magic number or the type. It is something that people like
to throw in without thinking about it, and that is a mistake.

Forcing everything to be page-aligned may be a good idea, although that
assumes all bootloaders actually align them that way...

>
> Perhaps the header should be in big-endian (that is the same as the network
> byte order, right?) and each sub-type can define its own endian?
>

The content of the encapsulation is its own thing; it will be different
for different types as most of them already

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Konrad Rzeszutek Wilk

unread,
Mar 24, 2012, 3:30:02 PM3/24/12
to
On Sat, Mar 24, 2012 at 12:15:59PM -0700, H. Peter Anvin wrote:
> On 03/24/2012 11:42 AM, Konrad Rzeszutek Wilk wrote:
> >
> > Probably should also have:
> > u32 version;
> >
> > in case we decide to expand this structure in the future, and:
> >> u32 type; /* 1 = file data, 2 = ACPI, 3 = microcode... */
> >> u32 length; /* Length of data object */
> >> };
> >
> > and encapsulate the whole thing in a 4K union?
> >
>
> For "version" you'd have to define what happens if you see a version
> number you don't recognize, and why that is in any way better than
> changing the magic number or the type. It is something that people like
> to throw in without thinking about it, and that is a mistake.

I was thinking of interface version. So the first would be 1, and if
there are extensions (so version 2 for example), it should support 1 and 2.
The idea is to expand past the structure with newer additions without
breaking the binary interface.

>
> Forcing everything to be page-aligned may be a good idea, although that
> assumes all bootloaders actually align them that way...

Oh, and be 4K.

H. Peter Anvin

unread,
Mar 24, 2012, 3:50:02 PM3/24/12
to
You're missing the point. Unless you have specific rules how to deal.with extensions, and what thise extensions may do, you're not adding anything.

I think the simpler just use cpio idea still wins out in my mind.
Sent from my mobile phone. Please excuse my brevity and lack of formatting.

H. Peter Anvin

unread,
Mar 24, 2012, 6:30:02 PM3/24/12
to
The attached cpio-parsing code compiles to 458 bytes on x86-64 and 476
bytes on i386, and that is without any library dependencies at all.
Again, it will completely stop at the first compressed data item, so any
such kernel objects absolutely will have to be first. In good Linux
tradition, it is also completely untested.

However, given that very reasonable size I would think that this is a
reasonable approach. Anyone who has a better suggestion for the
namespace than "kernel/"?
findcpio.c

H. Peter Anvin

unread,
Mar 24, 2012, 6:50:02 PM3/24/12
to
On 03/24/2012 03:21 PM, H. Peter Anvin wrote:
> The attached cpio-parsing code compiles to 458 bytes on x86-64 and 476
> bytes on i386, and that is without any library dependencies at all.
> Again, it will completely stop at the first compressed data item, so any
> such kernel objects absolutely will have to be first. In good Linux
> tradition, it is also completely untested.
>
> However, given that very reasonable size I would think that this is a
> reasonable approach. Anyone who has a better suggestion for the
> namespace than "kernel/"?
>

Slightly improved version with actually working memcmp()...

-hpa

findcpio.c

H. Peter Anvin

unread,
Mar 25, 2012, 12:20:01 AM3/25/12
to
On 03/24/2012 03:21 PM, H. Peter Anvin wrote:
> The attached cpio-parsing code compiles to 458 bytes on x86-64 and 476
> bytes on i386, and that is without any library dependencies at all.
> Again, it will completely stop at the first compressed data item, so any
> such kernel objects absolutely will have to be first. In good Linux
> tradition, it is also completely untested.
>
> However, given that very reasonable size I would think that this is a
> reasonable approach. Anyone who has a better suggestion for the
> namespace than "kernel/"?
>

The more I think about it the more I really think this is the right
approach. For microcode, this means we don't have to worry about
creating a super-container for the various microcode formats; we can
simply have:

kernel/x86/microcode/GenuineIntel
kernel/x86/microcode/AuthenticAMD

... which solves the problem neatly.

It does beg the question if you want to be able to have multiple:

kernel/acpi/...

... files, to make managing different tables easier, or is that overkill?

-hpa

Borislav Petkov

unread,
Mar 25, 2012, 5:00:02 AM3/25/12
to
On Sat, Mar 24, 2012 at 11:49:38AM -0700, H. Peter Anvin wrote:
> Yes... unfortunately between the kernel.org disaster and having a baby I

Congrats on the second one!

> haven't had enough to time to get that work out of the freezer, which is
> unfortunate in the extreme.
>
> On the other hand, perhaps it is a good thing we didn't end up settling
> on a protocol which was too narrow purpose for that one, too.
>
> This is my current thinking, and please comment on this so we don't
> spend a bunch of time bikeshedding this one... I'd like to come up with
> something that we can implement and move on.
>
> I have to say I'm personally leaning in the direction of just using a
> special namespace and still encode things as cpio. However, in order
> for that to work we need to make sure that the invariant "early kernel
> objects come before any compressed blob" is maintained at all times,

That actually makes sense for ucode because we want it as early as
possible.

> which might be sensitive. However, it has serious advantages both from
> a bootloader UI point of view (names are ASCII strings) and from a tools
> point of view (it's just cpio, still.)
>
> The main downside is that the cpio header is mostly in ASCII encoded
> hexadecimal, which adds to the amount of code that needs to be ported
> into "special" environments... not a problem for the ACPI converter but
> for the early microcode it matters.

One other downside for this approach would be the need to have
initrd/initramfs support always compiled into the kernel in order to do
all those things but, hey, distro kernels already have that so we're
probably stuck with it anyway.

And, if someone wants to add the support to the bootloader later, he
can still do that by handing in cpio archives to the kernel for further
processing...

--
Regards/Gruss,
Boris.

Borislav Petkov

unread,
Mar 25, 2012, 5:10:02 AM3/25/12
to
On Sat, Mar 24, 2012 at 09:17:21PM -0700, H. Peter Anvin wrote:
> The more I think about it the more I really think this is the right
> approach. For microcode, this means we don't have to worry about
> creating a super-container for the various microcode formats; we can
> simply have:
>
> kernel/x86/microcode/GenuineIntel
> kernel/x86/microcode/AuthenticAMD
>
> ... which solves the problem neatly.

Yes, this is actually pretty neat. Basically, distro vendors will get
ucode images from the resp. hw vendors, put them in the proper hierarchy
as above and generate the cpio archive, easy.

> It does beg the question if you want to be able to have multiple:
>
> kernel/acpi/...
>
> ... files, to make managing different tables easier, or is that overkill?

Well, like the above, different sub-namespaces...

--
Regards/Gruss,
Boris.

Borislav Petkov

unread,
Mar 25, 2012, 5:30:01 AM3/25/12
to
[..]

> /*
> * findcpio.c
> *
> * Find a specific cpio member; must precede any compressed content.

Looks simple enough. Btw, there's some cpio handling already done in
<init/initramfs.c> - probably reuse some of the code there...?

--
Regards/Gruss,
Boris.

H. Peter Anvin

unread,
Mar 25, 2012, 7:40:01 PM3/25/12
to
On 03/25/2012 02:25 AM, Borislav Petkov wrote:
>
>> /*
>> * findcpio.c
>> *
>> * Find a specific cpio member; must precede any compressed content.
>
> Looks simple enough. Btw, there's some cpio handling already done in
> <init/initramfs.c> - probably reuse some of the code there...?
>

Quite possible, but then you will have to untangle the dependencies.
Keep in mind that it needs to be able to compile this in such a way that
you have no library dependencies and no fixed address references (so
that it can be invoked from the pre-paging code to get microcode updates
in as early as at all possible.) So code reuse is a bit tricky at least
for that specific use case. The ACPI code is less sensitive since it
comes a lot later.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Thomas Renninger

unread,
Mar 25, 2012, 8:50:01 PM3/25/12
to
On Saturday 24 March 2012 05:43:09 H. Peter Anvin wrote:
> On 03/23/2012 08:02 PM, Thomas Renninger wrote:
> >>
> >> Sorry, it doesn't work that way.
> > I guess this would not be the first time a good feature has been added,
> > knowing that a more general API will be build around it later.
> > 10 lines are really easy to adopt.
> >
>
> No, what you want is to add a new interface, which is going to have to
> be maintained indefinitely. That's the problem.
It's a debug option...
Best would be if no distro specific mkinitrd magic is needed and it's
just as easy as it is:
cp DSDT.aml /boot/initrd-test
cat /boot/initrd >>/boot/initrd-test
and add a test boot entry to grub's menu.lst or whereever.
Then developers would not have to look at distro specific implementations
(which should not exist) about how to test a table quickly.

> >> I see two realistic options:
> >>
> >> 1. We use cpio encapsulation for everything, with a special namespace
> >> for items used directly by the kernel, e.g. "kernel/".
> >>
> >> + Simple, existing tools can pick apart
> >> - May lead people to believe that the early-initrd portion can be
> >> compressed like the "normal" initrd portion, leading to strange
> >> problems.
> > Can this be implemented without using dynamic memory allocations?
> > If not, it's not suitable for early APCI overriding.
>
> Yes it can, as long as it is uncompressed. It just becomes a matter of
> walking the cpio headers which is effectively a linked list.
>
> It is more code than the simple header option, though.
>
> By the way, if "relying on the bootloader" was an option in any way

Why exactly is a change in the bootloader not an option?
Not sure whether a version number is passed, but the magic number could be
changed for now.

> then
> we would already have a solution in the form of the kernel data linked
> list. Unfortunately to the best of my knowledge not a single bootloader
> provides support for it.
If the new magic number is passed, we get a linked list.
Otherwise we get it the old style and can still support that for several
years.
I agree that this should not be subject to change every here and there, but
"never ever change..." sounds as if something went wrong.
This is just an interface to the kernel, some apps, called bootloaders are
making use of?

It's not that I want to propagate a bigger change, I am all for the quick change
and proceed solution, I just wonder why above is not possible.

Thomas

Thomas Renninger

unread,
Mar 25, 2012, 8:50:01 PM3/25/12
to
On Saturday 24 March 2012 05:41:37 Yinghai Lu wrote:
> On Fri, Mar 23, 2012 at 6:26 PM, Thomas Renninger <tr...@suse.de> wrote:
> >> maybe could have dmi checking for more strict checking.
> > I do not understand what should get checked?
> > Hm, I guess you mean if a general table is always added that is distributed
> > on different platforms and you want to white/blacklist machines to
> > explicitly load/not load the table?
> > This must not happen.
> > The tables are always platform specific and you provide tables for
> > a specific machine only for debugging purposes.
> >
> >> also would help if have one boot command that could skip overriding.
> > Same. Both should not be needed.
> >
> > Or you have a use-case in mind I cannot not think of...
>
> For known broken system, could be old, or vendor does not want to
> provide updated firmware anymore.
> if the user could boot system with debug parameter
> like "noapic maxcpus=1...."
> then have user space util to extract acpi tables, and patch them and
> repack them into initrd.
> next boot will have system running with optimal ...
>
> If we could do that, we could avoid or kill workaround and quirks in the kernel.
>
> So we may need some strict check or skip when you are moving disk
> around or update different bios later.
NO!

This is exactly for what you've written first:
---
that is very good feature for development and debug.
will not need to ask for testbios ...
---

Linux ACPI policy is to support BIOSes which work with latest Windows version
supported systems. If needed, quirks are added.
I agree that some very old quirks can vanish, I don't have a nice example at hand,
there should not be much of them.
When DSDT overriding via initrd was not possible any more because it has been done
too early, I saw about 3 bugs of poor souls with old machines who really needed this
to get the system half way running. For such old systems this may be an option and
they do not need to compile a modified DSDT into their kernel anymore.


If this was not clear by tainting the kernel and adding this to Documentation/ :
+Please keep in mind that this is a debug option.
+ACPI tables should not get overridden for productive use.
+If BIOS ACPI tables are overridden the kernel will get tainted with the
+TAINT_OVERRIDDEN_ACPI_TABLE flag.
+Complain to your platform/BIOS vendor if you find a bug which is that sever
+that a workaround is not accepted in the Linus kernel.

a message like "Overriding an ACPI table, don't do that unless you know exactly
what you are doing" or the Kconfig option can be moved from the ACPI to the
"Kernel hacking" section to make this even more obvious.

Len has a bunch of arguments why fixing things by overriding ACPI tables is
the wrong approach.

This still is a very convenient feature:
- to find bugs in BIOS ACPI tables or in the kernel.
- for research -> in the end ACPI is an essential part of the X86
architecture and e.g. by clustering the DSDT with debug statements it's
now easy to get an idea how bits are shifted around and where they
are coming from.
- for BIOS developers and platform vendors to easily test the impact of their
changes on the Linux kernel. Also to let them use the Intel ACPI aka ACPICA
tools which is the code base syned into (used in) the kernel.

Thomas

H. Peter Anvin

unread,
Mar 25, 2012, 9:30:02 PM3/25/12
to
On 03/25/2012 05:45 PM, Thomas Renninger wrote:
> Best would be if no distro specific mkinitrd magic is needed and it's
> just as easy as it is:
> cp DSDT.aml /boot/initrd-test
> cat /boot/initrd >>/boot/initrd-test
> and add a test boot entry to grub's menu.lst or whereever.
> Then developers would not have to look at distro specific implementations
> (which should not exist) about how to test a table quickly.

There is no distro-specific magic needed. What I'm proposing is
basically what you have above, except that your DSDT.aml would be
wrapped in a cpio header. What I would like to ask from you is if it
makes sense to have kernel/acpi/DSDT, kernel/acpi/SSDT and so on, or
just make it a single kernel/acpi member.

By wrapping in a cpio container it becomes a generic mechanism.

>> By the way, if "relying on the bootloader" was an option in any way
>
> Why exactly is a change in the bootloader not an option?
> Not sure whether a version number is passed, but the magic number could be
> changed for now.

There are a lot of bootloaders, and one of the most commonly used ones
has a very adversarial relationship with the kernel maintainers.

> If the new magic number is passed, we get a linked list.

The linked list stuff is already supported. This interface has been
supported in the kernel since 2007.

-hpa

H. Peter Anvin

unread,
Mar 25, 2012, 9:40:02 PM3/25/12
to
On 03/25/2012 01:54 AM, Borislav Petkov wrote:
>
> One other downside for this approach would be the need to have
> initrd/initramfs support always compiled into the kernel in order to do
> all those things but, hey, distro kernels already have that so we're
> probably stuck with it anyway.
>

In theory we could make the early stuff independent of the initramfs
code in the kernel -- just like we always build in an initramfs.

Not that it matters much, at this point the no-initramfs configurations
are mostly theoretical.

I was able to shave a bit more off the code... optimizing for size can
be fun sometimes. The code is now 442 bytes on x86-64 and 413 on i386
when compiled with -O2 -fomit-frame-pointer; with -Os
-fomit-frame-pointer it is 372/410... all of that without any library
calls or relocations of any kind (which means it should be safe to call
from the prepaging environment.)

-hpa
findcpio.c

Thomas Renninger

unread,
Mar 26, 2012, 10:20:01 AM3/26/12
to
On Monday, March 26, 2012 03:25:17 AM H. Peter Anvin wrote:
> On 03/25/2012 05:45 PM, Thomas Renninger wrote:
> > Best would be if no distro specific mkinitrd magic is needed and it's
> > just as easy as it is:
> > cp DSDT.aml /boot/initrd-test
> > cat /boot/initrd >>/boot/initrd-test
> > and add a test boot entry to grub's menu.lst or whereever.
> > Then developers would not have to look at distro specific implementations
> > (which should not exist) about how to test a table quickly.
>
> There is no distro-specific magic needed. What I'm proposing is
> basically what you have above, except that your DSDT.aml would be
> wrapped in a cpio header. What I would like to ask from you is if it
> makes sense to have kernel/acpi/DSDT, kernel/acpi/SSDT and so on, or
> just make it a single kernel/acpi member.
Do the names have to be fixed?
Then it would not make much sense. Possibly placeholder file names like:
table1.aml...table10.aml
could be defined.
Especially the possibility of several SSDTs makes a naming convention
rather ugly: SSDT1.aml, SSDT2.aml...
If there is a directory which (only) contains all ACPI table files that can
be scanned and the file names do not matter, it's rather nice to set it
up in userspace. I guess I have to do 2 iterations then:
- Go through all files in this dir, validate each as a valid ACPI table
and sum up the size of all these files -> Reserve "size of all files".
- Then do a 2nd iteration and copy all these files into the reserved
memory area.

If there is only one file where all ACPI table binaries are glued
together kernel/acpi_tables.binary.blob
my code should already work as good as unmodified.

Not sure what is best and whether anybody cares about how the tables
are layed out.

I guess I just wait until your functionality is in linux-next and
give it a closer look then and may come up with a new patch.

> By wrapping in a cpio container it becomes a generic mechanism.
>
> >> By the way, if "relying on the bootloader" was an option in any way
> >
> > Why exactly is a change in the bootloader not an option?
> > Not sure whether a version number is passed, but the magic number could be
> > changed for now.
>
> There are a lot of bootloaders, and one of the most commonly used ones
> has a very adversarial relationship with the kernel maintainers.
>
> > If the new magic number is passed, we get a linked list.
>
> The linked list stuff is already supported. This interface has been
> supported in the kernel since 2007.
Ok, got it. Thanks.

Thomas

H. Peter Anvin

unread,
Mar 26, 2012, 10:50:03 AM3/26/12
to
On 03/26/2012 07:19 AM, Thomas Renninger wrote:
> Do the names have to be fixed?

No, I assume it would "all files inside this namespace".

> If there is only one file where all ACPI table binaries are glued
> together kernel/acpi_tables.binary.blob
> my code should already work as good as unmodified.
>
> Not sure what is best and whether anybody cares about how the tables
> are layed out.
>
> I guess I just wait until your functionality is in linux-next and
> give it a closer look then and may come up with a new patch.

I doubt it will get into linux-next without a user. If you have time to
pull in the code I sent it will be quicker, otherwise I'll probably
setup a git tree upon which things can be put.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Thomas Renninger

unread,
Mar 26, 2012, 11:00:01 AM3/26/12
to
On Monday, March 26, 2012 04:46:02 PM H. Peter Anvin wrote:
> On 03/26/2012 07:19 AM, Thomas Renninger wrote:
> > Do the names have to be fixed?
>
> No, I assume it would "all files inside this namespace".
>
> > If there is only one file where all ACPI table binaries are glued
> > together kernel/acpi_tables.binary.blob
> > my code should already work as good as unmodified.
> >
> > Not sure what is best and whether anybody cares about how the tables
> > are layed out.
> >
> > I guess I just wait until your functionality is in linux-next and
> > give it a closer look then and may come up with a new patch.
>
> I doubt it will get into linux-next without a user. If you have time to
> pull in the code I sent it will be quicker, otherwise I'll probably
> setup a git tree upon which things can be put.

I need some weeks to have time to revisit this.
But as it slips this merge window anyway it should not matter.
Please add me to CC if someone else looks at or goes on with this.

Thomas

H. Peter Anvin

unread,
Mar 27, 2012, 12:20:01 AM3/27/12
to
For what it's worth, I'm starting a git tree for this work at:

git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-earlyinitramfs.git

-hpa

Peter Stuge

unread,
Mar 27, 2012, 1:00:01 AM3/27/12
to
H. Peter Anvin wrote:
> There are a lot of bootloaders, and one of the most commonly used ones
> has a very adversarial relationship with the kernel maintainers.

A couple of less commonly used ones, by or near the coreboot project,
haven't exactly been embraced by kernel maintainers. Dunno if it's
because coreboot doesn't much like ACPI, and thinks that it would
be interesting to explore and innovate firmware on x86.


//Peter

H. Peter Anvin

unread,
Mar 27, 2012, 2:20:02 AM3/27/12
to
On 03/26/2012 09:46 PM, Peter Stuge wrote:
> H. Peter Anvin wrote:
>> There are a lot of bootloaders, and one of the most commonly used ones
>> has a very adversarial relationship with the kernel maintainers.
>
> A couple of less commonly used ones, by or near the coreboot project,
> haven't exactly been embraced by kernel maintainers. Dunno if it's
> because coreboot doesn't much like ACPI, and thinks that it would
> be interesting to explore and innovate firmware on x86.
>
> //Peter

I think it is more because coreboot has tried to do things in
nonstandard ways rather than fill in the gaps they have. This is
getting a *lot* better, as far as I can tell, but in a sane world there
shouldn't be any need for "bootloaders by or near the coreboot project"
since any standard x86 bootloader should Just Work[TM].

I like to draw parallels with how we dealt with holes in applications in
the early days of Linux. For example, when Linux wasn't providing the
interfaces that X needed, we didn't add a bunch of Linux-specific code
to X, *we added the standard interfaces to Linux*.

-hpa

Konrad Rzeszutek Wilk

unread,
Mar 27, 2012, 10:50:02 AM3/27/12
to
On Sun, Mar 25, 2012 at 06:36:56PM -0700, H. Peter Anvin wrote:
> On 03/25/2012 01:54 AM, Borislav Petkov wrote:
> >
> > One other downside for this approach would be the need to have
> > initrd/initramfs support always compiled into the kernel in order to do
> > all those things but, hey, distro kernels already have that so we're
> > probably stuck with it anyway.
> >
>
> In theory we could make the early stuff independent of the initramfs
> code in the kernel -- just like we always build in an initramfs.


Were would this payload resize? Would it be tacked on the initramfs.gz
or perhaps to the vmlinuz? Or would it be a seperate payload (so multiboot
type)? Or would this be supported by all of those mechanism?

>
> Not that it matters much, at this point the no-initramfs configurations
> are mostly theoretical.
>
> I was able to shave a bit more off the code... optimizing for size can
> be fun sometimes. The code is now 442 bytes on x86-64 and 413 on i386
> when compiled with -O2 -fomit-frame-pointer; with -Os
> -fomit-frame-pointer it is 372/410... all of that without any library
> calls or relocations of any kind (which means it should be safe to call
> from the prepaging environment.)
>
> -hpa

> /*
> * findcpio.c
> *
> * Find a specific cpio member; must precede any compressed content.
> */
>
> #include <stddef.h>
> #include <stdbool.h>
>
> struct cpio_data {
> void *data;
> unsigned long size;
> };
>
> enum cpio_fields {
> C_MAGIC,
> C_INO,
> C_MODE,
> C_UID,
> C_GID,
> C_NLINK,
> C_MTIME,
> C_FILESIZE,
> C_MAJ,
> C_MIN,
> C_RMAJ,
> C_RMIN,
> C_NAMESIZE,
> C_CHKSUM,
> C_NFIELDS
> };
>
> #if defined(__i386__) || defined(__x86_64__)
> static size_t strlen(const char *name)
> {
> size_t n = -1;
>
> asm("repne; scasb"
> : "+D" (name), "+c" (n)
> : "a" (0));
>
> return -2 - n;
> }
>
> static int memcmp(const void *p1, const void *p2, size_t n)
> {
> unsigned char rv;
>
> asm("repe; cmpsb; setne %0"
> : "=r" (rv), "+S" (p1), "+D" (p2), "+c" (n));
>
> return rv;
> }
> #else
> static size_t strlen(const char *name)
> {
> size_t n = 0;
>
> while (*name++)
> n++;
>
> return n;
> }
>
> static int memcmp(const void *p1, const void *p2, size_t n)
> {
> const unsigned char *u1 = p1;
> const unsigned char *u2 = p2;
> int d;
>
> while (n--) {
> d = *u2++ - *u1++;
> if (d)
> return d;
> }
> return 0;
> }
> #endif
>
> #define ALIGN4(p) ((void *)(((size_t)p + 3) & ~3))
>
> struct cpio_data find_cpio_data(const char *name, const void *data, size_t len)
> {
> const size_t cpio_header_len = 8*C_NFIELDS - 2;
> struct cpio_data cd = { NULL, 0 };
> const char *p, *dptr, *nptr;
> unsigned int ch[C_NFIELDS], *chp, v;
> unsigned char c, x;
> size_t mynamesize = strlen(name) + 1;
> int i, j;
>
> p = data;
>
> while (len > cpio_header_len) {
> if (!*p) {
> /* All cpio headers need to be 4-byte aligned */
> p += 4;
> len -= 4;
> continue;
> }
>
> j = 6; /* The magic field is only 6 characters */
> chp = ch;
> for (i = C_NFIELDS; i; i--) {
> v = 0;
> while (j--) {
> v <<= 4;
> c = *p++;
>
> x = c - '0';
> if (x < 10) {
> v += x;
> continue;
> }
>
> x = (c | 0x20) - 'a';
> if (x < 6) {
> v += x + 10;
> continue;
> }
>
> goto quit; /* Invalid hexadecimal */
> }
> *chp++ = v;
> j = 8; /* All other fields are 8 characters */
> }
>
> if ((ch[C_MAGIC] - 0x070701) > 1)
> goto quit; /* Invalid magic */
>
> len -= cpio_header_len;
>
> dptr = ALIGN4(p + ch[C_NAMESIZE]);
> nptr = ALIGN4(dptr + ch[C_FILESIZE]);
>
> if (nptr > p + len || dptr < p || nptr < dptr)
> goto quit; /* Buffer overrun */
>
> if ((ch[C_MODE] & 0170000) == 0100000 &&
> ch[C_NAMESIZE] == mynamesize &&
> !memcmp(p, name, mynamesize)) {
> cd.data = (void *)dptr;
> cd.size = ch[C_FILESIZE];
> return cd; /* Found it! */
> }
>
> len -= (nptr - p);
> p = nptr;
> }
>
> quit:
> return cd;
0 new messages