
Bug#699913: Memory hotplug (VMware) often fails


Jonathan Nieder

Feb 7, 2013, 2:10:02 AM
tags 699913 + moreinfo
quit

Hi Bernhard,

Bernhard Schmidt wrote:

> Adding additional RAM to a virtual machine running Debian Wheezy on
> VMware ESXi 5.0 often, but not always, leads to the attached backtrace.
[...]
> [504133.812000] VMCIUtil: Updating context id from 0x4d2c44d9 to 0x4d2c44d9 on event 0.
> [504133.936000] Hotplug Mem Device
> [504133.956000] init_memory_mapping: 0000000040000000-0000000080000000
> [504133.956000] 0040000000 - 0080000000 page 2M
> [504134.044000] kworker/0:1: page allocation failure: order:9, mode:0x80d0
> [504134.044000] Pid: 15680, comm: kworker/0:1 Tainted: G O 3.2.0-4-amd64 #1 Debian 3.2.35-2
> [504134.044000] Call Trace:
> [504134.044000] [<ffffffff810b8417>] ? warn_alloc_failed+0x11a/0x12d
> [504134.044000] [<ffffffff810363d8>] ? should_resched+0x5/0x23
> [504134.044000] [<ffffffff8134bd27>] ? _cond_resched+0x7/0x1c
> [504134.044000] [<ffffffff813487db>] ? __alloc_pages_direct_compact+0x162/0x174
> [504134.044000] [<ffffffff810bb143>] ? __alloc_pages_nodemask+0x704/0x7aa
> [504134.044000] [<ffffffff811aa431>] ? ida_get_new_above+0xf4/0x198
> [504134.044000] [<ffffffff81344113>] ? vmemmap_alloc_block+0x5f/0xdc
> [504134.044000] [<ffffffff8134330e>] ? vmemmap_populate+0xf7/0x1f6
> [504134.044000] [<ffffffff81344531>] ? sparse_mem_map_populate+0x24/0x34
> [504134.044000] [<ffffffff81344017>] ? sparse_add_one_section+0x4e/0xeb
> [504134.044000] [<ffffffff81331782>] ? __add_pages+0x73/0x1fe
> [504134.044000] [<ffffffff8102d892>] ? arch_add_memory+0x5d/0xd1
> [504134.044000] [<ffffffff810363d8>] ? should_resched+0x5/0x23
> [504134.044000] [<ffffffff8104cd44>] ? request_resource_conflict+0x30/0x3b
> [504134.044000] [<ffffffff81331a5d>] ? add_memory+0xcc/0x14e
> [504134.044000] [<ffffffffa0159167>] ? acpi_memory_enable_device+0x7d/0xbf [acpi_memhotplug]
> [504134.044000] [<ffffffffa0159518>] ? acpi_memory_device_add+0xbe/0xdd [acpi_memhotplug]
> [504134.044000] [<ffffffff811f0456>] ? acpi_device_probe+0x42/0x10d
> [504134.044000] [<ffffffff81250add>] ? driver_probe_device+0xa8/0x138
> [504134.044000] [<ffffffff81250bdc>] ? __driver_attach+0x6f/0x6f
> [504134.044000] [<ffffffff8124f691>] ? bus_for_each_drv+0x47/0x7b
> [504134.044000] [<ffffffff812509fe>] ? device_attach+0x6f/0x8f
> [504134.044000] [<ffffffff81250280>] ? bus_probe_device+0x25/0x8d
> [504134.044000] [<ffffffff8124e897>] ? device_add+0x3fd/0x590
> [504134.044000] [<ffffffff81258509>] ? pm_runtime_init+0xb5/0xc9
> [504134.044000] [<ffffffff811f1788>] ? acpi_add_single_object+0x8f9/0xaec
> [504134.044000] [<ffffffff81070ad5>] ? arch_local_irq_save+0x11/0x17
> [504134.044000] [<ffffffff812056e0>] ? acpi_get_data+0x63/0x6e
> [504134.044000] [<ffffffff811f1aa7>] ? acpi_bus_check_add+0x12c/0x18e
> [504134.044000] [<ffffffff81039817>] ? finish_task_switch+0x88/0xb9
> [504134.044000] [<ffffffff81070ad5>] ? arch_local_irq_save+0x11/0x17
> [504134.044000] [<ffffffff8134d03c>] ? _raw_spin_lock_irqsave+0x9/0x25
> [504134.044000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.044000] [<ffffffff811f1b3a>] ? acpi_bus_scan+0x31/0x76
> [504134.044000] [<ffffffff811eded8>] ? acpi_os_signal_semaphore+0x19/0x24
> [504134.044000] [<ffffffff811f1bde>] ? acpi_bus_add+0x24/0x2a
> [504134.044000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.044000] [<ffffffffa01592ad>] ? acpi_memory_device_notify+0xa5/0x221 [acpi_memhotplug]
> [504134.044000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.044000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.044000] [<ffffffff811fb583>] ? acpi_ev_notify_dispatch+0x5b/0x6f
> [504134.044000] [<ffffffff811ed5a3>] ? acpi_os_execute_deferred+0x1e/0x2a
> [504134.044000] [<ffffffff8105b0ed>] ? process_one_work+0x163/0x284
> [504134.044000] [<ffffffff8105c0cc>] ? worker_thread+0xc2/0x145
> [504134.044000] [<ffffffff8105c00a>] ? manage_workers.isra.25+0x15b/0x15b
> [504134.044000] [<ffffffff8105f201>] ? kthread+0x76/0x7e
> [504134.044000] [<ffffffff81354174>] ? kernel_thread_helper+0x4/0x10
> [504134.044000] [<ffffffff8105f18b>] ? kthread_worker_fn+0x139/0x139
> [504134.044000] [<ffffffff81354170>] ? gs_change+0x13/0x13
> [504134.044000] Mem-Info:
[...]
> [504134.048000] WARNING: at /build/buildd-linux_3.2.35-2-amd64-v9djlH/linux-3.2.35/arch/x86/mm/init_64.c:676 arch_add_memory+0x7f/0xd1()
> [504134.048000] Hardware name: VMware Virtual Platform
> [504134.048000] Modules linked in: joydev vsock(O) vmmemctl(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc ext4 crc16 jbd2 mbcache snd_pcm snd_page_alloc i2c_piix4 snd_timer coretemp vmci(O) snd vmwgfx parport_pc ttm drm parport soundcore shpchp crc32c_intel psmouse pcspkr i2c_core serio_raw ac power_supply processor evdev thermal_sys container button acpi_memhotplug loop autofs4 xfs vmxnet(O) sr_mod cdrom sg ata_generic sd_mod crc_t10dif floppy ata_piix libata vmxnet3 vmw_pvscsi scsi_mod [last unloaded: scsi_wait_scan]
> [504134.048000] Pid: 15680, comm: kworker/0:1 Tainted: G O 3.2.0-4-amd64 #1 Debian 3.2.35-2
> [504134.048000] Call Trace:
> [504134.048000] [<ffffffff81046a75>] ? warn_slowpath_common+0x78/0x8c
> [504134.048000] [<ffffffff8102d8b4>] ? arch_add_memory+0x7f/0xd1
> [504134.048000] [<ffffffff810363d8>] ? should_resched+0x5/0x23
> [504134.048000] [<ffffffff81331a5d>] ? add_memory+0xcc/0x14e
> [504134.048000] [<ffffffffa0159167>] ? acpi_memory_enable_device+0x7d/0xbf [acpi_memhotplug]
> [504134.048000] [<ffffffffa0159518>] ? acpi_memory_device_add+0xbe/0xdd [acpi_memhotplug]
> [504134.048000] [<ffffffff811f0456>] ? acpi_device_probe+0x42/0x10d
> [504134.048000] [<ffffffff81250add>] ? driver_probe_device+0xa8/0x138
> [504134.048000] [<ffffffff81250bdc>] ? __driver_attach+0x6f/0x6f
> [504134.048000] [<ffffffff8124f691>] ? bus_for_each_drv+0x47/0x7b
> [504134.048000] [<ffffffff812509fe>] ? device_attach+0x6f/0x8f
> [504134.048000] [<ffffffff81250280>] ? bus_probe_device+0x25/0x8d
> [504134.048000] [<ffffffff8124e897>] ? device_add+0x3fd/0x590
> [504134.048000] [<ffffffff81258509>] ? pm_runtime_init+0xb5/0xc9
> [504134.048000] [<ffffffff811f1788>] ? acpi_add_single_object+0x8f9/0xaec
> [504134.048000] [<ffffffff81070ad5>] ? arch_local_irq_save+0x11/0x17
> [504134.048000] [<ffffffff812056e0>] ? acpi_get_data+0x63/0x6e
> [504134.048000] [<ffffffff811f1aa7>] ? acpi_bus_check_add+0x12c/0x18e
> [504134.048000] [<ffffffff81039817>] ? finish_task_switch+0x88/0xb9
> [504134.048000] [<ffffffff81070ad5>] ? arch_local_irq_save+0x11/0x17
> [504134.048000] [<ffffffff8134d03c>] ? _raw_spin_lock_irqsave+0x9/0x25
> [504134.048000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.048000] [<ffffffff811f1b3a>] ? acpi_bus_scan+0x31/0x76
> [504134.048000] [<ffffffff811eded8>] ? acpi_os_signal_semaphore+0x19/0x24
> [504134.048000] [<ffffffff811f1bde>] ? acpi_bus_add+0x24/0x2a
> [504134.048000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.048000] [<ffffffffa01592ad>] ? acpi_memory_device_notify+0xa5/0x221 [acpi_memhotplug]
> [504134.048000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.048000] [<ffffffff811ed585>] ? acpi_os_wait_events_complete+0x1c/0x1c
> [504134.048000] [<ffffffff811fb583>] ? acpi_ev_notify_dispatch+0x5b/0x6f
> [504134.048000] [<ffffffff811ed5a3>] ? acpi_os_execute_deferred+0x1e/0x2a
> [504134.048000] [<ffffffff8105b0ed>] ? process_one_work+0x163/0x284
> [504134.048000] [<ffffffff8105c0cc>] ? worker_thread+0xc2/0x145
> [504134.048000] [<ffffffff8105c00a>] ? manage_workers.isra.25+0x15b/0x15b
> [504134.048000] [<ffffffff8105f201>] ? kthread+0x76/0x7e
> [504134.048000] [<ffffffff81354174>] ? kernel_thread_helper+0x4/0x10
> [504134.048000] [<ffffffff8105f18b>] ? kthread_worker_fn+0x139/0x139
> [504134.048000] [<ffffffff81354170>] ? gs_change+0x13/0x13
> [504134.048000] ---[ end trace f7146bc50aa7470a ]---
> [504134.048000] ACPI:memory_hp:add_memory failed
> [504134.048000] ACPI:memory_hp:Error in acpi_memory_enable_device
> [504134.048000] acpi_memhotplug: probe of PNP0C80:04 failed with error -22
[...]
> It happens with Squeeze 2.6.32, Squeeze with the backports (bpo) kernel,
> and Wheezy 3.2.0-4.
>
> The tainted warning comes from the kernel modules in the official VMware
> tools package; I will try to reproduce without them tomorrow.

Thanks. Please also attach /proc/iomem.

Looking forward to hearing how it goes,
Jonathan



Bernhard Schmidt

Feb 7, 2013, 4:30:02 AM
Attached is /proc/iomem from the system with VMware tools installed:

00000000-0000ffff : reserved
00010000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000ca000-000cbfff : reserved
000ca000-000cafff : Adapter ROM
000cb000-000cbfff : Adapter ROM
000cc000-000ccfff : Adapter ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-3fedffff : System RAM
01000000-01356915 : Kernel code
01356916-016946ff : Kernel data
01729000-01806fff : Kernel bss
3fee0000-3fefefff : ACPI Tables
3feff000-3fefffff : ACPI Non-volatile Storage
3ff00000-3fffffff : System RAM
c0000000-febfffff : PCI Bus 0000:00
c0000000-c0007fff : 0000:00:0f.0
d0000000-d0001fff : 0000:00:07.7
d0200000-d03fffff : pnp 00:0d
d0800000-d0ffffff : 0000:00:0f.0
d0800000-d0ffffff : vmwgfx stealth probe
d1900000-d23fffff : PCI Bus 0000:02
d2400000-d24fffff : PCI Bus 0000:03
d2400000-d2407fff : 0000:03:00.0
d2400000-d2407fff : vmw_pvscsi
d2500000-d25fffff : PCI Bus 0000:0b
d2500000-d2501fff : 0000:0b:00.0
d2503000-d2503fff : 0000:0b:00.0
d2503000-d2503fff : vmxnet3
d2504000-d2504fff : 0000:0b:00.0
d2504000-d2504fff : vmxnet3
d2600000-d26fffff : PCI Bus 0000:13
d2600000-d2601fff : 0000:13:00.0
d2603000-d2603fff : 0000:13:00.0
d2603000-d2603fff : vmxnet3
d2604000-d2604fff : 0000:13:00.0
d2604000-d2604fff : vmxnet3
d2700000-d27fffff : PCI Bus 0000:1b
d2702000-d2703fff : 0000:1b:00.0
d2704000-d2704fff : 0000:1b:00.0
d2704000-d2704fff : vmxnet3
d2705000-d2705fff : 0000:1b:00.0
d2705000-d2705fff : vmxnet3
d2800000-d28fffff : PCI Bus 0000:04
d2900000-d29fffff : PCI Bus 0000:0c
d2a00000-d2afffff : PCI Bus 0000:14
d2b00000-d2bfffff : PCI Bus 0000:1c
d2c00000-d2cfffff : PCI Bus 0000:05
d2d00000-d2dfffff : PCI Bus 0000:0d
d2e00000-d2efffff : PCI Bus 0000:15
d2f00000-d2ffffff : PCI Bus 0000:1d
d3000000-d30fffff : PCI Bus 0000:06
d3100000-d31fffff : PCI Bus 0000:0e
d3200000-d32fffff : PCI Bus 0000:16
d3300000-d33fffff : PCI Bus 0000:1e
d3400000-d34fffff : PCI Bus 0000:07
d3500000-d35fffff : PCI Bus 0000:0f
d3600000-d36fffff : PCI Bus 0000:17
d3700000-d37fffff : PCI Bus 0000:1f
d3800000-d38fffff : PCI Bus 0000:08
d3900000-d39fffff : PCI Bus 0000:10
d3a00000-d3afffff : PCI Bus 0000:18
d3b00000-d3bfffff : PCI Bus 0000:20
d3c00000-d3cfffff : PCI Bus 0000:09
d3d00000-d3dfffff : PCI Bus 0000:11
d3e00000-d3efffff : PCI Bus 0000:19
d3f00000-d3ffffff : PCI Bus 0000:21
d4000000-d40fffff : PCI Bus 0000:0a
d4100000-d41fffff : PCI Bus 0000:12
d4200000-d42fffff : PCI Bus 0000:1a
d4300000-d43fffff : PCI Bus 0000:22
d4400000-d44fffff : PCI Bus 0000:03
d4400000-d440ffff : 0000:03:00.0
d4500000-d45fffff : PCI Bus 0000:0b
d4500000-d450ffff : 0000:0b:00.0
d4600000-d46fffff : PCI Bus 0000:13
d4600000-d460ffff : 0000:13:00.0
d4700000-d47fffff : PCI Bus 0000:1b
d4700000-d470ffff : 0000:1b:00.0
d4800000-d48fffff : PCI Bus 0000:04
d4900000-d49fffff : PCI Bus 0000:0c
d4a00000-d4afffff : PCI Bus 0000:1c
d4b00000-d4bfffff : PCI Bus 0000:0d
d4c00000-d4cfffff : PCI Bus 0000:1d
d4d00000-d4dfffff : PCI Bus 0000:0e
d4e00000-d4efffff : PCI Bus 0000:1e
d4f00000-d4ffffff : PCI Bus 0000:0f
d5000000-d50fffff : PCI Bus 0000:1f
d5100000-d51fffff : PCI Bus 0000:10
d5200000-d52fffff : PCI Bus 0000:20
d5300000-d53fffff : PCI Bus 0000:11
d5400000-d54fffff : PCI Bus 0000:21
d5500000-d55fffff : PCI Bus 0000:12
d5600000-d56fffff : PCI Bus 0000:22
d8000000-dbffffff : 0000:00:0f.0
d8000000-d80bffff : vesafb
dc400000-dc9fffff : PCI Bus 0000:02
dca00000-dcafffff : PCI Bus 0000:14
dcb00000-dcbfffff : PCI Bus 0000:05
dcc00000-dccfffff : PCI Bus 0000:15
dcd00000-dcdfffff : PCI Bus 0000:06
dce00000-dcefffff : PCI Bus 0000:16
dcf00000-dcffffff : PCI Bus 0000:07
dd000000-dd0fffff : PCI Bus 0000:17
dd100000-dd1fffff : PCI Bus 0000:08
dd200000-dd2fffff : PCI Bus 0000:18
dd300000-dd3fffff : PCI Bus 0000:09
dd400000-dd4fffff : PCI Bus 0000:19
dd500000-dd5fffff : PCI Bus 0000:0a
dd600000-dd6fffff : PCI Bus 0000:1a
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
e0000000-efffffff : reserved
e0000000-efffffff : pnp 00:0d
fec00000-fec0ffff : reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : pnp 00:08
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fffe0000-ffffffff : reserved
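
A listing like the one above can be summarised with a short script. The following is a minimal Python sketch, not something posted in the thread: it sums the ranges labelled "System RAM" in /proc/iomem-style text. The function name `system_ram_bytes` and the regular expression are illustrative assumptions; only the `start-end : name` line layout is taken from the listing.

```python
import re

# One /proc/iomem entry: "<start hex>-<end hex> : <name>", optionally indented.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def system_ram_bytes(iomem_text):
    """Return the total size in bytes of all ranges named 'System RAM'."""
    total = 0
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "System RAM":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            total += end - start + 1  # ranges are inclusive at both ends
    return total
```

Comparing the result before and after a hot-add (for example with the contents of /proc/iomem read as root) should show whether the new ranges were registered as System RAM.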

Ben Hutchings

Feb 10, 2013, 8:20:02 PM
Control: tag -1 upstream moreinfo

On Wed, 2013-02-06 at 18:02 +0100, Bernhard Schmidt wrote:
> Package: src:linux
> Version: 3.2.35-2
> Severity: normal
>
> Adding additional RAM to a virtual machine running Debian Wheezy on
> VMware ESXi 5.0 often, but not always, leads to the attached backtrace.
>
> If that happens, the system has considerably fewer new (offline) memory
> banks in /sys/devices/system/memory/memory* than it should have, and
> setting all available memory banks online does not give all the memory
> expected.
[...]
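
For readers unfamiliar with the sysfs interface mentioned in the report, here is a minimal Python sketch of listing the memory blocks and setting the offline ones online. It assumes the usual layout, where each /sys/devices/system/memory/memoryN directory contains a `state` file that accepts the string "online"; the function names are hypothetical, and writing to sysfs requires root.

```python
from pathlib import Path

MEMORY_ROOT = Path("/sys/devices/system/memory")


def _blocks(root):
    """Return the memoryN directories under root, sorted by block number."""
    blocks = [p for p in root.glob("memory*") if p.name[6:].isdigit()]
    return sorted(blocks, key=lambda p: int(p.name[6:]))


def list_blocks(root=MEMORY_ROOT):
    """Map each block name (e.g. 'memory8') to the content of its state file."""
    states = {}
    for block in _blocks(root):
        state_file = block / "state"
        if state_file.is_file():
            states[block.name] = state_file.read_text().strip()
    return states


def online_offline_blocks(root=MEMORY_ROOT):
    """Write 'online' to every block whose state is 'offline'.

    Returns the names of the blocks that were written to. On a real
    system the kernel may reject the write, which raises OSError.
    """
    onlined = []
    for name, state in list_blocks(root).items():
        if state == "offline":
            (root / name / "state").write_text("online")
            onlined.append(name)
    return onlined
```

Counting the entries returned by `list_blocks()` before and after the hot-add is one way to see the symptom described above, namely fewer new blocks than the added RAM should produce.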

Please test whether the attached patch fixes this. Instructions for
building a patched kernel package are in the Debian kernel handbook:
<http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus
0001-mm-Try-harder-to-allocate-vmemmap-blocks.patch

Bernhard Schmidt

Feb 11, 2013, 4:30:02 AM
On 11.02.2013 02:09, Ben Hutchings wrote:

Hello,

Thanks, will do. Your description fits quite well: we have been unable to
reproduce this on a freshly booted system, even in production, but most VMs
that were given more RAM because of a memory shortage failed.

Bernhard


Bernhard Schmidt

Feb 13, 2013, 6:50:02 AM
On 11.02.2013 02:09, Ben Hutchings wrote:

Hello Ben,

> Control: tag -1 upstream moreinfo
>
> On Wed, 2013-02-06 at 18:02 +0100, Bernhard Schmidt wrote:
>> Package: src:linux
>> Version: 3.2.35-2
>> Severity: normal
>>
>> Adding additional RAM to a virtual machine running Debian Wheezy on
>> VMware ESXi 5.0 often, but not always, leads to the attached backtrace.
>>
>> If that happens, the system has considerably fewer new (offline) memory
>> banks in /sys/devices/system/memory/memory* than it should have, and
>> setting all available memory banks online does not give all the memory
>> expected.
> [...]
>
> Please test whether the attached patch fixes this. Instructions for
> building a patched kernel package are in the Debian kernel handbook:
> <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.

It looks good on Wheezy now. Hard to tell for sure because it did not
always happen, especially not with freshly booted devices, but we did
install the kernel on a few Wheezy boxes and put load on them, and I did
not observe the backtrace anymore.

Squeeze is another story, but I think there is another problem as well.
Previously we sometimes saw the backtrace, sometimes just the following
message.


[ 44.220000] VMCIUtil: Updating context id from 0x7a3c21d6 to 0x7a3c21d6 on event 0.
[ 44.252000] Hotplug Mem Device
[ 44.252000] System RAM resource 20000000 - 27ffffff cannot be added
[ 44.252000] ACPI:memory_hp:add_memory failed
[ 44.252000] ACPI:memory_hp:Error in acpi_memory_enable_device
[ 44.252000] acpi_memhotplug: probe of PNP0C80:00 failed with error -22
[ 44.252000]
[ 44.252000] driver data not found
[ 44.252000] ACPI:memory_hp:Cannot find driver data
[ 44.268000] Hotplug Mem Device
[ 44.268000] init_memory_mapping: 0000000028000000-0000000030000000
[ 44.268000] 0028000000 - 0030000000 page 2M
[ 44.280000] [ffffea00008c0000-ffffea0000abffff] PMD -> [ffff88001f200000-ffff88001f3fffff] on node 0
[ 44.280000] Hotplug Mem Device
[ 44.284000] init_memory_mapping: 0000000030000000-0000000038000000
[ 44.284000] 0030000000 - 0038000000 page 2M
[ 44.284000] [ffffea0000a00000-ffffea0000bfffff] PMD -> [ffff88001e400000-ffff88001e5fffff] on node 0
[ 44.340000] Hotplug Mem Device
[ 44.340000] init_memory_mapping: 0000000038000000-0000000040000000
[ 44.340000] 0038000000 - 0040000000 page 2M

We did not observe the backtrace anymore, but the "driver data not
found" is still there.

So I think the patch fixes the backtrace (allocation error) on both
squeeze and wheezy, but squeeze has a second issue. I'll go through the
bug reports and open a new one.

Bernhard

Ben Hutchings

Feb 13, 2013, 11:50:02 PM
Control: tag -1 patch
Control: tag -1 - moreinfo

On Wed, 2013-02-13 at 12:40 +0100, Bernhard Schmidt wrote:
> On 11.02.2013 02:09, Ben Hutchings wrote:
>
> Hello Ben,
>
> > Control: tag -1 upstream moreinfo
> >
> > On Wed, 2013-02-06 at 18:02 +0100, Bernhard Schmidt wrote:
> >> Package: src:linux
> >> Version: 3.2.35-2
> >> Severity: normal
> >>
> >> Adding additional RAM to a virtual machine running Debian Wheezy on
> >> VMware ESXi 5.0 often, but not always, leads to the attached backtrace.
> >>
> >> If that happens, the system has considerably fewer new (offline) memory
> >> banks in /sys/devices/system/memory/memory* than it should have, and
> >> setting all available memory banks online does not give all the memory
> >> expected.
> > [...]
> >
> > Please test whether the attached patch fixes this. Instructions for
> > building a patched kernel package are in the Debian kernel handbook:
> > <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.
>
> It looks good on Wheezy now. Hard to tell for sure because it did not
> always happen, especially not with freshly booted devices, but we did
> install the kernel on a few Wheezy boxes and put load on them, and I did
> not observe the backtrace anymore.

Yes, it will depend on how full (and how fragmented) memory is when you
try to hot-add more.
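
To make the fragmentation point concrete: the "order:9" in the allocation failure is a request for 2^9 physically contiguous 4 KiB pages, i.e. one 2 MiB block, which matches the "page 2M" mapping in the log. A minimal Python sketch of that arithmetic follows, together with a parser for /proc/buddyinfo-style text that counts free blocks of at least a given order. The function names are made up for illustration, and the 4 KiB page size is the x86-64 default rather than something stated in the thread.

```python
PAGE_SIZE = 4096  # base page size on x86-64; an assumption on other architectures


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one buddy-allocator block of the given order."""
    return page_size << order


def free_blocks_at_or_above(buddyinfo_text, min_order):
    """Count free blocks of order >= min_order for each zone.

    Expects the /proc/buddyinfo layout, one line per zone:
    'Node 0, zone   Normal   c0 c1 c2 ...', where cN is the number of
    free blocks of order N. Keys of the result look like '0/Normal'.
    """
    result = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue  # skip anything that is not a zone line
        node = fields[1].rstrip(",")
        zone = fields[3]
        counts = [int(value) for value in fields[4:]]
        result[f"{node}/{zone}"] = sum(counts[min_order:])
    return result


print(order_to_bytes(9))  # 2097152 bytes, i.e. 2 MiB
```

Feeding the contents of /proc/buddyinfo to `free_blocks_at_or_above(text, 9)` just before a hot-add gives a rough idea of whether any 2 MiB blocks are free. A zone reporting 0 suggests the allocation would have to rely on compaction or reclaim, which is more likely on a long-running, loaded system than on a freshly booted one.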

> Squeeze is another story, but I think there is another problem as well.
> Previously we sometimes saw the backtrace, sometimes just the following
> message.
>
>
> [ 44.220000] VMCIUtil: Updating context id from 0x7a3c21d6 to 0x7a3c21d6 on event 0.
> [ 44.252000] Hotplug Mem Device
> [ 44.252000] System RAM resource 20000000 - 27ffffff cannot be added
> [ 44.252000] ACPI:memory_hp:add_memory failed
> [ 44.252000] ACPI:memory_hp:Error in acpi_memory_enable_device
> [ 44.252000] acpi_memhotplug: probe of PNP0C80:00 failed with error -22

This is EINVAL ('invalid argument'); I wonder where that's coming from?
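
Kernel functions report failure as a negated errno value, so "error -22" can be decoded with any errno table. A minimal Python sketch using the standard `errno` and `os` modules; the helper name `describe_kernel_error` is hypothetical, and the message text comes from the local C library, so it may vary between platforms.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return value such as -22 to (name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


print(describe_kernel_error(-22))
```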

> [ 44.252000] driver data not found
> [ 44.252000] ACPI:memory_hp:Cannot find driver data
> [ 44.268000] Hotplug Mem Device
> [ 44.268000] init_memory_mapping: 0000000028000000-0000000030000000
> [ 44.268000] 0028000000 - 0030000000 page 2M
> [ 44.280000] [ffffea00008c0000-ffffea0000abffff] PMD -> [ffff88001f200000-ffff88001f3fffff] on node 0
> [ 44.280000] Hotplug Mem Device
> [ 44.284000] init_memory_mapping: 0000000030000000-0000000038000000
> [ 44.284000] 0030000000 - 0038000000 page 2M
> [ 44.284000] [ffffea0000a00000-ffffea0000bfffff] PMD -> [ffff88001e400000-ffff88001e5fffff] on node 0
> [ 44.340000] Hotplug Mem Device
> [ 44.340000] init_memory_mapping: 0000000038000000-0000000040000000
> [ 44.340000] 0038000000 - 0040000000 page 2M
>
> We did not observe the backtrace anymore, but the "driver data not
> found" is still there.
>
> So I think the patch fixes the backtrace (allocation error) on both
> squeeze and wheezy, but squeeze has a second issue. I'll go through the
> bug reports and open a new one.

Right, this is definitely a separate bug.