FS checkpointing stuck

Shamik Saha

Sep 25, 2015, 7:00:27 PM

to gem5-gpu Developers List

Hi!

I am trying to create a checkpoint, and my command line is as follows:

build/X86_VI_hammer_GPU/gem5.opt --outdir=/users/shamik/gem5-gpu/gem5/chkpt/backprop ../gem5-gpu/configs/fs_fusion.py -b backprop --disk-image=x86root.img --kernel=x86_64-vmlinux-2.6.28.4-smp

I have mounted the backprop executable in the home directory of x86root.img. Also, I am using the following .rcS file:
#!/system/bin/sh

stop_m5() {
    /sbin/m5 exit

    return
}

run_bench_test() {
    /sbin/m5 checkpoint
    /sbin/m5 resetstats
    #am start -W com.fsck.k9/com.fsck.k9.activity.Accounts
    ./gem5_fusion_backprop
    /sbin/m5 dumpstats
    stop_m5

    return
}

#sleep 10
run_bench_test

However, the simulation seems to be stuck at:
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
warn: Don't know what interrupt to clear for console.
warn: instruction 'wbinvd' unimplemented
warn: x86 cpuid: unknown family 0x4000
warn: x86 cpuid: unimplemented function 13

Please help!

Shamik Saha

Sep 25, 2015, 9:48:01 PM

to gem5-gpu Developers List

Also, I was monitoring system.pc.com_1.terminal and found the following at the end of the file:

Kernel panic - not syncing: Attempted to kill the idle task!

------------[ cut here ]------------

WARNING: at kernel/smp.c:333 smp_call_function_mask+0x1de/0x250()

Modules linked in:

Pid: 0, comm: swapper Tainted: G D W 2.6.28-rc4-dirty #5

Call Trace:

[<ffffffff8023af92>] warn_on_slowpath+0x62/0xa0

[<ffffffff803a4159>] vsnprintf+0x449/0x6b0

[<ffffffff803a3a84>] string+0x34/0xf0

[<ffffffff803a4118>] vsnprintf+0x408/0x6b0

[<ffffffff802547dd>] up+0xd/0x40

[<ffffffff8023b6de>] release_console_sem+0x1ae/0x200

[<ffffffff8021ef40>] stop_this_cpu+0x0/0x30

[<ffffffff8025db9e>] smp_call_function_mask+0x1de/0x250

[<ffffffff803a3f78>] vsnprintf+0x268/0x6b0

[<ffffffff80615074>] printk+0x40/0x45

[<ffffffff8021ef30>] native_smp_send_stop+0x20/0x30

[<ffffffff80614f8d>] panic+0x82/0x129

[<ffffffff8023ecae>] do_exit+0x7de/0x890

[<ffffffff80615074>] printk+0x40/0x45

[<ffffffff806184ba>] oops_end+0x7a/0xc0

[<ffffffff8020d3d4>] do_invalid_op+0x84/0xa0

[<ffffffff808d69e7>] xsave_cntxt_init+0x35/0x130

[<ffffffff8023b6de>] release_console_sem+0x1ae/0x200

[<ffffffff802547dd>] up+0xd/0x40

[<ffffffff806178e9>] error_exit+0x0/0x51

[<ffffffff808d69e7>] xsave_cntxt_init+0x35/0x130

[<ffffffff8060e2b1>] fpu_init+0x4a/0x97

[<ffffffff8060fa3f>] cpu_init+0x319/0x33f

[<ffffffff808cdab5>] start_kernel+0x1b2/0x321

[<ffffffff808cd405>] x86_64_start_kernel+0xd9/0xdd

---[ end trace 4eaa2a86a8e2da22 ]---

Is it stuck here?

Joel Hestness

Sep 26, 2015, 10:42:05 AM

to Shamik Saha, gem5-gpu Developers List

Hi,

This bug results from gem5's incomplete implementation of xsave/cpuid instructions. With xsave enabled, kernel 2.6.28.4 tries to execute cpuid function 13, as noted in the warning, but the result that is returned is incorrect, causing the boot CPU to trigger a kernel panic.

You can disable xsave by following my note in this gem5 email thread: http://permalink.gmane.org/gmane.comp.emulators.m5.devel/24965. You should be able to boot kernel 2.6.28.4 after disabling the bit described there.

Joel

--

Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/

Shamik Saha

Sep 26, 2015, 10:50:57 AM

to gem5-gpu Developers List, sahash...@gmail.com

I am not really sure about what to do from the email thread. Could you please explain what to do?

Joel Hestness

Sep 26, 2015, 12:54:05 PM

to Shamik Saha, gem5-gpu Developers List

Apply the attached patch to gem5. It disables the xsave bit as described in the email.

Joel

disable_x86_xsave

Shamik Saha

Sep 26, 2015, 12:58:23 PM

to gem5-gpu Developers List, sahash...@gmail.com

So, basically, change 0x04000209 to 0x00000209, right?
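The two constants above differ in exactly one bit. A minimal sketch of the arithmetic, assuming 0x04000209 is the feature-flag word gem5 returns in ECX for CPUID function 1 (where bit 26 is the XSAVE flag); the helper name `clear_bit` is hypothetical and is not part of gem5 or of the attached patch:

```shell
#!/bin/sh
# Clear one bit of a 32-bit register value and print it as 8 hex digits.
# Usage: clear_bit VALUE BIT   (hypothetical helper, for illustration only)
clear_bit() {
    printf '0x%08x\n' $(( $1 & ~(1 << $2) ))
}

# Bit 26 of CPUID function 1 ECX is the XSAVE feature flag (1 << 26 = 0x04000000).
clear_bit 0x04000209 26    # prints 0x00000209
```

With that bit cleared, the guest kernel should no longer see XSAVE advertised, so it would not be expected to take the cpuid function 13 path that leads to the panic described earlier in the thread.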

Shamik Saha

Sep 26, 2015, 2:13:54 PM

to gem5-gpu Developers List, sahash...@gmail.com

Hi Joel,

Thanks for the update.

However, now it is stuck at:

Freeing unused kernel memory: 332k freed

^MINIT: version 2.86 booting^M

mounting filesystems...

loading script...

Script from M5 readfile is empty, starting bash shell...

^[[01;31m(none)^[[01;34m / #^[[00m

I have declared the benchmark in Benchmark.py and also created a .rcS script in configs/boot.

Is gem5 not able to pick up the benchmark? What is the alternative?

Shamik Saha

Sep 26, 2015, 5:52:02 PM

to gem5-gpu Developers List, sahash...@gmail.com

Hi Joel,

So I have modified the command line as:

build/X86_VI_hammer_GPU/gem5.opt --outdir=/users/shamik/gem5-gpu/gem5/chkpt/backprop ../gem5-gpu/configs/fs_fusion.py --disk-image=x86root.img --kernel=x86_64-vmlinux-2.6.28.4-smp --script=/users/shamik/gem5-gpu/gem5/configs/boot/backprop.rcS

However, I am getting stuck at:

IPv6 over IPv4 tunneling driver

NET: Registered protocol family 17

RPC: Registered udp transport module.

RPC: Registered tcp transport module.

input: PS/2 Generic Mouse as /class/input/input1

EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended

VFS: Mounted root (ext2 filesystem).

Freeing unused kernel memory: 332k freed

^MINIT: version 2.86 booting^M

mounting filesystems...

loading script...

/etc/init.d/rcS: /tmp/script: /system/bin/sh: bad interpreter: No such file or directory

/etc/init.d/rcS: line 15: /tmp/script: Success

Enter runlevel:

Right now, I have only mounted the backprop executable. Should I mount the entire benchmark folder?
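The "bad interpreter" message in the log above means the shell could not find the program named on the script's first line. The .rcS file posted at the start of the thread begins with `#!/system/bin/sh`, which is an Android path. If backprop.rcS uses the same line and the x86root.img image keeps its shell at `/bin/sh` (an assumption, not confirmed in this thread), changing the first line to `#!/bin/sh` may be enough. A minimal sketch for checking this; `shebang_interp` and `check_interp` are hypothetical helper names:

```shell
#!/bin/sh
# Print the interpreter path named on the first line (shebang) of a script.
shebang_interp() {
    sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}

# Report whether that interpreter exists and is executable on the
# filesystem where this check runs.
check_interp() {
    interp=$(shebang_interp "$1")
    if [ -n "$interp" ] && [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing: $interp"
    fi
}
```

Running `check_interp /tmp/script` from the guest's shell would show whether the interpreter path exists inside the disk image. The check is only meaningful when run inside the guest (or a chroot of the mounted image), since it tests the path on the current root filesystem.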

Shamik Saha

Sep 26, 2015, 7:46:45 PM

to gem5-gpu Developers List, sahash...@gmail.com

So I am still getting stuck at the same place. What could be an alternative?
