Re: Help, the dog ate my MBR (Linux/Windows dual boot)

Noob

Jul 22, 2011, 10:25:28 AM
[ Please note that I've added comp.sys.ibm.pc.hardware.storage
to the list of newsgroups ]

Noob wrote:

> Something trashed my MBR. Now when I boot the PC,
> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
> SYSTEM DISK AND PRESS ENTER" (or something close).

By MBR, I meant the first sector of my hard disk drive,
i.e. the boot-strapping code, and the table of primary
partitions.

> My setup:
> PATA 120-GB HDD on IDE0 master
> PATA DVD reader on IDE1 master
> No IDE slaves. No SATA drives. (SATA disabled in BIOS)
>
> I had used gparted to create three primary partitions
> (all partitions are aligned to 1 MiB, even though
> this is not a 4K-sector HDD)
> partition 1 : 80.00 GiB (for WinXP)
> partition 2 : 33.75 GiB (for Fedora 13)
> partition 3 : 765 MiB (for swap)

Short version of the rest of original message : the first sector
had been shifted by 14 bytes. Weird, right?
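
A minimal Python sketch of how such a shift could be measured in a dump like the one quoted further down: slide a 512-byte window over the dump and report the first offset at which the window ends in the 55 aa signature. The name `find_shift` and the test data are illustrative assumptions, not something from the original post, and a match on the two signature bytes alone does not prove the rest of the sector is intact.

```python
SECTOR_SIZE = 512            # assumed logical sector size
SIGNATURE = b"\x55\xaa"      # boot signature, at offset 0x1fe of a sane MBR


def find_shift(dump):
    """Return the smallest offset k at which dump[k:k+512] ends in 55 aa, or None."""
    for k in range(len(dump) - SECTOR_SIZE + 1):
        window_end = k + SECTOR_SIZE
        if dump[window_end - 2:window_end] == SIGNATURE:
            return k
    return None


# Synthetic stand-in for a dump (hypothetical data, not the poster's disk):
# 14 filler bytes, then a sector that starts with "eb 48 90" and ends with "55 aa".
sector = b"\xeb\x48\x90" + bytes(SECTOR_SIZE - 5) + SIGNATURE
dump = b"\x11" * 14 + sector + bytes(18)

shift = find_shift(dump)
print("shift:", shift)
realigned = dump[shift:shift + SECTOR_SIZE]
print("starts with jmp:", realigned[:2] == b"\xeb\x48")
```

For a dump whose signature sits at 0x20c instead of 0x1fe, the same search would report 0x20c - 0x1fe = 14.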

And the plot thickens. Later that same night, I copied the
MBR a second time in the live CD environment; and this second
time, the first 14 bytes of the MBR were ZERO, i.e.
they had changed !!
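
To pin down exactly which bytes changed between two copies, the dumps can be compared byte by byte. The following Python sketch is only an illustration: `diff_ranges` is a made-up helper name, the 14-byte prefix is taken from the first dump quoted below, and the rest of the sector is assumed (not known) to be identical in both copies.

```python
def diff_ranges(a, b):
    """Return [(start, end), ...] half-open ranges where two equal-length dumps differ."""
    if len(a) != len(b):
        raise ValueError("dumps must have the same length")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a differing run begins here
        elif x == y and start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(a)))     # the run extends to the end
    return ranges


# First 14 bytes of the first dump, as quoted in the original message.
first_prefix = bytes.fromhex("00 00 00 00 00 00 41 01 63 74 e6 00 39 00")
# Assumption: everything after the prefix was identical in both copies.
rest = b"\xeb\x48\x90" + bytes(29)
first_copy = first_prefix + rest
second_copy = bytes(14) + rest            # second copy: the first 14 bytes read as zero

for start, end in diff_ranges(first_copy, second_copy):
    print(f"bytes 0x{start:03x}..0x{end - 1:03x} differ")
```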

Even stranger, the next morning, I booted the PC,
and the 14-byte offset had disappeared.

I'm starting to think that this might be a hardware problem,
as both Linux AND the BIOS seem to have had troubles getting
the MBR consistently. Something else I haven't mentioned:
when Windows resumes from hibernation, the whole system
sometimes reboots (this started about 2/3 weeks ago), around
the same time I changed the RAM from 2x512 to 2x1024.

I tested the RAM, no errors after one hour of checking.
The S.M.A.R.T. counters for the HDD claim the drive is
"healthy".

However, considering that the drive is inserting random
garbage around requested data, I'm wondering if this could
be the drive's controller failing? Would this show up in
a S.M.A.R.T. diagnostic?

Or am I on the wrong track, and do you see something else
that might be responsible?

Regards.


[ Below is the rest of my original message, which I left in
because of the belated cross-post to csiphs ]

> For my own record, my partitions are encoded as follows.
>
> http://en.wikipedia.org/wiki/Master_boot_record
> http://en.wikipedia.org/wiki/Partition_type
>
> 80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a
> bootable, NTFS, start = 1 MiB, count = 80 GiB
>
> 00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04
> non-bootable, linux, count = 33.75 GiB
>
> 00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00
> non-bootable, swap, count = 765 MiB
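
As a cross-check of the sizes listed above, here is a short Python sketch that decodes the three 16-byte entries. It assumes 512-byte logical sectors, and `decode_entry` is an illustrative name rather than part of any tool mentioned in this thread.

```python
import struct

SECTOR_SIZE = 512  # assumed logical sector size

# The three entries exactly as quoted above.
ENTRIES = [
    "80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a",
    "00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04",
    "00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00",
]


def decode_entry(hex_text):
    """Decode one 16-byte MBR partition entry given as a hex string."""
    raw = bytes.fromhex(hex_text)
    # Layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
    # then little-endian 32-bit LBA start and sector count.
    status, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack(
        "<B3sB3sII", raw
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "start_bytes": lba_start * SECTOR_SIZE,
        "size_bytes": sectors * SECTOR_SIZE,
    }


for text in ENTRIES:
    e = decode_entry(text)
    print(
        f"type 0x{e['type']:02x} bootable={e['bootable']} "
        f"start={e['start_bytes'] / 2**20:g} MiB size={e['size_bytes'] / 2**20:g} MiB"
    )
```

The CHS fields are ignored here; for partitions this far into the disk they only hold the saturated value fe ff ff.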
>
> I used a Fedora 15 live CD to boot to Linux, and examine
> the MBR. I looked for the MBR signature, and noticed
> something very odd: the 0xAA55 signature was 14 bytes
> "too far", i.e. at offset 0x20c instead of 0x1fe, which
> means my broken MBR straddles sectors 0 and 1...
>
> # cat broken_mbr.dump
> 00000000 00 00 00 00 00 00 41 01 63 74 e6 00 39 00 eb 48 |......A.ct..9..H|
> 00000010 90 d0 bc 00 7c fb 50 07 50 1f fc be 1b 7c bf 1b |....|.P.P....|..|
> 00000020 06 50 57 b9 e5 01 f3 a4 cb bd be 07 b1 04 38 6e |.PW...........8n|
> 00000030 00 7c 09 75 13 83 c5 10 e2 f4 cd 18 8b f5 83 c6 |.|.u............|
> 00000040 10 49 74 19 38 2c 74 f6 a0 b5 07 b4 03 02 80 00 |.It.8,t.........|
> 00000050 00 80 00 e8 c4 0a 00 08 fa 90 90 f6 c2 80 75 02 |..............u.|
> 00000060 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc 00 20 |...Y|..1....... |
> 00000070 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80 74 54 |..@|<.t...R...tT|
> 00000080 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55 aa 75 |.A..U..ZRrI..U.u|
> 00000090 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66 8b 4c |C.A|..u....t7f.L|
> 000000a0 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7 04 10 |...|.D..f..D|...|
> 000000b0 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00 70 66 |..D...f.\..D..pf|
> 000000c0 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72 05 bb |1..D.f.D..B..r..|
> 000000d0 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f 84 f0 |.p.}....s.......|
> 000000e0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0 88 f0 |......|.D..f1...|
> 000000f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8 88 f4 |@f.D.1..........|
> 00000100 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04 66 a1 |@.D.1......f..f.|
> 00000110 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2 66 f7 |D|f1.f.4.T.f1.f.|
> 00000120 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a 54 0d |t..T..D.;D.}<.T.|
> 00000130 c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a 8a 74 |....L......l.Z.t|
> 00000140 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72 2a 8c |...p..1......r*.|
> 00000150 c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6 31 ff |...H|`......1.1.|
> 00000160 fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40 00 eb |....a.&B|..}.@..|
> 00000170 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30 00 be |...}.8.....}.0..|
> 00000180 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47 65 6f |.}.*...GRUB .Geo|
> 00000190 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65 61 64 |m.Hard Disk.Read|
> 000001a0 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd 10 ac |. Error.........|
> 000001b0 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00 00 00 |<.u.............|
> 000001c0 00 00 00 00 00 00 dc 3b dd 3b 00 00 80 20 21 00 |.......;.;... !.|
> 000001d0 07 fe ff ff 00 08 00 00 00 00 00 0a 00 fe ff ff |................|
> 000001e0 83 fe ff ff 00 08 00 0a 00 00 38 04 00 fe ff ff |..........8.....|
> 000001f0 82 fe ff ff 00 08 38 0e 00 e8 17 00 00 00 00 00 |......8.........|
> 00000200 00 00 00 00 00 00 00 00 00 00 00 00 55 aa 00 00 |............U...|
> 00000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>
> The partition table (64 bytes) at the end, right
> before the last two-byte signature, is valid.
>
> For comparison, I examined /boot/grub/stage1
>
> # cat good_mbr.dump
> 00000000 eb 48 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |.H..............|
> 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 02 |................|
> 00000040 ff 00 00 80 01 00 00 00 00 08 fa eb 07 f6 c2 80 |................|
> 00000050 75 02 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc |u....Y|..1......|
> 00000060 00 20 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80 |. ..@|<.t...R...|
> 00000070 74 54 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55 |tT.A..U..ZRrI..U|
> 00000080 aa 75 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66 |.uC.A|..u....t7f|
> 00000090 8b 4c 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7 |.L...|.D..f..D|.|
> 000000a0 04 10 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00 |....D...f.\..D..|
> 000000b0 70 66 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72 |pf1..D.f.D..B..r|
> 000000c0 05 bb 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f |...p.}....s.....|
> 000000d0 84 f0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0 |........|.D..f1.|
> 000000e0 88 f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8 |..@f.D.1........|
> 000000f0 88 f4 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04 |..@.D.1......f..|
> 00000100 66 a1 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2 |f.D|f1.f.4.T.f1.|
> 00000110 66 f7 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a |f.t..T..D.;D.}<.|
> 00000120 54 0d c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a |T.....L......l.Z|
> 00000130 8a 74 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72 |.t...p..1......r|
> 00000140 2a 8c c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6 |*....H|`......1.|
> 00000150 31 ff fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40 |1.....a.&B|..}.@|
> 00000160 00 eb 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30 |.....}.8.....}.0|
> 00000170 00 be 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47 |...}.*...GRUB .G|
> 00000180 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65 |eom.Hard Disk.Re|
> 00000190 61 64 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd |ad. Error.......|
> 000001a0 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00 |..<.u...........|
> 000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 12 |..............$.|
> 000001c0 0f 09 00 be bd 7d 31 c0 cd 13 46 8a 0c 80 f9 00 |.....}1...F.....|
> 000001d0 75 0f be da 7d e8 c9 ff eb 97 46 6c 6f 70 70 79 |u...}.....Floppy|
> 000001e0 00 bb 00 70 b8 01 02 b5 00 b6 00 cd 13 72 d7 b6 |...p.........r..|
> 000001f0 01 b5 4f e9 e0 fe 00 00 00 00 00 00 00 00 55 aa |..O...........U.|
>
> And indeed, the "eb 48" is there in my broken MBR,
> 14 bytes "too far". So it looks like I could just
> use a binary editor to remove the first 14 bytes,
> then write that back to my HDD's first sector?
>
> Not too sure about that, though.
>
> Does my broken MBR, shifted left by 14 bytes, look
> like a valid MBR for grub?
>
> Thoughts? Suggestions? How should I proceed?
>
> For the record, here are the disassemblies of the
> broken MBR and the good MBR (they do seem to differ
> in several places; I'm wondering if this is because
> I shouldn't be looking at /boot/grub/stage1).
>
> On a normal MBR, the code area ranges from 0 to 0x1b7
> Shifted by 14 bytes, I expect range 14 to 0x1c5
>
> http://prefetch.net/blog/index.php/2006/09/09/digging-through-the-mbr/
>
> # cat broken_mbr.asm
> 0: 00 00 add %al,(%bx,%si)
> 2: 00 00 add %al,(%bx,%si)
> 4: 00 00 add %al,(%bx,%si)
> 6: 41 inc %cx
> 7: 01 63 74 add %sp,0x74(%bp,%di)
> a: e6 00 out %al,$0x0
> c: 39 00 cmp %ax,(%bx,%si)
> XXX e: eb 48 jmp 0x58
> 10: 90 nop
> 11: d0 bc 00 7c sarb 0x7c00(%si)
> 15: fb sti
> 16: 50 push %ax
> 17: 07 pop %es
> 18: 50 push %ax
> 19: 1f pop %ds
> 1a: fc cld
> 1b: be 1b 7c mov $0x7c1b,%si
> 1e: bf 1b 06 mov $0x61b,%di
> 21: 50 push %ax
> 22: 57 push %di
> 23: b9 e5 01 mov $0x1e5,%cx
> 26: f3 a4 rep movsb %ds:(%si),%es:(%di)
> 28: cb lret
> 29: bd be 07 mov $0x7be,%bp
> 2c: b1 04 mov $0x4,%cl
> 2e: 38 6e 00 cmp %ch,0x0(%bp)
> 31: 7c 09 jl 0x3c
> 33: 75 13 jne 0x48
> 35: 83 c5 10 add $0x10,%bp
> 38: e2 f4 loop 0x2e
> 3a: cd 18 int $0x18
> 3c: 8b f5 mov %bp,%si
> 3e: 83 c6 10 add $0x10,%si
> 41: 49 dec %cx
> 42: 74 19 je 0x5d
> 44: 38 2c cmp %ch,(%si)
> 46: 74 f6 je 0x3e
> 48: a0 b5 07 mov 0x7b5,%al
> 4b: b4 03 mov $0x3,%ah
> 4d: 02 80 00 00 add 0x0(%bx,%si),%al
> 51: 80 00 e8 addb $0xe8,(%bx,%si)
> 54: c4 0a les (%bp,%si),%cx
> 56: 00 08 add %cl,(%bx,%si)
> 58: fa cli
> 59: 90 nop
> 5a: 90 nop
> 5b: f6 c2 80 test $0x80,%dl
> 5e: 75 02 jne 0x62
> 60: b2 80 mov $0x80,%dl
> 62: ea 59 7c 00 00 ljmp $0x0,$0x7c59
> 67: 31 c0 xor %ax,%ax
> 69: 8e d8 mov %ax,%ds
> 6b: 8e d0 mov %ax,%ss
> 6d: bc 00 20 mov $0x2000,%sp
> 70: fb sti
> 71: a0 40 7c mov 0x7c40,%al
> 74: 3c ff cmp $0xff,%al
> 76: 74 02 je 0x7a
> 78: 88 c2 mov %al,%dl
> 7a: 52 push %dx
> 7b: f6 c2 80 test $0x80,%dl
> 7e: 74 54 je 0xd4
> 80: b4 41 mov $0x41,%ah
> 82: bb aa 55 mov $0x55aa,%bx
> 85: cd 13 int $0x13
> 87: 5a pop %dx
> 88: 52 push %dx
> 89: 72 49 jb 0xd4
> 8b: 81 fb 55 aa cmp $0xaa55,%bx
> 8f: 75 43 jne 0xd4
> 91: a0 41 7c mov 0x7c41,%al
> 94: 84 c0 test %al,%al
> 96: 75 05 jne 0x9d
> 98: 83 e1 01 and $0x1,%cx
> 9b: 74 37 je 0xd4
> 9d: 66 8b 4c 10 mov 0x10(%si),%ecx
> a1: be 05 7c mov $0x7c05,%si
> a4: c6 44 ff 01 movb $0x1,-0x1(%si)
> a8: 66 8b 1e 44 7c mov 0x7c44,%ebx
> ad: c7 04 10 00 movw $0x10,(%si)
> b1: c7 44 02 01 00 movw $0x1,0x2(%si)
> b6: 66 89 5c 08 mov %ebx,0x8(%si)
> ba: c7 44 06 00 70 movw $0x7000,0x6(%si)
> bf: 66 31 c0 xor %eax,%eax
> c2: 89 44 04 mov %ax,0x4(%si)
> c5: 66 89 44 0c mov %eax,0xc(%si)
> c9: b4 42 mov $0x42,%ah
> cb: cd 13 int $0x13
> cd: 72 05 jb 0xd4
> cf: bb 00 70 mov $0x7000,%bx
> d2: eb 7d jmp 0x151
> d4: b4 08 mov $0x8,%ah
> d6: cd 13 int $0x13
> d8: 73 0a jae 0xe4
> da: f6 c2 80 test $0x80,%dl
> dd: 0f 84 f0 00 je 0x1d1
> e1: e9 8d 00 jmp 0x171
> e4: be 05 7c mov $0x7c05,%si
> e7: c6 44 ff 00 movb $0x0,-0x1(%si)
> eb: 66 31 c0 xor %eax,%eax
> ee: 88 f0 mov %dh,%al
> f0: 40 inc %ax
> f1: 66 89 44 04 mov %eax,0x4(%si)
> f5: 31 d2 xor %dx,%dx
> f7: 88 ca mov %cl,%dl
> f9: c1 e2 02 shl $0x2,%dx
> fc: 88 e8 mov %ch,%al
> fe: 88 f4 mov %dh,%ah
> 100: 40 inc %ax
> 101: 89 44 08 mov %ax,0x8(%si)
> 104: 31 c0 xor %ax,%ax
> 106: 88 d0 mov %dl,%al
> 108: c0 e8 02 shr $0x2,%al
> 10b: 66 89 04 mov %eax,(%si)
> 10e: 66 a1 44 7c mov 0x7c44,%eax
> 112: 66 31 d2 xor %edx,%edx
> 115: 66 f7 34 divl (%si)
> 118: 88 54 0a mov %dl,0xa(%si)
> 11b: 66 31 d2 xor %edx,%edx
> 11e: 66 f7 74 04 divl 0x4(%si)
> 122: 88 54 0b mov %dl,0xb(%si)
> 125: 89 44 0c mov %ax,0xc(%si)
> 128: 3b 44 08 cmp 0x8(%si),%ax
> 12b: 7d 3c jge 0x169
> 12d: 8a 54 0d mov 0xd(%si),%dl
> 130: c0 e2 06 shl $0x6,%dl
> 133: 8a 4c 0a mov 0xa(%si),%cl
> 136: fe c1 inc %cl
> 138: 08 d1 or %dl,%cl
> 13a: 8a 6c 0c mov 0xc(%si),%ch
> 13d: 5a pop %dx
> 13e: 8a 74 0b mov 0xb(%si),%dh
> 141: bb 00 70 mov $0x7000,%bx
> 144: 8e c3 mov %bx,%es
> 146: 31 db xor %bx,%bx
> 148: b8 01 02 mov $0x201,%ax
> 14b: cd 13 int $0x13
> 14d: 72 2a jb 0x179
> 14f: 8c c3 mov %es,%bx
> 151: 8e 06 48 7c mov 0x7c48,%es
> 155: 60 pusha
> 156: 1e push %ds
> 157: b9 00 01 mov $0x100,%cx
> 15a: 8e db mov %bx,%ds
> 15c: 31 f6 xor %si,%si
> 15e: 31 ff xor %di,%di
> 160: fc cld
> 161: f3 a5 rep movsw %ds:(%si),%es:(%di)
> 163: 1f pop %ds
> 164: 61 popa
> 165: ff 26 42 7c jmp *0x7c42
> 169: be 7f 7d mov $0x7d7f,%si
> 16c: e8 40 00 call 0x1af
> 16f: eb 0e jmp 0x17f
> 171: be 84 7d mov $0x7d84,%si
> 174: e8 38 00 call 0x1af
> 177: eb 06 jmp 0x17f
> 179: be 8e 7d mov $0x7d8e,%si
> 17c: e8 30 00 call 0x1af
> 17f: be 93 7d mov $0x7d93,%si
> 182: e8 2a 00 call 0x1af
> 185: eb fe jmp 0x185
> 187: 47 inc %di
> 188: 52 push %dx
> 189: 55 push %bp
> 18a: 42 inc %dx
> 18b: 20 00 and %al,(%bx,%si)
> 18d: 47 inc %di
> 18e: 65 6f outsw %gs:(%si),(%dx)
> 190: 6d insw (%dx),%es:(%di)
> 191: 00 48 61 add %cl,0x61(%bx,%si)
> 194: 72 64 jb 0x1fa
> 196: 20 44 69 and %al,0x69(%si)
> 199: 73 6b jae 0x206
> 19b: 00 52 65 add %dl,0x65(%bp,%si)
> 19e: 61 popa
> 19f: 64 00 20 add %ah,%fs:(%bx,%si)
> 1a2: 45 inc %bp
> 1a3: 72 72 jb 0x217
> 1a5: 6f outsw %ds:(%si),(%dx)
> 1a6: 72 00 jb 0x1a8
> 1a8: bb 01 00 mov $0x1,%bx
> 1ab: b4 0e mov $0xe,%ah
> 1ad: cd 10 int $0x10
> 1af: ac lods %ds:(%si),%al
> 1b0: 3c 00 cmp $0x0,%al
> 1b2: 75 f4 jne 0x1a8
> 1b4: c3 ret
> ...
> !!! DATA (NOT CODE) BELOW THIS POINT (AFAIU) !!!
> 1c5: 00 dc add %bl,%ah
> 1c7: 3b dd cmp %bp,%bx
> 1c9: 3b 00 cmp (%bx,%si),%ax
> 1cb: 00 80 20 21 add %al,0x2120(%bx,%si)
> 1cf: 00 07 add %al,(%bx)
> 1d1: fe (bad)
> 1d2: ff (bad)
> 1d3: ff 00 incw (%bx,%si)
> 1d5: 08 00 or %al,(%bx,%si)
> 1d7: 00 00 add %al,(%bx,%si)
> 1d9: 00 00 add %al,(%bx,%si)
> 1db: 0a 00 or (%bx,%si),%al
> 1dd: fe (bad)
> 1de: ff (bad)
> 1df: ff 83 fe ff incw -0x2(%bp,%di)
> 1e3: ff 00 incw (%bx,%si)
> 1e5: 08 00 or %al,(%bx,%si)
> 1e7: 0a 00 or (%bx,%si),%al
> 1e9: 00 38 add %bh,(%bx,%si)
> 1eb: 04 00 add $0x0,%al
> 1ed: fe (bad)
> 1ee: ff (bad)
> 1ef: ff 82 fe ff incw -0x2(%bp,%si)
> 1f3: ff 00 incw (%bx,%si)
> 1f5: 08 38 or %bh,(%bx,%si)
> 1f7: 0e push %cs
> 1f8: 00 e8 add %ch,%al
> 1fa: 17 pop %ss
> ...
> 20b: 00 55 aa add %dl,-0x56(%di)
>
>
> # cat good_mbr.asm
> 0: eb 48 jmp 0x4a
> 2: 90 nop
> ...
> 3b: 00 00 add %al,(%bx,%si)
> 3d: 00 03 add %al,(%bp,%di)
> 3f: 02 ff add %bh,%bh
> 41: 00 00 add %al,(%bx,%si)
> 43: 80 01 00 addb $0x0,(%bx,%di)
> 46: 00 00 add %al,(%bx,%si)
> 48: 00 08 add %cl,(%bx,%si)
> 4a: fa cli
> 4b: eb 07 jmp 0x54
> 4d: f6 c2 80 test $0x80,%dl
> 50: 75 02 jne 0x54
> 52: b2 80 mov $0x80,%dl
> 54: ea 59 7c 00 00 ljmp $0x0,$0x7c59
> 59: 31 c0 xor %ax,%ax
> 5b: 8e d8 mov %ax,%ds
> 5d: 8e d0 mov %ax,%ss
> 5f: bc 00 20 mov $0x2000,%sp
> 62: fb sti
> 63: a0 40 7c mov 0x7c40,%al
> 66: 3c ff cmp $0xff,%al
> 68: 74 02 je 0x6c
> 6a: 88 c2 mov %al,%dl
> 6c: 52 push %dx
> 6d: f6 c2 80 test $0x80,%dl
> 70: 74 54 je 0xc6
> 72: b4 41 mov $0x41,%ah
> 74: bb aa 55 mov $0x55aa,%bx
> 77: cd 13 int $0x13
> 79: 5a pop %dx
> 7a: 52 push %dx
> 7b: 72 49 jb 0xc6
> 7d: 81 fb 55 aa cmp $0xaa55,%bx
> 81: 75 43 jne 0xc6
> 83: a0 41 7c mov 0x7c41,%al
> 86: 84 c0 test %al,%al
> 88: 75 05 jne 0x8f
> 8a: 83 e1 01 and $0x1,%cx
> 8d: 74 37 je 0xc6
> 8f: 66 8b 4c 10 mov 0x10(%si),%ecx
> 93: be 05 7c mov $0x7c05,%si
> 96: c6 44 ff 01 movb $0x1,-0x1(%si)
> 9a: 66 8b 1e 44 7c mov 0x7c44,%ebx
> 9f: c7 04 10 00 movw $0x10,(%si)
> a3: c7 44 02 01 00 movw $0x1,0x2(%si)
> a8: 66 89 5c 08 mov %ebx,0x8(%si)
> ac: c7 44 06 00 70 movw $0x7000,0x6(%si)
> b1: 66 31 c0 xor %eax,%eax
> b4: 89 44 04 mov %ax,0x4(%si)
> b7: 66 89 44 0c mov %eax,0xc(%si)
> bb: b4 42 mov $0x42,%ah
> bd: cd 13 int $0x13
> bf: 72 05 jb 0xc6
> c1: bb 00 70 mov $0x7000,%bx
> c4: eb 7d jmp 0x143
> c6: b4 08 mov $0x8,%ah
> c8: cd 13 int $0x13
> ca: 73 0a jae 0xd6
> cc: f6 c2 80 test $0x80,%dl
> cf: 0f 84 f0 00 je 0x1c3
> d3: e9 8d 00 jmp 0x163
> d6: be 05 7c mov $0x7c05,%si
> d9: c6 44 ff 00 movb $0x0,-0x1(%si)
> dd: 66 31 c0 xor %eax,%eax
> e0: 88 f0 mov %dh,%al
> e2: 40 inc %ax
> e3: 66 89 44 04 mov %eax,0x4(%si)
> e7: 31 d2 xor %dx,%dx
> e9: 88 ca mov %cl,%dl
> eb: c1 e2 02 shl $0x2,%dx
> ee: 88 e8 mov %ch,%al
> f0: 88 f4 mov %dh,%ah
> f2: 40 inc %ax
> f3: 89 44 08 mov %ax,0x8(%si)
> f6: 31 c0 xor %ax,%ax
> f8: 88 d0 mov %dl,%al
> fa: c0 e8 02 shr $0x2,%al
> fd: 66 89 04 mov %eax,(%si)
> 100: 66 a1 44 7c mov 0x7c44,%eax
> 104: 66 31 d2 xor %edx,%edx
> 107: 66 f7 34 divl (%si)
> 10a: 88 54 0a mov %dl,0xa(%si)
> 10d: 66 31 d2 xor %edx,%edx
> 110: 66 f7 74 04 divl 0x4(%si)
> 114: 88 54 0b mov %dl,0xb(%si)
> 117: 89 44 0c mov %ax,0xc(%si)
> 11a: 3b 44 08 cmp 0x8(%si),%ax
> 11d: 7d 3c jge 0x15b
> 11f: 8a 54 0d mov 0xd(%si),%dl
> 122: c0 e2 06 shl $0x6,%dl
> 125: 8a 4c 0a mov 0xa(%si),%cl
> 128: fe c1 inc %cl
> 12a: 08 d1 or %dl,%cl
> 12c: 8a 6c 0c mov 0xc(%si),%ch
> 12f: 5a pop %dx
> 130: 8a 74 0b mov 0xb(%si),%dh
> 133: bb 00 70 mov $0x7000,%bx
> 136: 8e c3 mov %bx,%es
> 138: 31 db xor %bx,%bx
> 13a: b8 01 02 mov $0x201,%ax
> 13d: cd 13 int $0x13
> 13f: 72 2a jb 0x16b
> 141: 8c c3 mov %es,%bx
> 143: 8e 06 48 7c mov 0x7c48,%es
> 147: 60 pusha
> 148: 1e push %ds
> 149: b9 00 01 mov $0x100,%cx
> 14c: 8e db mov %bx,%ds
> 14e: 31 f6 xor %si,%si
> 150: 31 ff xor %di,%di
> 152: fc cld
> 153: f3 a5 rep movsw %ds:(%si),%es:(%di)
> 155: 1f pop %ds
> 156: 61 popa
> 157: ff 26 42 7c jmp *0x7c42
> 15b: be 7f 7d mov $0x7d7f,%si
> 15e: e8 40 00 call 0x1a1
> 161: eb 0e jmp 0x171
> 163: be 84 7d mov $0x7d84,%si
> 166: e8 38 00 call 0x1a1
> 169: eb 06 jmp 0x171
> 16b: be 8e 7d mov $0x7d8e,%si
> 16e: e8 30 00 call 0x1a1
> 171: be 93 7d mov $0x7d93,%si
> 174: e8 2a 00 call 0x1a1
> 177: eb fe jmp 0x177
> 179: 47 inc %di
> 17a: 52 push %dx
> 17b: 55 push %bp
> 17c: 42 inc %dx
> 17d: 20 00 and %al,(%bx,%si)
> 17f: 47 inc %di
> 180: 65 6f outsw %gs:(%si),(%dx)
> 182: 6d insw (%dx),%es:(%di)
> 183: 00 48 61 add %cl,0x61(%bx,%si)
> 186: 72 64 jb 0x1ec
> 188: 20 44 69 and %al,0x69(%si)
> 18b: 73 6b jae 0x1f8
> 18d: 00 52 65 add %dl,0x65(%bp,%si)
> 190: 61 popa
> 191: 64 00 20 add %ah,%fs:(%bx,%si)
> 194: 45 inc %bp
> 195: 72 72 jb 0x209
> 197: 6f outsw %ds:(%si),(%dx)
> 198: 72 00 jb 0x19a
> 19a: bb 01 00 mov $0x1,%bx
> 19d: b4 0e mov $0xe,%ah
> 19f: cd 10 int $0x10
> 1a1: ac lods %ds:(%si),%al
> 1a2: 3c 00 cmp $0x0,%al
> 1a4: 75 f4 jne 0x19a
> 1a6: c3 ret
> ...
> !!! DATA (NOT CODE) BELOW THIS POINT (AFAIU) !!!
> 1bb: 00 00 add %al,(%bx,%si)
> 1bd: 00 24 add %ah,(%si)
> 1bf: 12 0f adc (%bx),%cl
> 1c1: 09 00 or %ax,(%bx,%si)
> 1c3: be bd 7d mov $0x7dbd,%si
> 1c6: 31 c0 xor %ax,%ax
> 1c8: cd 13 int $0x13
> 1ca: 46 inc %si
> 1cb: 8a 0c mov (%si),%cl
> 1cd: 80 f9 00 cmp $0x0,%cl
> 1d0: 75 0f jne 0x1e1
> 1d2: be da 7d mov $0x7dda,%si
> 1d5: e8 c9 ff call 0x1a1
> 1d8: eb 97 jmp 0x171
> 1da: 46 inc %si
> 1db: 6c insb (%dx),%es:(%di)
> 1dc: 6f outsw %ds:(%si),(%dx)
> 1dd: 70 70 jo 0x24f
> 1df: 79 00 jns 0x1e1
> 1e1: bb 00 70 mov $0x7000,%bx
> 1e4: b8 01 02 mov $0x201,%ax
> 1e7: b5 00 mov $0x0,%ch
> 1e9: b6 00 mov $0x0,%dh
> 1eb: cd 13 int $0x13
> 1ed: 72 d7 jb 0x1c6
> 1ef: b6 01 mov $0x1,%dh
> 1f1: b5 4f mov $0x4f,%ch
> 1f3: e9 e0 fe jmp 0xd6
> ...
> 1fe: 55 push %bp
> 1ff: aa stos %al,%es:(%di)
>
> Thanks for reading this far!! ;-)
>
> Regards.

Bob Willard

Jul 22, 2011, 12:17:17 PM

I don't see any strong evidence that points at the HD or its controller.
When things get flaky and you know for sure that you don't have any
malware doing mal to your ware, my first guess is over-temps and my
second guess is the PS. Over-temp can be caused by the build-up of dust
and dirt: dirty air filters and dirty heatsinks are common causes;
check for loose cables while you are doing your housecleaning.

PSs do deteriorate over time, and your PS may have been somewhat
undersized or cheap from day one, or you may have caused a marginal
overload by adding stuff.

My third guess would be flaky RAM. A one hour test is unimpressive; I
suggest running the current version of Memtest86+ overnight.

Guess #4: the MoBo. Lots of MBs have caps that get leaky over time.
You can visually examine the MB for caps that are swollen and, if you
have the skills, replace them (or replace the MB).

It is hot here, so I have to turn off my crystal ball now. Good luck.
--
Cheers, Bob

Noob

Jul 22, 2011, 12:35:01 PM
> Over-temp can be caused by the build-up of dust
> and dirt: dirty air filters and dirty heat sinks are common causes;
> check for loose cables while you are doing your housecleaning.

My PC case has a dust filter to prevent (most of the) dust from
entering the case. I clean the filter every month, and the case
2-3 times a year, including fans and heat sinks. Last time was
two months ago.

I really don't think the problem is over-heating, as the case
is clean, and the problem manifests right after a cold boot.
Temperature is 12-15C at night, around 18-20C at noon.

> PSs do deteriorate over time, and your PS may have been somewhat
> undersized or cheap from day one, or you have caused a marginal overload
> by adding stuff.

My power supply is CORSAIR CX 450W (which received great reviews).
It's now 2 years old. It is definitely not under-sized as I've
measured power draw at boot under 60W.

> My third guess would be flaky RAM. A one hour test is unimpressive; I
> suggest running the current version of Memtest86+ overnight.

Will do. The RAM is brand new, so it is a natural suspect.

> Guess #4: the MoBo. Lots of MBs have caps that get leaky over time.
> You can visually examine the MB for caps that are swollen and, if you
> have the skills, replace them (or replace the MB).

Will examine. MoBo is ASUS A8N-E from AUG 2005.

> It is hot here, so I have to turn off my crystal ball now. Good luck.

Regards.

Arno

Jul 22, 2011, 2:41:03 PM
In comp.sys.ibm.pc.hardware.storage Noob <ro...@127.0.0.1> wrote:
> [ Please note that I've added comp.sys.ibm.pc.hardware.storage
> to the list of newsgroups ]

> Noob wrote:

>> Something trashed my MBR. Now when I boot the PC,
>> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
>> SYSTEM DISK AND PRESS ENTER" (or something close).

> By MBR, I meant the first sector of my hard disk drive,
> i.e. the boot-strapping code, and the table of primary
> partitions.

>> My setup:
>> PATA 120-GB HDD on IDE0 master
>> PATA DVD reader on IDE1 master
>> No IDE slaves. No SATA drives. (SATA disabled in BIOS)
>>
>> I had used gparted to create three primary partitions
>> (all partitions are aligned to 1 MiB, even though
>> this is not a 4K-sector HDD)
>> partition 1 : 80.00 GiB (for WinXP)
>> partition 2 : 33.75 GiB (for Fedora 13)
>> partition 3 : 765 MiB (for swap)

> Short version of the rest of original message : the first sector
> had been shifted by 14 bytes. Weird, right?

Very much so, indeed. A 14 byte offset seems next to impossible.

> And the plot thickens. Later that same night, I copied the
> MBR a second time in the live CD environment; and this second
> time, the first 14 bytes of the MBR were ZERO, i.e.
> they had changed !!

> Even stranger, the next morning, I booted the PC,
> and the 14-byte offset had disappeared.

> I'm starting to think that this might be a hardware problem,
> as both Linux AND the BIOS seem to have had troubles getting
> the MBR consistently. Something else I haven't mentioned:
> when Windows resumes from hibernation, the whole system
> sometimes reboots (this started about 2/3 weeks ago), around
> the same time I changed the RAM from 2x512 to 2x1024.

This is typical for bad RAM. Unfortunately bad RAM can be
hard to detect. I once had to run memtest86+ for 3 days to
identify a bad module. Put in the old modules and try to
recreate the problem.

> I tested the RAM, no errors after one hour of checking.
> The S.M.A.R.T. counters for the HDD claim the drive is
> "healthy".

> However, considering that the drive is inserting random
> garbage around requested data, I'm wondering if this could
> be the drive's controller failing? Would this show up in
> a S.M.A.R.T. diagnostic?

No. SMART is just done on the disk itself. You may
find interface errors in the error log, but only
if there are checksums on the data/commands. With
PATA there are no checksums on the commands, and
checksums on the data only with ATA-66 and above.

> Or am I on the wrong track, and do you see something else
> that might be responsible?

Well, it could be a very obscure hardware problem.
But are you sure you are using your tools right?
I had things like this happen to me, only to realize
later that the problem was sitting in front of the keyboard.

That said, PATA is subject to both data and command
corruption with bad cables or not fully plugged connectors.
Even >= ATA-66 only has checksums on the data, not the
commands. If you have non-spec cables (>45cm, rounded),
the next test would be to replace them with spec cables
and try again.

Also, bad RAM can have arbitrary effects, from nothing
at all to very strange things such as those observed by you.

Arno

--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: ar...@wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

Paul

Jul 22, 2011, 2:52:48 PM
Noob wrote:
> [ Please note that I've added comp.sys.ibm.pc.hardware.storage
> to the list of newsgroups ]
>
> Short version of the rest of original message : the first sector
> had been shifted by 14 bytes. Weird, right?
>
> And the plot thickens. Later that same night, I copied the
> MBR a second time in the live CD environment; and this second
> time, the first 14 bytes of the MBR were ZERO, i.e.
> they had changed !!
>
> Even stranger, the next morning, I booted the PC,
> and the 14-byte offset had disappeared.
>
> I'm starting to think that this might be a hardware problem,
> as both Linux AND the BIOS seem to have had troubles getting
> the MBR consistently. Something else I haven't mentioned:
> when Windows resumes from hibernation, the whole system
> sometimes reboots (this started about 2/3 weeks ago), around
> the same time I changed the RAM from 2x512 to 2x1024.
>
> I tested the RAM, no errors after one hour of checking.
> The S.M.A.R.T. counters for the HDD claim the drive is
> "healthy".
>
> However, considering that the drive is inserting random
> garbage around requested data, I'm wondering if this could
> be the drive's controller failing? Would this show up in
> a S.M.A.R.T. diagnostic?
>
> Or am I on the wrong track, and do you see something else
> that might be responsible?
>
> Regards.
>

Why would hardware have such a failure mode ? Doesn't it
seem strange to you ?

This looks "programmatic".

It's unfortunate, that you can't effectively checksum the
BIOS contents. Portions of the BIOS are read only, but the
DMI and ESCD storage areas of the flash chip are not. If you
flash update the BIOS chip, boot your favorite OS just the
once, go back and make an archival copy of the BIOS chip,
the contents won't match. If you carefully examine sections
of the BIOS chip, you'll find most of the chip does not
vary, but sections set aside for hardware inventory information
storage, are changed. That's what I discovered while experimenting.
So generating a checksum or signature for the entire 512KB
chip, doesn't work as an effective mechanism for detecting
tampering or "bit rot".
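
One way around this, sketched in Python under loud assumptions: hash the image in fixed-size blocks and compare two archival copies block by block, so that the blocks which legitimately change (DMI, ESCD) can be identified and set aside while the rest is still checked. The 4 KiB block size and the function names are illustrative guesses and say nothing about the real layout of any flash chip; the two images below are synthetic stand-ins.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed granularity, not a real flash sector size


def block_digests(image, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per block of the image."""
    return [
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]


def changed_blocks(before, after, block_size=BLOCK_SIZE):
    """Return the indices of the blocks whose digests differ between two images."""
    if len(before) != len(after):
        raise ValueError("images must have the same length")
    pairs = zip(block_digests(before, block_size), block_digests(after, block_size))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


# Synthetic 512 KB images (hypothetical content): one byte flipped in the second.
before = bytes(512 * 1024)
mutable = bytearray(before)
mutable[0x7A000] ^= 0xFF
after = bytes(mutable)

for index in changed_blocks(before, after):
    print(f"block {index} (offset 0x{index * BLOCK_SIZE:05x}) changed")
```

Blocks that change on every boot could then be excluded from a stored list of reference digests, leaving a signature over the parts that should stay constant.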

Would a BIOS problem, carry over into an OS session ? I would
hope not. The BIOS uses an Extended INT 0x13 routine, for
accessing the disk during boot. At some point, the OS will
switch to using its own driver. Between the two of them,
how would you conclude which one is making the mess ?

Your RAM test idea was a good one. But again, what are the
odds a corruption would cause this particular problem ? It
would be much more likely that any software running on the
corrupt RAM would just crash, rather than do something "semi-sane".

*******

There are some games you can play with the two RAM sticks -

If you have a dual channel motherboard with four slots, and
you own two sticks of RAM, you can work through these setups:

1) Dual channel config, one stick per channel, for performance.

2) When strangeness is present in the system, switch over to using
a single stick.

3) Remove the stick and try your test cases again with the other
stick. The purpose of this, is to see if a "low memory location"
on one of the sticks, is causing the problem. *No* memory test
program, can test BIOS reserved memory locations. And the memory
test done by the BIOS itself, is pathetic (it missed a dead chip
on one of my motherboards). The solution to that, is clever
placement of the sticks.

4) Now, you can also use single channel mode to advantage.
Place both sticks on the same channel. One slot will then provide
the "low memory", the other stick the "high memory", as single
channel mode does no interleaving of memory locations.

5) Using the config in (4), swap the sticks with each other. Now
the stick that was providing the "high" locations, becomes the
"low" stick.

Using a few ideas like that, you can eventually get full test
coverage of both sticks. It takes two sticks minimum to get
that kind of coverage. (Memtest86+ will completely test the
"high" stick, when in two stick, single channel mode.)

This idea came to mind, when I had a RAM failure on my three slot
dual channel Nforce2 motherboard. The bad stick, when in
interleaved (dual channel, performance mode), prevented the
PC from starting. Once I started trying single channel
configurations, and swapped the sticks, then I could at least
run memtest86+ and have it tell me an entire memory chip
on one DIMM, was completely dead (Crucial Ballistix RAM).

*******

The disk controller, likes to deal in 512 byte (or recently 4KB)
chunks. Getting the hardware to offset things by 14, doesn't
really fit into the normal orientation of the operations.

If you run the disk manufacturer diagnostic, it will include
a test to check the cache RAM on the disk itself. You might
not get good test coverage otherwise (hard to say whether
SMART covers this in any depth on its own).

One problem with the manufacturer provided, downloadable
diagnostics, is they don't run on all hardware. I think
they don't run on my current PC for example. Which means,
I'll have to scramble to find a working test setup. It'll
mean moving the disks to another machine, for me.

Paul

Scott Lurndal

Jul 22, 2011, 3:19:02 PM

This has all the classic signatures of bad memory. How did you
test your memory? Has it ECC? Registered or unbuffered?

s

Ian Collins

Jul 22, 2011, 5:44:26 PM
On 07/23/11 02:25 AM, Noob wrote:
>
> I'm starting to think that this might be a hardware problem,
> as both Linux AND the BIOS seem to have had troubles getting
> the MBR consistently. Something else I haven't mentioned:
> when Windows resumes from hibernation, the whole system
> sometimes reboots (this started about 2/3 weeks ago), around
> the same time I changed the RAM from 2x512 to 2x1024.
>
> I tested the RAM, no errors after one hour of checking.
> The S.M.A.R.T. counters for the HDD claim the drive is
> "healthy".

How did you test the RAM, memtest86? The symptoms do point to a memory
problem.

If you have several GB of RAM it can take a long time to test. I'd
leave it to run overnight.

--
Ian Collins

Franc Zabkar

Jul 22, 2011, 8:05:47 PM
On Fri, 22 Jul 2011 16:25:28 +0200, Noob <ro...@127.0.0.1> put finger
to keyboard and composed:

>> Something trashed my MBR. Now when I boot the PC,
>> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
>> SYSTEM DISK AND PRESS ENTER" (or something close).
>
>By MBR, I meant the first sector of my hard disk drive,
>i.e. the boot-strapping code, and the table of primary
>partitions.
>

>> I had used gparted to create three primary partitions
>> (all partitions are aligned to 1 MiB, even though
>> this is not a 4K-sector HDD)
>> partition 1 : 80.00 GiB (for WinXP)
>> partition 2 : 33.75 GiB (for Fedora 13)
>> partition 3 : 765 MiB (for swap)
>
>Short version of the rest of original message : the first sector
>had been shifted by 14 bytes. Weird, right?

ISTM that you are confusing the real, "good" MBR (ie physical sector
0) with GRUB's "bad" MBR. The latter appears to be relocated to some
other LBA(s), not LBA 0.

I expect that the code in the "good" MBR (ie the "real" MBR at LBA 0)
is executed at bootup, after which control is transferred to code in
the "bad" MBR. I say this because those 64 bytes which are normally
assigned to the four 16-byte partition slots are instead occupied by
code and a text string ("Floppy"). There are also JMP instructions (eg
je 0x1c3, jb 0x1ec, jne 0x1e1) that point to locations within this
block.

1B0                                            24 12                $.
1C0  0F 09 00 BE BD 7D 31 C0-CD 13 46 8A 0C 80 F9 00  .....}1...F.....
1D0  75 0F BE DA 7D E8 C9 FF-EB 97 46 6C 6F 70 70 79  u...}.....Floppy
1E0  00 BB 00 70 B8 01 02 B5-00 B6 00 CD 13 72 D7 B6  ...p.........r..
1F0  01 B5 4F E9 E0 FE 00 00-00                       ..O......

As you can see, the unassembled code has INT 13 instructions.

See http://en.wikipedia.org/wiki/INT_13H

-u 1be 1d9
01BE 2412 AND AL,12
01C0 0F DB 0F
01C1 0900 OR [BX+SI],AX
01C3 BEBD7D MOV SI,7DBD
01C6 31C0 XOR AX,AX
01C8 CD13 INT 13

- INT 13 (AH=0) -> Reset Disk Drives

01CA 46 INC SI
01CB 8A0C MOV CL,[SI]
01CD 80F900 CMP CL,00
01D0 750F JNZ 01E1
01D2 BEDA7D MOV SI,7DDA
01D5 E8C9FF CALL 01A1
01D8 EB97 JMP 0171


-u 1e1 1f8
01E1 BB0070 MOV BX,7000
01E4 B80102 MOV AX,0201
01E7 B500 MOV CH,00
01E9 B600 MOV DH,00
01EB CD13 INT 13

The above code reads 1 sector from sector 0, track 0, head 0, drive 0
to a memory buffer at address 0x7000.

01ED 72D7 JB 01C6
01EF B601 MOV DH,01
01F1 B54F MOV CH,4F
01F3 E9E0FE JMP 00D6

Location 0xD6 contains an INT 13 opcode. Therefore, AFAICT, the above
code reads 1 sector from sector 0, track 79 (=0x4F), head 1, drive 0
to a buffer at address 0x7000.
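
As a cross-check on the jump targets in the listing above, the relative displacements can be resolved by hand. What follows is a minimal Python sketch and not part of the original post. The helper name `rel_target` is made up for illustration, and it assumes 16-bit real-mode code, where a relative jump or call lands at the address of the next instruction plus the sign-extended displacement.

```python
def rel_target(addr, length, disp, bits):
    """Resolve an x86 relative jump/call target in 16-bit code.

    addr   -- offset of the instruction itself
    length -- total instruction length in bytes
    disp   -- raw displacement read as an unsigned integer
    bits   -- displacement width (8 or 16)
    """
    if disp >= 1 << (bits - 1):   # sign-extend the displacement
        disp -= 1 << bits
    return (addr + length + disp) & 0xFFFF


# Instruction bytes taken from the dump quoted in the post.
print(hex(rel_target(0x1D0, 2, 0x0F, 8)))     # 75 0F     JNZ
print(hex(rel_target(0x1ED, 2, 0xD7, 8)))     # 72 D7     JB
print(hex(rel_target(0x1F3, 3, 0xFEE0, 16)))  # E9 E0 FE  JMP
```

The three results are 0x1e1, 0x1c6 and 0xd6, which agree with the targets shown in the disassembly.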

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.

Paul

Jul 22, 2011, 8:58:01 PM

So can someone translate this for me ?

WHY would a disk be set up this way, with what
*altruistic* objective ?

And WHAT software can we expect, to be mucking about in this way ?
The "rootkit of the month" club perhaps ? :-)

Paul

The Natural Philosopher

Jul 22, 2011, 9:23:53 PM

Nope.
A couple of corrupted bits on an address can easily shift everything 14
bytes.

It is in fact FD in hex, so a bus collision that sets the whole lot to
FF except the second-to-last bit, when it should be all 0's...

Franc Zabkar

Jul 22, 2011, 9:31:07 PM

On Fri, 22 Jul 2011 20:58:01 -0400, Paul <nos...@needed.com> put

In the days of Disk Drive Overlays, eg MaxBlast, the MBR would contain
a dummy partition table. This table would be set up for a 20MB drive,
with 17 sectors per track. The DDO code would call additional code
which hid within the first track, before sector 63. This code would
install itself in memory and provide the needed INT 13 extensions that
were missing from the BIOS. The actual MBR and a full sized partition
table would hide elsewhere within track 0. This is the MBR that the OS
would see.

In any case, one would have to ask, why is the OP's MBR code loading
sector 0, head 1, track 79? What code is in that particular sector?

Franc Zabkar

Jul 23, 2011, 3:34:28 AM

On Sat, 23 Jul 2011 10:05:47 +1000, Franc Zabkar
<fza...@iinternode.on.net> put finger to keyboard and composed:

>01ED 72D7 JB 01C6
>01EF B601 MOV DH,01
>01F1 B54F MOV CH,4F
>01F3 E9E0FE JMP 00D6
>
>Location 0xD6 contains an INT 13 opcode. Therefore, AFAICT, the above
>code reads 1 sector from sector 0, track 79 (=0x4F), head 1, drive 0
>to a buffer at address 0x7000.

Doh! There is an obvious reason for the existence of the "Floppy" text
string.

Drive 0 is the floppy drive (A:).

Track 79 is the last track of an 80-track diskette.

The drive would have two heads, 0 and 1.

Franc Zabkar

Jul 23, 2011, 5:02:21 AM

On Fri, 22 Jul 2011 20:58:01 -0400, Paul <nos...@needed.com> put
finger to keyboard and composed:

>So can someone translate this for me ?
>
>WHY would a disk be set up this way, with what
>*altruistic* objective ?
>
>And WHAT software can we expect, to be mucking about in this way ?
>The "rootkit of the month" club perhaps ? :-)
>
> Paul

Sorry, I just realised where I made my mistake.

The OP confused me by saying that one day his MBR was bad, whereas on
the next day it was good. He then posted two MBR dumps. I assumed that
the good dump was from his hard drive. Instead it appears to be a
standard MBR template for a floppy diskette. Once again my previous
experience with DDOs led me astray. Seagate's EZ-Drive, for example,
gives you a window of opportunity during the HDD boot process to
select the floppy drive as your boot device (using Ctrl-S, IIRC). This
enables the DDO to load itself into RAM (and enable INT 13 extensions)
before the floppy drive boots.

BTW, I found this article very informative:
http://thestarman.pcministry.com/asm/mbr/GRUB.htm

Noob

Jul 23, 2011, 2:51:00 PM

Scott Lurndal wrote:

I tested the RAM using memtest86+ v4.10 on the Fedora 15 live CD.
I let the test run for two hours. It completed the first pass in
35 minutes, while the second took much longer. Is that expected?

(It didn't report any error.)

Moreover, I don't have any strange behavior once the OS (either
Linux or Windows) has booted up, so I'm having a hard time buying
the "faulty RAM" theory.

The motherboard is an ASUS A8N-E (AMD socket 939).
The RAM is 2x1GB DDR1 201 MHz 3-4-4-8, no ECC, unbuffered
(Corsair Value Select in dual channel configuration)
memtest86+ reports 2209 MB/s bandwidth
(which is quite far from the theoretical 3200 MB/s)

Regards.

Noob

Jul 23, 2011, 3:25:06 PM

Franc Zabkar wrote:

> Noob wrote:
>
>>> Something trashed my MBR. Now when I boot the PC,
>>> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
>>> SYSTEM DISK AND PRESS ENTER" (or something close).
>>
>> By MBR, I meant the first sector of my hard disk drive,
>> i.e. the boot-strapping code, and the table of primary
>> partitions.
>>
>>> I had used gparted to create three primary partitions
>>> (all partitions are aligned to 1 MiB, even though
>>> this is not a 4K-sector HDD)
>>> partition 1 : 80.00 GiB (for WinXP)
>>> partition 2 : 33.75 GiB (for Fedora 13)
>>> partition 3 : 765 MiB (for swap)
>>
>> Short version of the rest of original message : the first sector
>> had been shifted by 14 bytes. Weird, right?
>
> ISTM that you are confusing the real, "good" MBR (ie physical sector
> 0) with GRUB's "bad" MBR. The latter appears to be relocated to some
> other LBA(s), not LBA 0.

I got a copy of the MBR by running
dd if=/dev/sda of=mbr.bin bs=512 count=2

This should get to LBA 0, right?

My partition table is

80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a
00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04

Gordon Burditt

Jul 23, 2011, 6:15:24 PM

>> Something trashed my MBR. Now when I boot the PC,
>> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
>> SYSTEM DISK AND PRESS ENTER" (or something close).
>
> By MBR, I meant the first sector of my hard disk drive,
> i.e. the boot-strapping code, and the table of primary
> partitions.
>
>> My setup:
>> PATA 120-GB HDD on IDE0 master
>> PATA DVD reader on IDE1 master
>> No IDE slaves. No SATA drives. (SATA disabled in BIOS)
>>
>> I had used gparted to create three primary partitions
>> (all partitions are aligned to 1 MiB, even though
>> this is not a 4K-sector HDD)
>> partition 1 : 80.00 GiB (for WinXP)
>> partition 2 : 33.75 GiB (for Fedora 13)
>> partition 3 : 765 MiB (for swap)

I think Linux has a version of fdisk(8) similar to the BSD version:
you can edit the type, start, and length of each partition. It
will also put back the standard boot code part of the MBR. You can
also list the whole partition setup and geometry. If you've got a
better tool that does this (gparted, parted), you might use that
instead, but you want something you can boot from CD and use to
repair a screwed up MBR on a hard disk.

Although it is possible to hunt for and find filesystems on disk,
it's possible for such a tool to pick up on old filesystems from
before re-arranging the partitions, and it's generally a lot faster
to just know the start and length of all of them.

I've been doing backups of systems by putting a set of compressed
dumps of filesystems on sets of DVDs. Along with the dumps, I put
some small individual files to make recovering a situation like
this easier:

- the output of fdisk(8) for use to put back the MBR. Actually
there are two forms of output, one that can be read back in and
one human-readable one with more info. Manually typing in this
info isn't that hard and there isn't that much of it. You could
back this up on one sheet of paper if you want; you probably don't
intentionally change the MBR very often.

- the output of disklabel(8) for each partition (*BSD only). This
is a subdivision of an MBR partition (a "slice" in BSD terms) into
multiple BSD partitions. It acts a lot like a sub-MBR, but it's a
different format.

- A copy of /etc/fstab. Since I like lots of filesystems it is useful to
know which filesystem gets mounted where. Yes, there's also a copy in
the root dump, but it's easier to reference with a file outside it.

- Copies of the output of "atacontrol list", "camcontrol devlist" and
"usbconfig list" to show the setup for ATA drives, SCSI drives, and USB
devices. These are probably FreeBSD only but I suspect Linux has
similar commands. This shows what is connected where, including models
of drives, to put back cables if needed.


> Short version of the rest of original message : the first sector
> had been shifted by 14 bytes. Weird, right?
>
> And the plot thickens. Later that same night, I copied the
> MBR a second time in the live CD environment; and this second
> time, the first 14 bytes of the MBR were ZERO, i.e.
> they had changed !!

I've had a problem of the MBR apparently getting trashed a couple
of times. Either that or some horrible mess in the root filesystem
where fsck is complaining about lots and lots of errors. Before
resorting to drastic action, I suggest powering down the system
(and for laptops, remove all batteries and disconnect charger) for
a while. Sometimes the problem just goes away, like a bad copy of
the MBR is sitting in the drive's cache and it goes away when the
power goes off.

> Even stranger, the next morning, I booted the PC,
> and the 14-byte offset had disappeared.
>
> I'm starting to think that this might be a hardware problem,
> as both Linux AND the BIOS seem to have had troubles getting
> the MBR consistently. Something else I haven't mentioned:
> when Windows resumes from hibernation, the whole system
> sometime reboots (this started about 2/3 weeks ago), around
> the same time I changed the RAM from 2x512 to 2x1024.

A recent RAM change warrants checking. I had problems on a system
other than my primary one and let memtest86+ run for a week or two.
I am suspicious of memory tests: way back in 1976 I had a rather
unreliable 8080 system, and it turned out that the memory test itself
was pretty much guaranteed to keep the dynamic RAM refreshed even if
the onboard refresh failed completely (the suspicion was that it was
failing intermittently due to a design flaw), which covered up the
problem. I don't know how memtest86+ does in this area.

> I tested the RAM, no errors after one hour of checking.
> The S.M.A.R.T. counters for the HDD claim the drive is
> "healthy".

S.M.A.R.T. can't test problems in CPU RAM and probably doesn't test
disk controller cache RAM.

> However, considering that the drive is inserting random
> garbage around requested data, I'm wondering if this could
> be the drive's controller failing? Would this show up in

I've had a system where a marginal power supply caused a disk
controller RAM buffer to drop bits occasionally, eventually scattering
bit errors all over the disk. But this is pre-IBM-PC hardware.

> a S.M.A.R.T. diagnostic?

My guess is no.



> Or am I on the wrong track, and do you see something else
> that might be responsible?

How often did you *power off* the system? Consider this possibility:
a power glitch (possibly related to a marginal power supply) leaves
a corrupted sector cached in the disk controller (the real one on the
platter is still OK, but you read the bad copy). By the time you power off,
the bad sector in cache might or might not have been written back
to disk. Power cycle, and the corrupted sector might just fix
itself!

Noob

Jul 24, 2011, 10:42:15 AM

Noob wrote:

> The S.M.A.R.T. counters for the HDD claim the drive is "healthy".

For the record (and in case someone spots something fishy)
here's the output of smartctl for the drive.

# smartctl -x /dev/sda
smartctl 5.40 2010-10-16 r3189 [x86_64-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model: Maxtor 6B120P0
Serial Number: B40Q5GFH
Firmware Version: BAH41B70
User Capacity: 122,942,324,736 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Sun Jul 24 16:39:07 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (1202) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 54) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   207   205   063    Pre-fail  Always   -           16291
  4 Start_Stop_Count        0x0032   251   251   000    Old_age   Always   -           5685
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always   -           0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline  -           0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always   -           0
  8 Seek_Time_Performance   0x0027   248   239   187    Pre-fail  Always   -           51306
  9 Power_On_Minutes        0x0032   233   233   000    Old_age   Always   -           500h+12m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   239   239   000    Old_age   Always   -           5690
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always   -           0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always   -           0
194 Temperature_Celsius     0x0032   032   253   000    Old_age   Always   -           35
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always   -           7425
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline  -           0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline  -           0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline  -           0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always   -           0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always   -           22
202 Data_Address_Mark_Errs  0x000a   253   252   000    Old_age   Always   -           0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always   -           1
204 Soft_ECC_Correction     0x000a   253   252   000    Old_age   Always   -           0
205 Thermal_Asperity_Rate   0x000a   253   252   000    Old_age   Always   -           0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always   -           0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always   -           0
209 Offline_Seek_Performnce 0x0024   241   241   000    Old_age   Offline  -           144
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always   -           0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always   -           0
212 Unknown_Attribute       0x0032   253   253   000    Old_age   Always   -           0

Read SMART Log Directory failed.

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: scsi error aborted command
Read GP Log Directory failed.

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Warning: device does not support SCT Commands
SATA Phy Event Counters (GP Log 0x11) not supported
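
For anyone who would rather check the attribute table mechanically than by eye, here is a rough Python sketch; it is an illustration, not something from the original post. It assumes the usual smartctl column order (ID, name, flag, value, worst, threshold, ...) and the usual reading that an attribute counts as failed when its normalized value is at or below a non-zero threshold. The function name and the sample rows are chosen for this example only.

```python
def attributes_at_or_below_threshold(table_text):
    """Return (name, value, thresh) for rows whose VALUE <= THRESH (THRESH > 0)."""
    flagged = []
    for line in table_text.splitlines():
        fields = line.split()
        # A data row has at least 10 fields and starts with a numeric ID;
        # this skips the header line and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            flagged.append((name, value, thresh))
    return flagged


# A few rows copied from the smartctl output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 207 205 063 Pre-fail Always - 16291
8 Seek_Time_Performance 0x0027 248 239 187 Pre-fail Always - 51306
194 Temperature_Celsius 0x0032 032 253 000 Old_age Always - 35
"""

print(attributes_at_or_below_threshold(SAMPLE))
```

For the sample rows this prints an empty list, i.e. none of them is at or below its threshold. That says nothing about the drive's cache RAM or controller, which is the point made elsewhere in the thread.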

Noob

Jul 24, 2011, 10:54:20 AM

Noob wrote:

> The motherboard is an ASUS A8N-E (AMD socket 939).
> The RAM is 2x1GB DDR1 201 MHz 3-4-4-8, no ECC, unbuffered
> (Corsair Value Select in dual channel configuration)
> memtest86+ reports 2209 MB/s bandwidth
> (which is quite far from the theoretical 3200 MB/s)

In fact, AFAIU, dual-channel DDR-400 should even reach 6400 MB/s.
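
The arithmetic behind those two figures can be written out as a small Python sketch. The assumptions are a 64-bit (8-byte) data bus per channel, two transfers per clock for DDR, and a nominal 200 MHz memory clock for DDR-400; the function name is invented for the example, and the result is a theoretical peak, not something a benchmark like memtest86+ would be expected to reach.

```python
def peak_bandwidth_mb_s(clock_mhz, channels=1, bus_bytes=8, transfers_per_clock=2):
    """Theoretical peak DDR bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_bytes * channels


print(peak_bandwidth_mb_s(200))              # single-channel DDR-400
print(peak_bandwidth_mb_s(200, channels=2))  # dual-channel DDR-400
```

This gives 3200 MB/s for one channel and 6400 MB/s for two, which is where both numbers quoted above come from.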

Arno

Jul 24, 2011, 11:29:39 AM

In comp.sys.ibm.pc.hardware.storage Noob <ro...@127.0.0.1> wrote:
[...]

> I tested the RAM using memtest86+ v4.10 on the Fedora 15 live CD.
> I let the test run for two hours. It completed the first pass in
> 35 minutes, while the second took much longer. Is that expected?

> (It didn't report any error.)

Passes should take the same time, unless your hardware is dying
(e.g. the CPU ECCing like crazy in first-level cache or the CPU
intermittently throttling because of overheating).
Also, 2 hours is far too short. Let it run for at least a
full day.

> Moreover, I don't have any strange behavior once the OS (either
> Linux or Windows) has booted up, so I'm having a hard time buying
> the "faulty RAM" theory.

Effects from faulty RAM can be highly dependent on where the faults are.

Arno

Franc Zabkar

Jul 24, 2011, 6:15:53 PM

On Sat, 23 Jul 2011 21:25:06 +0200, Noob <ro...@127.0.0.1> put finger
to keyboard and composed:

>I got a copy of the MBR by running
>dd if=/dev/sda of=mbr.bin bs=512 count=2
>
>This should get to LBA 0, right?

I'm not a Linux user, but that seems OK to me (it's listed in
Wikipedia as one of the examples). In fact I wondered how you were
capturing these data because often such an offset is the result of the
application adding its own header information. In such cases the file
size would be 1038 (= 2 x 512 + 14), rather than 1024 bytes.

>My partition table is
>
>80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a
>00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04
>00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00

Yes, I understood that from your original post. It's just that you
posted a "good" MBR dump that wasn't your own. That's what led me
astray. Sorry.
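
For readers who want to check the three partition entries quoted above against the sizes Noob gave, here is a small Python sketch that decodes them. It is an illustration only. It assumes the standard 16-byte MBR entry layout (status, CHS start, type, CHS end, 32-bit little-endian start LBA and sector count) and 512-byte sectors; the function and variable names are made up for the example.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors, as on this drive

# The three primary partition entries quoted in the thread.
ENTRIES_HEX = [
    "80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a",
    "00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04",
    "00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00",
]


def decode_entry(hex_text):
    """Decode one 16-byte MBR partition entry given as spaced hex."""
    raw = bytes.fromhex(hex_text)
    # status, CHS start (3 bytes), type, CHS end (3 bytes),
    # starting LBA and sector count (both 32-bit little-endian)
    status, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack(
        "<B3sB3sII", raw
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "lba_start": lba_start,
        "sectors": sectors,
        "size_mib": sectors * SECTOR_SIZE / 2**20,
    }


for text in ENTRIES_HEX:
    e = decode_entry(text)
    print(f"type 0x{e['type']:02x}  start LBA {e['lba_start']:>9}  "
          f"{e['size_mib']:>8.0f} MiB  bootable={e['bootable']}")
```

The sizes come out as 81920 MiB (80 GiB), 34560 MiB (33.75 GiB) and 765 MiB, with the first partition starting at LBA 2048 (1 MiB). That matches the layout described in the original post, so the table itself looks sane.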

Some people have suggested that there may be a problem with your
system RAM, or with the drive's own SDRAM cache.

I wonder if you could disable Linux's HDD cache when performing your
tests. Then you could test the drive's cache memory (8MB) by dd-ing
data in 16MB blocks. In fact it should be sufficient to dd the first
8MB of data twice in succession, and then compare the two results.
This is because part of the 8MB cache is occupied by the drive's
firmware. This would mean that the second read should flush the first
sectors from the cache.

AFAICS, the following commands should do it:

dd if=/dev/sda of=mbr_1.bin bs=512 count=32768
dd if=/dev/sda of=mbr_2.bin bs=512 count=32768

Maxtor Diamondmax 10 Product Manual:
http://www.seagate.com/staticfiles/maxtor/en_us/documentation/manuals/diamondmax_10_product_manual_pata.pdf
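
Comparing the two dumps suggested above can be done with cmp(1), but a sector-by-sector comparison says where they differ. The following Python sketch is one way to do that; it is not from the original post. The file names `mbr_1.bin` and `mbr_2.bin` are the ones used in the dd commands, the 512-byte sector size is an assumption, and the function name is made up.

```python
import sys


def differing_sectors(path_a, path_b, sector_size=512):
    """Return the indices of sectors whose contents differ between two dumps."""
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(sector_size)
            b = fb.read(sector_size)
            if not a and not b:      # both files exhausted
                break
            if a != b:               # also catches one file being shorter
                diffs.append(index)
            index += 1
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.  python compare_dumps.py mbr_1.bin mbr_2.bin
    bad = differing_sectors(sys.argv[1], sys.argv[2])
    print("identical" if not bad else f"differing sectors: {bad}")
```

An empty result only means the two reads agreed with each other. As discussed in the thread, the second read may be served from the OS cache unless that cache is bypassed.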

Scott Lurndal

Jul 24, 2011, 7:08:09 PM

Franc Zabkar <fza...@iinternode.on.net> writes:
>On Sat, 23 Jul 2011 21:25:06 +0200, Noob <ro...@127.0.0.1> put finger
>to keyboard and composed:
>
>>I got a copy of the MBR by running
>>dd if=/dev/sda of=mbr.bin bs=512 count=2
>>
>>This should get to LBA 0, right?

Yes. count=1 is sufficient for the MBR itself.


>I wonder if you could disable Linux's HDD cache when performing your
>tests. Then you could test the drive's cache memory (8MB) by dd-ing
>data in 16MB blocks. In fact it should be sufficient to dd the first
>8MB of data twice in succession, and then compare the two results.
>This is because part of the 8MB cache is occupied by the drive's
>firmware. This would mean that the second read should flush the first
>sectors from the cache.
>
>AFAICS, the following commands should do it:
>
> dd if=/dev/sda of=mbr_1.bin bs=512 count=32768
> dd if=/dev/sda of=mbr_2.bin bs=512 count=32768

Much, much more efficient to use bs=16M count=1 (GNU dd wants an uppercase M),
which will issue a single read versus 32,768 reads.

If the hard disk is an IDE or SATA drive, you can use the hdparm command to disable
the drive cache.

It is highly unlikely that this is a drive problem, however. Much more likely that
the scatter gather list entry given by the driver to the SATA/IDE controller is fubared
which implies a memory issue (although I can't figure a 14-byte offset from a single
bit error).

Another possibility is a corrupted driver or rootkit.

scott

Franc Zabkar

Jul 24, 2011, 7:46:38 PM

On 24 Jul 2011 23:08:09 GMT, sc...@slp53.sl.home (Scott Lurndal) put
finger to keyboard and composed:

>Franc Zabkar <fza...@iinternode.on.net> writes:
>>On Sat, 23 Jul 2011 21:25:06 +0200, Noob <ro...@127.0.0.1> put finger
>>to keyboard and composed:
>>
>>>I got a copy of the MBR by running
>>>dd if=/dev/sda of=mbr.bin bs=512 count=2
>>>
>>>This should get to LBA 0, right?
>
>Yes. count=1 is sufficient for the MBR itself.

I believe the OP specified count=2 so that he could account for the
14-byte offset.
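
With two sectors in the dump, the size of the shift can be measured by looking for the 55 AA boot signature, which normally sits at offset 510 of the first sector. Here is a minimal Python sketch of that idea. It is only an illustration: the names are invented, and the byte pair 55 AA could also occur by chance elsewhere in the second sector, so a hit is a hint and not proof.

```python
BOOT_SIGNATURE = b"\x55\xaa"
EXPECTED_OFFSET = 510  # position of the signature in an unshifted MBR


def signature_shift(dump):
    """Return how many bytes the boot signature is shifted, or None if absent."""
    pos = dump.find(BOOT_SIGNATURE, EXPECTED_OFFSET)
    if pos < 0:
        return None
    return pos - EXPECTED_OFFSET


# Synthetic demonstration: a blank sector with a signature, preceded by
# 14 bytes of filler, padded out to two sectors (1024 bytes).
sector = bytes(510) + BOOT_SIGNATURE
shifted = bytes(14) + sector + bytes(498)
print(signature_shift(shifted))
print(signature_shift(sector + bytes(512)))
```

The synthetic examples print 14 and 0. On a real dump one would pass in the bytes read from the file produced by dd, for example the contents of `mbr.bin`.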

>>I wonder if you could disable Linux's HDD cache when performing your
>>tests. Then you could test the drive's cache memory (8MB) by dd-ing
>>data in 16MB blocks. In fact it should be sufficient to dd the first
>>8MB of data twice in succession, and then compare the two results.
>>This is because part of the 8MB cache is occupied by the drive's
>>firmware. This would mean that the second read should flush the first
>>sectors from the cache.
>>
>>AFAICS, the following commands should do it:
>>
>> dd if=/dev/sda of=mbr_1.bin bs=512 count=32768
>> dd if=/dev/sda of=mbr_2.bin bs=512 count=32768
>
>Much much much much more efficient to use bs=16M count=1, which will issue a single read
>versus 32,768 reads.
>
>If the hard disk is an IDE or SATA drive, you can use the hdparm command to disable
>the drive cache.

I was thinking that the OP should disable the OS cache in system RAM
rather than the drive's onboard cache. However, I agree that your idea
has merit also.

>It is highly unlikely that this is a drive problem, however. Much more likely that
>the scatter gather list entry given by the driver to the SATA/IDE controller is fubared
>which implies a memory issue (although I can't figure a 14-byte offset from a single
>bit error).
>
>Another possibility is a corrupted driver or rootkit.

If this were the case, then if the OP were to boot from an optical
drive before attempting to boot from the HDD, he should see a good
MBR. If instead he boots first from the HDD, and the corruption or
rootkit takes effect by loading code into RAM, a subsequent soft
reboot (Ctrl-Alt-Del) may result in him seeing a bad MBR rather than
the real one at LBA 0. A hard reset should be OK, though.

>scott

Scott Lurndal

Jul 24, 2011, 10:19:10 PM

Franc Zabkar <fza...@iinternode.on.net> writes:


>I was thinking that the OP should disable the OS cache in system RAM
>rather than the drive's onboard cache. However, I agree that your idea
>has merit also.

With linux, specify "iflag=direct" on the dd(1) command to bypass the
OS cache.

However, given that the OS cache is always page-sized[*] and page-aligned,
I find it quite difficult to believe that something in the OS cache layer
is corrupting data in this fashion.

scott

[*] 4096 bytes on x86 and x86_64.

Paul

Jul 24, 2011, 10:38:19 PM

Looks like a job for some kind of bootable disk editor (one
which is sans OS). I wonder what's available.

Paul

Franc Zabkar

Jul 25, 2011, 5:34:30 AM

On 25 Jul 2011 02:19:10 GMT, sc...@slp53.sl.home (Scott Lurndal) put
finger to keyboard and composed:

>Franc Zabkar <fza...@iinternode.on.net> writes:

I'm suggesting that the OS cache needs to be bypassed so that we can
be certain that the data are coming from the drive on the second pass
with dd. If the OS cache were enabled, then we would just be comparing
the OS cache against itself.

Franc Zabkar

Jul 25, 2011, 5:38:12 AM

On Sun, 24 Jul 2011 22:38:19 -0400, Paul <nos...@needed.com> put
finger to keyboard and composed:

>Looks like a job for some kind of bootable disk editor (one
>which is sans OS). I wonder what's available.

The following disk editor has a DOS version. I would think it should
be OK as long as you don't load smartdrv.

DMDE - DM Disk Editor and Data Recovery Software:
http://softdm.com/download.html

Noob

Jul 27, 2011, 5:36:59 AM

Arno wrote:

> In comp.sys.ibm.pc.hardware.storage Noob wrote:
>
>> I tested the RAM using memtest86+ v4.10 on the Fedora 15 live CD.
>> I let the test run for two hours. It completed the first pass in
>> 35 minutes, while the second took much longer. Is that expected?
>
>> (It didn't report any error.)
>
> Passes should take the same time, unless your hardware is dying
> (e.g. the CPU ECCing like crazy in first-level cache or the CPU
> intermittently throttling because of overheating).
> Also, 2 hours is far too short. Let it at least run a
> full day.

I ran memtest86+ for 20 hours.

No errors were reported (over 18 passes).

I would think that the problem (the system being unable
to read the first sector of the HDD) was a freak accident,
like cosmic rays, or radioactive spiders, or something
equally rare; except that it happened TWICE in a week.

However, it has not happened since.

What am I doing differently?
I've stopped using Windows' "hibernate" feature.
I haven't played any games in a while.
(Nothing else relevant comes to mind.)

>> Moreover, I don't have any strange behavior once the OS (either
>> Linux or Windows) has booted up, so I'm having a hard time buying
>> the "faulty RAM" theory.
>

> Effects from faulty RAM can be highly dependent on where the faults are.

If the problem occurs again, would I gain any insight
by running memtest after swapping the (two) DIMMs?

I could also put the system in single channel mode,
which would prevent interleaving, right?

Regards.

The Natural Philosopher

Jul 27, 2011, 7:13:32 AM

Noob wrote:
> Arno wrote:
>
>> In comp.sys.ibm.pc.hardware.storage Noob wrote:
>>
>>> I tested the RAM using memtest86+ v4.10 on the Fedora 15 live CD.
>>> I let the test run for two hours. It completed the first pass in
>>> 35 minutes, while the second took much longer. Is that expected?
>>> (It didn't report any error.)
>> Passes should take the same time, unless your hardware is dying
>> (e.g. the CPU ECCing like crazy in first-level cache or the CPU
>> intermittently throttling because of overheating).
>> Also, 2 hours is far too short. Let it at least run a
>> full day.
>
> I ran memtest86+ for 20 hours.
>
> No errors were reported (over 18 passes).
>
> I would think that the problem (the system being unable
> to read the first sector of the HDD) was a freak accident,
> like cosmic rays, or radioactive spiders, or something
> equally rare; except that it happened TWICE in a week.
>

Running memtest won't show a bus error caused by (in Intel architecture)
a lazy decode of an I/O chip, since memtest won't be executing any I/O
instructions.

Things that might cause such a bus collision include:

overheating
a failing peripheral chip
a bad peripheral card

The problem is that it's almost impossible to reproduce by running a
software test.

Noob

Jul 28, 2011, 4:43:37 AM

Scott Lurndal wrote:

> Noob wrote:
>
>> I've stopped using Windows' "hibernate" feature.
>

> Now this seems to be a likely candidate. I could easily imagine
> windows modifying the MBR to short-circuit the recovery from
> hibernation.

I don't think Windows XP modifies the MBR because I always
get the GRUB2 menu when I boot up, whether I halt XP or send
it to hibernation (aka suspend to disk).

Moreover, as far as I could tell, the MBR had not been
modified, it had been shifted by 14 (?!) bytes...

> I assume this is a dual boot system and you're using grub
> to boot both linux and windows?

Yes I have Windows XP, Fedora 13 (must upgrade), and a swap
partition, all managed by GRUB2.

Regards.

Jonathan de Boyne Pollard

Jul 28, 2011, 7:49:50 PM

> Now this seems to be a likely candidate.
>
Not really.

> I could easily imagine windows modifying the MBR to short-circuit the
> recovery from hibernation.
>

Imagination doesn't create truth. The hibernation resumption is simply
an alternative to the normal kernel loader, winresume.exe instead of
winload.exe. It is invoked by the Microsoft Boot Manager, that is
itself run long after the MBR has been involved in the bootstrap
process, at the point where it would normally invoke the kernel loader.
Hibernation resumption is not very different to an ordinary BCD entry,
in fact.

It is, as usual, Linux not Windows that has its dirty little fingers
into M. Noob's MBR. Xe has GRUB there.

Jonathan de Boyne Pollard

Jul 28, 2011, 7:49:56 PM

> I used a Fedora 15 live CD to boot to Linux, and examine the MBR.
>
And yet this:

> # cat broken_mbr.dump
>
is not a command to do that. So what command did you *actually* run to
obtain the contents of your MBR?

Jonathan de Boyne Pollard

Jul 28, 2011, 8:10:52 PM

> I say this because those 64 bytes which are normally assigned to the
> four 16-byte partition slots are instead occupied by code and a text
> string ("Floppy"). There are also JMP instructions (eg je 0x1c3, jb
> 0x1ec, jne 0x1e1) that point to locations within this block.
>
You are being led down the garden path by a GRUB2 programming trick.
See the GRUB2 source for details. There's a 3 line comment that
explains exactly what this is and why it's in the same place as the MBR
primary partition table entries.

Jonathan de Boyne Pollard

Jul 28, 2011, 8:37:15 PM

> So can someone translate this for me ?
>
You can translate it for yourself. Indeed you can read the untranslated
original and save yourself even that effort. GRUB2 is free software.
Its source code, complete with comments explaining what it is doing, is
there for the reading. The file to read is grub-core/boot/i386/pc/boot.S .

> WHY would a disk be set up this way, with what *altruistic* objective ?
>

Hint: The clue to its quite benign objective is the very name of the
routine.

> And WHAT software can we expect, to be mucking about in this way ?
> The "rootkit of the month" club perhaps ? :-)
>

One may think of GRUB2 in those terms if one likes. It certainly (in
this case) is installed in the place where MBR computer viruses live.

Jonathan de Boyne Pollard

Jul 28, 2011, 8:18:25 PM
> In any case, one would have to ask, why is the OP's MBR code loading sector 0, head 1, track 79? What code is in that particular sector?

M. Noob's MBR code is *not* doing that. Again, go and read the GRUB2
source. Read the floppy_probe routine in grub-core/boot/i386/pc/boot.S.
Even the name alone is a bit of a giveaway as to what the routine's
purpose is.

Noob

Aug 1, 2011, 5:27:26 AM
Jonathan de Boyne Pollard wrote:

# dd if=/dev/sda of=mbr.bin bs=512 count=2
# hexdump -C mbr.bin >broken_mbr.dump

Was there another (better) way to do this?
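
The two-sector image produced by the dd command above is enough to
measure the displacement directly. What follows is only a minimal
sketch in Python, assuming the image was saved as mbr.bin as in the
dd command; the function name signature_shift is made up for
illustration and is not part of any tool mentioned in this thread.

```python
from pathlib import Path
from typing import Optional

SIGNATURE = b"\x55\xaa"   # boot signature bytes in on-disk order
EXPECTED_OFFSET = 0x1FE   # where the signature sits in an intact MBR


def signature_shift(image: bytes) -> Optional[int]:
    """Return how far the boot signature is displaced from 0x1fe.

    The search starts at 0x1fe because the byte pair 55 aa can also
    occur inside the boot code itself. Returns None when no signature
    is found at or after the expected offset.
    """
    pos = image.find(SIGNATURE, EXPECTED_OFFSET)
    return None if pos < 0 else pos - EXPECTED_OFFSET


if __name__ == "__main__":
    # Assumed file name, taken from: dd if=/dev/sda of=mbr.bin bs=512 count=2
    path = Path("mbr.bin")
    if path.exists():
        print(signature_shift(path.read_bytes()))
```

With the signature at offset 0x20c, as in the broken dump, this
reports a shift of 14; an intact first sector reports 0.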

Regards.

Michael Press

Aug 4, 2011, 12:56:15 AM
In article <tB1Xp.52981$NY4....@news.usenetserver.com>,
sc...@slp53.sl.home (Scott Lurndal) wrote:

> It is highly unlikely that this is a drive problem, however. Much more likely that
> the scatter gather list entry given by the driver to the SATA/IDE controller is fubared
> which implies a memory issue (although I can't figure a 14-byte offset from a single
> bit error).

Has everything passed me by?
(Rhetorical question. Answer: yes)
"Fubar" used to be sufficient.

If you know what "mung" means, Patty at the
front desk of the assisted living facility
will award you a token for a free game of Bingo.

--
Michael Press

John Hasler

Aug 4, 2011, 9:15:30 AM
Scott Lurndal writes:
> ...fubared which implies a memory issue...

Michael Press writes:
> "Fubar" used to be sufficient.

People used to know what FUBAR meant.

> If you know what "mung" means...

No one who knows what "mung" means would use "issue" as a synonym for
"problem".

--
John Hasler
jha...@newsguy.com
Dancing Horse Hill
Elmwood, WI USA

Noob

Oct 27, 2011, 5:13:53 AM
EPILOGUE (top-posted, please delete history if replying)

Well, it was indeed a hardware problem: after weeks of correct
operation, and the occasional weirdness once in a while, port
IDE0 suddenly and completely failed, refusing to detect any
peripheral (HDD or optical drive) plugged into it.

I suppose I have three options:

1) change the MB (I'd need to buy used, since socket 939 MBs are rare now)
2) buy a SATA HDD, and just tape over IDE0
3) Configure HDD as master, DVD drive as slave on IDE1

I'm leaning towards solution 2, until I upgrade the whole system
(well MB+CPU+RAM at least; I have decent GPU, PS, and case)

Regards.

--

History below, provided for the record:

> Noob wrote:
>
>> Something trashed my MBR. Now when I boot the PC,
>> I get the dreaded "NON-SYSTEM DISK. PLEASE INSERT
>> SYSTEM DISK AND PRESS ENTER" (or something close).
>
> By MBR, I meant the first sector of my hard disk drive,
> i.e. the boot-strapping code, and the table of primary
> partitions.
>
>> My setup:
>> PATA 120-GB HDD on IDE0 master
>> PATA DVD reader on IDE1 master
>> No IDE slaves. No SATA drives. (SATA disabled in BIOS)
>>
>> I had used gparted to create three primary partitions
>> (all partitions are aligned to 1 MiB, even though
>> this is not a 4K-sector HDD)
>> partition 1 : 80.00 GiB (for WinXP)
>> partition 2 : 33.75 GiB (for Fedora 13)
>> partition 3 : 765 MiB (for swap)
>
> Short version of the rest of original message : the first sector
> had been shifted by 14 bytes. Weird, right?
>
> And the plot thickens. Later that same night, I copied the
> MBR a second time in the live CD environment; and this second
> time, the first 14 bytes of the MBR were ZERO, i.e.
> they had changed !!
>
> Even stranger, the next morning, I booted the PC,
> and the 14-byte offset had disappeared.
>
> I'm starting to think that this might be a hardware problem,
> as both Linux AND the BIOS seem to have had troubles getting
> the MBR consistently. Something else I haven't mentioned:
> when Windows resumes from hibernation, the whole system
> sometimes reboots (this started about 2/3 weeks ago), around
> the same time I changed the RAM from 2x512 to 2x1024.
>
> I tested the RAM, no errors after one hour of checking.
> The S.M.A.R.T. counters for the HDD claim the drive is
> "healthy".
>
> However, considering that the drive is inserting random
> garbage around requested data, I'm wondering if this could
> be the drive's controller failing? Would this show up in
> a S.M.A.R.T. diagnostic?
>
> Or am I on the wrong track, and do you see something else
> that might be responsible?
>
> Regards.
>
>
> [ Below is the rest of my original message, which I left in
> because of the belated cross-post to csiphs ]
>
>> For my own record, my partitions are encoded as follows.
>>
>> http://en.wikipedia.org/wiki/Master_boot_record
>> http://en.wikipedia.org/wiki/Partition_type
>>
>> 80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a
>> bootable, NTFS, start = 1 MiB, count = 80 GiB
>>
>> 00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04
>> non-bootable, linux, count = 33.75 GiB
>>
>> 00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00
>> non-bootable, swap, count = 765 MiB
>>
>> I used a Fedora 15 live CD to boot to Linux, and examine
>> the MBR. I looked for the MBR signature, and noticed
>> something very odd: the 0xAA55 signature was 14 bytes
>> "too far", i.e. at offset 0x20c instead of 0x1fe, which
>> means my broken MBR straddles sectors 0 and 1...
>>
>> # cat broken_mbr.dump
>> 00000000 00 00 00 00 00 00 41 01 63 74 e6 00 39 00 eb 48 |......A.ct..9..H|
>> 00000010 90 d0 bc 00 7c fb 50 07 50 1f fc be 1b 7c bf 1b |....|.P.P....|..|
>> 00000020 06 50 57 b9 e5 01 f3 a4 cb bd be 07 b1 04 38 6e |.PW...........8n|
>> 00000030 00 7c 09 75 13 83 c5 10 e2 f4 cd 18 8b f5 83 c6 |.|.u............|
>> 00000040 10 49 74 19 38 2c 74 f6 a0 b5 07 b4 03 02 80 00 |.It.8,t.........|
>> 00000050 00 80 00 e8 c4 0a 00 08 fa 90 90 f6 c2 80 75 02 |..............u.|
>> 00000060 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc 00 20 |...Y|..1....... |
>> 00000070 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80 74 54 |..@|<.t...R...tT|
>> 00000080 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55 aa 75 |.A..U..ZRrI..U.u|
>> 00000090 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66 8b 4c |C.A|..u....t7f.L|
>> 000000a0 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7 04 10 |...|.D..f..D|...|
>> 000000b0 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00 70 66 |..D...f.\..D..pf|
>> 000000c0 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72 05 bb |1..D.f.D..B..r..|
>> 000000d0 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f 84 f0 |.p.}....s.......|
>> 000000e0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0 88 f0 |......|.D..f1...|
>> 000000f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8 88 f4 |@f.D.1..........|
>> 00000100 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04 66 a1 |@.D.1......f..f.|
>> 00000110 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2 66 f7 |D|f1.f.4.T.f1.f.|
>> 00000120 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a 54 0d |t..T..D.;D.}<.T.|
>> 00000130 c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a 8a 74 |....L......l.Z.t|
>> 00000140 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72 2a 8c |...p..1......r*.|
>> 00000150 c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6 31 ff |...H|`......1.1.|
>> 00000160 fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40 00 eb |....a.&B|..}.@..|
>> 00000170 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30 00 be |...}.8.....}.0..|
>> 00000180 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47 65 6f |.}.*...GRUB .Geo|
>> 00000190 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65 61 64 |m.Hard Disk.Read|
>> 000001a0 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd 10 ac |. Error.........|
>> 000001b0 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00 00 00 |<.u.............|
>> 000001c0 00 00 00 00 00 00 dc 3b dd 3b 00 00 80 20 21 00 |.......;.;... !.|
>> 000001d0 07 fe ff ff 00 08 00 00 00 00 00 0a 00 fe ff ff |................|
>> 000001e0 83 fe ff ff 00 08 00 0a 00 00 38 04 00 fe ff ff |..........8.....|
>> 000001f0 82 fe ff ff 00 08 38 0e 00 e8 17 00 00 00 00 00 |......8.........|
>> 00000200 00 00 00 00 00 00 00 00 00 00 00 00 55 aa 00 00 |............U...|
>> 00000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>>
>> The partition table (64 bytes) at the end, right
>> before the last two-byte signature, is valid.
>>
>> For comparison, I examined /boot/grub/stage1
>>
>> # cat good_mbr.dump
>> 00000000 eb 48 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |.H..............|
>> 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>> *
>> 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 02 |................|
>> 00000040 ff 00 00 80 01 00 00 00 00 08 fa eb 07 f6 c2 80 |................|
>> 00000050 75 02 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc |u....Y|..1......|
>> 00000060 00 20 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80 |. ..@|<.t...R...|
>> 00000070 74 54 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55 |tT.A..U..ZRrI..U|
>> 00000080 aa 75 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66 |.uC.A|..u....t7f|
>> 00000090 8b 4c 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7 |.L...|.D..f..D|.|
>> 000000a0 04 10 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00 |....D...f.\..D..|
>> 000000b0 70 66 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72 |pf1..D.f.D..B..r|
>> 000000c0 05 bb 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f |...p.}....s.....|
>> 000000d0 84 f0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0 |........|.D..f1.|
>> 000000e0 88 f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8 |..@f.D.1........|
>> 000000f0 88 f4 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04 |..@.D.1......f..|
>> 00000100 66 a1 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2 |f.D|f1.f.4.T.f1.|
>> 00000110 66 f7 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a |f.t..T..D.;D.}<.|
>> 00000120 54 0d c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a |T.....L......l.Z|
>> 00000130 8a 74 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72 |.t...p..1......r|
>> 00000140 2a 8c c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6 |*....H|`......1.|
>> 00000150 31 ff fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40 |1.....a.&B|..}.@|
>> 00000160 00 eb 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30 |.....}.8.....}.0|
>> 00000170 00 be 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47 |...}.*...GRUB .G|
>> 00000180 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65 |eom.Hard Disk.Re|
>> 00000190 61 64 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd |ad. Error.......|
>> 000001a0 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00 |..<.u...........|
>> 000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 12 |..............$.|
>> 000001c0 0f 09 00 be bd 7d 31 c0 cd 13 46 8a 0c 80 f9 00 |.....}1...F.....|
>> 000001d0 75 0f be da 7d e8 c9 ff eb 97 46 6c 6f 70 70 79 |u...}.....Floppy|
>> 000001e0 00 bb 00 70 b8 01 02 b5 00 b6 00 cd 13 72 d7 b6 |...p.........r..|
>> 000001f0 01 b5 4f e9 e0 fe 00 00 00 00 00 00 00 00 55 aa |..O...........U.|
>>
>> And indeed, the "eb 48" is there in my broken MBR,
>> 14 bytes "too far". So it looks like I could just
>> use a binary editor to remove the first 14 bytes,
>> then write that back to my HDD's first sector?
>>
>> Not too sure about that, though.
>>
>> Does my broken MBR, shifted left by 14 bytes, look
>> like a valid MBR for grub?
>>
>> Thoughts? Suggestions? How should I proceed?
>>
>> For the record, here are the disassembly for the
>> broken MBR, and the good MBR (they do seem to differ
>> in several places, I'm wondering if this is because
>> I shouldn't be looking at /boot/grub/stage1)
>>
>> On a normal MBR, the code area ranges from 0 to 0x1b7
>> Shifted by 14 bytes, I expect range 14 to 0x1c5
>>
>> http://prefetch.net/blog/index.php/2006/09/09/digging-through-the-mbr/
>>
>> # cat broken_mbr.asm
>> 0: 00 00 add %al,(%bx,%si)
>> 2: 00 00 add %al,(%bx,%si)
>> 4: 00 00 add %al,(%bx,%si)
>> 6: 41 inc %cx
>> 7: 01 63 74 add %sp,0x74(%bp,%di)
>> a: e6 00 out %al,$0x0
>> c: 39 00 cmp %ax,(%bx,%si)
>> XXX e: eb 48 jmp 0x58
>> 10: 90 nop
>> 11: d0 bc 00 7c sarb 0x7c00(%si)
>> 15: fb sti
>> 16: 50 push %ax
>> 17: 07 pop %es
>> 18: 50 push %ax
>> 19: 1f pop %ds
>> 1a: fc cld
>> 1b: be 1b 7c mov $0x7c1b,%si
>> 1e: bf 1b 06 mov $0x61b,%di
>> 21: 50 push %ax
>> 22: 57 push %di
>> 23: b9 e5 01 mov $0x1e5,%cx
>> 26: f3 a4 rep movsb %ds:(%si),%es:(%di)
>> 28: cb lret
>> 29: bd be 07 mov $0x7be,%bp
>> 2c: b1 04 mov $0x4,%cl
>> 2e: 38 6e 00 cmp %ch,0x0(%bp)
>> 31: 7c 09 jl 0x3c
>> 33: 75 13 jne 0x48
>> 35: 83 c5 10 add $0x10,%bp
>> 38: e2 f4 loop 0x2e
>> 3a: cd 18 int $0x18
>> 3c: 8b f5 mov %bp,%si
>> 3e: 83 c6 10 add $0x10,%si
>> 41: 49 dec %cx
>> 42: 74 19 je 0x5d
>> 44: 38 2c cmp %ch,(%si)
>> 46: 74 f6 je 0x3e
>> 48: a0 b5 07 mov 0x7b5,%al
>> 4b: b4 03 mov $0x3,%ah
>> 4d: 02 80 00 00 add 0x0(%bx,%si),%al
>> 51: 80 00 e8 addb $0xe8,(%bx,%si)
>> 54: c4 0a les (%bp,%si),%cx
>> 56: 00 08 add %cl,(%bx,%si)
>> 58: fa cli
>> 59: 90 nop
>> 5a: 90 nop
>> 5b: f6 c2 80 test $0x80,%dl
>> 5e: 75 02 jne 0x62
>> 60: b2 80 mov $0x80,%dl
>> 62: ea 59 7c 00 00 ljmp $0x0,$0x7c59
>> 67: 31 c0 xor %ax,%ax
>> 69: 8e d8 mov %ax,%ds
>> 6b: 8e d0 mov %ax,%ss
>> 6d: bc 00 20 mov $0x2000,%sp
>> 70: fb sti
>> 71: a0 40 7c mov 0x7c40,%al
>> 74: 3c ff cmp $0xff,%al
>> 76: 74 02 je 0x7a
>> 78: 88 c2 mov %al,%dl
>> 7a: 52 push %dx
>> 7b: f6 c2 80 test $0x80,%dl
>> 7e: 74 54 je 0xd4
>> 80: b4 41 mov $0x41,%ah
>> 82: bb aa 55 mov $0x55aa,%bx
>> 85: cd 13 int $0x13
>> 87: 5a pop %dx
>> 88: 52 push %dx
>> 89: 72 49 jb 0xd4
>> 8b: 81 fb 55 aa cmp $0xaa55,%bx
>> 8f: 75 43 jne 0xd4
>> 91: a0 41 7c mov 0x7c41,%al
>> 94: 84 c0 test %al,%al
>> 96: 75 05 jne 0x9d
>> 98: 83 e1 01 and $0x1,%cx
>> 9b: 74 37 je 0xd4
>> 9d: 66 8b 4c 10 mov 0x10(%si),%ecx
>> a1: be 05 7c mov $0x7c05,%si
>> a4: c6 44 ff 01 movb $0x1,-0x1(%si)
>> a8: 66 8b 1e 44 7c mov 0x7c44,%ebx
>> ad: c7 04 10 00 movw $0x10,(%si)
>> b1: c7 44 02 01 00 movw $0x1,0x2(%si)
>> b6: 66 89 5c 08 mov %ebx,0x8(%si)
>> ba: c7 44 06 00 70 movw $0x7000,0x6(%si)
>> bf: 66 31 c0 xor %eax,%eax
>> c2: 89 44 04 mov %ax,0x4(%si)
>> c5: 66 89 44 0c mov %eax,0xc(%si)
>> c9: b4 42 mov $0x42,%ah
>> cb: cd 13 int $0x13
>> cd: 72 05 jb 0xd4
>> cf: bb 00 70 mov $0x7000,%bx
>> d2: eb 7d jmp 0x151
>> d4: b4 08 mov $0x8,%ah
>> d6: cd 13 int $0x13
>> d8: 73 0a jae 0xe4
>> da: f6 c2 80 test $0x80,%dl
>> dd: 0f 84 f0 00 je 0x1d1
>> e1: e9 8d 00 jmp 0x171
>> e4: be 05 7c mov $0x7c05,%si
>> e7: c6 44 ff 00 movb $0x0,-0x1(%si)
>> eb: 66 31 c0 xor %eax,%eax
>> ee: 88 f0 mov %dh,%al
>> f0: 40 inc %ax
>> f1: 66 89 44 04 mov %eax,0x4(%si)
>> f5: 31 d2 xor %dx,%dx
>> f7: 88 ca mov %cl,%dl
>> f9: c1 e2 02 shl $0x2,%dx
>> fc: 88 e8 mov %ch,%al
>> fe: 88 f4 mov %dh,%ah
>> 100: 40 inc %ax
>> 101: 89 44 08 mov %ax,0x8(%si)
>> 104: 31 c0 xor %ax,%ax
>> 106: 88 d0 mov %dl,%al
>> 108: c0 e8 02 shr $0x2,%al
>> 10b: 66 89 04 mov %eax,(%si)
>> 10e: 66 a1 44 7c mov 0x7c44,%eax
>> 112: 66 31 d2 xor %edx,%edx
>> 115: 66 f7 34 divl (%si)
>> 118: 88 54 0a mov %dl,0xa(%si)
>> 11b: 66 31 d2 xor %edx,%edx
>> 11e: 66 f7 74 04 divl 0x4(%si)
>> 122: 88 54 0b mov %dl,0xb(%si)
>> 125: 89 44 0c mov %ax,0xc(%si)
>> 128: 3b 44 08 cmp 0x8(%si),%ax
>> 12b: 7d 3c jge 0x169
>> 12d: 8a 54 0d mov 0xd(%si),%dl
>> 130: c0 e2 06 shl $0x6,%dl
>> 133: 8a 4c 0a mov 0xa(%si),%cl
>> 136: fe c1 inc %cl
>> 138: 08 d1 or %dl,%cl
>> 13a: 8a 6c 0c mov 0xc(%si),%ch
>> 13d: 5a pop %dx
>> 13e: 8a 74 0b mov 0xb(%si),%dh
>> 141: bb 00 70 mov $0x7000,%bx
>> 144: 8e c3 mov %bx,%es
>> 146: 31 db xor %bx,%bx
>> 148: b8 01 02 mov $0x201,%ax
>> 14b: cd 13 int $0x13
>> 14d: 72 2a jb 0x179
>> 14f: 8c c3 mov %es,%bx
>> 151: 8e 06 48 7c mov 0x7c48,%es
>> 155: 60 pusha
>> 156: 1e push %ds
>> 157: b9 00 01 mov $0x100,%cx
>> 15a: 8e db mov %bx,%ds
>> 15c: 31 f6 xor %si,%si
>> 15e: 31 ff xor %di,%di
>> 160: fc cld
>> 161: f3 a5 rep movsw %ds:(%si),%es:(%di)
>> 163: 1f pop %ds
>> 164: 61 popa
>> 165: ff 26 42 7c jmp *0x7c42
>> 169: be 7f 7d mov $0x7d7f,%si
>> 16c: e8 40 00 call 0x1af
>> 16f: eb 0e jmp 0x17f
>> 171: be 84 7d mov $0x7d84,%si
>> 174: e8 38 00 call 0x1af
>> 177: eb 06 jmp 0x17f
>> 179: be 8e 7d mov $0x7d8e,%si
>> 17c: e8 30 00 call 0x1af
>> 17f: be 93 7d mov $0x7d93,%si
>> 182: e8 2a 00 call 0x1af
>> 185: eb fe jmp 0x185
>> 187: 47 inc %di
>> 188: 52 push %dx
>> 189: 55 push %bp
>> 18a: 42 inc %dx
>> 18b: 20 00 and %al,(%bx,%si)
>> 18d: 47 inc %di
>> 18e: 65 6f outsw %gs:(%si),(%dx)
>> 190: 6d insw (%dx),%es:(%di)
>> 191: 00 48 61 add %cl,0x61(%bx,%si)
>> 194: 72 64 jb 0x1fa
>> 196: 20 44 69 and %al,0x69(%si)
>> 199: 73 6b jae 0x206
>> 19b: 00 52 65 add %dl,0x65(%bp,%si)
>> 19e: 61 popa
>> 19f: 64 00 20 add %ah,%fs:(%bx,%si)
>> 1a2: 45 inc %bp
>> 1a3: 72 72 jb 0x217
>> 1a5: 6f outsw %ds:(%si),(%dx)
>> 1a6: 72 00 jb 0x1a8
>> 1a8: bb 01 00 mov $0x1,%bx
>> 1ab: b4 0e mov $0xe,%ah
>> 1ad: cd 10 int $0x10
>> 1af: ac lods %ds:(%si),%al
>> 1b0: 3c 00 cmp $0x0,%al
>> 1b2: 75 f4 jne 0x1a8
>> 1b4: c3 ret
>> ...
>> !!! DATA (NOT CODE) BELOW THIS POINT (AFAIU) !!!
>> 1c5: 00 dc add %bl,%ah
>> 1c7: 3b dd cmp %bp,%bx
>> 1c9: 3b 00 cmp (%bx,%si),%ax
>> 1cb: 00 80 20 21 add %al,0x2120(%bx,%si)
>> 1cf: 00 07 add %al,(%bx)
>> 1d1: fe (bad)
>> 1d2: ff (bad)
>> 1d3: ff 00 incw (%bx,%si)
>> 1d5: 08 00 or %al,(%bx,%si)
>> 1d7: 00 00 add %al,(%bx,%si)
>> 1d9: 00 00 add %al,(%bx,%si)
>> 1db: 0a 00 or (%bx,%si),%al
>> 1dd: fe (bad)
>> 1de: ff (bad)
>> 1df: ff 83 fe ff incw -0x2(%bp,%di)
>> 1e3: ff 00 incw (%bx,%si)
>> 1e5: 08 00 or %al,(%bx,%si)
>> 1e7: 0a 00 or (%bx,%si),%al
>> 1e9: 00 38 add %bh,(%bx,%si)
>> 1eb: 04 00 add $0x0,%al
>> 1ed: fe (bad)
>> 1ee: ff (bad)
>> 1ef: ff 82 fe ff incw -0x2(%bp,%si)
>> 1f3: ff 00 incw (%bx,%si)
>> 1f5: 08 38 or %bh,(%bx,%si)
>> 1f7: 0e push %cs
>> 1f8: 00 e8 add %ch,%al
>> 1fa: 17 pop %ss
>> ...
>> 20b: 00 55 aa add %dl,-0x56(%di)
>>
>>
>> # cat good_mbr.asm
>> 0: eb 48 jmp 0x4a
>> 2: 90 nop
>> ...
>> 3b: 00 00 add %al,(%bx,%si)
>> 3d: 00 03 add %al,(%bp,%di)
>> 3f: 02 ff add %bh,%bh
>> 41: 00 00 add %al,(%bx,%si)
>> 43: 80 01 00 addb $0x0,(%bx,%di)
>> 46: 00 00 add %al,(%bx,%si)
>> 48: 00 08 add %cl,(%bx,%si)
>> 4a: fa cli
>> 4b: eb 07 jmp 0x54
>> 4d: f6 c2 80 test $0x80,%dl
>> 50: 75 02 jne 0x54
>> 52: b2 80 mov $0x80,%dl
>> 54: ea 59 7c 00 00 ljmp $0x0,$0x7c59
>> 59: 31 c0 xor %ax,%ax
>> 5b: 8e d8 mov %ax,%ds
>> 5d: 8e d0 mov %ax,%ss
>> 5f: bc 00 20 mov $0x2000,%sp
>> 62: fb sti
>> 63: a0 40 7c mov 0x7c40,%al
>> 66: 3c ff cmp $0xff,%al
>> 68: 74 02 je 0x6c
>> 6a: 88 c2 mov %al,%dl
>> 6c: 52 push %dx
>> 6d: f6 c2 80 test $0x80,%dl
>> 70: 74 54 je 0xc6
>> 72: b4 41 mov $0x41,%ah
>> 74: bb aa 55 mov $0x55aa,%bx
>> 77: cd 13 int $0x13
>> 79: 5a pop %dx
>> 7a: 52 push %dx
>> 7b: 72 49 jb 0xc6
>> 7d: 81 fb 55 aa cmp $0xaa55,%bx
>> 81: 75 43 jne 0xc6
>> 83: a0 41 7c mov 0x7c41,%al
>> 86: 84 c0 test %al,%al
>> 88: 75 05 jne 0x8f
>> 8a: 83 e1 01 and $0x1,%cx
>> 8d: 74 37 je 0xc6
>> 8f: 66 8b 4c 10 mov 0x10(%si),%ecx
>> 93: be 05 7c mov $0x7c05,%si
>> 96: c6 44 ff 01 movb $0x1,-0x1(%si)
>> 9a: 66 8b 1e 44 7c mov 0x7c44,%ebx
>> 9f: c7 04 10 00 movw $0x10,(%si)
>> a3: c7 44 02 01 00 movw $0x1,0x2(%si)
>> a8: 66 89 5c 08 mov %ebx,0x8(%si)
>> ac: c7 44 06 00 70 movw $0x7000,0x6(%si)
>> b1: 66 31 c0 xor %eax,%eax
>> b4: 89 44 04 mov %ax,0x4(%si)
>> b7: 66 89 44 0c mov %eax,0xc(%si)
>> bb: b4 42 mov $0x42,%ah
>> bd: cd 13 int $0x13
>> bf: 72 05 jb 0xc6
>> c1: bb 00 70 mov $0x7000,%bx
>> c4: eb 7d jmp 0x143
>> c6: b4 08 mov $0x8,%ah
>> c8: cd 13 int $0x13
>> ca: 73 0a jae 0xd6
>> cc: f6 c2 80 test $0x80,%dl
>> cf: 0f 84 f0 00 je 0x1c3
>> d3: e9 8d 00 jmp 0x163
>> d6: be 05 7c mov $0x7c05,%si
>> d9: c6 44 ff 00 movb $0x0,-0x1(%si)
>> dd: 66 31 c0 xor %eax,%eax
>> e0: 88 f0 mov %dh,%al
>> e2: 40 inc %ax
>> e3: 66 89 44 04 mov %eax,0x4(%si)
>> e7: 31 d2 xor %dx,%dx
>> e9: 88 ca mov %cl,%dl
>> eb: c1 e2 02 shl $0x2,%dx
>> ee: 88 e8 mov %ch,%al
>> f0: 88 f4 mov %dh,%ah
>> f2: 40 inc %ax
>> f3: 89 44 08 mov %ax,0x8(%si)
>> f6: 31 c0 xor %ax,%ax
>> f8: 88 d0 mov %dl,%al
>> fa: c0 e8 02 shr $0x2,%al
>> fd: 66 89 04 mov %eax,(%si)
>> 100: 66 a1 44 7c mov 0x7c44,%eax
>> 104: 66 31 d2 xor %edx,%edx
>> 107: 66 f7 34 divl (%si)
>> 10a: 88 54 0a mov %dl,0xa(%si)
>> 10d: 66 31 d2 xor %edx,%edx
>> 110: 66 f7 74 04 divl 0x4(%si)
>> 114: 88 54 0b mov %dl,0xb(%si)
>> 117: 89 44 0c mov %ax,0xc(%si)
>> 11a: 3b 44 08 cmp 0x8(%si),%ax
>> 11d: 7d 3c jge 0x15b
>> 11f: 8a 54 0d mov 0xd(%si),%dl
>> 122: c0 e2 06 shl $0x6,%dl
>> 125: 8a 4c 0a mov 0xa(%si),%cl
>> 128: fe c1 inc %cl
>> 12a: 08 d1 or %dl,%cl
>> 12c: 8a 6c 0c mov 0xc(%si),%ch
>> 12f: 5a pop %dx
>> 130: 8a 74 0b mov 0xb(%si),%dh
>> 133: bb 00 70 mov $0x7000,%bx
>> 136: 8e c3 mov %bx,%es
>> 138: 31 db xor %bx,%bx
>> 13a: b8 01 02 mov $0x201,%ax
>> 13d: cd 13 int $0x13
>> 13f: 72 2a jb 0x16b
>> 141: 8c c3 mov %es,%bx
>> 143: 8e 06 48 7c mov 0x7c48,%es
>> 147: 60 pusha
>> 148: 1e push %ds
>> 149: b9 00 01 mov $0x100,%cx
>> 14c: 8e db mov %bx,%ds
>> 14e: 31 f6 xor %si,%si
>> 150: 31 ff xor %di,%di
>> 152: fc cld
>> 153: f3 a5 rep movsw %ds:(%si),%es:(%di)
>> 155: 1f pop %ds
>> 156: 61 popa
>> 157: ff 26 42 7c jmp *0x7c42
>> 15b: be 7f 7d mov $0x7d7f,%si
>> 15e: e8 40 00 call 0x1a1
>> 161: eb 0e jmp 0x171
>> 163: be 84 7d mov $0x7d84,%si
>> 166: e8 38 00 call 0x1a1
>> 169: eb 06 jmp 0x171
>> 16b: be 8e 7d mov $0x7d8e,%si
>> 16e: e8 30 00 call 0x1a1
>> 171: be 93 7d mov $0x7d93,%si
>> 174: e8 2a 00 call 0x1a1
>> 177: eb fe jmp 0x177
>> 179: 47 inc %di
>> 17a: 52 push %dx
>> 17b: 55 push %bp
>> 17c: 42 inc %dx
>> 17d: 20 00 and %al,(%bx,%si)
>> 17f: 47 inc %di
>> 180: 65 6f outsw %gs:(%si),(%dx)
>> 182: 6d insw (%dx),%es:(%di)
>> 183: 00 48 61 add %cl,0x61(%bx,%si)
>> 186: 72 64 jb 0x1ec
>> 188: 20 44 69 and %al,0x69(%si)
>> 18b: 73 6b jae 0x1f8
>> 18d: 00 52 65 add %dl,0x65(%bp,%si)
>> 190: 61 popa
>> 191: 64 00 20 add %ah,%fs:(%bx,%si)
>> 194: 45 inc %bp
>> 195: 72 72 jb 0x209
>> 197: 6f outsw %ds:(%si),(%dx)
>> 198: 72 00 jb 0x19a
>> 19a: bb 01 00 mov $0x1,%bx
>> 19d: b4 0e mov $0xe,%ah
>> 19f: cd 10 int $0x10
>> 1a1: ac lods %ds:(%si),%al
>> 1a2: 3c 00 cmp $0x0,%al
>> 1a4: 75 f4 jne 0x19a
>> 1a6: c3 ret
>> ...
>> !!! DATA (NOT CODE) BELOW THIS POINT (AFAIU) !!!
>> 1bb: 00 00 add %al,(%bx,%si)
>> 1bd: 00 24 add %ah,(%si)
>> 1bf: 12 0f adc (%bx),%cl
>> 1c1: 09 00 or %ax,(%bx,%si)
>> 1c3: be bd 7d mov $0x7dbd,%si
>> 1c6: 31 c0 xor %ax,%ax
>> 1c8: cd 13 int $0x13
>> 1ca: 46 inc %si
>> 1cb: 8a 0c mov (%si),%cl
>> 1cd: 80 f9 00 cmp $0x0,%cl
>> 1d0: 75 0f jne 0x1e1
>> 1d2: be da 7d mov $0x7dda,%si
>> 1d5: e8 c9 ff call 0x1a1
>> 1d8: eb 97 jmp 0x171
>> 1da: 46 inc %si
>> 1db: 6c insb (%dx),%es:(%di)
>> 1dc: 6f outsw %ds:(%si),(%dx)
>> 1dd: 70 70 jo 0x24f
>> 1df: 79 00 jns 0x1e1
>> 1e1: bb 00 70 mov $0x7000,%bx
>> 1e4: b8 01 02 mov $0x201,%ax
>> 1e7: b5 00 mov $0x0,%ch
>> 1e9: b6 00 mov $0x0,%dh
>> 1eb: cd 13 int $0x13
>> 1ed: 72 d7 jb 0x1c6
>> 1ef: b6 01 mov $0x1,%dh
>> 1f1: b5 4f mov $0x4f,%ch
>> 1f3: e9 e0 fe jmp 0xd6
>> ...
>> 1fe: 55 push %bp
>> 1ff: aa stos %al,%es:(%di)
>>
>> Thanks for reading this far!! ;-)
>>
>> Regards.
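
The three 16-byte partition entries listed in the history above can
be decoded mechanically, which is a quick way to confirm that the
table itself survived intact. This is a sketch only: it assumes
512-byte sectors and the classic MBR entry layout (status byte, CHS
start, type byte, CHS end, then little-endian 32-bit start LBA and
sector count). The names decode_entry and ENTRIES are hypothetical.

```python
import struct

SECTOR_SIZE = 512  # assumed; matches the bs=512 used with dd in this thread

# The three primary partition entries, copied from the dump in the thread.
ENTRIES = [
    "80 20 21 00 07 fe ff ff 00 08 00 00 00 00 00 0a",
    "00 fe ff ff 83 fe ff ff 00 08 00 0a 00 00 38 04",
    "00 fe ff ff 82 fe ff ff 00 08 38 0e 00 e8 17 00",
]


def decode_entry(raw: bytes) -> dict:
    """Decode one 16-byte MBR primary partition entry."""
    # B = status, 3s = CHS of first sector, B = partition type,
    # 3s = CHS of last sector, I = start LBA, I = sector count.
    status, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack(
        "<B3sB3sII", raw
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "lba_start": lba_start,
        "sectors": sectors,
        "size_bytes": sectors * SECTOR_SIZE,
    }


if __name__ == "__main__":
    for text in ENTRIES:
        entry = decode_entry(bytes.fromhex(text))
        print(
            f"bootable={entry['bootable']} type=0x{entry['type']:02x} "
            f"start_lba={entry['lba_start']} "
            f"size_mib={entry['size_bytes'] / 2**20:g}"
        )
```

Decoded this way, the entries agree with the annotations in the
original post: a bootable type 0x07 partition starting at sector 2048
(1 MiB) and spanning 80 GiB, a type 0x83 partition of 33.75 GiB, and
a type 0x82 partition of 765 MiB, each starting where the previous
one ends.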

Arno

Oct 27, 2011, 11:24:17 AM
In comp.sys.ibm.pc.hardware.storage Noob <ro...@127.0.0.1> wrote:
> EPILOGUE (top-posted, please delete history if replying)

> Well, it was indeed a hardware problem: after weeks of correct
> operation, and the occasional weirdness once in a while, port
> IDE0 suddenly and completely failed, refusing to detect any
> peripheral (HDD or optical drive) plugged into it.

> I suppose I have three options:

> 1) change the MB (I'd need to buy used, since socket 939 MBs are rare now)
> 2) buy a SATA HDD, and just tape over IDE0
> 3) Configure HDD as master, DVD drive as slave on IDE1

> I'm leaning towards solution 2, until I upgrade the whole system
> (well MB+CPU+RAM at least; I have decent GPU, PS, and case)

You can go way 2, but it is a real possibility this is a chipset
problem and will affect the SATA port as well. I had one ASUS
board (they have moved to the "incompetent and stupid" model for
chipset cooling some time ago), that started with failing USB
then moved on to SATA port by port.

There is also option 4: Get an IDE controller.

And thanks for posting the info!

Noob

Oct 28, 2011, 5:06:06 AM
Arno wrote:

> Noob wrote:
>
>> EPILOGUE
>>
>> Well, it was indeed a hardware problem: after weeks of correct
>> operation, and the occasional weirdness once in a while, port
>> IDE0 suddenly and completely failed, refusing to detect any
>> peripheral (HDD or optical drive) plugged into it.
>>
>> I suppose I have three options:
>>
>> 1) change the MB (I'd need to buy used, since socket 939 MBs are rare now)
>> 2) buy a SATA HDD, and just tape over IDE0
>> 3) Configure HDD as master, DVD drive as slave on IDE1
>>
>> I'm leaning towards solution 2, until I upgrade the whole system
>> (well MB+CPU+RAM at least; I have decent GPU, PS, and case)
>
> You can go way 2, but it is a real possibility this is a chipset
> problem and will affect the SATA port as well.

Thanks for pointing that out. I hadn't thought to incriminate
the south bridge. I do hope it's still working fine, because
I've just bought 2 GB of DDR1, which will be useless once I finally
replace the MB.

> I had one ASUS board (they have moved to the "incompetent and
> stupid" model for chipset cooling some time ago), that started
> with failing USB then moved on to SATA port by port.

My MB is an Asus A8N-E. There was an issue with the chipset fan
on early models, and Asus changed the fan on later models.
However, this new chipset fan was still too noisy for my taste,
so I turned it off, and installed a 12-cm fan in the front of
the case to blow cold air (from outside the case) on the HDD
and the chipset. I monitored the chipset temperature for
several months, and everything seemed well. But maybe I should
have changed the chipset heatsink+fan with some custom solution,
such as a Zalman ZM-NB47J passive heatsink, with the cold air
from the chassis fan blowing over it...

> There is also option 4: Get an IDE controller.

Thanks, I hadn't thought of that either. Can the BIOS boot off
a HDD plugged into a PCI-based IDE controller?