If you're just looking to use a disassembler, then objdump is one choice. The disassembler that comes with the nasm assembler is ndisasm. You can also run "debug.exe" in DOSBox on Linux, provided you can get hold of a copy of the program. It does disassembly as well as controlled execution, i.e. simulation of the CPU itself, which is also important even when doing disassembly, for reasons I'm about to describe.
This gets to the other sense of your query: "I want to make a disassembler". The source for ndisasm is available, and it handles many of the descendants of 8086, not just 8086, itself (which seriously clutters it, if all you want is an 8086 or even 80386 disassembler), but it is not self-contained and has a heavy dependency on the rest of the distribution.
Its main talking point is that it uses octal digits for the opcodes - which better fits the 80x86 - as I pointed out on the USENET in 1995 in comp.lang.asm ... and (in fact) nasm's creation was a direct response to that. So, it's potentially more transparent and you may want to keep the source handy as a check and comparison, if you're making your own disassembler.
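To see why octal fits, note that a ModRM byte is laid out as 2+3+3 bits (mod, reg, r/m), which are exactly the three octal digits of the byte. The following is a minimal sketch in Python; the names split_modrm and REG16 are made up for illustration and are not taken from nasm's source.

```python
# 16-bit general register names, indexed by a 3-bit register field.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0o3   # top octal digit (only 2 bits wide)
    reg = (byte >> 3) & 0o7   # middle octal digit
    rm = byte & 0o7           # low octal digit
    return mod, reg, rm

# The bytes 89 D8 encode "mov ax, bx": opcode 0x89 is MOV r/m16, r16.
modrm = 0xD8
print(oct(modrm))             # 0o330: the fields can be read off directly
mod, reg, rm = split_modrm(modrm)
if mod == 3:                  # mod == 3 means r/m names a register
    print(f"mov {REG16[rm]}, {REG16[reg]}")
```

In hexadecimal the same byte reads D8, which hides the field boundaries; in octal it reads 330, i.e. mod 3, reg 3 (bx), r/m 0 (ax).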
And then you've just disassembled a disassembler that also happens to do CPU emulation, like Fake86 does - but only for the 8086. You'll have to make the absolute addresses relative (using the original relocation table as a guide), to make it re-assemblable. Once you do that, you can work on the source. The opcode table is in clear view (if you display it as text) - both when seen in the packed and unpacked versions of debug.exe.
There's also DosDebug up on GitHub. It handles everything up to "80586" (or "Pentium") and "80686": it flags a generation "6" for some instructions; e.g. the conditional "cmov" operations are handled by it, as well as their "fcmov" floating point versions. DosDebug is in 8086 assembly and is best suited to being assembled with jwasm. You might be able to run nasm on it, I don't know. I never tried.
I might port the DAS disassembler to the x86, since items (a)-(f) are already incorporated into DAS's design. I've only ever ported it to the 8051, 6800, 6809 and 8080/8085 (and Z80) up to now; but the transition from 8085 to 8086 is relatively small. To that end, I might hack something out of Fake86. That's mostly abandonware, now, since the author replaced it by XTulator, as Fake86 was written when the programmer was relatively new to C. You might also be able to hack something directly out of DosDebug's opcode tables (their "instr.*" files).
As Jester correctly pointed out in a comment, you just need to use set architecture i8086 when using gdb so that it knows to assume 16-bit 8086 instruction format. You can learn about the gdb targets here.
Normally when you debug an ELF, PE or any other object file, gdb can infer the architecture from the file headers. When you debug a bootloader there is no object file to read, so you have to tell gdb the architecture yourself with set architecture (in the case of a bootloader, the architecture will be i8086).
Currently there seems to be an issue in gdb that causes it to choose the most "featureful compatible architecture" between the target's architecture (i386) and the user-provided architecture (i8086). Because gdb sees i386 as a proper superset of i8086, it uses i386 instead. Choosing i386 causes all operands to default to 32 bits (instead of 16), and this is what causes the disassembler errors.
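The effect of the wrong default operand size can be reproduced with a toy decoder. The sketch below is not gdb's disassembler: the function disassemble and the six-byte sample are hypothetical, and only three opcodes are understood (B8 = mov (e)ax, imm; CD = int imm8; 90 = nop). It only illustrates how a 32-bit immediate swallows the bytes of the following instruction.

```python
def disassemble(code, operand_bits):
    """Decode a tiny x86 subset, assuming the given default operand size."""
    imm_len = operand_bits // 8          # 2 bytes in 16-bit mode, 4 in 32-bit
    reg = "ax" if operand_bits == 16 else "eax"
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0xB8 and i + 1 + imm_len <= len(code):
            imm = int.from_bytes(code[i + 1:i + 1 + imm_len], "little")
            out.append(f"mov {reg}, {imm:#x}")
            i += 1 + imm_len
        elif op == 0xCD and i + 2 <= len(code):
            out.append(f"int {code[i + 1]:#x}")
            i += 2
        elif op == 0x90:
            out.append("nop")
            i += 1
        else:                            # anything else is shown as raw data
            out.append(f"db {op:#04x}")
            i += 1
    return out

# Typical boot-sector fragment: mov ax, 0x0e01 / int 0x10 / nop
boot = bytes([0xB8, 0x01, 0x0E, 0xCD, 0x10, 0x90])
print(disassemble(boot, 16))
print(disassemble(boot, 32))
```

Decoded as 16-bit code, the int 0x10 is visible; decoded as 32-bit code, its two bytes become the upper half of the mov immediate and the instruction disappears from the listing.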
This means that a source program can be translated to assembly language in many different ways, and machine language can be translated back to source in many different ways. As a result, it is quite common that compiling a file and immediately decompiling it may yield a vastly different source file from the one that was input. Decompilers are very language and library dependent. Processing a binary produced by a Delphi compiler with a decompiler designed to generate C code can yield very strange results. Similarly, feeding a compiled Windows binary through a decompiler that has no knowledge of the Windows programming API may not yield anything useful.
Many people already use what we call binary rewriting for profiling purposes. For example, DynInst and MAQAO do that to profile applications in order to locate bottlenecks in basic blocks. Now the question you'll probably be asking yourself is: how is it done? Simple. Most available disassemblers like objdump, objconv, IDA, etc. work in standalone mode and usually just print each instruction as it is disassembled, but others like udis86 and distorm offer an API to access the disassembled code in addition to being available in standalone mode. What DynInst, MAQAO, and most binary rewriting tools do is disassemble a binary file and insert probes wherever appropriate in the resulting data structure before reassembling the binary. Thus all necessary changes related to addresses, branches, context saving, and so on are handled properly before reassembly.
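The idea of patching a data structure rather than raw bytes can be sketched as follows. This is not how DynInst or MAQAO are implemented; Insn, layout and the "call probe" instruction are hypothetical names for a toy model in which branches point at instruction objects, so addresses are only computed when the program is laid out again.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Insn:
    text: str                          # mnemonic, for display only
    size: int                          # encoded length in bytes
    target: Optional["Insn"] = None    # branch destination, if any

def layout(insns, base=0):
    """Assign addresses and print branch targets as resolved addresses."""
    addrs, addr = {}, base
    for insn in insns:
        addrs[id(insn)] = addr
        addr += insn.size
    listing = []
    for insn in insns:
        text = insn.text
        if insn.target is not None:
            text += f" {addrs[id(insn.target)]:#x}"
        listing.append((addrs[id(insn)], text))
    return listing

work = Insn("add ax, bx", 2)
done = Insn("ret", 1)
prog = [Insn("jmp", 2, target=done), work, done]
before = layout(prog)

# Insert a 3-byte probe in front of 'work'; the jump still refers to the
# 'done' object, so its printed address is fixed up by the next layout.
prog.insert(1, Insn("call probe", 3))
after = layout(prog)
print(before)
print(after)
```

Because the jump stores a reference to its destination instead of a number, inserting the probe moves ret from 0x4 to 0x7 and the jump follows it without any special handling.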
What you must know is that it is extremely hard to write such tools. The first challenge is to write a reliable disassembler. This of course implies choosing a disassembly algorithm (linear sweep vs. recursive traversal), separating the instructions from the data (they can be mixed - shellcodes for example), and so on. Then comes the second challenge: patching the disassembled code. This is extremely tricky, and I'll point out this document, which should be of great help: _techreport.pdf. It was written by the author of the disassembler used in MAQAO (MADRAS - Multi Architecture Disassembler Rewriter and Assembler). The interesting parts of this document are the references (over 50 and extremely helpful) and the appendices, which describe the algorithms used.
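The difference between the two disassembly algorithms shows up as soon as data is mixed with code. Here is a minimal sketch on a four-opcode x86 subset (EB = jmp short, B8 = mov ax, imm16, 90 = nop, C3 = ret); the functions decode, linear_sweep and recursive_traversal and the sample bytes are illustrative assumptions, not code from any of the tools named above.

```python
def decode(code, pos):
    """Decode one instruction: (text, length, jump_target, falls_through)."""
    op = code[pos]
    if op == 0xEB:                                   # jmp short rel8
        rel = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
        target = pos + 2 + rel
        return f"jmp {target:#x}", 2, target, False
    if op == 0xB8:                                   # mov ax, imm16
        imm = int.from_bytes(code[pos + 1:pos + 3], "little")
        return f"mov ax, {imm:#x}", 3, None, True
    if op == 0x90:
        return "nop", 1, None, True
    if op == 0xC3:
        return "ret", 1, None, False
    return f"db {op:#04x}", 1, None, True

def linear_sweep(code):
    """Decode byte after byte from the start, ignoring control flow."""
    out, pos = {}, 0
    while pos < len(code):
        text, length, _, _ = decode(code, pos)
        out[pos] = text
        pos += length
    return out

def recursive_traversal(code, entry=0):
    """Decode only what is reachable by following control flow from entry."""
    out, work = {}, [entry]
    while work:
        pos = work.pop()
        while 0 <= pos < len(code) and pos not in out:
            text, length, target, falls_through = decode(code, pos)
            out[pos] = text
            if target is not None:
                work.append(target)
            if not falls_through:
                break
            pos += length
    return out

# jmp over one data byte (B8) that happens to look like a mov opcode
code = bytes([0xEB, 0x01, 0xB8, 0x90, 0x90, 0xC3])
print(linear_sweep(code))
print(recursive_traversal(code))
```

Linear sweep decodes the data byte as the start of mov ax, 0x9090 and loses both nops; recursive traversal follows the jump and never touches offset 2. Recursive traversal has its own blind spot, which this sketch does not model: indirect jumps and calls, whose targets are not visible in the instruction bytes.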
Some systems use a simple executable format which is linked by combining the instructions and defined data contained in the processed assembly files, in a specified sequence; the linker will apply address fix-ups, but otherwise the contents of the output executable will be exactly what the programmer specified: nothing more, nothing less. The old MS-DOS .COM file format was like that, and so it's possible for a disassembler to take a COM file and produce a file which, when assembled and linked, will yield a bit-identical file. The disassembly may have to use data directives for anything it doesn't understand, or for any instructions which might assemble to something other than the bit pattern that appears in the file, but any executable-file bytes could be created using define-byte directives if nothing else.
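The define-byte fallback is simple enough to sketch. In the Python below, to_db_listing and assemble_db_listing are hypothetical helpers: the first emits every byte of a COM image as a db directive (nasm-style syntax is assumed), and the second is a toy stand-in for an assembler that understands only those db lines, used here just to check the round trip.

```python
def to_db_listing(image, per_line=8):
    """Emit a COM image as define-byte directives, a few bytes per line."""
    lines = ["org 100h"]                 # COM programs are loaded at offset 100h
    for i in range(0, len(image), per_line):
        chunk = image[i:i + per_line]
        lines.append("db " + ", ".join(f"0x{b:02X}" for b in chunk))
    return "\n".join(lines)

def assemble_db_listing(listing):
    """Toy assembler: collects the bytes of db lines, ignores everything else."""
    out = bytearray()
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("db "):
            out.extend(int(tok.strip(), 16) for tok in line[3:].split(","))
    return bytes(out)

# mov ah, 4Ch / int 21h: the usual "exit to DOS" sequence
image = bytes([0xB4, 0x4C, 0xCD, 0x21])
listing = to_db_listing(image)
print(listing)
print(assemble_db_listing(listing) == image)
```

A real disassembler would replace as many db lines as it safely can with mnemonics, falling back to db wherever re-assembly might pick a different encoding.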
Other systems, however, use linkers which are more sophisticated and have to generate additional information which gets stored within the file. It's possible that building the same source file with different versions of the tools may yield different binary files. In general, when doing a disassemble-patch-reassemble sequence, one would strive to minimize the amount of code whose address changes. If code stores the address of a function somewhere but the disassembler doesn't realize it's a function address rather than a constant, the only way the code will work is if the function stays at the same address. Unfortunately, there's no way for an object file which is compatible with MS-DOS or Windows build tools to specify where things should be located. When the code was originally compiled or assembled, the linker would be free to put the function anywhere and would update the "constant" function address appropriately. When processing disassembled code, there's no way to ensure that the function stays at the old address, nor to ensure that all addresses which depend upon it will get updated if it moves.
Recently I realised that, as part of his 8086 reverse-engineering series, Ken Shirriff had posted online a high resolution photograph of the 8086 die with the metal layer removed. This was something I have been looking for for some time, in order to extract and disassemble the 8086 microcode. I had previously found very high resolution photos of the die with the metal layer intact, but only half of the bits of the microcode ROM were readable. Ken also posted a high resolution photograph of the microcode ROM of the 8088, which is very similar but not identical. I was very curious to know what the differences were.
The microcode is partially documented in US patent 4363091. In particular, that patent has source listings for several microcode routines. Within these, there are certain patterns of parts of instructions which I was able to find in the ROM dump. This allowed me to figure out how the bit patterns in the ROM correspond to the operands and opcodes of the microcode instruction set, in a manner similar to cracking a monoalphabetic substitution cipher. My resulting disassembly of the microcode ROM can be found here and the code for my disassembler is on github.
The differences are in the interrupt handling code. I think it comes down to the fact that the 8086 does two special bus accesses to acknowledge an interrupt (one to tell the PIC that it is ready to service the interrupt, the second to fetch the interrupt number for the IRQ that needs to be serviced). These are word-sized accesses for some reason, so the 8088 would break them into four accesses instead of two. This would confuse the PIC, so the 8088 does a single access instead and relies on the BIU to split the access into two. The other changes seem to be fallout related to that.
The version in the patent does a check for pending interrupts in the "RPTS" routine, before it processes any iterations of the string. This means that if there is a continuous "storm" of interrupts, the string instruction will make no progress. The version in the CPU corrects this, and checks for interrupts on line 3, after it has done the store, allowing it to progress. This was probably not a situation that was expected to occur in normal operation (in fact, I seem to recall crashing my 8088 and 8086 machines by having interrupts happen too rapidly to be serviced). The change was most likely done to accommodate debugging with the trap flag (which essentially means that there is always an interrupt pending when the trap flag is set). Without this change, code that used the repeated string instructions would not have progressed under the debugger.