AFAIK, you will need some sort of encoder/decoder that will convert a given
instruction opcode into printable characters. The hard part is to write the
decoder (or loader) in printable characters and then append the rest of the
encoded program to it, something like:
XY?/:01AALAS <- say this is the decoder that will decode the next chunks:
L7KAXKXAO01!?~0_0
.
.
.
<- this was your original program encoded.
After decoding, you jump to the original entry point.
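To make the encode/decode idea concrete, here is a minimal Python sketch. The nibble-to-letter scheme and the names `encode` and `decode` are made up for illustration and are not taken from any existing tool; a real loader would of course have to be written in printable machine code, not Python.

```python
# Hypothetical scheme: every byte becomes two letters 'A'..'P',
# one letter per 4-bit nibble, so the output is always printable.
BASE = ord("A")


def encode(program: bytes) -> str:
    """Turn arbitrary bytes into a string of letters 'A'..'P'."""
    return "".join(
        chr(BASE + (b >> 4)) + chr(BASE + (b & 0x0F)) for b in program
    )


def decode(text: str) -> bytes:
    """Inverse of encode(): rebuild each byte from a pair of letters."""
    out = bytearray()
    for i in range(0, len(text) - 1, 2):
        high = ord(text[i]) - BASE
        low = ord(text[i + 1]) - BASE
        out.append((high << 4) | low)
    return bytes(out)
```

The price of this simple scheme is that the encoded program is twice the size of the original.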
Some time ago I saw a program named 'com2txt' or 'comtotxt' that does that.
Regards,
Elias
"Uno" <nos...@no.spam> wrote in message
news:UO8xb.105905$hV.38...@news2.tin.it...
Refer to the Intel Architecture Manuals Volume 2 available on Intel's
website. It details how instructions are encoded, and Appendix A lists all
of the opcodes. Volume 3 of the x86-64 manuals also talks about encodings,
and the x86-64 encodings are quite similar to existing ones. The website
www.sandpile.org is also a good general reference for x86 opcodes.
-Matt
Funny you should ask! :-)
I happen to have written the ultimate executable ascii program:
The bootstrap loader gets away with the least amount of selfmodification
possible: A single two-byte instruction (a backwards branch).
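A plausible reason why a backwards branch in particular has to be patched in (an inference, not something stated above): a short branch stores its target as a signed 8-bit displacement, so every backwards displacement is a byte of 80h or above, outside 7-bit ASCII. A small Python sketch, with illustrative helper names:

```python
def rel8(displacement: int) -> int:
    """Encode a short-branch displacement as its two's-complement byte."""
    if not -128 <= displacement <= 127:
        raise ValueError("short branches only reach -128..+127")
    return displacement & 0xFF


def is_printable_ascii(byte: int) -> bool:
    """True for the printable 7-bit ASCII range, space through tilde."""
    return 0x20 <= byte <= 0x7E


# Every possible backwards displacement, encoded as a byte.
backward = [rel8(d) for d in range(-128, 0)]
all_backward_unprintable = not any(is_printable_ascii(b) for b in backward)
```

Forward displacements of 20h to 7Eh, by contrast, are ordinary printable characters.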
It uses much less than the regular ascii set, in fact it stays within
the 70+ characters defined in the MIME standard as being capable of
surviving all mail gateways. (This BTW is really hard, because it makes
stuff like POP BX, SI, DI or BP impossible.)
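The POP restriction can be checked against the opcode table. The sketch below assumes the safe set is the 73 characters usually quoted from the MIME RFCs (letters, digits and `'()+,-./:=?`); the exact set is an assumption here. The opcode bytes are the standard single-byte POP reg16 encodings.

```python
import string

# Assumed mail-safe set: letters, digits and eleven special characters.
MIME_SAFE = set(string.ascii_letters + string.digits + "'()+,-./:=?")

# Single-byte POP reg16 opcodes (58h + register number).
POP_OPCODES = {
    "POP AX": 0x58, "POP CX": 0x59, "POP DX": 0x5A, "POP BX": 0x5B,
    "POP SP": 0x5C, "POP BP": 0x5D, "POP SI": 0x5E, "POP DI": 0x5F,
}


def is_mime_safe(opcode: int) -> bool:
    """True if the opcode byte is a character in the assumed safe set."""
    return chr(opcode) in MIME_SAFE


usable = sorted(name for name, op in POP_OPCODES.items() if is_mime_safe(op))
blocked = sorted(name for name, op in POP_OPCODES.items() if not is_mime_safe(op))
```

Under that assumption only POP AX, CX and DX ('X', 'Y', 'Z') survive; the others map to punctuation such as '[' and '^'.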
It is more or less self-relocating, in that it will survive most forms
of reformatting:
You can replace the CRLF line terminators with a single CR (Mac) or LF
(Unix), or remove them completely (Word: Each paragraph becomes a single
line).
After the first two 64-character lines, which contain the level 1
bootstrap, the rest can be reformatted any way you like, including
inserting/removing any amount of whitespace.
The first-level bootstrap code (after self-modification) contains a
two-to-one character decoder, capable of generating any binary value
from a pair of characters in the MIME Base64 character set. It does this
by taking the first character and then subtracting the second char
_twice_, which effectively affords the needed shift.
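The pair arithmetic can be sketched in Python as follows. This assumes plain 8-bit wraparound and the standard 64-character Base64 alphabet; the names `decode_pair` and `find_pair` are made up here, and the real bootstrap may well differ in detail (under this simplified model a handful of values, such as 40, have no pair at all).

```python
import string

BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def decode_pair(first: str, second: str) -> int:
    """first - second - second, kept to 8 bits.

    Subtracting the second character twice is the same as subtracting
    it after a one-bit left shift.
    """
    return (ord(first) - 2 * ord(second)) & 0xFF


def find_pair(value: int):
    """Brute-force search for a pair that decodes to value, or None."""
    for first in BASE64_ALPHABET:
        for second in BASE64_ALPHABET:
            if decode_pair(first, second) == value:
                return first + second
    return None
```

For example, `decode_pair("a", "0")` is 97 - 2*48 = 1.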
The second-level bootstrap code generated by this process contains what
is probably the world's shortest MIME Base64 decoder (i.e. a 4-to-3
binary decoder).
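For reference, a 4-to-3 Base64 decoder looks roughly like this in Python. It is only a readable sketch of the standard algorithm, not a transcription of the second-level bootstrap; the name `base64_decode` is illustrative. It skips anything outside the alphabet, which matches the remark that whitespace may be inserted or removed freely.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base64_decode(text: str) -> bytes:
    """Decode Base64 text, ignoring whitespace, line breaks and '=' padding."""
    sextets = [INDEX[ch] for ch in text if ch in INDEX]
    out = bytearray()
    for i in range(0, len(sextets), 4):
        group = sextets[i:i + 4]
        n = len(group)
        if n < 2:
            break  # a lone trailing character carries no complete byte
        word = 0
        for s in group:
            word = (word << 6) | s      # pack 6 bits per character
        word <<= 6 * (4 - n)            # left-align a short final group
        out.extend(word.to_bytes(3, "big")[: n - 1])
    return bytes(out)
```

Four characters carry 4 x 6 = 24 bits, which is exactly three bytes; a final group of two or three characters yields one or two bytes.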
Finally, the resulting binary is either written to disk, or relocated
and executed in-place.
I'll include just the first level bootstrap, if I do anything more than
that I'll probably trigger some 'no binary posts' filter somewhere:
ZRYPQIQDYLRQRQRRAQX,2,NPPa,R0Gc,.0Gd,PPu.F2,QX=0+r+E=0=tG0-Ju E=
EE(-(-GNEEEEEEEEEEEEEEEF 5BBEEYQEEEE=DU.COM=======(c)TMathisen95
Terje
PS. I wrote this program over a ski weekend in the mountains, with no PC
access: I had brought along a list of possible opcodes with MIME ascii
encoding.
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Could you explain that in more detail? I'm interested in this topic too.
Thanks!
Rickey Bowers Jr.
> Terje, very talented solution!
>
Glad you liked it. :-)
I do feel that this is probably the best single hack I've ever written,
it was extremely satisfying to work out how to do it after first
deciding that it was impossible.
A few days later I suddenly discovered that even though all the POP reg
opcodes for memory-addressing registers were out of bounds, POPA is OK!
This does mean that the mime version of the startup code won't run on
the original 1981 PC, since the POPA opcode was introduced a year later,
with the 80186 cpu.
BTW, did you notice the way I'm using INC BP (ascii 'E') as a NOP?
One important element is the need to align all jump targets properly, so
as to fit within the MIME set. I also use a pair of E's as a buffer in
case the first CRLF pair is modified.
That CRLF pair is itself hidden within the immediate part of a
CMP AX,0A0Dh
opcode; if the CRLF pair is removed or shortened, the following pair of
E's will be included instead.
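The effect of the different line endings can be traced with a short Python sketch. It relies on CMP AX,imm16 being encoded as 3Dh ('=') followed by a little-endian 16-bit immediate; the helper name `decode_cmp` and the `variants` table are illustrative only.

```python
CMP_AX_IMM16 = 0x3D  # opcode byte, which is also ASCII '='
INC_BP = 0x45        # ASCII 'E', used as a NOP


def decode_cmp(stream: bytes):
    """Return (immediate, remaining bytes) for a CMP AX,imm16 at the start."""
    if len(stream) < 3 or stream[0] != CMP_AX_IMM16:
        raise ValueError("expected '=' followed by two immediate bytes")
    immediate = int.from_bytes(stream[1:3], "little")
    return immediate, stream[3:]


# The '=' that ends line 1, the line terminator, and the two E's of line 2.
variants = {
    "CRLF": b"=\r\nEE",
    "LF only": b"=\nEE",
    "CR only": b"=\rEE",
    "removed": b"=EE",
}
results = {name: decode_cmp(data) for name, data in variants.items()}
```

In every variant the CMP swallows exactly two bytes and whatever is left over consists only of E's, so the code that follows still starts on an instruction boundary.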
Terje