Has anyone done with NASM and GNU/Linux what Iczelion has done for MASM
and the Win32 API through his tutorials (or with any assembler and
GNU/Linux)?
--
Regards,
Pop Tart
Here are a few self-explanatory graphics demos for Linux/NASM:
Thanks Herbert. What assembler is used for those examples - doesn't look
like nasm to me, but I'm not too familiar with nasm - only a little
; ********** connect socket to "/tmp/.X11-unix/X0" ******************
moveq.l sockaddr_un_l,-[sp]
move.l sockaddr_un,-[sp] ; (/usr/include/linux/un.h)
move.l [x_handle],-[sp] ; socket handle
move.l r7,r2 ; pointer to parameter for "connect"
move.l 3,r3 ; "connect" (/usr/include/linux/net.h)
move.l 102,r0 ; socketcall (/usr/include/asm/unistd.h)
trap $80
--
Regards,
Pop Tart
; nasm -O99 -f bin -o hla hla.asm
%include "mac.inc"
It is NASM but a little bit beautified by the macros in "mac.inc".
But I think Frank has the original code in pure NASM syntax.
The stuff in the lindela directory is for Herbert's own assembler, the
stuff in the nasm directory will assemble with Nasm. The "trick" is in
the %include "mac.inc". Doesn't look like nasm, but it is...
You can get something that looks more like nasm out of it by using the
"-e" (preprocess only) switch... but that may not be where you want to
"_start"...
The famous Iczelion tutorials address the graphics API in Windows. All
part of the Windows API, pretty much. In Linux, the "graphics API" is
provided by the Xwindows server - another "app". The "almost normal" way
(the "normal" way would be to use C) to access the X server would be to
call Xlib - or perhaps "helper libs" which call Xlib for us. These
examples communicate with the Xserver by opening a socket and
writing/reading to/from it.
It's also "normal" to use a linker - "nasm -f elf myfile.asm", "ld -o
myfile myfile.o". These examples include the executable header in the
source - "nasm -f bin myfile.asm" (you also have to "chmod +x myfile" -
Nasm's too "netwide" to do it). Nothing wrong with that, but it makes
'em look a little "weird", if you're not used to seeing it.
This may not be the first thing you want to jump into in Linux.
Iczelion's first tut is "hello world", via MessageBoxA. We have no
MessageboxA. We can create a window and put "hello world" in it - oh,
and an "OK" button - MessageBoxA comes with a selection of button
options as standard equipment - ours are "added cost extras". Linux is
different, and one of the differences is that graphics is a bitch!!!
If you'll settle for "hello world" in a lame '80s dos-style console - at
least to start with, that'd be easier...
So there isn't really an "equivalent" to Iczelion's tuts. To work you up
to it gently, here's "hello uhClem" with the header "built in", courtesy
of Herbert. Can arrange examples using a linker, too. I've got Herbert's
"annie.asm" translated to "real Nasm" - others are on my TODO list - if
you really want to jump right into Xwindows ("the hard way").
Best,
Frank
; nasm hw2u.asm
; chmod +x hw2u
; [map all]
;===========================================================================
bits 32
ORIGIN equ 8048000h
org ORIGIN
section .text
code_offset equ 0
code_addr:
;--------------------------- ELF header -----------------------------------
dd $464c457f,$00010101,0,0,$00030002,1,main,$34,0,0,$00200034,2,0
dd 1,code_offset,code_addr,code_addr,code_filez,code_memsz,5,4096
dd 1,data_offset,data_addr,data_addr,data_filez,data_memsz,6,4096
;--------------------------- code ------------------------------------------
MAXNAME equ 256
main:
nop ; for the debuggers
mov ecx, prompt
mov edx, prompt_len
call write_stdout
mov eax, 3 ; __NR_read
mov ebx, 0 ; stdin
mov ecx, namebuf ; buffer
mov edx, MAXNAME ; maximum to read
int 80h
dec eax ; length returned includes LF we don't want
push eax ; save it for later
mov ecx, greet
mov edx, greet_len
call write_stdout
mov ecx, namebuf
pop edx ; retrieve the length
call write_stdout
mov ecx, coda
mov edx, coda_len
call write_stdout
exit:
mov eax, 1 ; __NR_exit
int 80h
write_stdout:
push ebx
mov eax, 4 ; __NR_write
mov ebx, 1 ; stdout
int 80h
pop ebx
ret
;--------------------------- constant data ---------------------------------
; (note that we're in .text, not .rdata)
align 4
; could put our strings here...
;---------------------------------------------------------------------------
align 4
code_memsz equ $ - $$
code_filez equ code_memsz
data_addr equ (ORIGIN+code_memsz+4095)/4096*4096 + (code_filez % 4096)
data_offset equ code_filez
section .data vstart=data_addr
;--------------------------- initialized data ------------------------------
prompt db "Please tell me your name? "
prompt_len equ $ - prompt
greet db "Hello, "
greet_len equ $ - greet
coda db "! Welcome to Linux Assembly!", 10
coda_len equ $ - coda
;---------------------------------------------------------------------------
idat_memsz equ $ - $$
bss_addr equ data_addr + ($ - $$)
section .bss vstart=bss_addr
;--------------------------- uninitialized data ----------------------------
namebuf resb MAXNAME
;---------------------------------------------------------------------------
udat_memsz equ $ - $$
data_memsz equ idat_memsz + udat_memsz
data_filez equ idat_memsz
;===========================================================================
Thanks Frank. I couldn't get it to assemble
uhClem.asm:8: error: parser: instruction expected
when the line number is reported above, does it include comment lines?
If so, "org Origin" would be the 8th line
--
Regards,
Pop Tart
Yup. Assembling it with "-f elf" will produce that result ("org" is only
recognised in "-f bin"... and "-f ieee"???). This isn't a "normal" Linux
file, remember. We "should" use:
nasm -f bin -o myfile myfile.asm
But since the output format and the name both default to what we want...
nasm myfile.asm
That won't have executable permission set, so "chmod +x myfile".
If that doesn't do it, I'm switchin' to C! :)
Best,
Frank
Or maybe you like it better this way - assemble with "-f elf" and let
the linker do its job - ld -o myfile myfile.o. Even after "strip
-R.comment myfile" (Nasm puts its *name* in it!!!), it's bigger than
"Herbert's Method"...
; nasm -f elf hw2u.asm
; ld -o hw2u hw2u.o
; or "uhClem..."
global _start
MAXNAME equ 256
section .text
_start:
nop ; for the debuggers
section .data
prompt db "Please tell me your name? "
prompt_len equ $ - prompt
greet db "Hello, "
greet_len equ $ - greet
coda db "! Welcome to Linux Assembly!", 10
coda_len equ $ - coda
section .bss
namebuf resb MAXNAME
Even if it does, switching to C wouldn't be a bad idea. Here is your
program in C. Which is more readable, C or assembly?
/**********************************************************************************/
main()
{char name[80];
putstring ("Please tell me your name? ");
getstring (79,name);
putstring ("Hello, ");
putstring (name);
putstring ( "! Welcome to Linux Assembly!\n");
}
putstring(s) char *s;
{while (*s) putchar(*s++);}
getstring(i,s) int i; char *s;
{int j; for (j=0; j<i; j++) if ((*s++ = getchar())=='\n') {s--; break;} *s=0;}
/**********************************************************************************/
You wanna read it, or you wanna run it? To be honest, I find your C
pretty "readable", up until the end. If I interpret the punctuation
salad at the end correctly, it looks like shit code, compared to mine.
Does it really call the OS for each character? Hard to tell...
What would a beginner (or anyone else) see if they stepped through your
code in a debugger, compared to mine? Which would be easier to "follow"?
For readability, "echo hello world" is hard to beat! Is that the "point"?
Best,
Frank
> You wanna read it, or you wanna run it?
If you only want to run it: the CPU doesn't care about the language
in which the source was written.
> To be honest, I find your C
> pretty "readable", up until the end. If I interpret the punctuation
> salad at the end correctly, it looks like shit code, compared to mine.
> Does it really call the OS for each character? Hard to tell...
I don't think putchar is a subroutine but a macro:
#define getc(fp) ((fp)->cnt-- ? (int) *(fp)->ptr++ : __getc(fp))
#define getchar() getc(stdin)
#define putc(c, fp) ((fp)->cnt-- > 1 ? (int)(*(fp)->ptr++ = (c)) : __putc(c, fp))
#define putchar(c) putc(c, stdout)
> What would a beginner (or anyone else) see if they stepped through your
> code in a debugger, compared to mine? Which would be easier to "follow"?
When you step through your code in a source code debugger, then you see
assembly code when you have an assembly source and you see C code if
you have a C source. If you write in C, then the lowest level is the C
level. If there isn't a bug in the C compiler, then there is no need
to go to a lower level. Just assume you have a CPU which directly
executes C code. When you debug your assembly program you also don't
step through the micro code of the processor.
> For readability, "echo hello world" is hard to beat! Is that the "point"?
But the readability of an input in a DOS batch file can easily be beaten
by C and assembly!
> So there isn't really an "equivalent" to Iczelion's tuts.
Maybe you should also explain the reason why, 10 years later,
there is still "close to nothing" for Linux, whereas Linux
has a hundred times more volunteers than Windows ever had.
Just the truth would be enough: Linux is such an insane
packet of shit, that C is the tool for. Giving a chance to
Assembly, in the "Win32-Assembly" manner will be heroïc, to
say the least, and using any Library would be a kind of suicide.
Betov.
Yes, it cares only about what it's being asked to do.
>>To be honest, I find your C
>>pretty "readable", up until the end. If I interpret the punctuation
>>salad at the end correctly, it looks like shit code, compared to mine.
>>Does it really call the OS for each character? Hard to tell...
>
>
> I don't think putchar is a subroutine but a macro:
>
> #define getc(fp) ((fp)->cnt-- ? (int) *(fp)->ptr++ : __getc(fp))
> #define getchar() getc(stdin)
> #define putc(c, fp) ((fp)->cnt-- > 1 ? (int)(*(fp)->ptr++ = (c)) : __putc(c, fp))
> #define putchar(c) putc(c, stdout)
Okay, perfectly easy to read that and see what we're asking the CPU to
do... for *you*... (I guess I believe you...)
>>What would a beginner (or anyone else) see if they stepped through your
>>code in a debugger, compared to mine? Which would be easier to "follow"?
>
> When you step through your code in a source code debugger, then you see
> assembly code when you have an assembly source and you see C code if
> you have a C source. If you write in C, then the lowest level is the C
> level.
??? Having looked at it in assembly, I can see you don't wanna... but
you can.
> If there isn't a bug in the C compiler, then there is no need
> to go to a lower level.
You're in denial!
> Just assume you have a CPU which directly
> executes C code.
Sure. "echo" is an opcode, too!
> When you debug your assembly program you also don't
> step through the micro code of the processor.
True... and perhaps unfortunate, but it's "as low as we can go".
>>For readability, "echo hello world" is hard to beat! Is that the "point"?
>
> But the readability of an input in a DOS batch file can easily be beaten
> by C and assembly!
True, but irrelevant. You're grasping at straws!
We can look at your C code. We could disassemble it with ndisasm - hard
to find the "real" code. Objdump is easier, although its output is AT&T
syntax. I'm gonna be a nice guy and spare you the *large* amount of cruft
that surrounds this, and post just "your code".
shit - wrong button... Okay, objdump...
(notice the address we're at before we even *get* to "main".)
080483d0 <main>:
80483d0: 55 push %ebp
80483d1: 89 e5 mov %esp,%ebp
80483d3: 53 push %ebx
80483d4: 83 ec 54 sub $0x54,%esp
80483d7: 83 e4 f0 and $0xfffffff0,%esp
80483da: 83 ec 0c sub $0xc,%esp
80483dd: 68 64 85 04 08 push $0x8048564
80483e2: e8 39 00 00 00 call 8048420 <putstring>
80483e7: 58 pop %eax
80483e8: 5a pop %edx
80483e9: 8d 5d a8 lea 0xffffffa8(%ebp),%ebx
80483ec: 53 push %ebx
80483ed: 6a 4f push $0x4f
80483ef: e8 6c 00 00 00 call 8048460 <getstring>
80483f4: c7 04 24 80 85 04 08 movl $0x8048580,(%esp)
80483fb: e8 20 00 00 00 call 8048420 <putstring>
8048400: 89 1c 24 mov %ebx,(%esp)
8048403: e8 18 00 00 00 call 8048420 <putstring>
8048408: c7 04 24 88 85 04 08 movl $0x8048588,(%esp)
804840f: e8 0c 00 00 00 call 8048420 <putstring>
8048414: 8b 5d fc mov 0xfffffffc(%ebp),%ebx
8048417: 89 ec mov %ebp,%esp
8048419: 5d pop %ebp
804841a: c3 ret
804841b: 90 nop
804841c: 8d 74 26 00 lea 0x0(%esi),%esi
08048420 <putstring>:
8048420: 55 push %ebp
8048421: 89 e5 mov %esp,%ebp
8048423: 53 push %ebx
8048424: 51 push %ecx
8048425: 8b 5d 08 mov 0x8(%ebp),%ebx
8048428: 8a 03 mov (%ebx),%al
804842a: 84 c0 test %al,%al
804842c: 75 12 jne 8048440 <putstring+0x20>
804842e: 8b 5d fc mov 0xfffffffc(%ebp),%ebx
8048431: c9 leave
8048432: c3 ret
8048433: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
8048439: 8d bc 27 00 00 00 00 lea 0x0(%edi),%edi
8048440: 83 ec 0c sub $0xc,%esp
8048443: 0f be c0 movsbl %al,%eax
8048446: 50 push %eax
8048447: 43 inc %ebx
8048448: e8 8b fe ff ff call 80482d8 <_init+0x38>
804844d: 8a 03 mov (%ebx),%al
804844f: 83 c4 10 add $0x10,%esp
8048452: 84 c0 test %al,%al
8048454: 75 ea jne 8048440 <putstring+0x20>
8048456: eb d6 jmp 804842e <putstring+0xe>
8048458: 90 nop
8048459: 8d b4 26 00 00 00 00 lea 0x0(%esi),%esi
08048460 <getstring>:
8048460: 55 push %ebp
8048461: 89 e5 mov %esp,%ebp
8048463: 57 push %edi
8048464: 56 push %esi
8048465: 53 push %ebx
8048466: 83 ec 0c sub $0xc,%esp
8048469: 31 ff xor %edi,%edi
804846b: 3b 7d 08 cmp 0x8(%ebp),%edi
804846e: 8b 75 0c mov 0xc(%ebp),%esi
8048471: 7c 0d jl 8048480 <getstring+0x20>
8048473: c6 06 00 movb $0x0,(%esi)
8048476: 83 c4 0c add $0xc,%esp
8048479: 5b pop %ebx
804847a: 5e pop %esi
804847b: 5f pop %edi
804847c: 5d pop %ebp
804847d: c3 ret
804847e: 89 f6 mov %esi,%esi
8048480: e8 43 fe ff ff call 80482c8 <_init+0x28>
8048485: 89 f3 mov %esi,%ebx
8048487: 46 inc %esi
8048488: 3c 0a cmp $0xa,%al
804848a: 88 03 mov %al,(%ebx)
804848c: 74 08 je 8048496 <getstring+0x36>
804848e: 47 inc %edi
804848f: 3b 7d 08 cmp 0x8(%ebp),%edi
8048492: 7c ec jl 8048480 <getstring+0x20>
8048494: eb dd jmp 8048473 <getstring+0x13>
8048496: 4e dec %esi
8048497: eb da jmp 8048473 <getstring+0x13>
[more cruft snipped]
Here's the *entire* objdump ("-d") output from my code:
hw2u: file format elf32-i386
Disassembly of section .text:
08048080 <_start>:
8048080: 90 nop
08048081 <commence>:
8048081: b9 e8 90 04 08 mov $0x80490e8,%ecx
8048086: ba 1b 00 00 00 mov $0x1b,%edx
804808b: e8 48 00 00 00 call 80480d8 <write_stdout>
8048090: b8 03 00 00 00 mov $0x3,%eax
8048095: bb 00 00 00 00 mov $0x0,%ebx
804809a: b9 28 91 04 08 mov $0x8049128,%ecx
804809f: ba 00 01 00 00 mov $0x100,%edx
80480a4: cd 80 int $0x80
80480a6: 48 dec %eax
80480a7: 50 push %eax
80480a8: b9 03 91 04 08 mov $0x8049103,%ecx
80480ad: ba 07 00 00 00 mov $0x7,%edx
80480b2: e8 21 00 00 00 call 80480d8 <write_stdout>
80480b7: b9 28 91 04 08 mov $0x8049128,%ecx
80480bc: 5a pop %edx
80480bd: e8 16 00 00 00 call 80480d8 <write_stdout>
80480c2: b9 0a 91 04 08 mov $0x804910a,%ecx
80480c7: ba 1d 00 00 00 mov $0x1d,%edx
80480cc: e8 07 00 00 00 call 80480d8 <write_stdout>
080480d1 <exit>:
80480d1: b8 01 00 00 00 mov $0x1,%eax
80480d6: cd 80 int $0x80
080480d8 <write_stdout>:
80480d8: 53 push %ebx
80480d9: b8 04 00 00 00 mov $0x4,%eax
80480de: bb 01 00 00 00 mov $0x1,%ebx
80480e3: cd 80 int $0x80
80480e5: 5b pop %ebx
80480e6: c3 ret
I observed that your code "looks like" it calls the OS for every
character. It does not. In fact, it beats my code in that regard. We can
observe OS calls with "strace". Here's the output of strace from my code:
execve("nolink/hw2u", ["nolink/hw2u"], [/* 36 vars */]) = 0
write(1, "Please tell me your name? ", 27) = 27
read(0, "fred\n", 256) = 5
write(1, "Hello, ", 7) = 7
write(1, "fred", 4) = 4
write(1, "! Welcome to Linux Assembly!\n", 29) = 29
_exit(0) = ?
Here's the output from strace from your C code.
(this initial stuff is loading the shared library.)
execve("./hkhw2u", ["hkhw2u"], [/* 36 vars */]) = 0
brk(0) = 0x80496b4
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x40015000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=88383, ...}) = 0
old_mmap(NULL, 88383, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40016000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360Y\1"...,
1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1250840, ...}) = 0
old_mmap(NULL, 1237892, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002c000
mprotect(0x40154000, 25476, PROT_NONE) = 0
old_mmap(0x40154000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED,
3, 0x128000) = 0x40154000
old_mmap(0x40158000, 9092, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40158000
close(3) = 0
munmap(0x40016000, 88383) = 0
fstat64(1, {st_mode=S_IFCHR|0720, st_rdev=makedev(4, 1), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo
...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x40016000
fstat64(0, {st_mode=S_IFCHR|0720, st_rdev=makedev(4, 1), ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo
...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x40017000
I include all this because the parameters to ioctl may be of interest...
Here's the "business":
write(1, "Please tell me your name? ", 27) = 27
read(0, "fred\n", 4096) = 5
write(1, "Hello, fred! Welcome to Linux As"..., 40) = 40
munmap(0x40016000, 4096) = 0
exit_group(0) = ?
As you see...
putstring ("Hello, ");
putstring (name);
putstring ( "! Welcome to Linux Assembly!\n");
were concatenated to a single OS call. This is a Good Thing. I'll give C
this one. Aside from that, I don't find it particularly "elegant" or
"easy to follow" compared to "pure asm".
Which language to use is "programmer's choice", of course. But those who
claim C is "superior" in some absolute sense, either don't know, or
they're deceiving us! The CPU *doesn't* know what language code was
written in, it just follows the "detour" signs...
Best,
Frank
> I observed that your code "looks like" it calls the OS for every
> character. It does not.
Now GCC disappoints me. Just compiled the two line C program:
#include <stdio.h>
main() {putchar(10);}
GCC gives me:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl stdout, %eax
movl $10, (%esp)
movl %eax, 4(%esp)
call _IO_putc
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)"
.section .note.GNU-stack,"",@progbits
This means, for each putchar() in the source, _IO_putc is called.
The Microsoft C compiler defines the macro putchar() in stdio.h
#define putchar(_c) putc((_c),stdout)
#define putc(_c,_stream) (--(_stream)->_cnt >= 0 \
? 0xff & (*(_stream)->_ptr++ = (char)(_c)) : _flsbuf((_c),(_stream)))
And therefore the compiled code looks like:
; 3 : main() {putchar(10);}
00000 a1 24 00 00 00 mov eax, DWORD PTR __iob+36
00005 48 dec eax
00006 a3 24 00 00 00 mov DWORD PTR __iob+36, eax
0000b 78 14 js SHORT $L201
0000d a1 20 00 00 00 mov eax, DWORD PTR __iob+32
00012 c6 00 0a mov BYTE PTR [eax], 10 ; 0000000aH
00015 a1 20 00 00 00 mov eax, DWORD PTR __iob+32
0001a 40 inc eax
0001b a3 20 00 00 00 mov DWORD PTR __iob+32, eax
00020 c3 ret 0
$L201:
00021 68 20 00 00 00 push OFFSET FLAT:__iob+32
00026 6a 0a push 10 ; 0000000aH
00028 e8 00 00 00 00 call __flsbuf
0002d 83 c4 08 add esp, 8
00030 c3 ret 0
_main ENDP
It only calls an external subroutine when the buffer is full.
> Maybe you should also explain the reason why, 10 years later,
> there is still "close to nothing" for Linux, whereas Linux
> has a hundred times more volunteers than Windows ever had.
>
> Just the truth would be enough: Linux is such an insane
> packet of shit, that C is the tool for. Giving a chance to
Maybe you can explain what exactly is a "packet of shit".
The API? Or the lack of any graphics support? Unix is an OS
and not a pixel drawing program. You always have stdin/stdout
available. And if you really need graphics, there is an
application called "X" which is available for most Unix systems.
Yes, this makes graphics slow and complicated, but Windows
is also slow compared to playstation and Windows graphics
is also complicated compared with DOS graphics.
> Assembly, in the "Win32-Assembly" manner will be heroïc, to
You want gcc to over-ride the headers provided by Ubuntu (and make
similar provisions for each target), even if you don't invoke whatever
options are present in your particular header?
"No equivalent to Iczelion" ne "close to nothing"!
There isn't a "Beginner's Guide to GUI Assembly Programming in Linux",
because it ain't a job for a beginner. If easy eye-candy is your goal, I
can see why you'd prefer Windows.
If you just want to program for a 32-bit, multi-tasking, network-aware
OS (this ain't dos!)... that ain't MS... start with the assembly
language "HowTo"... Jeff Owens has got some great documentation (though
not a tutorial "equivalent to Iczelion").
> Just the truth would be enough: Linux is such an insane
> packet of shit, that C is the tool for.
C *is* the tool for Linux, because Linux runs on other-than-x86. If we
wanna do "asm for Linux", we necessarily target a subset of Linux.
Linux-on-x86 is a pretty good sized target, if a small minority of the
"total". (if we count all microprocessors, x86-on-desktops is
approximately 0%!) Asm for Linux is selfish - contrary to the spirit of
the GPL, we're excluding certain users. (balancing this, "why should x86
users be deprived of the superiority of asm code just because a few
can't use it"...)
> Giving a chance to
> Assembly, in the "Win32-Assembly" manner will be heroïc, to
> say the least, and using any Library would be a kind of suicide.
Ummm, you really don't have much choice but use libraries in Windoze, do
you? In Linux, we've got the option of doing that or using the int 80h
interface. Any advantage to the latter is somewhat deceptive. We have
source code for everything, so we can see that behind the int 80h, the
parameters we so lovingly put into registers are pushed on the stack and
we jump off into the "same damn code" via a jump table indexed off eax
(as I recall...).
For Windows, you mostly don't have source code for the "magic boxes"
you're calling. You could look at the ReactOS code, I suppose (alleged
to be the same thing, in some circles). Or disassemble the code that's
actually being executed.
Whip up a "hello joe" - Wannabee assures us it'll only take about ten
seconds with:
> < http://rosasm.org >
We'll compare what the code *really* does - deceptively "simple" source
code - like C - doesn't impress the CPU - and then we can decide which
is an insane packet of shit, and which, if any, isn't.
Trying to do asm in Linux in the "Win32-Assembly manner" would indeed be
"heroic"! Doing asm in Linux in the "Linux-Assembly manner" is quite
possible. We can even have eye-candy - with or without libraries - but
first, we need sys-calls, and particularly socket calls, or else we need
to learn to link against and call libraries (like you-know-who-doze).
The latter is not particularly difficult - less "built in" than Windows,
perhaps - but it's less like "assembly language"...
Have you *done* any asm programming for Linux? Do you want to, or do you
just like bitchin' about how impossible it is?
Best,
Frank
>This means, for each putchar() in the source, _IO_putc is called.
I think this is for thread-safety (so multiple threads can use the
same stream simultaneously, though that seems like a Really Bad Idea).
There may be some way to override it.
-- Richard
--
:wq
c.l.c removed. I got no inclination to talk to them, and I doubt if they
want to talk to me. :)
> Now GCC disappoints me.
Well, gcc has impressed me favorably on occasion, so I guess it's okay
if it disappoints you sometimes. It is what it is.
> Just compiled the two line C program:
Any "-O" switches?
> #include <stdio.h>
> main() {putchar(10);}
>
> GCC gives me:
>
> main:
> leal 4(%esp), %ecx
> andl $-16, %esp
> pushl -4(%ecx)
> pushl %ebp
> movl %esp, %ebp
> pushl %ecx
> subl $20, %esp
> movl stdout, %eax
> movl $10, (%esp)
> movl %eax, 4(%esp)
> call _IO_putc
> addl $20, %esp
> popl %ecx
> popl %ebp
> leal -4(%ecx), %esp
> ret
> .size main, .-main
> .ident "GCC: (GNU) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)"
> .section .note.GNU-stack,"",@progbits
>
> This means, for each putchar() in the source, _IO_putc is called.
Yeah...
> The Microsoft C compiler
Microsoft got a C compiler for Linux??? We're comparing apples and oranges!
> defines the macro putchar() in stdio.h
>
> #define putchar(_c) putc((_c),stdout)
> #define putc(_c,_stream) (--(_stream)->_cnt >= 0 \
> ? 0xff & (*(_stream)->_ptr++ = (char)(_c)) : _flsbuf((_c),(_stream)))
>
> And therefore the compiled code looks like:
>
> ; 3 : main() {putchar(10);}
>
> 00000 a1 24 00 00 00 mov eax, DWORD PTR __iob+36
> 00005 48 dec eax
> 00006 a3 24 00 00 00 mov DWORD PTR __iob+36, eax
> 0000b 78 14 js SHORT $L201
> 0000d a1 20 00 00 00 mov eax, DWORD PTR __iob+32
> 00012 c6 00 0a mov BYTE PTR [eax], 10 ; 0000000aH
> 00015 a1 20 00 00 00 mov eax, DWORD PTR __iob+32
> 0001a 40 inc eax
> 0001b a3 20 00 00 00 mov DWORD PTR __iob+32, eax
> 00020 c3 ret 0
> $L201:
> 00021 68 20 00 00 00 push OFFSET FLAT:__iob+32
> 00026 6a 0a push 10 ; 0000000aH
> 00028 e8 00 00 00 00 call __flsbuf
> 0002d 83 c4 08 add esp, 8
> 00030 c3 ret 0
> _main ENDP
>
> It only calls an external subroutine when the buffer is full.
Objdump is showing:
8048448: e8 8b fe ff ff call 80482d8 <_init+0x38>
If I step through this in gdb, it's showing me a function (I think),
from libc.so.6 - "putchar [ + ???]". In order to make gdb a little more
"compliant", I had to compile with "-g", which may account for some of
the detours I'm seeing... "_r_debug_", "__overflow",
"flock/unlockfile"... I got quite lost.
However, for all of this - a call to libc for each character - libc is
buffering output "behind our back", and actually calls "write" fewer
times than my asm code ("your" header on some versions, but for
objdump/gdb, assembled with "-f elf -g" and ld, unstripped - so I'm
comparing apples and oranges, too - same code is executed - the
difference is in the headers). So there's a tradeoff...
In the context of "Iczelion's tuts", I still think I like my way better...
Best,
Frank
Not really relevant. The C compilers I use seem to put main() near the end
of the file. main() isn't the first function called by C anyway. The C
startup has to do a few things:
1) setup auto stack
2) initialize static storage
3) open three required streams
4) get command line arguments
5) call main
6) call exit with the return from main
It might have to do other stuff, like allocate a memory block from the host
OS for sbrk, malloc, etc...
> 080483d0 <main>:
[original cruft snipped]
> [more cruft snipped]
Interesting, the code seems to have some type size thrashing, i.e., mixing
8-bit and 32-bit registers and using 'movsbl'. I usually only see that
w/DJGPP when 'int' and 'char' types are mixed in the C code, but that
doesn't seem to be the case. Was this with -O2?
> Here's the *entire* objdump ("-d") output from my code:
>
[Frank's cruft snipped]
>
Get rid of the stack frames, main(), char-at-a-time retrieval, and they begin
to look much more alike...
> I observed that you code "looks like" it calls the OS for every
> character. It does not.
That's because C stdio is buffered unless disabled via setbuf() and/or
modified by setvbuf()...
> In fact, it beats my code in that regard.
That's the point of buffering, isn't it?
> We can
> observe OS calls with "strace". Here's the output of strace from my code:
>
> execve("nolink/hw2u", ["nolink/hw2u"], [/* 36 vars */]) = 0
> write(1, "Please tell me your name? ", 27) = 27
> read(0, "fred\n", 256) = 5
> write(1, "Hello, ", 7) = 7
> write(1, "fred", 4) = 4
> write(1, "! Welcome to Linux Assembly!\n", 29) = 29
> _exit(0) = ?
>
From a non-Linux user, looks decent. unistd.h only... :-) But, using
_exit() instead of exit() can leave files open, memory unfreed, etc...
Buyer beware!
> Here's the output from strace from your C code.
>
> (this initial stuff is loading the shared library.)
>
> execve("./hkhw2u", ["hkhw2u"], [/* 36 vars */]) = 0
> brk(0) = 0x80496b4
[snip, much more...]
> old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = 0x40017000
>
...what a ... mess...
> I include all this because the parameters to ioctl may be of interest...
>
> Here's the "business":
>
> write(1, "Please tell me your name? ", 27) = 27
> read(0, "fred\n", 4096) = 5
> write(1, "Hello, fred! Welcome to Linux As"..., 40) = 40
> munmap(0x40016000, 4096) = 0
> exit_group(0) = ?
>
Looks good, except shouldn't you have left in one of the mmap's since you
have an munmap? :-)
> As you see...
>
> putstring ("Hello, ");
> putstring (name);
> putstring ( "! Welcome to Linux Assembly!\n");
>
> were concatenated to a single OS call. This is a Good Thing. I'll give C
> this one. Aside from that, I don't find it particularly "elegant" or
> "easy to follow" compared to "pure asm".
>
Was that with -O2 or -O3 on the C code?
> Which language to use is "programmer's choice", of course. But those who
> claim C is "superior" in some absolute sense, either don't know, or
> they're deceiving us!
I know both and use both. I'm fond of assembly. 6502 on the 6510 was the
second language I learned. But, I love C. At one point, I loved BASIC. I
looked at the BASIC I learned (C64) a year or two ago. It's very limited.
It's very limited even if updated. About the only useful aspect was the
simple string functionality: mid$, left$, right$, and + for concatenation.
Assembly is my second choice today, and was my second choice then... The
other numerous languages I've experienced are useless, dead, etc. They're
not going to make a recovery, ever. However, I'd rather not use assembly.
Although I don't care about code portability from machine to machine, I do
care about code portability from toolset to toolset. C does that. Assembly
doesn't. I can't take assembly code from the GNU toolset (GAS x86) to NASM
toolset or MASM toolset. The big advantages to C for me are keeping track
of variables, eliminate many number calculations: offsets, bits, constants,
and optimizers that I don't have to code myself... C isn't perfect. There
are quite a few small little problems. There are things from Pascal and
PL/1 etc. that I'd like to see in the C language.
> The CPU *doesn't* know what language code was
> written in, it just follows the "detour" signs...
True.
Rod Pemberton
> Unix is an OS
> and not a pixel drawing program
I wouldn't call "that thing" an OS, but, basically, you are
unfortunately right.
Betov.
> Trying to do asm in Linux in the "Win32-Assembly manner" would indeed
> be "heroic"! Doing asm in Linux in the "Linux-Assembly manner" is
> quite possible. We can even have eye-candy - with or without libraries
> - but first, we need sys-calls, and particularly socket calls, or else
> we need to learn to link against and call libraries (like
> you-know-who-doze). The latter is not particularly difficult - less
> "built in" than Windows, perhaps - but it's less like "assembly
> language"...
I love your sense of humour, but you still fail to explain the
reason why, whereas Linux volunteers number in the hundreds
and thousands, in the past 10 years there have been about 4 or 5
significant Asm projects, and 0 for Linux (sorry, I am unable
to count the Jeff project as a *significant* one, despite
my natural sympathy for anything done in the area)
Betov.
[assembly snipped]
>This means, for each putchar() in the source, _IO_putc is called.
It is possible that any putchar() calls _IO_putc(), but the above
does not show this (even if you put back the assembly that I
snipped). The reason is that 10 happens to be equal to value of
'\n' on this system, and putchar('\n') must push the output to the
system if the stream in question is line-buffered. Since stdout
is very often line-buffered, one should expect putchar('\n') to
make an O/S call very often as well, and an implementation-specific
function like _IO_putc() is a good place to hide the test for
whether the call is needed, along with the call itself.
On the other hand, putchar('x'), for instance, could be done without
making a function call (since 'x' is clearly not a newline).
Similarly, a source-code construct like putchar(c), where c is a
variable rather than a known-constant newline, might expand to code
that tests whether c=='\n', then buffers c or calls _IO_putc(c) as
appropriate. (As someone suggested in another follow-up, one might
need to turn off thread support to see this, too.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
No shit! You want me post the whole thing? :)
>>080483d0 <main>:
>
> [original cruft snipped]
>
>>[more cruft snipped]
>
>
> Interesting, the code seems to have some type size thrashing, i.e., mixing
> 8-bit and 32-bit registers and using 'movsbl'. I usually only see that
> w/DJGPP when 'int' and 'char' types are mixed in the C code, but that
> doesn't seem to be the case. Was this with -O2?
Don't recall... Still in there with -O2.
>>Here's the *entire* objdump ("-d") output from my code:
>>
>
> [Frank's cruft snipped]
Unlike Herbert's example, it looked pretty much like the source...
> Get rid of the stack frames, main(), char a time retrieval, and they begin
> to look much more alike...
Sure. A few minutes in a hex editor will do a lot for C.
>>I observed that you code "looks like" it calls the OS for every
>>character. It does not.
>
> That's because C is buffered unless disabled via setbuf() and/or modified by
> setmode()...
"C is buffered" seems misleading to me. Many of the functions in libc
are buffered. The man pages warn us against mixing buffered and
unbuffered functions. Like so:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
char name[80];
ssize_t name_len;
printf ("Please tell me your name? ");
/* fflush(stdout); */
name_len = read (0, name, 79);
if (name_len <= 0) return 1; /* EOF or error: nothing was read */
name[name_len - 1] = 0; /* overwrite the trailing LF */
/* gets (name); */ /* this *does* flush stdout! */
printf ("Hello, %s! Welcome to Linux Assembly!\n", name);
return 0;
}
Without the "fflush(stdout)", it just waits for ya to type something,
and prints prompt and all after you hit Enter. (Ending the prompt with \n
also flushes the buffer, at least when stdout is a terminal.)
The "gets", of course, is another example of what we should *not* do!!!
Gcc yells at me for it. Do other compilers?
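Something like this keeps the two worlds apart, as a sketch only (ask_name and its parameters are names I made up for illustration): flush the stdio stream explicitly before dropping down to read(), and never touch gets().

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prompt on a buffered stdio stream, then read the
   answer with the unbuffered read(2). Returns the name length, or -1. */
long ask_name(FILE *out, int in_fd, char *name, size_t cap)
{
    ssize_t n;

    if (cap < 2)
        return -1;
    fputs("Please tell me your name? ", out);
    fflush(out);                 /* otherwise the prompt may still sit in
                                    the stdio buffer while read() waits */
    n = read(in_fd, name, cap - 1);
    if (n <= 0)
        return -1;               /* EOF or error */
    if (name[n - 1] == '\n')
        n--;                     /* drop the LF that Enter leaves behind */
    name[n] = '\0';
    return (long)n;
}
```

Called as ask_name(stdout, 0, name, sizeof name), the prompt shows up before the program waits for input.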
>>In fact, it beats my code in that regard.
>
> That's the point of buffering, isn't it?
Yeah.
>>We can
>>observe OS calls with "strace". Here's the output of strace from my code:
>>
>>execve("nolink/hw2u", ["nolink/hw2u"], [/* 36 vars */]) = 0
>>write(1, "Please tell me your name? ", 27) = 27
>>read(0, "fred\n", 256) = 5
>>write(1, "Hello, ", 7) = 7
>>write(1, "fred", 4) = 4
>>write(1, "! Welcome to Linux Assembly!\n", 29) = 29
>>_exit(0) = ?
>
> From a non-Linux user, looks decent. unistd.h only... :-) But, using
> _exit() instead of exit() can leave files open, memory unfreed, etc...
> Buyer beware!
I'm not actually "using" exit() or _exit(). I'm using sys_exit: "mov
eax, __NR_exit" (1, on this system - 64-bit differs on some of 'em!),
then "int 80h". If this leaves files open, every file I've opened is still
open. I don't much care about unfreed memory - the entire address space
vanishes anyway. Sounds like FUD to me!
>>Here's the output from strace from your C code.
>>
>>(this initial stuff is loading the shared library.
>>
>>execve("./hkhw2u", ["hkhw2u"], [/* 36 vars */]) = 0
This one is actually from strace itself, I think...
>>brk(0) = 0x80496b4
>
> [snip, much more...]
>
>>old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
>>-1, 0) = 0x40017000
>
> ...what a ... mess...
I'm not sure exactly what you find objectionable with it. Any file that
uses a shared lib does it - even if there's no C involved at all.
>>I include all this because the parameters to ioctl may be of interest...
>>
>>Here's the "business":
>>
>>write(1, "Please tell me your name? ", 27) = 27
>>read(0, "fred\n", 4096) = 5
>>write(1, "Hello, fred! Welcome to Linux As"..., 40) = 40
>>munmap(0x40016000, 4096) = 0
>>exit_group(0) = ?
>>
>
>
> Looks good, except shouldn't you have left in one of the mmap's since you
> have an munmap? :-)
Okay, "end of business - back to code we didn't write" before the "munmap".
>>As you see...
>>
>> putstring ("Hello, ");
>> putstring (name);
>> putstring ( "! Welcome to Linux Assembly!\n");
>>
>>were concatenated to a single OS call. This is a Good Thing. I'll give C
>>this one. Aside from that, I don't find it particularly "elegant" or
>>"easy to follow" compared to "pure asm".
>
> Was that with -O2 or -O3 on the C code?
Which would you recommend?
Anything but "-Wall" - some of Herbert's code isn't Sanctified by the
Kommittee, apparently. :) Works fine in spite of it. FWIW, the above
code *will* tolerate "-Wall" - it must be coming back to me. :)
I started with "-O3", IIRC, maybe tried "-O2", tried "-Os", added "-g"
at some point... I'm not sure just which version I posted the "objdump"
of. It *does* make a difference! No "-O" switch produces amusingly awful
code. ("movl $0x0, %eax"/"subl %eax, %esp" and all...)
>>Which language to use is "programmer's choice", of course. But those who
>>claim C is "superior" in some absolute sense, either don't know, or
>>they're deceiving us!
>
>
> I know both and use both. I'm fond of assembly. 6502 on the 6510 was the
> second language I learned. But, I love C. At one point, I loved BASIC. I
> looked at the BASIC I learned (C64) a year or two ago. It's very limited.
> It's very limited even if updated. About the only useful aspect was the
> simple string functionality: mid$, left$, right$, and + for concatenation.
> Assembly is my second choice today,
Off-by-one error! :)
> and was my second choice then... The
> other numerous languages I've experienced are useless, dead, etc. They're
> not going to make a recovery, ever. However, I'd rather not use assembly.
> Although I don't care about code portability from machine to machine, I do
> care about code portability from toolset to toolset. C does that.
Superficially, at least. Observing the Nasm development team try to get
Nasm to compile on "any compiler" convinces me this is wishful thinking.
Herbert's code compiles and works with gcc and MSC, but gcc is *not*
doing what he had in mind. Is this "okay"? Really doesn't matter, but if
you *did* care (and I think Herbert *did* intend some specific code),
it's another pinprick in C's portability bubble.
> Assembly doesn't.
That's *very* true! We're far too "independent minded" (pigheaded) to
tolerate any "asm standards committee". So we live in the Tower of Babel.
> I can't take assembly code from the GNU toolset (GAS x86) to NASM
> toolset or MASM toolset. The big advantages to C for me are keeping track
> of variables, eliminate many number calculations: offsets, bits, constants,
> and optimizers that I don't have to code myself... C isn't perfect. There
> are quite a few small little problems. There are things from Pascal and
> PL/1 etc. that I'd like to see in the C language.
I'm not bashing C (bash is too good for it - club it with cmd.exe! :)
I made the foolish statement "if that doesn't work, I'm switching to C".
A damned lie - I have no intention of switching to C whether "it" works
(for Brian/Pop Tart, I guess) or not! "Eat my hat" has been used (up!),
so I was just trying to make a different joke. Shouldn't have said it.
Herbert came back claiming this was "better readable":
/**********************************************************************************/
main()
{char name[80];
putstring ("Please tell me your name? ");
getstring (79,name);
putstring ("Hello, ");
putstring (name);
putstring ( "! Welcome to Linux Assembly!\n");
}
putstring(s) char *s;
{while (*s) putchar(*s++);}
getstring(i,s) int i; char *s;
{int j; for (j=0; j<i; j++) if ((*s++ = getchar())=='\n') {s--;
break;} *s=0;}
/**********************************************************************************/
I claim that this is a "preconceived notion". Swahili is "more readable"
to a native Swahili speaker, I imagine. I don't doubt that this is
"readable" to a "C speaker". Only Rosario could love those last couple
lines, but the first part is pretty "readable", even to the "naive reader".
We can write macros that make asm "look more like C". I could do:
sys_write stdout, prompt, prompt_len
sys_read stdin, name, MAXNAME
mov [name_len], eax
sys_write stdout, greet, greet_len
sys_write stdout, name, dword [name_len]
sys_write stdout, coda, coda_len
I could arrange:
say "Please tell me your name? "
nget name, 79
...
too. But these "hide" things I was intending to "show".
Perhaps in the context of "equivalent to Iczelion's tuts", I *should*
have done that - Iczelion does, but *first* he shows how to do it in
"plain assembly".
>>The CPU *doesn't* know what language code was
>>written in, it just follows the "detour" signs...
>
> True.
Yeah. These "detours" are no big deal - there's a "payoff" in terms of
readability (maybe), portability, flexibility... But for a "beginner
example", I think it'd be better to avoid 'em. We can't avoid "black
boxes" entirely - "int 80h" is just as mysterious as "call printf", but
I think it's easier for a beginner to *use* int 80h - no need to know
anything about the fact that call/ret use the stack, as do the
parameters we're passing (and we don't want 'em to bump). We want to
show 'em that pretty soon - kind of a prerequisite to Iczelion's tuts,
or (probably) any "equivalent". I was trying to "keep it simple"...
The goal isn't to find out the user's name, nor to produce the most
"readable" program to do so. The idea was to provide an example, "as
simple as possible, but no simpler", to serve as a "prerequisite" to the
nonexistent Iczelion-style tuts to introduce us to Xwindows...
No reflection on your preferred language - I haven't used the phrase
"sick C religion" even once, and I won't say it now (oops!) - but I
really don't think "it's more readable in C" is any part of an answer to
the question at hand!
Best,
Frank
>It is possible that any putchar() calls _IO_putc(),
... and it seems to be true on the Linux systems I have access to. It
appears to be for thread-safety, but making writes to the same file
"thread-safe" is not very useful: it guarantees that you won't, for
example, get a segmentation fault, but the characters may come out in
any order. To make the pointless case safe at the expense of making
the common case slow seems like a mistake to me.
>Since stdout
>is very often line-buffered, one should expect putchar('\n') to
>make an O/S call very often as well, and an implementation-specific
>function like _IO_putc() is a good place to hide the test for
>whether the call is needed, along with the call itself.
That was my initial guess as to the explanation too, but it turns out
to be wrong.
-- Richard
--
:wq
You really want to know the secret reason why? Just to piss you off!
Best,
Frank
:) Yes, K&R C...
> Works fine in spite of it.
You can thank GCC for that... No one has to support K&R anymore.
> No "-O" switch produces amusingly awful
> code.
That's why I asked. Some of the assembly didn't look like it had gone
through -O2 optimization...
> Superficially, at least. Observing the Nasm development team try to get
> Nasm to compile on "any compiler" convinces me this is wishful thinking.
Ooo, tell me more! I can use it to pester c.l.c. C is 100% portable
dorks... :-)
Actually, I have my own set of rules that keep me out of my problem C
programming areas, e.g., not using unions, or bitfields since older
compilers failed to implement them well.
> Herbert's code compiles and works with gcc and MSC, but gcc is *not*
> doing what he had in mind.
K&R... or buffering...
> Is this "okay"?
Does it work?
> Really doesn't matter, but if
> you *did* care (and I think Herbert *did* intend some specific code),
> it's another pinprick in C's portability bubble.
>
I usually state C is only 40% portable or so, and another 30% forced to look
portable... Try saying that in c.l.c!
> I made the foolish statement "if that doesn't work, I'm switching to C".
I didn't see it in your .sig... :-0
> Herbert came back claiming this was "better readable":
>
<snip>
>> {int j; for (j=0; j<i; j++) if ((*s++ = getchar())=='\n') {s--;
>> break;} *s=0;}
...
> I don't doubt that this is
> "readable" to a "C speaker".
Is it really that hard?
Without testing (i.e., probably some errors...), I'd start by writing
something like this:
getstring:
%define j_ eax
%define i_ ebx
%define s_ ecx
pop i_
pop s_
mov j_, 0
for_00:
if_00:
call getchar ; return in eax
mov [s_],eax
inc s_
cmp [s_],byte 0x10 ; want LF 0x10 or CR 0x13 ?
jne if_00_end;
if_00_body:
dec s_
jmp for_00_end ; break
if_00_end:
inc j_
cmp j_,i_ ; correct order for j<i ?
jb for_00
for_00_end:
mov [s_],byte 0
The only real issues I see for an assembly programmer are 1) where the
comparison and increment in the for loop are placed and 2) understanding
where the body of the for loop is without {}. Admittedly, this is a fairly
simple example. Next, I'd rework the branches and labels, replace 'mov ,0'
with xor, fit the parameters to whatever prolog or register usage was in
place, check whether I got the args to cmp correct, etc... getchar would
likely be a small wrapper around whatever the OS actually uses.
Rod Pemberton
> Actually, I have my own set of rules that keep me out of my problem C
> programming areas, e.g., not using unions, or bitfields since older
> compilers failed to implement them well.
Do you use "int"s? Or "long"s? Or platform specific include files? Or
call platform specific library routines?
One can't do any of those things in the Nasm source code, since they
all break compatibility.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
> What is better readable, C or assembly?
I don't want to get bogged down in arguing over definitions, but I
would like to suggest that there are two distinct levels of
readability. To illustrate this, consider this line of code from the
Nasm source:
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
From the standpoint of what the cpu will do with the code generated
from this statement, it is quite readable to anyone who knows C.
However, as Frank recently brought to my attention, there is a bug in
this code line, the symptom of which is that statements such as
foo equ 42
result in "foo" being added to the symbol table twice, if and only if
the optimization level is greater than one. There is, of course, no
diagnostic message to help one find this bug.
The problem is that there is not enough information in the code line to
allow one to determine what the programmer intended the code to do.
It is this second, and much more important, level of readability which
is so often lacking in HLL code.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
No, if you have a HL construct (either in a HLL or a macro in assembly),
it's never clear what the CPU will do. It is only clear what happens to
the program, but which CPU instructions are executed depends on the used
compiler or macro content.
> However, as Frank recently brought to my attention, there is a bug in
> this code line, the symptom of which is that statements such as
>
> foo equ 42
>
> result in "foo" being added to the symbol table twice, if and only if
> the optimization level is greater than one. There is, of course, no
> diagnostic message to help one find this bug.
I'm not sure what you want to say. If you want to find a bug in a compiler
you have to look at the compiler output, but if the compiler is bug free
and you want to find a bug in your C source, then you don't have to leave
the level of C and inspect any generated assembly code. If you have your
C source and the C specification, you can execute the program by hand
without first generating any transformation to a lower level representation
(like assembly or machine code).
The problem is to understand the specification. It's like you have a
video recorder but don't understand the instruction manual how to
record a video. Now because you are an assembly programmer, you open
the recorder and disassemble the code in the micro controller and now
you know how to record a telecast. That is the same as looking at
the generated assembly code of a C program. Wouldn't it be better to
understand the instruction manual (and force the author to give you
an understandable manual) instead of disassembling the micro controller?
> The problem is that there is not enough information in the code line to
> allow one to determine what the the programmer intended the code to do.
> It is this second, and much more important, level of readability which
> is so often lacking in HLL code.
You will always find the information "what" he wants to do (otherwise
the program wouldn't make any sense), but you will not find the
information how it is done at any lower level. And that's true for all
levels of programming. In an assembly program you may specify that you
want to add the content of two registers but you don't tell the CPU
how to do it (by a fast carry look ahead adder or maybe only micro coded).
And you really don't care about how the add is done as long as it is done
correctly. And the same is true for a "while" or "for" loop in a HLL. You
don't care about how the compiler does it as long as it does it correctly.
Microsoft also supports it. And I suppose any C compiler where at
least one logical thinking developer is involved will support it.
K&R seem to have been smart assembly programmers (the DEC/Motorola type
guys, not the Intel type boys) and they did it the logical way. But who
is "ANSI"? Did you ever see an assembly program written by "ANSI"?
But I don't think this is the reason for the warning. More likely it is
the implicit int declaration for the functions (what sense does it make to
define a default type if you then get a warning when you use it) and the
missing return value (why should I return something when I don't want
anything? Whatever is in eax is ok).
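For comparison, here is a sketch of the same two routines with ANSI prototypes and explicit return types, which is the form I'd expect a modern gcc to accept without complaint. The FILE * parameters and the _to/_from names are additions for illustration (the originals always use stdout and stdin), and the EOF check is also new.

```c
#include <stdio.h>

/* putstring with a prototype; "out" is an added parameter. */
void putstring_to(FILE *out, const char *s)
{
    while (*s)
        putc(*s++, out);
}

/* getstring with a prototype; "in" is an added parameter.
   Stores at most max characters plus the terminating 0. */
void getstring_from(FILE *in, int max, char *s)
{
    int j, c;               /* int, not char, so EOF can be told apart */

    for (j = 0; j < max; j++) {
        c = getc(in);
        if (c == EOF || c == '\n')
            break;
        *s++ = (char)c;
    }
    *s = '\0';
}
```

putstring_to(stdout, "Hello, ") and getstring_from(stdin, 79, name) then behave like the K&R versions for ordinary line input.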
>
> Herbert came back claiming this was "better readable":
>
> /**********************************************************************************/
> main()
> {char name[80];
> putstring ("Please tell me your name? ");
> getstring (79,name);
> putstring ("Hello, ");
> putstring (name);
> putstring ( "! Welcome to Linux Assembly!\n");
> }
>
> putstring(s) char *s;
> {while (*s) putchar(*s++);}
>
> getstring(i,s) int i; char *s;
> {int j; for (j=0; j<i; j++) if ((*s++ = getchar())=='\n') {s--; break;} *s=0;}
> /**********************************************************************************/
>
> I claim that this is a "preconceived notion". Swahili is "more readable"
> to a native Swahili speaker, I imagine. I don't doubt that this is
> "readable" to a "C speaker". Only Rosario could love those last couple
> lines, but the first part is pretty "readable", even to the "naive reader".
The "readability" was about "main" and not the subroutines "putstring/getstring".
To make the essential part readable, you have to hide the unimportant parts in
as few lines as possible.
If I wanted to make the subroutines "readable for an assembly programmer",
I would have written:
main()
{char name[80];
putstring ("Please tell me your name? ");
getstring (79,name);
putstring ("Hello, ");
putstring (name);
putstring ( "! Welcome to Linux Assembly!\n");
}
putstring(text) char *text;
{int pos=0;
loop1:
if (text[pos]==0) return;
putchar(text[pos]);
pos=pos+1;
goto loop1;
}
getstring(max,name) int max; char *name;
{int pos=0;
char c;
loop2:
c=getchar();
if (c=='\n') {name[pos]=0; return;}
name[pos]=c;
pos=pos+1;
if (pos < max) goto loop2; /* "<", not "<=": name[max] is kept for the final 0 */
name[pos]=0;
}
> We can write macros that make asm "look more like C". I could do:
>
> sys_write stdout, prompt, prompt_len
> sys_read stdin, name, MAXNAME
> mov [name_len], eax
> sys_write stdout, greet, greet_len
> sys_write stdout, name, dword [name_len]
> sys_write stdout, coda, coda_len
Sure you can, but what sense does it make? If you want it to look like a
HLL, why don't you use a HLL?
[portability of C across toolchains]
>>Superficially, at least. Observing the Nasm development team try to get
>>Nasm to compile on "any compiler" convinces me this is wishful thinking.
>
> Ooo, tell me more! I can use it to pester c.l.c. C is 100% portable
> dorks... :-)
Not nice to tease the differently abled. :)
A complete list would be beyond me. I recall some issue with "snprintf"
vs "_snprintf", for example. A grep for "GNUC" produces:
nasmgit/compiler.h:#ifdef __GNUC__
nasmgit/compiler.h:# if __GNUC__ >= 4
nasmgit/compiler.h:# define HAVE_GNUC_4
nasmgit/compiler.h:# if __GNUC__ >= 3
nasmgit/compiler.h:# define HAVE_GNUC_3
nasmgit/compiler.h:#ifdef __GNUC__
nasmgit/configure:#ifndef __GNUC__
nasmgit/configure:#ifndef __GNUC__
nasmgit/misc/scitech.mac:%ifdef __GNUC__
nasmgit/misc/scitech.mac:%ifdef __GNUC__
nasmgit/misc/scitech.mac:%ifdef __GNUC__
nasmgit/misc/scitech.mac:%ifdef __GNUC__
nasmgit/output/outmacho.c:#ifdef HAVE_GNUC_4
(the stuff in scitech.mac probably shouldn't count)
> Actually, I have my own set of rules that keep me out of my problem C
> programming areas, e.g., not using unions, or bitfields since older
> compilers failed to implement them well.
Sure, but you have to know what to not use.
>>Herbert's code compiles and works with gcc and MSC, but gcc is *not*
>>doing what he had in mind.
>
> K&R... or buffering...
The buffering was expected, I think (not clear to *me* that these
functions are buffered, but it's "well known", I guess). The "failure to
match expectations" was inline macro vs call to external function (I think).
>>Is this "okay"?
>
> Does it work?
Yup.
>>Really doesn't matter, but if
>>you *did* care (and I think Herbert *did* intend some specific code),
>>it's another pinprick in C's portability bubble.
>
> I usually state C is only 40% portable or so, and another 30% forced to look
> portable... Try saying that in c.l.c!
:) I'll pass. Sounds about right to me. C can be made portable, by
avoiding anything not actually implemented in a standard manner, or by
jumping through hoops with "#defines". Better'n assembly! But the notion
that C is automatically "portable" is merely a pleasant fantasy.
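The "snprintf" case shows the kind of hoop I mean. This is only a sketch, not what Nasm's compiler.h actually does: it assumes that Microsoft compilers older than Visual Studio 2015 (_MSC_VER below 1900) offer only _snprintf, and format_greeting is a made-up caller.

```c
#include <stdio.h>

/* Assumption: Microsoft compilers older than Visual Studio 2015
   (_MSC_VER < 1900) only provide _snprintf. Note that _snprintf does
   not guarantee a terminating 0 on truncation, so the two are not
   perfectly interchangeable. */
#if defined(_MSC_VER) && _MSC_VER < 1900
# define snprintf _snprintf
#endif

/* Hypothetical caller that is written once and compiles on both. */
int format_greeting(char *dst, size_t cap, const char *name)
{
    return snprintf(dst, cap, "Hello, %s! Welcome to Linux Assembly!\n", name);
}
```

The source looks portable, but only because somebody knew about the difference and papered over it.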
>>Herbert came back claiming this was "better readable":
> <snip>
>
>>> {int j; for (j=0; j<i; j++) if ((*s++ = getchar())=='\n') {s--;
>>>break;} *s=0;}
>
> ...
>>I don't doubt that this is
>>"readable" to a "C speaker".
>
> Is it really that hard?
I didn't actually say it was "hard". I just don't think it's inherently
"more readable". I suppose "[esi]" isn't inherently any clearer than
"*s", but it isn't any less so...
> Without testing (i.e., probably some errors...), I'd start by writing
> something like this:
>
> getstring:
> %define j_ eax
> %define i_ ebx
> %define s_ ecx
Well... okay...
> pop i_
> pop s_
???
> mov j_, 0
> for_00:
> if_00:
> call getchar ; return in eax
Okay, I guess... I'd expect it to return al - or possibly [e]ax, with -1
indicating error. I can't tell by looking at it what it's going to do -
I could never keep getc, getch, getchar, _getch, getkey,... straight.
> mov [s_],eax
???
> inc s_
put eax in it, and increment by one... it'll work, I guess...
> cmp [s_],byte 0x10 ; want LF 0x10 or CR 0x13 ?
I think we want 10 decimal, actually - dos/doze might want 13...
> jne if_00_end;
> if_00_body:
> dec s_
> jmp for_00_end ; break
> if_00_end:
> inc j_
> cmp j_,i_ ; correct order for j<i ?
I think so...
> jb for_00
but since "int j" is signed, probably want "jl". Shouldn't be an issue.
"<" means one thing if applied to "int" and another if applied to
"unsigned int", right?
> for_00_end:
> mov [s_],byte 0
>
> The only real issues I see for an assembly programmer is 1) where the
> comparison and increment in the for loop are placed and 2) understanding of
> where the body of the for loop is without {}. Admittedly, this is a fairly
> simple example. Next, I'd rework the branches, labels, replace 'mov ,0'
> with xor, fit the parameters to whatever prolog or register usage was in
> place, did I get the arg's to cmp correct, etc... getchar would likely be a
> small wrapper around whatever the OS actually uses.
AFAIK, the OS eventually gets to "read(stdin, buffer, 1)". This will
buffer until Enter is hit, then return, unless we tweak it - so the
wrapper may not be *that* small. In this particular case, "read(stdin,
name, MAXNAME)" would do exactly what we want. Herbert often uses a
"getc" and "putc" as "final routines" - I think to make porting
dos<->windows<->linux easy...
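The smallest such wrapper I can think of looks like this in C (raw_getc is a made-up name). It is a sketch only: on a terminal in its default line mode the first call still won't return until Enter is hit, which is the part that would need tweaking.

```c
#include <unistd.h>

/* Hypothetical minimal "getc": one read(2) per character.
   Returns the byte as 0..255, or -1 on end of input or error. */
int raw_getc(int fd)
{
    unsigned char ch;
    ssize_t n = read(fd, &ch, 1);

    return n == 1 ? (int)ch : -1;
}
```

It costs one system call per character, which is why reading the whole name with a single read() is the cheaper choice here.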
But the routine that caused the "surprise" (as I understand it) was this
routine.
putstring(s) char *s;
{while (*s) putchar(*s++);}
I'm not going to claim that it's "hard to read". The "naive" reading
would be "as long as we don't have a zero, call putchar and get the next
one". Which is what it does. But (I understand) Herbert expected
"putchar" to be a macro, to wit:
#define putchar(_c) putc((_c),stdout)
#define putc(_c,_stream) (--(_stream)->_cnt >= 0 \
? 0xff & (*(_stream)->_ptr++ = (char)(_c)) : \
_flsbuf((_c),(_stream)))
Looks inherently easier to read than asm to me... :)
And compile code like:
...
00000 a1 24 00 00 00 mov eax, DWORD PTR __iob+36
00005 48 dec eax
00006 a3 24 00 00 00 mov DWORD PTR __iob+36, eax
0000b 78 14 js SHORT $L201
[__iob+36] is presumably initialized to "maxcount" by... others... (and
[__iob+ 32] with "previous pointer", if any, or pointer to buffer)
0000d a1 20 00 00 00 mov eax, DWORD PTR __iob+32
00012 c6 00 0a mov BYTE PTR [eax], 10 ; 0000000aH
This isn't the "real code" - we'd be moving the character we fetched
from our string. And we'd be checking to see if it *was* a 10 (LF), in
which case we flush buffer and continue (with [__iob+32] and [__iob+36]
properly updated).
00015 a1 20 00 00 00 mov eax, DWORD PTR __iob+32
This looks "redundant", but makes it thread-safe, I guess.
0001a 40 inc eax
0001b a3 20 00 00 00 mov DWORD PTR __iob+32, eax
00020 c3 ret 0
In the "real code", we'd get the next character from our string, and repeat.
$L201:
00021 68 20 00 00 00 push OFFSET FLAT:__iob+32
00026 6a 0a push 10 ; 0000000aH
I think *this* 10 is stdout, not our LF, right???
00028 e8 00 00 00 00 call __flsbuf
0002d 83 c4 08 add esp, 8
00030 c3 ret 0
_main ENDP
Again... this is the "putchar(LF)", not the real code. In any case, an
external routine, __flsbuf, was intended to be called only when the
buffer was full. (this, I assume ends up calling write()?). Instead, he
got code that calls an external routine for every character. This
*doesn't* call the OS for every character, but does pretty much like the
above, but in a more "detoured" manner.
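To make the "inline fast path, external slow path" idea concrete, here is a self-contained sketch in C. It uses a made-up mini_file structure instead of the real _iobuf, and every mini_ name is hypothetical. The macro touches only the count and the pointer, and the out-of-line function runs only when the buffer is full.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the old _iobuf; the field names are mine. */
struct mini_file {
    char *ptr;              /* next free byte, like _ptr */
    int cnt;                /* free bytes left, like _cnt */
    int fd;                 /* descriptor the buffer drains to */
    unsigned flush_calls;   /* times mini_flsbuf was entered */
    char base[16];
};

void mini_init(struct mini_file *f, int fd)
{
    f->ptr = f->base;
    f->cnt = (int)sizeof f->base;
    f->fd = fd;
    f->flush_calls = 0;
}

/* Write out whatever is buffered and reset the buffer to empty. */
int mini_drain(struct mini_file *f)
{
    size_t used = (size_t)(f->ptr - f->base), off = 0;

    while (off < used) {
        ssize_t n = write(f->fd, f->base + off, used - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    f->ptr = f->base;
    f->cnt = (int)sizeof f->base;
    return 0;
}

/* Slow path, reached only when the count has run out: drain the
   buffer, then store the character that did not fit. */
int mini_flsbuf(int c, struct mini_file *f)
{
    f->flush_calls++;
    if (mini_drain(f) < 0)
        return -1;
    f->cnt--;
    *f->ptr++ = (char)c;
    return c & 0xff;
}

/* Fast path, expanded inline at every call site. */
#define mini_putc(c, f) \
    (--(f)->cnt >= 0 ? 0xff & (*(f)->ptr++ = (char)(c)) \
                     : mini_flsbuf((c), (f)))

void mini_putstring(const char *s, struct mini_file *f)
{
    while (*s)
        (void)mini_putc(*s++, f);
}
```

With the 16-byte buffer, a 20-character string enters mini_flsbuf exactly once; a final mini_drain() pushes out the remainder.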
The advantage of this is that we write "greet", "name", and "coda" in
*one* call to the OS instead of three. This is probably "worth it", no
matter how grand a tour of libc.so.6 we get. We can arrange this...
leave some space after the "prompt", read "name" after the prompt, and
copy "coda" after that, and then write it. Or we could emulate the
above, using our own buffer and homemade __iob-alike... pretty much the
same thing.
This looks to me like an application for "sys_writev". I wasn't aware
that this existed, until I saw Xlib using it (in an "strace" dump). It
takes as parameters a pointer to a memory area containing pointer,
length, pointer, length, ..., and a count of how many of 'em you've got.
I haven't looked at source code, it may just call write() for each item,
but it might be something to look into... (if I do, I'll probably post
it - you know me...)
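From C the same call is writev(2) with an array of struct iovec, each entry being one pointer/length pair. A sketch, with greet_writev as a made-up helper name:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send "Hello, " + name + coda with one writev(2). */
ssize_t greet_writev(int fd, const char *name, size_t name_len)
{
    static const char greet[] = "Hello, ";
    static const char coda[]  = "! Welcome to Linux Assembly!\n";
    struct iovec iov[3];

    iov[0].iov_base = (void *)greet;    /* pointer, length */
    iov[0].iov_len  = strlen(greet);
    iov[1].iov_base = (void *)name;     /* pointer, length */
    iov[1].iov_len  = name_len;
    iov[2].iov_base = (void *)coda;     /* pointer, length */
    iov[2].iov_len  = strlen(coda);

    return writev(fd, iov, 3);          /* total bytes written, or -1 */
}
```

greet_writev(1, "fred", 4) should show up in strace as a single writev call covering all three pieces.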
Best,
Frank
...
> This looks to me like an application for "sys_writev". I wasn't aware
> that this existed, until I saw Xlib using it (in an "strace" dump). It
> takes as parameters a pointer to a memory area containing pointer,
> length, pointer, length, ..., and a count of how many of 'em you've got.
> I haven't looked at source code, it may just call write() for each item,
> but it might be something to look into... (if I do, I'll probably post
> it - you know me...)
Okay, seems to work like I thought... I don't know if it's "better" in
any sense... Another sys_call in my sys_call bag, anyway...
Best,
Frank
; nasm -f elf hw2uv.asm
; ld -o hw2uv hw2uv.o
global _start
MAXNAME equ 256
section .text
_start:
nop ; for the debuggers
commence: ; ditto
mov eax, 4 ; __NR_write
mov ebx, 1 ; stdout
mov ecx, prompt
mov edx, prompt_len
int 80h
mov eax, 3 ; __NR_read
mov ebx, 0 ; stdin
mov ecx, namebuf ; buffer
mov edx, MAXNAME ; maximum to read
int 80h
dec eax ; length returned includes LF we don't want
mov [name_len], eax
mov eax, 146 ; __NR_writev
mov ebx, 1 ; stdout
mov ecx, my_vector ; ptr, len, ptr, len...
mov edx, 3 ; three items to write
int 80h
exit:
mov eax, 1 ; __NR_exit
xor ebx, ebx ; exit status 0 (ebx still held 1 from the writev call)
int 80h
section .data
prompt db "Please tell me your name? "
prompt_len equ $ - prompt
greet db "Hello, "
greet_len equ $ - greet
coda db "! Welcome to Linux Assembly!", 10
coda_len equ $ - coda
my_vector:
dd greet
dd greet_len
dd namebuf
name_len:
dd 0 ; fill in at runtime
dd coda
dd coda_len
section .bss
namebuf resb MAXNAME
;------------------------------
> I'm not sure what you want to say.
I am sorry if my words misled you.
"if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1)))" is not my
code. In fact, I don't even know who did write it. I was in the common
situation of a maintenance programmer who is expected to find, and fix
a bug reported by a user, long after the original programmer had left
the scene, and (in this case) without any formal specification for the
code.
My use of the phrase "what the cpu will do with the code . . ." was
not meant to imply any dropping down to any lower level, but rather to
refer to the logical steps which take place at run time. In this case,
testing each of the conditions in turn, until it can be determined if
the overall conditional statement is true or false. It is this literal
interpretation of the code which I think of as "low level readability".
Unfortunately, to fix the bug, this level of readability is too low to
be of much use. However, when combined with my previous assurance that
the bug does indeed lie in that line of code, I would expect that most
experienced C programmers could make a pretty good guess about what
the programmer overlooked.
Care to try your luck?
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
Well, the C parameters had to go somewhere... I didn't write the rest of
it. And, C usually puts them on the stack. It's easier than ebp or esp
addressing for a simple example.
> > mov j_, 0
> > for_00:
> > if_00:
> > call getchar ; return in eax
>
> Okay, I guess... I'd expect it to return al - or possibly [e]ax, with -1
> indicating error. I can't tell by looking at it what it's going to do -
> I could never keep getc, getch, getchar, _getch, getkey,... straight.
>
> > mov [s_],eax
>
> ???
If getchar returns the char in eax, this sets the char, *s, at the char
pointer or address, s, to char returned from getchar(), i.e., *s=getchar().
> > inc s_
>
> put eax in it, and increment by one... it'll work, I guess...
>
> > cmp [s_],byte 0x10 ; want LF 0x10 or CR 0x13 ?
>
> I think we want 10 decimal, actually - dos/doze might want 13...
>
> > jne if_00_end;
> > if_00_body:
> > dec s_
> > jmp for_00_end ; break
> > if_00_end:
> > inc j_
> > cmp j_,i_ ; correct order for j<i ?
>
> I think so...
>
> > jb for_00
>
> but since "int j" is signed,
Who said "int" is signed? IIRC, that's C compiler implementation issue.
But, yes, for the C compilers I've used it's likely signed...
> probably want "jl". Shouldn't be an issue.
> "<" means one thing if applied to "int" and another if applied to
> "unsigned int", right?
>
Still means "less than"... The range of values for the comparison is
different: signed_minimum to signed_maximum vs. 0 to unsigned_maximum.
The range values are in limits.h.
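A minimal sketch of the jb/jl point, assuming an x86 target. The two function names are made up for this illustration; the C operator is the same "<" in both, but the operand type decides which comparison the compiler has to generate (typically jl/jge for signed, jb/jae for unsigned).

```c
#include <limits.h>

/* Hypothetical helpers, named for this illustration only. */

/* Signed comparison: on x86 a compiler would typically use jl/jge. */
int less_signed(int a, int b)
{
    return a < b;
}

/* Unsigned comparison: on x86 a compiler would typically use jb/jae. */
int less_unsigned(unsigned int a, unsigned int b)
{
    return a < b;
}
```

With the same bit pattern for -1, less_signed(-1, 1) is true, while less_unsigned((unsigned int)-1, 1u) is false, because -1 converts to UINT_MAX. That is the difference between "jl" and "jb" in the hand translation above.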
> AFAIK, the OS eventually gets to "read(stdin, buffer, 1)".
For POSIX or *nix...
> This will
> buffer until Enter is hit, then return, unless we tweak it - so the
> wrapper may not be *that* small. In this particular case, "read(stdin,
> name, MAXNAME)" would do exactly what we want. Herbert often uses a
> "getc" and "putc" as "final routines" - I think to make porting
> dos<->windows<->linux easy...
Probably the most portable, but definitely slow. DJGPP, IIRC, uses 16k
buffers. I've found that I basically can max out Win98SE's disk throughput
on my old K6-2 500MHz machine with 4k buffers...
> But the routine that caused the "surprise" (as I understand it) was this
> routine.
>
> putstring(s) char *s;
> {while (*s) putchar(*s++);}
>
> I'm not going to claim that it's "hard to read". The "naive" reading
> would be "as long as we don't have a zero, call putchar and get the next
> one". Which is what it does. But (I understand) Herbert expected
> "putchar" to be a macro, to wit:
>
> #define putchar(_c) putc((_c),stdout)
> #define putc(_c,_stream) (--(_stream)->_cnt >= 0 \
> ? 0xff & (*(_stream)->_ptr++ = (char)(_c)) : \
> _flsbuf((_c),(_stream)))
>
> Looks inherently easier to read than asm to me... :)
>
> And compile code like:
<snip - assumed example code mapped nicely to assembly>
It's been a while since I looked at what the standards say about
implementing functions as macros, but as I remember it, everything that can
be implemented as a macro can also be implemented as a function... So, the
idea that they map nicely to assembly because they are or can be implemented
as macros is probably not correct.
Rod Pemberton
Given:
> if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
>
Given:
> result in "foo" being added to the symbol table twice, if and only if
> the optimization level is greater than one
pass0 1st_term 2nd_term
3 F -
2 F -
1 T X
0 F F
Implies X is F sometimes. The 2nd_term (!is_norm && !isextrn && (segment &
1)) is occasionally or always false when pass0 is one, otherwise 1st_term
(pass0==1) is irrelevant due to the logical or.
pass0 1st_term 2nd_term
3 F -
2 F -
1 T "F"
0 F F
Since the duplicate additions to the symbol table are the apparent result of
executing the if() body multiple times, the complete term must be false when
pass0 is 0 and only be true for a single pass0 value. This also implies
pass0 is being inc/decremented. Since the first term, pass0==1, is always
false for pass0>1, and the first term is true when pass0 is one, the only
way the if() can execute exactly twice, as stated, (instead of multiple
times) is if the second term is true for a _single_ pass0 value above one.
E.g., if pass0 starts with 3 and is decremented:
pass0 1st_term 2nd_term
3 F T
2 F F
1 T "F"
0 F F
i.e., the if() executes once when pass0==3, due to second term, and again
when pass0 is one, due to first term (pass0==1).
> The problem is that there is not enough information in the code line to
> allow one to determine what the programmer intended the code to do.
I believe the compound error was when the programmer added the first term,
"pass0==1", to compensate for an error in the second term. Perhaps he
intended to fix the error later. I believe the original error is in the
second term, i.e., the second term is sometimes or always false when pass0
is one.
Rod Pemberton
Not an implementation issue at all - it's in the standard in black
and white.
Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
>
> Unfortunately, to fix the bug, this level of readability is too low to
> be of much use. However, when combined with my previous assurance that
> the bug does indeed lie in that line of code, I would expect that most
> experienced C programmers could make a pretty good guess about what
> the programmer overlooked.
>
> Care to try your luck?
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1)))
In this term all variables are read only (no =, ++, --, ...).
If no variable is shared with another process or mapped to I/O, and
there are no side effects when a page fault occurs while accessing these
variables, then the order of evaluations shouldn't matter. If there
is a different behaviour with different optimization levels, then
I would say, the compiler is broken.
...
> Care to try your luck?
if (pass0 == 1 ||
(!is_norm && !isextrn && (segment & 1) && segment != -1))???
I'm definitely not smart enough for C, but sometimes I'm lucky...
Best,
Frank
Because the "culture" of UNIX (and Linux) is to use either C or Perl or
shell for everything. Most programmers using Linux/UNIX accept that. C
made UNIX what it is today and that made C what it is today. If C is
too low level for your taste you can always dabble with Perl and play
around with shell until you (and your system) are one big mess.
Seriously, assembler is used where it matters, like in GNU GMP,
cryptography and stuff. But vanishingly few people (particularly
outside the Wintel world) share your opinion that everything should be
written in assembly.
If anything the trend seems to be in the opposite direction with ever
more high-level languages, languages to create your own languages and
stuff. You must be a lonely man Betov...
> Because the "culture" of UNIX (and Linux) is to use either C or Perl or
> shell for everything. Most programmers using Linux/UNIX accept that. C
> made UNIX what it is today and that made C what it is today. If C is
> too low level for your taste you can always dabble with Perl and play
> around with shell until you (and your system) are one big mess.
> Seriously, assembler is used where it matters, like in GNU GMP,
> cryptography and stuff. But vanishingly few people (particularly
> outside the Wintel world) share your opinion that everything should be
> written in assembly.
> If anything the trend seems to be in the opposite direction with ever
> more high-level languages, languages to create your own languages and
> stuff. You must be a lonely man Betov...
Courage Rene ! :)
you/we are not alone at all, beside the merchant
ruled/programmer-convenient/
quick done/but bloated library collectors, we are forced to use too often,
there are more than just a few coders who still do it the so called hard
way.
Also game-designers start to ask questions about direct hardware access and
are willing to code in a 'back to the roots'-style for performance reason.
__
wolfgang
Okay. I think I have a different view of what's "essential". I'm
assuming that the "student" can figure out that we need to
putstring/getstring/putstring (or should change major). Although this
*is* the "essential" part of the program, in the context of "Iczelion
equivalent", how we *do* this is higher-than-normal on the "essential"
list, in my view.
I think Chuck makes a really good point about "kinds" of readability. For
a "real program", the reader needs to see *why* we're doing what we're
doing. Presumably, a moderately skilled user of whatever language can
figure out *what* the code does. Hiding the non-essentials is a Good
Thing (although I like to have the nitty-gritty "available"). But I
think a beginner needs to see more of this stuff, laid out as clearly as
we can manage. As your preference in assembler syntax illustrates,
"clearly" isn't always going to be the same thing for everybody.
More readable by orders of magnitude! Is this not more readable to
anyone other than an assembly programmer???
>>We can write macros that make asm "look more like C". I could do:
>>
>>sys_write stdout, prompt, prompt_len
>>sys_read stdin, name, MAXNAME
>>mov [name_len], eax
>>sys_write stdout, greet, greet_len
>>sys_write stdout, name, dword [name_len]
>>sys_write stdout, coda, coda_len
>
> Sure you can, but what sense does it make? If you want it to look like a
> HLL, why don't you use a HLL?
*I* don't want it to look more like a HLL. I thought *you* did. I was
just trying to oblige.
Anyway... if you wanted to demonstrate readability to *me* (not a very
worthwhile endeavor), you shoulda said it this way in the first place!
*I* find this much "better"!
Best,
Frank
> I believe the original error is in the
> second term, i.e., the second term is sometimes or always false when
> pass0 is one.
An excellent analysis, and if I had provided just a bit more diagnostic
info, I am sure that your conclusion would have been spot on, instead
of just close.
The error is indeed in the second term, which handles certain
local labels which are associated with odd numbered sections. [Such
sections have a special meaning within the assembler.] However, there
is another special case in which section number is passed as minus
one, to indicate that it is not associated with any specific section.
Therefore, the second term needs to also check that section is
positive.
It seemed to me that this error was close enough to the classic
problem of foo->bar segfaulting when foo == 0 that someone in this
forum would think to ask if section could possibly be negative, but I
guess my expectations were too high.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
> if (pass0 == 1 ||
> (!is_norm && !isextrn && (segment & 1) && segment != -1))???
That works, but I think that it is both more general and more readable
to do
if (pass0 == 1 ||
(!is_norm && !isextrn && (segment > 0) && (segment & 1)))
which emphasizes the fact that the condition is expecting segment to be
a positive number. In addition it guards against the (admittedly
unlikely) possibility that someday someone will attempt to handle some
other special case by passing section as -3.
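
A small sketch of the three variants of the second term discussed here, for comparison. The function names are hypothetical and only the second term is modelled; it assumes a two's complement machine, where (-1 & 1) and (-3 & 1) are both 1, which is why negative "special" values slip through the original test.

```c
/* Second term as originally written: any odd value passes, including -1. */
int second_term_original(int is_norm, int isextrn, int segment)
{
    return !is_norm && !isextrn && (segment & 1);
}

/* Variant that excludes only the one special value -1. */
int second_term_not_minus_one(int is_norm, int isextrn, int segment)
{
    return !is_norm && !isextrn && (segment & 1) && segment != -1;
}

/* Variant that insists on a positive, odd segment number. */
int second_term_positive(int is_norm, int isextrn, int segment)
{
    return !is_norm && !isextrn && (segment > 0) && (segment & 1);
}
```

For segment == -1 both corrected variants return 0, but for a hypothetical future special value of -3 only the (segment > 0) form still returns 0.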
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
> If no variable is shared with another process or mapped to I/O, and
> there are no side effects when a page fault occurs while accessing these
> variables, then the order of evaluations shouldn't matter.
It matters for performance. Most compilers are smart enough to
generate code that stops evaluating conditions as soon as the result is
determined. In this case, if pass0 == 1 then none of the other
conditions need to be tested. Likewise, if pass0 != 1 then evaluation
will stop as soon as any one of the remaining conditions is known to be
false.
> If there
> is a different behaviour with different optimization levels, then
> I would say, the compiler is broken.
The C compiler has never been an issue in this discussion. The
reference to optimization level in the problem description is the -O
command line switch which is passed to Nasm. However, perhaps you have
never used an assembler which allows such a setting, in which case it
is quite understandable why you were confused.
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
You'll have to expound, because I see _no_ statements in n1256 which
_explicitly_ state int is signed.
6.2.5 Types sub 5
"... A ''plain'' int object has the natural size suggested by the
architecture of the execution environment"
This doesn't claim the representation of int has to be signed.
In fact, it explicitly claims int should be the "natural size" that the
"architecture" supports. What if the "natural size" is unsigned? And, what
if the architecture only supports unsigned types?
6.2.5 Types sub 5 continues:
"...(large enough to contain any value in the range INT_MIN to INT_MAX as
defined in the header <limits.h>)."
Nor does this qualification claim the representation of int has to be
signed. It explicitly states that int must be _"large enough"_ to represent
any value of the range. An unsigned int can be "large enough" to represent
the necessary range (16-bits), i.e., you or the compiler can add or subtract
INT_MIN as necessary.
HTH,
Rod Pemberton
No, somewhere else does. Get reading. If you want to
pretend to be knowledgeable about the language, then you're
going to have to put the work in.
> In fact, it explicitly claims int should be the "natural size" that the
> "architecture" supports. What if the "natural size" is unsigned?
What a meaningless thing to say! Size and signedness are unrelated
concepts.
> And, what
> if the architecture only supports unsigned types?
Irrelevant. C is defined in terms of an abstract machine.
Even architectures which only have integer types have to
handle 'float' and 'double' types.
> HTH,
Didn't help you, it appears. You're still entrenched in your
view that you know what you're talking about. Alas you have
again demonstrated that you don't.
cc'ing this to the list... conversation started on a.l.a.
>>if (pass0 == 1 ||
>> (!is_norm && !isextrn && (segment & 1) && segment != -1))???
>
>
> That works, but I think that it is both more general and more readable
> to do
>
> if (pass0 == 1 ||
> (!is_norm && !isextrn && (segment > 0) && (segment & 1)))
Okay.
> which emphasizes the fact that the condition is expecting segment to be
> a positive number. In addition it guards against the (admittedly
> unlikely) possibility that someday someone will attempt to handle some
> other special case by passing section as -3.
Unlikely things happen!
I was just "trying things" - printed out some variables, noticed that
"segment -1" seemed to be "odd man out" (or should I say odd man in?),
and tried the above. To my astonishment, it solved the problem, and
didn't seem to mess up anything else "obvious" (to be determined).
I thought Rod's analysis was very good. I suspect he's right that the
bug was there *before* the "pass0" clause was added - haven't checked
it, doesn't matter...
I vote "yes please".
Best,
Frank
Don't you think it should've been in "6.2.5 Types" for n1256?
I can't even begin to imagine why they'd define a type somewhere other than
the section they specifically created for defining types... Does it
_really_ make sense to you that they would define a type somewhere other
than the section they titled "Types"? Given this perspective, don't you
find this claim of yours, "No, somewhere else [sic, it] does." to be
illogical?
n1256 (or other) is a big document and I learned C from other sources...
Respectable sources, K&R and C90 sources, but not the current spec. YOU
made the claim that "it's in the standard in black and white". Wow! ...in
BLACK and WHITE... What an amazing claim! He must know the section right
off the top of his head! Right?
You made the claim. Why can't you prove it? You've made _numerous_ other
claims - not just to me. You couldn't prove them either. So far, all your
claims seem baseless, unfounded, and worthless. What's your deal exactly?
I'm not saying all your claims are. Just the ones I've seen - so far. You
have a chance with this one. Perhaps...
What page number _is_ "somewhere else does"? In "black and white"... So,
it SHOULD'VE been a CINCH for you to prove me wrong, CORRECT? All you had
to do is post a page number or section number. It couldn't be more than
three to four characters max. Where IS your proof? Do you intend to POST
it? Even though I was faced with such a devastating (NOT!) potential
"wrong-ness," I still did you the courtesy of looking it up and made an
attempt to explain it to you... even though I knew from past dealings with
you that'd be a _failure_ before I posted.
Let's recap:
RP's thoughts: "The guy is sure of himself, or full of himself one... Like,
where exactly, man? I don't recall learning that. I don't recall reading
that. I'd like to see that for myself... Hmm, my memory isn't quite as
good as it used to be, but I'm pretty sure I recall reading C could be
implemented completely _without_ signed types. So, what is the fool talking
about?"
RP: "You'll have to expound,..."
PC's thoughts as perceived by RP's thoughts: "Uh, duh, umm, sheesh,
shouldn't have posted that, time to cover my ass... yet again. Darn, why do
I keep contradicting him? I know he's right... He never states anything
ever that he thinks is wrong."
PC: "No, somewhere else does."
RP's thoughts: "Huh? Is he drunk?..."
PC's thoughts as perceived by RP's thoughts: "Hope he buys that... Where's
my beer? I'm not quite blitzed yet. Heh, maybe I can get him to run around
in circles... for a while... like my dog chasing his bollocks. I'm such a
louse and sot..."
> > In fact, it explicitly claims int should be the "natural size" that the
> > "architecture" supports. What if the "natural size" is unsigned?
>
>Size and signedness are unrelated
> concepts.
True. The point was there may not be a signed "natural size" available to
use for an int... That would make a "natural size" int an _unsigned_ int.
Wouldn't it, Phil? It seems you don't always comprehend what is said or you
have an impaired ability to follow through logically. Or, perhaps, I have
an above average ability and am extra terse...
> > And, what
> > if the architecture only supports unsigned types?
>
> Irrelevant. C is defined in terms of an abstract machine.
For a virtual machine or interpreter, you can code whatever is needed, in
terms of characters and character pointers - if you wish or need be - to be
compliant. Since this section refers to fitting an int to the underlying
architecture and since one can "easily" fit it to a virtual machine or
interpreter, I can't possibly begin to understand why'd you even attempt to
argue anything about C's "abstract machine" here. The section _only_ makes
sense for a physical architecture. Don't you think so too? Your argument
seems completely illogical to me.
> Even architectures which only have integer types have to
> handle 'float' and 'double' types.
So? You only need characters and character pointers to do so...
> Didn't help you, it appears. You're still entrenched in your
> view that you know what you're talking about. Alas you have
> again demonstrated that you don't.
>
Okay, whatever... At six insults - none of which were necessary - my insult
limit has been exceeded by Phil "The Affronter - in black and white no less"
Carmody.
Rod Pemberton
This is true, but...
> > HTH,
>
> Didn't help you, it appears. You're still entrenched in your
> view that you know what you're talking about. Alas you have
> again demonstrated that you don't.
>
Couldn't you just step outside and piss your testosterone into the
snow??
Nathan.
This function has interesting usefulness. Think about Paul's
adventure game, for instance... lots of strings printing to the screen
at one time.
Nathan.
We are not speaking about performance. I think it is clear that the
optimization level has an influence on performance, otherwise this
switch would be pretty useless. Here your original statement:
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
However, as Frank recently brought to my attention, there is a bug in
this code line, the symptom of which is that statements such as
foo equ 42
result in "foo" being added to the symbol table twice, if and only if
the optimization level is greater than one.
What you said here is, that the logical behaviour of the program (not
the speed) is different depending on the used optimization level. And
my answer was, this is only possible with this code if the compiler
is broken.
> generate code that stops evaluating conditions as soon as the result is
> determined. In this case, if pass0 == 1 then none of the other
> conditions need to be tested. Likewise, if pass0 != 1 then evaluation
> will stop as soon as any one of the remaining conditions is known to be
> false.
Yes, and that can lead to different behaviour maybe for:
if (pass0 == 1 || (!is_norm && !isextrn && (segment++ & 1))) {
but not for the given:
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
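
A minimal sketch of that difference. The function name and the pointer parameter are made up for the illustration. As far as I know, C itself guarantees left-to-right, short-circuit evaluation of || and &&, so whether the increment happens depends only on the earlier terms, not on any compiler optimization switch.

```c
/* Hypothetical stand-in for the condition with "segment++".
   The increment is only executed if evaluation actually reaches it. */
int test_with_side_effect(int pass0, int is_norm, int isextrn, int *segment)
{
    return pass0 == 1 ||
           (!is_norm && !isextrn && ((*segment)++ & 1));
}
```

Called with pass0 == 1, or with is_norm set, *segment is left untouched; only when the first three tests all fall through does it get incremented. Without the ++ there is nothing for the evaluation order to change, which is the point made above.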
> > If there
> > is a different behaviour with different optimization levels, then
> > I would say, the compiler is broken.
>
> The C compiler has never been an issue in this discussion. The
If the C compiler isn't an issue, then I still don't believe that
there was a different behaviour for different optimization levels.
And if there was, I really would appreciate it if you could explain
what happened.
> reference to optimization level in the problem description is the -O
> command line switch which is passed to Nasm. However, perhaps you have
> never used an assembler which allows such a setting, in which case it
> is quite understandable why you were confused.
if (pass0 == 1 || (!is_norm && !isextrn && (segment > 0) && (segment & 1)))
If the problem was, that the special case "segment<0" produces the bug,
then this bug was also produced with optimization level 0. But such a bug
has nothing to do with readability of the code but forgetting one item
in a list.
But in the above code I wouldn't have added a "(segment > 0)" but just
used -2 as the special case.
> > The "readability" was about "main" and not the subroutines "putstring/getstring".
> > To make the essential readable, you have to hide the unimportant in as few
> > lines as possible.
>
> Okay. I think I have a different view of what's "essential". I'm
> assuming that the "student" can figure out that we need to
> putstring/getstring/putstring (or should change major).
One thing is to know what we have to do, and the other thing is
how easy it is for others to check what we have done.
Compare this one:
putstring ("Please tell me your name? ");
getstring (79,name);
putstring ("Hello, ");
putstring (name);
putstring ( "! Welcome to Linux Assembly!\n");
with this one:
mov ecx, prompt
mov edx, prompt_len
call write_stdout
call read_name
dec eax ; length returned includes LF we don't want
push eax ; save it for later
mov ecx, greet
mov edx, greet_len
call write_stdout
mov ecx, namebuf
pop edx ; retrieve the length
call write_stdout
mov ecx, coda
mov edx, coda_len
call write_stdout
and make your decision.
> I think Chuck makes a really good point about "kinds" of readability. For
> a "real program", the reader needs to see *why* we're doing what we're
> doing.
I really didn't understand what point Chuck made. What does forgetting
an item in a list have to do with the readability of code?
> > If I wanted to make the subroutines "readable for an assembly programmer",
> > I would have written:
putstring(text) char *text;
{int pos=0;
loop1:
if (text[pos]==0) return;
putchar(text[pos]);
pos=pos+1;
goto loop1;
}
> More readable by orders of magnitude! Is this not more readable to
> anyone other than an assembly programmer???
Do you really think the above 8 lines of code are easier to "read" than
the one line:
putstring(s) char *s; {while (*s) putchar(*s++);}
I think that's the same as using smilies like :) or words like
LOL in a posting. If you don't know the meaning, then it would
be much more "readable" if the symbols are replaced by a sentence
which express the meaning of the symbol. But if you know the
symbols, then a text where every occurrence of the symbol is
replaced by an appropriate sentence would make the complete text
pretty unreadable.
Or a "xor ecx,ecx" is less readable for a beginner than a
"mov ecx,0". But if you are used to it, then you read both simply
as "clear ecx". You can't compare readability if you know one
but not the other. You must know both; then you can say which
one has good and which one has bad readability.
> > Sure you can, but what sense does it make? If you want it to look like a
> > HLL, why don't you use a HLL?
>
> *I* don't want it to look more like a HLL. I thought *you* did. I was
> just trying to oblige.
The problem is, if you want it more readable, it automatically becomes
more HLL like.
> If the C compiler isn't an issue, then I still don't believe that
> there was a different behaviour for different optimization levels.
> And if there was, I really would appreciate it if you could explain
> what happened.
This is a complicated issue, but I'll try to keep it as simple as
possible. Because asm programmers don't like an assembler to mess with
their code, optimization in Nasm is basically limited to reducing the
size of the code by selecting the shortest form of a jump or call
instruction which can reach the target address. Since changing the
size of any given instruction changes the target address for all labels
following the change, this is done incrementally by multiple
optimization passes. The number of such passes is controlled by the
optimization level command switch. Thus, "-O5" means that the user is
willing to allow Nasm to attempt to optimize the size of the code as
long as the total number of passes does not exceed 5, whereas "-O0",
which is the default, means no optimization is allowed.
Since the values of all labels are subject to change on each pass, Nasm
provides both define label and redefine label routines. Although
the values in the internal symbol table may change on each pass, only
the final value is supposed to be passed to the platform specific
routines which build the symbol table in the object module.
The following trace from the code at nasm.c:1366 shows the interaction
of the various pass numbers for -O5. The first entry
calls define_label because passn is not > 1, which, in turn calls the
elf_deflabel routine because the condition (!is_norm && !isextrn &&
(segment & 1)) is met, thereby creating the 1st instance of the symbol.
The next three entries do not cause a call to elf_deflabel, because
redefine_label only makes such a call when pass0 == 1.
The final entry creates the duplicate instance of the symbol because
pass0 has been incremented to 1.
pass0: 0 pass1: 1 pass2: 1 passn: 1 vermsgl define_label
is_norm: 0 isextrn: 0 segment: -1
pass0: 0 pass1: 1 pass2: 2 passn: 2 vermsgl redefine_label
pass0: 0 pass1: 1 pass2: 2 passn: 3 vermsgl redefine_label
pass0: 0 pass1: 1 pass2: 2 passn: 4 vermsgl redefine_label
pass0: 1 pass1: 1 pass2: 2 passn: 5 vermsgl redefine_label
Please note that the special condition variables are only given for
calls to define_label, because redefine_label tests only for pass0 == 1.
If optimization is turned off, then the call to define_label with
pass0 == 0 does not occur, and therefore, the symbol is not duplicated.
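
The trace can be replayed with a toy model. This is only a sketch under stated assumptions, not Nasm's actual code: the function name is hypothetical, the first pass is treated as the define_label call and every later pass as redefine_label, and a two's complement machine is assumed so that (-1 & 1) is 1. The "fixed" flag adds the (segment > 0) check to the second term.

```c
/* Toy model of the trace above; NOT the real nasm.c logic.
   pass0[] holds the pass0 value seen on each pass, in order.
   Returns how many times the back end would be asked to define
   the symbol. */
int count_backend_definitions(const int *pass0, int npasses,
                              int is_norm, int isextrn, int segment,
                              int fixed)
{
    int count = 0;

    for (int i = 0; i < npasses; i++) {
        int emit;

        if (i == 0) {
            /* First pass: define_label tests the full condition. */
            int second = !is_norm && !isextrn && (segment & 1);
            if (fixed)
                second = second && (segment > 0);
            emit = (pass0[i] == 1) || second;
        } else {
            /* Later passes: redefine_label tests only pass0 == 1. */
            emit = (pass0[i] == 1);
        }
        count += emit;
    }
    return count;
}
```

Fed the pass0 column of the -O5 trace, {0, 0, 0, 0, 1}, with segment == -1, the model counts two definitions for the original condition and one with the (segment > 0) check. A single pass with pass0 == 1, which is my assumption for what happens with optimization off, counts one either way.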
> But such a bug
> has nothing to do with readability of the code but forgetting one item
> in a list.
It is very polite of you to suggest that the original programmer merely
"forgot" one item in a list, and none of us are trying to find someone
to blame, but the truth is that there are 29 separate source files in
the front end of the assembler, and another dozen for the various back
ends (about 21 megabytes, in total), and it is difficult for any one
person to be familiar with them all. Therefore, it is important that
the code be readable, not only in the trivial sense of the C language
specification, but in making clear what the code is intended to
contribute to the correct assembly of a user's program.
Readability has little or nothing to do with the programmer who wrote
the code, and everything to do with the people who (often years later)
have to try and figure out what the code was intended to accomplish.
The point of surfacing this bug was to demonstrate that the code line
containing the bug was virtually unreadable in terms of figuring out
how to fix the bug.
And I believe that this point has been well demonstrated by the fact
that neither you, nor anyone else in this forum, has even suggested that
they understand why this special condition is required. In fact, I
still don't fully understand it, myself.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
>> Seriously, assembler is used where it matters, like in GNU GMP,
>> cryptography and stuff. But vanishingly few people (particularly
>> outside the Wintel world) share your opinion that everything should
>> be written in assembly.
>
>> If anything the trend seems to be in the opposite direction with ever
>> more high-level languages, languages to create your own languages and
>> stuff. You must be a lonely man Betov...
>
> Courage Rene ! :)
> you/we are not alone at all, beside the merchant
> ruled/programmer-convenient/
> quick done/but bloated library collectors, we are forced to use too
> often, there are more than just a few coders who still do it the so
> called hard way.
> Also game-designers start to ask questions about direct hardware
> access and are willing to code in a 'back to the roots'-style for
> performance reason.
:))
Santosh is able to be a funny provocator, sometimes, you know...
Anyway, being alone, never made anybody wrong, and it is quite
funny to get such answers after such an amazing flow of demented
posts, in a discussion about a demential C expression, that
would have been so simple, and so readable in Assembly.
C language *implies* bugs, but i am 100% convinced that there
will never be any way back from this programming disaster. As
i am used to say, mythology is way stronger than any fact and
any demonstration. Not a reason to submit. Better dead than C.
Betov.
Better dead than C.
Try D. Maybe you like it more than C (if I remember right, it includes all
the OO features of C++).
To say it in one sentence: you are not speaking about the optimization
switch in the C compiler statement when compiling the NASM source
code but about the optimization switch in the NASM command line when
assembling an assembly source code.
But everything you wrote until now led to the impression that you meant
the compiler -O switch, for example:
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
It matters for performance. Most compilers are smart enough to
generate code that stops evaluating conditions as soon as the result is
determined. In this case, if pass0 == 1 then none of the other
conditions need to be tested. Likewise, if pass0 != 1 then evaluation
will stop as soon as any one of the remaining conditions is known to be
false.
> And I believe that this point has been well demonstrated by the fact
> that neither you, nor anyone else in this forum, has even suggested that
> they understand why this special condition is required. In fact, I
> still don't fully understand it, myself.
When speaking of the -O switch of NASM and not of the C compiler, your
question doesn't make much sense. It would be the same as saying, there
is a bug in the line
if (level>4) {...
and then you tell us, the correct line would be
if ((unsigned int)level >4) {..
and that nobody was able to see this bug. There is neither a bug
in the above line nor in
if (pass0 == 1 || (!is_norm && !isextrn && (segment & 1))) {
it's only that the code doesn't match the specification of the parameters.
But if you only present the code without the specification then it
is impossible to find a "bug", because then there is no bug.
I do. And it is. Stop being thick.
> I can't even begin to imagine why they'd define a type somewhere other than
> the section they specifically created for defining types... Does it
> _really_ make sense to you that they would define a type somewhere other
> than the section they titled "Types"? Given this perspective, don't you
> find this claim of yours, "No, somewhere else [sic, it] does." to be
> illogical?
Read again what you wrote.
> n1256 (or other) is a big document and I learned C from other sources...
> Respectable sources, K&R and C90 sources, but not the current spec. YOU
> made the claim that "it's in the standard in black and white". Wow! ...in
> BLACK and WHITE... What an amazing claim! He must know the section right
> off the top of his head! Right?
I do not detect any valid syllogism in the above.
> You made the claim. Why can't you prove it? You've made _numerous_ other
> claims - not just to me. You couldn't prove them either. So far, all your
> claims seem baseless, unfounded, and worthless. What's your deal exactly?
> I'm not saying all your claims are. Just the ones I've seen - so far. You
> have a chance with this one. Perhaps...
It's a shame that you could be so close, and yet so far.
That's all the clue I'm giving you. Wanna learn the language -
put in the work.
> What page number _is_ "somewhere else does"? In "black and white"... So,
> it SHOULD'VE been a CINCH for you to prove me wrong, CORRECT?
It is. I'm just toying with you. I'm winding you up. I do
this because I know I am correct, that you are wrong, and
I know that winds you up terribly.
> All you had
> to do is post a page number or section number. It couldn't be more than
> three to four characters max. Where IS your proof? Do you intend to POST
> it? Even though I was faced with such a devastating (NOT!) potential
> "wrong-ness," I still did you the courtesy of looking it up and made an
> attempt to explain it to you... even though I knew from past dealings with
> you that'd be a _failure_ before I posted.
>
> Let's recap:
Let's not.
> > > In fact, it explicitly claims int should be the "natural size" that the
> > > "architecture" supports. What if the "natural size" is unsigned?
> >
> >Size and signedness are unrelated
> > concepts.
>
> True. The point was there may not be a signed "natural size"
That is a meaningless statement.
What would be meaningful would be "the native size of the
architecture may not have any native operations which treat
the type as if it were signed". But that's not what you said.
> available to
> use for an int... That would make a "natural size" int an _unsigned_ int.
> Wouldn't it, Phil?
No it wouldn't. C is defined in terms of the behaviour of an
abstract machine. C cares not whether the native types can
intrinsically be treated as signed or unsigned. As long as
it can support the computational model specified in the C
standard, then that's fine. If a simple add turns into 20
operations with loops and conditional jumps, that's fine.
> It seems you don't always comprehend what is said or you
> have an impaired ability to follow through logically. Or, perhaps, I have
> an above average ability and am extra terse...
No, you're gibbering meaningless phrases, and you're wrong.
> > > And, what
> > > if the architecture only supports unsigned types?
> >
> > Irrelevant. C is defined in terms of an abstract machine.
>
> For a virtual machine or interpreter, you can code whatever is needed, in
> terms of characters and character pointers - if you wish or need be - to be
> compliant. Since this section refers to fitting an int to the underlying
> architecture and since one can "easily" fit it to a virtual machine or
> interpreter, I can't possibly begin to understand why'd you even attempt to
> argue anything about C's "abstract machine" here. The section _only_ makes
> sense for a physical architecture. Don't you think so too? Your argument
> seems completely illogical to me.
Sizes - yes, architecture. Signedness - no, that's not
dependent on the architecture at all. It can have native
signed operations, and simulate unsigned ones, or native
unsigned ones, and simulate signed ones, or simply have
them both. Did you miss the decade of discussions about
sign-magnitude numbers vs. 2s-complement ones? It's the
same discussion. It matters not how the behaviour is
implemented, all it has to do is follow the behaviour
specified for the abstract machine. The abstract machine
is 100% relevant to me.
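As a rough illustration of "native unsigned ones, and simulate signed ones", here is a C sketch that uses only unsigned operations and treats the bit patterns as two's-complement values. The function names are invented for this example, and two's complement is an assumption of the sketch, not something the abstract machine requires.

```c
#include <limits.h>

/* Bit pattern with only the top bit of an unsigned int set. */
#define SIGN_BIT (~(UINT_MAX >> 1))

/* Two's-complement negation done with unsigned arithmetic only. */
unsigned int twos_negate(unsigned int a)
{
    return ~a + 1u;
}

/* Signed "less than" on two's-complement bit patterns: flipping the
   sign bit maps signed order onto unsigned order. */
int signed_less(unsigned int a, unsigned int b)
{
    return (a ^ SIGN_BIT) < (b ^ SIGN_BIT);
}
```

Addition and subtraction need no extra work at all in this representation; only comparison, division and widening have to care about the sign.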
By some better logic somewhere. Walmart?
> > Even architectures which only have integer types have to
> > handle 'float' and 'double' types.
>
> So? You only need characters and character pointers to do so...
No signed? So? You only need unsigned to do so...
You really didn't think about that retort long enough, did you.
> > Didn't help you, it appears. You're still entrenched in your
> > view that you know what you're talking about. Alas you have
> > again demonstrated that you don't.
>
> Okay, whatever... At six insults - none of which were necessary - my insult
> limit has been exceeded by Phil "The Affronter - in black and white no less"
> Carmody.
Hehehe. Yes, I do take affront at people who are willing to
spend much longer composing incorrect or meaningless posts
than they are verifying the correctness of their stance.
However, it is painful seeing you thrash around, so I'll
put you out of your misery:
6.2.5 Types, paragraph 4. See what I mean about how close you were?
And don't try and pull any 'the signed modifier distributes over
commas in a list' bullshit.
The minute he stops making incorrect statements, I'll stop
correcting them.
> But if you only present the code without the specification then it
> is impossible to find a "bug", because then there is no bug.
So, as long as the Nasm development team does not publish a detailed
code specification, then there are no bugs in our product? As a
developer, I love this concept -- but I don't think the users will
accept it.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html
I already know the language. I have for over 16 years. I've already "put
in the work." Perhaps, you need to begin to understand that you're the
ignorant one...
> > What page number _is_ "somewhere else does"? In "black and white"... So,
> > it SHOULD'VE been a CINCH for you to prove me wrong, CORRECT?
>
> It is. I'm just toying with you. I'm winding you up.
It isn't. And, so am I. Don't you find it odd that every time I make some
_odd_ claim about C that I always _know_ the spec. sections to back it up?
Don't you find it odd that the spec. _always_ agrees with me? Well, perhaps
you should...
> What would be meaningful would be "the native size of the
> architecture may not have any native operations which treat
> the type as if it were signed". But that's not what you said.
>
That is what was said... to those who have some comprehension. I have no
choice but to assume you have a certain level of intelligence or else you
wouldn't be commenting on these issues... I could lower the intellectual
output just for you, but no one ever knows when you're going to pop in on
some conversation, and that causes a flood of other posts complaining about
missing details...
> > available to
> > use for an int... That would make a "natural size" int an _unsigned_ int.
> > Wouldn't it, Phil?
>
> No it wouldn't.
False.
> C is defined in terms of the behaviour of an
> abstract machine.
True.
> C cares not whether the native types can
> intrinsically be treated as signed or unsigned.
True.
> As long as
> it can support the computational model specified in the C
> standard, then that's fine. If a simple add turns into 20
> operations with loops and conditional jumps, that's fine.
True.
But, the above statements indicate that you're just like Alexei Frounze:
confusing the C model and the underlying types.
> No, you're gibbering meaningless phrases, and you're wrong.
That would be you. Perhaps you should check Google and make sure someone
isn't hacking your posts...
> > > > And, what
> > > > if the architecture only supports unsigned types?
> > >
> > > Irrelevant. C is defined in terms of an abstract machine.
> >
> > For a virtual machine or interpreter, you can code whatever is needed, in
> > terms of characters and character pointers - if you wish or need be - to be
> > compliant. Since this section refers to fitting an int to the underlying
> > architecture and since one can "easily" fit it to a virtual machine or
> > interpreter, I can't possibly begin to understand why'd you even attempt to
> > argue anything about C's "abstract machine" here. The section _only_ makes
> > sense for a physical architecture. Don't you think so too? Your argument
> > seems completely illogical to me.
>
> Sizes - yes, architecture. Signedness - no, that's not
> dependent on the architecture at all.
Correction 1) The signedness doesn't have to use the architecture type, but
it can... that's what 6.2.5 sub 5 is for.
Correction 2) The sizes aren't dependent on the architecture either, but one
can implement them using the native types...
This just confirms for me that you don't know what you're talking about.
> Hehehe. Yes, I do take affront at people who are willing to
> spend much longer composing incorrect or meaningless posts
> than they are verifying the correctness of their stance.
The latter part of that describes you perfectly. Isn't that what _numerous_
people have said to you? Do I need to go look the posts up for you? Do you need
some psychological help? Do you understand that I find it really odd and
creepy that you insistently project your personality traits onto me?
> However, it is painful seeing you thrash around, so I'll
> put you out of your misery:
>
> 6.2.5 Types, paragraph 4. See what I mean about how close you were?
>
No. Not at all. I'm completely serious. I am. Do you see what I mean
about failure to comprehend? You DON'T? Here, I'll explain the spec. to
you - including your precious 6.2.5 sub 4. (Did you _honestly_ think I
hadn't read it when I posted 6.2.5 sub 5? Shaking my head in disbelief...)
6.2.5 sub 4
"4 There are five standard signed integer types, designated as signed char,
short int, int, long int, and long long int. (These and other types may be
designated in several additional ways, as described in 6.7.2.) There may
also be implementation-defined extended signed integer types. The standard
and extended signed integer types are collectively called signed integer
types."
Nowhere does that state that int is _required_ to be signed. It explicitly
states that five types are _normally_ implemented as signed, e.g.,
"standard". Here's the definition of "standard" for this context:
standard - "an object that is regarded as the usual or most common size
or form of its kind"
See? "standard" does not mean "required," but "usual..." And, don't try to
give me the bullshit that common English definitions aren't used by the
spec. One author of the spec., Douglas A. Gwyn, explicitly stated that they
are.
Then, it goes on to describe sections which place other requirements on the
listed types. Nor does the section describe how an int should be
represented. That's 6.2.5 sub 5:
6.2.5 sub 5
"5 An object declared as type signed char occupies the same amount of
storage as a ''plain'' char object. A ''plain'' int object has the natural
size suggested by the architecture of the execution environment (large
enough to contain any value in the range INT_MIN to INT_MAX as defined in
the header <limits.h>)."
THIS describes how an int should be represented: as the "natural size
suggested by the architecture of the execution environment."
> And don't try and pull any 'the signed modifier distributes over
> commas in a list' bullshit.
Never heard of that. But, coming from you, I'll definitely accept that it's
bullshit due to your lack of comprehension...
Rod Pemberton
Like it or not, I didn't make an incorrect statement. Please try to learn
something from me for once...
Rod Pemberton
Awright, awright, awright...
; nasm -f elf hw2uyane.asm
; ld -o hw2uyane hw2uyane.o
%include "yanetut.inc"
putstring "Please tell me your name? "
getstring 79,name
putstring "Hello, "
putstring name
putstring "! Welcome to Linux uhhh... Assembly!", linefeed
putstring "BTW, the answer is "
mov ecx, 42
mov ebx, 1
add ebx, ecx
; HLLp, I can't read this part!!!
putnumber ebx
putstring ", not "
putnumber ecx
putstring ". Deep Thought was off-by-one.", linefeed
depart victorious
I was in the midst of composing this last night, when the power blinked
out. When the power came back on, the net didn't, so I had some time on
my hands and threw in putnumber, for the hell of it. Sorry for
complicating the situation...
Sure, it's "readable", in one sense. So is HLA. (we don't "begin", just
"depart" :) Is it an "Iczelion equivalent for Nasm/Linux"? I wouldn't say so.
> with this one:
>
> mov ecx, prompt
> mov edx, prompt_len
> call write_stdout
>
> call read_name
> dec eax ; length returned includes LF we don't want
> push eax ; save it for later
>
> mov ecx, greet
> mov edx, greet_len
> call write_stdout
>
> mov ecx, namebuf
> pop edx ; retrieve the length
> call write_stdout
>
> mov ecx, coda
> mov edx, coda_len
> call write_stdout
>
> and make up your decision.
In terms of "how to do it in Linux", I'm *learning* more from the
latter! If it *is*, in fact, harder to read... "no pain, no gain".
>>I think Chuck made a really good point about "kinds" of readability. For
>>a "real program", the reader needs to see *why* we're doing what we're
>>doing.
>
>
> I really didn't understand what point Chuck made up. What has forgetting
> an item in a list to do with readability of a code?
We really didn't explain that situation clearly. Actually, I don't think
the "problem" with seeing the bug, is a "readability" problem, either in
the "what" or the "why" sense. (what machine code was generated by the
compiler would be of interest only to a guy like me - when we execute
the "if" clause is the issue). Not entirely intuitively, "segment" comes
to us in odd ("segment & 1" or "test eax, 1" - which is more readable???)
positive values, which we *do* want to process, and even positive
values, which... "mean something different" (got me!) and we skip 'em
(in this clause). And then... there's -1 with "special" meaning - we
want to skip it, but it's odd! That's the "item on the list" that the
original coder (Simon, I would bet) missed (only became a "bug" when we
went multipass). A lot of other people have missed it too, in the
meantime. As you observed, "#define NO_SEG -2L" would have fixed it.
Which is more "readable" (in the "why" sense) depends on whether you're
reading nasm.h or labels.c (a general problem with "modular" code, IMO).
I favor Chuck's solution, making it obvious (???) in labels.c. Other
labels - pass0, passn, pass1 and pass2 (local to nasm.c), and pass?
names local to other functions aren't as "self-explanatory" as they
might be - if I knew what to call 'em, I'd say.
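To make the "-1 is odd" problem concrete, here is a small C sketch of just the parity part of the condition. The function names and the NO_SEG_SKETCH macro are made up for illustration, the pass0, is_norm and isextrn terms are left out, and the guarded variant is only one plausible way to write the fix, not necessarily what went into labels.c. It also assumes a two's-complement machine, where -1 has its low bit set.

```c
/* Stand-in for the thread's "#define NO_SEG -1L"; the name is hypothetical. */
#define NO_SEG_SKETCH (-1L)

/* Parity test as it appears in the quoted condition: (segment & 1).
   On two's complement, -1 is "odd", so NO_SEG slips through. */
int early_original(long segment)
{
    return (segment & 1) != 0;
}

/* Assumed guard: only positive odd segment values qualify. */
int early_guarded(long segment)
{
    return segment > 0 && (segment & 1);
}
```

Note that with -2L as the null value the unguarded test would also skip it, which is the alternative fix mentioned above.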
>>>If I wanted to make the subroutines "readable for an assembly programmer",
>>>I would have written:
>
>
> putstring(text) char *text;
> {int pos=0;
>
> loop1:
> if (text[pos]==0) return;
> putchar(text[pos]);
> pos=pos+1;
> goto loop1;
> }
>
>
>>More readable by orders of magnitude! Is this not more readable to
>>anyone other than an assembly programmer???
>
>
> Do you really think the above 8 lines of code are easier to "read" than
> the one line:
>
> putstring(s) char *s; {while (*s) putchar(*s++);}
Yes!
> I think that's the same as using smilies like :) or words like
> LOL in a posting. If you don't know the meaning, then it would
> be much more "readable" if the symbols are replaced by a sentence
> which express the meaning of the symbol. But if you know the
> symbols, then a text where every occurrence of the symbol is
> replaced by an appropriate sentence would make the complete text
> pretty unreadable.
Rosario finds his stuff not only "readable", but "beautiful". Tastes differ!
> Or a "xor ecx,ecx" is for a beginner less readable than a
> "mov ecx,0". But if you are used to it, then you read both simply
> as "clear ecx". You can't compare readability if you know one
> but not the other. You must know both, then you can say which
> one has a good and which one has a bad readability.
I know C poorly, so I can't judge. I think I can see which is "more like
Iczelion"...
>>>Sure you can, but what sense does it make? If you want it to look like a
>>>HLL, why don't you use a HLL?
>>
>>*I* don't want it to look more like a HLL. I thought *you* did. I was
>>just trying to oblige.
>
> The problem is, if you want it more readable, it automatically becomes
> more HLL like.
Readable in the "why" sense, maybe. In the "what" sense, the less HLL
like, the better. The goal is not to acquire the user's name in the
most "readable" fashion, the goal is to "show" how to do it in Linux, in
assembly. I really don't see that the above code does that - either the
C or the HLL-like (?) asm.
Here's some C. Is this readable?
#include <stdio.h>
char
*T="IeJKLMaYQCE]jbZRskc[SldU^V\\X\\|/_<[<:90!\"$434-./2>]s",
K[3][1000],*F,x,A,*M[2],*J,r[4],*g,N,Y,*Q,W,*k,q,D;X(){r [r
[r[3]=M[1-(x&1)][*r=W,1],2]=*Q+2,1]=x+1+Y,*g++=((((x& 7)
-1)>>1)-1)?*r:r[x>>3],(++x<*r)&&X();}E(){A||X(x=0,g =J
),x=7&(*T>>A*3),J[(x[F]-W-x)^A*7]=Q[x&3]^A*(*M)[2 +(
x&1)],g=J+((x[k]-W)^A*7)-A,g[1]=(*M)[*g=M[T+=A ,1
][x&1],x&1],(A^=1)&&(E(),J+=W);}l(){E(--q&&l ()
);}B(){*J&&B((D=*J,Q[2]<D&&D<k[1]&&(*g++=1 ),
!(D-W&&D-9&&D-10&&D-13)&&(!*r&&(*g++=0) ,*
r=1)||64<D&&D<91&&(*r=0,*g++=D-63)||D >=
97&&D<123&&(*r=0,*g++=D-95)||!(D-k[ 3]
)&&(*r=0,*g++=12)||D>k[3]&&D<=k[ 1]
-1&&(*r=0,*g++=D-47),J++));}j( ){
putchar(A);}b(){(j(A=(*K)[D* W+
r[2]*Y+x]),++x<Y)&&b();}t ()
{(j((b(D=q[g],x=0),A=W) ),
++q<(*(r+1)<Y?*(r+1): Y)
)&&t();}R(){(A=(t( q=
0),'\n'),j(),++r [2
]<N)&&R();}O() {(
j((r[2]=0,R( ))
),r[1]-=q) &&
O(g-=-q) ;}
C(){( J=
gets (K
[1]))&&C((B(g=K[2]),*r=!(!*r&&(*g++=0)),(*r)[r]=g-K[2],g=K[2
],r[
1]&&
O())
);;}
main
(){C
((l(
(J=(
A=0)
[K],
A[M]
=(F=
(k=(
M[!A
]=(Q
=T+(
q=(Y
=(W=
32)-
(N=4
))))
+N)+
2)+7
)+7)
),Y=
N<<(
*r=!
-A))
);;}
It's "pretty", I'll give ya that. To understand the "shape", you'd have
to run it. It takes input from stdin, and emits it to stdout, in the
form of little ascii-art cartoons of guys waving semaphore flags around
(a method of communication used by navies... but not lately). Found it
on an obfuscated C site, so it's not intended to be readable. I saved it
as "anderson.c" - I assume that's the name of the guy who wrote it. More
than that, I do not recall. I think it's even cooler than morse code! :)
Oh... you'll probably want the "hide_my_shit.inc" file for the above
code - "yanetut.inc". The name might suggest that it's intended to be a
"tutorial" of some kind. It is not! You Are Not Expected To find it
"readable", either. Unlike a certain website named after a dictionary, I
think this is a horrible way to "show" assembly. Tastes vary.
Best,
Frank
; yanetut.inc - to be included immediately before
; the start of code.
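linefeed equ 10

; putstring: write a quoted string (plus optional extra bytes) or a
; zero-terminated buffer to stdout. All registers are preserved.
%macro putstring 1+
%ifstr %1
    section .data
%%str       db %1
%%strlen    equ $ - %%str
    section .text
    pusha
    mov ecx, %%str          ; buffer
    mov edx, %%strlen       ; length known at assembly time
%else
    section .text
    pusha
    mov ecx, %1             ; buffer
    or edx, byte -1         ; edx = -1, so the first inc gives 0
%%getlen:
    inc edx
    cmp byte [ecx + edx], 0 ; scan for the terminating zero
    jnz %%getlen
%endif
    section .text
    call write_stdout
    popa
%endmacro

; getstring: reserve %1 bytes at label %2, read from stdin into it and
; overwrite the last byte read (normally the linefeed) with zero.
%macro getstring 2
    section .bss
%2: resb %1
    section .text
    pusha
    mov edx, %1             ; maximum length
    mov ecx, %2             ; buffer
    xor ebx, ebx            ; fd 0 = stdin
    push byte 3             ; sys_read
    pop eax
    int 80h
    mov byte [ecx + eax - 1], 0
    popa
%endmacro

; putnumber: print a 32-bit value as unsigned decimal.
%macro putnumber 1
    push dword %1
    call showdec
%endmacro

victorious  equ 0
abject      equ -1

; depart: exit with the given status.
%macro depart 1
    mov ebx, %1             ; exit status
    push byte 1             ; sys_exit
    pop eax
    int 80h
%endmacro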
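
; write_stdout: ecx = buffer, edx = length. Clobbers eax and ebx.
write_stdout:
    push byte 1             ; fd 1 = stdout
    pop ebx
    push byte 4             ; sys_write
    pop eax
    int 80h
    ret

; read_stdin: ecx = buffer, edx = maximum length. Returns the count in eax.
read_stdin:
    xor ebx, ebx            ; fd 0 = stdin
    push byte 3             ; sys_read
    pop eax
    int 80h
    ret

showdec: ; stdcall
    pusha
    mov eax, [esp + 4 + 32] ; argument: past the return address and pusha frame
    mov ecx, esp            ; digits are stored downward from here
    sub esp, byte 16        ; scratch space for up to 10 digits
    mov ebx, 10
    xor esi, esi            ; digit count
.top:
    dec ecx
    xor edx, edx
    div ebx                 ; eax = quotient, edx = remainder
    add dl, '0'
    mov [ecx], dl
    inc esi
    test eax, eax
    jnz .top
    mov edx, esi            ; length; ecx already points at the first digit
    call write_stdout
    add esp, byte 16
    popa
    ret 4                   ; callee removes the argument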
global _start
section .text
_start:
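For readers who find the showdec routine in yanetut.inc hard to follow, the same idea can be sketched in C: divide by 10 repeatedly and store the remainders backwards from the end of a small scratch buffer, so the digits come out in the right order. The names to_decimal and DEC_BUF are invented for this sketch; it is a rough equivalent, not a translation Frank posted.

```c
#include <stddef.h>
#include <stdint.h>

enum { DEC_BUF = 16 };  /* same scratch size the asm reserves on the stack */

/* Writes the decimal digits of value into the tail of buf (no terminator),
   stores the digit count in *len and returns a pointer to the first digit. */
const char *to_decimal(uint32_t value, char buf[DEC_BUF], size_t *len)
{
    char *p = buf + DEC_BUF;               /* one past the end, like ecx = old esp */
    size_t n = 0;

    do {
        *--p = (char)('0' + value % 10);   /* remainder becomes an ASCII digit */
        value /= 10;
        n++;
    } while (value != 0);                  /* same exit test as "test eax, eax" */

    *len = n;
    return p;
}
```

The returned pointer and length are exactly what write_stdout wants in ecx and edx; a 32-bit value never needs more than 10 of the 16 bytes.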
OK, you're right, it doesn't say that it's "required" to be
signed, it simply states, in a normative part of the spec.,
that it _is_ signed.
Have it your way, I refuse to join you on planet woo-woo.
Join the cranks in the killfile, you've passed the entry test
with flying colours.
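What "int is a signed integer type" means in practice can be checked with a few lines of C. This is a sanity check on one implementation rather than a proof about the standard, and the function names are invented for the sketch. It leans on <limits.h>, where the standard requires INT_MIN to be -32767 or lower on every conforming implementation.

```c
#include <limits.h>

/* Plain int must be able to represent negative values. */
int int_min_is_negative(void)
{
    return INT_MIN < 0;
}

/* The standard's minimum magnitude for INT_MIN is 32767. */
int int_min_meets_minimum(void)
{
    return INT_MIN <= -32767;
}

/* A plain int holding -1 compares below zero; an unsigned type would not. */
int minus_one_is_negative(void)
{
    int x = -1;
    return x < 0;
}
```

All three return 1 on a conforming implementation, whatever the hardware does underneath.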
this is false in general
the readability and the optimal language depends
> In terms of "how to do it in Linux", I'm *learning* more from the
> latter! If it *is*, in fact, harder to read... "no pain, no gain".
Sure, that's why I always say, that people should do assembly programming
for learning and HL programming for writing applications. But for
learning you should do assembly programming at the lowest level (I even
prefer to do it without a linker). And when you want to make the
step to writing applications, in my opinion it is not the correct way
to just add some HL constructs to the assembler, but you should switch
to a real HLL (but not forgetting your assembly knowledge when using
the HLL).
But in this sub thread we are no longer talking about "Linux / NASM equivalent
of Iczelion's Win32 assembly tut's" but your condition ("If that doesn't do
it, I'm switchin' to C! :)") for switching to C. All I said is, C is worth
a try even if this condition isn't met.
> As you observed, "#define NO_SEG -2L" would have fixed it.
> Which is more "readable" (in the "why" sense) depends on whether you're
> reading nasm.h or labels.c (a general problem with "modular" code, IMO).
> I favor Chuck's solution, making it obvious (???) in labels.c.
I don't think you have a free choice. Somewhere near the definition of
NO_SEG there should be some definition like:
/* odd: process
even: don't process */
or
/* positive odd: process
negative or even: don't process */
And depending on this documentation you have to choose the correct code.
And if there is no such documentation, then NASM source is in a poor state.
> I don't think you have a free choice.
Although it was difficult to tell, for a while, the emergence of Nasm
2.0 has shown that Nasm is still an evolving product, and therefore the
standard is whatever the developers (in our collective wisdom)
say it is. Thus, subject to review by the rest of the team, both Frank
and I have the ability to add, delete, or modify comments whenever we
feel that the changes make the code more readable.
> Somewhere near the definition of
> NO_SEG there should be some definition like:
NO_SEG is a global constant, defined in nasm.h as
"#define NO_SEG -1L /* null segment value */".
Since it is intended to be totally outside the realm of even and odd
segment values, it would be highly misleading to place a comment
relating to a localized exception to an exception anywhere near the
global definition.
> /* odd: process
> even: don't process */
>
> or
>
> /* positive odd: process
> negative or even: don't process */
Both of these comments are highly misleading, no matter where they are
placed. The normal operation of the code line in question is to process
symbols for both odd and even segments on the final pass. The exception
condition is to process certain special symbols related to odd
numbered segments on an earlier pass. The incorrect behavior is that
certain symbols which are not related to any section get processed once
during the first optimization pass, and again on the final pass.
What the define_label code needs is a comment that points out which
special symbols need to be processed by the output driver before
the final pass, and perhaps why. And if I ever take the time to figure
it out, I will add such a comment.
--
Chuck
http://www.pacificsites.com/~ccrayne/charles.html