xor.asm(31) : error A2070: invalid instruction operands
Here's the code: http://pastebin.com/cU0eQDmb
The highlighted line is the one that causes the error.
I have no idea what could be wrong.
Any help would be appreciated.
Thanks in advance,
xarti
The line in question is
xor eax,Key[edx]
where Key is defined (via db) later in the code area. Just
for a test, try moving Key up into the .data area ahead of
KeyLength. (That's what the .data area is there for.) I
have sometimes seen MASM gag on certain forward references,
so this will remove that possible problem.
You can move Payload to the .data area as well. When you
later want to operate on file data, you can have Payload be
in the .data? area and load it from the file. If the file
may be longer than the Payload length, you can allocate
memory dynamically (GlobalAlloc under Windows), or you can
read it into Payload one chunk at a time.
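The chunk-at-a-time idea above is easy to model outside of assembly. Here's a hedged Python sketch (illustrative only, not MASM; function names are made up) showing that XORing a file in chunks, while carrying the key position across chunk boundaries, gives the same result as XORing the whole buffer at once:

```python
# Sketch of "read it into Payload one chunk at a time": XOR-transform data
# chunk by chunk instead of loading it whole. The key index wraps around,
# just like the key iterator in the posted assembly.

def xor_stream(data: bytes, key: bytes, key_pos: int = 0):
    """XOR each byte of data with the key, cycling through the key.
    Returns the transformed bytes and the key position to resume from."""
    out = bytearray()
    for b in data:
        out.append(b ^ key[key_pos])
        key_pos = (key_pos + 1) % len(key)
    return bytes(out), key_pos

def xor_in_chunks(chunks, key):
    """Process a sequence of chunks, carrying the key position across
    boundaries so the result matches a single whole-buffer pass."""
    pos = 0
    result = bytearray()
    for chunk in chunks:
        piece, pos = xor_stream(chunk, key, pos)
        result.extend(piece)
    return bytes(result)
```

Because XOR is its own inverse, running the same routine over the ciphertext with the same key restores the plaintext, which is the whole basis of the OP's scheme.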
One other thing to be careful of is confusing the offset of
a location with the contents at that location. Normally
under MASM when you use a variable name alone, it is
interpreted as the *contents* of that location. If you want
to refer to the offset, you would use an explicit OFFSET in
the command. So you should change your KeyLength equate to
KeyLength equ $ - OFFSET Key
to conform to MASM standards. (Though in this case it may
be OK as is, since an equate can't use memory contents and
MASM may be smart enough to "do what you want", it's best to
get into a rigorous habit with this.) Again, Key should
appear in .data ahead of KeyLength to ensure MASM is happy.
Please note that I have made no attempt to check the logic
of your code, just looked for possible MASM issues.
Best regards,
Bob Masta
DAQARTA v5.10
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, FREE Signal Generator
Pitch Track, Pitch-to-MIDI
DaqMusic - FREE MUSIC, Forever!
(Some assembly required)
Science (and fun!) with your sound card!
Masm won't run on my OS, but my guess was that Masm "remembers" that
"Key" was "db", and therefore was taking "Key[edx]" to be a byte, which
he's trying to xor with a dword register. The comment says "byte", and
the logic calls for a byte - I think he wants al here (and a few other
places), not eax.
> You can move Payload to the .data area as well.
You notice he's executing Payload, right?
Hmmmm... If I'd looked at that code before approving the message, I
might have set it aside for Nathan's opinion before approving it. We
don't do malware here! But since I did approve it, and since it *could*
be legitimate...
> When you
> later want to operate on file data, you can have Payload be
> in the .data? area and load it from the file. If the file
> may be longer than the Payload length, you can allocate
> memory dynamically (GlobalAlloc under Windows), or you can
> read it into Payload one chunk at a time.
If that's his intention...
> One other thing to be careful of is confusing the offset of
> a location with the contents at that location. Normally
> under MASM when you use a variable name alone, it is
> interpreted as the *contents* of that location. If you want
> to refer to the offset, you would use an explicit OFFSET in
> the command. So you should change your KeyLength equate to
> KeyLength equ $ - OFFSET Key
> to conform to MASM standards. (Though in this case it may
> be OK as is, since an equate can't use memory contents and
> MASM may be smart enough to "do what you want", it's best to
> get into a rigorous habit with this.) Again, Key should
> appear in .data ahead of KeyLength to ensure MASM is happy.
In Nasm, that expression would need to immediately follow "Key" to work
as intended...
> Please note that I have made no attempt to check the logic
> of your code, just looked for possible MASM issues.
Well, I can't check Masm issues, but I wondered what Payload decrypted
into. I mutilated the code into something Nasm/Linux would assemble/run.
I'm not going to execute Payload, thanks, so I modified it to spit the
decrypted Payload to stdout, captured the file, and disassembled it. I
don't get "sensible" code out of it, malicious or otherwise. Perhaps I
did something wrong, perhaps there's something wrong with the original
code, but I still can't determine the "intent".
So why don't you tell us, xarti, what are you up to?
Best,
Frank
To sum up what I have written: my intentions are good, and I'm just
looking for knowledge.
xarti
Okay. I apologize for doubting you. I still think it's odd code for a
beginner to tackle...
My disassembly starts off making sense...
push 0 ; MB_OK
jmp getcaption
gotcaption:
pop eax
jmp gettitle
gottitle:
pop ebx
push eax
push ebx
push 0
call... well, I suppose it's MessageBox...
push 0
call... well, I suppose it's ExitProcess
At this point, I'd expect to see:
getcaption:
call gotcaption
db 'sample caption', 0
gettitle:
call gottitle
db 'sample text', 0
Or similar (I'm making up names, obviously). But at this point, my
attempt to decrypt goes all to pieces. Doesn't "make sense" as code, and
doesn't appear to be ascii text. Quite possibly my error, not yours - I
butchered your code some. :)
Did you get past making Masm do "xor eax, Key[edx]"?
I ASSumed you really wanted al there. Maybe why my code doesn't work...
Maybe "xor eax, dword ptr Key[edx]"? (a "cast"?) Try what Bob suggests,
too...
Once you get by that, you may have trouble writing the decrypted byte
back to the code section. I don't think Windows will let you do that
(Linux definitely doesn't!). There are "linker tricks" that will allow
the code section to be writeable... saw it fairly recently, but
unfortunately, I forget the syntax...
Again, sorry to have doubted your motives...
Best,
Frank
I don't use Masm so these are just guesses based on what other
assemblers may or may not expect:
1. Key is the only label without a colon - does it need one?
2. Key is preceded by a comma (at the end of the Payload db
definition). Why is that there?
3. Do you need the complicated semicolon constructs? For example,
could you replace
db 00bh, 064h, 08dh, 07fh, 032h, 0d2h, 015h,\ ;
069h, 022h, 039h, 00fh, 064h, 0d8h, 055h,\ ;
with
db 00bh, 064h, 08dh, 07fh, 032h, 0d2h, 015h
db 069h, 022h, 039h, 00fh, 064h, 0d8h, 055h
By the way, you should be commended for a clear description of the
problem. :-)
James
>Hi,
>I'm new to assembler and to this group by the way. I'm learning to
>code in masm and I wanted to write simple code encryption. Based on
>the XOR cipher I made a cipher payload and now I'm trying to decipher
>it on the fly, but I get this error:
>
>xor.asm(31) : error A2070: invalid instruction operands
>
>Here's the code: http://pastebin.com/cU0eQDmb
>The highlighted line is the one that causes the error.
>I have no idea what could be wrong.
MASM is partially type-aware. It knows that Key is defined as a byte
array, and you're trying to use it as a dword here. The comments talk
about handling a BYTE at a time; if that's really what you want to do, then
you need to be using the byte-wide registers, not the dword-wide registers:
mov al, Payload[ecx]
cmp al, 090h
....
xor al, Key[edx]
mov Payload[ecx], al
Since ecx and edx move in lockstep here, there's really no reason to burn
two registers for them, is there?
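As a cross-check of that logic (in Python rather than MASM, with a made-up key and payload), the byte-at-a-time loop Tim sketches might look like this; the 090h NOP sentinel and the key wrap-around follow the posted code's apparent intent:

```python
# Python model of the corrected byte-wise loop: one index walks Payload,
# the key index wraps at the key length, and 0x90 (NOP) ends the loop.
# (Note Frank's caveat elsewhere in the thread: a ciphertext byte could
# itself be 0x90 and stop the loop early.)
def decipher(payload: bytearray, key: bytes) -> None:
    ecx = 0                           # payload index ("main iterator")
    edx = 0                           # key index ("key iterator")
    while payload[ecx] != 0x90:       # cmp al, 090h / je done
        payload[ecx] ^= key[edx]      # xor al, Key[edx] -- byte-wide!
        ecx += 1
        edx += 1
        if edx >= len(key):           # reuse Key from the beginning
            edx = 0
```

Encrypting some plaintext with the same key and appending 0x90, then running this, restores the plaintext in place.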
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
As I understand it, edx goes back to zero and "Key" is reused from the
beginning... (I'm not sure "KeyLength" is correctly calculated, as shown...)
Best,
Frank
However, the payload is miscalculated due to an error in my C program.
Once I fix that, I hope it will work fine. :)
And normally the code is of no use, because the .code section is
read/execute only.
I've managed to make it writable; however, I won't post the details, as
I've noticed they can be used as a start for making malware, and people
around here don't like stuff like that.
Big thanks to everybody, and I'm going on in my search for knowledge :)
I need to mention that the included payload is miscalculated, due to an
error in my C program.
I'm going to do it again and hope everything will be fine :)
However, normally the program is of no use, because in Windows (same as
in Linux) the code section is only readable and executable. I've managed
to make it writable, but I won't post the details since I've noticed
they can help someone start learning how to make malware. And maybe I'm
new in this group, but I realised that people around here don't like
malware.
Big thanks to everybody, and I'm moving further on my search for
knowledge :)
Thanks,
xarti
>Once you get by that, you may have trouble writing the decrypted byte
>back to the code section. I don't think Windows will let you do that
>(Linux definitely doesn't!). There are "linker tricks" that will allow
>the code section to be writeable... saw it fairly recently, but
>unfortunately, I forget the syntax...
I posted this info on 19 Jul 2010 under the "Code
self-change" thread. Here it is again:
<quote>
Under Windows (using MASM32) it's actually trivial to combine
code and *initialized* data segments. You just put
everything in .code, and don't use a .data segment (only
.data? for uninitialized). Then tell the linker with
/SECTION:.text,ERW
I think you can run code in allocated memory as well, by
telling Windows what you want to do, but I've never needed
that.
<end quote>
<snip>
>However, normally the program is of no use, because in Windows (same as
>in Linux) the code section is only readable and executable. I've managed
>to make it writable, but I won't post the details since I've noticed
>they can help someone start learning how to make malware. And maybe I'm
>new in this group, but I realised that people around here don't like
>malware.
See my reply to Frank in this same thread. My method
involves the way you set up your code and data sections, and
the instructions to the linker. It's not something that
would be useful to malware writers who want to tamper with
existing code.
Please note that self-modifying code is important for apps
that use self-decryption or self-decompression. It's a
perfectly valid technique, even if the pedants denigrate it
like the venerable GOTO in BASIC. <g>
Good! (if true)
> I need to mention that the included payload is miscalculated, due to
> an error in my C program.
Okay... I'll stop trying to decrypt it. :)
> I'm going to do it again and hope everything will be fine :)
Hope and change, eh? (aka "trial and error") :)
> However, normally the program is of no use, because in Windows (same
> as in Linux) the code section is only readable and executable. I've
> managed to make it writable, but I won't post the details since I've
> noticed they can help someone start learning how to make malware. And
> maybe I'm new in this group, but I realised that people around here
> don't like malware.
That's true. At least *I* don't like malware - a lot! However, this
isn't malware, so it isn't an issue. Bob has (re-)posted the syntax -
knew I'd seen it recently... I imagine it's in the Friendly Manual. I
may have mentioned that in Linux, with Nasm, I can do "section .text
write", and Nasm will make it writeable in the .o file, but ld (the
linker) "knows" that .text is supposed to be read-only and changes it
back. I imagine there's a way to override that... and I imagine it's in
the Friendly Manual. "section kode write exec" solves the problem...
Now that I think of it, a writeable code section is probably more of a
vulnerability to malware, than an attribute of it. As Bob points out,
there are legitimate reasons to do it.
> Big thanks to everybody and I'm moving further on my search for
> knowledge :)
Okay. One good thing to know is that code can "work" and still have bugs
in it...
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
.data
Key db "adfkj902rjed0f8j230fjewf" ;key for deciphering
KeyLength equ $- offset Key
James picked up that "Key" doesn't have a colon after it, but "Payload"
does. That *will* make Masm treat it differently, I think!
.data?
Buffer db ?
Unused. Doesn't hurt.
.code
start:
jmp Go
You don't actually have to do this. Your entrypoint is where the start
label (specified by "end"!) is. It doesn't have to be the first thing in
your code section. It is permissible - fairly common - to put
subroutines here, without having to jump over them. Just put your
"start" label down where "Go" is.
Payload:
db 00bh, 064h, 08dh, 07fh, 032h, 0d2h, 015h,\ ;
069h, 022h, 039h, 00fh, 064h, 0d8h, 055h,\ ;
038h, 06ah, 032h, 059h, 030h, 08eh, 04ch,\ ;
065h, 077h, 061h, 08ch, 081h, 094h, 095h,\ ;
0c6h, 063h, 053h, 01fh, 01ah, 009h, 001h,\ ;
010h, 025h, 059h, 01ah, 046h, 05ah, 05fh,\ ;
008h, 06ah, 08dh, 0a1h, 09eh, 09bh, 099h ;
db 038h, 00bh, 054h, 040h, 05eh, 017h, 06ah,\ ;
090h, ;
Go:
mov ebx, KeyLength ;setting the Keylength
mov ecx, 0 ;zeroing the main iterator
mov edx, 0 ;zeroing the key iterator
No longer used for this purpose...
lea eax, Key
Okay...
Cipher:
cmp Payload[ecx], 090h ;checking if the byte isn't the NOP
je Payload ;if so jump to Payload
While you've placed 90h (nop) to mark the end of Payload, some byte in
the "plain text" payload could xor into 90h, no? Possible "false end"?
cmp ebx, edx ;checking if the key iterator didn't reach the length limit
jg Skip ;if not the skip zeroing the next instruction
ebx holds KeyLength. edx holds... who knows, in this version of the
code? Was the "key iterator". This is going to jump if ebx is greater
than edx...
lea eax, Key ;zeroing the key iterator
Skip:
mov edx, [eax]
But here we use edx to hold four bytes of "Key"...
xor Payload[ecx], edx ;xoring the byte of payload
... and xor *four* bytes of payload with it. It is possible to work four
bytes at a time, but...
inc ecx ;Iterator++
... you only go to the next byte of "Payload" for the next go-round.
Unless this undoes whatever you did to create "Payload" in the first
place...
I really think you want to be using an 8-bit register to get a single
byte of "Key" and xor it with a single byte of "Payload". I suspect
that's what Masm was complaining about in the first place (see Tim's post).
inc edx ;KeyIterator++
Not anymore...
inc eax
jmp Cipher ;returning to the beggining of the loop
end start
Trying to use edx for two purposes definitely looks like a problem to
me! The rest of it I'm less sure of.
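The "false end" worry mentioned above is easy to demonstrate numerically: for any key byte k there is exactly one plaintext byte, k XOR 0x90, whose ciphertext is 0x90, so a sentinel collision is always possible. A quick Python check (illustrative only, using the key string from the posted code):

```python
# For any key byte k, the plaintext byte (k ^ 0x90) encrypts to exactly
# 0x90, so using 0x90 (NOP) as an end marker can stop the loop early.
key = b"adfkj902rjed0f8j230fjewf"   # the key from the posted code

for k in key:
    colliding_plain = k ^ 0x90      # plaintext byte that would masquerade
    assert (colliding_plain ^ k) == 0x90   # ... as the NOP terminator
```

Keeping an explicit length counter instead of an in-band sentinel avoids the problem entirely.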
I'm a little puzzled how (if) this is going to work, anyway. I'm
ASSuming "Payload" was created by xoring a MessageBox executable with
your "Key", and that the executable was made position independent by
employing a "trick" to find the addresses of "caption" and "text". When
we come to "call MessageBox", the assembler will emit E8 xx xx xx xx.
But unless I'm mistaken, the xx xx xx xx is filled in by the linker. The
Linker knows the address, 'cause it put MessageBox there. When we
assemble and link *this* code, how does the linker know to put
MessageBox in the right place? What am I missing?
Best,
Frank
I got the code to assemble, but it brings up my debugger when run.
The jmp Cipher makes it an endless loop.
<snip>
>James picked up that "Key" doesn't have a colon after it, but "Payload"
>does. That *will* make Masm treat it differently, I think!
MASM does not require a colon after the label in a DB (or
DW, DD, etc) line, probably since the usage is clear from
the DB itself. You do need the colon to label code,
otherwise MASM tries to treat the label like an instruction.
I don't suppose it would *hurt* to use a colon with DB (the
"belt and suspenders" approach?), but I've never done it
myself so I can't say for certain.
...
> MASM does not require a colon after the label in a DB (or
> DW, DD, etc) line, probably since the usage is clear from
> the DB itself. You do need the colon to label code,
> otherwise MASM tries to treat the label like an instruction.
Since you know something about Masm syntax, can you say whether the
<comma> <backslash> <semicolon> has a useful effect in the following,
which was taken from the OP's post? I presume the semicolon starts a
nonexistent comment and is unnecessary, but what of the other two? Are
both needed, and are they there to convey the length of Payload? I
couldn't find the info on MSDN, which seemed to be present more to say
there was documentation online than to provide a useful reference about
Masm. :-(
Payload:
db 00bh, 064h, 08dh, 07fh, 032h, 0d2h, 015h,\ ;
069h, 022h, 039h, 00fh, 064h, 0d8h, 055h,\ ;
038h, 06ah, 032h, 059h, 030h, 08eh, 04ch,\ ;
065h, 077h, 061h, 08ch, 081h, 094h, 095h,\ ;
0c6h, 063h, 053h, 01fh, 01ah, 009h, 001h,\ ;
010h, 025h, 059h, 01ah, 046h, 05ah, 05fh,\ ;
008h, 06ah, 08dh, 0a1h, 09eh, 09bh, 099h ;
db 038h, 00bh, 054h, 040h, 05eh, 017h, 06ah,\ ;
090h, ;
Why is the second DB split off? Might Masm relocate it so it doesn't
follow the first in memory? What about the comma at the end of the
last DB; will it have any effect?
Don't worry if no one knows. I've no pressing reason to change to use
Masm and am just curious.
James
Good catch, and good question! The backslash is just a line
continuation character, and would indeed allow something
like
mov ecx,SIZEOF Payload
(The OP did not use that in the original code.) The
semicolon is indeed superfluous here.
>Why is the second DB split off? Might Masm relocate it so it doesn't
>follow the first in memory? What about the comma at the end of the
>last DB; will it have any effect?
The second DB would *not* be part of Payload.
>Don't worry if no one knows. I've no pressing reason to change to use
>Masm and am just curious.
Best regards,
<Snippage>
Hi,
I made a MASM listing to see what was happening. Hope
things don't wrap too badly.
010C Payload:
010C 0B 64 8D 7F 32 D2 db 00bh, 064h, 08dh, 07fh, 032h, 0d2h, 015h,\ ;
15 69 22 39 0F 64
D8 55 38 6A 32 59
30 8E 4C 65 77 61
8C 81 94 95 C6 63
53 1F 1A 09 01 10
25 59 1A 46 5A 5F
08 6A 8D A1 9E 9B
99
069h, 022h, 039h, 00fh, 064h, 0d8h, 055h,\ ;
038h, 06ah, 032h, 059h, 030h, 08eh, 04ch,\ ;
065h, 077h, 061h, 08ch, 081h, 094h, 095h,\ ;
0c6h, 063h, 053h, 01fh, 01ah, 009h, 001h,\ ;
010h, 025h, 059h, 01ah, 046h, 05ah, 05fh, ; < experiment.
008h, 06ah, 08dh, 0a1h, 09eh, 09bh, 099h ;
013D 38 0B 54 40 5E 17 db 038h, 00bh, 054h, 040h, 05eh, 017h, 06ah,\ ;
6A 90
090h, ;
0145 RESULT DW SIZEOF Payload
test25a.asm(25) : error A2143: expected data label
The backslash seems to act as a line continuation character
in MASM 6+. MASM 5.0 didn't like it. An experiment seems
to imply that the trailing comma can work by itself. That also
generates an error in 5.0.
The colon after Payload means it is a code label and not
a data label, so SIZEOF does not work.
>Why is the second DB split off? Might Masm relocate it so it doesn't
>follow the first in memory? What about the comma at the end of the
>last DB; will it have any effect?
Doesn't seem to cause anything untoward.
Regards,
Steve N.
there is actually no secret in coding self-modifying code,
neither in windoze nor in Loonix. Both use a flat 'unsegmented'
model; here DS == CS, so pray tell how writing via DS: and
executing the stuff at CS: could invoke any exception ...
paging may show a difference in accessing via CS: vs DS:, but when
both point to the same address, what the heck should happen?
> And maybe I'm new in this group but I realised that people
> around here don't like malware.
we all here started once as newbies, at least new to news-group
and the regular posters (almost variable) behaviour :)
> Big thanks to everybody and I'm moving further on my search for
> knowledge :)
be welcome to ask away whatsoever might not be clear enough ...
Vendor-docs and other help-sources aren't always written by
folks who know (at all, or just not in detail) or, worse,
often hide behind abstractions like C-Libs and driver DLLs.
__
wolfgang
Pagefault, of course :) paging is the new segmentation !
Memory management of modern OSes and MS-Windows-NT+ uses 386+ paging
hardware to do all sorts of neat things like page sharing, demand
paging, copy-on-write and other performance & memory "improvements"
so the bloatware doesn't run as slowly as it ought to.
SMC is complex on these systems because code pages (usually 4kB)
are marked read-only so they can be shared. Any attempt to
write to those pages will segfault. To do SMC (the Linux kernel
used some trampoline code), you have to write the code on a page
marked r/w (most likely CoW) like .data . More modern systems
with execute-bits should avoid using .stack.
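The "write the code on a page marked r/w, then run it" step can be sketched at the user level too. This is a hedged Linux-only illustration via ctypes (the PROT_* values are the Linux constants; this is not the kernel trampoline code mentioned above): allocate an anonymous read/write page, store a byte of machine code, then ask the kernel to flip the page to read+exec.

```python
# Sketch: the runtime counterpart of the r/w-code-page discussion.
# Allocate an anonymous page read+write, write into it, then flip it to
# read+exec with mprotect -- the step an SMC scheme must get right
# before jumping into freshly written code. (Linux-specific values.)
import ctypes
import mmap

PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4   # Linux protection bits

libc = ctypes.CDLL(None, use_errno=True)
page = mmap.mmap(-1, mmap.PAGESIZE,
                 prot=mmap.PROT_READ | mmap.PROT_WRITE)
page[0] = 0xC3                     # a 'ret' opcode, stored while writable

addr = ctypes.addressof(ctypes.c_char.from_buffer(page))
rc = libc.mprotect(ctypes.c_void_p(addr), ctypes.c_size_t(mmap.PAGESIZE),
                   PROT_READ | PROT_EXEC)   # 0 on success
```

Actually jumping into the page (e.g. via ctypes.CFUNCTYPE) is left out here; the point is only that the protection change is an explicit request to the OS, which is exactly what a read-only .text section denies by default.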
-- Robert
> Pagefault, of course :) paging is the new segmentation !
would this mean that I can't have two identical mapped pages with
different attributes ? at least in my own Os ? ;)
> Memory management of modern OSes and MS-Windows-NT+ uses 386+ paging
> hardware to do all sorts of neat things like page sharing, demand
> paging, copy-on-write and other performance & memory "improvements"
> so the bloatware doesn't run as slowly as it ought to.
:)
> SMC is complex on these systems because code pages (usually 4kB)
> are marked read-only so they can be shared. Any attempt to
> write to those pages will segfault. To do SMC (the Linux kernel
> used some trampoline code), you have to write the code on a page
> marked r/w (most likely CoW) like .data . More modern systems
> with execute-bits should avoid using .stack.
yes, allocate DATA and write/modify/execute within this block
was the way it worked on winXP when I last tried.
If an Os inhibits execution in data pages then it may need
twice as much memory for tiny DLLs and short applications.
__
wolfgang
and here I have to thank all who sold/sell heavily bloated code;
this really made memory chips faster, larger and cheaper.
Doch! the page attributes are part of the LDT entry, and
you can have different attributes for the same physical RAM.
This is a lot like common data pages and needs careful protection
(semaphore/spinlock mutex). You might not want some other process
or processor accessing SMC before the code was completely modified
-- the results might be indistinguishable from MS-Windows :)
> If an Os inhibits execution in data pages then it may need
> twice as much memory for tiny DLLs and short applications.
Nowhere have I heard about inhibiting execution on .data pages.
Only on .stack to change the style of stack-overflow exploits.
Eliminating exec on .data pages makes interpreters like Java much
more difficult -- they have to be run as emulators. But I wouldn't
be surprised to see it in some return of [un]"Trusted Computing".
-- Robert
>wolfgang kern <now...@never.at> wrote in part:
>>
>> would this mean that I can't have two identical mapped pages
>> with different attributes ? at least in my own Os ? ;)
>
>Doch! the page attributes are part of the LDT entry, and
>you can have different attributes for the same physical RAM.
I assume you meant to say "PTE", not "LDT". The LDT, which is not used by
modern 32-bit operating systems, doesn't have any page attributes.
>>>> paging may show a difference in accessing by CS: vs DS:, but
>>>> when both point to the same address, what the heck should happen?
>>> Pagefault, of course :) paging is the new segmentation !
>> would this mean that I can't have two identical mapped pages
>> with different attributes ? at least in my own Os ? ;)
> Doch! the page attributes are part of the LDT entry, and
> you can have different attributes for the same physical RAM.
LDT ?
> This is alot like common data pages and needs careful protection
> (semaphore/spinlock mutex). You might not want some other process
> or processor accessing SMC before the code was completely modified
I think nobody with a halfway sane brain would try SMC on 'current
hot' code parts like IRQ-routines or other CPU's active code ;)
> -- the results might be indistinguishable from MS-Windows :)
an Os doesn't need to know everything an application does ...
>> If an Os inhibits execution in data pages then it may need
>> twice as much memory for tiny DLLs and short applications.
> Nowhere have I heard about inhibiting execution on .data pages.
> Only on .stack to change the style of stack-overflow exploits.
I didn't check in detail, but doesn't Linux use apart access
right bits for RD, WR and Execute ?
> Eliminating exec on .data pages makes interpreters like Java much
> more difficult -- they have to be run as emulators. But I wouldn't
> be surprised to see it in some return of [un]"Trusted Computing".
It may increase security, at least in the view of some HLL-coders :)
__
wolfgang
Page Table Entry (PTE), as Tim helpfully corrected.
>> This is alot like common data pages and needs careful protection
>> (semaphore/spinlock mutex). You might not want some other process
>> or processor accessing SMC before the code was completely modified
>
> I think nobody with a halfway sane brain would try SMC on 'current
> hot' code parts like IRQ-routines or other CPU's active code ;)
Of course not. But without protection it might inadvertently
happen: one of the less-poor justifications for SMC is customizing
heavily used code for speed, like inner loops of video codecs or
finite-element analysis. This sort of code is also frequently
multi-threaded, running on as many CPUs as you have (4+) to complete
quickly. Then each thread had better have its own code, which
it will not if you trick shared code pages into being writable.
> an Os doesn't need to know everything an application does ...
That is the idea. But then there are consequences -- since the
OS doesn't know, should it trust the app (MS-DOS/Win9*) or assume
the app may be buggy/hostile (OS/2, Linux, *BSD, MS-WinNT+)?
> I didn't check in detail, but doesn't Linux use apart access
> right bits for RD, WR and Execute ?
It and other unices (*BSD) do, but those only apply to files
the OS loads. The much newer "execute" bit applies to pages so
viruses/trojans cannot execute their imported code via stack overflows.
>> Eliminating exec on .data pages makes interpreters like Java much
>> more difficult -- they have to be run as emulators. But I wouldn't
>> be surprised to see it in some return of [un]"Trusted Computing".
>
> It may increase security, at least in the view of some HLL-coders :)
Well, even HLL coders are not _that_ stupid. Managers, anal ysts
and reporters might be. Data-only stack buffer overflow exploits
are still possible.
-- Robert
> Of course not. But without protection it might inadvertantly
> happen: one of the less-poor justifications for SMC is customizing
> heavily used code for speed like inner loops of video codecs or
> finite-element analysis.
I just use SMC to save on many various display routines for
clipping and mouse-rectangle calculations when a user or an
application changes screen-resolution. Timing is not a big
matter during the change, but altering the immediate parts of
cmp-instructions instead of mem-variables gained 'some' speed.
> ...This sort of code is also frequently
> multi-threaded, running on as many CPUs as you have (4+) to complete
> quickly. Then each thread had better have its own code, which
> it will not if you trick shared code pages into being writable.
My current AMD got four cores and each know only single thread.
And I'm not fully through the manuals to understand how why and
when enabling other cores would gain me some speed on my usually
short and fast routines. Ok, I don't do noise(audio) and movies
for now. But I may need deeper study if some of my more complex
stuff can benefit from split jobs without losing synch with my
time-aware but single-threaded Os-core.
>> an Os doesn't need to know everything an application does ...
> That is the idea. But then there are consequnces -- since the
> OS doesn't know, should it trust the app (MS-DOS/Win9*) or assume
> the app may be buggy/hostile (OS/2, Linux, *BSD, MS-WinNT+)?
I still think that access range limits are enough protection,
everything above looks quite paranoid and an excuse for bloat.
>> I didn't check in detail, but doesn't Linux use apart access
>> right bits for RD, WR and Execute ?
> It and other unices (*BSD) do, but those only apply to files
> the OS loads. The much newer "execute" bit applies to pages so
> viruses/trojans cannot execute their imported code via stack overflows.
I see.
>>> Eliminating exec on .data pages makes interpreters like Java much
>>> more difficult -- they have to be run as emulators. But I wouldn't
>>> be surprised to see it in some return of [un]"Trusted Computing".
>> It may increase security, at least in the view of some HLL-coders :)
> Well, even HLL coders are not _that_ stupid. Managers, anal ysts
> and reporters might be. Data-only stack buffer overflow exploits
> are still possible.
:) I'd better not argue on HLL, so Frank or Nate don't need
to unsolder my tongue :)
__
wolfgang
This is a simple use of SMC which would build into more
complex uses. Especially valuable when SMC can be used
to avoid a poorly-predicted branch.
> My current AMD got four cores and each know only single thread.
> And I'm not fully through the manuals to understand how why and
> when enabling other cores would gain me some speed on my usually
> short and fast routines.
The additional cores only help when there are other execution
threads which can be run -- either other processes/interrupts
to service (useful on servers, overblown on desktops) or the
code has been written to be able to split the problem into 2+
parts so it can complete in 1/2, 1/3, 1/4 the time.
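Splitting a problem into 2+ parts can be sketched in a few lines. This is an illustrative Python model of the structure, not an ASM example (and Python threads share one interpreter lock, so it shows the shape of the split rather than a real speedup; in assembly the same partitioning would be farmed out to separate cores):

```python
# Sketch of "split the problem into 2+ parts": divide a range among
# workers and combine the partial results. The partitioning/combining
# pattern is the point; the worker could be any independent computation.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(lo: int, hi: int) -> int:
    return sum(range(lo, hi))

def parallel_sum(n: int, parts: int = 4) -> int:
    step = n // parts
    # last part absorbs the remainder so the ranges cover [0, n) exactly
    bounds = [(i * step, (i + 1) * step if i < parts - 1 else n)
              for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(lambda b: partial_sum(*b), bounds))
```

The synchronization cost is the combine step at the end; if the parts are too small, that overhead eats the gain, which is exactly the trade-off under discussion.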
> Ok, I don't do noise(audio) and movies for now.
I'm not aware of any multi-threaded audio code, but video
manipulation & encoding remains CPU bound, and I'm not aware
of any competitive code which is _NOT_ multi-threaded.
> But I may need deeper study if some of my more complex stuff
> can benefit from split jobs without losing synch with my
> time-aware but single-threaded Os-core.
This is a _huge_ problem (can't just use CLI). It moved Linux
from 1.4 to 2.0 .
> I still think that access range limits are enough protection,
> everything above looks quite paranoid and an excuse for bloat.
Famous last words. I'm sure Microsoft thinks the same :)
Given enough chances, anything that is not positively impossible
_MUST_ happen. Modern CPUs give 2,000,000,000+ chances per second.
Rare corner cases will occur.
All code has to be written with the right objectives and mindset.
For application code where speed is important and the occasional error
will pass unnoticed (video) or self-heal (finite element), then pure
speed coding with minimal/zero error checking is perfectly appropriate.
But for systems (OS) programming where the consequences of an error are
much more serious (app or OS crash) and the speed benefit much smaller
overall, extremely defensive [paranoid] programming is appropriate.
MS-WindowsNT 3.51 was a decent OS, but later versions of NT (2000,
XP, Vista, w7) have been much less stable because the graphics
driver was moved into the kernel (ring0). Graphics drivers are
often written for performance, and including those bugs in ring0
is a recipe for crashes. There are MS-Windows systems running in
life-critical applications (medical & power plants), but those systems
have carefully controlled software and even more carefully chosen
drivers. They do _NOT_ have the latest screaming NRideon driver.
MS are not idiots, but by law they are motivated almost exclusively
by profits. This leads to decisions for the bulk of the market
which may not suit all participants. More people like the speed than
detest the crashes. Microsoft has thrived since its beginnings in a
more open and expandable environment. Not as open as Linux/*BSD,
but nowhere near as closed as Apple. Apple has come back from the
dead because its current shiny toasters don't need to be open.
-- Robert
> This is a simple use of SMC which would build into more
> complex uses. Especially valuable when SMC can be used
> to avoid a poorly-predicted branch.
Oh yeah, SMC can even help to avoid branches at all ...
>> My current AMD has four cores, and each core knows only a single
>> thread. And I'm not fully through the manuals yet to understand how,
>> why, and when enabling the other cores would gain me some speed on
>> my usually short and fast routines.
> The additional cores only help when there are other execution
> threads which can be run -- either other processes/interrupts
> to service (useful on servers, overblown on desktops) or the
> code has been written to be able to split the problem into 2+
> parts so it can complete in 1/2, 1/3, 1/4 the time.
I haven't found any advantage in using multi-core features yet.
The overhead of switching/splitting threads still sounds like too
much for me right now. Can you post a short ASM example where I
might see the gain of using multicore code?
I know that I could assign some IRQs to other cores, but as long as the
OS needs to know about 'every' external event, where is the gain?
>> Ok, I don't do noise(audio) and movies for now.
> I'm not aware of any multi-threaded audio code, but video
> manipulation & encoding remains CPU bound, and I'm not aware
> of any competitive code which is _NOT_ multi-threaded.
Sure, interactive video-style games will benefit if several
cores work in parallel, but I have no idea yet where data
acquisition (from external devices) or huge databank access by
a horde of LAN-linked users can see any speedup from distributing
jobs onto several cores. There is still one bus for (LAN-)I/O,
and this one is usually assigned to the boot core anyway; even
a modern SATA NCQ HD that can queue several jobs can't
deliver/receive data in parallel.
My OS uses timeslice multiplexing for pending jobs on a time-aware basis.
Several separate CPUs with their own busses would gain a lot of speed,
but I heavily doubt that multi-core CPUs can gain more than a few
percent in timing compared to my current usage of the hardware.
>> But I may need deeper study to see if some of my more complex stuff
>> can benefit from split jobs without losing sync with my
>> time-aware but single-threaded OS core.
> This is a _huge_ problem (can't just use CLI). It moved Linux
> from 1.4 to 2.0 .
Haven't checked on Loonix since SuSE told me that I don't "own"
some files in my fully paid PC :)
To fully benefit from multi-core features I may have to rewrite
my whole OS from scratch.
>> I still think that access range limits are enough protection;
>> everything above that looks quite paranoid and an excuse for bloat.
> Famous last words. I'm sure Microsoft thinks the same :)
:)
> Given enough chances, anything that is not positively impossible
> _MUST_ happen. Modern CPUs give 2,000,000,000+ chances per second.
> Rare corner cases will occur.
Murphy's Law? Methinks only software/OS can be buggy.
I have a 3GHz quad now, but hardware has become almost fully trustable/
failure-aware recently (see the machine-check facilities).
> All code has to be written with the right objectives and mindset.
> For application code where speed is important and the occasional error
> will pass unnoticed (video) or self-heal (finite element), then pure
> speed coding with minimal/zero error checking is perfectly appropriate.
Yes.
> But for systems (OS) programming where the consequences of an error are
> much more serious (app or OS crash) and the speed benefit much smaller
> overall, extremely defensive [paranoid] programming is appropriate.
Here I can't follow; perhaps you adopted M$'s way of first selling
an app/OS, all bugs included, knowing it won't work correctly,
and later selling a repair kit (aka upgrade) on demand?
This method is far from the way I treat my customers.
> MS-WindowsNT 3.51 was a decent OS, but later versions of NT (2000,
> XP, Vista, w7) have been much less stable because the graphics
> driver was moved into the kernel (ring0). Graphics drivers are
> often written for performance, and including those bugs in ring0
> is a recipe for crashes. There are MS-Windows systems running in
> life-critical applications (medical & power plants), but those systems
> have carefully controlled software and even more carefully chosen
> drivers. They do _NOT_ have the latest screaming NRideon driver.
I haven't seen any windoze in use for controlling medical or nuclear
devices, because of the random behaviour of all M$ products.
They may just be used for result storage and reporting there.
> MS are not idiots, but by law they are motivated almost exclusively
> by profits. This leads to decisions for the bulk of the market
> which may not suit all participants. More people like the speed than
> detest the crashes. Microsoft has thrived since its beginnings in a
> more open and expandable environment. Not as open as Linux/*BSD,
> but nowhere near as closed as Apple. Apple has come back from the
> dead because its current shiny toasters don't need to be open.
:)
Nothing new: merchants rule the M$ world and technicians are kept quiet.
Mac OSes were and still are above M$ in terms of smart code.
My KESYS is somewhat like the Mac, a fully closed system, because it
never ever would execute 'foreign' code, and there isn't any other OS
safer than mine :)
On this matter, with a view to the future:
I just read the latest info about LWP (AMD 43724.pdf), and if this
ever becomes hardware, it might change my opinion on multi-core.
__
wolfgang
If the algo is parallelizable, having multiple CPUs accessible
to one programmer should be very good.
I don't know if the CPU or some OS allows that,
but I have at least one case of a well-parallelizable algo.
For example:
I have a big integer N
and I want to find its factors, so
CPU_1
can see if elliptic curve 1 factors it,
CPU_2
can see if elliptic curve 2 factors it,
CPU_3
can see if elliptic curve 3 factors it,
and CPU_4 checks, with some sleep, whether CPU_1, CPU_2, or CPU_3
has found the solution; if one of them finds the solution,
it sets a variable that stops the loops of CPU_1, 2, 3.
But I don't know many algos parallelizable like the above.
Though now I think even searching for one name in a set of files
can be parallelized [CPU_1 searches files 1..n, CPU_2 searches
files n+1..2n, etc.]
> __
> wolfgang
You are too linear a thinker :)
> The overhead of switching/splitting threads still sounds like too
> much for me right now. Can you post a short ASM example where I
> might see the gain of using multicore code?
Not asm, but consider these examples:
1) doing brute-force decryption by searching the keyspace.
Divide the keyspace into as many portions as you have CPUs.
2) for video: process two+ frames simultaneously.
3) for servers: serve multiple independent file/http clients
simultaneously. Especially important when the server-side
code is non-trivial (dynamic pages like the bletcherous ASP).
> I know that I could assign some IRQs to other cores, but as long as the
> OS needs to know about 'every' external event, where is the gain?
Spreading IRQs across cores is seldom a benefit. The IRQ load of
modern systems should be <2%. Multicore might reduce latency in a
few cases where the OS/IRQ handling is re-entrant.
> Sure, interactive video-style games will benefit if several
> cores work in parallel, but I have no idea yet where data
> acquisition (from external devices) or huge databank access by
> a horde of LAN-linked users can see any speedup from distributing
> jobs onto several cores. There is still one bus for (LAN-)I/O,
> and this one is usually assigned to the boot core anyway; even
> a modern SATA NCQ HD that can queue several jobs can't
> deliver/receive data in parallel.
Multi-core obviously won't help if one of the busses (memory,
disk, I/O) is the bottleneck. But the hardware has been improving,
and in some cases the CPU[s] are still the limiting factor.
> To fully benefit from multi-core features I may have to rewrite
> my whole OS from scratch.
Yes, to get full and automatic benefit. However, that may not be
necessary -- you do not need to adopt the "Symmetric MP" model.
You can do master-slave with little or no changes: the OS functions
& all IRQs are served by the master (boot) CPU. The OS is unaware of
and does not care about slave CPUs (except it may have a syscall to
get/free a slave and map memory). Applications grab slave CPUs and use 'em.
No OS calls possible from slave. Obviously works best if only
one app needs slaves at a time.
>> Given enough chances, anything that is not positively impossible
>> _MUST_ happen. Modern CPUs give 2,000,000,000+ chances per second.
>> Rare corner cases will occur.
>
> Murphy's Law? Methinks only software/OS can be buggy. I have
> a 3GHz quad now, but hardware has become almost fully trustable/
> failure-aware recently (see the machine-check facilities).
More like rare things you think cannot happen simultaneously
can and will.
>> But for systems (OS) programming where the consequences of an error are
>> much more serious (app or OS crash) and the speed benefit much smaller
>> overall, extremely defensive [paranoid] programming is appropriate.
>
> Here I can't follow; perhaps you adopted M$'s way of first selling
> an app/OS, all bugs included, knowing it won't work correctly,
> and later selling a repair kit (aka upgrade) on demand?
> This method is far from the way I treat my customers.
No, I'm just saying that sometimes paranoia is good.
> I haven't seen any windoze in use for controlling medical or nuclear
> devices because of the random behaviour of all M$-products.
> They may be just used for result store and reporting there.
I _have_ seen MS-Windows used in life-critical applications -- mostly
as a graphical terminal to some non-PC triple redundant system.
I asked, and was told these installs are very carefully controlled,
and only crash about twice per year. This is considered acceptable
since there are other terminals which can be used to do the same control.
>> My KESYS is somewhat like the Mac, a fully closed system,
>> because it never ever would execute 'foreign' code,
>> and there isn't any other OS safer than mine :)
Theo de Raadt might think his OpenBSD is safer.
-- Robert
>> I haven't found any advantage in using multi-core features yet.
> You are too linear a thinker :)
Sure true :)
>> The overhead of switching/splitting threads still sounds like too
>> much for me right now. Can you post a short ASM example where I
>> might see the gain of using multicore code?
> Not asm, but consider these examples:
>
> 1) doing brute-force decryption by searching the keyspace.
> Divide the keyspace into as many portions as you have CPUs.
>
> 2) for video: process two+ frames simultaneously.
>
> 3) for servers: serve multiple independent file/http clients
> simultaneously. Especially important when the server-side
> code is non-trivial (dynamic pages like the bletcherous ASP).
OK, I know that multicore can make sense for certain jobs, but my
question about an example was meant to show the required overhead
before it can start: things like setting up memory maps and
distributing the code parts, besides semaphores, signaling,
INT/task setup ...
>> I know that I could assign some IRQs to other cores, but as long as the
>> OS needs to know about 'every' external event, where is the gain?
> Spreading IRQs across cores is seldom a benefit. The IRQ load of
> modern systems should be <2%. Multicore might reduce latency in a
> few cases where the OS/IRQ handling is re-entrant.
Yes, last time I checked my IRQ timing it was ~800 cycles per
interrupt for routines w/o I/O access, like IRQ 0/8/14/15, and
4000..10000 cycles per IRQ for mouse, keyboard, ... just because
of the slowed-down legacy ports.
So IRQ timing is almost invisible on my 3GHz machine.
>> Sure, interactive video-style games will benefit if several
>> cores work in parallel, but I have no idea yet where data
>> acquisition (from external devices) or huge databank access by
>> a horde of LAN-linked users can see any speedup from distributing
>> jobs onto several cores. There is still one bus for (LAN-)I/O,
>> and this one is usually assigned to the boot core anyway; even
>> a modern SATA NCQ HD that can queue several jobs can't
>> deliver/receive data in parallel.
> Multi-core obviously won't help if one of the busses (memory,
> disk, I/O) is the bottleneck. But the hardware has been improving,
> and in some cases the CPU[s] are still the limiting factor.
>> To fully benefit from multi-core features I may have to rewrite
>> my whole OS from scratch.
> Yes, to get full and automatic benefit. However, that may not be
> necessary -- you do not need to adopt the "Symmetric MP" model.
> You can do master-slave with little or no changes: the OS functions
> & all IRQs are served by the master (boot) CPU. The OS is unaware of
> and does not care about slave CPUs (except it may have a syscall to
> get/free a slave and map memory). Applications grab slave CPUs and use 'em.
> No OS calls possible from slave. Obviously works best if only
> one app needs slaves at a time.
Let me see if this could fit my current main needs:
* full access for all cores to global variables inside the OS.
* paging remains disabled.
* all code runs with PL=0 (so it can't use SYSCALL/SYSRET ?)
I'll check this out, even if I rename 'applications' to 'code modules'
for now, because my 'applications' are token/parameter strings and won't
contain any code. All required code becomes part of the OS at install.
>>> Given enough chances, anything that is not positively impossible
>>> _MUST_ happen. Modern CPUs give 2,000,000,000+ chances per second.
>>> Rare corner cases will occur.
>> Murphy's Law? Methinks only software/OS can be buggy. I have
>> a 3GHz quad now, but hardware has become almost fully trustable/
>> failure-aware recently (see the machine-check facilities).
> More like rare things you think cannot happen simultaneously
> can and will.
I think hardware race conditions can be avoided with some care in
driver code. But right, cheap mice often send incomplete packets,
and modern HDs regularly react with unexpectedly variable delays.
Things like this should be taken into consideration, of course.
>>> But for systems (OS) programming where the consequences of an error are
>>> much more serious (app or OS crash) and the speed benefit much smaller
>>> overall, extremely defensive [paranoid] programming is appropriate.
>> Here I can't follow; perhaps you adopted M$'s way of first selling
>> an app/OS, all bugs included, knowing it won't work correctly,
>> and later selling a repair kit (aka upgrade) on demand?
>> This method is far from the way I treat my customers.
> No, I'm just saying that sometimes paranoia is good.
OK.
>> I haven't seen any windoze in use for controlling medical or nuclear
>> devices, because of the random behaviour of all M$ products.
>> They may just be used for result storage and reporting there.
> I _have_ seen MS-Windows used in life-critical applications -- mostly
> as a graphical terminal to some non-PC triple redundant system.
> I asked, and was told these installs are very carefully controlled,
> and only crash about twice per year. This is considered acceptable
> since there are other terminals which can be used to do the same control.
Losing a sick one twice a year seems affordable ... :)
>> My KESYS is somewhat like the Mac, a fully closed system,
>> because it never ever would execute 'foreign' code,
>> and there isn't any other OS safer than mine :)
> Theo de Raadt might think his OpenBSD is safer.
That might be the reason he didn't buy my OS :)
__
wolfgang
>> On this matter, with a view to the future:
>> I just read the latest info about LWP (AMD 43724.pdf), and if this
>> ever becomes hardware, it might change my opinion on multi-core.
> If the algo is parallelizable, having multiple CPUs accessible
> to one programmer should be very good.
> I don't know if the CPU or some OS allows that,
> but I have at least one case of a well-parallelizable algo.
> For example:
> I have a big integer N
> and I want to find its factors, so
> CPU_1
> can see if elliptic curve 1 factors it,
> CPU_2
> can see if elliptic curve 2 factors it,
> CPU_3
> can see if elliptic curve 3 factors it,
> and CPU_4 checks, with some sleep, whether CPU_1, CPU_2, or CPU_3
> has found the solution; if one of them finds the solution,
> it sets a variable that stops the loops of CPU_1, 2, 3.
Yes, but what I'd like to see is all the code required to
get such an algo split into parts and distributed to
memory, besides the required code lines for a time-chained
start of every core.
> But I don't know many algos parallelizable like the above.
> Though now I think even searching for one name in a set of files
> can be parallelized [CPU_1 searches files 1..n, CPU_2 searches
> files n+1..2n, etc.]
The latter may spend more time on disk reads :)
But your idea may be right if the cores have large enough caches
to process 'some' data while another core reads from disk.
__
wolfgang
Search on MapReduce.
>>> But I don't know many algos parallelizable like the above.
>>> Though now I think even searching for one name in a set of files
>>> can be parallelized [CPU_1 searches files 1..n, CPU_2 searches
>>> files n+1..2n, etc.]
>> The latter may spend more time on disk reads :)
>> But your idea may be right if the cores have large enough caches
>> to process 'some' data while another core reads from disk.
> Search on MapReduce.
Thanks, good idea even if it's patented ...
It seems to use symmetric multi-processor methods,
but I couldn't find a single ASM line on the matter.
Master/slave multicore may not need the symmetric sort, but it may
need what I had in mind when I was talking about the required
software overhead [copied from wiki, google(tm)]:
________
... The hot spots, which the application defines, are:
* an input reader
* a Map function
* a partition function
* a compare function
* a Reduce function
* an output writer
________
I'd still like to see this 'little' overhead in ASM.
Somewhat useful was the article on MARS
for multi-core GPUs:
GPU threads have low context-switch and creation times
compared to CPU threads.
__
wolfgang