Hi Dzemal. This newsgroup is pretty much dead, which is why you aren't getting any responses. Try posting in news:comp.lang.asm.x86 - that's a moderated group, and you *might* have to cc clax-sub...@crayne.org to get through. Also news:alt.lang.asm is a lot more active than this one, tho not limited to x86.
I don't think I'll be able to help you very much.
> Ok, here's the c++ function:
In the first place, I don't know C++ :) (but this doesn't look too tough)
In the second place, I'm not sure I understand what we're doing here. What video mode is this? (I only know the really simple-minded ones) It *looks* to me like maybe a 16-color mode - that is, 4 bits per pixel. Is that right? I think I'm confused, because it looks to me that while we set the color of an "odd" pixel, we zero out the "even" pixel, and vice versa. I think I'm missing something - no great surprise. Wait, we're basing this on whether the *row* is odd or even. I'm just confused.
It's really important to understand exactly what we *need* to do here, because the first step is to optimize the algorithm, before we even *think* about asm. A crappy algorithm implemented in exquisite assembly is still crappy code!
> VC++ disassembly window (debug build of course therefore unoptimized) says
> this translate to:
Can't you get VC++ to spit out asm for an optimized compile?
> mov eax,dword ptr [ebp+0Ch]
> sar eax,1
> mov word ptr [ebp-4],ax
Okay, this is just calculating "q" and storing it in a temporary variable. I'm not sure we need to do this at all (store it, that is...). I think we'd be better off if we hadn't made it int16, too - the processor is generally happier operating on its "native" size.
> movsx edx,word ptr [ebp-4]
Move "q" into edx - we do this repeatedly, and I don't think we need to. I think we could just use edx in the first place, and leave it there.
> mov eax,dword ptr [ebp+10h]
"x"
> mov eax,dword ptr [eax*4+4369E8h]
"*x", if I understand it...
> mov ecx,dword ptr [ebp+0Ch]
"y"
> and ecx,1
> neg ecx
> sbb ecx,ecx
> and ecx,4
Make cl either 4 or 0 ...
> mov ebx,0Fh
> shl ebx,cl
Make ebx either 0F0h or 0Fh...
> mov cl,byte ptr [eax+edx]
Get our "destination" byte.
> and cl,bl
Mask out just the nibble we want (whyever we want it! :)
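If it helps to check my reading of the neg/sbb/and sequence quoted above, here is a rough C++ emulation of it. The helper names are made up for illustration, and the carry handling is only my understanding of how neg sets CF (set when the operand was non-zero); treat it as a sketch, not as what VC++ "means".

```cpp
#include <cstdint>

// Emulates: and ecx,1 / neg ecx / sbb ecx,ecx / and ecx,4
// (hypothetical helper name, for illustration only)
std::uint32_t shift_count_neg_sbb(std::uint32_t y) {
    std::uint32_t ecx = y & 1u;         // and ecx,1
    std::uint32_t carry = (ecx != 0);   // neg sets CF when the operand was non-zero
    ecx = 0u - ecx;                     // neg ecx
    ecx = ecx - ecx - carry;            // sbb ecx,ecx -> 0 or 0xFFFFFFFF
    return ecx & 4u;                    // and ecx,4  -> 0 or 4
}

// The same value computed directly, without the neg/sbb detour.
std::uint32_t shift_count_direct(std::uint32_t y) {
    return (y & 1u) << 2;
}

// The mask that ends up in bl: 0xF0 for odd y, 0x0F for even y.
std::uint32_t keep_mask(std::uint32_t y) {
    return 0x0Fu << shift_count_direct(y);
}
```

Both helpers should agree for every y, which suggests the three-instruction sequence can be replaced by a single shift.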
> Now I don't know a first thing about asm but still, 36 INSTRUCTIONS ???!!
Hehe! The "first thing about asm" is that it's most likely *going* to take a lot of instructions :) Don't put too much stock in the instruction count - often speed-optimized code is a *lot* longer than the size-optimized version. But I think an optimized version of this would be shorter just by eliminating duplicate code. I'd think in terms of calculating the destination address *once* and hanging on to it. Likewise, the "shift-count" could probably be figured just once - I still don't get why we're basing this on "y"! (so my analysis above may be totally off-base)
> I figure there must be a more optimal way to encode this in assembler (I
> will use C++ inline assembler)
I don't know the syntax for C++ inline assembler (I'm a devout Nasmist), so I'm not going to be able to help you there. It's a problem, because I can't readily test any "bright ideas" I might come up with :)
> So, can anyone PLEASE help me with better asm code- I desperately need this
> routine to run as fast as possible.
You may need to re-think what you're doing "from the top". If you're filling any horizontal "runs" of pixels, for example, there are faster ways of doing this than by calling setpixel repeatedly - and other speedups. You might want to store your data in a different manner for faster access - "offset thinking" is faster than "row-column thinking". At the very least, we can get rid of some of the unnecessary code!
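To make the "runs" idea concrete, here is a rough C++ sketch. It assumes the layout the disassembly seems to suggest: two 4-bit pixels per byte, even index in the high nibble, odd index in the low nibble, with neighbouring indices adjacent in memory. The name fill_run and its parameters are invented for the example, and I haven't seen the real bitmap structure, so this may not match it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fill pixels [y0, y1) of one packed 4-bit line with `color`.
// Hypothetical helper; `line` stands in for one pBitmapRow[x] pointer.
void fill_run(std::uint8_t* line, int y0, int y1, std::uint8_t color) {
    assert(color <= 0x0F && 0 <= y0 && y0 <= y1);
    if (y0 == y1) return;
    const std::uint8_t both = static_cast<std::uint8_t>((color << 4) | color);
    if (y0 & 1) {   // leading odd pixel: touch the low nibble only
        line[y0 >> 1] = static_cast<std::uint8_t>((line[y0 >> 1] & 0xF0) | color);
        ++y0;
    }
    if (y1 & 1) {   // trailing even pixel (y1 - 1): touch the high nibble only
        --y1;
        line[y1 >> 1] = static_cast<std::uint8_t>((line[y1 >> 1] & 0x0F) | (color << 4));
    }
    if (y1 > y0) {  // everything in between is whole bytes, two pixels each
        std::memset(line + (y0 >> 1), both, static_cast<std::size_t>((y1 - y0) >> 1));
    }
}
```

The point is only that the masking happens at most twice per run instead of once per pixel; whether that maps onto the actual data depends on how the bitmap is really stored.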
It would be interesting to see what your compiler comes up with for optimized code, if you can get it to do it.
But there are surely optimized setpixel's available, if you know where to look. And folks more knowledgeable than I in a livelier newsgroup. I'm going to cross-post this to alt.lang.asm - I hope you can look for replies there(?). We *desperately* need some asm to talk about over there! :)
On Thu, 21 Feb 2002 00:08:13 GMT, Frank Kotler spake thus:
>Dzemal Kulenovic wrote:
>Hi Dzemal. This newsgroup is pretty much dead, which is why you aren't
>getting any responses. Try posting in news:comp.lang.asm.x86 - that's a
>moderated group, and you *might* have to cc clax-sub...@crayne.org to
>get through. Also news:alt.lang.asm is a lot more active than this one,
>tho not limited to x86.
Hi :) As Frank says, c.l.a is not the liveliest of assembly language groups; I read this post in a.l.a.
>> Ok, here's the c++ function:
>In the first place, I don't know C++ :) (but this doesn't look too
>tough)
If I understand this correctly, we are saying (in PseudoCode):
{ define z = (pBitmapRow[x])[q]
int q = y shr 1 ; ebp+0Ch
if(y AND 1) { z = z AND 0xF0 } else { z = z AND 0xF }
if(y AND 1) { z = z OR color } else { z = z OR (color shl 4) }
}
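For comparison, here is the same logic as a small C++ sketch, as I read the disassembly. The function name, the signature and the `rows` parameter (standing in for the pBitmapRow table) are my own inventions, and I'm assuming color fits in 4 bits; the original C++ may well differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signature; `rows` stands in for the poster's pBitmapRow table.
void set_pixel(std::uint8_t* const* rows, int x, int y, std::uint8_t color) {
    assert(color <= 0x0F);                    // assumption: 4-bit colour value
    std::uint8_t* dest = rows[x] + (y >> 1);  // destination address, computed once
    const int keep_shift = (y & 1) << 2;      // 4 for odd y, 0 for even y
    const std::uint8_t keep = static_cast<std::uint8_t>(0x0F << keep_shift);
    const std::uint8_t put  = static_cast<std::uint8_t>(color << (keep_shift ^ 4));
    *dest = static_cast<std::uint8_t>((*dest & keep) | put);  // one read, one write
}
```

Written this way there is one table lookup, one byte read and one byte write per pixel, which is roughly the shape the hand-written assembler is aiming for.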
I'm not sure how you would code (pBitmapRow[x])[q] in assembler, so I'll leave that part of it the same as in the original code (you'll have to convert to make use of the library code).
; the above constant is set up by the system and is dependent on the
; position in memory of the resident code. I don't know how you would
; code that in assembler.
>> mov ecx,dword ptr [ebp+0Ch]
>> and ecx,1
and ecx,1 ; ecx = y & 1
>> neg ecx
>> sbb ecx,ecx
>> and ecx,4
shl ecx,2 ; ecx = (y & 1 ? 4 : 0)
>> mov ebx,0Fh
>> shl ebx,cl
mov ebx,0Fh
shl ebx,cl ; ebx = 0xF << (y&1 ? 4:0)
>> mov cl,byte ptr [eax+edx]
>> and cl,bl
xor edx,edx
mov dl,byte ptr[eax+esi] ; byte load into dl; the upper bits of edx are already zero
and edx,ebx ; edx = z & (0xF << (y&1 ? 4:0))
The following code makes little sense to me (basically because it makes no attempt to optimise as a routine), so I'll ignore it and just show you how I would write it without trying to compare to the code MSVC gave you.
xor cl,4          ; ecx = (y & 1 ? 0 : 4)
mov ebx,[ebp+14h] ; ebx low byte = color
                  ; the processor would have pushed as
                  ; a dword, so this is OK.
shl ebx,cl ; ebx = color<<(y&1 ? 0:4)
or edx,ebx ; z = z OR color<<(y&1 ? 0:4)
           ; edx = dl = z
mov byte ptr[eax+esi],dl
>> Now I don't know a first thing about asm but still, 36 INSTRUCTIONS ???!!
I'm sure I haven't provided the most optimal way of doing it, but that's 18 instructions with 6 memory accesses, instead of 36 instructions and 21 memory accesses!
The part that you will have to work out (if nobody else is able to help) is how to get the address for (pBitmapRow[x]), as it won't be 4369E8h every time you run it.
>> I figure there must be a more optimal way to encode this in assembler (I
>> will use C++ inline assembler)
>I don't know the syntax for C++ inline assembler (I'm a devout Nasmist),
>so I'm not going to be able to help you there. It's a problem, because I
>can't readily test any "bright ideas" I might come up with :)
Me neither, but I figure that using the syntax in the posted code will provide something that should work :)
>> So, can anyone PLEASE help me with better asm code- I desperately need this
>> routine to run as fast as possible.
Like Frank I'm a committed nasm user, and I haven't got a lot of idea how to do things with other assemblers. I pick it up as I go, by looking at how others write code for those other assemblers (and, of course, by looking at code samples online) :) The above should do the same as the code VC produced, but it should be a lot faster (I'd expect more than twice the speed, because of the number of memory accesses saved as well as being half the number of instructions).
I agree with what Frank said about rethinking the algorithm if you are writing a lot of pixels. A line can be drawn far faster than writing it pixel by pixel, whether it's horizontal, vertical or diagonal, as coding the entire line in assembler allows you to write multiple pixels in one write (if they are adjacent horizontally) and to reuse values in registers without saving in memory between calls when they are not horizontally adjacent.
-- Debs d...@dwiles.nospam.demon.co.uk ---- If you're not part of the solution, start another problem!
Frank Kotler wrote in message <3C743A59.61DFD...@ne.mediaone.net>...
>Dzemal Kulenovic wrote:
>Hi Dzemal. This newsgroup is pretty much dead, which is why you aren't
>getting any responses. Try posting in news:comp.lang.asm.x86 - that's a
>moderated group, and you *might* have to cc clax-sub...@crayne.org to
>get through. Also news:alt.lang.asm is a lot more active than this one,
>tho not limited to x86.
It seemed to be dead. Then we get this post from PacBell and discover that it's really been alive, but stuff has been getting lost. Shades of comp.lang.asm.x86!
Randy Hyde