Memory (bit)Maps

catcalls

unread,

Jan 26, 2011, 5:55:51 AM1/26/11

to

Hi USENET PPL:

Well, what can I say? Been working on a Memory Management System for
an Operating System yesterday, and got quite good code as a result.
Yet, today, is part two of the coding session and rather stuck it
appears!

The Problem?

1) BTx commands. How to use them? (Bit Testing Commands)
2) How to effectively cycle through a bitmap testing for an available
bit?
3) Once available bit is discovered; How do I multiply the address as
an offset into RAM?

Some Background Information:

Using a 12 bit storage. This can be made 16 bits in size for
simplicities sake.

Firstly, declare the bitmap I guess?

resb 64 (The bitmap for 512M Bytes of RAM) ; call this resA

Each bit represents 1M Byte RAM Page.

Next up, the data area in which the offset is used...

resd 100 ; call this resB

This is a random dw reserve that requires correcting in the future.

So, inside resB is where the 16-bit data structure will be stored
using the memory offset from resA bit.

But, my question is, HOW!?

*smiles*

Yes, I just want some ideas here on how to multiply the bit address as
an effective offset into RAM for 16-bit structures.

An example might be;

mov eax, resA ; The starting address of Reserve A
shl eax, 4 ; Multiply by 16 the address
add eax, 16 ; Allocate the 16-bit data structure (Yes, this seems
wrong somehow, why?O_o)

...use eax in binary Tree routine as a storage address for the 1M Byte
RAM data/address. (Yes, two addresses are being used here)

Also, any code snippet examples of BTx commands would be appreciated.
Tried google, did not help.

Thank you for taking the time to read this message; Much appreciated.

Kind regards,

Catcalls

Harold Aptroot

unread,

Jan 26, 2011, 7:38:15 AM1/26/11

to

"catcalls" <obrzu...@nospicedham.gmail.com> wrote in message
news:5ee7ffa7-0749-4b47...@u9g2000prm.googlegroups.com...

It seems to me that you are trying to search for a bit that has a particular
value in a word - why don't you just use the BTR/BTF instructions? Even if
you have to negate the operand first, it's pretty much guaranteed to be
faster, except on very old CPU's that used a slow loop for those
instructions.

harold

wolfgang kern

unread,

Jan 26, 2011, 7:44:06 AM1/26/11

to

"catcalls" asked about:
...

> 1) BTx commands. How to use them? (Bit Testing Commands)

You could look it up in the 'friendly manuals' (Intel/AMD),

* the BT-group tests one bit at a time
* bit number is given either by a register or an immediate operand,
with the register-variant the bit-offset can span +/- 2Gig bits
which mean +/- 256 MegaByte
while the immediate byte form is masked to 0x1F (0..31)
* the bit-status (before any change) is found in the CarryFlag.

BT op1,op2 ;test only, op1 can be reg or mem, op2 can be reg or imm8
BTC ... ;test and complement (reverse it)
BTR ... ;test and reset (make it 0)
BTS ... ;test and set (make it a 1)

> 2) How to effectively cycle through a bitmap testing for an available
> bit?

mmh?? on my computer all existing bits are available :)

But if you mean to find the first set bit in your structure then
BSF and BSR instructions may help here.

> 3) Once available bit is discovered; How do I multiply the address as
> an offset into RAM?

One byte contains eight bits ...
Multiply ? I'd had a divide by 8 (aka SHR 3) here,
perhaps after you decided on what quantum one bit shall correspond
to, you can figure a factor which already imply the DIV by 8.

I see you use one MB per bit, 1MB is 2^20 bytes, remember a byte
is 8 (2^3) bits, so your factor (bitnumber * 1MB) should be 2^17
(aka SHL 17).

...

> Yes, I just want some ideas here on how to multiply the bit address as
> an effective offset into RAM for 16-bit structures.

even possible, 16-bit structures aren't really a good choice for
a Mem-manager. While code may still run in 16-bit mode we can always
(on 386 and newer) use 32-bit operands and also 32-bit address-modes
but limited to segment-size.

> An example might be;

> mov eax, resA ; The starting address of Reserve A
> shl eax, 4 ; Multiply by 16 the address
> add eax, 16 ; Allocate the 16-bit data structure (Yes, this seems
> wrong somehow, why?O_o)

Yeah, this isn't even close to anything of sense :)
and where is the bit-number (= mem-block number) in the above ?

One MB per bit and a 64 byte array mean a map for 512 MB.

So if I'd had to check if the first block is free or occupied:

BT [bitmapA],0
JC ... ;jmp if set
or more flexible:
mov eax, blockNr ;in MB
BT [bitmapA],eax
JC ... ;jmp if set
or a racefree mutex:
mov eax, blockNr
LOCK BTS [bitmapA],eax ;set it
JC ... ;jmp if it was already set (nothing changed)

__
wolfgang

catcalls

unread,

Jan 26, 2011, 7:55:42 AM1/26/11

to

On Jan 26, 12:38 pm, "Harold Aptroot"
<harold.aptr...@nospicedham.gmail.com> wrote:
> "catcalls" <obrzut.a...@nospicedham.gmail.com> wrote in message

Well, a bit can be set to 1 or 0. I'm using the bitmap (resb 64) to
mark areas of RAM as in use. I can use either 1 or 0 for this,
whatever is easier.

I will re-search BTR and BTF commands, as I have not heard of them as
of yet.

Thank you for your valued input.

Catcalls

catcalls

unread,

Jan 26, 2011, 8:05:11 AM1/26/11

to

Hi Wolfgang,

Thanks for your continued support in this endeavour of mine. Here is
the code I am using right now (Thought it might Interest you?)

CODE
****
addNode:
; mov eax, dat to begin
; Compare Node address to zero
; Jump accordingly
; mov into ebx external address
push edx
push ebx
push eax
mov eax, ebx
mov ebx, 10
call writeInt
pop eax
pop ebx
pop edx

; Start Program code here
cmp DWORD [eax], 0
je .Fin
.nodeExists:
; cmp and jmp here
; this could be changed to ebx for
; an external memory address cmp'ed
; to an internal cmp ebx, [eax] one...
cmp ebx, [eax]
ja .GreaterAddNode

;.LesserAddNode:
add eax, 4

cmp DWORD [eax], 0
je .Fin
mov eax, [eax]
jmp addNode

.GreaterAddNode:
add eax, 8 ; Node + 8 is greater branch

cmp DWORD [eax], 0
je .Fin
mov eax, [eax]
jmp addNode
.Fin:
; The Insert can reserve data for three DWORDS or more
; and also allocate Nodes and values

; Allocate add eax, 4 > ecx? (Where ecx holds the new address?)
; EBX holds data to be stored inside eax
mov [eax], edx ; Allocating data to external address rather than
mov eax, [eax] ; just storing external address into eax
; Above two lines are old node and new node allocations of addresses
mov DWORD [eax+8], 0 ; zero node
mov DWORD [eax+4], 0 ; zero node
mov DWORD [eax+0], ebx ; data node
ret
CODE
****
; Call Rez routine
mov eax, dat
mov ebx, 1
mov edx, 19142
call addNode
mov DWORD eax, dat
mov ebx, 7
mov edx, 19178
call addNode
Now, with this example, I am pretty sure you can see why I need a
bitmap. For example, the bitmap offset into the resB space, will be
used in the second code example (register edx). At the moment I am
using hard coded values of Memory addresses (19142 and 19178 for
example here).

See, in the second code example this is what is happening;

eax stores the start address of the data reserve for the first
allocation.
ebx stores an address from the 512M Byte RAM space. (How I get this I
dunno yet!)
edx is what we are working on in this USENET post. It is the bit
offset into resB. The second data reserve located directly after resA
(resb 64)

So, how to avoid a collision into resA? What to multiply by to get to
the start of resB from bit offset 0 in resA?

That is puzzling me. I am having trouble visualising the data
arrangements in code.

I might have to try really hard with some hand written mathematics if
the answer is not posted here. *smiles*

Well, thanks for the pointers guys. I'm working on it.

Catcalls

catcalls

unread,

Jan 26, 2011, 8:36:31 AM1/26/11

to

Actually, posting the code I spotted a bug. Dangit.

It seems to work as follows;

Allocate to eax 'dat' data address.
Compare contents of eax to 0.
Obviously first time round, jmp to finish.

Now it inserts edx into eax.
So, next time round, it'll compare edx to ebx!

That is incorrect, since ebx should only be compared to ebx values. :)

Working on it!

Out.

catcalls

unread,

Jan 26, 2011, 8:42:11 AM1/26/11

to

; Start Program code here
cmp DWORD [eax], 0
;je .Fin

jne .nodeExists
mov [eax], ebx
add eax, 4
jmp .Fin

This is my solution. Not rather elegant, per say. But, I believe this
works?

So, instead of je .Fin, you've got a jne .nodeExists (handy to have
that label, no?)

Now, move the ebx value first into eax.

This is the tricky part; A lecturer on Binary Trees once told me, you
either start with the left path, or the right path, but you've got to
start somewhere! And once you pick a path, stick with it for the rest
of your life.

So, I've chosen the lesserThan path, with an add eax, 4 to use for
storing the three DWORDS data structure (lines after .Fin label)

Interesting stuff.

The only problem? This code requires TWO comparisons of the first
[eax] value, rather than one. But, I can live with that...

Out.

Harold Aptroot

unread,

Jan 26, 2011, 8:42:26 AM1/26/11

to

"catcalls" <obrzu...@nospicedham.gmail.com> wrote in message

news:0d29f35f-eab0-4b92...@r19g2000prm.googlegroups.com...

I meant BSR/BSF of course, sorry about that

Terje Mathisen

unread,

Jan 26, 2011, 10:23:48 AM1/26/11

to

wolfgang kern wrote:
> "catcalls" asked about:

>> mov eax, resA ; The starting address of Reserve A
>> shl eax, 4 ; Multiply by 16 the address
>> add eax, 16 ; Allocate the 16-bit data structure (Yes, this seems
>> wrong somehow, why?O_o)
>
> Yeah, this isn't even close to anything of sense :)
> and where is the bit-number (= mem-block number) in the above ?
>
> One MB per bit and a 64 byte array mean a map for 512 MB.

With such a tiny bitmap, speed really doesn't matter!

A single 64-byte array is best checked (in order to detect the first 1
bit) using blocks of register size, then BSF to detect the actual bit:

; Returns bit index for first found bit, assumes the array
; cannot be all zero, i.e. it has a guard byte at the end.
; RSI -> bitmap
; Returns bit index in RAX

lea rbx,[rsi+8] ; Remember where we started
next:
mov rax,[rsi]
lea rsi,[rsi+8]
test rax,rax ; ~2 cycles per block
jz next

; We have found the first block with a non-zero bit
sub rsi,rbx
bsf rax,rax
lea rax,[rax+rsi*8]
ret

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

catcalls

unread,

Jan 26, 2011, 1:57:10 PM1/26/11

to

On Jan 26, 3:23 pm, Terje Mathisen <"terje.mathisen at

Hi Terje,

lea rax,[rax+rsi*8]

Having trouble with this line. Exactly *where* in RAM does this leave
you? What address?

Say I have the following data segments declared;

resb 64
resd 100

And I want to use bit 1 in resb 64 to address the start of resd 100 -
do not see how rax+rsi*8 will do that?

Cannot seriously think of a solution here. But, researched BSF on
google : No luck. So, I broke out my Assembly book, and its in there
luckily. Now, I understand exactly what it is doing. BSF returns the
position of the set bit as a number between 0 and 7 into the
destination operand. Which is rax in your code snippet. There fore,
your code could be 0-7+rsi*8. That might actually work. But, I feel
adding rsi will put the result too high in RAM. Maybe rax*9 would be
better? What do you think?

wolfgang kern

unread,

Jan 26, 2011, 7:37:41 PM1/26/11

to

"catcalls" wrote:
...

|Hi Wolfgang,
|
|Thanks for your continued support in this endeavour of mine. Here is
|the code I am using right now (Thought it might Interest you?)

....
OK, it's not my style, but I see a bit better what you try to achieve.

|So, how to avoid a collision into resA? What to multiply by to get to
|the start of resB from bit offset 0 in resA?
|That is puzzling me. I am having trouble visualising the data
|arrangements in code.
|I might have to try really hard with some hand written mathematics if
|the answer is not posted here. *smiles*

I still think you see the story more complex than required.

How I understood:

1. you have a bit array of 64 bytes (512 bits) which only tell
if a block of memory (ie: 1MB) is already allocated or free.
Let's assume a clear bit means 'allocated' to help BSF.
2. you may have decided to not imply a few first MBs at start
to have some room for OS-kernel and global data.

so all you need is a start address for the whole memory-range
and a pointer to your 64 byte array (somewhere global below)

you wouldn't need another array to store the addresses, because
every bit in this 64-byte array correspond to an address anyway:

bit 512 .... 7 6 5 4 3 2 1 0 |range:
^- 1.block 'start to start+1MB-1'
^--- 2.block 'start+1MB to start+2MB-1'
^----- 3.block 'start+2MB to start+3MB-1'
and so on ...

address calculation for a given block-number (== bit number):

Block-address = start_address + block-number /8 * block-size

which then results in:
mov eax,blockNr ;limited to 0x01ff (0..511) yet
shl eax, 17 ;multiply by 1048576/8 = 2^(20-3) aka 2^17
add eax,start_address ;and now eax holds the blocks address.

That's all about it.

and to find a free block:
xor eax,eax ;make it zero just in case
mov esi bitmapA ;pointer to your 64 byte array
xor edx,edx ;make it zero as well
L1:
BSF eax,[esi] ;see below (eax isn't altered if dword [esi]=0 )
JNZ L2
add edx,0x020 ;next 32 bits
add esi,4 ;next dword in array
cmp edx,0x01E0 ;max. = 512-32 yet
jb L1
L2:
add eax,edx ;eax can now be 0..511 only

eax may now hold the bit number of the first set bit, scan started
with bit 0. But it will set the zero-flag if no set bits found within
32-bits, so a conditional loop is required to cover all 512 bits.

Terje's example with LEA is 'a bit more' elegant :)
and me too would use LEA-instructions here, but I try it now
plain and easy for better understanding.

I see you want to have this addresses apart stored in another array,
but the calculation is that short and easy that I would renounce it
and just report and store first block-number and perhaps also number
of blocks as the size info.

__
wolfgang

catcalls

unread,

Jan 26, 2011, 8:18:34 PM1/26/11

to

Interesting, condensed stuff, Wolfgang. Trying to digest it. Makes
sense to '+ Data Structure Size' as part of the calculation for blocks/
bitmapA.

This I understand a little better. And, not so comfortable using LEA
yet. Seems a mysterious instruction to me. Load Effective Address.
What does it mean by 'Effective'?

Well, lots of good code snippets here to look at. Found it interesting
that 0-7*8 leaves a min/max of 0-56 bytes. But, when adding data
structure size, pips it over the 64 byte mark to avoid collisions.

Also, 1E0 is 480. This puzzled me slightly. But, 480 + 32 is 512. The
number of free slots in resB. And we're counting up in 32's. How ever,
its that last line that bothers me, how can add eax, edx equal 512?
When edx has a maximum of 480, and eax will be totalling 0-7? That is
480+max7 = 487? Why 487?

511
-
487
---
024 Bytes (Final space in resB available)

Perhaps this has something to do with the calculation which I missed.
The part where you shl by 17 and add to the start address of resB.

Not quite figured all this out yet. But, I'm confident in your work. I
guess I'll have to copy it out and test the numbers in real time such
as printing out start address of bitmapA and resB and seeing how the
calculations work with various different inputs, such as 0 to 7 in
different block regions.

Kind regards,

Cat

catcalls

unread,

Jan 26, 2011, 8:53:20 PM1/26/11

to

Hi Wolfgang,

Just done the first test of your code. It seems to be acting a little
strange?

Here is my code that I am using;

memBitmap:
xor eax, eax ;make it zero just in case
mov esi, rezA ;pointer to your 64 byte array
xor edx, edx ;make it zero as well
.L1:

BSF eax, [esi] ;see below (eax isn't altered if dword [esi]=0 )

JNZ .L2
add edx, 0x020 ;next 32 bits
add esi, 4 ;next dword in array
cmp edx, 0x01E0 ;max. = 512-32 yet
jb .L1
.L2:
add eax, edx ;eax can now be 0..511 only

; Part Two!

shl eax, 17 ;multiply by 1048576/8 = 2^(20-3) aka 2^17

add eax, rezB ;and now eax holds the blocks address.
ret

And this below is the test code;

mov eax, rezA
BTS DWORD [eax], 1
call memBitmap

mov ebx, 10
call writeInt

mov eax, rezB

mov ebx, 10
call writeInt

Firstly, with 0 it works. resB is 1782 and eax turns out to be 1782.

However, with the above test, when rezA bit 1 is set, eax turns out to
be 2854.

Is this what you expected?

catcalls

unread,

Jan 26, 2011, 8:58:33 PM1/26/11

to

Correction, eax comes out as 132854. Not 2854.

catcalls

unread,

Jan 26, 2011, 9:06:27 PM1/26/11

to

Dear Wolfgang,

After doing some calculations, discovered you were calculating the 1M
Byte Page offsets, and not the resB offsets. No worries, I've since
adjusted your code to 'shl eax, 7' and this reserves 4 DWORDS in resB
which is what I want.

This increases in x128 Bytes for each bit in resA.

But, I'll remember shl eax, 17 for when it comes time to use a bitmap
for addressing inside the 512M Bytes of RAM address allocations.

catcalls

unread,

Jan 27, 2011, 1:59:20 AM1/27/11

to

; Start of Binary Tree coding example
START:
call initBitmap

; Call Rez routine
call memBitmap
mov edx, eax

mov eax, rezB
mov ebx, 1

call addNode

This is the test code, above.

initBitmap:
mov eax, 0xFFFFFFFF
mov esi, rezA
xor edx, edx
.L1:
or [esi], eax
add edx, 0x020
add esi, 4
cmp edx, 0x040
jb .L1
ret

This is the Initialisation routine for reserve A.

memBitmap:
xor eax, eax ;make it zero just in case
mov esi, rezA ;pointer to your 64 byte array
xor edx, edx ;make it zero as well
.L1:
bsf eax, [esi] ;see below (eax isn't altered if dword [esi]=0 )
jnz .L2

add edx, 0x020 ;next 32 bits

add esi, 4 ;next dword in array
cmp edx, 0x01E0 ;max. = 512-32 yet
jb .L1
.L2:
btr [esi], eax
add eax, edx ;eax can now be 0..511 only

; Part Two!
shl eax, 7 ;multiply by 1048576/8 = 2^(20-3) aka 2^17
add eax, rezB ;and now eax holds the blocks address.
ret

Finally, this is the modified code for the bitmap set-bit routine.

How all this fits together is thus; Reserve A gets set to all 1's
during Initialisation. This makes the bitmap set-bit routine find and
return bit 0 from Reserve A. Next, that bit gets reset to zero,
marking it as allocated.

The test code moves the eax value from the bitmap set-bit routine into
edx, ready for the Binary Tree code (listed a few posts above and will
not be repeated here.)

So, what we have now is an address offset, into Reserve B, for an
available 4 DWORD memory segment to be used by Binary Tree code for
storing data and node addresses.

The data to be stored, in any case, is the address of the 1M Byte Page
inside the total available 512M Bytes of RAM (which could be anywhere
in that 512M Byte RAM segment and is totally unrelated to the Reserve
B address offset.)

This means, replicating the above code to discover the available 512M
Byte segment's available 1M Byte Page.

Yes, that sounds confusing, I know! But, it is how I have visualised
this memory management system. The reason for segregating the bitmaps
is because I want several binary Trees, each with their own specific
data. Such as, low to high addresses, size of Page allocations
(smallest to largest), and any other statistical data that I decide at
a later date to represent in either a linear list block or binary
Tree.

The next stages?
1) Releasing the 1M Byte RAM Page thus setting the appropriate bit
back to 1.
2) Traversing the entire binary Tree from low to high (this seems
difficult!) to extract statistical data.

Well, thank you all for your continued help in this Memory Management
Section of Lita's Operating System. Greatly appreciated.

cat

Lasse Reichstein Nielsen

unread,

Jan 27, 2011, 1:10:09 AM1/27/11

to

Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:

> lea rbx,[rsi+8] ; Remember where we started
> next:
> mov rax,[rsi]
> lea rsi,[rsi+8]

Out of curiosity, is there a specific reason for using lea here, instead
of "add rsi, 8"? They have the same length, and I would have expected the
add to parallelize slightly better on moderne CPUs.

> test rax,rax ; ~2 cycles per block
> jz next

/L
--
Lasse Reichstein Holst Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

wolfgang kern

unread,

Jan 27, 2011, 5:59:49 AM1/27/11

to

"catcalls" wrote:
...

|Interesting, condensed stuff, Wolfgang. Trying to digest it. Makes
|sense to '+ Data Structure Size' as part of the calculation for blocks/
|bitmapA.

|This I understand a little better. And, not so comfortable using LEA
|yet. Seems a mysterious instruction to me. Load Effective Address.
|What does it mean by 'Effective'?

LEA doesn't access any memory, it just adds all the parts found
in a address-building memory reference (things within brackets).

Therefore it can be used for fast calculations without altering
any flags. ie:
LEA eax,[esi+ecx*8+table_A] ;the latter can be only a immediate
will do nothing else than:
eax = esi + ecx*8 + table_A

but there is no indication for result overflow, because the thing
just truncates the result to fit into eax, so this may look like
a signed ADD.

|Well, lots of good code snippets here to look at. Found it interesting
|that 0-7*8 leaves a min/max of 0-56 bytes. But, when adding data
|structure size, pips it over the 64 byte mark to avoid collisions.

|Also, 1E0 is 480. This puzzled me slightly. But, 480 + 32 is 512. The
|number of free slots in resB. And we're counting up in 32's. How ever,
|its that last line that bothers me, how can add eax, edx equal 512?

Never, 511 is the maximal possible.
But because bit0 is implied this mean apart 512 bits.

|When edx has a maximum of 480, and eax will be totalling 0-7? That is
|480+max7 = 487? Why 487?

dwords are 32 bits wide and the bits are called bit0..bit31 :)
so eax after BT eax,[esi] can be either 0 to 31 or remain unchanged
if the dword in [esi] is all zero.

|511-487 = 024 Bytes (Final space in resB available)

Dont we talk about bits yet ?
the count goes from 0 (means first block) up to 511
511-480 = 31

|Perhaps this has something to do with the calculation which I missed.
|The part where you shl by 17 and add to the start address of resB.

|Not quite figured all this out yet. But, I'm confident in your work. I
|guess I'll have to copy it out and test the numbers in real time such
|as printing out start address of bitmapA and resB and seeing how the
|calculations work with various different inputs, such as 0 to 7 in
|different block regions.

I dont know why you need that many arrays, I'd keep it simple.
__
wolfgang

wolfgang kern

unread,

Jan 27, 2011, 6:15:14 AM1/27/11

to

"catcalls" wrote:

> Just done the first test of your code. It seems to be acting a little
> strange?

I explained just the bitmap where every bit means one MB.
For all your other block-number related structs you'll need
of course other factors depending on the element size.

> Here is my code that I am using;
>
> memBitmap:
> xor eax, eax ;make it zero just in case
> mov esi, rezA ;pointer to your 64 byte array
> xor edx, edx ;make it zero as well
> .L1:
> BSF eax, [esi] ;see below (eax isn't altered if dword [esi]=0 )
> JNZ .L2
> add edx, 0x020 ;next 32 bits
> add esi, 4 ;next dword in array
> cmp edx, 0x01E0 ;max. = 512-32 yet
> jb .L1
> .L2:
> add eax, edx ;eax can now be 0..511 only
>
> ; Part Two!
> shl eax, 17 ;multiply by 1048576/8 = 2^(20-3) aka 2^17
> add eax, rezB ;and now eax holds the blocks address.
> ret
>
> And this below is the test code;
>
> mov eax, rezA
> BTS DWORD [eax], 1

this tests and set bit1, which isn't the first bit
BTS [eax],0 would work on the lowest bit.

> call memBitmap
> mov ebx, 10
> call writeInt
> mov eax, rezB
> mov ebx, 10
> call writeInt
>
> Firstly, with 0 it works. resB is 1782 and eax turns out to be 1782.
>
> However, with the above test, when rezA bit 1 is set, eax turns out to
> be 2854.

> Is this what you expected?

I dont know where your structs reside.
__
wolfgang

Terje Mathisen

unread,

Jan 27, 2011, 8:12:31 AM1/27/11

to

No address, it is simply doing

a += s*8;

I.e. it is taking the base bit position and adding in 8 times the byte
offset, generating the desired answer.

>
> Say I have the following data segments declared;
>
> resb 64
> resd 100
>
> And I want to use bit 1 in resb 64 to address the start of resd 100 -
> do not see how rax+rsi*8 will do that?

It does not, you keep that information external to the bitmap.

Terje Mathisen

unread,

Jan 27, 2011, 8:14:02 AM1/27/11

to

Lasse Reichstein Nielsen wrote:
> Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> writes:
>
>> lea rbx,[rsi+8] ; Remember where we started
>> next:
>> mov rax,[rsi]
>> lea rsi,[rsi+8]
>
> Out of curiosity, is there a specific reason for using lea here, instead
> of "add rsi, 8"? They have the same length, and I would have expected the
> add to parallelize slightly better on moderne CPUs.

No, no specific reason except that I've gotten into the habit in order
to allow me to move those instructions around as I like (since they
don't modify any flags), in order to improve scheduling.

In this particular example it doesn't help...

Terje

>
>> test rax,rax ; ~2 cycles per block
>> jz next
>
> /L

--

BartC

unread,

Jan 31, 2011, 8:30:32 AM1/31/11

to

"catcalls" <obrzu...@nospicedham.gmail.com> wrote in message

news:5ee7ffa7-0749-4b47...@u9g2000prm.googlegroups.com...

> 1) BTx commands. How to use them? (Bit Testing Commands)
> 2) How to effectively cycle through a bitmap testing for an available
> bit?
> 3) Once available bit is discovered; How do I multiply the address as
> an offset into RAM?

> Firstly, declare the bitmap I guess?

>
> resb 64 (The bitmap for 512M Bytes of RAM) ; call this resA
>
> Each bit represents 1M Byte RAM Page.

If there are 512Mbytes available, why not just use a 512-byte map? It will
cost an extra 448 bytes (less perhaps savings due to simpler code).

Then bitops aren't needed at all. And you can put extra status in each byte
rather than 1 or 0.

(Unless the bitmap has to be in that form because it's directly used by the
hardware.)

--
Bartc

Terje Mathisen

unread,

Jan 31, 2011, 3:01:34 PM1/31/11

to

BartC wrote:
> If there are 512Mbytes available, why not just use a 512-byte map? It
> will cost an extra 448 bytes (less perhaps savings due to simpler code).

The real cost is the increased time (8x) to scan for a set bit!

>
> Then bitops aren't needed at all. And you can put extra status in each
> byte rather than 1 or 0.

That might be a valid reason.

>
> (Unless the bitmap has to be in that form because it's directly used by
> the hardware.)

:-)

Terje

catcalls

unread,

Jan 31, 2011, 11:20:17 PM1/31/11

to

Hi,

Had considered using a byte-map. But, started as I intend to go on,
and even though we are using a 512M Byte RAM area right now, that will
increase to over 4G Bytes when converting to 64-bit Architecture. So,
we've got to have the most efficient means of Memory Management.

x8 is a big hit in Performance, when you think that the fastest CPU
cycles are 3GHz right now on Multi-Core specs. And I really don't want
to have to multi-Thread the Memory Management system at a later date.

Terje Mathisen

unread,

Feb 1, 2011, 2:33:41 AM2/1/11

to

The critical operation for a MM system isn't scan_for_a_free_block(),
but much more likely to be scan_for_N_free_blocks(), in which case a
bitmap can help a lot, particularly if you make the maximum N value 32
or 64, whatever your register size is:

If you find a register block full of 0 or 1 bits, you're done.

If you want to find such blocks overlapped between two register-sized
blocks you can scan for the first free bit while keeping two blocks in
registers:

Concatenate them and use SHRD to align, then compare against the mask
with the desired number of bits.

Terje

PS. In an allocator like this it is a very good idea to keep around a
pointer to just after the last place you found a free block and then
start there: For a system that does a lot of allocations, with no
deallocations between them, this returns the same results much faster.

catcalls

unread,

Feb 3, 2011, 12:24:49 AM2/3/11

to

; The following code exports all values from the binary
; tree to screen using a loop algorithm.
;
traverseTree:
push 0xAAAAAAAA
.LA:

push eax
mov DWORD eax, [eax]

mov ebx, 10
call writeInt

pop eax

mov ebx, eax
add ebx, 8
cmp DWORD [ebx], 0
je .noNode
push ebx
.noNode:
add eax, 4

cmp DWORD [eax], 0

je .LBW
mov eax, [eax]
jmp .LA
.LBW:
pop eax
cmp eax, 0xAAAAAAAA

je .Fin
mov eax, [eax]

jmp .LA
.Fin:
; extract left node address
; and cmp eax, 0 then loop if needed
ret

This took 3 hours to be visualised / dreamt up ~ then two days to
code. With roughly 2 hour coding sessions each day. My first attempt
was dismal. And a complete failure on the first day. So, the second
day I deleted all that code and started again. This time, attempting
to manually access the binary Tree nodes via the traverseTree routine,
and see if I could find any patterns to extract an algorithm.
Actually, I finally got down to running through the right hand side of
the tree. This became the first step towards a working traverseTree
algorithm.

Sadly, the output is not in any sorted order. So, this algorithm might
find its way to the dump!

Thanks for reading,

Catcalls