Problem with INVD instruction

Slavisa Zigic

Apr 6, 2006, 2:51:43 PM

Slavisa Zigic

Apr 6, 2006, 3:32:03 PM

Hello,

I am using DPMI and the GNU C compiler on a Pentium 4 motherboard. The program
executes at Ring 0, and if I put INVD (inline assembly) in the C code, the
compiled program reboots the computer. When I try to execute the program at
Ring 3, it reports a General Protection Fault, which is expected...
Why am I using the INVD instruction?

I have a PCI card which transfers data to the main memory of the PC using DMA
bus mastering. Once the transfer is done, if I go and check the transferred
data, I can read 'old' data from the data cache (L1 & L2). In other words, the
CPU and system are not aware of the main memory being changed, so the cache is
not being updated. The idea is to invalidate the cache on the CPU (without
writing it back) and then load 'fresh' memory. When I disable the L1 & L2
caches everything works fine, but then I have a very slow Pentium 4. The INVD
instruction is supposed to do exactly what I want, but it reboots the computer
instead.

Any suggestions regarding the INVD instruction?

Thank you,

Slavisa Zigic

spam...@crayne.org

Apr 6, 2006, 5:16:16 PM

You're probably dying because INVD invalidates *all* of the cache
without writing it back, not just your data, so *anything* that's been
written to the cache by the OS, your code, etc., but not yet committed
to memory will be lost. Which is likely to be bad.

Now there's something wrong with your description of the problem. A
bus mastering device writing data into the PC's memory should not run
into any cache trouble (assuming a non-broken implementation), since
the bus snooping will cause the cache lines to invalidate or be written
back as appropriate.

What I suspect is happening is that you've got a PCI card with shared
memory on it, which is being updated by some hardware on the card, and
when you go to read that, you've got an old cached value.
Alternatively you might be DMA'ing to the shared memory on another
device, resulting in the same problem.

In either case the correct solution is to set the page table entries or
MTRRs to mark the area of memory non-cacheable.
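
As a rough illustration of the page-table route: on x86, bit 4 (PCD) and bit 3 (PWT) of a 32-bit page table entry control caching for that page. The helper names below are made up for this sketch, and it assumes you can actually reach the page tables, which under a DPMI host such as CWSDPMI belong to the host. After changing a live entry you would also need to flush the TLB entry and any lines already cached for that page.

```c
#include <stdint.h>

#define PTE_PRESENT 0x001u  /* bit 0: page is present */
#define PTE_PWT     0x008u  /* bit 3: page-level write-through */
#define PTE_PCD     0x010u  /* bit 4: page-level cache disable */

/* Return a copy of a 32-bit page table entry with caching disabled.
 * Hypothetical helper for illustration; it only computes the new value
 * and does not touch any real page table. */
uint32_t pte_make_uncached(uint32_t pte)
{
    return pte | PTE_PCD | PTE_PWT;
}

/* Nonzero if the entry has the cache-disable bit set. */
int pte_is_uncached(uint32_t pte)
{
    return (pte & PTE_PCD) != 0;
}
```

For example, an entry of 0x08000003 (present, writable, frame at 0x08000000) becomes 0x0800001B.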

Robert Redelmeier

Apr 6, 2006, 7:15:39 PM

Slavisa Zigic <spam...@crayne.org> wrote in part:

Sure. Don't use it :) It takes a long time to run, or you
lose the benefits of a writeback cache and force write-thru.
IIRC, [WB]INVD were instructions added with the 486, before MESI
came along for cache coherency.

But you have a _serious_ problem. Bus snooping is supposed
to detect the BM PCI activity and invalidate those cachelines.
That's the I in MESI. The reboot sounds like a triple fault.

I'm not sure what you can do. How are you detecting that the
BM xfr is complete? You do need some sort of signal. Polling
on the data may block the BM. Maybe the SFENCE instruction
will work, although it isn't intended for this purpose.

-- Robert

Rod Pemberton

Apr 7, 2006, 3:43:58 AM

"Slavisa Zigic" <spam...@crayne.org> wrote in message
news:11443519...@irys.nyx.net...

> Hello,
>
> I am using DPMI and GNU C compiler on Pentium 4 motherboard.

DJGPP? yes/no?

> Program is
> executing at Ring 0

DJGPP Ring 0 => CWSDPR0.exe

> and if I put INVD (inline assembly) in C code, compiled
> program is rebooting computer. When I try to execute program at Ring 3, it
> is reporting General Protection Fault, which is expected...
> Why I am using INVD instruction?
>
> I have a PCI card which is transferring data to main memory of PC using DMA
> bus mastering. Once transfer is done, if I go and check transferred data, I
> can read 'old' data from data cache (L1&L2).

How did your 'old' data get into the data cache (L1&L2)? You should be able
to do the same thing _unless_ it is out of your control, i.e., it's being
done by the C compiler.

> In other words, CPU and system
> is not aware of the main memory being changed, so it is not updating cache
> memory.

Without further information, I suspect the C compiler loaded it onto the
stack, not the data cache (L1&L2). I suspect this since loading of the
cache seems to be out of your control. I also suspect that you're
attempting to access the data via a pointer which doesn't point to the data
copied from main memory, but points to the data on stack. Could this be
correct, perhaps?

> Idea is to invalidate cache memory on CPU (without writing it back)
> and then load 'fresh' memory. When I disable L1&L2 cache everything works
> fine, but I have very slow Pentium 4. Instruction INVD is supposed to do exactly
> what I want, but it is rebooting computer instead.

Are you sure it's in the caches and not on the stack?

Rod Pemberton

Slavisa Zigic

Apr 9, 2006, 6:14:06 PM

Rod Pemberton <spam...@crayne.org> wrote:

: "Slavisa Zigic" <spam...@crayne.org> wrote in message

: news:11443519...@irys.nyx.net...
: > Hello,
: >
: > I am using DPMI and GNU C compiler on Pentium 4 motherboard.

: DJGPP? yes/no?

YES

: > Program is
: > executing at Ring 0

: DJGPP Ring 0 => CWSDPR0.exe

YES

: > and if I put INVD (inline assembly) in C code, compiled

: > program is rebooting computer. When I try to execute program at Ring 3, it
: > is reporting General Protection Fault, which is expected...
: > Why I am using INVD instruction?
: >
: > I have a PCI card which is transferring data to main memory of PC using DMA
: > bus mastering. Once transfer is done, if I go and check transferred data, I
: > can read 'old' data from data cache (L1&L2).

: How did your 'old' data get into the data cache (L1&L2)? You should be able
: to do the same thing _unless_ it is out of your control, i.e., it's being
: done by the C compiler.

In order to read a physical memory address I am using _farpeekl. Of course,
before _farpeekl, I call DPMI functions to set the base address and segment
limit. My old data is in the cache because I fill the memory with a pattern
before the actual DMA. Interestingly, after a few runs of the same program
I get good results, that is, memory matches the data sent from the PCI
card using DMA. I still do not have an explanation for this, but I know that
everything works fine the moment I disable the cache from the BIOS.

: > In other words, CPU and system

: > is not aware of the main memory being changed, so it is not updating cache
: > memory.

: Without further information, I suspect the C compiler loaded it onto the
: stack, not the data cache (L1&L2). I suspect this since loading of the
: cache seems to be out of your control. I also suspect that you're
: attempting to access the data via a pointer which doesn't point to the data
: copied from main memory, but points to the data on stack. Could this be
: correct, perhaps?

: > Idea is to invalidate cache memory on CPU (without writing it back)
: > and then load 'fresh' memory. When I disable L1&L2 cache everything works
: > fine, but I have very slow Pentium 4. Instruction INVD is supposed to do exactly
: > what I want, but it is rebooting computer instead.

: Are you sure it's in the caches and not on the stack?

: Rod Pemberton

I do not see any reason for the data to be on the stack. The test is very
simple. I just send the destination memory address and the number of bytes to
be transferred to the PCI controller on the card and then initiate a DMA write
to PC memory. I wait in a loop in 'main' for the DMA to finish, without setting
up an interrupt to inform me that the DMA has finished and without polling the
PCI controller. So the test could not be simpler, because I am not initiating
any activity on the PCI bus while the DMA is in progress.

Thank you,

Slavisa

Rod Pemberton

Apr 10, 2006, 12:45:02 PM

"Slavisa Zigic" <spam...@crayne.org> wrote in message

news:11446208...@irys.nyx.net...

> Rod Pemberton <spam...@crayne.org> wrote:
>
> : "Slavisa Zigic" <spam...@crayne.org> wrote in message
> : news:11443519...@irys.nyx.net...

> In order to read physical memory address I am using farpeekl. Of course,
> before farpeekl, I call dpmi functions to set base address and segment
> limit. My old data is in the cache because before actual DMA I fill memory
> with the pattern. It is interesting that after few runs of the same program,
> I have good results, that is, memory is matching data sent from the PCI
> card, using DMA. I still do not have explanation for this, but I know that
> everything works fine the moment I disable cache from BIOS.
>

Disabling cache is forcing your program to read from memory.

>
> I do not see any reason for data to be on the stack. Test is very simple. I
> just send memory destination address and number of bytes to be transferred
> to PCI controller on the card and then initiate DMA write to PC memory. I am
> waiting in the loop of the 'main' for DMA to be finished without setting
> interrupt to inform me about DMA being finished or polling PCI controller.
> So, the test could not be simpler, because I am not initiating any activity
> on PCI bus while DMA is in progress.
>

Since disabling the cache forces memory to be read, one of these could be
occurring:

A) The processor filled the cache with the original data and doesn't know to
refresh it.
B) There may be some way for DMA to tell the processor to refresh, but DMA
isn't doing so.
C) The C compiler moved the data to the stack, which is cached by the
processor. So, your C program is reading old data from the stack which is in
the cache.

For A), you'd need a 'invd', or 'wbinvd' etc...
For B), I don't know anything about this.
For C), place the C keyword 'volatile' on your DMA buffer and any pointers,
structs or unions which access the DMA buffer.

I think that C) is occurring. I don't know anything about B). I don't
think that A) is occurring.
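
For case C), here is a minimal sketch of what 'volatile' buys you, using a hypothetical helper that counts how many words of a buffer still hold the fill pattern. Note that volatile only stops the compiler from reusing a value it has already loaded; it has no effect on what the CPU cache does, so it cannot help with case A).

```c
#include <stddef.h>
#include <stdint.h>

/* Count the words that still hold the fill pattern.
 * The pointer is volatile-qualified, so the compiler must perform a real
 * load for every buf[i] instead of reusing a value kept in a register or
 * hoisting the read out of the loop. (Hypothetical helper, not from the
 * original program.) */
size_t count_stale_words(const volatile uint32_t *buf, size_t nwords,
                         uint32_t pattern)
{
    size_t stale = 0;
    size_t i;

    for (i = 0; i < nwords; i++) {
        if (buf[i] == pattern)
            stale++;
    }
    return stale;
}
```

After a transfer that overwrites the whole buffer, the expected result is 0, unless the incoming data happens to contain the pattern itself.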

Rod Pemberton

Slavisa Zigic

Apr 10, 2006, 6:29:44 PM

I would like to place 'volatile' on my DMA buffer, but it is probably
impossible. Here is why:

mi.address = 0x08000000;//physical addr of DMA buf
mi.size = 0x08000000;// 128MB
__dpmi_physical_address_mapping(&mi);
selector = __dpmi_allocate_ldt_descriptors(1);
__dpmi_set_segment_base_address(selector, mi.address);
__dpmi_set_segment_limit(selector, mi.size - 1);

Now I can use _farpokel and _farpeekl to do memory access.
There is no place for 'volatile', is there?

Thanks,

Slavisa Zigic

Rod Pemberton <spam...@crayne.org> wrote:

: "Slavisa Zigic" <spam...@crayne.org> wrote in message

: news:11446208...@irys.nyx.net...

Rod Pemberton

Apr 10, 2006, 9:28:19 PM

"Slavisa Zigic" <spam...@crayne.org> wrote in message

news:11447081...@irys.nyx.net...

>
> I would like to place 'volatile' on my DMA buffer, but it is probably
> impossible. Here is why:
>
>
> mi.address = 0x08000000;//physical addr of DMA buf
> mi.size = 0x08000000;// 128MB
> __dpmi_physical_address_mapping(&mi);
> selector = __dpmi_allocate_ldt_descriptors(1);
> __dpmi_set_segment_base_address(selector, mi.address);
> __dpmi_set_segment_limit(selector, mi.size - 1);
>
> Now, I can use _farpokel, _farpeekl to do memory access.
> No place for 'volatile' isn't it?
>

With DJGPP, one only needs two selectors. You have a choice of using
_dos_ds or _my_ds(). By default, _dos_ds allows access to all data below
1Mb+64k and _my_ds() allows access to all data in the program space. The
advantage of _dos_ds is that it has a base address of zero (0), i.e.,
physical addressing, and can be boosted to a 4Gb limit. Its disadvantage
is that special functions (e.g., _farpoke/peek) must be used instead of
standard C pointers. The advantage of _my_ds() is that it can be boosted
to a 4Gb limit and standard C pointers can be used. Its disadvantage is
that an offset must be subtracted for physical addresses. Using fs has the
same issues as _dos_ds. Allocating a new local selector provides no
advantages over using _dos_ds or _my_ds(). In all cases, if a paging DPMI
server is used, the memory must be allocated or mapped by the DPMI server
first.

This is the method I usually use:
unsigned long CS_base;
__dpmi_get_segment_base_address(_my_ds(),&CS_base);
__djgpp_nearptr_enable();

For physical-to-virtual addressing: (e.g., data outside your program space:
screen 0xB8000)
pointer-=CS_base;
For virtual-to-physical addressing: (e.g., data in your program space:
lidt/lgdt address for your OS in C)
pointer+=CS_base;

i.e.,
unsigned char *screen=0xB8000;
screen-=CS_base;
*(volatile char *) (screen+1)+=1;

If you're accessing your DMA buffer by a pointer,
volatile unsigned char *DMA=0x08000000UL;
DMA-=CS_base;
ptr_track0diskstruct=&DMA[0];
ptr_track1diskstruct=&DMA[512];

I am interested in how or why your DMA buffer is at 128Mb... I would think
that you'd want it below 1Mb.
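
The pointer arithmetic above can be sketched as two small helpers. The names are hypothetical and not part of DJGPP. The sketch also assumes the "physical" address is really a linear address: under a paging DPMI host, memory above 1Mb must first be mapped with __dpmi_physical_address_mapping, and the linear address it returns is what goes into the conversion.

```c
#include <stdint.h>

/* Offset to use as a near pointer for a given linear address when DS has
 * base ds_base. Unsigned arithmetic wraps modulo 2^32, which matches the
 * way the CPU adds segment base and offset. */
uint32_t linear_to_near(uint32_t linear, uint32_t ds_base)
{
    return linear - ds_base;
}

/* Inverse: linear address of an object at the given offset in DS. */
uint32_t near_to_linear(uint32_t offset, uint32_t ds_base)
{
    return offset + ds_base;
}

/* With DJGPP (not compiled here), ds_base would come from
 *   __dpmi_get_segment_base_address(_my_ds(), &base);
 * after __djgpp_nearptr_enable() has returned nonzero. DJGPP also exposes
 * the negated base as __djgpp_conventional_base in <sys/nearptr.h>. */
```

The wrapped result only works as a pointer once the DS limit has been raised, which is what __djgpp_nearptr_enable() is for.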

Rod Pemberton

Slavisa Zigic

Apr 11, 2006, 5:05:13 PM

Rod Pemberton <spam...@crayne.org> wrote:

: "Slavisa Zigic" <spam...@crayne.org> wrote in message

Unfortunately, your method does not work. The compiler reports a warning
on volatile unsigned char *DMA=0x08000000UL; Then, if I want to print the
value at physical address 0x08000000 on the screen, I get SIGSEGV with a GPF.
To be more precise, printf("\n %X", DMA[0]) causes the GPF. It is interesting
that __djgpp_nearptr_enable() is returning no error.

My CPU board has 512MB of RAM. I am using a PCI backplane with 8 PCI cards.
Each card is DMAing data to the RAM: the first card is sending data to physical
address 0x08000000, the second card to address 0x09000000, the third
card to address 0x0a000000, etc. So, each card is sending 16MBytes of data
to the RAM.
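
The layout described above works out as in the following sketch. The macro and function names are illustrative only, and the cards are numbered from zero here.

```c
#include <stdint.h>

#define DMA_BASE   0x08000000u  /* first card's buffer, as stated above */
#define CARD_SIZE  0x01000000u  /* 16 MB per card */
#define NUM_CARDS  8u

/* Physical base address of the buffer for a given card (0..7). */
uint32_t card_buffer_base(unsigned card)
{
    return DMA_BASE + card * CARD_SIZE;
}

/* Total size of all card buffers together. */
uint32_t dma_region_size(void)
{
    return NUM_CARDS * CARD_SIZE;
}

/* First address past the end of the last card's buffer. */
uint32_t dma_region_end(void)
{
    return DMA_BASE + dma_region_size();
}
```

With these numbers the eight buffers span 0x08000000 through 0x0FFFFFFF, that is, 128MB ending exactly at the 256MB mark.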

Now, you know why I do not want to be below 1MB.

Thank you.

Slavisa Zigic

: I am interested in how or why your DMA buffer is at 128Mb... I would think

Rod Pemberton

Apr 11, 2006, 9:24:23 PM

"Slavisa Zigic" <spam...@crayne.org> wrote in message

news:11447895...@irys.nyx.net...

> Rod Pemberton <spam...@crayne.org> wrote:
>
> : "Slavisa Zigic" <spam...@crayne.org> wrote in message
> : news:11447081...@irys.nyx.net...
> :>
> :> I would like to place 'volatile' on my DMA buffer, but it is probably
> :> impossible. Here is why:
> :>
> :>
> :> mi.address = 0x08000000;//physical addr of DMA buf
> :> mi.size = 0x08000000;// 128MB
> :> __dpmi_physical_address_mapping(&mi);
> :> selector = __dpmi_allocate_ldt_descriptors(1);
> :> __dpmi_set_segment_base_address(selector, mi.address);
> :> __dpmi_set_segment_limit(selector, mi.size - 1);
> :>

CWSDPMI r5 can map 256Mb of memory in total, and it can't access any physical
memory above 256Mb either. This is hard coded.
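
A quick way to check a buffer against that limit is sketched below. The helper name is hypothetical, and the sketch assumes the limit acts as an exclusive upper bound on physical addresses.

```c
#include <stdint.h>

#define MB(x) ((uint32_t)(x) << 20)

/* Nonzero if the range [base, base + size) lies entirely below limit.
 * Written as a subtraction so that base + size cannot overflow. */
int range_below_limit(uint32_t base, uint32_t size, uint32_t limit)
{
    return base <= limit && size <= limit - base;
}
```

For the mapping quoted above, range_below_limit(0x08000000, MB(128), MB(256)) is true, but only just: the range ends exactly at 256Mb, so there is no room to spare.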

A) Switch back to your method, since it's working somewhat for you, and

B) Try r6 of CWSDPMI, which uses larger 4Mb pages:
http://clio.rice.edu/djgpp/csdpmi6t.zip
(info on CWSDPMI r6)
http://www.usenet.com/newsgroups/comp.os.msdos.djgpp/msg00331.html

C) read these
http://sunsite.utk.edu/ftp/usr-218-2/djgpp/djgpp/v2faq/faq3_10.html
"Make sure you use the latest release r5 of CWSDPMI. Versions
before r4 only supported up to 128MB of main memory, and had
bugs with more than 64MB. CWSDPMI r4 supports up to 256MB
of memory, but has bugs when the total amount of memory approaches
256MB, so if you use r4, make sure the total amount of memory is less
than 256MB."

D) What happens if you try PMODEDJ (flat 4Gb addressing) instead of CWSDPR0
or CWSDPMI? Just use stubedit.exe and change the DPMI host to PMODETSR.EXE.
Does the GPF go away? If so, you're either accessing memory outside the 128Mb,
missing a DPMI function call, or hitting some other unknown...

E) Delorie and Sandmann read comp.os.msdos.djgpp, you could post there.

Good luck...

Rod Pemberton

Charles Marslett

Apr 16, 2006, 10:56:18 PM

On Tue, 11 Apr 2006 21:24:23 -0400, "Rod Pemberton"
<spam...@crayne.org> wrote:

I would also recommend you execute a WBINVD before the DMA starts and
then another WBINVD after it completes (instead of the INVD). That is
really going to slow things down but it should be (marginally) faster
than uncached operation and should work the same way. The only issue
you might have then is if some code somewhere did a write to the
buffer after the first WBINVD.
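
A minimal sketch of that sequence follows. The dma_ops structure and all function names are assumptions made for illustration. The card-specific steps are left as callbacks, and a dry-run backend that only records the order of the steps stands in for real hardware. The real flush must run at ring 0.

```c
#include <string.h>

/* Steps the sequence needs; a real backend would run at ring 0. */
typedef struct {
    void (*flush_caches)(void);  /* e.g. WBINVD */
    void (*start_dma)(void);     /* program the card, start the transfer */
    int  (*dma_done)(void);      /* nonzero once the transfer is complete */
} dma_ops;

/* Flush, start the transfer, wait for completion, flush again. */
void dma_receive(const dma_ops *ops)
{
    ops->flush_caches();   /* write dirty lines back before the card writes */
    ops->start_dma();
    while (!ops->dma_done())
        ;                  /* do not touch the buffer while waiting */
    ops->flush_caches();   /* drop anything cached during the transfer */
}

/* Dry-run backend: records step order instead of touching hardware. */
char dma_trace[16];

static void trace_add(char c)
{
    size_t n = strlen(dma_trace);
    if (n + 1 < sizeof dma_trace) {
        dma_trace[n] = c;
        dma_trace[n + 1] = '\0';
    }
}
static void trace_flush(void) { trace_add('F'); }
static void trace_start(void) { trace_add('S'); }
static int  trace_done(void)  { trace_add('W'); return 1; }

const dma_ops trace_ops = { trace_flush, trace_start, trace_done };

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Real flush for x86. Privileged: faults with a GPF outside ring 0. */
void cpu_wbinvd(void)
{
    __asm__ __volatile__("wbinvd" : : : "memory");
}
#endif
```

A real backend would point flush_caches at cpu_wbinvd and supply start_dma and dma_done for the specific card.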

You could also set up the buffer regions as uncacheable in the MTRRs
(this has the added advantage that the full cache will be available
for your code and "real" memory accesses). The disadvantage is that
MTRRs are really a pain to use (there are not many of them, they have
absolutely atrocious granularity problems, and there are a few other
hassles I cannot remember right now)....
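
To illustrate the granularity issue: a single variable-range MTRR describes a block whose size is a power of two (at least 4 KB) and whose base is aligned to that size. The hypothetical helper below checks whether a region fits that shape; it does not program any MTRR.

```c
#include <stdint.h>

/* Nonzero if [base, base + size) can be described by one variable-range
 * MTRR: size is a power of two of at least 4 KB, and base is a multiple
 * of size. Regions that fail this need several MTRRs, and there are only
 * a handful available. */
int fits_one_variable_mtrr(uint32_t base, uint32_t size)
{
    if (size < 0x1000u)
        return 0;
    if ((size & (size - 1u)) != 0)
        return 0;                      /* not a power of two */
    return (base & (size - 1u)) == 0;  /* base aligned to size */
}
```

The 128MB buffer at 0x08000000 discussed in this thread passes the check. A 32MB region starting at 0x09000000 would not.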

--Charles