
Forcing reading and writing uncached


Lars Erdmann

May 25, 2012, 1:15:14 AM
Hello,

I need to read/write memory mapped registers.
Suppose the BIOS / OS has not properly marked that memory mapped range as
uncacheable (UC).

Is there any instruction in the Intel instruction set that lets me bypass
the cache on a read and likewise on a write, so that a read never returns
stale register contents and a write really ends up in the device's register
and not only in the cache?

For obvious reasons WBINVD is not a good choice.
And I don't really understand how LFENCE, SFENCE, MFENCE and PREFETCHNTA
work, because I don't understand what happens if I read/write memory mapped
registers instead of RAM.


Lars

Terje Mathisen

May 25, 2012, 3:22:36 AM
1) Don't go there! If you have memory-mapped registers that haven't been
marked as uncacheable, fix this issue instead of looking for a workaround!

2) If you still think you need to do so, WBINVD might be the only
possible workaround, since you absolutely _must_ get rid of any stale
values in cache before you do any new read operations from device
registers, otherwise your device will only ever see the first load, and
that only in the form of a 64-byte cache line burst read!

Many/most devices with memory-mapped registers will fail spectacularly if
all read/write operations arrive as bursts of 32- or 64-bit transfers!

I.e. don't do this!

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Lars Erdmann

May 25, 2012, 2:06:41 PM
Don't get me wrong,

of course having memory mapped registers mapped with the wrong caching type
is a bad thing and yes,
it needs to be fixed.

I only wanted to sort out a problem which I have:

I am working on a USB HC driver and some user reports that it does not work.
Note: the registers are all memory mapped.
Here is the strange part: there exists a status register that is to report
one or more of several interrupt causes.
It will have bits set reporting the interrupt cause(s).
If the corresponding enable bit(s) in the interrupt enable register is/are
set, an interrupt will fire.
I get the interrupt, however, when I read the status register, none of the
bits is set. This can have one of two causes:
1) access to the memory mapped registers is wrong, potentially because the
caching type is not set correctly for the relevant memory address
2) some SMI USB legacy emulation code is still active in the background
(which it should not be) and keeps clearing the
status register so that I don't get to see the content I expect to see.

I am trying to find a short term workaround for 1) in order to rule out that
this is the problem.

Looking at the Intel spec, the "MOVNTI" instruction (potentially with a
following SFENCE) seems to solve my problem at least for writing memory
mapped registers:
it completely bypasses the cache. The only pitfall is that it requires >=
SSE2 support.
I don't have any good solution for reading a memory mapped register while
bypassing the cache. Maybe a CLFLUSH before the MOV will do (but it is only
supported by a few CPUs).
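
A minimal sketch of the two ideas above, written with the compiler intrinsics that map to MOVNTI, SFENCE, CLFLUSH and MFENCE. The helper names `mmio_write32_nt` and `mmio_read32_flushed` are made up for illustration, SSE2 is assumed to be available, and whether this really reaches a device behind a wrongly cached mapping is exactly the open question here. Note that CLFLUSH writes back a modified line before invalidating it, which may itself be unwelcome on a device.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_stream_si32, _mm_sfence, _mm_clflush, _mm_mfence */

/* Hypothetical helper: non-temporal 32-bit store (MOVNTI) followed by SFENCE. */
static void mmio_write32_nt(volatile uint32_t *reg, uint32_t value)
{
    assert(((uintptr_t)reg & 3u) == 0);       /* aligned dword access only */
    _mm_stream_si32((int *)reg, (int)value);  /* MOVNTI */
    _mm_sfence();                             /* SFENCE: make the store globally visible */
}

/* Hypothetical helper: flush the cache line (CLFLUSH), fence, then do a plain load. */
static uint32_t mmio_read32_flushed(volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0);
    _mm_clflush((const void *)reg);           /* CLFLUSH: evict the line holding reg */
    _mm_mfence();                             /* MFENCE: keep the load after the flush */
    return *reg;                              /* ordinary MOV */
}
```

Run against ordinary RAM, these routines only show that the instruction sequence is well formed; they say nothing about what a real host controller would observe.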

Lars


"Terje Mathisen" <"terje.mathisen at tmsw.no"@giganews.com> wrote in
message news:tio399-...@ntp6.tmsw.no...

Robert Redelmeier

May 25, 2012, 6:51:25 PM
Lars Erdmann <lars.e...@nospicedham.arcor.de> wrote in part:
> I need to read/write memory mapped registers.

Are you sure they're in mem space and not IO space? On x86/PC,
memory mapping is deprecated in order to cache more aggressively.
MM is really only used for vidram, block drivers like SATA/EIDE
and sometimes ether. Not for USB 2.0

> Suppose the BIOS / OS has not properly marked that memory
> mapped range as uncacheable (UC).

A seriously fubar'd driver or conflict with other
hardware/software in the system.


-- Robert

Tim Roberts

May 26, 2012, 12:27:30 AM
Robert Redelmeier <red...@ev1.net.invalid> wrote:
>
>Lars Erdmann <lars.e...@nospicedham.arcor.de> wrote in part:
>> I need to read/write memory mapped registers.
>
>Are you sure they're in mem space and not IO space? On x86/PC,
>memory mapping is deprecated in order to cache more aggressively.
>MM is really only used for vidram, block drivers like SATA/EIDE
>and sometimes ether. Not for USB 2.0

I can't tell if you are kidding about this or not, but if you aren't, I
don't understand where you got this. Memory mapping of registers has never
been deprecated by any individual in this universe. Quite the reverse --
I/O space should be just about extinct by now, because the I/O instructions
on x86 CPUs are so horribly slow that they just don't make sense, unless
you are absolutely required to for backwards compatibility.

The registers in EHCI (USB 2.0) are all memory mapped.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Lars Erdmann

May 26, 2012, 3:04:03 AM
Yes, EHCI and OHCI use memory mapped registers, UHCI uses I/O mapped
registers.

The reason is not necessarily because I/O ports are slow (that largely
depends on the device implementing them/their access decoding).
They also have their advantages: there is none of those caching problems
(speculative reads etc.), writes are immediately executed across PCI bridges
(where memory mapped registers have their issues as writes end up in posting
buffers, another kind of caching issue to be taken into account).
The problem is, there exist only 65536 I/O ports that need to be shared all
across the system (every device, plus legacy I/O ports).

Lars

"Tim Roberts" <ti...@nospicedham.probo.com> wrote in message
news:gem0s7pvvr0smupqk...@4ax.com...

Philip Lantz

May 26, 2012, 5:00:59 AM
Lars Erdmann wrote:
[top posting fixed]
> Tim Roberts wrote:
> > Robert Redelmeier <red...@ev1.net.invalid> wrote:
> > > Are you sure they're in mem space and not IO space? On x86/PC,
> > > memory mapping is deprecated in order to cache more aggressively.
> > > MM is really only used for vidram, block drivers like SATA/EIDE
> > > and sometimes ether. Not for USB 2.0
> >
> > I can't tell if you are kidding about this or not, but if you
> > aren't, I don't understand where you got this. Memory mapping of
> > registers has never been deprecated by any individual in this
> > universe. Quite the reverse -- I/O space should be just about
> > extinct by now, because the I/O instructions on x86 CPUs are
> > so horribly slow that they just don't make sense, unless
> > you are absolutely required to for backwards compatibility.

> The reason is not necessarily because I/O ports are slow (that largely
> depends on the device implementing them/their access decoding).

Tim didn't say I/O ports are slow--he said the I/O instructions on x86
CPUs are slow.

Philip Lantz

May 26, 2012, 5:00:59 AM
Lars Erdmann wrote:
> Don't get me wrong, of course having memory mapped registers mapped
> with the wrong caching type is a bad thing and yes, it needs to be
> fixed.
>
> I only wanted to sort out a problem which I have:
>
> I am working on a USB HC driver and some user reports that it does not work.
> Note: the registers are all memory mapped.
> Here is the strange part: there exists a status register that is to report
> one or more of several interrupt causes.
> It will have bits set reporting the interrupt cause(s).
> If the corresponding enable bit(s) in the interrupt enable register is/are
> set, an interrupt will fire.
> I get the interrupt, however, when I read the status register, none of the
> bits is set. This can have one of two causes:
> 1) access to the memory mapped registers is wrong, potentially because the
> caching type is not set correctly for the relevant memory address
> 2) some SMI USB legacy emulation code is still active in the background
> (which it should not be) and keeps clearing the
> status register so that I don't get to see the content I expect to see.
>
> I am trying to find a short term workaround for 1) in order to rule out that
> this is the problem.

I think you'd be better off looking at the MTRRs and/or page tables to
confirm that they're correct, rather than trying such a hack and hoping
it behaves the way you think it will. In other words, if your goal is to
gain a better understanding of the behavior of the system leading to the
failure you are investigating, I don't know that this hack is likely to
give you useful information.
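
For the inspection route, here is a rough sketch of how one variable-range MTRR pair could be checked against the physical address of the register block. It assumes the raw IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn values have already been read with RDMSR in ring 0 (not shown). The function name `mtrr_var_type` is made up, and the fixed-range MTRRs, the default type and the overlap rules are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings used in the low byte of IA32_MTRR_PHYSBASEn. */
enum { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };

#define MTRR_MASK_VALID (1ull << 11)   /* "valid" bit in IA32_MTRR_PHYSMASKn */
#define MTRR_ADDR_BITS  (~0xFFFull)    /* base/mask fields start at bit 12 */

/*
 * Hypothetical helper: returns the memory type if phys falls inside this
 * variable range, or -1 if the pair is disabled or does not cover phys.
 */
static int mtrr_var_type(uint64_t physbase, uint64_t physmask, uint64_t phys)
{
    uint64_t mask = physmask & MTRR_ADDR_BITS;
    int type = (int)(physbase & 0xFFu);

    /* Encodings 2, 3 and anything above 6 are reserved. */
    assert(type == MTRR_UC || type == MTRR_WC ||
           (type >= MTRR_WT && type <= MTRR_WB));

    if (!(physmask & MTRR_MASK_VALID))
        return -1;                      /* this register pair is not in use */
    if ((phys & mask) != (physbase & mask))
        return -1;                      /* address lies outside the range */
    return type;
}
```

If the pair covering the register block reports anything other than UC (type 0), that would at least be consistent with a caching problem; a result of -1 for every pair means the default type applies.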

If you really want to test this by experimentation rather than
inspection, you could try doing a test write to a register in the device
that is defined to produce a different value on read than the one
written. The interrupt status register is probably W1C, so it would have
this behavior. If in your scenario it always reads as 0, then there's no
harm in writing 1's to it, and if it still reads back as 0, then the
readback value couldn't have come from the cache. If it reads back as
written instead of the correct value, then it could be a caching issue,
but you'll still have to look at the MTRRs and page tables to be sure.
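
The experiment can be sketched with a toy model of a W1C register. Both `w1c_write` and `classify_readback` are illustrative names, not part of any driver, and the model assumes every written bit is W1C.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a write-1-to-clear register: each 1 written clears that bit. */
static uint32_t w1c_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}

enum readback_verdict { READBACK_FROM_DEVICE, READBACK_MAYBE_CACHED };

/*
 * Hypothetical classifier for the test write: a real W1C register never
 * returns the 1 bits that were just written, while a cached copy would.
 */
static enum readback_verdict classify_readback(uint32_t written, uint32_t readback)
{
    assert(written != 0);  /* the test is meaningless without writing some 1s */
    return (readback & written) == 0 ? READBACK_FROM_DEVICE
                                     : READBACK_MAYBE_CACHED;
}
```

A `READBACK_MAYBE_CACHED` verdict is only a hint; as said above, the MTRRs and page tables still have to be checked.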

Do you really think that those are the only two possible causes of the
symptom you're seeing? I suspect I could think of quite a few more. The
first one that comes to mind is that some other place in the interrupt
handler is reading that register before the code you're looking at.

Another possibility is that the interrupt configuration of the device is
set to auto-clear. (I haven't looked at this device in a while, so I
don't remember whether that's a possibility in this case.)

If the driver is written in C, a missing "volatile" keyword could cause
the compiler to not actually read the hardware register.
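
As a minimal illustration of the "volatile" point, assuming a flat pointer to the register block: the accessor names below are invented, and `volatile` only stops the compiler from dropping or merging accesses. It has no effect on the CPU's caching type.

```c
#include <assert.h>
#include <stdint.h>

/* Every call performs a real 32-bit load; the compiler may not reuse an old value. */
static inline uint32_t reg_read32(const volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0);  /* aligned dword access only */
    return *reg;
}

/* Every call performs a real 32-bit store; the compiler may not drop or merge it. */
static inline void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(((uintptr_t)reg & 3u) == 0);
    *reg = value;
}

/* Poll until any bit in mask is set or the tries run out; returns 1 on success. */
static int wait_for_bits(const volatile uint32_t *status, uint32_t mask, unsigned tries)
{
    while (tries-- > 0) {
        if (reg_read32(status) & mask)
            return 1;
    }
    return 0;
}
```

Without the `volatile` qualifier, an optimizing compiler would be free to hoist the load out of the polling loop and spin on a stale value.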

Robert Redelmeier

May 26, 2012, 1:22:58 PM
Lars Erdmann <lars.e...@nospicedham.arcor.de> wrote in part:
> Yes, EHCI and OHCI use memory mapped registers, UHCI uses
> I/O mapped registers.

Hmm ... wonder why the switch, unless EHCI & OHCI are
Mac-origin (IIRC they used USB first)


> The reason is not necessarily because I/O ports are slow
> (that largely depends on the device implementing them/their

True, some legacy ports are dog-slow (1 MHz), at least in
part because they have to be bridged twice. IO instructions
are of course slow since CPU internal frequencies are _much_
(~10x) faster than off-chip frequencies. But you're not
going to beat that with mem.map -- the r/w will just stall
for protection checks and hardware response.

> access decoding). They also have their advantages: there
> is none of those caching problems (speculative reads etc.),
> writes are immediately executed across PCI bridges (where
> memory mapped registers have their issues as writes end
> up in posting buffers, another kind of caching issue to
> be taken into account). The problem is, there exist only
> 65536 I/O ports that need to be shared all across the system
> (every device, plus legacy I/O ports).

I don't think so ... it has been awhile, but I distinctly
remember setting the high bit of EDX to access a PCI device.
So I infer all 32 bits are available for IO addressing.


-- Robert

Lars Erdmann

May 26, 2012, 2:13:24 PM
No. For "IN" and "OUT" (and also "INS" and "OUTS") the Intel spec only
allows either DX or an 8-bit immediate value to specify the port number.
You must have been doing magic.

Lars

Lars Erdmann

May 26, 2012, 2:42:04 PM
Hi,

yes the status register is W1C for some bits. I'll check.

For MTRRs and page table attributes:
The OS in question (OS/2) does not support PAT and I don't feel like
checking the MTRRs on the reporter's system as he is an ordinary end user
who cannot be bothered.

For the interrupt handler:
I am sure reading the status register is the first thing done and nothing
else writes to it before I read it (unless some SMI code executes which is
not intended of course).
There is no "auto-clear" on anything for the USB HW.
Even though the code is written in "C" I read/write memory mapped registers
with an assembler routine that does 32-bit reads/writes (it's because of
OS/2 16-bittedness when it comes to device drivers. A 16-bit C-compiler is
therefore in use. Don't ask ...). Therefore "volatile" is not an issue here.
In general the code has already been around for almost 10 years and has
worked on very many systems.
Only this one system I am dealing with is giving me a headache.


One other thing comes to mind: UHCI / OHCI / EHCI use some structures in
ordinary system memory.
These structures are manipulated by the device driver but also by the
busmaster DMA engine of the HC.
I wonder if busmaster DMA triggers cache snooping as these memory structures
undergo the usual caching.
But that would be a different problem.


Lars



"Philip Lantz" <p...@nospicedham.canterey.us> wrote in message
news:MPG.2a2a3b884...@news.eternal-september.org...

Philip Lantz

May 27, 2012, 4:41:38 AM
Lars Erdmann wrote:
> In general the code has already been around for almost 10 years and
> has worked on very many systems.
> Only this one system I am dealing with is giving me a headache.

I guess I jumped to some conclusions that were unwarranted: that this
was code under development, and that you had access to the system
showing the problem. Sorry about that.

wolfgang kern

May 27, 2012, 8:48:10 AM

Lars Erdmann answered Robert:

> No. For "IN" and "OUT" (and also "INS" and "OUTS") the Intel spec only
> allows either DX or an 8-bit immediate value to specify the port number.
> You must have been doing magic.

:)
Yes, it seems Robert meant the PCI configuration access rules...
The PCI configuration I/O ports (0CF8h, 0CFCh) have 16-bit port addresses
but are 32 bits wide on the data bus, so to access PCI configuration space
you use DX for the port and EAX, with its MSBit set, for the data.
Most PCI registers are memory-mapped I/O anyway.

MOV dx,0CF8h
MOV eax,8xxx_xxxxh ;have MSBit set
OUT dx,eax ;set bus, device and offset at once
;followed by IN/OUT on port 0CFCh (many locations work with 32-bit rd/wr only)
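
For reference, the 8xxx_xxxx value written to port 0CF8h can be composed as in the sketch below (PCI configuration mechanism #1). The function name `pci_config_address` is an assumption for illustration, and the port I/O itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the dword for port 0CF8h:
 *   bit 31      enable (the "MSBit")
 *   bits 23..16 bus
 *   bits 15..11 device
 *   bits 10..8  function
 *   bits 7..2   dword-aligned register offset
 */
static uint32_t pci_config_address(uint8_t bus, uint8_t device,
                                   uint8_t function, uint8_t offset)
{
    assert(device < 32 && function < 8);
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)device << 11)
         | ((uint32_t)function << 8)
         | (offset & 0xFCu);   /* low two bits are always zero */
}
```

The high bit lives in the data written through EAX, not in the port number, which is why the I/O address space itself stays 16 bits wide.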

But once you know that a PCI device is found in the MMIO space, you
can use any register with all available addressing formats to RD/WR
as if it were just ordinary memory (leaving aside W1C and friends for now),
although you had better always read/write aligned 32 bits at once.

The BIOS should really know enough about its own hardware to correctly
set the MMIO space to UC, as long as the OS doesn't corrupt the MTRR setting...

Could your reported problem just be caused by the debugger in use?
I once had to modify mine to always read aligned 32-bit chunks to see
the correct contents of any PCI memory; bytewise reads showed garbage.

>>> access decoding). They also have their advantages: there
>>> is none of those caching problems (speculative reads etc.),
>>> writes are immediately executed across PCI bridges (where
>>> memory mapped registers have their issues as writes end
>>> up in posting buffers, another kind of caching issue to
>>> be taken into account). The problem is, there exist only
>>> 65536 I/O ports that need to be shared all across the system
>>> (every device, plus legacy I/O ports).

Yes, and 8-bit legacy ports are awfully slow (RTCL, KEYBD, COMx, LPTn), ~1 MHz,
while their memory-mapped aliases (if available) allow faster response.
It seems that IDE and SATA I/O ports (32-bit wide) are quite a bit faster (>20 MHz).

Memory mapped I/Os could be slow as well, it just depends on hardware.

>> I don't think so ... it has been awhile, but I distinctly
>> remember setting the high bit of EDX to access a PCI device.
>> So I infer all 32 bits are available for IO addressing.

Unfortunately (or for the better) I/O space remains limited to 16 bits.

__
wolfgang


Lars Erdmann

Jun 13, 2012, 2:41:02 AM
Hello Phil,

> I think you'd be better off looking at the MTRRs and/or page tables to
> confirm that they're correct, rather than trying such a hack and hoping
> it behaves the way you think it will. In other words, if your goal is to
> gain a better understanding of the behavior of the system leading to the
> failure you are investigating, I don't know that this hack is likely to
> give you useful information.

I have now decided to do what you are suggesting: I am looking at the MTRRs
and also at the page tables.

I have a question about MTRRs:
Say a device's memory mapped registers fall into a region that is
(erroneously) mapped by the MTRRs as "write back".
What effect would that have ? Would the CPU really write to the cache ? Or
would it (or the cache controller) somehow "know" that that physical address
does not end up at system memory (RAM) ?


I already have a hack in place to update the PTE (page table entry) for the
one physical page (4k) that maps the memory mapped device registers of the
USB device. The hack will set the PCD and PWT bits of the PTE so that the
corresponding page will be uncached (overriding whatever is set by any MTRR
for that memory range).
However I feel uncomfortable with that solution:
1) a device driver should not muck around with the page tables.
2) there are coherency issues regarding the TLB on multiple CPU cores (where
each core has its own TLB).
The hack will only do the (necessary) invalidation of the TLB entry on the
core it is executed on.
Unfortunately, for TLB entries, there is no such thing as "snooping" which
would ensure coherency across multiple cores.
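
The bit manipulation in this hack amounts to the sketch below, assuming a 32-bit non-PAE PTE. The helper names are made up; locating the PTE, the local INVLPG and any cross-core TLB shootdown are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0)  /* P:   page is present */
#define PTE_PWT     (1u << 3)  /* PWT: page-level write-through */
#define PTE_PCD     (1u << 4)  /* PCD: page-level cache disable */

/* Hypothetical helper: mark a present 4k page as uncached (PCD=1, PWT=1). */
static uint32_t pte_make_uncached(uint32_t pte)
{
    assert(pte & PTE_PRESENT);  /* only meaningful for a mapped page */
    return pte | PTE_PCD | PTE_PWT;
}

/* Hypothetical helper: report whether both cache-control bits are set. */
static int pte_is_uncached(uint32_t pte)
{
    return (pte & (PTE_PCD | PTE_PWT)) == (PTE_PCD | PTE_PWT);
}
```

The new attribute only takes effect on a given core once that core's stale TLB entry for the page has been invalidated, which is the coherency concern described above.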

Lars

CN

Jun 13, 2012, 11:20:18 AM
On 6/12/2012 11:41 PM, Lars Erdmann wrote:

> I have now decided to do what you are suggesting: I am looking at the
> MTRRs and also at the page tables.
>
> I have a question about MTRRs:
> Say a device's memory mapped registers fall into a region that is
> (erroneously) mapped by the MTRRs as "write back".
> What effect would that have ? Would the CPU really write to the cache ?
> Or would it (or the cache controller) somehow "know" that that physical
> address does not end up at system memory (RAM) ?

All recent Intel CPUs machine check if non-RAM memory is accessed with a
cached mode (i.e. not UC or WC). If the OS doesn't have machine checks
enabled, the CPU just halts, and the OS freezes.

If you are not getting machine checks or freezes, it means the caching
type is not an issue and you're looking in the wrong place.

Note: this is only Intel, I don't know what AMD CPUs do.

Lars Erdmann

Jun 15, 2012, 2:57:16 AM
Thanks. That helps.

Lars

"CN" <qmbmn...@nospicedham.pacbell.net> wrote in message
news:jrab3j$3u2$1...@dont-email.me...