XILINX PCIe read of slow device


David Binette

Oct 27, 2014, 2:05:32 PM
What is the correct way to handle a PCIE request to a slow device?

I have a Xilinx Spartan-6 PCIe design using the Integrated Block for PCI Express.

The BAR memory map is decoded. Some addresses map to fast RAM or local registers, and these work OK,
but other addresses map to slow devices, like I2C, or to internal processes that need a few cycles before they can produce valid data to return to the PCI bus.

Is there a way to tell the PCI bus to wait, or to retry?

thanks



Mark Curry

Oct 27, 2014, 2:37:05 PM
In article <d7e5311e-f5ea-4170...@googlegroups.com>,
David,

What specific problem are you trying to address?

The Completion Timeout Mechanism of the PCIE spec is
optional, and must be enabled by SW during device configuration.

Can you just disable this? You can force it disabled on either end
(root complex, or endpoint). I don't think it's enabled by default,
but I can't check at the moment...

Or are you asking something else?

Regards,

Mark

David Binette

Oct 27, 2014, 2:53:30 PM
On Monday, October 27, 2014 1:37:05 PM UTC-5, Mark Curry wrote:
> In article <d7e5311e-f5ea-4170...@googlegroups.com>,
Mark, thanks. I will look into the 'completion timeout mechanism' to see if it is the answer to my need. Am I asking something else? I don't know; it is all kind of new to me.

Part of the difficulty is that the PCI system and the local app are in different clock domains. When the PCIE read occurs I deal with the clock crossing, but it takes clock cycles before I can return something to the PCI read request.

lang...@fonz.dk

Oct 27, 2014, 7:09:09 PM
For peripherals that are slow, like I2C on a normal MCU, you would normally have
a register to initiate the read and a status register you can poll to see when the result is ready.
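
The trigger-and-poll pattern can be sketched as a small behavioural model. This is Python, not HDL or driver code, and the register offsets, bit names and the `SlowDeviceRegs` / `read_slow` names are all hypothetical. The point it illustrates is that every individual register read completes at once, so no single PCIe read ever has to wait on the slow device.

```python
class SlowDeviceRegs:
    """Toy register block: CTRL starts a slow read, STATUS says when DATA is valid."""

    CTRL, STATUS, DATA = 0x0, 0x4, 0x8   # hypothetical register offsets
    START = 0x1                          # hypothetical CTRL bit
    READY = 0x1                          # hypothetical STATUS bit

    def __init__(self, latency_polls, value):
        self._latency = latency_polls    # status polls before the device answers
        self._value = value              # what the slow device will return
        self._remaining = None           # None: no slow read in flight
        self._ready = False
        self._data = 0

    def write(self, offset, word):
        if offset == self.CTRL and word & self.START:
            self._ready = False
            self._remaining = self._latency

    def read(self, offset):
        # Every register read answers at once; none of them waits on the device.
        if offset == self.STATUS:
            if self._remaining is not None:
                if self._remaining == 0:
                    self._data, self._ready = self._value, True
                    self._remaining = None
                else:
                    self._remaining -= 1     # each poll stands in for elapsed time
            return self.READY if self._ready else 0
        if offset == self.DATA:
            return self._data
        return 0


def read_slow(regs, max_polls=1000):
    """Start a slow read, poll STATUS, then fetch DATA. Returns (value, polls)."""
    regs.write(regs.CTRL, regs.START)
    for polls in range(1, max_polls + 1):
        if regs.read(regs.STATUS) & regs.READY:
            return regs.read(regs.DATA), polls
    raise TimeoutError("slow device did not become ready")
```

The `max_polls` bound matters in practice: without it, a peripheral that never answers would hang the caller.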


-Lasse


David Binette

Oct 28, 2014, 10:12:33 AM
Yes, that is a good solution, but for a different problem.
In this case, the data is always 'ready'; it is continuously changing, on a faster clock domain, and I need a couple of cycles for the read request to cross domains.

I've tried, unsuccessfully, to manipulate the IP core's 'trn_tsrc_rdy_n' line so that I can look at the read address before setting the start-of-frame line, in an effort to pre-fetch the data, but for some reason the core will not tolerate any delays.

lang...@fonz.dk

Oct 28, 2014, 10:37:43 AM
Can't you just keep a copy of the data on the other clock domain?
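
A copy kept on the PCIe side can be modelled roughly as below. This is a behavioural Python sketch, not HDL; `ShadowRegister`, `sync_stages` and `pcie_clock` are names invented for illustration. It assumes the fast domain presents a stable snapshot of the word; a real multi-bit crossing needs a handshake or a similar scheme so that all bits arrive together, which this model does not show.

```python
class ShadowRegister:
    """PCIe-side copy of a value that lives in another clock domain."""

    def __init__(self, sync_stages=2):
        # Each list entry models one register stage clocked by the PCIe clock.
        self._pipe = [0] * sync_stages

    def pcie_clock(self, snapshot):
        """Advance one PCIe clock, shifting the latest snapshot into the pipeline."""
        self._pipe = [snapshot] + self._pipe[:-1]

    def read(self):
        # A PCIe read is answered from the last stage immediately, with no wait.
        return self._pipe[-1]


if __name__ == "__main__":
    shadow = ShadowRegister()
    for live in (10, 11, 12, 13):
        shadow.pcie_clock(live)
    # The host sees a slightly stale value rather than the live one.
    print("live value 13, shadow read returns", shadow.read())
```

The cost is staleness of a clock or two, plus one such register per value that has to be readable.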

-Lasse

David Binette

Oct 28, 2014, 12:25:28 PM
Yes, that is feasible for a small number of items and it may be 'plan-b' if no PCI bus solution is available to me.

I like your suggestions, they are all reasonable, and I'll take the best alternative I can get if I don't find a way to do this via PCIe.

Joe Chisolm

Oct 28, 2014, 6:37:12 PM
This is out of UG654, page 133, for a simple PIO access. I'm not
sure what your host driver might be using.

"While the read is being processed, the PIO design RX state machine
deasserts trn_rdst_rdy_n, causing the Receive TRN interface to stall
receiving any further TLPs until the internal Memory Read controller
completes the read access from the block RAM and generates the
completion. Deasserting trn_rst_rdy_n in this way is not required for all
designs using the core. The PIO design uses this method to simplify the
control logic of the RX state machine."

Also take a look at page 141

--
Chisolm
Republic of Texas

Sean Durkin

Oct 29, 2014, 4:49:12 AM
David Binette wrote:
> Yes, that is feasible for a small number of items and it may be 'plan-b' if no PCI bus solution is available to me.
>
> I like your suggestions, they are all reasonable, and I'll take the best alternative I can get if I don't find a way to do this via PCIe.

I'm still not sure on what exactly your requirement is. In one post you
write that you want to read from slow devices (like I2C). That would
mean the problem is this:
- you issue a PCIe read request
- this read request triggers something, e.g. a read from an I2C device,
which takes a certain time
- meanwhile, you cannot respond to the PCIe read request in time because
you haven't received the result yet

In that case, do what Lasse suggests: Have one register to trigger the
read and another one that can be polled via PCIe indicating when the
data is ready.

But in another post you write "the data is always 'ready' it is
continuously changing, on a faster clock domain", which is something
entirely different. Is it streaming data? Do you need to catch all the
data or do you want to read out only one single value occasionally? Is
it dependent on your read, meaning that your read request initiates a
calculation or something that you want the result of, or is the data
totally independent and you only occasionally want to read the current
value?

Since I don't understand what you really want to do, here are a few other
possibilities:

- You could just always transfer the data you have to the PCIe clock
domain whenever it changes. Each time there is a new value, always
transfer it to the PCIe clock domain immediately and put it e.g. into a
BAR register. So when you issue a PCIe read request, there's data
already there that you can put into your reply message immediately.
Worst case is you don't get the very latest value but the one before that.

- If you need to catch all the values, I'd put the data into a FIFO. You
could then e.g. issue an MSI (Message signaled interrupt) when the FIFO
is e.g. half-full (or keep polling prog_full or something) and then read
it out in a burst from the PCIe side. No need for clock-domain-crossing
for the read request, as you only read from the FIFO that has its read
port in the PCIe clock domain. No need for PCIe to wait for data too
long, since data from the FIFO is available one or two clock cycles
after the read request was issued (depending on how you configure the FIFO).

- If in your design the read request itself triggers something that
takes a while, do what Lasse suggests.
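
The FIFO option in the list above can be sketched as a behavioural model. This is Python rather than HDL, and `ThresholdFifo`, `burst_read` and `run` are hypothetical names; only `prog_full` is taken from the text. The flag stands in for either the MSI or a polled status bit, and the depth and threshold are arbitrary illustration values.

```python
from collections import deque


class ThresholdFifo:
    """Write port in the data domain, read port in the PCIe domain."""

    def __init__(self, depth, threshold):
        self._q = deque()
        self.depth = depth
        self.threshold = threshold
        self.dropped = 0                 # samples lost because the host was too slow

    def push(self, word):
        if len(self._q) >= self.depth:
            self.dropped += 1
            return False
        self._q.append(word)
        return True

    @property
    def prog_full(self):
        # Programmable-full flag: time for the host to come and drain.
        return len(self._q) >= self.threshold

    def burst_read(self, max_words):
        count = min(max_words, len(self._q))
        return [self._q.popleft() for _ in range(count)]


def run(samples, depth=16, threshold=8):
    """Push samples and drain one burst whenever prog_full is raised."""
    fifo = ThresholdFifo(depth, threshold)
    received = []
    for sample in samples:
        fifo.push(sample)
        if fifo.prog_full:               # stands in for the MSI or a polled flag
            received += fifo.burst_read(threshold)
    received += fifo.burst_read(depth)   # final drain of whatever is left
    return received, fifo.dropped
```

In this model the host always drains in time, so nothing is dropped; the `dropped` counter is there to show what a late host would cost.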

HTH,
Sean

David Binette

Oct 29, 2014, 10:13:51 AM
I understand this
"deasserts trn_rdst_rdy_n, causing the Receive TRN interface to stall
receiving any further TLPs"

but I'm not so much interested in "any further TLPs" as in allowing the current TLP to continue processing. It seems that if I delay even a single extra cycle it causes distress to the Linux host.

David Binette

Oct 29, 2014, 10:33:54 AM
Hi Sean,
Thanks for the suggestions, but I think what I really need is a way
to stall the current TLP to allow the read/access to complete.

-- Is it streaming data? Do you need to catch all the
-- data or do you want to read out only one single value occasionally? Is

The data is always changing, and only needs to be read occasionally.

-- You could just always transfer the data you have to the PCIe clock
-- domain whenever it changes. Each time there is a new value, always
-- transfer it to the PCIe clock domain immediately and put it e.g. into a
-- BAR register. So when you issue a PCIe read request, there's data
-- already there that you can put into your reply message immediately.
-- Worst case is you don't get the very latest value but the one before that.

That would be OK for most cases, but some reads have side effects,
such as clearing another register upon read. This could be overcome
and is not a show stopper; that part could be redesigned.

also since the external device has a lot of registers and they are
typically accessed by setting their address and reading the result
(sometimes a calculated result) it would require significant changes
to create a bank of shadow values to capture them all for
instantaneous retrieval instead of indexed on-demand access

How do other ppl handle things like doing SMBus reads over PCIe or
an I2C device.. the first read is certainly going to need some time
to complete before it can return data.

Perhaps I just fumbled something during my tests and subsequently discarded
what should have been a viable approach.

If I knew exactly how it should be done I could focus my efforts on that.

David Binette

Oct 29, 2014, 10:47:21 AM
PS: I know that SMBus is an independent bus on the PCIe connector; I don't mean to complicate the topic with that. It was only an example to illustrate.

Chris Higgs

Oct 29, 2014, 12:34:19 PM
On Wednesday, October 29, 2014 2:33:54 PM UTC, David Binette wrote:

> That would be OK for most cases, but some reads have side effects,
> such as clearing another register upon read. This could be overcome
> and is not a show stopper; that part could be redesigned.

It's generally best to avoid side-effects if at all possible and make all reads idempotent. Life is much easier for software that way.

For example, TLPs may be re-ordered, accesses above a certain size may not occur in the order you expect, the root complex may attempt to pre-fetch a value, and in future you may be using this device over a lossy medium like Ethernet.

All of these things can be controlled (or worked around) in software but often lead to inefficiencies. If you have the choice, it's always better to design your interface with a view to simplifying the software interaction. This generally also yields simpler hardware and fewer gotchas in the documentation so everyone's a winner!
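
A small model shows why read side effects bite. It compares a read-to-clear status register with a write-one-to-clear one when an extra read happens whose result is thrown away, for example a prefetch. This is a Python sketch with hypothetical class and function names, not a description of any particular device.

```python
class ReadToClearStatus:
    """Status register whose read has a side effect: it clears the bits."""

    def __init__(self):
        self._bits = 0

    def set_event(self, mask):
        self._bits |= mask

    def read(self):
        value, self._bits = self._bits, 0
        return value


class WriteOneToClearStatus:
    """Status register with idempotent reads; software clears bits explicitly."""

    def __init__(self):
        self._bits = 0

    def set_event(self, mask):
        self._bits |= mask

    def read(self):
        return self._bits

    def write(self, mask):
        self._bits &= ~mask              # writing 1 to a bit clears it


def driver_sees(reg):
    """Raise an event, let one stray read happen, then do the driver's read."""
    reg.set_event(0x4)
    reg.read()                           # extra read; its result is discarded
    return reg.read()


if __name__ == "__main__":
    print("read-to-clear :", hex(driver_sees(ReadToClearStatus())))
    print("write-1-clear :", hex(driver_sees(WriteOneToClearStatus())))
```

With read-to-clear the stray read consumes the event and the driver never sees it; with write-one-to-clear the event stays until software acknowledges it.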

Thanks,

Chris

Mark Curry

Oct 29, 2014, 2:36:04 PM
In article <b22fff2a-6bf2-4285...@googlegroups.com>,
David Binette <david....@gmail.com> wrote:
>On Monday, October 27, 2014 1:05:32 PM UTC-5, David Binette wrote:
>> What is the correct way to handle a PCIE request to a slow device?
>>
>> I have a xilinx spartan 6 PCIe using Integrated Block for PCI Express.
>>
>> The BAR memory map is decoded and some addresses map to fast ram, or local registers and these work OK,
>> but some addresses map to slow devices.. like I2C or internal processes that need a few cycles to process before they can produce valid data to be returned to the PCI bus.
>>
>> Is there a way to tell the PCI bus to wait, or retry..?
>>
>> thanks
>
>
>
>How do other ppl handle things like doing SMBus reads over PCIe or
>an I2C device.. the first read is certainly going to need some time
>to complete before it can return data.
>
>Perhaps I just fumbled something during my tests and subsequently discarded
>what should have been a viable approach.
>

David,

I can't offer any specific advice - but generally all PCIE transactions
are "stalled", whether they're reading from a slow device on another clock
or a "fast" device on the same clock.

For a PIO read you get:
1. The host issues a PIO read.
2. A TLP MRd packet is formed and sent across the serial interface.
3. The Xilinx endpoint decodes the packet, determines that the packet
is meant for the user logic - you. It sends the information
out to the user interface logic.
4. Your logic issues the read, and responds.
5. The CplD packet is formatted and transmitted back across the PCIE
link.
...

All of that takes quite a bit of time. The fact that step 4 takes
a few cycles (give or take 10s or perhaps even 100s) is almost irrelevant.
The PCIE timeout mechanism doesn't come into play until this number is
very high (I've not used it, but I'd think we're talking 10s of ms).

The whole process has quite a bit of latency. A few cycles
here or there aren't going to matter.
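
To put rough numbers on that, the sketch below converts a delay in user-clock cycles into time and compares it with a completion timeout. Both constants are assumptions to check against your own setup: a 62.5 MHz user clock for the endpoint, and 50 us as the shortest timeout in the default Completion Timeout range of the PCIe base specification. The function names are made up for the example.

```python
USER_CLK_HZ = 62.5e6     # assumed user-interface clock; check your core's configuration
TIMEOUT_MIN_S = 50e-6    # assumed lower bound of the default Completion Timeout range


def cycles_to_seconds(cycles, clk_hz=USER_CLK_HZ):
    """Time taken by a given number of user-clock cycles."""
    return cycles / clk_hz


def timeout_budget_cycles(timeout_s=TIMEOUT_MIN_S, clk_hz=USER_CLK_HZ):
    """How many user-clock cycles fit into the timeout."""
    return round(timeout_s * clk_hz)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        t = cycles_to_seconds(n)
        share = t / TIMEOUT_MIN_S
        print(f"{n:5d} cycles = {t * 1e6:6.2f} us ({share:.1%} of the assumed minimum timeout)")
    print("budget:", timeout_budget_cycles(), "cycles")
```

Under these assumptions, 100 extra cycles is 1.6 us, a few percent of even the shortest timeout, so a handful of clock-crossing cycles should not by itself cause a timeout.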

I don't use that specific PCIE core, nor Xilinx logic (I'm using the Virtex7 core,
with AXIS interfaces tied to my logic). But the general flow should be the
same. I'd review the interface specification to fully understand
what's required. Are you running sims with the Xilinx logic?

Regards,

Mark


David Binette

Oct 30, 2014, 8:38:06 AM
On Monday, October 27, 2014 1:05:32 PM UTC-5, David Binette wrote:
Thanks Mark
for your time and comments, which were helpful.

I haven't put it on the simulator, just doing compiles and tests but the turn time is long.

Petter Gustad

Oct 31, 2014, 6:58:53 AM
David Binette <david....@gmail.com> writes:

> I haven't put it on the simulator, just doing compiles and tests but the turn time is long.

Does Xilinx provide a realistic Root Complex model or some other type of
PCIe verification environment?

Rolling your own can be some amount of work. However, it might be
possible to instantiate a Xilinx Root Complex in your testbench and use
that to stimulate your DUT.


//Petter


--
.sig removed by request.

kkoorndyk

Oct 31, 2014, 2:09:45 PM
Yes, the example design provided with the PCIe EP Block contains a root port model.

I recently worked on a Spartan-6 design similar to the OP's, in which the FPGA is a bridge between the processor over PCIe and a local bus with several peripherals. I started with the example design and modified the PIO Rx and Tx engines to work for my application. Most of the local bus cycles are fast enough that software does not have to wait. A timeout was implemented on the local bus cycles that issues an MSI interrupt on the PCIe link if the peripheral doesn't respond within the timeout period (~1 us). One issue we ran into WRT PCIe packet timing is that the MSI interrupt was not being seen by software before the next transaction was issued on the link. We ended up using a status register for software to poll instead.
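
The timeout-plus-status-register arrangement might look roughly like this as a behavioural Python model. The names (`LocalBusBridge`, `TIMEOUT_FLAG`, `checked_read`), the filler value returned on timeout, the write-one-to-clear status bit and the 62-cycle limit are all assumptions made for illustration; the post does not say how the real design handles any of these.

```python
class LocalBusBridge:
    """Bridge that always completes a read, flagging the ones that timed out."""

    TIMEOUT_FLAG = 0x1        # hypothetical sticky status bit
    FILLER = 0xFFFFFFFF       # assumed data returned when the peripheral is silent

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.status = 0

    def read(self, peripheral_latency, peripheral_value):
        """peripheral_latency is in cycles, or None if the peripheral never answers."""
        if peripheral_latency is not None and peripheral_latency <= self.timeout_cycles:
            return peripheral_value
        self.status |= self.TIMEOUT_FLAG      # sticky until software clears it
        return self.FILLER

    def clear_status(self, mask):
        self.status &= ~mask                  # write-one-to-clear behaviour


def checked_read(bridge, latency, value):
    """Read through the bridge, then poll status to learn whether the data is valid."""
    data = bridge.read(latency, value)
    if bridge.status & bridge.TIMEOUT_FLAG:
        bridge.clear_status(bridge.TIMEOUT_FLAG)
        return None                           # caller must treat the data as invalid
    return data


if __name__ == "__main__":
    bridge = LocalBusBridge(timeout_cycles=62)    # about 1 us at an assumed 62.5 MHz
    print(checked_read(bridge, 5, 0x1234))        # fast peripheral: data is valid
    print(checked_read(bridge, None, 0))          # silent peripheral: flagged invalid
```

Polling the status register after the read keeps the check in order with the data, which sidesteps the interrupt-ordering problem described above.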

David Binette

Nov 4, 2014, 5:50:56 PM
On Monday, October 27, 2014 1:05:32 PM UTC-5, David Binette wrote:
I have found some documents that specifically refer to 'throttling data on the transmit path'

http://www.xilinx.com/support/answers/21707.html
"You can pause the transfer of packets between the user application and the PCI Express Core by deasserting trn_tsrc_rdy_n. There is no limit to the number of cycles that trn_tsrc_rdy_n can be deasserted. The PCI Express Core holds the packet in its transmit buffer until you finish moving the packet into the core signified by the assertion of trn_teof_n. Once the complete packet is stored inside the core, it is transmitted on the PCI Express Link. You cannot directly affect the packet's transmission on the link through trn_tsrc_rdy_n. However, if you deassert trn_tsrc_rdy_n excessively it slows the overall bandwidth because the core does not have the packet to send until you assert trn_teof_n.
NOTE: Currently, you must pause back-to-back TLPs by at least one cycle by deasserting trn_tsrc_rdy_n. Please see (Xilinx Answer 21708) for more information."

and

http://www.xilinx.com/support/answers/21592.html
"The user application input trn_tsrc_rdy_n should not be asserted low all the time. It should only be asserted when the user application is involved in a data transfer. It should be asserted at the same time as trn_tsof_n and deasserted with trn_teof_n. It is permissible to insert wait states between the assertions of trn_tsof_n and trn_teof_n by deasserting trn_tsrc_rdy_n."


I have some screen snapshots of the PCIe signals involved.
1) The normal PCIe signals (this works): '99' is read on the PCI bus on the Linux system, at about 600,000 reads/sec.
http://www.mediafire.com/view/3v81znw933hwtq4/throttle1.png

2) The throttled version: 0xffffffff is read on the PCIe bus and the rate is about 23 reads/sec; it should read the value 98.
http://www.mediafire.com/view/g8gc6r864c3ze3p/throttle2.png


I added some extra lines.
'rd_wait_i' tells the IP core to wait until this line is de-asserted;
in operation it is intended to throttle the transmit of the
single 32-bit data value that is sent while
trn_tsrc_rdy_n=0 and trn_teof_n=0.

'req_data' tells my app that a read cycle is occurring.

If anyone sees something awry with 'throttle2.png' I'd sure like to know.
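
For reference, the wait-state behaviour described in the two quoted answer records can be modelled as a simple source/sink handshake. This is a Python sketch of generic ready/valid semantics with active-low signals, not a model of the Xilinx core, and it has not been checked against the hardware. The tuple fields borrow the signal names from the thread without the `trn_` prefix; `td` is a stand-in for the data bus, and the packet words are arbitrary placeholders rather than a real completion header.

```python
def drive_packet(dwords, wait_before_last=0):
    """Yield (tsof_n, teof_n, tsrc_rdy_n, td) once per clock for one packet.

    Wait states are inserted just before the final word by driving
    tsrc_rdy_n high (deasserted); the data value is ignored in those cycles.
    """
    last = len(dwords) - 1
    for i, dword in enumerate(dwords):
        if i == last:
            for _ in range(wait_before_last):
                yield (1, 1, 1, 0)               # wait state: source not ready
        tsof_n = 0 if i == 0 else 1              # start of frame on the first word
        teof_n = 0 if i == last else 1           # end of frame on the last word
        yield (tsof_n, teof_n, 0, dword)


def receive(cycles, tdst_rdy_n=0):
    """Collect one packet, taking a word only when source and sink are both ready."""
    packet, in_packet = [], False
    for tsof_n, teof_n, tsrc_rdy_n, td in cycles:
        if tsrc_rdy_n or tdst_rdy_n:
            continue                             # no transfer in this clock
        if tsof_n == 0:
            packet, in_packet = [], True
        if in_packet:
            packet.append(td)
        if teof_n == 0:
            return packet
    return None                                  # no complete packet was seen


if __name__ == "__main__":
    words = [0xA0, 0xA1, 0xA2, 98]               # placeholder header words plus data
    print(receive(drive_packet(words)))
    print(receive(drive_packet(words, wait_before_last=5)))
```

In this model the receiver gets the same packet with or without wait states, which is what the answer records say should happen. It does not explain the 0xffffffff result; comparing a simulation trace against this expected pattern may help narrow that down.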



Luis Benites

Mar 8, 2021, 9:52:04 PM
Just in case this is what you are trying to do: stalling your whole system and all other PCIe accesses to wait for an I2C read should never be the solution to anything. You send your completion whenever it's ready. If it takes you longer than the spec allows to complete, then you need to initiate the read in some other way (earlier), check for ready, and only then issue the read that you can complete on time.

David Brown

Mar 9, 2021, 1:56:20 PM
Please look at the date of the post you are replying to. Do you think
someone will have been waiting over six years for an answer to a Usenet
post? It's nice that you are trying to help, of course.

Luis Benites

Mar 22, 2021, 2:03:44 PM
Ha ha. Let's start a flame war over trying to help. Don't you have better use of your time? Anyone looking for CURRECT PCIe help with a google search will come across this post and get something from it. Nothing that was said is outdated.