Any ideas to help?
I'm guessing a bit, but...
That card has an x8 PHY on it. So the motherboard thinks that it is
an x8 card, and link aggregation fails.
The motherboard (typically) decides that it's an x8 card by doing a
receiver detect. All the receiver detect does is check for a load on
the drivers.
To test this out, mask off all of the receivers for the unused (last
4) channels. If I'm right (it could happen) the motherboard won't see
the receivers, and aggregate as an x4.
RK
I talked to two of our PCIe experts, and they suggested checking three
things:
The DIP switch that controls how many lanes are supported via the PCIe
connector presence detect settings could be set wrong. It is SW5 on
the board, see table 2-15 in the board reference manual
http://www.altera.com/literature/manual/rm_sivgx_fpga_dev_board.pdf.
If it is an Intel motherboard in some cases they send out Vendor
Defined messages. The customer’s application design needs to be
designed to ignore these messages (unless the customer is Intel then
their application might need to know what these messages are and do
the right thing). If the application design doesn’t accept these
messages from the core it will lock things up and cause configuration
problems.
If you have Engineering Sample (not production) silicon, it could be a
case of the Stratix IV GX ES erratum entitled “Endpoints Using the
Hard IP Implementation Incorrectly Handle CfgRd0” as described in the
IP Release notes: http://www.altera.com/literature/rn/rn_ip.pdf.
Workaround for this is to use production devices or use soft IP (v9.0
or later) on ES devices.
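
For reference, a minimal sketch of what "recognizing a Vendor Defined message" means at the TLP level. It assumes the application can see the first 8 header bytes in wire order (byte 0 first), which may not match how a particular core presents headers; the function and constant names are hypothetical. It relies only on the standard encoding: Type[4:3] = 10 marks a message, and the message code in byte 7 is 0x7E (Vendor_Defined Type 0) or 0x7F (Vendor_Defined Type 1).

```python
# Hypothetical helper: classify a TLP from the first 8 bytes of its header.
# Assumption: bytes are given in wire order (byte 0 holds Fmt/Type).

VENDOR_DEFINED_TYPE0 = 0x7E  # receiver reports Unsupported Request if unsupported
VENDOR_DEFINED_TYPE1 = 0x7F  # receiver silently discards if unsupported


def classify_tlp(header: bytes) -> str:
    """Return a coarse label for the TLP described by `header`."""
    if len(header) < 8:
        raise ValueError("need at least the first 8 header bytes")
    tlp_type = header[0] & 0x1F      # Type[4:0] is the low 5 bits of byte 0
    if (tlp_type >> 3) != 0b10:      # Type[4:3] == 10 means Msg / MsgD
        return "not-a-message"
    message_code = header[7]         # message code is byte 7 of the header
    if message_code == VENDOR_DEFINED_TYPE1:
        return "vendor-defined-type1"
    if message_code == VENDOR_DEFINED_TYPE0:
        return "vendor-defined-type0"
    return "other-message"
```

The point of the sketch is only that the application has to consume (accept and drop) such a packet rather than stall on it; how that is wired up depends on the core variant in use.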
Hope this helps,
Vaughn Betz
Altera
[v b e t z (at) altera.com]
The DIP switch appears to have no effect at all. That is, I can set it
to the x1-only position and the host and board still sometimes negotiate
x4. Or I can set it to x1+x4+x8 and it sometimes correctly detects x4.
The key word here is "sometimes".
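
One way to put numbers on that "sometimes" is to read the negotiated width from the endpoint's configuration space after every boot. The sketch below assumes a Linux host; the function names and the device address in the usage note are hypothetical. It walks the standard capability list, finds the PCI Express capability (ID 0x10), and extracts the maximum width from Link Capabilities (offset 0x0C, bits 9:4) and the negotiated width from Link Status (offset 0x12, bits 9:4).

```python
from typing import Optional, Tuple

PCI_STATUS = 0x06            # Status register; bit 4 = capability list present
PCI_STATUS_CAP_LIST = 0x10
PCI_CAPABILITY_PTR = 0x34    # offset of the first capability
PCI_CAP_ID_EXPRESS = 0x10    # PCI Express capability ID
LINK_CAPABILITIES = 0x0C     # offsets relative to the PCIe capability
LINK_STATUS = 0x12


def link_widths(config: bytes) -> Optional[Tuple[int, int]]:
    """Return (max_width, negotiated_width), or None if not found."""
    if len(config) < 0x40:
        return None
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = config[PCI_CAPABILITY_PTR] & 0xFC
    visited = set()              # guard against a looping capability list
    while ptr >= 0x40 and ptr not in visited and ptr + 0x14 <= len(config):
        visited.add(ptr)
        if config[ptr] == PCI_CAP_ID_EXPRESS:
            cap = int.from_bytes(
                config[ptr + LINK_CAPABILITIES:ptr + LINK_CAPABILITIES + 4],
                "little")
            sta = int.from_bytes(
                config[ptr + LINK_STATUS:ptr + LINK_STATUS + 2], "little")
            return (cap >> 4) & 0x3F, (sta >> 4) & 0x3F
        ptr = config[ptr + 1] & 0xFC   # follow the next-capability pointer
    return None


def read_config(path: str) -> bytes:
    """Read up to 256 bytes of config space from a sysfs 'config' file."""
    with open(path, "rb") as f:
        return f.read(256)
```

Usage would be something like link_widths(read_config("/sys/bus/pci/devices/0000:01:00.0/config")), with the address replaced by wherever the board enumerates. It typically has to run as root, since unprivileged reads of that file return only the first 64 bytes and the function then returns None.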
BTW, what are those presence detects? What are they supposed to do? Is
it something dev-kit specific, Altera-specific or standard?
> If it is an Intel motherboard in some cases they send out Vendor
> Defined messages. The customer’s application design needs to be
> designed to ignore these messages (unless the customer is Intel then
> their application might need to know what these messages are and do
> the right thing). If the application design doesn’t accept these
> messages from the core it will lock things up and cause configuration
> problems.
>
Yes, it is an Intel board, a brand new S3420GPLC. I also tested on
another Intel board, a 4-year-old desktop based on a 915-series chipset.
It misbehaves in a similar manner. With a bit of effort I could find some
Dell or Asus or maybe Supermicro to test, but all of them are based on
Intel chipsets so it probably wouldn't make a difference. Finding a
test platform not based on an Intel chipset would present a serious
challenge. Besides, even if I found one, it's not going to help
since it _has_ to work on Intel in the end.
So tell me more about how exactly my application could ignore Vendor
Defined messages. Please keep in mind that I am using the Avalon-MM
variant of the PCIe core, so I don't have much control over what's
going on under the hood.
One more point: with the soft PCIe IP it misbehaves slightly less (and
differently) than with the hard IP, but it misbehaves nevertheless.
BTW, even in the slot which is x4 electrically, things are not rosy.
Quite often in that slot the board is detected as x1 or x2 instead of x4.
When detected as x2 it tends to not work at all; when detected as x1
it behaves better (not good, just better).
Anyway I _need_ x4. If I wanted x1 I'd rather build from a cheap PLX
bridge + a cheap Stratix III, a combo that "just works".
> If you have Engineering Sample (not production) silicon, it could be a
> case of the Stratix IV GX ES erratum entitled “Endpoints Using the
> Hard IP Implementation Incorrectly Handle CfgRd0” as described in the
> IP Release notes:http://www.altera.com/literature/rn/rn_ip.pdf.
> Workaround for this is to use production devices or use soft IP (v9.0
> or later) on ES devices.
How do I know whether it is an Engineering Sample or a production device?
>
> Hope this helps,
Yes, it does, thanks. But more help is needed.
In fact I am starting to suspect that the kit we have is physically
damaged. It would be a real shame if that is the case, with so much time
already wasted, but better that than not finding a solution at all.
While we are talking, one more question, which may or may not be
related.
I measured the PCIe read latency from the host to a zero-latency
Avalon-MM slave that lives in the pcie_clock_out clock domain. To my big
surprise the latency was absolutely huge: around 1050 ns for hard IP and
880 ns for soft IP. I expected a much shorter latency: 250 ns, at worst 300.
The measurements were done in the PCIe slot directly attached to the Xeon
3400 CPU, so from the host perspective it's probably the fastest
configuration currently in existence.
Why is the read so slow? Is it (hopefully) another sign of a hardware
problem? Or is the Altera ST-to-MM converter that slow (I find it hard to
believe; by my estimate that particular part of the loop
should contribute about 50 ns, if not less)? Or are the Altera PCIe IPs
themselves poorly suited to serving host read accesses?
A couple of updates since last time:
1. Our kit is indeed based on an engineering sample.
2. Quartus II 9.1 appears to have exactly the same problems as 9.0 SP2,
which we used before.
[Whining on]
BTW, do you know that PCIe core v9.1 is not 100% source code
compatible with v9.0? I thought the whole point of a _minor_
version number is that it's supposed to be backward compatible :( Next
time you break backward compatibility, please increment the major
version number; then, at least, we would know that trouble is
coming.
Plus, the "soft" IP is not 100% source code compatible with the hard IP,
and the differences are more than just test inputs/outputs. It's pretty
annoying.
[Whining off]
Regards,
Michael