On Sat, 30 Aug 2014 16:06:36 +0200
Diego Roversi <die...@tiscali.it> wrote:
> Hello,
>
> I want to share what I found while making some experiments with DRAM settings,
> following the A10_DRAM_Controller_Calibration page on the wiki.
Thanks a lot!
You are the first who actually tried to follow the instructions
from this wiki page (after Jens Kuske), so your feedback is very
much welcome.
I would certainly have preferred to get some feedback and test results
from Cubietruck users much earlier. Unfortunately this has not happened,
probably because the Cubietruck is relatively expensive and very likely
not many subscribers of this list have one.
But anyway, the timing is almost perfect because the mainline u-boot
v2014.10-rc2 with the new DRAM initialization code has just been
tagged. And we can now actively move to the second phase, which
means finding and contributing better DRAM settings for various
boards to u-boot :-)
> First of all, I've patched the a10-dram-timings-calculator, adding the
> definition for olinuxino-a20-micro in the a10-dram-info.rb:
>
> "A20-OLinuXino-Micro" => {
> url: "http://linux-sunxi.org/Olimex_A20-OLinuXino-Micro",
> dram_size: 1024,
> dram_chip: MEM4G16D3EABG_125,
> dram_para: {
> zq: 0x7f,
> odt_en: 0,
> tpr3: 0,
> tpr4: 0,
> emr1: 0x4,
> }
>
> And I hope it is correct, because I just looked at the other definitions
> and guessed the right values :-). The DRAM chip is the same as on the lime
> version, so it should be right.
The JEDEC specification defines a set of speed bins with their timings.
And every DDR3 chip must be good enough to support at least the worst
standard timings.
But if we know the exact chip brand and have its datasheet, then we can
get somewhat better memory latency and performance than with the JEDEC
derived defaults. This is good for performance tuning and ensures that
a higher percentage of the theoretical bandwidth can be utilized. But
the theoretical bandwidth itself is still determined by the memory bus
width and clock speed.
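As a rough worked example of that last point: a DDR interface transfers
data on both clock edges, so the peak bandwidth is clock x 2 x bus width
in bytes. A minimal Ruby sketch (the helper name is made up for this
mail, and the 32-bit bus width is the bus_width value from the dram_para
quoted further down):

```ruby
# Theoretical peak bandwidth of a DDR bus in MB/s (1 MB = 10^6 bytes).
# peak_bandwidth_mb_s is a hypothetical helper, not part of any sunxi tool.
def peak_bandwidth_mb_s(clock_mhz, bus_width_bits)
  # Two transfers per clock cycle, bus_width_bits / 8 bytes per transfer.
  clock_mhz * 2 * bus_width_bits / 8
end

[432, 456, 648].each do |mhz|
  puts "#{mhz} MHz, 32-bit bus: #{peak_bandwidth_mb_s(mhz, 32)} MB/s"
end
```

Better timings only change how close real workloads get to this number,
they do not move the number itself.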
This step is useful, but not strictly necessary. It is also possible
to just use generic JEDEC timings for tpr0/tpr1/tpr2 parameters. The
tpr0/tpr1/tpr2 parameters depend on the DRAM clock frequency though,
so the calculator script is still needed.
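To illustrate why the clock frequency matters here: datasheet timings
are given in nanoseconds, while the controller needs them as whole clock
cycles, rounded up. The sketch below only shows this conversion. It is
not the calculator's actual code and it does not pack anything into
tpr0/tpr1/tpr2. The timing values are illustrative assumptions and have
to come from the real datasheet or the applicable JEDEC speed bin.

```ruby
# Convert a timing given in picoseconds into whole DRAM clock cycles,
# rounding up so that the minimum time is never violated.
# ps * MHz gives millionths of a cycle, hence the division by 1_000_000.
def ps_to_cycles(ps, clock_mhz)
  (ps * clock_mhz + 999_999) / 1_000_000
end

# Illustrative values only (assumption), not taken from a specific datasheet.
timings_ps = { tRCD: 13_750, tRP: 13_750, tRAS: 35_000 }

[432, 648].each do |mhz|
  cycles = timings_ps.map { |name, ps| "#{name}=#{ps_to_cycles(ps, mhz)}" }
  puts "#{mhz} MHz: #{cycles.join(' ')}"
end
```

The CAS latency is left out on purpose, because it is not simply rounded
up but has to be picked from the values the chip supports at a given
clock.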
> The other values were copied from fex file.
A big problem is that the fex files are not a very good source of
useful information. It looks like the device vendors generally
just semi-randomly picked the 0x7b/0x7f values for 'zq' and 0x0/0x4
for 'emr1' (copied them from each other and maybe in some cases also
tried to experiment themselves, inventing minor variations).
The 'zq', 'odt_en' and 'emr1' parameters are responsible for impedance
matching. Configuring the right values for them is very important for
the DRAM reliability and clock speed. The "theoretical" part of the
guide in the wiki page is here:
http://linux-sunxi.org/A10_DRAM_Controller_Calibration#Impedance_settings.2C_ODT_and_ZQ_calibration
The "practical" part of the guide was lacking, but I have added some
more information there a few days ago.
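To make the 'emr1' values a bit less magic, here is a small sketch which
decodes the impedance related bits. It assumes that 'emr1' reaches the
DRAM chip unchanged as the DDR3 MR1 mode register, and the bit positions
follow my reading of the JEDEC DDR3 MR1 layout, so please verify them
against the datasheet of your chip. The function name is made up.

```ruby
# Decode the impedance related bits of a DDR3 MR1 value.
# Assumption: bits A5/A1 select the output driver impedance and
# bits A9/A6/A2 select Rtt_Nom (the termination inside the DRAM chip).
def decode_emr1(emr1)
  driver_names = { 0 => "RZQ/6", 1 => "RZQ/7" }
  rtt_names = { 0 => "disabled", 1 => "RZQ/4", 2 => "RZQ/2",
                3 => "RZQ/6", 4 => "RZQ/12", 5 => "RZQ/8" }

  driver = (((emr1 >> 5) & 1) << 1) | ((emr1 >> 1) & 1)
  rtt = (((emr1 >> 9) & 1) << 2) | (((emr1 >> 6) & 1) << 1) | ((emr1 >> 2) & 1)

  result = { driver: driver_names.fetch(driver, "reserved"),
             rtt_nom: rtt_names.fetch(rtt, "reserved") }
  result
end

[0x0, 0x4, 0x40, 0x44].each do |value|
  d = decode_emr1(value)
  puts format("emr1=0x%02x: driver=%s rtt_nom=%s", value, d[:driver], d[:rtt_nom])
end
```

Under these assumptions the 0x0 and 0x4 values seen in fex files differ
only in whether Rtt_Nom is disabled or set to RZQ/4.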
> Then I used the program to calculate the DRAM settings, and tested them
> with the a10-tpr3-scan program, using u-boot-sunxi (ssvb github repo,
> which was missing definitions for olimex boards).
The 'zq' and 'odt_en' settings do not work properly in
u-boot-sunxi. That's why we need to upgrade to u-boot v2014.10,
and I'm going to take care of this in the next few days.
> After some days of testing, I found a stable DRAM setting at 432MHz:
>
> static struct dram_para dram_para = { /* DRAM timings: 7-6-6-16 (432 MHz) */
> .clock = 432,
> .type = 3,
> .rank_num = 1,
> .density = 4096,
> .io_width = 16,
> .bus_width = 32,
> .cas = 7,
> .zq = 0x7f,
> .odt_en = 0,
> .tpr0 = 0x2a906690,
> .tpr1 = 0xa068,
> .tpr2 = 0x22e00,
> .tpr3 = 0x21111,
> .tpr4 = 0x0,
> .tpr5 = 0x0,
> .emr1 = 0x4,
> .emr2 = 0x8,
> .emr3 = 0x0,
> };
>
> I've also tried 456MHz, with no luck. There seems to be a stable region
> around tpr3=0x71111, but it still sometimes crashes with lima-memtester
> when it runs the 100 loops test.
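A quick sanity check for the geometry fields of the quoted dram_para,
assuming that 'density' is the per-chip density in Mbit and 'io_width'
is the data width of a single chip in bits (the helper name is made up):

```ruby
# Total DRAM size in MiB derived from the dram_para geometry fields.
# Assumptions: density is Mbit per chip, io_width is bits per chip,
# bus_width is the total data bus width in bits.
def dram_size_mb(density_mbit, io_width, bus_width, rank_num)
  chips_per_rank = bus_width / io_width
  density_mbit * chips_per_rank * rank_num / 8
end

# Values from the quoted struct: density=4096, io_width=16,
# bus_width=32, rank_num=1.
puts dram_size_mb(4096, 16, 32, 1)
```

With these assumptions the result agrees with the dram_size of 1024
from the a10-dram-info.rb entry.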
This was surely a useful training exercise, but we can't get really
major DRAM performance improvements without finding optimal impedance
settings. The 432MHz or 456MHz DRAM clock speed is likely still far
from its real limit.
For example, I found DRAM parameters that are stable at 600MHz
MBUS and 648MHz DRAM clock speeds on my Cubietruck. Earlier, this
particular Cubietruck board could not run DRAM higher than 456MHz with
the default zq/emr1/tpr3 settings:
https://www.mail-archive.com/linux...@googlegroups.com/msg03510.html
The tpr3 tweak found by Jens allowed it to run up to 552MHz. And
only adjusting zq/emr1 allowed it to go beyond 600MHz.
Even using u-boot-sunxi, it is possible to try testing different
values for the 'emr1' parameter. Proper explanations were missing,
but I have just added a table with possible 'emr1' alternatives:
http://linux-sunxi.org/DDR_Calibration#Finding_good_impedance_settings
And a few more things:
a) As far as DRAM configuration is concerned, there is no such thing as
an "A20-OLinuXino-MICRO" board. It has a number of revisions:
https://github.com/OLIMEX/OLINUXINO/tree/master/HARDWARE/A10-OLinuXino-MICRO
If we check the PDF files with schematics, we see that revisions C
and D use 237 ohm ZQ resistors, while revisions E2 and F1 use 330 ohm.
This unfortunately means that these boards very likely have different
optimal DRAM settings. At relatively low ~400MHz clock speeds they all
may work fine with the same misconfigured impedance settings. But if
we aim at 500MHz and higher, then it may make a big difference.
b) Even if we pretend that zq=0x7f,odt_en=0,emr1=0x4,tpr3=0x21111 are
optimal settings for your board, we can't be sure that they will
work for other A20-OLinuXino-MICRO boards (even with the same revision).
Each individual board may have some slight differences and each A20 chip
has its own voltage tolerances.
We need tests from more than one board to see how much they can differ.
Then we can try to estimate confidence intervals and decide which
settings are supposed to be 'safe' for everyone.
c) Also we can't be totally sure that the settings are safe for each
individual board even if we can't make it fail the lima-memtester test.
Theoretically, there may be workloads which are even more reliability
sensitive than lima-memtester. Or we just don't run lima-memtester
long enough (for example, even if it does not fail after, let's say,
8 hours, we can't rule out a failure after a week
of running non-stop). So we still need to have some extra safety
margin by increasing the dcdc3 voltage and decreasing the MBUS/DRAM
clock speed.
d) Humans can't be trusted :-) We can't be sure that the users
will always pick correct configurations when compiling u-boot.
For example, I can easily imagine a user compiling u-boot for
A20-OLinuXino-MICRO Rev.F1 (while he has Rev.C) and failing to
notice this immediately if the board just happens to boot. The
device manufacturers may silently update PCB without telling
anyone. And finally, there are a lot of crazy overclockers, who just
mindlessly bump the clock speed (right now they are doing this
for the CPU clock speed) and don't care about the consequences.
So we still need to somehow enforce the use of validation tools,
which could run some strong reliability tests and write a "stamp
of approval" to non-removable non-volatile memory (NAND or EEPROM).
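To make points b) and c) a bit more concrete, here is a deliberately
simple sketch of picking a 'safe' clock from several boards. It is not
the confidence interval estimate mentioned above, just a worst case minus
margin rule. The board results are placeholders and NOT measurements,
the function name is made up, and the 24 MHz step is an assumption based
on the clock values mentioned in this thread all being multiples of 24.

```ruby
# Pick a conservative DRAM clock from per-board test results.
# results:        board name => highest clock (MHz) that passed testing
# margin_percent: extra safety margin to subtract from the worst board
# step_mhz:       clock granularity (assumption: 24 MHz steps)
def conservative_clock(results, margin_percent, step_mhz = 24)
  worst = results.values.min
  derated = worst * (100 - margin_percent) / 100
  # Round down to the nearest clock step.
  derated - derated % step_mhz
end

# Placeholder numbers, NOT real measurements.
results = { "board-1" => 552, "board-2" => 528, "board-3" => 576 }
puts conservative_clock(results, 10)
```

With real data from more boards, the minimum could be replaced by a
proper statistical estimate. The margin would still be needed because of
the uncertainty described in c).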
> If someone needs it, I can upload the report.html of these tests somewhere.
The linux-sunxi wiki supports html markup just fine, so you can paste
your data there and get something like the following page:
http://linux-sunxi.org/A10_DRAM_Controller_Calibration_(impedance_configuration_example)
BTW, I have updated a10-tpr3-html-report to make it easier to use. It now
supports a '--notitle' option to dump just the table(s) without the
title headers, and an html escaping bug is fixed. So you can just paste the
output of a10-tpr3-html-report into some page in the linux-sunxi wiki.
We can use IRC to discuss the details about how to best share the
test results. And also about improving the documentation. Feel free
to ping me there.
--
Best regards,
Siarhei Siamashka