I'm getting way too many bit errors on spitz, with various
kernels.
It may be tied to network usage (Bluetooth or Wi-Fi?). It happens even
on AC power.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Pavel,
What kind of bit errors? I'm not using any network on my spitz, so I'm
not sure what exactly is happening. Could you paste the dmesg here so
we can take a look?
- eric
dmesg would not be useful; it usually hits user programs. For example,
mutt suddenly displays ',' instead of '-' in the header, a program fails
to start because function printg is not found (it was not exactly
printf->printg; I don't remember the exact symbol), ping complains
about discarding corrupted packets, etc.
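Both remembered examples look like single-bit flips rather than random
garbage: ',' (0x2C) and '-' (0x2D) differ only in bit 0, and so do 'f'
(0x66) and 'g' (0x67) in the illustrative printf/printg pair. A small
helper like the one below (a sketch; the name bit_flips is made up for
this example) can classify other corrupted strings or symbols the same
way. A consistent single-bit pattern tends to point at hardware rather
than at a software bug.

```python
from typing import List, Tuple


def bit_flips(expected: str, observed: str) -> List[Tuple[int, int, int]]:
    """Compare two equal-length strings byte by byte.

    Returns (offset, xor_mask, number_of_flipped_bits) for every byte that
    differs. A length mismatch raises ValueError, because a dropped or
    inserted byte is a different failure mode from a flipped bit.
    """
    a = expected.encode("latin-1")
    b = observed.encode("latin-1")
    if len(a) != len(b):
        raise ValueError("lengths differ; not a pure bit-flip corruption")
    flips = []
    for offset, (x, y) in enumerate(zip(a, b)):
        mask = x ^ y
        if mask:
            flips.append((offset, mask, bin(mask).count("1")))
    return flips


if __name__ == "__main__":
    # The mutt header symptom: '-' (0x2D) displayed as ',' (0x2C).
    print(bit_flips("-", ","))            # [(0, 1, 1)]
    # The illustrative printf -> printg symbol: 'f' (0x66) vs 'g' (0x67).
    print(bit_flips("printf", "printg"))  # [(5, 1, 1)]
```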
(Or, of course, the kernel oopses or does not resume from suspend at
all. But since even user data is being corrupted, an oops is not likely
to be interesting, and the system is typically no longer in a state to
capture it.)
Pavel
Well, I've seen empty lines (the ones starting with a blue tilde) in the
middle of a file when editing it with vim. And sometimes programs
segfault for no good reason. Just today I ran "apt-get update" and got:
symbol lookup error: apt-get: undefined symbol: _ZN16pkgAcquireStatus4StopEv
The correct symbol seems to be _ZN16pkgAcquireStatus4.
When running 'make' in the kernel directory and closing the display, the
machine sometimes dies and nothing but the reset switch under the
battery cover helps. I remember waking up in the morning, opening the
device and having to reset it. And it seems to be provoked much more by
an active CF Wi-Fi card.
--
metan
I forgot about this one. See for yourself; notice the short black
vertical lines flashing randomly:
http://atrey.karlin.mff.cuni.cz/~metan/outgoing/zaurus_sickness.mpg
> Well I've seen empty lines when editing file with vim
And I have seen:
- Unreproducible SIGSEGV of gcc (while Wi-Fi connection over CF card was
running).
- Unreproducible SIGSEGV of opkg (downloading via Wi-Fi connection over
CF card).
- Unreproducible SIGSEGV of rm (called from find command launched via
ssh, networking via Wi-Fi connection over CF card).
(Hint: Tasks above are HDD-intensive.)
- Lost blocks while copying from CF to SD.
- Lost blocks while copying from HDD to SD.
- Lost blocks while copying from CF to USB flash stick.
- And I see display noise while CF Wi-Fi card is active.
These problems appear in all kernels, at least since 2.6.26.
Nothing is logged in the syslog.
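For the lost-block cases, it helps to know whether a damaged copy is
missing whole blocks or just has a few flipped bits. A minimal
comparison sketch, assuming both the source and the copy can be read
back in full; the 4 KiB granularity and the diff_streams name are
arbitrary choices for this example. Remount the destination (or
otherwise evict it from the page cache) before comparing, so the copy
is read from the media rather than from RAM.

```python
from typing import BinaryIO, Iterator, Tuple

BLOCK_SIZE = 4096  # assumption: 4 KiB comparison granularity


def diff_streams(src: BinaryIO, dst: BinaryIO,
                 block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, int, int]]:
    """Yield (block_index, differing_bytes, differing_bits) per mismatched block."""
    index = 0
    while True:
        a = src.read(block_size)
        b = dst.read(block_size)
        if not a and not b:
            break
        if a != b:
            # Zero-pad the shorter block so both can be compared bytewise;
            # a truncated copy is still reported because a != b above.
            n = max(len(a), len(b))
            a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
            diff_bytes = sum(1 for x, y in zip(a, b) if x != y)
            diff_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
            yield index, diff_bytes, diff_bits
        index += 1


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "rb") as dst:
        for idx, nbytes, nbits in diff_streams(src, dst):
            print(f"block {idx}: {nbytes} byte(s), {nbits} bit(s) differ")
```

Blocks that differ in exactly one bit suggest a flip in transit or in
RAM; whole blocks that differ look more like lost or misdirected writes.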
________________________________________________________________________
Stanislav Brabec
http://www.penguin.cz/~utx/zaurus
I haven't looked at the video.
Is this display rotated by 90 degrees?
If so, they're actually horizontal lines as far as the display scanning
is concerned - and that tends to suggest that there's insufficient system
bus bandwidth for all the activity taking place, and the LCD controller
is being starved of data.
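To put "starved of data" into rough numbers: the LCD controller has to
fetch every pixel of every frame from memory continuously, whatever
else is using the bus. The sketch below assumes a 480x640 panel at
16 bpp refreshed at about 60 Hz (assumed figures, not measured on
spitz). That works out to roughly 36.9 MB/s of reads that must be
served on time alongside CPU, DMA and companion-chip traffic.

```python
def lcd_fetch_bandwidth(width: int, height: int, bpp: int, refresh_hz: float) -> float:
    """Bytes per second the LCD controller must read for a single frame buffer."""
    bytes_per_frame = width * height * bpp // 8
    return bytes_per_frame * refresh_hz


if __name__ == "__main__":
    # Assumed spitz-like parameters: 480x640 pixels, 16 bpp, ~60 Hz.
    bw = lcd_fetch_bandwidth(480, 640, 16, 60)
    print(f"{bw / 1e6:.1f} MB/s")  # 36.9 MB/s
```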
I've seen effects similar to those described on SA1110 systems in past
years, with low clock rates.
Some of the reports suggest that this happens with multiple kernel versions
and is not something new to the latest kernels. Please confirm when the
problem started.
Yes it is.
> If so, they're actually horizontal lines as far as the display scanning
> is concerned - and that tends to suggest that there's insufficient system
> bus bandwidth for all the activity taking place, and the LCD controller
> is being starved of data.
Well, after doing 'echo 0 > /sys/class/graphics/fbcon/rotate_all' on
2.6.33 I get the same problem, but the lines are vertical. However,
2.6.24 seems to work in non-rotated mode.
> I've seen similar (described) effects on SA1110 systems in past years
> with low clock rates.
>
> Some of the reports suggest that this happens with multiple kernel versions
> and is not something new to the latest kernels. Please confirm when the
> problem started.
As far as I can test in rotated mode, it happens with kernels from
2.6.24 to 2.6.33 (I don't have a kernel older than 2.6.24 that boots on
spitz).
--
metan
I saw very similar failures for a long time on our iMX31-based device.
Eventually I found a Freescale erratum stating that the RAM inside the
USB2 macrocell started making single-bit errors below 1.38V Vcore; ours
was 1.4V at that time but dipped under CPU load.
I cranked Vcore up to 1.6V and that solved it; we also added some
ceramic caps to Vcore to help with the dips.
So it might be worth checking the PMU arrangements for the Vcore level
and looking for dips with a 'scope (even though this isn't an iMX31).
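The margins here are small: 1.40 V nominal against a 1.38 V limit
leaves 20 mV, which a load-induced dip can easily eat, whereas 1.6 V
leaves 220 mV. When reading a 'scope capture, the minimum matters more
than the average. A minimal sketch, assuming the capture has been
exported as (time, volts) pairs; the default threshold is the iMX31
figure from this message and has to be replaced with the real PXA27x
limit for the clock speed in use.

```python
from typing import Iterable, Tuple

# Threshold from the iMX31 USB2 erratum described above. It is NOT a PXA27x
# figure; substitute the minimum Vcore from the PXA27x datasheet for the
# CPU clock in use.
ERRATUM_MIN_V = 1.38


def vcore_report(samples: Iterable[Tuple[float, float]],
                 threshold: float = ERRATUM_MIN_V) -> Tuple[float, float, int]:
    """samples: (time_s, volts) pairs from a 'scope capture.

    Returns (minimum volts, margin above threshold, samples below threshold).
    A negative margin means the rail dipped below the limit at least once.
    """
    volts = [v for _, v in samples]
    if not volts:
        raise ValueError("no samples")
    v_min = min(volts)
    below = sum(1 for v in volts if v < threshold)
    return v_min, v_min - threshold, below


if __name__ == "__main__":
    # Made-up capture: 1.40 V nominal with one load-induced dip.
    capture = [(0.0, 1.40), (1e-6, 1.37), (2e-6, 1.39)]
    print(vcore_report(capture))
```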
A characteristic of it was that it never caused kernel issues, since the
kernel didn't come over USB. It only ever caused trouble in userspace.
-Andy
> I saw very similar failures for a long time on our iMX31 based device.
> Eventually I found a Freescale errata where the RAM inside the USB2
> macrocell started to make single bit errors below 1.38V Vcore; ours was
> 1.4V at that time but dipped on CPU load.
Good tip. It seems that nobody ported the driver for the ISL6271
voltage control chip from the 2.4 kernel, and the bootloader probably
does not set the correct values.
Datasheet:
http://www.penguin.cz/~utx/zaurus/datasheets/power/Xscale/ISL6271.pdf
--
Stanislav Brabec
Unless there's more to the way the Zaurus uses it, that regulator isn't
digitally programmable.
Regarding your CF card WLAN issues: those cards draw a good amount of
power when their radio is up, so I would definitely suggest monitoring
the various rails (Vcore, RAM and whatever the CF card is powered by)
with a 'scope while putting it under load.
-Andy
> Unless there's more to it in the way the zaurus using it that regulator
> isn't programmable digitally.
OOPS, I made a mistake and linked ISL6721 instead of ISL6271 there.
Now it is fixed:
http://www.penguin.cz/~utx/zaurus/datasheets/power/XScale/ISL6271A.pdf
This one has I2C. It is connected to GPIO 3 (PWR_SCL) and GPIO 4
(PWR_SDA).
It is visible between the black plastic and the large circular coil:
http://www.penguin.cz/~utx/zaurus/teardown#pcbt
> Reading about your CF Card WLAN related issues they suck down a good
> amount of power when their radio is up, I would definitely suggest
> monitoring with a 'scope the various rails (Vcore, RAM and whatever it
> is the CF Card is powered by) while putting it under load.
I guess that the Zaurus has a good power design and that the voltages
should be constant enough. CF has a dedicated step-down converter (plus
a 2.8V power detector; why so low, if the CF standard requires more?),
and the HDD has a dedicated step-up/down converter. USB has a dedicated
step-up. The companion chips use a dedicated 3.3V step-down. Audio uses
a dedicated linear regulator. The CPU has several dedicated step-downs;
the CPU 3.3V rail is produced by stepping up to 5V and then down to
3.3V (and is shared only with IOPORT).
The nearest common point between the CF card power and the CPU power is
the battery.
________________________________________________________________________
Stanislav Brabec
>> Unless there's more to it in the way the zaurus using it that regulator
>> isn't programmable digitally.
>
> OOPS, I made a mistake and linked ISL6721 instead of ISL6271 there.
> Now it is fixed:
> http://www.penguin.cz/~utx/zaurus/datasheets/power/XScale/ISL6271A.pdf
>
> This one has I2C. It is connected to GPIO 3 (PWR_SCL) and GPIO 4
> (PWR_SDA).
Thanks... that defaults to 1.3V on Vcore if you don't touch it. I'd
confirm in the CPU datasheet that this is OK for your selected CPU
clock speed.
> I guess that Zaurus has a good power design and that voltage should be
> constant enough. CF has a dedicated step down (plus 2.8V power detector
In that case, is the PXA CF driver PIO-based? Then it could be the same
load-on-Vcore issue in disguise.
-Andy
> In that case is the PXA CF driver PIO? Then it can be the same load on
> Vcore issue in disguise.
There is a proprietary ASIC chip (Sharp Scoop) that handles CF and HDD
access (and also several additional GPIOs):
http://www.penguin.cz/~utx/zaurus/datasheets/ASIC_S1L50752B26B200/412752.PDF
The ASIC runs in dual power mode. HVDD is powered from the dedicated
3.3V CF or HDD supply respectively (both may be turned off by the
kernel); LVDD is shared with the CPU 3.3V rail (which is always on).
It seems that there are no other chips connected to VCC_PLL,
VCC_SRAM and VCC_CORE.
VCC_DRAM is the same 3.3V rail as the CPU and the ASIC LVDD, and also
the same as the flash power and the flash driver CPLD:
http://www.penguin.cz/~utx/zaurus/datasheets/memory/
________________________________________________________________________
Stanislav Brabec
Right, but I'm not thinking about its power arrangements; rather, about
the load on the CPU itself when it's transferring data to/from the CF
interface (via this ASIC).
If the ASIC has bus-master DMA and the driver uses it, then fair
enough. Otherwise, if the transfer is done by PIO in the driver, "while
using CF" (as mentioned in most symptoms) becomes the same as saying
"during 100% CPU load", which is what leads to dips in Vcore and
potential instability via that same Vcore path.
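One way to test this hypothesis without a 'scope is to watch CPU load
during a large CF transfer: with PIO the CPU should sit near 100% busy
in kernel time, whereas with DMA the waiting task mostly shows up as
iowait. A minimal sketch reading the aggregate 'cpu' line of /proc/stat
(the function names are made up for this example):

```python
import time
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat.

    Field order: user nice system idle iowait irq softirq steal [guest ...].
    Guest time is already included in user, so at most the first eight
    fields are summed. iowait is counted as idle.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total


def cpu_busy_percent(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample /proc/stat twice and return the CPU busy percentage in between."""
    def sample() -> Tuple[int, int]:
        with open(path) as f:
            return parse_cpu_line(f.readline())

    busy0, total0 = sample()
    time.sleep(interval)
    busy1, total1 = sample()
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


if __name__ == "__main__":
    # Run this while copying a large file to or from the CF card.
    print(f"CPU busy: {cpu_busy_percent(5.0):.0f}%")
```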
-Andy
Interesting, I get memory corruption leading to strange
behaviour. Sometimes echo 3 > /proc/sys/vm/drop_caches helps...
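If dropping the caches makes symptoms go away, the bad copy probably
lived in the page cache (RAM) while the data on the media was fine.
That can be checked by hashing a file from the cache, dropping the
caches, and hashing it again from the media. A minimal sketch (needs
root; function names are illustrative). Note that drop_caches only
discards clean, unmapped pages, so files in use by running programs may
stay cached, and a mismatch only says that the two reads disagreed, not
which one was bad.

```python
import hashlib
import os


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def drop_page_cache() -> None:
    """Write back dirty data, then drop clean caches (requires root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def cached_vs_reread(path: str) -> bool:
    """Return True if the cached and freshly re-read contents match."""
    sha256_of(path)              # warm-up read so the next hash comes from the page cache
    cached = sha256_of(path)
    drop_page_cache()
    fresh = sha256_of(path)      # now read back from the storage device
    return cached == fresh


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, "OK" if cached_vs_reread(p) else "MISMATCH")
```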
Pavel
Are we sure about this one? If we have wrong voltages on various
parts, that kind of explains it.
Would it be possible to measure the difference (with a voltmeter)
between the 2.4 kernel and the 2.6 kernel?
Pavel
> > > > Good tip. It seems that nobody ported driver for the voltage control
> > > > chip ISL6271 from 2.4 kernel, and bootloader probably does not set
> > > > correct values.
>
> Are we sure about this one? If we have wrong voltages on various
> parts, that kind-off explains it.
>
> Would it be possible to measure (Voltmeter) difference between 2.4
> kernel and 2.6 kernel?
If you are ready to run the Zaurus in a dismantled state, then yes.
Measure on the upper pin of the large coil in the center of the
http://www.penguin.cz/~utx/zaurus/pcbt_uc.jpg image, or on the test
point nearby (probably to the right).
Alternatively, it is possible to write a driver. It is just a one-byte
write and a one-byte read via I2C.
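Because the transfer is a plain one-byte read or write with no register
index, it can be prototyped from userspace through the i2c-dev
interface before writing a kernel driver. A sketch under stated
assumptions: it needs the i2c-dev module and a /dev/i2c-N node for the
PXA power I2C bus (GPIO 3/4); the bus number and the 0x0C slave address
in the usage line are placeholders to be checked against page 11 of the
datasheet, and the 4-bit range check on writes is an assumption about
the data format. Writing a wrong value changes Vcore on a running
system, so start with reads only.

```python
import fcntl
import os

I2C_SLAVE = 0x0703  # ioctl request number from <linux/i2c-dev.h>


def _open_client(bus: int, addr: int) -> int:
    """Open /dev/i2c-<bus> and bind the file descriptor to a 7-bit slave address."""
    fd = os.open(f"/dev/i2c-{bus}", os.O_RDWR)
    try:
        fcntl.ioctl(fd, I2C_SLAVE, addr)
    except OSError:
        os.close(fd)
        raise
    return fd


def read_vcore_byte(bus: int, addr: int) -> int:
    """Plain one-byte I2C read: address phase, then one data byte."""
    fd = _open_client(bus, addr)
    try:
        data = os.read(fd, 1)
    finally:
        os.close(fd)
    if len(data) != 1:
        raise OSError("short I2C read")
    return data[0]


def write_vcore_byte(bus: int, addr: int, value: int) -> None:
    """Plain one-byte I2C write. Assumption: only a 4-bit code is meaningful."""
    if not 0 <= value <= 0x0F:
        raise ValueError("value outside the assumed 4-bit voltage code range")
    fd = _open_client(bus, addr)
    try:
        os.write(fd, bytes([value]))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # HYPOTHETICAL bus number and address: take the real address from page 11
    # of the ISL6271A datasheet and check which /dev/i2c-N is the power I2C.
    print(hex(read_vcore_byte(bus=1, addr=0x0C)))
```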
________________________________________________________________________
Stanislav Brabec
I'm not comfortable dismantling it :-(.
> Alternatively, it is possible to write a driver. It is just one byte
> write and one byte read via I2C.
Do you know what byte it is? That sounds easy enough...
But I have a small problem now -- the Zaurus seems to work mostly fine.
Does it depend on temperature, or what? Tried mtest: nothing. Tried
compiling a kernel: OK... Will try a few more times...
Pavel
> > Alternatively, it is possible to write a driver. It is just one byte
> > write and one byte read via I2C.
>
> Do you know what byte it is? That sounds easy enough...
Yes, it should be an easy driver: one address byte, then either a
one-byte write or a one-byte read.
See the datasheet:
http://www.penguin.cz/~utx/zaurus/datasheets/power/XScale/ISL6271A.pdf
Page 11: address
Pages 8 and 9: Data interpretation
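Once the read and write work, the remaining piece is the data
interpretation from pages 8 and 9. Assuming a linear encoding of
850 mV at code 0 in 50 mV steps up to 1600 mV at code 15 (an
assumption to check against the datasheet, not a quote from it), the
1.3V default mentioned earlier in the thread would correspond to code
9. Rounding requests up rather than down is a deliberate choice here,
since undervolting is the failure mode under suspicion.

```python
# Assumed linear encoding (verify against pages 8 and 9 of the datasheet):
# code 0 = 850 mV, 50 mV per step, 16 codes, so code 15 = 1600 mV.
VMIN_MV = 850
STEP_MV = 50
NCODES = 16


def code_to_mv(code: int) -> int:
    """Voltage in millivolts selected by a 4-bit code."""
    if not 0 <= code < NCODES:
        raise ValueError(f"code {code} out of range")
    return VMIN_MV + STEP_MV * code


def mv_to_code(mv: int) -> int:
    """Smallest code whose voltage is >= mv, so Vcore is never rounded down."""
    vmax = VMIN_MV + STEP_MV * (NCODES - 1)
    if not VMIN_MV <= mv <= vmax:
        raise ValueError(f"{mv} mV outside {VMIN_MV}..{vmax} mV")
    return -(-(mv - VMIN_MV) // STEP_MV)  # ceiling division


if __name__ == "__main__":
    print(code_to_mv(9))     # 1300
    print(mv_to_code(1320))  # 10 (i.e. 1350 mV)
```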
--
Stanislav Brabec
This is not only the case on spitz. I've seen the LCD image falling
apart with pxafb on a Voipac PXA270 board. The image looked "torn in
half, with part of it moved to the right and the gap in between staying
white".
This happened exactly when I started a DMA transfer from a hard drive
attached through pata_pxa. It's perfectly reproducible. If I disabled
DMA and let it run only in PIO mode, the image was fine.
I assume the corruption Pavel was seeing is related. My guess is that
the problems occur while DMA between the CPU and a companion chip is
happening. I don't know whether the DMA controller lacks the capacity
to supply both the LCD and the companion chip with data, but that's one
of my guesses.
BTW, adjusting the DMA descriptor length in pata_pxa didn't help.
Guys, we need to investigate this, as it seems to cause problems in
many places.
Cheers!
If there is a FIFO attached, we may need the FIFO status at the moment
the error happens. A dump of the DMA registers would also be helpful.
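For the register dump, a snapshot can be taken from userspace through
/dev/mem without patching a driver. A sketch under loudly stated
assumptions: the base address 0x40000000, the DINT offset 0xF0, and the
per-channel offsets (DCSR at 4*n, descriptor registers at 0x200 + 16*n)
are a reading of the PXA27x DMA controller layout that must be verified
against the developer's manual. It needs root and a kernel that allows
/dev/mem access to that region, and it only reads.

```python
import mmap
import os
import struct
from typing import Dict, Iterable

# Assumed PXA27x DMA controller layout; verify against the PXA27x
# Developer's Manual before relying on it.
DMA_BASE = 0x40000000
DINT_OFFSET = 0x0F0
NUM_CHANNELS = 32


def channel_offsets(ch: int) -> Dict[str, int]:
    """Offsets (from DMA_BASE) of the per-channel registers of channel ch."""
    if not 0 <= ch < NUM_CHANNELS:
        raise ValueError(f"channel {ch} out of range")
    desc = 0x200 + 16 * ch  # descriptor register group for this channel
    return {
        "DCSR": 4 * ch,
        "DDADR": desc,
        "DSADR": desc + 4,
        "DTADR": desc + 8,
        "DCMD": desc + 12,
    }


def dump_dma(channels: Iterable[int]) -> None:
    """Print DINT and the registers of the given channels (needs root)."""
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        mm = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=DMA_BASE)
    finally:
        os.close(fd)  # the mapping stays valid after the fd is closed
    try:
        def read32(off: int) -> int:
            # Caveat: Python does not guarantee a single 32-bit bus access
            # here; cross-check odd values with a C tool such as devmem2.
            return struct.unpack_from("<I", mm, off)[0]

        print(f"DINT  0x{read32(DINT_OFFSET):08x}")
        for ch in channels:
            regs = ", ".join(f"{name}=0x{read32(off):08x}"
                             for name, off in channel_offsets(ch).items())
            print(f"ch{ch:02d}  {regs}")
    finally:
        mm.close()


if __name__ == "__main__":
    # Take the snapshot as soon as the corruption is seen.
    dump_dma(range(NUM_CHANNELS))
```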