Comment #52 on issue 458878 by
cerne...@chromium.org: Chrome says no
I haven't updated this bug for a while, but in recent weeks some progress
has been made.
The first big clue arrived about a month ago, when matthewyuan supplied a
Pixel 1 that was in the bad state (BatteryAbsent true, no icon). We opened
the unit up while it was live, and cut wires/traces until we had narrowed
the issue down to the EC. The attached tek00124.png shows that
SCL was held low before and after each transaction, and the rise times
looked a bit unhealthy. Nothing but the EC was on the bus at this point.
The EC is manufactured by TI, so we approached TI to ask how this could
have happened. TI pointed us to an erratum involving the ESD protection
circuits on the 5V-tolerant pads (which include both I2C lines). The
symptoms are not an exact match for the observed behavior; in particular,
TI has only seen the latch-up above 85 degrees C, while most of the problem
reports we've seen have happened when the unit was cold. However, the
erratum may still be the cause.
Across ~10 alt-shift-I feedback reports from link users, the general
pattern often looks like this:
1) Unit is on battery power, no AC.
2) Suspend the unit by closing the lid.
3) A few hours pass, and the OS wakes up for dark resume.
4) In dark resume, the system decides to shut down to save battery life.
5) Many hours pass. Presumably the EC enters hibernate mode.
6) User opens the lid the next morning (or a few days later), and sees a
missing battery icon on boot. AC has still not been connected.
7) Sometimes this fixes itself, but often it doesn't.
I have run a number of long-term stress tests involving EC hibernation and
EC resets, but none of them have reproduced the bug.
I have also tried many combinations of GPIO settings (open drain, pullup,
pulldown, drive strengths, etc.) to see if any software (mis)configuration
could produce a waveform similar to what we captured from the bad unit.
Nothing was found.
At this point it seems that our best option is to backport the
i2c_unwedge() code from ToT, and monitor the UMA stats to assess whether
that helps. Some of the reasons we decided to try this are:
a) The unwedge code is a known quantity; it has been deployed on all modern
Chromebooks and is unlikely to have side effects.
b) TI suspects that changing GPIO modes (e.g. AFSEL) and wiggling the pin
may be one way to repair the ESD latch-up condition.
c) It is possible that newer Chromebooks running the same LM4 EC do not
experience this problem because they do have i2c_unwedge() in their
firmware. i2c_unwedge() runs any time an I2C error is seen, so maybe this
condition is happening on the newer Chromebooks but it is getting repaired
automatically.
d) We have observed that some units recover on their own from the error
condition, and we don't have a good idea what causes this behavior. It is
possible that merely wiggling the affected lines is enough to return the
chip to normal behavior.
Some reasons why this might NOT be helpful:
e) If it works, we still won't know the root cause. That is not ideal.
f) The current unwedge code mostly gives up if SCL is stuck low. Maybe we
want to be more aggressive about driving SCL in this case.
g) I backported the i2cwedge command and played with it for a while, but I
wasn't actually able to show that any peripheral on this bus can get
wedged. So i2c_unwedge() might not actually serve its primary purpose on
link. Some I2C peripherals can be convinced to hold e.g. SDA low forever
by abruptly halting a transaction, but SMBus peripherals have timeouts, and
in my testing, I wasn't able to get any of the chips on this I2C bus to
wedge the bus for more than ~40ms.
The other thing I observed in testing is that the initial call to
i2c_unwedge() inadvertently causes glitches on both SCL and SDA when the
GPIOs are being reconfigured. Subsequent calls did not cause glitches. If
SCL is stuck low, i2c_unwedge() doesn't actually try to wiggle any pins,
but maybe one of these glitches is accidentally fixing the latch-up
condition.
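To make the SCL-stuck-low case concrete, here is a minimal sketch of a
bit-banged unwedge sequence run against a simulated bus. It is not the EC's
i2c_unwedge() source: sim_bus, the sim_* helpers, sketch_unwedge(), and the
retry count are hypothetical names and values chosen for illustration, and
all timing delays (including any wait for clock stretching) are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated open-drain bus; not the EC's real GPIO API. */
struct sim_bus {
	bool scl_stuck_low;  /* something (e.g. a latched pad) holds SCL low */
	int sda_hold_clocks; /* peripheral holds SDA low until it sees this many clocks */
	int clocks_seen;     /* rising SCL edges actually visible on the bus */
	int scl_writes;      /* how many times the master tried to drive SCL */
	bool scl_out;        /* master's SCL output: true = released */
	bool sda_out;        /* master's SDA output: true = released */
};

/* Bus level is high only if nobody pulls it low. */
static bool sim_get_scl(const struct sim_bus *b)
{
	return b->scl_out && !b->scl_stuck_low;
}

static bool sim_get_sda(const struct sim_bus *b)
{
	return b->sda_out && b->clocks_seen >= b->sda_hold_clocks;
}

static void sim_set_scl(struct sim_bus *b, bool level)
{
	bool was_high = sim_get_scl(b);

	b->scl_out = level;
	b->scl_writes++;
	/* Count a clock only when the bus really sees a rising edge. */
	if (!was_high && sim_get_scl(b))
		b->clocks_seen++;
}

static void sim_set_sda(struct sim_bus *b, bool level)
{
	b->sda_out = level;
}

enum unwedge_result { UNWEDGE_OK, UNWEDGE_SCL_STUCK, UNWEDGE_SDA_STUCK };

#define SDA_ATTEMPTS 3 /* assumed retry count, for illustration only */

enum unwedge_result sketch_unwedge(struct sim_bus *b)
{
	int i, j;

	assert(b != NULL);

	/* If SCL is held low we cannot clock anything: give up, no wiggling. */
	if (!sim_get_scl(b))
		return UNWEDGE_SCL_STUCK;

	/* SDA already released: the bus is not wedged. */
	if (sim_get_sda(b))
		return UNWEDGE_OK;

	for (i = 0; i < SDA_ATTEMPTS; i++) {
		/* Clock up to 9 bits so a peripheral stuck mid-byte can finish. */
		for (j = 0; j < 9 && !sim_get_sda(b); j++) {
			sim_set_scl(b, false);
			sim_set_scl(b, true);
		}

		/* STOP condition: SDA goes low then high while SCL is high. */
		sim_set_sda(b, false);
		sim_set_sda(b, true);

		if (sim_get_sda(b) && sim_get_scl(b))
			return UNWEDGE_OK;
	}

	return UNWEDGE_SDA_STUCK;
}
```

In this model, a bus with scl_stuck_low set returns UNWEDGE_SCL_STUCK
without a single write to SCL, which is the gap item f) describes. A
peripheral holding SDA low is released once it has seen enough clocks.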
Attachments:
tek00124.png 36.0 KB