i2c silent hang over time


Ching-Cheng Cheng

Feb 12, 2015, 11:39:34 AM
to bcm...@googlegroups.com
There is another problem that concerns me.

When I use the bcm2835 I2C functions to communicate with an MLX90615, the program silently hangs after hours or days of reading the temperature from the chip. If the hang were an error such as a communication failure, I could handle it. But it is a silent hang: the program simply stops there.
Here is my code. Is there anything wrong with how I use the library? This happens on both the Pi 1 and the Pi 2.
Thanks for this wonderful library~

"""
// compile: gcc MLXi2c.c -o i2c -l bcm2835
#include <stdio.h>
#include <stdint.h>
#include <bcm2835.h>

int main(int argc, char **argv)
{
    char buf[3];    /* data low byte, data high byte, one trailing byte */
    char reg = 7;   /* register holding the temperature reading */
    double temp;
    uint8_t rc;

    (void)argc;
    (void)argv;

    /* bcm2835_init() returns 0 on failure (e.g. not running as root) */
    if (!bcm2835_init()) {
        fprintf(stderr, "bcm2835_init failed\n");
        return 1;
    }
    bcm2835_i2c_begin();    /* only needs to be called once */
    bcm2835_i2c_set_baudrate(100000);
    bcm2835_i2c_setSlaveAddress(0x5a);
    printf("\nOk, your device is working!!\n");

    while (1) {
        /* kept from the original post; read_register_rs below sends the
           register number itself, so this write is probably redundant */
        bcm2835_i2c_write(&reg, 1);

        /* check the reason code instead of trusting buf[] blindly */
        rc = bcm2835_i2c_read_register_rs(&reg, buf, 3);
        if (rc != BCM2835_I2C_REASON_OK) {
            fprintf(stderr, "i2c read failed, reason %d\n", rc);
            continue;
        }

        /* cast to uint8_t so the bytes are not sign-extended */
        temp = (double)(((uint8_t)buf[1] << 8) + (uint8_t)buf[0]);
        temp = (temp * 0.02) - 0.01;
        temp = temp - 273.15;   /* kelvin to degrees Celsius */

        printf("%f\n", temp);
    }
}


"""

Olly Funkster

Apr 12, 2015, 8:54:09 AM
to bcm...@googlegroups.com
Allo,

I've just run into this (or something similar anyway). bcm2835_i2c_read_register_rs does not appear to handle a noisy I2C bus gracefully, and we must assume noise will happen in the real world. Running it in the debugger, I found that the number of remaining bytes has reached zero, but BCM2835_BSC_S_RXD is set and BCM2835_BSC_S_DONE is clear. I haven't captured the event that causes this on the 'scope (yet), but I assume there's an extra clock edge or similar that makes the bcm2835 think there's an extra data byte that we haven't asked for but that still needs to be read. The error bit is not set.

The way the loop is written, it has no way to detect this and return an error: once remaining is zero the inner loop never reads the FIFO again, so RXD stays set and DONE never arrives. This is how it is now (v1.44, line 1003):


    /* wait for transfer to complete */
    while (!(bcm2835_peri_read_nb(status) & BCM2835_BSC_S_DONE))
    {
        /* we must empty the FIFO as it is populated and not use any delay */
        while (remaining && (bcm2835_peri_read_nb(status) & BCM2835_BSC_S_RXD))
        {
            /* Read from FIFO, no barrier */
            buf[i] = bcm2835_peri_read_nb(fifo);
            i++;
            remaining--;
        }
    }



As a fix I suggest this:

    /* wait for transfer to complete */
    while (!(bcm2835_peri_read_nb(status) & BCM2835_BSC_S_DONE))
    {
        /* we must empty the FIFO as it is populated and not use any delay */
        while (bcm2835_peri_read_nb(status) & BCM2835_BSC_S_RXD)
        {
            if (remaining)
            {
                /* Read from FIFO, no barrier */
                buf[i] = bcm2835_peri_read_nb(fifo);
                i++;
                remaining--;
            }
            else
            {
                /* we have received more data than we asked for (noise on bus?), */
                /* assume all the data is bad as we do not know where the extra */
                /* clocks occurred! */
               
                /* do the read anyway to clear BCM2835_BSC_S_RXD and thus set BCM2835_BSC_S_DONE */
                bcm2835_peri_read_nb(fifo);
               
                /* return an error code (maybe make a new one?) so we know not to trust buf[] */
                reason = BCM2835_I2C_REASON_ERROR_DATA;
            }
        }
    }

Olly Funkster

Apr 12, 2015, 9:03:09 AM
to bcm...@googlegroups.com
Sorry, that was a lazy paste of something I hadn't tested.

Try this:


    /* wait for transfer to complete */
    while (!(bcm2835_peri_read_nb(status) & BCM2835_BSC_S_DONE))
    {
        /* we must empty the FIFO as it is populated and not use any delay */
        while (bcm2835_peri_read_nb(status) & BCM2835_BSC_S_RXD)
        {
            if (remaining)
            {
                /* Read from FIFO, no barrier */
                buf[i] = bcm2835_peri_read_nb(fifo);
                i++;
                remaining--;
            }
            else if (bcm2835_peri_read_nb(status) & BCM2835_BSC_S_RXD)
            {
                /* we have received more data than we asked for (noise on bus?), */
                /* assume all the data is bad as we do not know where the extra */
                /* clocks occurred! */
               
                /* do the read anyway to clear BCM2835_BSC_S_RXD and thus set BCM2835_BSC_S_DONE */
                bcm2835_peri_read_nb(fifo);
               
                /* return an error code (maybe make a new one?) so we know not to trust buf[] */
                reason = BCM2835_I2C_REASON_ERROR_DATA;
            }
            else
            {
                break;
            }
        }
    }

Olly Funkster

Apr 12, 2015, 2:02:23 PM
to bcm...@googlegroups.com
Hmm. The code above cleared the fault a few times, but I'm still ending up stuck in this loop with BCM2835_BSC_S_RXD set (the status word reads 0xF00000A9), even though it's continually reading from the FIFO. I'm hoping I'm right that the call to bcm2835_peri_read_nb(fifo) will not get optimised out even though I'm ignoring its return value: when running in the debugger, stepping into that line does step into the function, and it returns the dereferenced value of the pointer paddr, so that should do the read. Right?
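
For reference, the status word quoted above can be decoded by hand. Here is a minimal sketch, assuming the BSC status register bit layout given in the BCM2835 ARM Peripherals datasheet. The function name `describe_bsc_status` and the flag table are illustrative only and are not part of the bcm2835 library; the reserved upper bits are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits of the BSC status register, in bit order (assumed from the
 * BCM2835 ARM Peripherals datasheet; the table itself is hypothetical). */
static const struct {
    uint32_t mask;
    const char *name;
} BSC_FLAGS[] = {
    {0x001u, "TA"},   /* transfer active */
    {0x002u, "DONE"}, /* transfer done */
    {0x004u, "TXW"},  /* TX FIFO needs writing */
    {0x008u, "RXR"},  /* RX FIFO needs reading */
    {0x010u, "TXD"},  /* TX FIFO can accept data */
    {0x020u, "RXD"},  /* RX FIFO contains data */
    {0x040u, "TXE"},  /* TX FIFO empty */
    {0x080u, "RXF"},  /* RX FIFO full */
    {0x100u, "ERR"},  /* slave did not acknowledge */
    {0x200u, "CLKT"}, /* clock stretch timeout */
};

/* Writes the names of the set flags into out, separated by spaces.
 * Returns the number of names written; stops early if out is too small. */
int describe_bsc_status(uint32_t status, char *out, size_t out_len)
{
    int count = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t k = 0; k < sizeof BSC_FLAGS / sizeof BSC_FLAGS[0]; k++) {
        if (!(status & BSC_FLAGS[k].mask))
            continue;

        /* room needed: optional separator + name + terminating NUL */
        size_t need = strlen(BSC_FLAGS[k].name) + (count ? 1 : 0);
        if (strlen(out) + need >= out_len)
            break;

        if (count)
            strcat(out, " ");
        strcat(out, BSC_FLAGS[k].name);
        count++;
    }
    return count;
}
```

Under that assumed layout, calling `describe_bsc_status(0xF00000A9, text, sizeof text)` fills `text` with `TA RXR RXD RXF`: the transfer is still active and the RX FIFO is full, while DONE and ERR are both clear, which is consistent with a loop that never sees DONE.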

After manually breaking the loop in the debugger, subsequent I2C reads work fine, so the hardware does not need a re-init. I guess I'll just have to add a timeout to this function somehow.
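
One way such a timeout could look is sketched below. This is not the library's code: the `bsc_io` accessor struct, the `read_result` codes, `drain_with_timeout`, and the fake controller are all hypothetical names introduced so the loop can be exercised without hardware. In the real function the status and FIFO reads would be the `bcm2835_peri_read_nb()` calls shown earlier. The bound here is a simple poll count; a wall-clock limit would work the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define S_DONE 0x02u /* same bit as BCM2835_BSC_S_DONE */
#define S_RXD  0x20u /* same bit as BCM2835_BSC_S_RXD */

/* Hypothetical result codes for this sketch. */
typedef enum { READ_OK, READ_SHORT, READ_OVERRUN, READ_TIMEOUT } read_result;

/* Hypothetical register accessors standing in for bcm2835_peri_read_nb(). */
typedef struct {
    uint32_t (*read_status)(void *ctx);
    uint8_t (*read_fifo)(void *ctx);
    void *ctx;
} bsc_io;

/* Drain the RX FIFO until DONE, giving up after max_polls status reads. */
read_result drain_with_timeout(const bsc_io *io, uint8_t *buf, size_t len,
                               uint32_t max_polls)
{
    size_t i = 0;
    read_result result = READ_OK;

    assert(io != NULL && buf != NULL);
    for (uint32_t polls = 0; polls < max_polls; polls++) {
        uint32_t s = io->read_status(io->ctx);
        if (s & S_RXD) {
            /* always read, so an unexpected extra byte cannot block DONE */
            uint8_t byte = io->read_fifo(io->ctx);
            if (i < len)
                buf[i++] = byte;
            else
                result = READ_OVERRUN; /* more data than requested */
            continue;
        }
        if (s & S_DONE)
            return (i < len) ? READ_SHORT : result;
    }
    return READ_TIMEOUT; /* neither drained nor done within the bound */
}

/* Fake controller: delivers `avail` bytes; if `stuck`, RXD never clears. */
typedef struct { int avail; int stuck; } fake_bsc;

uint32_t fake_status(void *ctx)
{
    fake_bsc *f = ctx;
    return (f->stuck || f->avail > 0) ? S_RXD : S_DONE;
}

uint8_t fake_fifo(void *ctx)
{
    fake_bsc *f = ctx;
    if (f->avail > 0)
        f->avail--;
    return 0xAB;
}

/* Run one simulated read of `len` bytes (len <= 8) with a 1000-poll bound. */
read_result demo_read(int avail, int stuck, size_t len)
{
    fake_bsc f = { avail, stuck };
    bsc_io io = { fake_status, fake_fifo, &f };
    uint8_t buf[8];

    assert(len <= sizeof buf);
    return drain_with_timeout(&io, buf, len, 1000);
}
```

With the fake controller, `demo_read(2, 0, 2)` returns `READ_OK`, an extra byte (`demo_read(3, 0, 2)`) returns `READ_OVERRUN`, and the stuck-RXD case (`demo_read(2, 1, 2)`) returns `READ_TIMEOUT` instead of spinning forever. In either error case the caller should treat the contents of buf as untrustworthy.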

Olly Funkster

Apr 15, 2015, 2:58:45 PM
to bcm...@googlegroups.com
Further hmm: having added a timeout to break the loop when this second mode of failure occurs, it appears that once that has happened, the function never returns more than one byte (I'm doing a two-byte read).

Quitting and restarting the app that's making the calls gets the second byte going again, so it looks like a re-init is required after all.

Olly Funkster

Apr 16, 2015, 2:11:03 PM
to bcm...@googlegroups.com
Okay, that last post wasn't true... I'm reading from a chip with a configurable word length, and that setting was getting corrupted. The bus *is* returning the second byte; there was just no data in it. So the slave chip is probably seeing a register write when the bus error happens.