On Sat, 20 Mar 2021 18:02:53 +0000, Markus Robert Kessler wrote:
>
> I only use brands like Sandisk, Samsung EVO and similar.
>
Good.
> It makes me cry to see that the card is totally ok,
>
> # badblocks -vvv /dev/mmcblk0
> Checking blocks 0 to 15446015
> Checking for bad blocks (read-only test): done
> Pass completed, 0 bad blocks found. (0/0/0 errors)
>
Good.
> and this seems to be one more ext4 issue.
>
Not necessarily - see below.
Besides, IME ext3 and ext4 are very reliable filing systems - I've never
had any problems with either, even when retrieving the /home directory
structure from a hard drive that was failing due to old age (50,000
hours).
> In the meantime the filesystem became more and more corrupted after I
> tried to run fsck.ext4.
> Finally, there were errors in /home, and even /var was empty (!)...
>
> So, the last thing I tried was to switch to NFS or NAS boot, but I found
> that the total storage space available there was nowhere near
> sufficient. Even worse, init didn't work either.
>
> Since even /lib was getting more and more messed up, not even a
> shutdown / halt / poweroff etc. was possible. So, after sshd also
> crashed and I lost the connection, I blocked it at the firewall to
> prevent it from doing unpredictable things.
>
> So, end of the line here. Oh man...
The reason I suggested running "fsck -A -s" is that:
- the -A option tells fsck to check every partition listed in /etc/fstab
(apart from those whose sixth, "pass", field is 0), using the filing
system checker appropriate to the type given in /etc/fstab for each
partition
- the -s option tells it to check one partition at a time. Without it,
fsck may check several partitions in parallel, which means that the error
messages can be jumbled together if more than one partition has errors.
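To make the -A behaviour concrete, here is a minimal Python sketch that models how a checker gets picked for each partition: the type column of /etc/fstab chooses fsck.<type>, pass 0 entries are skipped, and the root filesystem (pass 1) comes first. It only builds the command lines and runs nothing. The function name plan_fsck and the sample fstab (with /dev/mmcblk0p1 and /dev/mmcblk0p2 device names) are hypothetical; a real Raspbian fstab typically identifies partitions by PARTUUID= instead. The -n it appends is the "check, but change nothing" option that both fsck.ext4 and fsck.vfat accept.

```python
"""Sketch of how `fsck -A -s` picks a checker for each /etc/fstab entry.

Assumption: this only models the selection logic described in fsck(8)
and fstab(5); it runs nothing. plan_fsck and SAMPLE_FSTAB are hypothetical.
"""
from typing import List, Tuple

# Hypothetical Pi-style fstab: proc is never checked, / is pass 1, /boot pass 2.
SAMPLE_FSTAB = """
proc            /proc  proc  defaults          0  0
/dev/mmcblk0p1  /boot  vfat  defaults          0  2
/dev/mmcblk0p2  /      ext4  defaults,noatime  0  1
"""


def plan_fsck(fstab_text: str, read_only: bool = True) -> List[str]:
    """Return one checker command per partition, in the order they'd be checked."""
    entries: List[Tuple[int, int, str, str]] = []
    for index, raw in enumerate(fstab_text.splitlines()):
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue                          # malformed line: ignore it
        device, fstype = fields[0], fields[2]
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno == 0:                       # pass 0 means "never check"
            continue
        entries.append((passno, index, device, fstype))

    commands = []
    # Lower pass numbers first (root is pass 1); with -s, strictly one at a time.
    for _passno, _index, device, fstype in sorted(entries):
        cmd = f"fsck.{fstype}"                # checker chosen from the type column
        if read_only:
            cmd += " -n"                      # report problems, write nothing
        commands.append(f"{cmd} {device}")
    return commands


if __name__ == "__main__":
    for command in plan_fsck(SAMPLE_FSTAB):
        print(command)
```

With the sample fstab this lists fsck.ext4 for the EXT4 root partition and fsck.vfat for the VFAT boot partition, never fsck.ext4 for both. On the Pi itself, "fsck -A -s -N" asks the real fsck to show what it would run without actually running it.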
So, what exactly did you run? Just fsck.ext4? If so, with what options?
If you let fsck.ext4 loose on both partitions, then of course it would
throw errors, because the boot partition is VFAT, not EXT4. But if you
*DID NOT* let fsck.ext4 make any changes, then the filing system should
not have been damaged (any more than it was already).
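On the VFAT point: fsck.ext4 expects to find the ext2/3/4 superblock magic number (0xEF53, stored 1080 bytes into the partition), and a VFAT partition doesn't have it, so the checker complains instead of finding anything to check. The Python sketch below is a rough heuristic for telling the two partition types apart from their first few KiB; it is not what blkid actually does, and guess_fs_type and read_header are hypothetical names. Reading a device such as /dev/mmcblk0p1 needs root.

```python
"""Rough heuristic: is a partition ext2/3/4, VFAT, or something else?

Assumption: illustration only, not a replacement for blkid.
guess_fs_type and read_header are hypothetical helper names.
"""
import sys

EXT_MAGIC_OFFSET = 1024 + 56   # superblock starts at byte 1024; s_magic is 56 bytes in
EXT_MAGIC = 0xEF53             # stored little-endian on disk
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a FAT boot sector


def guess_fs_type(header: bytes) -> str:
    """Classify the first few KiB of a partition."""
    end = EXT_MAGIC_OFFSET + 2
    if len(header) >= end:
        magic = int.from_bytes(header[EXT_MAGIC_OFFSET:end], "little")
        if magic == EXT_MAGIC:
            return "ext2/3/4"                  # what fsck.ext4 expects to find
    if len(header) >= 512 and header[510:512] == BOOT_SIGNATURE:
        # Informational type labels: FAT12/16 at offset 54, FAT32 at offset 82.
        if header[54:57] == b"FAT" or header[82:87] == b"FAT32":
            return "vfat"                      # needs fsck.vfat, not fsck.ext4
    return "unknown"


def read_header(path: str, size: int = 4096) -> bytes:
    """Read the start of a partition or image file."""
    with open(path, "rb") as f:
        return f.read(size)


if __name__ == "__main__":
    for path in sys.argv[1:]:   # e.g. /dev/mmcblk0p1 /dev/mmcblk0p2
        print(path, guess_fs_type(read_header(path)))
```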
Did you make a backup copy, as I also suggested, before running fsck.ext4?
If not, and the errors are due to telling fsck.ext4 to scan a VFAT
partition, then your larger EXT4 partition may well be salvageable, but
we can't tell you how unless we know what other computers you have and
what operating systems they run.
For example, everything here apart from my RPi runs on x86 chips and has
Fedora Linux installed, so I could transfer just the stuff I've written on
my RPi to another SD card by:
- using gparted to make a same-sized pair of VFAT and EXT4 partitions on
a new SD card
- making two tar archives, one containing everything in /home and the
other containing everything in /usr/local, and saving them on a Fedora box
- setting up a clean copy of Raspbian Buster on the new card
- unpacking the contents of the /home and /usr/local tar archives
over the new Debian install
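The tar steps above can be sketched with Python's standard tarfile module, which runs just as well on the Fedora side. This is a minimal sketch under stated assumptions: old_root is wherever the old card's EXT4 partition is mounted and new_root is the root of the fresh install (no real paths are given here), backup_trees and restore_trees are hypothetical names, and the restore runs as root so that tarfile puts file ownership back. Plain tar on the command line does the same job.

```python
"""Sketch: back up /home and /usr/local as tar archives, then restore them.

Assumptions: old_root is where the old card's EXT4 partition is mounted,
new_root is the fresh install's root, and the restore runs as root so that
tarfile restores file ownership. Function names are hypothetical.
"""
import tarfile
from pathlib import Path
from typing import Iterable, List

TREES = ("home", "usr/local")  # the two trees named in the steps above


def backup_trees(old_root: Path, backup_dir: Path,
                 trees: Iterable[str] = TREES) -> List[Path]:
    """Write one .tar.gz per tree, e.g. home.tar.gz and usr_local.tar.gz."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for tree in trees:
        archive = backup_dir / (tree.replace("/", "_") + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            # arcname stores paths relative to / (e.g. "usr/local/bin/x")
            tar.add(old_root / tree, arcname=tree)
        archives.append(archive)
    return archives


def restore_trees(archives: Iterable[Path], new_root: Path) -> None:
    """Unpack each archive over the fresh install rooted at new_root."""
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons support extraction filters; these are our own
                # archives, so trust them fully to keep modes and ownership.
                tar.extractall(path=new_root, filter="fully_trusted")
            else:
                tar.extractall(path=new_root)
```

Archiving only home and usr/local, rather than the whole partition, is what lets the clean Raspbian install supply a fresh /lib, /var and so on, while your own code and data come back on top of it.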
This would put me pretty much back in business, with all my own code and
data reinstalled on an up-to-date Debian Buster system. And I've done
this several times already, as the SD card has grown from 4GB -> 8GB ->
16GB and Raspbian has successively outgrown the previous cards.
Finally, back when you wrote your monitoring system, did you make a
backup copy of the source code, binaries, shell scripts etc. on another
machine before putting your RPi system into everyday use? And, I have to
add: if not, why not?