Upgrade issue from OKD 4.7 to 4.8


Bala C

Oct 3, 2022, 17:52:48
To: okd-wg
We upgraded from 4.7.0-0.okd-2021-08-07-063045 to 4.8.0-0.okd-2021-10-24-061736, and the upgrade broke one worker node. The node boots fine, with no error
messages or any other clue in the boot logs or in journalctl. systemctl commands also hang without returning any result.

We rebooted it multiple times, but the MachineConfig update still cannot be applied to this node to bring it back into the cluster.
Any ideas on how to proceed? Thanks.


```
$ ssh dw-server3
Fedora CoreOS 34

Failed to list units: Connection timed out
[core@dw-server3 ~]$ sudo su -

Failed to list units: Transport endpoint is not connected
[root@dw-server3 ~]# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
[root@dw-server3 ~]#
```

Bala C

Oct 5, 2022, 09:09:00
To: okd-wg

systemd is failing to start services, and there is no way to roll back without it: even with the --peer option, rpm-ostree needs the polkit service:
```
$ ssh -tt FAILING_NODE rpm-ostree rollback --peer --reboot
Authorization not available. Check if polkit service is running or see debug message for more information.
error: Error calling StartServiceByName for org.projectatomic.rpmostree1: Timeout was reached
```

Bala C

Oct 5, 2022, 16:14:02
To: okd-wg
In the end it turned out to be a hardware failure. We replaced the hardware and the issue is now resolved. But how can we monitor for hardware failures on Fedora CoreOS? There were no hardware-related messages, and no clue on the console either, yet it was a hardware failure.
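A minimal sketch, not taken from the thread, of two places where hardware errors often surface on a Linux host such as Fedora CoreOS: kernel messages in the journal, and the EDAC memory-controller counters under `/sys/devices/system/edac/mc/`. It assumes it runs as root on the node and that an EDAC driver for the platform's memory controller is loaded (otherwise that section prints nothing). The function name `check_hw_errors` and the grep keywords are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print common hardware-error signals on a Linux host.
# Sections with no matching data simply print their header and nothing else.
check_hw_errors() {
  echo "== Kernel messages at priority err or worse (current boot) =="
  # -k limits output to kernel messages from the current boot;
  # -p err includes err, crit, alert and emerg
  journalctl -k -p err --no-pager 2>/dev/null | tail -n 20

  echo "== Machine-check / hardware-error lines in the kernel log =="
  # Keyword list is an assumption; adjust it for your hardware
  journalctl -k --no-pager 2>/dev/null \
    | grep -Ei 'mce|machine check|hardware error|edac|i/o error' | tail -n 20

  echo "== EDAC memory-controller error counters =="
  # These sysfs entries exist only when an EDAC driver is loaded
  for mc in /sys/devices/system/edac/mc/mc*; do
    [ -d "$mc" ] || continue
    echo "$(basename "$mc"): corrected=$(cat "$mc/ce_count") uncorrected=$(cat "$mc/ue_count")"
  done
}

check_hw_errors
```

Machine-check lines or a nonzero uncorrected count point at hardware. An empty report does not rule hardware out, as this thread shows.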