TL;DR - Two issues are described below:
1 - etcd v3.5.[0-5]: data inconsistency when etcd crashes while processing a defragmentation operation
2 - etcd v3.4.[20-21] and v3.5.5: data inconsistency when auth is enabled and a new member is added to the cluster
Following the recently discovered consistency problems in etcd 3.5, the etcd maintainers are investing in extensive testing of data consistency under different etcd crash modes. As part of this process, we discovered the following issues:
Issue 1: etcd v3.5.[0-5]: data inconsistency when etcd crashes while processing a defragmentation operation
If etcd crashes during an online defragmentation operation, when the etcd instance starts again, it might reapply some entries which have already been applied. This might result in the member's data becoming inconsistent with the other members.
This issue does not occur when performing the defragmentation operation offline using etcdutl.
Usually there is no data loss, and clients can always get the latest correct data. The only visible effect is that the problematic etcd member's revision might be slightly larger than that of the other members. However, if etcd reapplies some conditional transactions, the issue might cause data inconsistency. For more details, see the discussion in PR 14685 (fixed in #14730).
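To make the distinction above concrete, here is a minimal Python sketch of a toy key-value store that replays a log of entries. It is not etcd's real data model or apply path; the names ToyStore and replay and the entry format are hypothetical and exist only for illustration. It shows that reapplying a plain put changes only the revision, while reapplying a conditional transaction can change the data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy key-value store with a revision counter (illustrative only)."""
    data: dict = field(default_factory=dict)
    revision: int = 0

    def apply(self, entry):
        kind = entry[0]
        if kind == "put":
            _, key, value = entry
            self.data[key] = value
        elif kind == "txn":
            # Conditional transaction: if data[key] == expected, run the
            # "then" put, otherwise run the "else" put.
            _, key, expected, then_put, else_put = entry
            k, v = then_put if self.data.get(key) == expected else else_put
            self.data[k] = v
        else:
            raise ValueError(f"unknown entry kind: {kind}")
        # Every applied entry bumps the revision in this toy model.
        self.revision += 1


def replay(log, reapply_from=None):
    """Apply the log once; optionally reapply its tail, as after a bad restart."""
    store = ToyStore()
    for entry in log:
        store.apply(entry)
    if reapply_from is not None:
        for entry in log[reapply_from:]:
            store.apply(entry)
    return store


# Case 1: only plain puts. Reapplying the last entry changes the revision only.
plain_log = [("put", "x", "1"), ("put", "y", "a")]
healthy_plain = replay(plain_log)
faulty_plain = replay(plain_log, reapply_from=1)

# Case 2: a conditional transaction. Reapplying it takes the other branch.
txn_log = [("put", "x", "1"), ("txn", "x", "1", ("x", "2"), ("y", "conflict"))]
healthy_txn = replay(txn_log)
faulty_txn = replay(txn_log, reapply_from=1)

print(healthy_plain.data == faulty_plain.data, healthy_plain.revision, faulty_plain.revision)
print(healthy_txn.data, faulty_txn.data)
```

In the second case the condition held the first time the transaction was applied, but no longer holds on the reapply, so the faulty store ends up with a key that the healthy store never wrote.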
The affected versions are 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4 and 3.5.5. The issue was resolved in 3.5.6.
Recommendations:
Issue 2: etcd v3.4.[20-21] and v3.5.5: data inconsistency when auth is enabled and a new member is added to the cluster
This issue only affects etcd clusters where auth is enabled.
The recent issue 14571 surfaced a data inconsistency in a specific case, as detailed in this note. When auth is enabled, newly added members might fail to apply data because of permission-denied errors, and their data eventually becomes inconsistent.
In this situation, clients (e.g. etcdctl) connected to the problematic member (the newly added member) will fail to read or write any data because of permission-denied errors. Restarting the new etcd member can resolve the permission failures, but afterwards clients might get stale data from the problematic member.
For more details, see the discussion in issue 14571.
The affected versions are 3.4.20, 3.4.21 and 3.5.5. The issue was resolved in 3.4.22 and 3.5.6.
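As a convenience, the affected-version lists in this note can be encoded as a small lookup. The following Python sketch is only an illustration: the function names and issue labels are hypothetical, the version string is assumed to look like "3.5.4" or "v3.5.4", and the result only says whether a version is listed as affected here, not whether a given cluster has actually hit either issue.

```python
# Affected versions as listed in this note, keyed by a hypothetical label.
AFFECTED = {
    "defrag-crash": {(3, 5, patch) for patch in range(0, 6)},   # 3.5.0 - 3.5.5
    "auth-new-member": {(3, 4, 20), (3, 4, 21), (3, 5, 5)},
}


def parse_version(text):
    """Parse 'v3.5.4' or '3.5.4' into a (major, minor, patch) tuple."""
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def affected_issues(version, auth_enabled=True):
    """Return the labels of the issues in this note that list this version."""
    parsed = parse_version(version)
    issues = []
    for label, versions in AFFECTED.items():
        if parsed not in versions:
            continue
        # Issue 2 only affects clusters where auth is enabled.
        if label == "auth-new-member" and not auth_enabled:
            continue
        issues.append(label)
    return issues


for v in ("3.4.21", "3.5.4", "3.5.5", "3.5.6"):
    print(v, affected_issues(v))
```

Pass auth_enabled=False for clusters that do not use auth, since Issue 2 does not apply to them.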
Recommendations:
Thanks to veshij@, who reported and resolved this issue.
Thanks,
etcd-maintainers