This has started failing again on our GKE test suite https://k8s-testgrid.appspot.com/release-master-blocking#gke
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:69
attempting to delete a newly created flunders resource
Expected error:
    <*errors.StatusError | 0xc4211b4990>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
            Status: "Failure",
            Message: "the server could not find the requested resource",
            Reason: "NotFound",
            Details: {
                Name: "",
                Group: "",
                Kind: "",
                UID: "",
                Causes: [
                    {
                        Type: "UnexpectedServerResponse",
                        Message: "unknown",
                        Field: "",
                    },
                ],
                RetryAfterSeconds: 0,
            },
            Code: 404,
        },
    }
    the server could not find the requested resource
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:430
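For context, a minimal sketch of the failing step, written against the current client-go dynamic client (the kubeconfig path and flunder name are placeholders, not the actual test code from test/e2e/apimachinery/aggregator.go):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config; the kubeconfig path is a placeholder.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// The aggregated resource served by the sample wardle apiserver.
	gvr := schema.GroupVersionResource{
		Group:    "wardle.k8s.io",
		Version:  "v1alpha1",
		Resource: "flunders",
	}
	// The failing step: delete a flunder that was just created. A 404
	// "the server could not find the requested resource" here means the
	// request never reached a ready aggregated apiserver.
	err = client.Resource(gvr).Namespace("sample-system").Delete(
		context.TODO(), "test-flunder", metav1.DeleteOptions{})
	if err != nil {
		fmt.Println("delete failed:", err)
	}
}
```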
cc @kubernetes/sig-api-machinery-test-failures
/assign @cheftako
Seems to be flaking now with the error:
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:69
Sep 20 17:49:59.789: failed to get back the correct flunders list &{map[metadata:map[selfLink:/apis/wardle.k8s.io/v1alpha1/namespaces/sample-system/flunders resourceVersion:5] kind:FlunderList apiVersion:wardle.k8s.io/v1alpha1] []} from the dynamic client
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:481
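A rough sketch of the check that is now flaking, against the same dynamic client API as the sketch above (the function name and expected count are assumptions for illustration):

```go
import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// checkFlundersList is a hypothetical version of the failing assertion:
// list flunders via the dynamic client and compare against what the
// test created earlier.
func checkFlundersList(ctx context.Context, client dynamic.Interface, want int) error {
	gvr := schema.GroupVersionResource{
		Group:    "wardle.k8s.io",
		Version:  "v1alpha1",
		Resource: "flunders",
	}
	list, err := client.Resource(gvr).Namespace("sample-system").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	if len(list.Items) != want {
		return fmt.Errorf("failed to get back the correct flunders list: got %d items, want %d",
			len(list.Items), want)
	}
	return nil
}
```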
@cheftako @ericchiang - we need to determine (today if possible) whether this is truly release-blocking. If so, please add the release-blocker label. If not, how do we best continue work on this for 1.8.x/1.9.0?
This test has two passes and four failures on the same commit.
I'm seeing GKE-specific authz grants in that test that are incorrect:
3b9485b#diff-c944d1288edcaf37beebab811603bfd8L164
That commit removed the wait for the authz grant to become effective (which can lead to flakes), and it granted superuser permissions to all users, which is incorrect and invalidates any other authz-related tests running in parallel with this one.
I can add the wait back.
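For reference, the kind of wait being discussed could look roughly like this: poll a SelfSubjectAccessReview until the grant is visible. This is a sketch, not the e2e framework's actual helper; the resource attributes and timeouts are assumptions.

```go
import (
	"context"
	"time"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForAuthzGrant is a hypothetical helper: poll until the caller is
// allowed to perform the verb, so the test doesn't race the propagation
// of a freshly created RBAC binding.
func waitForAuthzGrant(ctx context.Context, cs kubernetes.Interface) error {
	return wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		sar := &authorizationv1.SelfSubjectAccessReview{
			Spec: authorizationv1.SelfSubjectAccessReviewSpec{
				ResourceAttributes: &authorizationv1.ResourceAttributes{
					Group:     "wardle.k8s.io",
					Resource:  "flunders",
					Verb:      "create",
					Namespace: "sample-system",
				},
			},
		}
		resp, err := cs.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
		if err != nil {
			return false, err
		}
		return resp.Status.Allowed, nil
	})
}
```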
Actually, after staring at this test for about half an hour I can't figure out what different users exist or what permissions they're being granted. ClientSet, InternalClientset, and AggregatorClient are all initialized from the same config, so I don't see how one would be able to create an RBAC binding but another would fail later.
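To illustrate the point about the shared config, a sketch with assumed names (1.8-era code also had an InternalClientset, since removed):

```go
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	aggregatorclient "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset"
)

// All clients are built from the same rest.Config, so they send the
// same credentials and act as the same user; a permission granted to
// one should be visible to the others.
func buildClients(kubeconfig string) error {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return err
	}
	clientSet, err := kubernetes.NewForConfig(config) // ClientSet
	if err != nil {
		return err
	}
	aggClient, err := aggregatorclient.NewForConfig(config) // AggregatorClient
	if err != nil {
		return err
	}
	_, _ = clientSet, aggClient
	return nil
}
```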
@cheftako any thoughts here?
I honestly think the GKE-specific BindClusterRole is a red herring. It is needed so the client has permission to perform one of the setup steps (I think it was either to create the wardler cluster role or to bind that role to the anonymous user). Once that setup step is complete we no longer need that cluster role bound, so I don't think it's related.
"I don't see how one would be able to create an RBAC binding but another would fail later."
The GKE authorizer allows the "bind" verb, so the client can create a binding to the cluster-admin role. It cannot create a role directly unless it has permission via RBAC. Since we don't have a way to determine the username associated with iclient, binding to all authenticated users was the workaround.
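Concretely, the workaround amounts to something like the following (the binding name is assumed). Creating this only requires the "bind" verb on the referenced role, not RBAC write access, but as noted above it is far too broad:

```go
import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bindClusterAdminToAuthenticated sketches the grant being discussed:
// binding cluster-admin to every authenticated user, which can
// invalidate other authz-related tests running in parallel.
func bindClusterAdminToAuthenticated(ctx context.Context, cs kubernetes.Interface) error {
	binding := &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "wardler-anonymous-binding"}, // name assumed
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "cluster-admin",
		},
		Subjects: []rbacv1.Subject{{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "Group",
			Name:     "system:authenticated",
		}},
	}
	_, err := cs.RbacV1().ClusterRoleBindings().Create(ctx, binding, metav1.CreateOptions{})
	return err
}
```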
I agree that the point at which the tests are failing indicate that the previous authorization issues are not the cause.
/open
Reopened #50945.
/reopen
So there's a lot more information to work with now, but the error is still occurring. I am still looking into this.
@cheftako any update on the investigation?
https://storage.googleapis.com/k8s-gubernator/triage/index.html?test=aggregator
Friendly v1.8 release team ping. This failure still seems to be happening; is it actively being worked on? Does this need to be in the v1.8 milestone?