[sig-api-machinery] Aggregator Should be able to support the 1.7 Sample API Server using the current Aggregator
More nuanced logs and details can be found at https://storage.googleapis.com/k8s-gubernator/triage/index.html?sig=api-machinery&job=ci-kubernetes-e2e-gci-gce&test=Aggregator%20Should%20be%20able%20to%20support%20the%201.7%20Sample%20AP
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:76
gave up waiting for apiservice wardle to come up successfully
Expected error:
<*errors.errorString | 0xc420131150>: {
    s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:337
/priority failing-test
/priority important-soon
/kind flake
/sig api-machinery
@kubernetes/sig-api-machinery-bugs
/assign @liggitt
@liggitt can you please help triage this issue?
flakes show steady 2-3/day over the past week: https://storage.googleapis.com/k8s-gubernator/triage/index.html?sig=api-machinery&job=ci-kubernetes-e2e-gci-gce&test=Aggregator%20Should%20be%20able%20to%20support%20the%201.7%20Sample%20AP
@lavalamp which changes are you referring to? the line that's failing isn't doing any discovery, restmapping, etc, at all... just a straight get to the API:
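For context, a rough reconstruction of the kind of call involved (not the verbatim snippet from aggregator.go): the restClient variable, the poll interval/timeout, and the wardle path here are assumptions, and it is written against the pre-context client-go of that era.

```go
package apimachinery

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/rest"
	"k8s.io/kubernetes/test/e2e/framework"
)

// waitForWardle polls a plain GET through the aggregator to the wardle API
// until it succeeds; no discovery or RESTMapping is involved.
func waitForWardle(restClient rest.Interface) {
	err := wait.Poll(100*time.Millisecond, 30*time.Second, func() (bool, error) {
		_, getErr := restClient.Get().
			AbsPath("/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders").
			DoRaw()
		return getErr == nil, nil
	})
	if err != nil {
		framework.Failf("gave up waiting for apiservice wardle to come up successfully: %v", err)
	}
}
```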
audit log from the test run shows the requests are returning 404s, not 503s
hmm, there's a second smaller set of failures that are discovery related:
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:76
getting server preferred namespaces resources for dynamic client
Expected error:
<*discovery.ErrGroupDiscoveryFailed | 0xc421caabe0>: {
    Groups: {
        {
            Group: "mygroup.example.com",
            Version: "v1beta1",
        }: {
            ErrStatus: {
                TypeMeta: {Kind: "", APIVersion: ""},
                ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
                Status: "Failure",
                Message: "the server could not find the requested resource",
                Reason: "NotFound",
                Details: {
                    Name: "",
                    Group: "",
                    Kind: "",
                    UID: "",
                    Causes: [
                        {
                            Type: "UnexpectedServerResponse",
                            Message: "404 page not found",
                            Field: "",
                        },
                    ],
                    RetryAfterSeconds: 0,
                },
                Code: 404,
            },
        },
    },
}
unable to retrieve the complete list of server APIs: mygroup.example.com/v1beta1: the server could not find the requested resource
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:386
looks like those started flaking after 74b7cec which went in on 5/4
looks like some discovery paths had internal retries that could have masked those 404 errors before. since the error that test is encountering is for an unrelated CRD API that is being created/deleted as part of another test, it makes more sense to switch that check to just ensure the extension API group under test is discovered, rather than assert there are no errors looking for other API groups that we know are being dynamically added/removed
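A hedged sketch of what that relaxed check could look like (identifiers are illustrative, and this is not the exact code from #63624): tolerate discovery failures for unrelated groups that other tests add and remove dynamically, and only fail if the extension API group under test itself could not be discovered.

```go
package apimachinery

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubernetes/test/e2e/framework"
)

// checkWardleDiscovered ignores ErrGroupDiscoveryFailed entries for other API
// groups and only asserts that wardle.k8s.io itself was discovered cleanly.
func checkWardleDiscovered(clientset kubernetes.Interface) {
	_, err := clientset.Discovery().ServerPreferredNamespacedResources()
	if discoveryErr, ok := err.(*discovery.ErrGroupDiscoveryFailed); ok {
		for gv, gvErr := range discoveryErr.Groups {
			if gv.Group == "wardle.k8s.io" {
				framework.Failf("discovery of %v failed: %v", gv, gvErr)
			}
		}
	} else if err != nil {
		framework.Failf("getting server preferred namespaced resources: %v", err)
	}
}
```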
opened #63624 to deflake the aggregator e2e check
/assign @cheftako
The kubelet log is reporting that the sample-apiserver exited with a return code of 2, and it also shows a high restart count for the sample-apiserver. When I extract the sample-apiserver log, I'm seeing multiple "panic: Get https://10.0.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.0.0.1:443: getsockopt: connection refused"
Both failures today are in ipvs clusters, which seem to have extensive networking problems.
/assign @rramkumar1
@cheftako We are looking into finding the root cause of the problems with the ipvs clusters. I am pretty confident they were triggered by a bad PR, just need to identify which one.
Initial hypothesis is that PR #63585 caused it.
We think #63840 should fix the IPVS issues.
Closed #63622.
I see #63840 merged and this test is now passing in both pull-kubernetes-e2e-gce and ci-kubernetes-e2e-gci-gce. Hence closing the issue. Thanks all.
Reopened #63622.
@liggitt can you please take a look and triage? Thanks
/reopen
The test is failing for the same error again in the following jobs
It is one of the top flakes from last week, failing 47 jobs.
@rramkumar1 @liggitt this flake is now blocking this test from being promoted to the Conformance suite in 1.11. Is it something we can investigate and resolve soon (today or tomorrow) to increase the chances of getting this into 1.11? If not, this test has to wait until 1.12 to be promoted.
all I see in the test run logs is 404 errors coming back from that check. @cheftako can you see anything useful in the container logs for the aggregated server?
Update:
I have been trying to reproduce this failure on my own test cluster (go run hack/e2e.go ...) and it has passed 128 times in a row so far.
Looking through recent results, this crops up as a flake in presubmit and CI across many configurations... I don't think how we run the test jobs should be affecting this in the slightest. In fact, anything marked conformance really shouldn't be affected by job config (!) Not sure what's wrong here, but it looks like a pretty typical flaky test (some race / buggy setup/teardown?)
NB: go run hack/e2e.go ... may not map trivially to the same way the CI jobs run the tests, though, and e2e.go itself is a wrapper over kubetest.
@jennybuckley do the API server logs throw any light on what's happening on the CI test cluster?
The teardown for the aggregator test is wrong: it tries to delete the deployment "sample-apiserver" when the deployment is actually named "sample-apiserver-deployment", but I don't think that's the issue. It's been like that since the test was added. I'll look for conflicts with other tests.
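For reference, a minimal sketch of what the cleanup presumably intends, using the deployment name the test actually creates; this is written against the pre-context client-go of that era, and the function and namespace parameters here are illustrative rather than the test's actual code.

```go
package apimachinery

import (
	apierrs "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubernetes/test/e2e/framework"
)

// cleanupSampleAPIServer deletes the deployment under the name the test
// actually created it with ("sample-apiserver-deployment", not "sample-apiserver").
func cleanupSampleAPIServer(client kubernetes.Interface, namespace string) {
	err := client.AppsV1().Deployments(namespace).Delete("sample-apiserver-deployment", &metav1.DeleteOptions{})
	if err != nil && !apierrs.IsNotFound(err) {
		framework.Failf("deleting sample-apiserver deployment: %v", err)
	}
}
```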
The namespace controller polls each of the api groups periodically, and it gets this error from the /apis/wardle.k8s.io/v1alpha1 endpoint a couple of times. Maybe related?
E0531 15:23:19.095075 1 namespace_controller.go:148] unable to retrieve the
complete list of server APIs: wardle.k8s.io/v1alpha1: an error on the server
("Internal Server Error: \"/apis/wardle.k8s.io/v1alpha1?timeout=32s\":
subjectaccessreviews.authorization.k8s.io is forbidden: User
\"system:serviceaccount:e2e-tests-aggregator-97cps:default\" cannot create
subjectaccessreviews.authorization.k8s.io at the cluster scope")
has prevented the request from succeeding
unclear... I opened #64587 to capture the state of the APIService, extension server pods, and those pod logs in failures cases
All I'm seeing is
webhook.go:185] Failed to make webhook authorizer request: subjectaccessreviews.authorization.k8s.io is forbidden: User "system:serviceaccount:e2e-tests-aggregator-cfck8:default" cannot create subjectaccessreviews.authorization.k8s.io at the cluster scope
I don't think that's related... I see 10-30 of those same rejections in apiserver logs of successful test runs (for example, search https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/64641/pull-kubernetes-e2e-gce/39140/artifacts/e2e-39140-674b9-master/kube-apiserver.log for 403 [[sample-apiserver/v0.0.0). I don't see anything in the test setup that grants the sample apiserver permission to run those checks, so it doesn't seem like that would be intermittent or flaky.
Reopened #63622.
@fedebongio @cheftako it looks like the 1.7 API Server image is causing more flakes than just this test - #64450 (comment)
The test has become a lot less flaky after increasing the timeout.
Since the official policy for GKE is 2 previous versions, and 1.11 is almost baked, this is the plan discussed with @fedebongio @cheftako offline:
@liggitt let us know what you think of this approach
it looks like the 1.7 API Server image is causing more flakes than just this test - #64450 (comment)
I think you misunderstood my comment... that only related to kubectl being used against a 1.7-level kube-apiserver serving discovery docs for the v1 API, not a generic 1.7-level extension API server
unrelatedly, moving the test to 1.9-level code once 1.11 is released seems reasonable to me
Thank you Jordan, that's the current plan! We will actually create a parallel 1.9-level test, to see if, between 1.7 and 1.9, the internals of the API Server fixed what's generating the flakiness, and decide next steps from there.
The test seems to have been green for the past two weeks. Wonder if it's fixed / mitigated?
pull-kubernetes-e2e-gce
ci-kubernetes-e2e-gci-gce
Pinging to see what the next steps are. Who is taking point to create a parallel test?
Last I chatted with @fedebongio, someone on his team will build the 1.9 API server image and push it to GCR. Once that image is available, @mgdevstack can help write the parallel test. I am not sure if the image step is done.
I'd like to know where we are on this, because the 1.7 thing has come up in the context of multi-arch image builds
Hi Aaron and Aish, sorry for the delay. The process to compile, create, and upload the sample API Server was not properly documented or maintained, and that, plus other more pressing things, is what made it take so long. We are trying to finally have it back (and documented) by the end of this week. Will keep you posted.
Image created with all the refactoring that happened in that area; currently trying to upgrade the test from 1.7 to 1.9.
Update: Upgraded the test, and now the test is not passing... working on this.
• Failure [38.067 seconds]
[sig-api-machinery] Aggregator
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/framework.go:22
Should be able to support the 1.9 Sample API Server using the current Aggregator [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/aggregator.go:76
creating a new flunders resource
Expected error:
<*errors.StatusError | 0xc420c90cf0>: {
    ErrStatus: {
        TypeMeta: {Kind: "", APIVersion: ""},
        ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
        Status: "Failure",
        Message: "flunders.wardle.k8s.io \"rest-flunder-573651905\" is forbidden: not yet ready to handle request",
        Reason: "Forbidden",
        Details: {
            Name: "rest-flunder-573651905",
            Group: "wardle.k8s.io",
            Kind: "flunders",
            UID: "",
            Causes: nil,
            RetryAfterSeconds: 0,
        },
        Code: 403,
    },
}
flunders.wardle.k8s.io "rest-flunder-573651905" is forbidden: not yet ready to handle request
not to have occurred
not sure if it's related, but I don't think the test currently gives sufficient permissions to the aggregated server to perform delegated authn/authz checks. we might want to look at merging #64993
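For reference, a minimal sketch of the kind of grants delegated authn/authz needs (roughly the spirit of #64993, not its exact contents): bind the extension server's service account to system:auth-delegator and to the extension-apiserver-authentication-reader role in kube-system. This is written against the pre-context client-go of that era; the binding names and the assumption that the server runs as the namespace's default service account are illustrative.

```go
package apimachinery

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// grantDelegatedAuth lets the aggregated server create TokenReviews and
// SubjectAccessReviews, and read the extension-apiserver-authentication
// ConfigMap in kube-system.
func grantDelegatedAuth(client kubernetes.Interface, namespace string) error {
	sa := rbacv1.Subject{Kind: "ServiceAccount", Name: "default", Namespace: namespace}

	_, err := client.RbacV1().ClusterRoleBindings().Create(&rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "wardler:system:auth-delegator"},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "system:auth-delegator",
		},
		Subjects: []rbacv1.Subject{sa},
	})
	if err != nil {
		return err
	}

	_, err = client.RbacV1().RoleBindings("kube-system").Create(&rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "wardler-auth-reader"},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "Role",
			Name:     "extension-apiserver-authentication-reader",
		},
		Subjects: []rbacv1.Subject{sa},
	})
	return err
}
```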
@liggitt tried your fix, same result. The 1.9-based sample-apiserver image I've created is uploaded to gcr.io. If someone wants to give it a try, simply upgrade 1.0 to 1.1 here: https://github.com/kubernetes/kubernetes/blob/master/test/utils/image/manifest.go#L51
/assign @mgdevstack
Can you try to figure out why the test is failing w/ the new image?
/cc @yliaog
Now waiting on #69239
Is there an ETA on #69239? This test is flaking quite a bit of late - https://k8s-testgrid.appspot.com/sig-release-master-blocking#gci-gke
#68300 upgrades to 1.10. However, it requires more rbac permissions than before, which is not fully understood. We will either resolve the permission issue or leave it for later investigation. The upgrade to 1.10 should be merged in a couple of days, if not today.
Yes, it is relevant. It does not fix the issue by itself though.
As for #69239, it should be ready to be merged.
/reopen
I would like some manual confirmation this has been addressed
Reopened #63622.
@spiffxp: Reopening this issue.
In response to this:
/reopen
I would like some manual confirmation this has been addressed
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
after some in-depth investigations, it turns out the 1.10 sample-apiserver would need the following rbac permissions (in addition to the system:auth-delegator role). I'm updating the PR to add these.
=====================================================
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sample-apiserver
rules:
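The rules block above is cut off in the quoted comment. Purely for illustration (these are assumptions about what an aggregated API server typically needs to read, not necessarily the exact rules that PR added), here is the same idea expressed the way the e2e test builds RBAC objects in Go:

```go
package apimachinery

import rbacv1 "k8s.io/api/rbac/v1"

// sampleAPIServerRules is illustrative only: typical read access an aggregated
// API server needs (namespaces for the namespace-lifecycle admission plugin,
// plus admission webhook configurations). The exact rules for the 1.10
// sample-apiserver may differ.
var sampleAPIServerRules = []rbacv1.PolicyRule{
	{
		APIGroups: []string{""},
		Resources: []string{"namespaces"},
		Verbs:     []string{"get", "list", "watch"},
	},
	{
		APIGroups: []string{"admissionregistration.k8s.io"},
		Resources: []string{"mutatingwebhookconfigurations", "validatingwebhookconfigurations"},
		Verbs:     []string{"get", "list", "watch"},
	},
}
```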
This test is failing again.
Can someone explain why we're testing against a 1.7 API server? 1.7 has been unsupported for a year, and we don't promise backwards compatibility back that far AFAIK. Shouldn't we be testing against a 1.9 or 1.10 API server?
Hi Josh, TL;DR: we are upgrading to 1.10, and that's what people are talking about in the previous comments.
@AishSundar: Reopening this issue.
In response to this:
/reopen
I would like to see the test pass with the #68300 fix before closing this issue out. thanks
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Reopened #63622.
Hi @AishSundar, please notice the test was renamed to: [sig-api-machinery] Aggregator Should be able to support the 1.10 Sample API Server using the current Aggregator, and it has been passing since it merged:
https://k8s-testgrid.appspot.com/sig-api-machinery-gce-gke#gce
Ah cool, thanks @fedebongio, that explains it :) I was looking at this dashboard in release-master-blocking, and it looks like it hasn't picked up the renamed test since its 10/13 run.
Closing this issue now based on the api-machinery dashboard. Will reopen if the release job doesn't turn green on the next run. Thanks much.
Closed #63622.
@AishSundar @fedebongio Is it worth removing the copy of this test that lives in sig-release-master-blocking#gce-master-scale-correctness? Or is it still a release-blocking test?
@mariantalla I think the job will automatically pick up the new version of the test in its next run (which should be sometime today, 10/20?). As to whether the test should be in the scale-correctness job, that is a question for @fedebongio and @wojtek-t.