I'm going to add an e2e test. So far manual testing is not working so need to debug further
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.![]()
@bsalamat commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
Shouldn't you invalidate predicates before moving pods to active queue?
@msau42 commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
The scenario I am trying to fix here would cause predicates to return an error. IIUC, predicate results are not cached when they return error, so we shouldn't need to invalidate them here.
@bsalamat commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
Predicate results are cached in both failure and success cases.
@msau42 commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
I think this is slightly different. IIUC, predicates can return 3 results: success, fail, error. If it returns error, then the result is not cached.
So in my scenario, the predicate would be returning an error.
@bsalamat commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
I see. Thanks. It looks fine, but could add some integration tests to ensure things work as expected?
@msau42 commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
Yes, i'm planning to write an e2e test for this, but maybe I can get it to work in an integration test instead.
Testing the predicate invalidation harder to do in integration. Do you guys already have any integration tests for predicate invalidation that I can follow? I was unable to find any.
@bsalamat commented on this pull request.
In pkg/scheduler/factory/factory.go:
> @@ -544,6 +552,54 @@ func (c *configFactory) invalidatePredicatesForPvcUpdate(old, new *v1.Persistent
c.equivalencePodCache.InvalidateCachedPredicateItemOfAllNodes(invalidPredicates)
}
+func (c *configFactory) onStorageClassAdd(obj interface{}) {
+ sc, ok := obj.(*storagev1.StorageClass)
+ if !ok {
+ glog.Errorf("cannot convert to *storagev1.StorageClass: %v", obj)
+ return
+ }
+
+ // Creating a StorageClass with late binding can cause previously errored volume predicates to pass
@msau42 equivalence cache is enabled by default in our integration tests if you use initTest to initialize the test environment. This is how most of scheduler's integration tests are initialized. In order to test invalidation, you should:
By doing this, you ensure that the eCache is invalidated properly.
@msau42: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks @msau42, I am also tracking this PR.
Let me know if there's any common config once I add some tests. I probably wont have time to look at this for the next 2 weeks or so though.
Yes I will. And please feel free to go ahead as is if my refactoring is not in (I have some tests timeout issue there), it should not be a blocker :)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
This has been addressed as part of #65616
Closed #60006.