We run thousands of ProwJobs per day in both OpenShift and Kubernetes, and it would be great to be able to filter ProwJobs in the CLI based on arbitrary ProwJob spec fields: namely pj.spec.type, to filter by job type (presubmit, postsubmit, periodic), or pj.spec.job, to filter by job name. Filtering is already supported in the dashboard, but there is no way to dump all the results locally. Adding support in the CLI would improve debugging a lot.
@kubernetes/sig-api-machinery-feature-requests @kubernetes/sig-cli-feature-requests @enisoc @ncdc
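For illustration, here is a minimal sketch of the kind of list call being asked for, written against the Go dynamic client. The prow.k8s.io/v1 prowjobs GVR and the namespace/field values are illustrative; at the time of this request, anything other than metadata.name or metadata.namespace in FieldSelector is rejected for custom resources.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ProwJobs live in the prow.k8s.io/v1 API group; namespace and field
	// values below are illustrative.
	gvr := schema.GroupVersionResource{Group: "prow.k8s.io", Version: "v1", Resource: "prowjobs"}

	// This is the call that currently fails for custom resources: only
	// metadata.name and metadata.namespace are accepted as field selectors.
	list, err := client.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.type=presubmit",
	})
	if err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Println(item.GetName())
	}
}
```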
Depends on #50140
cc @jpbetz
This would also be useful for our controllers. Today we use one controller per supported pj.spec.agent field (one for Kubernetes pods and one for Jenkins jobs), and we handle the filtering inside the controllers, which is not ideal.
See also #1362
This needs a proposal.
/area custom-resources
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Do we support arbitrary field selectors for CRDs now?
No, still just namespace and name
I'm trying to fire several similar jobs simultaneously and watch their status. After all jobs are completed, I can follow up with the next step. It would be useful if field selectors supported arbitrary fields.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Any plans to make this possible? I have an infra inventory controller and CRDs, and I have to duplicate some data into labels to be able to find resources based on specific criteria.
Same as above: we have thousands of CR instances and would love to be able to select them using field selectors. For now we are also planning to duplicate all that data in labels, which is a real pain.
Please let me know if I can help in any way to make progress here.
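For reference, a minimal sketch of the label-duplication workaround described above, using controller-runtime's client with unstructured objects; the spec.zone path and the example.com/zone label key are hypothetical.

```go
package workaround

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// mirrorZoneLabel copies .spec.zone into the example.com/zone label so the
// value becomes selectable with a label selector. Field path and label key
// are hypothetical placeholders.
func mirrorZoneLabel(ctx context.Context, c client.Client, obj *unstructured.Unstructured) error {
	zone, found, err := unstructured.NestedString(obj.Object, "spec", "zone")
	if err != nil || !found {
		return err
	}
	labels := obj.GetLabels()
	if labels == nil {
		labels = map[string]string{}
	}
	if labels["example.com/zone"] == zone {
		return nil // already in sync, avoid an extra write
	}
	labels["example.com/zone"] = zone
	obj.SetLabels(labels)
	// One extra Update call per object just to make the field queryable.
	return c.Update(ctx, obj)
}
```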
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
/remove-lifecycle stale
/lifecycle frozen
We don't currently have plans, and if we do this it should work for all resources, not just CRDs. But this shouldn't be auto-closed.
We also have a need for this feature and are going to use an inefficient workaround for now. Is there any movement on this issue?
I too ran into this problem when trying to use field selectors on CRDs. CRDs are everywhere now, and this would be really nice to have.
This would be a major undertaking to add so we're going to need a really long list of people wanting it to prioritize this :)
This would be a major undertaking to add so we're going to need a really long list of people wanting it to prioritize this :)
+ 1 person in the list.
+1 for this proposal, it would be extremely helpful
+1 for this
+1 Istio would also like to make use of CRD field selectors.
+1 for this
+1
+1
+1, so we can avoid filtering by Labels
That would be a great and welcome addition. Currently our controller updates the CR status; it takes an extra Update call to also update the labels map.
Field selectors are a critical part of the pattern for efficiently keeping an object around while it is in use (https://www.ibm.com/cloud/blog/preserve-kubernetes-api-objects-while-they-are-in-use?mhsrc=ibmsearch_a&mhq=kubernetes%20api%20object), and so should be available in CRs defined by CRDs.
+1
I want my CRD to support custom filters.
+1
+1
+1, is there a proposal?
+1 for this feature
+1
+1
+1
+1
I'm currently considering duplicating status fields into labels 🤢
How much more priority/awaiting-more-evidence is needed?! 😆
This is not about arbitrary field selectors directly, but about label selectors in general: filtering CRD objects at scale performs poorly at 1000+ CR instances and stops working when you have to select out of 3K CR instances (with etcd deployed on an SSD). I had to implement my own filtering solution based on custom caching and pagination.
This is on our radar but I don't think we have a plan yet. We might use CEL as an expression language. But that doesn't change the fact that queries like this are extremely expensive for apiserver (which has to scan every object) and we're reluctant to encourage more of them, at least not until APF is a little further along.
Caching & indexing locally (like all the built-in controllers) is the way to go at the moment.
@jpbetz fyi
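A short sketch of that local caching-and-indexing pattern with a client-go shared informer. Pods and the byNode index name are just for illustration; the same AddIndexers call works on informers for custom resources.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(clientset, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	// Register a local index keyed by node name before starting the informer.
	if err := podInformer.AddIndexers(cache.Indexers{
		"byNode": func(obj interface{}) ([]string, error) {
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				return nil, nil
			}
			return []string{pod.Spec.NodeName}, nil
		},
	}); err != nil {
		panic(err)
	}

	ctx := context.Background()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())

	// Arbitrarily many lookups against the local index, with no extra
	// apiserver load.
	objs, err := podInformer.GetIndexer().ByIndex("byNode", "some-node")
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d pods\n", len(objs))
}
```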
The message bus people worked out decades ago how to efficiently handle content-based subscriptions.
http://shenh.people.clemson.edu/publishedPaper/bookChapter/2009/sub-pub-Shen.pdf should have some good pointers. I know that the Gryphon system had a good solution.
I drafted a potential solution to this using CEL (which we are already using for CRD validation): Draft KEP for field selection using CEL.
Is there community interest in this solution? Feedback on the draft KEP, use cases that it would solve, and interest in contributing would all be valuable here.
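To make the idea concrete, a minimal sketch of what a CEL predicate over an object could look like, evaluated with cel-go. The variable name self, the expression, and the object shape are assumptions for illustration, not the API proposed in the draft KEP.

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
)

func main() {
	// Hypothetical selection predicate: the "self" variable and the object
	// shape are assumptions, not the KEP's actual surface.
	env, err := cel.NewEnv(cel.Variable("self", cel.DynType))
	if err != nil {
		panic(err)
	}
	ast, iss := env.Compile(`self.spec.type == "presubmit" && self.status.state == "pending"`)
	if iss != nil && iss.Err() != nil {
		panic(iss.Err())
	}
	prg, err := env.Program(ast)
	if err != nil {
		panic(err)
	}

	obj := map[string]interface{}{
		"spec":   map[string]interface{}{"type": "presubmit"},
		"status": map[string]interface{}{"state": "pending"},
	}
	out, _, err := prg.Eval(map[string]interface{}{"self": obj})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Value()) // true
}
```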
I think we need to include @Abhishek-Srivastava's concern. Adding use of CEL will not address that.
CEL addresses the expression language blocker. Addressing scale is a different concern.
True. But both are important. I am concerned about addressing one in a way that is particularly inimical to the other.
For efficiently handling a lot of subscriptions, I learned 20 years ago about https://dl.acm.org/doi/pdf/10.1145/301308.301326 . Here is the abstract:
Content-based subscription systems are an emerging alternative to traditional publish-subscribe systems, because they permit more flexible subscriptions along multiple dimensions. In these systems, each subscription is a predicate which may test arbitrary attributes within an event. However, the matching problem for content-based systems - determining for each event the subset of all subscriptions whose predicates match the event - is still an open problem. We present an efficient, scalable solution to the matching problem. Our solution has an expected time complexity that is sub-linear in the number of subscriptions, and it has a space complexity that is linear. Specifically, we prove that for predicates reducible to conjunctions of elementary tests, the expected time to match a random event is no greater than O(N^(1-lambda)), where N is the number of subscriptions and lambda is a closed-form expression that depends on the number and type of attributes (in some cases, lambda is about 1/2). We present some optimizations to our algorithms that improve the search time. We also present the results of simulations that validate the theoretical bounds and that show acceptable performance levels for tens of thousands of subscriptions.
CRDs mostly fail to scale as well as built-ins because JSON is not very efficient to decode. IMO we aren't advanced enough to have that problem yet.
We can prototype this outside of apiserver. I'm much more worried about getting the API right than I am about implementing it efficiently.
(And I don't see why CEL would be harder to scale than any other predicate system?)
Our solution has an expected time complexity that is sub-linear in the number of subscriptions, and it has a space complexity that is linear. Specifically, we prove that for predicates reducible to conjunctions of elementary tests, the expected time to match a random event is no greater than O(N^(1-lambda)), where N is the number of subscriptions and lambda is a closed-form expression that depends on the number and type of attributes (in some cases, lambda is about 1/2).
This is where CEL can really shine. The ability to compute the cost of an expression both statically and at runtime means that we can bound the costs in ways very similar to what is described above. E.g. we could disallow any CEL expression with a worst-case runtime cost of O(n) or worse, which would let us bound matching of a random event to O(N), where N is the number of subscriptions. All the "cost" work needed to do this was completed in 1.24 for CRD validation, so it would be trivial for us to leverage. We would just need to pick the cost limits for field selection, which would likely be stricter than for validation.
If the cost of CEL evaluation for one (object, subscription) pair is O(1), then evaluating all N subscriptions for an event costs O(N), which is already worse than the cited result of O(N^(1-lambda)) for N subscriptions.
This is probably the wrong place to go so far into the weeds on this, but I'd expect we'd use the CEL cost to inform APF cost. I actually don't care if it is slow and takes a long time, as long as apiserver doesn't fall over and other users aren't impacted.
If the CEL-based solution evaluates each (object, subscription) pair independently and the cost of CEL evaluation for one (object, subscription) pair is O(1), then that is already worse than the cited result of O(N^(1-lambda)) for N subscriptions.
Yes, I realized after posting that this isn't quite the sub-linear bound that the above quote mentions, but we might be able to get that optimization as well long term, if we wanted to invest in it, since we could inspect CEL expressions and find common sub-expressions.
The main point is that we have a lot of control. I don't think we'd paint ourselves into a corner.
I think that the question of whether we have something that works well for large numbers of objects and subscriptions is not a matter of weeds, it is a high-level objective that I care about.
Today we solve this by implementing local caches in controllers. I don't see apiserver needing to compete with that. I don't particularly want apiserver to become more of a database engine than it already is. The value-add apiserver has is the api mechanisms and models. Rudimentary search is useful for one-offs and prototyping, or I'd be completely against adding this to apiserver.
I am not sure I understand the objection. Today we already have the apiserver doing label selection, as well as field selection for types defined in the server, and, as is well testified in this issue, people using CRDs are abusing labels to hold spec and/or status and doing label selection at the server.
and, as is well testified in this issue, people using CRDs are abusing labels to hold spec and/or status and doing label selection at the server.
That is a correct statement but biased, because the people that already use client-side indexing on the cache are not going to comment in this issue.
Not directly related to this topic, but the Helm documentation on supporting CRDs says this:
"Originally, CRDs were designed for a very specific purpose: To make it possible to add new controllers to Kubernetes. For example, if Deployment does not do what you want, you can write an alternative resource definition backed by a custom controller. While this pattern has definitely been useful, some see CRDs as a generic data modeling tool. We have seen many cases where CRDs were used to store configuration, model non-Kubernetes objects, or even serve as a data storage layer consumed by applications. While we do not think this was the spirit in which CRDs were intended to be used, we acknowledge that they are used in this way, and therefore it is Helm's responsibility to not foreclose such usage patterns."
We are doing exactly the above: modeling non-Kubernetes resources as CRDs, and so far it has been doing wonders for us (free APIs, RBAC, periodic reconciling, a CLI, and much more). It is primarily the scale, the size of the CRs, and searching/indexing that have started creating issues now.
Helm docs are wrong. CRDs were not specifically about enabling controllers (put another way, everything is a controller if you squint).
and, as is well testified in this issue, people using CRDs are abusing labels to hold spec and/or status and doing label selection at the server.
That is a correct statement but biased, because the people that already use client-side indexing on the cache are not going to comment in this issue.
@alvaroaleman could you elaborate on how client-side indexing helps? We use client-side indexing in KubeVirt, and in addition we abuse labels to implement something like a node field which we can index on, so that our node daemon only sees the objects it needs, for scale reasons. Maybe I'm missing how client-side indexing can help with daemonset-scale demands?
Client side indexing means you can run as many queries as you want without increasing load on the k8s control plane. Listing all objects with some filter is nearly as expensive as listing all objects without the filter from apiserver's & etcd's perspective, saving only network bandwidth. This is especially good for the controller case, where typically it needs every object eventually anyway.
Pods assigned to a given node is an exception which apiserver optimizes specifically for the daemonset case, you can do an efficient query for that.
I agree that you should not have a DS on every node list/watch entire collections; that makes your cluster do N^2 work. And one of the reasons we don't provide arbitrary filtering today is that it would be less obvious that doing this will blow up your control plane once clusters start getting bigger.
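For comparison, this is what that optimized per-node query looks like from client-go; the function name and the all-namespaces choice are illustrative.

```go
package nodefilter

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podsOnNode lists only the pods scheduled to a given node via the
// spec.nodeName field selector, the case the apiserver optimizes for the
// kubelet/daemonset pattern. The empty namespace means all namespaces.
func podsOnNode(ctx context.Context, clientset kubernetes.Interface, node string) (*corev1.PodList, error) {
	return clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + node,
	})
}
```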
Client side indexing means you can run as many queries as you want without increasing load on the k8s control plane. Listing all objects with some filter is nearly as expensive as listing all objects without the filter from apiserver's & etcd's perspective, saving only network bandwidth. This is especially good for the controller case, where typically it needs every object eventually anyway.
Thanks, makes sense. I had my head too deep in the daemonset case and did not realize that this issue is mostly concerned with the normal controller pattern.
Pods assigned to a given node is an exception which apiserver optimizes specifically for the daemonset case, you can do an efficient query for that.
I agree that you should not have a DS on every node list/watch entire collections; that makes your cluster do N^2 work. And one of the reasons we don't provide arbitrary filtering today is that it would be less obvious that doing this will blow up your control plane once clusters start getting bigger.
Yes, I am not sure how clear that is to people, though. I have encountered a lot of cases (different projects, different companies, early design, production) where people were creating daemonsets, using controller-gen, and just assuming that the filters they set there are applied server-side, creating quite some API pressure. Would it make sense to break out an issue just for a node-field-selector for CRs?
break out an issue just for a node-field-selector for CRs?
No, I think this is the designated place for everyone to feel bad about the current state :)
Client side indexing means you can run as many queries as you want without increasing load on the k8s control plane. Listing all objects with some filter is nearly as expensive as listing all objects without the filter from apiserver's & etcd's perspective, saving only network bandwidth. This is especially good for the controller case, where typically it needs every object eventually anyway.
What is the recommended way for a DS to watch the CRs belonging to its node? I'm using "k8s.io/client-go/tools/cache".NewFilteredListWatchFromClient with a LabelSelector. However, according to the explanation above, specifying LabelSelector in ListOptions would not save much work for the apiserver.
[DaemonSet] to watch the CRs belonging to the node?
That sounds like a valid use case and I don't have a great answer. If you have a watch for each of N nodes, apiserver will do N checks for each change to a CR (fortunately apiserver won't have N backing watches, I believe). It should be ok unless you have both a lot of nodes and a lot of CRs. The list is probably more problematic.
...if you happen to have named the CRs to begin with the name of the associated node, and they are cluster-scoped or all in one namespace, then you can perhaps do some things with continue tokens which let you list beginning at a lexical place. Look up the pagination feature, look up the continue token format (you'll have to read the code). This is very much not recommended but if you have an emergency...
...since we do in fact maintain that exact index for pods, I would consider a targeted proposal to add that exact index for CRs for which it is relevant, if it can be done with non-horrible CRD API change and without too much violence to the codebase.
With an admission webhook, you can force a label to hold the value of the node, and you can then select on that label. This is similar to the way that namespaces have enforced labels. This is one of the reasons why I thought CEL should be allowed to enforce values on labels.
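A rough sketch of what such a mutating webhook could return, assuming a hypothetical example.com/node label key; this illustrates only the JSON-patch shape, not a complete webhook.

```go
package webhook

import (
	"encoding/json"

	admissionv1 "k8s.io/api/admission/v1"
)

// mirrorNodeLabelResponse builds an AdmissionResponse that copies a node name
// into a label via a JSON patch so the object can later be label-selected.
// The label key example.com/node is hypothetical; "/" inside the key is
// escaped as "~1" per JSON Pointer rules. A real webhook must also create the
// labels map when it is absent ("add" to a missing parent fails).
func mirrorNodeLabelResponse(req *admissionv1.AdmissionRequest, nodeName string) (*admissionv1.AdmissionResponse, error) {
	patch := []map[string]interface{}{
		{"op": "add", "path": "/metadata/labels/example.com~1node", "value": nodeName},
	}
	raw, err := json.Marshal(patch)
	if err != nil {
		return nil, err
	}
	pt := admissionv1.PatchTypeJSONPatch
	return &admissionv1.AdmissionResponse{
		UID:       req.UID,
		Allowed:   true,
		Patch:     raw,
		PatchType: &pt,
	}, nil
}
```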
...that gets the job done, but means the cluster does N^2 work, since there's no index, which is the complaint.
I don't particularly want apiserver to become more of a database engine than it already is.
What about an extension mechanism to outsource the indexing? A CRDIndexWebhook, similar to the existing webhook mechanism, i.e. including registration.
I don't particularly want apiserver to become more of a database engine than it already is.
What about an extension mechanism to outsource the indexing? A CRDIndexWebhook, similar to the existing webhook mechanism, i.e. including registration.
Our background is that we are looking at having 10k+ VMs in a cluster and would like to provide a fast way of filtering.
An alternative to this that has been discussed is an extension where a CEL expression is used, allowing the filtering to occur in-process in the apiserver and avoiding requests to a webhook. Some properties are the same: if you register ahead of time, the index can be optimized; if you apply the filter ad hoc, the apiserver needs to do a full scan.
Right. To me the key difference would be to offload this burden (computationally) to a non-apiserver component. An additional benefit of having this functionality externally would be that the CRD owner could use domain knowledge to possibly do the indexing more efficiently.
There is an open KEP for CRD field selectors: kubernetes/enhancements#4359
It appears this feature has been implemented in #122717 and will be released in K8s 1.30
It appears this feature has been implemented in #122717 and will be released in K8s 1.30
Correct. I'd encourage everyone on this thread to try out the alpha feature in 1.30 and provide feedback.
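For anyone who wants to try it, a hedged sketch of opting a CRD version into field selectors, written against the apiextensions v1 Go types as I understand the 1.30 alpha (CustomResourceFieldSelectors feature gate); the selectable paths shown are illustrative, and the exact type names should be verified against your client libraries.

```go
package main

import (
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

func main() {
	// Declare which JSON paths may be used as field selectors for this CRD
	// version. The .spec.type and .spec.job paths are illustrative.
	version := apiextensionsv1.CustomResourceDefinitionVersion{
		Name:    "v1",
		Served:  true,
		Storage: true,
		SelectableFields: []apiextensionsv1.SelectableField{
			{JSONPath: ".spec.type"},
			{JSONPath: ".spec.job"},
		},
	}
	fmt.Println(version.SelectableFields)
	// Once the CRD is applied, clients can pass e.g.
	//   FieldSelector: "spec.type=presubmit"
	// in ListOptions (or --field-selector on the command line).
}
```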
CRD field selectors are on-by-default in 1.31 and merged for GA in 1.32.
/close
Closed #53459 as completed.