RevocableInfo.Type

Benjamin Mahler

Feb 3, 2016, 6:29:07 PM
to mesos-al...@googlegroups.com
One of the topics that we need to address is the addition of RevocableInfo.Type. Joris, Jie and I chatted recently and realized that we may need to expose this to frameworks as a representation of the SLA for the resource. For example:

    Non-Revocable: cannot be revoked, inverse offers required ($$$)
    Revocable: can be revoked ($$)
    Best-Effort: can be revoked, can be throttled ($)

In a revocable-by-default world, the allocator is free to offer Revocable resources for various purposes (e.g. by default, un-allocated reservations, un-allocated quota) and the agent will offer un-utilized resources as Best-Effort.
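
As a rough illustration of the three tiers and of who would hand out which one, here is a minimal sketch. The enum, the `GUARANTEES` table, and the purpose labels passed to `offered_type` are hypothetical names chosen for this example; they are not the Mesos protobuf definition.

```python
from enum import Enum


class RevocableType(Enum):
    """Hypothetical SLA tiers taken from the list above."""
    NON_REVOCABLE = "non-revocable"
    REVOCABLE = "revocable"
    BEST_EFFORT = "best-effort"


# (can_be_revoked, can_be_throttled) for each tier, as listed in the thread.
GUARANTEES = {
    RevocableType.NON_REVOCABLE: (False, False),
    RevocableType.REVOCABLE: (True, False),
    RevocableType.BEST_EFFORT: (True, True),
}

# Purposes for which the allocator may offer Revocable resources
# (illustrative labels, not Mesos identifiers).
ALLOCATOR_PURPOSES = {"default", "unallocated_reservation", "unallocated_quota"}


def offered_type(purpose: str) -> RevocableType:
    """Return the tier a resource would be offered as, given why it is free."""
    if purpose in ALLOCATOR_PURPOSES:
        return RevocableType.REVOCABLE
    if purpose == "unutilized":
        # Un-utilized resources are offered by the agent as Best-Effort.
        return RevocableType.BEST_EFFORT
    raise ValueError(f"unknown purpose: {purpose}")
```

A framework that wants "all or nothing" would accept only tiers whose second guarantee (throttling) is `False`.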

We thought about whether it makes sense to eliminate the distinction between Revocable and Best-Effort by allowing all Revocable resources to be throttled, but it seems that frameworks may want to opt out of throttling for performance / latency reasons (i.e. give me all or nothing, but don't throttle me), whereas some workloads are explicitly ok with taking the Best-Effort resources because they're cheaper.

The assumption here is that frameworks must not rely on the presence of Best-Effort resources, because they are only generated when a resource estimator is active on the agent.

Any thoughts on this?

Guangya Liu

Feb 4, 2016, 3:34:50 AM
to Mesos Resource Allocation Working Group, bma...@apache.org
Hi Ben,

So you mean that, basically, the three kinds of resources will be:


    Non-Revocable: cannot be revoked, inverse offers required ($$$)
    Revocable: can be revoked ($$) <<<<< Includes both allocation slack and quota slack for now, with no need to distinguish the two.
    Best-Effort: can be revoked, can be throttled ($) <<<<< Only usage slack, right?

If my understanding is right, then I think we still need two types of revocable resources: one is usage slack, which is the best-effort resources, and the other is allocation slack plus quota slack for now, right?

Does the framework need to state, via some flags, whether it wants to use revocable resources, best-effort resources, or both? Frameworks currently opt in to usage slack (best-effort) resources through REVOCABLE_RESOURCES; do we need to introduce another flag to indicate that a framework wants to use revocable resources?

Thanks,

Guangya


Klaus Ma

Feb 4, 2016, 7:46:53 AM
to Mesos Resource Allocation Working Group, bma...@apache.org
Regarding "can be throttled", my understanding is that "Best-Effort" resources may be reduced by the Estimator, e.g. from 2 CPUs to 1 CPU, right? ALLOCATION_SLACK has a similar case: if a framework unreserves resources (`/unreserve`), the revocable resources (ALLOCATION_SLACK) need to be evicted to release them. So I think we can eliminate the distinction between Revocable and Best-Effort, and update oversubscription as follows: the Estimator reports revocable resources (Best-Effort) to the allocator, the allocator decides how many revocable resources should be evicted on the agent, and the QoSController is merged with the agent modules that handle resource eviction.

Thanks
Klaus

Alex Rukletsov

Feb 4, 2016, 8:53:16 AM
to Klaus Ma, Mesos Resource Allocation Working Group, bma...@apache.org
Klaus, 

why do you think there is no point in throttling tasks? I can imagine that for compressible resources, a task requiring e.g. 2 cpus may prefer running on 1 cpu temporarily rather than being killed.

Here is another reason why we may want to preserve the distinction. I believe that in the allocator we have to distinguish offered resources by source (usage slack oversubscription, unused quota oversubscription, revocable and so on), not only by quality (non-revocable, revocable, best-effort). Maintaining a 1-to-1 correspondence between source and quality may simplify bookkeeping and resource math (for example, we know that best-effort resources come from usage slack oversubscription only and are controlled by an external module).

--
You received this message because you are subscribed to the Google Groups "Mesos Resource Allocation Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mesos-allocati...@googlegroups.com.
To post to this group, send email to mesos-al...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mesos-allocation/f7b52023-d856-41b0-a060-b815fa139528%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Benjamin Mahler

Feb 4, 2016, 3:21:37 PM
to Alex Rukletsov, Klaus Ma, Mesos Resource Allocation Working Group
The way I'm thinking about "Best-Effort" containers is that they are "scavengers" of unused resources: they are running within the wasted resources of containers with higher QoS (revocable, non-revocable). Compressible resources (e.g. cpu, disk bandwidth, network bandwidth) can be throttled when no longer available to the "Best-Effort" container. Incompressible resources (e.g. memory, disk) that are no longer available for the "Best-Effort" container will lead to revocation, since we cannot throttle.
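
The compressible / incompressible rule described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the resource names and the `reclaim_action` function are hypothetical and do not correspond to any Mesos API.

```python
# Resource classes named in the message above; the string labels are
# illustrative choices for this sketch.
COMPRESSIBLE = {"cpus", "disk_bandwidth", "network_bandwidth"}
INCOMPRESSIBLE = {"mem", "disk"}


def reclaim_action(resource_name: str) -> str:
    """What happens to a Best-Effort container when a resource it was
    scavenging is no longer available."""
    if resource_name in COMPRESSIBLE:
        # The container keeps running with a smaller share.
        return "throttle"
    if resource_name in INCOMPRESSIBLE:
        # The share cannot be shrunk in place, so it must be revoked.
        return "revoke"
    raise ValueError(f"unclassified resource: {resource_name}")
```

For example, `reclaim_action("cpus")` yields `"throttle"`, while `reclaim_action("mem")` yields `"revoke"`.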

Guangya Liu

Feb 4, 2016, 9:09:44 PM
to Mesos Resource Allocation Working Group, al...@mesosphere.com, klaus1...@gmail.com, bma...@apache.org
The current "Best-Effort" resources are managed by the resourceEstimator and qosController, so I think it is the end user's choice how to handle "compressible" and "incompressible" resources?

Klaus Ma

Feb 8, 2016, 10:40:27 AM
to Mesos Resource Allocation Working Group, klaus1...@gmail.com, bma...@apache.org
Regarding "distinguish offered resources by source", we may track used resources (vs. allocated resources) after "Optimistic Offer Phase 2" & "revocable by default"; if so, the framework also needs to know which kind (source) of revocable resources it is using.

Klaus Ma

Feb 8, 2016, 10:40:46 AM
to Mesos Resource Allocation Working Group, al...@mesosphere.com, klaus1...@gmail.com, bma...@apache.org
Got your points about "Best-Effort" and "throttled", thanks very much :).

Regarding "Best-Effort" and "throttled", there are two dimensions to revocable resources: "Best-Effort" (oversubscription) and "throttled" (Compressible/Incompressible); that gives four types of revocable resources:
  1. "Best-Effort" (oversubscription), Compressible, e.g. CPU by Estimator
  2. "Best-Effort" (oversubscription), Incompressible, e.g. mem by Estimator
  3. "Revocable" (allocator), Compressible, e.g. CPU by allocator
  4. "Revocable" (allocator), Incompressible, e.g. mem by allocator
I think this is complex for framework developers, which is why I propose not distinguishing between oversubscription and the allocator. As a framework developer, I don't care where revocable resources come from (unused reservations, wasted resources of containers, or unused quota) as long as the framework can use the revocable resources interchangeably; but "throttled" may affect the framework's decisions.
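
The four combinations listed above, and the effect of dropping the source dimension, can be sketched as follows. The tuple labels and both function names are assumptions made for this example only.

```python
from itertools import product

# The two dimensions from the list above (illustrative labels).
SOURCES = ("oversubscription", "allocator")
COMPRESSIBILITY = ("compressible", "incompressible")


def revocable_kinds():
    """Enumerate every (source, compressibility) pair: four kinds in total."""
    return list(product(SOURCES, COMPRESSIBILITY))


def framework_view(kinds):
    """Collapse the source dimension, keeping only what would affect a
    framework's decision under this proposal: whether it can be throttled."""
    return sorted({compressibility for _source, compressibility in kinds})
```

With the source hidden, a framework would reason about two kinds of revocable resources instead of four.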

We distinguish revocable resources in "Optimistic Offer Phase 1 (OO1)" because the evictor is different: the agent evicts executors in OO1, while the QoSController evicts executors under oversubscription. Things get complex if both evictors take action.

Guangya Liu

Feb 21, 2016, 10:06:28 PM
to Mesos Resource Allocation Working Group, al...@mesosphere.com, klaus1...@gmail.com, bma...@apache.org
I want to continue this thread to prompt more discussion. I agree with Alex that we should keep track of the source of the offered resources for better bookkeeping.

So my thinking is that we still keep usage slack, allocation slack, and quota slack. Each slack supports two kinds of resources, Compressible and Incompressible, where reclaiming Incompressible resources leads to eviction on the agent.

For frameworks, we can define a separate framework capability for each kind of slack resource, and the end user can compose capabilities to specify which kinds of resources the framework wants to use.
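
One way to picture composable per-slack capabilities is a set of bit flags. This is a sketch of the idea only: `SlackCapability` and its members are hypothetical names for illustration and are not capabilities defined by Mesos.

```python
from enum import Flag


class SlackCapability(Flag):
    """Hypothetical per-slack opt-in flags, one per kind of slack."""
    USAGE_SLACK = 1
    ALLOCATION_SLACK = 2
    QUOTA_SLACK = 4


def accepts(capabilities: SlackCapability, slack: SlackCapability) -> bool:
    """True if the framework opted in to resources from this kind of slack."""
    return bool(capabilities & slack)


# A framework that opts in to usage slack and quota slack only.
example_capabilities = SlackCapability.USAGE_SLACK | SlackCapability.QUOTA_SLACK
```

Here `accepts(example_capabilities, SlackCapability.ALLOCATION_SLACK)` is false, so such a framework would not be offered allocation slack.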

Thanks,

Guangya

Benjamin Mahler

Mar 7, 2016, 8:43:18 PM
to Guangya Liu, Mesos Resource Allocation Working Group, Alex Rukletsov, Klaus Ma
Why do we need to differentiate between allocation slack and quota slack? Does the framework care about this?

The issue with tracking this is that it makes the allocator less flexible. For example, if the quota is removed do we have to revoke the container because we told the framework "quota slack" but now it has become "revocable by default"? It seems to me the only distinction that needs to be made is between the allocator-managed revocable resources and the agent-managed revocable resources. Fortunately, these also are different "flavors" of resources, the latter being revocable and throttle-able resources, the former being just revocable.
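
To make the flexibility argument concrete, here is a minimal sketch in which the framework is told only the flavor, never the allocator's internal reason. The `Flavor` class, the two constants, and the reason labels are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """What the framework is told about a revocable resource."""
    manager: str
    revocable: bool
    throttleable: bool


ALLOCATOR_MANAGED = Flavor(manager="allocator", revocable=True, throttleable=False)
AGENT_MANAGED = Flavor(manager="agent", revocable=True, throttleable=True)

# Internal allocator reasons (illustrative labels) that all map to one flavor.
ALLOCATOR_REASONS = {"quota_slack", "allocation_slack", "revocable_by_default"}


def flavor_for(reason: str) -> Flavor:
    """Map an internal reason to the flavor exposed to the framework."""
    if reason in ALLOCATOR_REASONS:
        return ALLOCATOR_MANAGED
    if reason == "usage_slack":
        return AGENT_MANAGED
    raise ValueError(f"unknown reason: {reason}")
```

Because `flavor_for("quota_slack")` and `flavor_for("revocable_by_default")` return the same value, removing a quota changes nothing the framework was promised, so no revocation is forced by the relabeling.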

Guangya Liu

Mar 7, 2016, 10:17:29 PM
to Mesos Resource Allocation Working Group, gyli...@gmail.com, al...@mesosphere.com, klaus1...@gmail.com, bma...@apache.org
Hi Ben,

Just had a discussion with Klaus regarding your question. Yes; does it make sense to keep only two kinds of revocable resources: usage slack and allocator slack? Allocator slack is the revocable resources calculated by the allocator for quota and reservations.

Regarding framework capabilities, we had some discussion here https://groups.google.com/forum/#!topic/mesos-of-dev-wg/VO93dYY8S60 , and the conclusion was that we do not need to add new capabilities and can keep the current capability. The master already performs some checks to see whether the task's resource request is valid https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L463-L472 ; we can enhance that code to handle more cases for allocator slack.

Any comments?

Thanks,

Guangya

