Reviewing AWS Unknown estimator


Matthew Ferry

Feb 3, 2023, 12:50:43 PM
to Cloud Carbon Footprint
Hi CCF community,

I'm concerned about the proportion of estimated carbon coming from "Unknown" estimators. For my data, "Unknown" accounts for up to 47% of the estimate, but I don't think it gets enough attention.

My contention is that Unknown estimates in AWS are:
  1. Unexpectedly influential - they compose large portions of usage for small accounts like mine.
  2. Inconsistent - they are based on an accrual of previously queried data, so a total for January data can change with the introduction of February data.
  3. Missing service distinctions - they are purely unit-based due to a potential bug.
  4. Overly sensitive to unrelated usage - CO2e estimates for an idle elastic IP address will change if I run a t2.micro or m5.xlarge in the same account.
  5. Based on incomplete costs - since cost-based estimates use blended cost and exclude reservation costs, they potentially overstate emissions-per-dollar.
Unexpected influence
One way to test how much of your footprint is "Unknown" is by disabling unknown estimates and re-running your data. Run a local build and comment out these lines. Clear the cache (e.g. by deleting packages/api/estimates.cache.day.json) and open the dashboard to see your estimate without the unknowns.

Taking a sample of my data with a footprint of 0.07 tonnes CO2e, commenting out those lines reduced my footprint to 0.037 tonnes, meaning that Unknowns accounted for 47% of my footprint estimate.

Suggestion: We should review the methodology for unknown estimates to ensure their influence is defensible.

Inconsistency
Unknown estimates are based on KILOWATT_HOURS_BY_SERVICE_AND_USAGE_UNIT from AWS_CLOUD_CONSTANTS, though it is a constant in name only: the exported reference never changes, but the object it points to is mutated as each row of the dataset is processed.

For a sample of my usage, at the end of a run it looks something like this:
{
  "total": {
    ...
    "Hrs": { "cost": 136.98740000000012, "kilowattHours": 132.60490061155122 },
    "hours": { "cost": 3.104505134099999, "kilowattHours": 0.18453706155533817 }
  },
  "AmazonEC2": {
    ...
    "Hrs": { "cost": 90.90839999999997, "kilowattHours": 99.68438196330591 }
  },
  ...
}

It's constructed by various accumulateKilowattHours function calls, then applied to unknown rows at the end. This means all unknown rows are affected by all previously queried rows - seemingly since the API server started?

If I change my usage from month to month, then my kWh-cost ratio will change and all historical estimates for unknown rows will be affected if the cache is cleared.
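To make the drift concrete, here is a minimal TypeScript sketch of the moving-average behaviour as I understand it. The names and shapes (`FactorTable`, `estimateUnknown`, the sample costs) are illustrative, not the actual CCF implementation:

```typescript
// Illustrative sketch: kWh/$ factors accumulate as known rows are
// processed, and unknown rows are estimated from the running ratio.
type Factor = { cost: number; kilowattHours: number };
type FactorTable = Record<string, Record<string, Factor>>;

function accumulateKilowattHours(
  table: FactorTable,
  service: string,
  unit: string,
  cost: number,
  kilowattHours: number,
): void {
  // Update both the grand total and the per-service bucket for this unit.
  for (const key of ['total', service]) {
    const byUnit = (table[key] ??= {});
    const f = (byUnit[unit] ??= { cost: 0, kilowattHours: 0 });
    f.cost += cost;
    f.kilowattHours += kilowattHours;
  }
}

// Unknown rows are estimated from the accumulated kWh/$ ratio, so every
// known row processed earlier shifts every unknown estimate made later.
function estimateUnknown(table: FactorTable, unit: string, cost: number): number {
  const f = table['total']?.[unit];
  return f && f.cost > 0 ? cost * (f.kilowattHours / f.cost) : 0;
}

const table: FactorTable = {};
accumulateKilowattHours(table, 'AmazonEC2', 'Hrs', 100, 96);
const jan = estimateUnknown(table, 'Hrs', 10); // ratio 0.96 kWh/$ -> 9.6 kWh

// New usage in a later month changes the ratio, and with it any
// "January" unknown estimate recomputed after a cache clear.
accumulateKilowattHours(table, 'AmazonRDS', 'Hrs', 100, 24);
const janAfterFeb = estimateUnknown(table, 'Hrs', 10); // ratio 0.6 kWh/$ -> 6 kWh
```

The same $10 of unknown usage is estimated at 9.6 kWh or 6 kWh depending purely on what other data has been processed, which is the inconsistency I'm describing.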

Suggestion: Should we partition kWh-cost factors monthly so they finalise when bills finalise?

Missing service distinctions
In the example above, you can see both a "total" and a service-level estimate. The methodology document says to prefer service-level estimates, then fall back to total estimates. This is implemented with a check against the service field.

In the AWS library on these lines we take a subset of the unknown row's information, not including service, and pass that to the estimate function. Because service isn't supplied on the unknownUsage object, service-level estimates will never be used for AWS.
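To illustrate why that matters, here is a hypothetical sketch of the prefer-service-then-total fallback the methodology document describes (not the actual CCF code). When the unknown row carries no service field, the service-level branch can never match:

```typescript
// Illustrative fallback: prefer a service-level factor, else the total.
type Factor = { cost: number; kilowattHours: number };
type FactorTable = Record<string, Record<string, Factor>>;

function lookupFactor(
  table: FactorTable,
  unit: string,
  service?: string,
): Factor | undefined {
  if (service && table[service]?.[unit]) return table[service][unit];
  return table['total']?.[unit];
}

// Factors shaped like the sample table earlier in this post.
const table: FactorTable = {
  total: { Hrs: { cost: 136.99, kilowattHours: 132.6 } },
  AmazonEC2: { Hrs: { cost: 90.91, kilowattHours: 99.68 } },
};

// With a service supplied, the EC2-specific factor is used.
const withService = lookupFactor(table, 'Hrs', 'AmazonEC2');

// Without one (as for AWS unknown rows), only the total is ever used.
const withoutService = lookupFactor(table, 'Hrs');
```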

This looks like a bug, but it would be good to discuss as part of a larger review.

Suggestion: Can we discuss if it's worth reimplementing service-level distinctions?

Overly sensitive to unrelated usage
Referring back to the example factors above, the unit 'Hrs' has 0.96 kWh/$ while 'hours' has 0.06 kWh/$. Hrs is commonly used for EC2 and RDS instances, so it includes scope 3 estimates, which may explain why it's so much larger. But on the face of it, does it make sense for two different spellings of "hours" to have such different estimates? How meaningful is that distinction?

Two of my major Unknown estimate contributors are NAT Gateways (EU-NatGateway-Hours) and load balancer usage (EU-LoadBalancerUsage), also measured in 'Hrs' and running 24/7/365. My current estimate for these usage types would be 0.96 kWh/$ based on the EC2 instances I'm running. The same applies to idle IP addresses (EU-ElasticIP:IdleAddress) because they're also measured in 'Hrs'.

Do these resources consume the same energy per $ as an EC2 instance? If so, which one? If I change EC2 instances, should my NAT Gateway emissions change?
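For what it's worth, the ratios quoted above fall straight out of the sample factor table (the $5 Elastic IP cost is a made-up illustration):

```typescript
// Factors copied from the sample table earlier in this post.
const hrs = { cost: 136.9874, kilowattHours: 132.6049 };
const hours = { cost: 3.1045, kilowattHours: 0.1845 };

const hrsRatio = hrs.kilowattHours / hrs.cost;       // ~0.96-0.97 kWh/$
const hoursRatio = hours.kilowattHours / hours.cost; // ~0.06 kWh/$

// A hypothetical idle Elastic IP billed $5 in 'Hrs' inherits the
// EC2-dominated ratio, so its estimate tracks unrelated EC2 usage.
const idleIpKwh = 5 * hrsRatio;
```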

Suggestion: Can we consider alternative ways to source kWh/$ estimates for common unknown usage types?

Based on incomplete costs
Cost-based estimates for AWS read from blended cost. This AWS cheatsheet explains a bit about how blended cost works and, more importantly, this documentation shows some of the complexity in calculating blended cost.

A potential issue with blended costs is that reservation costs aren't included, as in the documentation example. This would artificially inflate kWh/$ measures for accounts with discounts and reservations.
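A toy example with made-up numbers shows the direction of the effect:

```typescript
// Hypothetical figures: the same usage, two cost denominators.
const kilowattHours = 100;
const onDemandCost = 150; // what the usage would cost at list price
const blendedCost = 100;  // discounted, with reservation fees excluded

const factorBlended = kilowattHours / blendedCost;   // 1.0 kWh/$
const factorOnDemand = kilowattHours / onDemandCost; // ~0.67 kWh/$

// $10 of unknown usage: 10 kWh under blended, ~6.7 kWh under on-demand,
// so the smaller (discounted) denominator inflates the estimate.
```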

Suggestion: Can we explore switching to on-demand pricing to reduce the influence of cost-saving measures on kWh estimates?

Conclusion
I'd like to discuss how we can review and improve unknown estimates to ensure they're defensible and believable. Above, I listed four questions to start that review:
  • Should we partition kWh-cost factors monthly so they finalise when bills finalise?
  • Can we discuss if it's worth reimplementing service-level distinctions?
  • Can we consider alternative ways to source kWh/$ estimates for common unknown usage types?
  • Can we explore switching to on-demand pricing to reduce the influence of cost-saving measures on kWh estimates?
I'm excited to discuss these topics and contribute changes once we settle on a way forward.

Thanks for your time,

Ismael Velasco

Feb 3, 2023, 9:04:14 PM
to Matthew Ferry, Cloud Carbon Footprint
This is a very thought-provoking and persuasive analysis, and I enjoyed and appreciated reading it over a couple of times. I haven't looked at the code myself yet to follow the evidence trail, or replicated your steps to corroborate or nuance your conclusions, but it definitely made me want to! And all framed in a constructive, solution-focused tone. So this is primarily to say thanks.

Secondly, I won't have the opportunity to engage properly for a while, but in an ideal world I'd love to hear of someone repeating your steps and reporting on the results. We might a) confirm your analysis and b) enrich the insights by testing against a different usage profile.

Finally, I haven't really delved into the innards, and wondered whether the issues you raised are exclusively AWS-focused (like blended costs) or apply more generally to the other clouds, for instance in the use of accumulateKilowattHours. Along similar lines, I wondered what relationship https://github.com/cloud-carbon-footprint/cloud-carbon-footprint/blob/trunk/packages/core/src/unknown/UnknownEstimator.ts and associated files have, if any, to the AWS code. Which is really to ask how comparable the measurements of "unknown" are in AWS, GCP and Azure.

Very interesting, and I hope to learn from the discussion that I hope follows your very thoughtful post.
 





Cloud Carbon Footprint

Mar 14, 2023, 3:12:42 PM
to Cloud Carbon Footprint

Hi Matt,


Thanks so much for your review of the Unknown estimators and for your well-thought-out analysis! We appreciate the breakdown of your concerns and are excited to discuss them. Here are our thoughts on each point that was presented:

  1. Regarding the proportion of unknown estimates:
    The percentage of estimates coming from “Unknown” will be heavily dependent on the mix of services used - which will vary from user to user. It would be interesting to hear from other users of CCF what their percentage of “Unknown” is to develop a sense of what constitutes a common portfolio (for AWS as well as the other cloud providers). @Community let us know what you find!

  2. Should we partition kWh-cost factors monthly so they finalize when bills finalize?

While we’re aware of this factor when calculating the unknown services, partitioning during normal fetching could make it harder for users to understand what’s happening. We do expect a certain consistency in how data is fetched, since normal behavior usually involves using the same grouping method or interval for date ranges. Given that, it should be expected that values will change when querying partial sets of data versus complete sets, and partitioning the kWh may introduce a bias toward particular querying methods. We do, however, realize that we need to make this process clearer in the documentation, including the assumptions made by the unknown estimator – such as the use of a moving average – and the effect those assumptions can have.

For additional clarification, the unknown rows and the accumulated KWh that are used for them are only affected by the queried rows in a single request. This accumulation does not persist through the lifecycle of the API, and is only scoped to the data returned for a given request and the data returned within the date range provided for it.

  3. Can we discuss if it's worth reimplementing service-level distinctions?

The service-level distinctions are applied and calculated before the unknown estimations. By the time the code reaches the unknown estimators, we have already exhausted our attempts to use the service-level information, which is why it is not passed to the unknown estimator.

  4. Can we consider alternative ways to source kWh/$ estimates for common unknown usage types?

We may need to think about this some more. For context and clarity, we did want to avoid this approach, as it isn’t as accurate. Once you start considering a cost ratio, you need to factor in things like reserved instances, savings plans, etc. that might not be reflected equally across services. Here is more information on why we needed to use cost for AWS.

  5. Can we explore switching to on-demand pricing to reduce the influence of cost-saving measures on kWh estimates?

We do use the blended-cost line item to pull costs for services, and we understand the accounting benefits that unblended costs may bring when reporting how reserved instances affect actual spend. We chose blended costs because the calculation reflects the organizational impact of usage at the account level, rather than individual spend affected by discounts and modified rates. We see this as valuable for monitoring account-level impact relative to organizational spend/usage; supporting the alternative could be achieved with some changes and the addition of a configuration option to CCF. It may also be best to discuss and document the tradeoffs of the current design, so that users know the impact of using one versus the other when discussing estimates and kWh/$.

Thanks,
The Cloud Carbon Footprint team at Thoughtworks.

Matthew Ferry

Mar 17, 2023, 11:44:09 AM
to Cloud Carbon Footprint
Thank you for your response!
  1. Proportion - I look forward to community members checking this for their deployments, I'm curious to see what people will find.
  2. Partitioning - I understand and agree this approach is efficient for ad hoc querying. Personally, I value consistency over efficiency of ad hoc queries, which is why I believe partitioning would be helpful. It ensures data won't change retroactively. Ultimately I leave it to the maintainers to decide the priority between consistency and efficiency!
  3. Service-level distinctions - That's fine, in that case I believe the documentation is unclear. The GCP example states that there is a distinction in services: "For each unknown row, if there is a known usageAmount/kilowattHours ratio with the same service and usage unit, multiply the usageAmount by the ratio to determine the estimated kilowatt hours." The section on AWS only talks about cost, and does not mention that overall totals will be used instead of service totals.
  4. Alternative sources - I agree this is the most difficult question. To do this well, I think some independent research and consultation with AWS would need to be done on individual usage types. Should we schedule something to brainstorm this?
  5. On-demand pricing - To clarify, is the suggestion here that we should create a configuration option to choose between cost measures? That seems like it would be helpful, as it would allow users to align costs with how they view costs in other FinOps contexts.

Thank you again for taking the time to review and respond to this post.

Cheers,
Matt