Resiliency scoring framework


kishore yendamuri

May 20, 2024, 7:52:31 PM
to Chaos Community
Hi,
Do you know of any framework that can be applied to any cloud provider, or to on-prem environments, to come up with a resiliency score? In other words, how can an organization measure its application stacks with a resiliency score?

Thanks

Purohit Goswami

Aug 23, 2024, 2:49:19 AM
to Chaos Community
Hi Kishore,

This may help you calculate a resilience score.

Step 1: Identify the metrics.

Availability: Percentage of time the service is operational and accessible.
Mean Time to Recovery (MTTR): Average time taken to restore a service after a failure.
Mean Time Between Failures (MTBF): Average time between failures of a service.
Error Rate: Percentage of requests that result in errors.
Fault Tolerance: Ability to continue operating properly in the event of a failure.
Recovery Point Objective (RPO): The maximum tolerable period in which data might be lost due to a disaster.
Recovery Time Objective (RTO): The maximum acceptable amount of time to restore the service after a disruption.


Step 2: Set Thresholds for Each Metric:
Establish acceptable levels for each metric based on business requirements and industry standards.

Step 3: Collect Data:
Use monitoring tools to collect data on each metric.

Step 4: Calculate Individual Resilience Scores:
Calculate scores for each metric based on the data collected. For example:

Availability Score = (Actual Uptime / Total Possible Uptime) * 100
MTTR Score = 1 / (1 + MTTR), with MTTR in hours (Lower MTTR is better)
MTBF Score = MTBF (Higher MTBF is better)
Error Rate Score = (1 - Error Rate) * 100 (Lower Error Rate is better)
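
As a rough illustration, the four formulas above could be written as small Python functions. This is only a sketch: the function names are made up for this example, MTTR and MTBF are assumed to be in hours, and the error rate is assumed to be a fraction between 0 and 1 rather than a percentage.

```python
def availability_score(uptime_hours: float, total_hours: float) -> float:
    """Availability Score = (Actual Uptime / Total Possible Uptime) * 100."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return uptime_hours / total_hours * 100


def mttr_score(mttr_hours: float) -> float:
    """MTTR Score = 1 / (1 + MTTR); a lower MTTR gives a score closer to 1."""
    if mttr_hours < 0:
        raise ValueError("mttr_hours must not be negative")
    return 1 / (1 + mttr_hours)


def mtbf_score(mtbf_hours: float) -> float:
    """MTBF Score = MTBF; the raw value is used and normalized later."""
    return mtbf_hours


def error_rate_score(error_rate: float) -> float:
    """Error Rate Score = (1 - Error Rate) * 100, with error_rate as a fraction."""
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be between 0 and 1")
    return (1 - error_rate) * 100
```

Note that these scores are on different scales (0-100, 0-1, and raw hours), so they need to be brought to a common scale before they are combined.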


Step 5: Weight the Metrics:
Assign weights to each metric based on its importance to the overall resilience of the service. For example, availability might have a higher weight than MTTR.

Step 6: Compute the Resilience Score:
Combine the weighted scores to compute an overall resilience score. This can be done using a weighted average formula.
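
A weighted average of this kind might look like the following in Python. It is a minimal sketch: the function name and the dictionary-based interface are assumptions made for this example, and the scores are assumed to already be on a common 0-100 scale.

```python
def weighted_resilience_score(
    scores: dict[str, float], weights: dict[str, float]
) -> float:
    """Return the weighted average of per-metric scores.

    Dividing by the sum of the weights means the weights do not have to
    add up to exactly 1.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same metrics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

For example, two metrics scored 100 and 50 with equal weights would average to 75.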


Example of Calculating Resilience Score
Let's consider a cloud service with the following metrics collected over a month:

Availability: 99.95%
MTTR: 30 minutes
MTBF: 72 hours
Error Rate: 0.2%
Weights: Availability (40%), MTTR (20%), MTBF (30%), Error Rate (10%)


1. Convert Metrics to Scores:

Availability Score = 99.95
MTTR Score = 1 / (1 + 0.5) = 0.67 (since MTTR is 30 minutes, or 0.5 hours)
MTBF Score = 72 (Higher is better, so the raw number is used)
Error Rate Score = (1 - 0.002) * 100 = 99.8


2. Normalize Scores (if necessary):

Convert scores to a common scale, such as 0-100.
MTTR Normalized Score = 0.67 * 100 = 67
MTBF Normalized Score = (72 / Max MTBF in dataset) * 100 = 100 (assuming 72 hours is the maximum MTBF in the dataset)
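
The MTBF normalization depends on which other services are in the dataset. A small Python sketch of scaling against the dataset maximum is shown below; the function name and the three MTBF values are hypothetical and chosen only to show the effect.

```python
def normalize_to_max(values: list[float]) -> list[float]:
    """Scale each value to 0-100 relative to the largest value in the dataset."""
    if not values:
        raise ValueError("values must not be empty")
    max_value = max(values)
    if max_value <= 0:
        raise ValueError("the largest value must be positive")
    return [value / max_value * 100 for value in values]


# Hypothetical MTBF values (hours) for three services.
print(normalize_to_max([72, 36, 18]))  # [100.0, 50.0, 25.0]
```

With this approach, a service's normalized MTBF score changes whenever a service with a higher MTBF is added to the dataset, so scores are only comparable within the same dataset.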


3. Calculate Weighted Scores:

Weighted Availability Score = 99.95 * 0.4 = 39.98
Weighted MTTR Score = 67 * 0.2 = 13.4
Weighted MTBF Score = 100 * 0.3 = 30
Weighted Error Rate Score = 99.8 * 0.1 = 9.98


4. Compute Overall Resilience Score:

Overall Resilience Score = 39.98 + 13.4 + 30 + 9.98 = 93.36
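
The arithmetic in this example can be checked with a short Python script. This is a sketch using only the numbers given in the example; the variable and function names are made up here. It also shows that rounding the MTTR score to 0.67 before weighting moves the final result slightly.

```python
# Metrics from the example above.
availability_pct = 99.95
mttr_hours = 0.5        # 30 minutes
mtbf_hours = 72.0
error_rate = 0.002      # 0.2%
max_mtbf_hours = 72.0   # assumption from the example: 72 is the dataset maximum

weights = {"availability": 0.4, "mttr": 0.2, "mtbf": 0.3, "error_rate": 0.1}


def overall_score(mttr_raw_score: float) -> float:
    """Combine the four normalized (0-100) scores using the weights above."""
    scores = {
        "availability": availability_pct,
        "mttr": mttr_raw_score * 100,
        "mtbf": mtbf_hours / max_mtbf_hours * 100,
        "error_rate": (1 - error_rate) * 100,
    }
    return sum(scores[name] * weights[name] for name in weights)


exact_mttr_score = 1 / (1 + mttr_hours)
rounded = overall_score(round(exact_mttr_score, 2))  # uses 0.67, as in the example
unrounded = overall_score(exact_mttr_score)          # uses 0.6666...

print(round(rounded, 2))    # 93.36
print(round(unrounded, 2))  # 93.29
```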

Conclusion
The overall resilience score for the service is 93.36 out of 100, indicating high resilience. This score provides a quantitative measure of how well the service can maintain its performance in the face of disruptions.

Casey Rosenthal

Oct 9, 2024, 12:28:14 PM
to Chaos Community
Just want to jump in here and say that MTTR is complete bullshit. If you are using MTTR, you are wasting your time. The VOID report explains why in detail, but roughly:
* Incidents aren't distributed along a normal curve, so you are using the wrong statistic
* You don't have enough data to be statistically significant
* Duration doesn't correlate to criticality
... and a half dozen other reasons, any one of which invalidates the basis of MTTR. Don't use it.
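
To illustrate the first point only, here is a small Python sketch comparing the mean and the median of a skewed set of incident durations. The durations are made up for this illustration and are not taken from the VOID report or from any real system.

```python
import statistics

# Hypothetical incident durations in minutes: six short incidents, one long one.
durations_min = [5, 7, 8, 10, 12, 15, 480]

mean_minutes = statistics.mean(durations_min)
median_minutes = statistics.median(durations_min)

print(f"mean:   {mean_minutes:.1f} min")    # mean:   76.7 min
print(f"median: {median_minutes:.1f} min")  # median: 10.0 min
```

In this made-up sample, a single long incident pulls the mean far above what a typical incident looks like, and with only seven data points, one more outlier would move it substantially again.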

Thanks!
-Casey

Ashutosh Sharma

Oct 9, 2024, 1:29:15 PM
to Casey Rosenthal, Chaos Community
Casey, can you share the VOID report?


Yury Niño Roa

Oct 9, 2024, 2:28:26 PM
to Chaos Community
Hi!

Sorry to interrupt, but I would like to flag an issue I am having with the report :) The report is linked at https://www.thevoid.community/report-2024#custom-code1 but I have not been able to download it. The page asks me to subscribe to a list, and when I did, I was redirected to the same page.

Arun Kumarr B

Oct 10, 2024, 1:44:21 AM
to Ashutosh Sharma, Casey Rosenthal, Chaos Community

Ashutosh Sharma

Oct 10, 2024, 4:57:30 AM
to Arun Kumarr B, Casey Rosenthal, Chaos Community
Thanks Arun, Casey 