The Art Of Site Reliability Engineering - Part 2


In Part 1 we touched lightly on what SRE is and the vocabulary it uses. In this part we expand on a few of those terms as they are used and implemented by SREs.
What is an SLI?
SLI is an acronym for Service Level Indicator, but not every metric qualifies as one.
An SLI needs to be a ratio of two numbers: the number of good events divided by the total number of (valid) events. It is usually expressed as a percentage.
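As a minimal sketch of that ratio, the snippet below computes an SLI as a percentage. The function name `sli` and the sample counts are made up for illustration, and treating an empty window as 100% is an assumption; some teams prefer to report "no data" instead.

```python
def sli(good_events: int, valid_events: int) -> float:
    """Return the SLI as a percentage: good events / valid events * 100."""
    if valid_events == 0:
        # Assumption: with no valid events in the window, nothing was served badly.
        return 100.0
    return 100.0 * good_events / valid_events


# Hypothetical counts for one measurement window.
print(sli(9_985, 10_000))  # 99.85
```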
For latency, an effective way to express it is in terms of percentiles.
Example:
Suppose we want to know how many requests are very slow (unacceptable tail latency). We can express this by asking "what is the 99th percentile latency?" every N seconds or minutes. If a request takes longer than a cut-off point that we perceive as unacceptable (let's say 2 seconds), that request is counted as “not good.”
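The latency example can be sketched roughly as follows. The 2-second cut-off comes from the example; the function names, the sample latencies, and the choice of the nearest-rank percentile method are assumptions made for illustration, and an observability platform may compute percentiles differently.

```python
import math

THRESHOLD_SECONDS = 2.0  # cut-off from the example: slower requests are "not good"


def latency_sli(latencies, threshold=THRESHOLD_SECONDS):
    """Fraction of requests answered within the threshold (good / total)."""
    if not latencies:
        # Assumption: an empty window counts as fully good.
        return 1.0
    good = sum(1 for seconds in latencies if seconds <= threshold)
    return good / len(latencies)


def percentile(latencies, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not latencies:
        raise ValueError("percentile of an empty window is undefined")
    ordered = sorted(latencies)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies (in seconds) collected over one window.
window = [0.12, 0.30, 0.25, 0.90, 2.40, 0.18, 0.22, 0.35, 0.41, 3.10]
print(latency_sli(window))     # 0.8 (8 of 10 requests finished within 2 seconds)
print(percentile(window, 99))  # 3.1
```

Both numbers describe the same window: the percentile tells us how slow the tail is, while the good/total ratio is the value that fits the SLI definition.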
Benefits of SLIs?
With SLIs we don't have to watch every metric tracked by our observability platform. Instead, we select only a few indicators that track the level of service delivered to our end users.
