The Art Of Site Reliability Engineering - Part 2

Gaurav
1 min read

In Part 1 we touched lightly on what SRE is and on its vocabulary. In this part we expand on a few of the vocabulary terms used and implemented by SREs.

What is an SLI?

  • SLI stands for Service Level Indicator, but not just any metric qualifies as an SLI.

  • An SLI needs to be a ratio of 2 numbers: the number of good events divided by the total number of (valid) events.

  • For latency, an effective way to look at it is in terms of a percentile.
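The ratio described above can be sketched in a few lines of Python. This is only an illustration: the `Event` class, the `compute_sli` function, and the sample numbers are hypothetical and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One observed event (hypothetical shape, for illustration only)."""
    valid: bool  # True if the event counts toward the SLI at all
    good: bool   # True if the event met our definition of "good"


def compute_sli(events):
    """Return good events / valid events, or None if there are no valid events."""
    valid_events = [e for e in events if e.valid]
    if not valid_events:
        return None  # no valid traffic in this window, so the SLI is undefined
    good_count = sum(1 for e in valid_events if e.good)
    return good_count / len(valid_events)


# Made-up sample: 97 good requests, 3 bad requests, and 5 events
# that we exclude as invalid (assumed here to be, say, health checks).
events = (
    [Event(valid=True, good=True)] * 97
    + [Event(valid=True, good=False)] * 3
    + [Event(valid=False, good=False)] * 5
)

sli = compute_sli(events)
print(f"SLI: {sli:.2%}")  # SLI: 97.00%
```

Note that the invalid events are dropped from both the numerator and the denominator, which is why the result is 97 out of 100 rather than 97 out of 105.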

Example:

To find out how many requests are very slow (unacceptable tail latency), we can ask "what is the 99th percentile latency?" every N minutes or seconds. If a request takes longer than a cut-off point that we consider unacceptable (let's say 2 seconds), that request is counted as “not good.”
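As a rough sketch of this example, the snippet below computes the 99th percentile of one window of latencies and then counts every request over the 2-second cut-off as "not good". The function names, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration; a real observability platform may compute percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_sli(latencies_s, threshold_s=2.0):
    """Fraction of requests that finished within threshold_s seconds."""
    if not latencies_s:
        return None  # no requests in this window
    good = sum(1 for latency in latencies_s if latency <= threshold_s)
    return good / len(latencies_s)


# Made-up latencies (in seconds) for one measurement window of 100 requests.
window = [0.2] * 95 + [1.5] * 3 + [2.5, 4.0]

p99 = percentile(window, 99)
print(f"p99 latency: {p99} s")                    # p99 latency: 2.5 s
print(f"p99 within 2 s cut-off: {p99 <= 2.0}")    # False
print(f"latency SLI: {latency_sli(window):.2%}")  # latency SLI: 98.00%
```

Here two of the 100 requests exceed the cut-off, so 98% of requests are "good", while the 99th percentile (2.5 seconds) shows that the tail is already past the point we consider acceptable.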

Benefits of SLIs?

With SLIs, we do not have to watch every metric tracked by our observability platform. Instead, we select only a few indicators that track the level of service delivered to our end users.

