In modern software development, we often face three types of risks:

Known knowns: issues we are aware of and understand.
Known unknowns: issues we know exist but lack full understanding.
Unknown unknowns: issues we neither know exist nor understand.

This article explores testing techniques for known knowns and known unknowns, and shows how logging and observability help us convert unknown unknowns into known risks. We’ll also see how Site Reliability Engineering (SRE) practices fit into this picture.

1. Understanding Known Knowns and Known Unknowns

Known knowns are clear requirements or behaviors already captured by tests.
Known unknowns are areas we suspect might fail, but we don’t know the exact nature of the failure.

For example, we may know that a user-input parser must handle Unicode strings, yet lack concrete examples of the inputs that break it. The risk area is known; the specific failures are not, which makes this a known unknown.

2. Testing Techniques for Known Knowns and Known Unknowns

2.1 Unit and Integration Tests

Unit tests verify specific functions or methods with fixed inputs and outputs.
Integration tests check the interaction between multiple components.

These approaches are excellent for covering known knowns, because they focus on explicit, pre-defined cases.
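As a minimal sketch of a known-known test, assume a hypothetical `parse_username` function whose requirements are already agreed: trim whitespace, lowercase the result, and reject blank input. The function and its rules are illustrative assumptions, not taken from any real project.

```python
def parse_username(raw: str) -> str:
    """Hypothetical parser: trims whitespace and lowercases the name."""
    cleaned = raw.strip().lower()
    if not cleaned:
        raise ValueError("username must not be empty")
    return cleaned


def test_trims_and_lowercases() -> None:
    # Fixed input, fixed expected output: a classic known known.
    assert parse_username("  Alice ") == "alice"


def test_rejects_empty_input() -> None:
    # The failure mode is spelled out in advance, so we can assert on it.
    try:
        parse_username("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for blank input")


test_trims_and_lowercases()
test_rejects_empty_input()
```

Each test pins one behaviour we already understand. Nothing here probes inputs we have not thought of, which is where the next technique comes in.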

2.2 Property-Based Testing

Property-based testing flips the script: instead of writing individual test cases, we define properties that should always hold. For instance:

A sorting function should always return a list of the same length, with elements in non-decreasing order.
A JSON serializer followed by a deserializer should yield the original object.

By generating thousands of random inputs, property-based testing helps uncover edge cases, turning some known unknowns into known knowns.
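The two properties above can be sketched with a hand-rolled checker that uses only the standard library. The helper names (`check_property`, `random_int_list`) are assumptions made for illustration. Dedicated libraries such as Hypothesis offer richer generators and shrink failing inputs to minimal examples, which this sketch does not attempt.

```python
import json
import random


def random_int_list(rng: random.Random) -> list[int]:
    """Generate a list of 0 to 20 integers, allowing negatives and duplicates."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]


def check_property(prop, generator, runs: int = 1000, seed: int = 0) -> None:
    """Run `prop` against `runs` generated inputs and fail on the first counterexample."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(runs):
        value = generator(rng)
        assert prop(value), f"property failed for input: {value!r}"


def sort_keeps_length_and_orders(xs: list[int]) -> bool:
    result = sorted(xs)
    same_length = len(result) == len(xs)
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return same_length and ordered


def json_round_trips(xs: list[int]) -> bool:
    return json.loads(json.dumps(xs)) == xs


check_property(sort_keeps_length_and_orders, random_int_list)
check_property(json_round_trips, random_int_list)
```

When a property fails, the assertion message carries the offending input, which can then be copied into an ordinary unit test so the newly discovered case stays covered.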

3. From Unknown Unknowns to Known Knowns: Logging and Observability

While testing covers many issues, some problems only appear in production under real-world conditions. These are the unknown unknowns. To discover them, we rely on structured logging, metrics, tracing, and a culture that learns from incidents.

3.1 Structured Logging

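Contextual logs record key variables and execution paths as structured fields (for example, key-value pairs) rather than free text, so they can be searched and aggregated.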
Log levels (INFO, WARN, ERROR) help prioritise messages.

Good logging makes unexpected behaviour visible, turning unknown unknowns into known unknowns that we can investigate.
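One way to get contextual, machine-readable logs is a JSON formatter on top of Python's standard `logging` module. The sketch below is an assumption-laden illustration: the `JsonFormatter` class, the `checkout` logger name, the `context` attribute, and the `order_id` and `attempt` fields are all hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical attribute supplied through `extra=`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("checkout")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for stdout or a log shipper
log = build_logger(buffer)
log.warning("payment retry", extra={"context": {"order_id": "A-17", "attempt": 2}})
print(buffer.getvalue(), end="")
```

Because every entry is a JSON object, a log backend can filter on `order_id` or count entries by `level` without parsing free text.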

3.2 Metrics and Monitoring

Counters track event occurrences (e.g., request rates, error counts).
Gauges record a current value that can rise or fall (e.g., memory usage).
Alerts notify teams when metrics cross thresholds.

By observing metric spikes or trends, we identify anomalies that testing missed, shifting issues from unknown to known.
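The three building blocks above can be sketched in-process as follows. The `Counter` and `Gauge` classes and the `error_rate_alert` function are hypothetical stand-ins. A real deployment would normally export such values to a monitoring system rather than evaluate thresholds in application code.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Monotonically increasing count of events."""
    value: int = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


@dataclass
class Gauge:
    """Current value that may rise or fall."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


def error_rate_alert(total: Counter, errors: Counter, threshold: float) -> bool:
    """Return True when the error ratio exceeds the threshold."""
    if total.value == 0:
        return False  # no traffic, nothing to alert on
    return errors.value / total.value > threshold


request_count, error_count = Counter(), Counter()
memory_mb = Gauge()

# Simulated HTTP status codes for ten requests (made-up data).
for status in [200, 200, 500, 200, 503, 200, 200, 200, 200, 200]:
    request_count.inc()
    if status >= 500:
        error_count.inc()
memory_mb.set(412.5)

print(error_rate_alert(request_count, error_count, threshold=0.1))
```

Here two of ten simulated requests fail, so a 10% threshold fires. The threshold itself is a judgment call that usually follows from a service level objective.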

3.3 Tracing and Distributed Context

Distributed tracing maps request flows across services, showing latency and errors per segment.
Trace IDs link logs and metrics to individual user requests.

Tracing reveals complex failure modes in microservice architectures, converting unknowns into testable scenarios.
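To illustrate how a trace ID ties log lines to one request, here is a single-process sketch built on `contextvars` and `logging`. The names `trace_id_var`, `TraceIdFilter`, `ListHandler`, `handle_request`, and `charge_card` are assumptions for illustration. Real distributed tracing also propagates the ID across service boundaries, typically in request headers, which this sketch leaves out.

```python
import contextvars
import logging
import uuid

# Holds the trace ID of whichever request is currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True


class ListHandler(logging.Handler):
    """Collect formatted lines in memory so the example is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def charge_card(logger: logging.Logger) -> None:
    # No trace ID is passed in; it travels implicitly with the context.
    logger.info("charging card")


def handle_request(logger: logging.Logger) -> str:
    trace_id = uuid.uuid4().hex
    token = trace_id_var.set(trace_id)
    try:
        logger.info("request received")
        charge_card(logger)
    finally:
        trace_id_var.reset(token)  # restore the previous value
    return trace_id


handler = ListHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

first = handle_request(logger)
second = handle_request(logger)
print("\n".join(handler.lines))
```

Every line emitted while a request is in flight carries that request's ID, so searching the logs for one ID reconstructs the path of a single request.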

3.4 Cultivating an Observability Culture

Post-mortems document incidents, root causes, and preventive actions.
Blameless analysis encourages team collaboration and continuous improvement.

Observability is not just tools; it’s a mindset that embraces uncertainty and learns from real-world failures.

4. SRE and the Convergence of Practices

Site Reliability Engineering combines testing, logging, and observability:

SRE teams define service level objectives (SLOs) to guide alerts and dashboards.
Error budgets quantify how much unreliability an SLO permits (a 99.9% target leaves a 0.1% budget) and use real-time data to balance innovation against reliability.
Collaboration between dev and ops ensures that unknown unknowns are surfaced early and learned from.
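The relationship between an SLO and its error budget is simple arithmetic, sketched below for a request-based availability SLO. The function names and traffic figures are made up for illustration. Teams may instead define budgets over time windows or latency targets.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window under the SLO."""
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget(slo, total_requests)
    if budget == 0:
        return 0.0  # a 100% SLO, or no traffic, leaves no budget to spend
    return 1.0 - failed_requests / budget


# Hypothetical window: one million requests against a 99.9% SLO.
allowed = error_budget(0.999, 1_000_000)
left = budget_remaining(0.999, 1_000_000, failed_requests=250)
print(f"allowed failures: {allowed:.0f}, budget left: {left:.0%}")
```

A team might slow feature rollouts when the remaining budget approaches zero and release more freely when most of it is unspent. The exact policy is a team decision rather than something the formula dictates.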

Conclusion

By applying:

Unit/integration tests for known knowns,
Property-based testing for known unknowns, and
Logging + observability for unknown unknowns,

we build resilient systems that adapt and improve. Embracing these practices bridges the gap between what we expect and what happens, turning every unknown into an opportunity for learning.

“There are unknown unknowns; the ones we don’t know we don’t know.”
– Donald Rumsfeld, Wikipedia: Unknown unknowns

Handling Known and Unknown Unknowns with Testing and Observability