Grafana, Span Queries & TraceQL learnings

Tiger Abrodi

Span Queries

Span queries drill down into specific operations within traces for detailed system observation. They identify bottlenecks, errors, and unusual patterns in distributed systems.

Grafana provides span filters in trace view to refine displayed spans by service name, span name, duration, or tags. Multiple filters combine for targeted analysis.

Common use cases: pinpoint slow operations, focus on specific services, isolate spans with error tags or HTTP status codes.
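To make the idea of combining filters concrete, here is a minimal Python sketch that applies the same kind of filters to an in-memory list of spans. The Span class, the filter_spans helper, and the sample data are all hypothetical, and the sketch assumes filters combine with a logical AND; it illustrates the idea rather than how Grafana implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical, simplified span record; real spans carry more fields.
    service: str
    name: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


def filter_spans(spans, service=None, min_duration_ms=None, tags=None):
    """Keep spans that satisfy every filter that was supplied (logical AND)."""
    result = []
    for span in spans:
        if service is not None and span.service != service:
            continue
        if min_duration_ms is not None and span.duration_ms < min_duration_ms:
            continue
        if tags and any(span.tags.get(k) != v for k, v in tags.items()):
            continue
        result.append(span)
    return result


spans = [
    Span("frontend", "GET /cart", 120.0, {"http.status_code": 200}),
    Span("cart", "SELECT items", 950.0, {"error": True}),
    Span("cart", "cache lookup", 4.0),
]

# Slow spans from the "cart" service that also carry an error tag.
slow_cart_errors = filter_spans(
    spans, service="cart", min_duration_ms=500, tags={"error": True}
)
print([s.name for s in slow_cart_errors])
```

Each extra filter narrows the result, which is why stacking a service filter, a duration threshold, and a tag filter quickly isolates the few spans worth inspecting.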

TraceQL Fundamentals

TraceQL is Grafana Tempo's query language for distributed tracing. It selects traces and spans using structured, expressive queries.

Relational Operators

TraceQL expresses span relationships with structural operators: parent-child (>) and ancestor-descendant (>>).

Find traces where a span from service A is the direct parent of a span from service B:

{ .service.name = "A" } > { .service.name = "B" }
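The difference between > and >> is easy to blur, so here is a small Python sketch that evaluates both relations over a hand-built span tree. The SPANS table, the span ids, and the helper names are hypothetical; the sketch only mirrors the semantics (direct parent versus any ancestor) and is not how Tempo evaluates queries.

```python
# Hypothetical span tree: span id -> (service name, parent span id or None).
SPANS = {
    "s1": ("A", None),
    "s2": ("gateway", "s1"),
    "s3": ("B", "s2"),
    "s4": ("B", "s1"),
}


def is_child(span_id, service):
    """True if the span's direct parent belongs to `service` (the `>` relation)."""
    parent = SPANS[span_id][1]
    return parent is not None and SPANS[parent][0] == service


def is_descendant(span_id, service):
    """True if any ancestor of the span belongs to `service` (the `>>` relation)."""
    parent = SPANS[span_id][1]
    while parent is not None:
        if SPANS[parent][0] == service:
            return True
        parent = SPANS[parent][1]
    return False


b_spans = [sid for sid, (svc, _) in SPANS.items() if svc == "B"]
children = [sid for sid in b_spans if is_child(sid, "A")]
descendants = [sid for sid in b_spans if is_descendant(sid, "A")]
print(children)     # B spans directly under an A span
print(descendants)  # B spans anywhere beneath an A span
```

In this tree, s3 sits behind a gateway span, so only >> would match it, while s4 matches both operators.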

Attribute Filtering

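Query spans by attributes inside a span selector ({ }). Attribute names need a scope prefix such as span. or resource., or a leading . to match any scope. Find spans with HTTP status greater than 200:

{ span.http.status_code > 200 }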

Pipeline Processing

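Pipe (|) the result of a span selector into further stages to group, aggregate, and recombine spans: by(.attribute) groups spans, coalesce() merges the groups back into a single spanset, and count(), avg(), min(), max(), and sum() aggregate.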
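As an illustration, a TraceQL pipeline such as { status = error } | by(resource.service.name) | count() > 1 can be read as three stages. The Python sketch below mimics those stages on made-up data to show what each one contributes; the span tuples and variable names are hypothetical, and this is a mental model rather than Tempo's execution engine.

```python
from collections import defaultdict

# Hypothetical spans from one trace: (service name, status).
spans = [
    ("checkout", "error"),
    ("checkout", "error"),
    ("payments", "error"),
    ("payments", "ok"),
]

# Stage 1, { status = error }: keep only error spans.
errors = [s for s in spans if s[1] == "error"]

# Stage 2, by(resource.service.name): split the spanset into one group per service.
groups = defaultdict(list)
for service, status in errors:
    groups[service].append((service, status))

# Stage 3, count() > 1: keep only groups holding more than one span.
noisy = {svc: len(group) for svc, group in groups.items() if len(group) > 1}
print(noisy)
```

Here only the checkout group survives the last stage, because payments has a single error span.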

Span Events

Span events are timestamped annotations within spans for recording exceptions, log messages, or significant occurrences.

Query by event name:

{ event:name = "exception" }

Query by event attribute:

{ event.exception.message =~ ".*timeout.*" }

Span events are lighter than creating additional spans, which makes them a good fit for tracking exceptions or key milestones.

OpenTelemetry Setup

Resource Attributes

Required and optional attributes set at resource level:

  • service.name: Logical service name (required)

  • service.namespace: Service grouping

  • service.version: Service version

  • service.instance.id: Instance identifier

These attributes apply to all telemetry emitted by that service instance.
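One common way to supply these attributes without touching application code is through the standard OpenTelemetry environment variables OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES. The Python sketch below builds both values from a dictionary; the resource_env helper and the sample values (checkout, shop, and so on) are hypothetical.

```python
from urllib.parse import quote


def resource_env(service_name, attributes):
    """Return OTEL_* environment variables describing a resource.

    OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs;
    values are percent-encoded so commas or spaces cannot break the list.
    """
    pairs = ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attributes.items()
    )
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_RESOURCE_ATTRIBUTES": pairs,
    }


env = resource_env(
    "checkout",  # hypothetical service name
    {
        "service.namespace": "shop",
        "service.version": "1.4.2",
        "service.instance.id": "checkout-7f9c",
    },
)
print(env["OTEL_RESOURCE_ATTRIBUTES"])
```

Exporting the resulting variables before starting the process lets the SDK attach the attributes to every span, which is what makes filters like .service.name usable in TraceQL.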

Key TraceQL Patterns

Find error traces:

{ status = error }

Find slow operations:

{ duration > 1s }

Service-to-service calls (direct or through intermediate spans):

{ .service.name = "frontend" } >> { .service.name = "database" }

Exception events:

{ event:name = "exception" && event.exception.type = "TimeoutError" }

HTTP errors:

{ span.http.status_code >= 400 }
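Outside of Grafana, queries like these can also be sent to Tempo's HTTP search endpoint (/api/search with the query in the q parameter). The Python sketch below only builds the request URL so it stays runnable without a server; the TEMPO_URL value assumes a local Tempo on its default port 3200, and the PATTERNS table and search_url helper are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a Tempo instance reachable at this address (3200 is Tempo's default HTTP port).
TEMPO_URL = "http://localhost:3200"

# A couple of the patterns from this section, keyed by a made-up label.
PATTERNS = {
    "errors": "{ status = error }",
    "slow": "{ duration > 1s }",
}


def search_url(query, limit=20):
    """Build a Tempo search URL that carries a TraceQL query in the `q` parameter."""
    return f"{TEMPO_URL}/api/search?{urlencode({'q': query, 'limit': limit})}"


print(search_url(PATTERNS["slow"]))
```

URL-encoding matters here because TraceQL relies on braces, spaces, and comparison operators that are not safe in a raw query string.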
