Instrumentation and Django's StreamingHttpResponse
In my work at Bevy, I recently implemented a streaming endpoint in our Django app using StreamingHttpResponse
. For anyone unfamiliar with this class, using it is straightforward. You just need to pass it some iterable, and it will stream the content.
from django.http import StreamingHttpResponse
def my_generator():
for i in range(3):
yield i
def my_view(request):
return StreamingHttpResponse(
my_generator(),
content_type="text/event-stream",
)
And you'll get a view that streams 012
, one digit at a time.
But the gotcha came from how we normally do instrumentation. We use Datadog for distributed tracing and structlog
for structured logging, and using a streaming response broke our normal usage patterns for both.
First, the tracing. We typically trace an entire callable, like so:
from ddtrace import tracer
class MyStreamingThing:
@tracer.wrap(service="my_service")
def stream_stuff(self):
for i in range(3):
yield i
This works for callables that never yield up to the caller. But it doesn't work for streaming. The trace doesn't stay open for the entire duration of the work. Instead, I had to manually manage the traces, something like this:
from ddtrace import tracer
class MyStreamingThing:
def stream_stuff(self):
span = tracer.trace(
name="my_streaming_thing.stream_stuff",
service="my_service",
)
try:
for i in range(3):
yield i
finally: # spans MUST be finished to avoid a memory leak!
span.finish() # FYI, this method is idempotent; very useful!
This worked as expected. Hooray! Logging, on the other hand, was a little more complicated. We use an admittedly outdated version of django-structlog
, which includes a piece of middleware that uses structlog
's thread-local context to bind logger variables. This middleware uses structlog
's threadlocal.tmp_bind
context manager around every request to bind request variables like the user ID, IP address, correlation ID, etc. to a new bound logger. These variables are unbound when the context manager exits.
And I'm sure you can see where this is going: we were losing all our request-level logger bindings! Now, the correct solution to this is to upgrade our version of django-structlog
, which uses structlog
's contextvars
module to bind request-level variables that persist across threads. And that's what I'll be doing next week! But for now, I needed a quicker solution.
Enter the immutable logger. structlog
lets you separate a logger instance from the thread-local context, which is just what we needed to preserve our request-level variables even after our middleware context manager had exited. We used it like so:
import structlog
from ddtrace import tracer
from django.http import StreamingHttpResponse
# this logger has the context provided by the middleware
logger = structlog.get_logger(__name__)
def my_view(request):
# this logger will keep its context after the middelware exits
immutable_logger = structlog.threadlocal.as_immutable(logger)
return StreamingHttpResponse(
MyStreamingThing().stream_stuff(immutable_logger),
content_type="text/event-stream",
)
class MyStreamingThing:
def stream_stuff(self, logger):
# now our logger won't lose its context, and we can continue
# to bind extra context as we normally would
logger = logger.bind(foo="bar")
span = tracer.trace(
name="my_streaming_thing.stream_stuff",
service="my_service",
)
try:
for i in range(3):
logger.info("Yielding...")
# log event: {"event": "Yielding...", "foo": "bar", "request-id": ... }
yield i
finally:
span.finish()
And now everything was working as expected. The logger was a bit tedious, as we had to pass it around to any other callables that needed to log anything, but it worked as we expected. As mentioned, using contextvars
makes this less verbose (at the cost of more "magic"), but the bigger lesson for me was this: streaming responses and context managers can combine into a bit of a footgun, so it pays to be extra careful!
Subscribe to my newsletter
Read articles from Justin Thurman directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by