Understanding HTTP Status Codes: Importance and Usage in RESTful Microservices


Microservice architectures live and die by clear communication. When dozens of services (and external vendors) interact via REST APIs, HTTP status codes become the silent contract that every request and response abides by. Using them correctly is more than a nicety – it’s essential for clarity, observability, and maintainability in distributed systems. In this post, we’ll explore why status codes matter, walk through the most commonly used codes, and give practical advice for each.
Why HTTP Status Codes Matter in Microservice APIs
They’re part of your API contract. In RESTful communication, status codes are a primary way the server communicates what happened to a request. Martin Fowler notes that a truly RESTful service makes full use of HTTP verbs and response codes. In other words, your API should use HTTP status codes meaningfully, not just always return 200 with a magic string that the consumer is expected to interpret. As microservices expert Sam Newman advises, every service should leverage the standard HTTP codes to clearly indicate outcomes. This makes your API self-explanatory to clients and developers.
Clarity for clients and developers. A well-chosen status code instantly tells the client how to handle the response. For example, a 404 Not Found tells a client it used a bad URL or ID, whereas a 400 Bad Request indicates something wrong with the request format or data. If your API returns a 404 for a missing record ID, the client knows the ID is wrong; if it returns 400, the client knows it sent an invalid request (maybe malformed JSON). This clarity improves the developer experience and reduces misunderstandings. No one likes guessing whether a request failed due to their bug or a server issue – the status code should make it obvious.
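To make that concrete, here is a minimal C# sketch of a caller deciding what to do from the status code alone, before it reads the body. It uses only System.Net.HttpStatusCode from the base class library; the ClientAction enum and StatusTriage class are hypothetical names for this example, not part of any framework.

```csharp
using System;
using System.Net;

// Hypothetical labels for what a caller does next.
public enum ClientAction
{
    UseBody,           // 2xx: the request worked
    CheckIdOrUrl,      // 404: wrong URL or an ID that does not exist
    FixRequest,        // other 4xx: the request itself needs changing
    ReportServerIssue  // 5xx: not the caller's fault
}

public static class StatusTriage
{
    // Decides from the status code alone, before the body is read.
    public static ClientAction Decide(HttpStatusCode status)
    {
        int code = (int)status;
        return code switch
        {
            >= 200 and <= 299 => ClientAction.UseBody,
            404 => ClientAction.CheckIdOrUrl, // checked before the general 4xx range
            >= 400 and <= 499 => ClientAction.FixRequest,
            >= 500 and <= 599 => ClientAction.ReportServerIssue,
            _ => throw new ArgumentOutOfRangeException(nameof(status), $"No rule for status {code}.")
        };
    }
}
```

The point is the ordering: the caller can tell “my ID is wrong” apart from “my payload is wrong” and “the server is broken” without parsing any text.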
Observability and monitoring. In a distributed system, proper status codes are crucial for tracking the health of services. Monitoring tools and logs typically treat 5xx errors as indicators of server or system problems, while 4xx errors indicate client-side issues. As one microservices guide notes, a healthy service is expected to emit 2xx, 3xx, or 4xx codes, whereas any 5xx or timeout suggests an unhealthy service that may trigger alerts. If you misuse codes (for example, returning 200 OK even when an error occurred and reporting that error only in the response body), your observability is compromised. Your dashboards won’t show the spike in errors because you never signaled an error to begin with. Using the correct codes helps your team quickly pinpoint issues (e.g., a surge in 504 Gateway Timeout responses could immediately flag upstream vendor problems).
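A minimal sketch of that alerting rule, assuming you already have the status codes a service returned over some time window. ErrorRateMonitor and the 5% default threshold are illustrative assumptions, not a standard; note that 4xx responses are deliberately left out of the rate.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ErrorRateMonitor
{
    public static bool IsServerError(int statusCode) => statusCode >= 500 && statusCode <= 599;

    // Fraction of responses in the window that were 5xx.
    public static double ServerErrorRate(IReadOnlyCollection<int> statusCodes)
    {
        if (statusCodes.Count == 0) return 0.0;
        return statusCodes.Count(IsServerError) / (double)statusCodes.Count;
    }

    // 4xx responses never count toward the alert: they describe client
    // mistakes, not an unhealthy service. The 0.05 default is an assumption.
    public static bool ShouldAlert(IReadOnlyCollection<int> statusCodes, double threshold = 0.05)
        => ServerErrorRate(statusCodes) > threshold;
}
```

If a service hides its failures behind 200 OK, every code in the window is a 2xx and this rule can never fire, which is exactly the observability gap described above.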
Maintainability and consistency. In a microservice ecosystem, dozens of services might be developed by different teams. Consistent use of HTTP status codes across all services makes it easier to maintain and integrate these components. If every team follows the same conventions, developers moving between services (or writing client code for multiple services) don’t have to relearn the error semantics each time. Consistency is something thought leaders like David Farley emphasize – without it, you incur complexity and technical debt without any benefit. In practice, this means defining clear guidelines (e.g., “use 400 for validation errors, 404 for not found, 500 for unhandled exceptions”) and sticking to them. ASP.NET Core also makes it easy to return the proper HTTP status through action results such as Ok(), NotFound(), and BadRequest(), nudging teams toward consistent and correct usage.
Can you imagine being onboarded to a new organization and having to learn all of the magic strings a service returns? What if those strings changed? How do you maintain that? Oh, this is making my head hurt!
Finally, robustness in the face of failures. In any distributed system, failures are going to happen: networks partition, services go down, timeouts occur, and a dog could eat all of your packets. Your services should use status codes to communicate these failures gracefully. As Sam Newman puts it, you must design for failure by handling timeouts and errors actively rather than ignoring them. A client of your service should receive a 504 Gateway Timeout if your service couldn’t get a response from a downstream dependency, not a generic 500 Internal Server Error or a 400 Bad Request. Clear status codes allow clients to implement retries or fallbacks when appropriate. They also encourage you, as the service author, to think about error cases explicitly (e.g., “What should I return if a downstream service doesn’t respond within 5 seconds?”). This kind of defensive design is key to resilient microservices.
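Here is a minimal client-side retry sketch under a few stated assumptions: only 503 and 504 are treated as transient, only idempotent requests are retried, and delays grow exponentially. RetryPolicy and its parameters are hypothetical; libraries such as Polly offer production-grade versions of this idea.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class RetryPolicy
{
    // Assumption: 503 and 504 usually mean "try again later". Retrying an
    // unchanged 4xx request generally won't help, so those are excluded.
    public static bool IsTransient(HttpStatusCode status) =>
        status == HttpStatusCode.ServiceUnavailable || status == HttpStatusCode.GatewayTimeout;

    // Only retry requests that are safe to repeat (e.g., GET, PUT, DELETE).
    public static bool ShouldRetry(HttpStatusCode status, int retriesSoFar, int maxRetries, bool isIdempotent) =>
        isIdempotent && IsTransient(status) && retriesSoFar < maxRetries;

    // Exponential backoff: base, 2 x base, 4 x base, ...
    public static IEnumerable<TimeSpan> BackoffDelays(int maxRetries, TimeSpan baseDelay)
    {
        for (int retry = 0; retry < maxRetries; retry++)
            yield return TimeSpan.FromTicks(baseDelay.Ticks * (1L << retry));
    }
}
```

With maxRetries = 3 and a 100 ms base delay, the waits are 100, 200, and 400 ms. A 400 Bad Request is never retried, because sending the same request again fails the same way; this only works if the service actually distinguishes 504 from 400 and 500.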
What are some common HTTP Status Codes?
In nearly every engineering interview I go into, I propose a number of scenarios around HTTP and I always throw in some of the common status codes. There are more than I can remember, and you can always look them up, but when time is of the essence, you have to have the basics. It is just part of owning your craft.
HTTP status codes are grouped into five classes: the first digit of the code identifies the class, and the last two digits identify the specific condition. The first family is the 1xx Informational codes, which are interim responses the server sends before the final response (for example, 100 Continue or 101 Switching Protocols). These are not going to be the focus for today.
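Because the class is just the first digit, integer division by 100 recovers it. A small sketch; StatusClass and StatusClasses are hypothetical names for illustration:

```csharp
using System;

// The five classes, keyed by the first digit of the code.
public enum StatusClass
{
    Informational = 1, // 1xx
    Success = 2,       // 2xx
    Redirection = 3,   // 3xx
    ClientError = 4,   // 4xx
    ServerError = 5    // 5xx
}

public static class StatusClasses
{
    public static StatusClass Of(int statusCode)
    {
        if (statusCode < 100 || statusCode > 599)
            throw new ArgumentOutOfRangeException(nameof(statusCode), "Expected a three-digit code from 100 to 599.");
        // Integer division drops the last two digits, leaving the class digit.
        return (StatusClass)(statusCode / 100);
    }
}
```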
2xx Success Codes: Communicating “All Good”
When a request is handled successfully by a service, a 2xx status code is returned. The 2xx range tells the client “Everything worked as expected.” Even within success codes, choosing the right one adds clarity about how the request was processed. Here are the most common ones:
200 OK
What it means: A 200 OK response means the request was successful and the server is returning the result in the response body (if any). This is the most common status code – essentially “OK, here is the data you asked for” or “OK, the operation succeeded.”
When to use: Use 200 for successful GET requests (retrieving a resource or a collection), for PUT/PATCH requests that updated a resource, or for any POST request that doesn’t create a new resource (for example, a search operation or a login that just returns a token). In our Dog API, a request to GET /api/dogs/123 that finds the dog will return 200 along with the dog’s profile JSON in the body. Similarly, if you updated a dog’s profile with a PUT request, a 200 might indicate the update succeeded and perhaps return the updated resource in the body.
Why it matters: 200 is the default success code that clients will assume for a normal outcome. It’s important to return 200 (and not 204) when there is a response body. Conversely, don’t return 200 if something actually went wrong – that would mislead the client. As Vinay Sahni notes in his REST API guidelines, 200 OK is appropriate for a successful GET, PUT, PATCH or DELETE, or even a POST that doesn’t result in a new resource.
.NET example: In an ASP.NET controller, you typically return 200 by using the Ok(...) helper with the response data. For example:
[HttpGet("api/dogs/{id}")]
public ActionResult<Dog> GetDog(int id) {
    var dog = _dogService.FindById(id);
    if (dog == null) {
        return NotFound(); // 404 if no such dog
    }
    return Ok(dog); // 200 OK with the dog data in the body
}
In the above code, if the dog exists we return Ok(dog), which the framework translates to an HTTP 200 status with the dog object serialized in the response body. (If the dog isn’t found, we return a 404 Not Found; we’ll talk more about 404 in the 4xx section.)
201 Created
What it means: A 201 Created status means that the request was successful and a new resource was created as a result. It’s typically accompanied by a Location header pointing to the URL of the newly created resource. This is the proper response for POST requests that create new objects.
When to use: Use 201 when processing a POST that adds a new resource to the system. For instance, POST /api/dogs to create a new dog profile should return 201 on success. The body usually contains the newly created resource (or some representation of it), and the Location header should contain the URL where that resource can be fetched (e.g., /api/dogs/12345 if 12345 is the new ID). This makes it easier for clients to, for example, immediately navigate to or GET the new resource. According to REST best practices, “Response to a POST that results in a creation should be 201 Created and include a Location header”.
Why it matters: 201 provides a clear signal that something was created, which differentiates it from a generic 200. Clients (and developers reading logs) will know that a new record was made. This can also be important for user interfaces or follow-up actions (the client now knows the URL of the new resource to perhaps display or further manipulate). Not using 201 in create scenarios might force the client to parse the response body to figure out if a creation happened, or to guess the new resource’s URL – that’s less clean. Using 201 is all about self-descriptiveness of your API.
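On the client side, here is a minimal C# sketch of using that signal: check for 201 and read the Location header from HttpResponseMessage instead of guessing the URL. CreatedResponse is a hypothetical helper, and it assumes the Location value may be relative, so it resolves it against the request URL.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class CreatedResponse
{
    // Returns the absolute URL of the created resource, or null when the
    // response is not a 201 or carries no Location header.
    public static Uri? TryGetCreatedLocation(HttpResponseMessage response, Uri requestUri)
    {
        if (response.StatusCode != HttpStatusCode.Created) return null;
        Uri? location = response.Headers.Location;
        if (location is null) return null;
        // Location may be relative; resolve it against the request URL.
        return location.IsAbsoluteUri ? location : new Uri(requestUri, location);
    }
}
```

For a POST to https://example.test/api/dogs (a placeholder host) that returns 201 with Location: /api/dogs/12345, this yields https://example.test/api/dogs/12345, which the client can GET right away.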
.NET example: ASP.NET provides a convenient helper to return 201 with a Location header. You can use CreatedAtAction (or CreatedAtRoute) to both return the created object and set the Location header. For example:
[HttpPost("api/dogs")]
public ActionResult<Dog> CreateDog([FromBody] DogDto newDog) {
    if (!ModelState.IsValid) {
        // 400 if the input is invalid
        return BadRequest(ModelState);
    }
    if (_dogService.Exists(newDog.Name)) {
        // 409 Conflict if a dog with the same name already exists
        return Conflict("A dog with that name already exists.");
    }
    var created = _dogService.Add(newDog);
    // Return 201 Created with a Location header pointing to the new resource
    return CreatedAtAction(nameof(GetDog), new { id = created.Id }, created);
}
In this snippet, after validating the input and checking for conflicts, we call the service to add the new dog. We then return CreatedAtAction(...), which produces a 201 status. nameof(GetDog) references the GET action for a single dog, and the anonymous object { id = created.Id } fills in that route’s parameters to generate the URL. The result is an HTTP response with status 201, a header such as Location: https://<baseurl>/api/dogs/12345, and the created Dog object as JSON in the body. This way, the client immediately knows where the new dog resource lives.
204 No Content
What it means: 204 No Content indicates success but no response body. The server successfully processed the request and is not returning any content. This is typically used when there’s nothing to return (as opposed to 200, where you usually have a response body).
When to use: Use 204 for operations that successfully perform an action but don’t need to return data. Classic cases are DELETE requests (after deleting a resource, what would you return anyway?) and sometimes PUT/PATCH requests that update a resource without returning the updated representation. For example, if a client sends DELETE /api/dogs/123 and the dog is successfully removed, your service can return 204 No Content, basically saying “deleted successfully, and there’s no further information.” Another example: POST /api/dogs/123/vaccinations might record a vaccination and not need to return anything; a 204 tells the client “got it, vaccination recorded.”
Why it matters: 204 is useful to save bandwidth and signal “nothing else to see here.” If you returned 200 in these cases, the client might expect a body (even an empty JSON object). With 204, the client knows to expect no content. It’s a small thing, but it makes the API a bit more precise. Also, if you have a client that automatically deserializes JSON, a 204 tells it explicitly to skip parsing rather than trip over an empty body.
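A minimal sketch of a client that honors that distinction, assuming responses arrive as HttpResponseMessage and bodies are JSON. Dog and ResponseReader are hypothetical names; JsonSerializerDefaults.Web gives camelCase, case-insensitive property matching (.NET 5 and later).

```csharp
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical DTO for this example.
public record Dog(int Id, string Name);

public static class ResponseReader
{
    private static readonly JsonSerializerOptions WebJson = new(JsonSerializerDefaults.Web);

    // Returns default(T) for 204 instead of handing an empty body to the
    // JSON parser, which would throw a JsonException on empty input.
    public static async Task<T?> ReadBodyAsync<T>(HttpResponseMessage response)
    {
        response.EnsureSuccessStatusCode(); // throws for non-2xx codes
        if (response.StatusCode == HttpStatusCode.NoContent) return default;
        string json = await response.Content.ReadAsStringAsync();
        return JsonSerializer.Deserialize<T>(json, WebJson);
    }
}
```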
.NET example: To return 204 in ASP.NET, you can use the NoContent() helper. For instance, in an update scenario:
[HttpPut("api/dogs/{id}/vaccinations")]
public IActionResult UpdateVaccination(int id, [FromBody] VaccinationRecord record) {
    if (!_dogService.Exists(id)) {
        return NotFound(); // 404 if no such dog
    }
    _dogService.UpdateVaccination(id, record);
    return NoContent(); // 204: the update succeeded, nothing to return
}
Here we update a dog’s vaccination info. If the dog exists, we perform the update and return NoContent(). The client receives a 204 status with no body, which is their cue that the operation succeeded and there’s no further data. (If the dog didn’t exist, we returned 404; if the input were invalid, we might return 400 or 422, as we’ll see next.)
Summary of 2xx: In short, use 200 for normal responses with content, 201 for creations, 204 for empty successes. By using these appropriately, your API conveys exactly what happened. As an API design principle: “Use HTTP status codes to be meaningful” – a 200 tells a different story than a 201 or 204, even if all are “successful.” This extra semantic precision helps client developers and logs tremendously.
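To show the idea without ASP.NET, here is a framework-free sketch where each success helper carries its own status code, loosely mirroring Ok(), CreatedAtAction(), and NoContent(). ApiResult is a hypothetical type for illustration, not an ASP.NET Core API.

```csharp
using System;

// Hypothetical result type: status code, optional body, optional Location.
public sealed record ApiResult(int StatusCode, object? Body = null, string? Location = null)
{
    // 200: success with a body.
    public static ApiResult Ok(object body) => new(200, body);

    // 201: something new exists, so the client must be told where.
    public static ApiResult Created(string location, object body)
    {
        if (string.IsNullOrWhiteSpace(location))
            throw new ArgumentException("A 201 response needs a Location.", nameof(location));
        return new(201, body, location);
    }

    // 204: success with nothing to return, so no body can be attached.
    public static ApiResult NoContent() => new(204);
}
```

Making the status a property of the helper, rather than something each handler picks by hand, is one way to keep the 200/201/204 choice consistent across a team.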
4xx Client Error Codes: When the Request Is the Problem
The 4xx class of codes indicates client errors – the request was somehow incorrect or cannot be fulfilled as is. This could be due to bad input, missing authentication, forbidden action, nonexistent resource, etc. Using the right 4xx code helps the client quickly understand and fix the issue. Let’s examine the common ones in our context:
400 Bad Request
What it means: 400 Bad Request means the server cannot or will not process the request due to something that is perceived to be a client error. In other words, the request was malformed or invalid in some way.
When to use: Return 400 when the request data is syntactically incorrect or doesn’t pass basic validation. Typical scenarios:
JSON body cannot be parsed
Required fields are missing or of the wrong type
The format of an input (like an email or date) is wrong
In our API, if a client POSTs a new dog with invalid JSON (say, missing a curly brace) or with a required field like name left empty, the server should respond with 400. Essentially: “Your request is wrong; fix it and try again.”
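A minimal sketch of those two 400 cases, assuming a JSON body and System.Text.Json: a body that does not parse, and a required name that is missing or empty. DogDto and CreateDogParser are hypothetical; in ASP.NET Core, model binding and [ApiController] validation handle this for you.

```csharp
using System.Text.Json;

// Hypothetical request DTO; Name is assumed to be required.
public class DogDto
{
    public string? Name { get; set; }
}

public static class CreateDogParser
{
    private static readonly JsonSerializerOptions WebJson = new(JsonSerializerDefaults.Web);

    // Returns 400 for a body that does not parse or lacks a required field;
    // returns null when the body passes these basic checks.
    public static int? Check(string body, out DogDto? dto)
    {
        dto = null;
        try
        {
            dto = JsonSerializer.Deserialize<DogDto>(body, WebJson);
        }
        catch (JsonException)
        {
            return 400; // malformed JSON, e.g. a missing curly brace
        }

        if (dto is null || string.IsNullOrWhiteSpace(dto.Name))
            return 400; // required field missing or empty

        return null;
    }
}
```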
It’s worth noting that some teams use 400 for any validation error (even semantic ones), covering cases others would assign to 422; we’ll discuss 422 Unprocessable Entity soon. The key is to use 400 for clear-cut bad requests. As Vinay Sahni describes: “400 Bad Request – The request is malformed, such as if the body does not parse.”
Why it matters: 400 distinguishes client-side mistakes from other errors. If your service returns 400, the caller knows the error is on their side. This is very different from a 500, which implies the client did everything right and the server needs fixing. By properly returning 400 for bad input, you signal to API consumers (and to monitoring systems) that the error was due to a bad request. This prevents unnecessary alerts on your side and helps client developers quickly find issues in their usage of your API.
.NET example: In ASP.NET, model binding and model validation make it easy to generate 400 responses. If you decorate your DTO with validation attributes (data annotations such as [Required]) and then check ModelState.IsValid, you can return BadRequest(...). If you apply the [ApiController] attribute, the framework validates the model automatically and returns a 400 with error details before your action runs, which makes a manual ModelState.IsValid check redundant. In our earlier CreateDog snippet, we had:
if (!ModelState.IsValid) {
    return BadRequest(ModelState); // 400 Bad Request
}
This returns 400 with details about which fields failed validation. You could also keep it simpler: return BadRequest("Invalid dog data"); with a custom message. The client then knows to fix the request (maybe they omitted the name or used an invalid format for a field).
Another example: if someone calls GET /api/dogs?id=abc and your API expects an integer, the framework might automatically treat that as a bad request (since “abc” can’t convert to int) and return a 400 for you. This tells the client it used the API incorrectly.
401 Unauthorized
What it means: 401 Unauthorized means the request has not been applied because it lacks valid authentication credentials. Despite the name “Unauthorized,” it really is about authentication (who are you?) rather than authorization (what are you allowed to do?).
When to use: Use 401 when the request requires user authentication and the client did not provide it or provided invalid credentials (such as a bad or expired token). For instance:
If our API requires a valid API key or JWT on a request, and the client calls GET /api/dogs/123 without a token or with a wrong one, the service should return 401.
If a user is not logged in and tries to access a protected endpoint, 401 is appropriate.
In short, 401 says “You are not authenticated. Please authenticate and try again.” The client can attempt to resolve this by providing credentials (logging in, refreshing a token, etc.). It’s not saying “you can never access this”; it’s saying “not in this state (unauthenticated).”
Why it matters: In a microservices environment, clear auth errors are crucial. A 401 tells any intermediaries (like gateways) and the client that the issue is authentication. Many frameworks and tools (like HTTP client libraries or browsers) know to react to 401 by, for example, prompting the user to log in or retrying with credentials. Using 401 vs 403 correctly also enhances security: 401 for missing or invalid credentials, 403 for valid credentials but a forbidden action. The distinction can prevent information leakage. For example, if a resource requires auth, you don’t want to reveal its existence to an unauthenticated request, and a 401 is the correct generic response. As a best practice: 401 when credentials are missing or invalid, 403 when credentials are valid but lack permissions.
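The rule fits in a few lines. A minimal sketch, where the two booleans stand in for whatever your authentication and authorization checks actually produce (AccessDecision is a hypothetical name):

```csharp
public static class AccessDecision
{
    // Returns null when the request may proceed.
    // Authentication is checked first: without a known identity,
    // "forbidden" has no meaning yet.
    public static int? StatusFor(bool hasValidCredentials, bool isPermitted)
    {
        if (!hasValidCredentials) return 401; // who are you? authenticate, then retry
        if (!isPermitted) return 403;         // we know who you are; the answer is no
        return null;
    }
}
```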
.NET example: In ASP.NET, you often don’t manually return 401 – the framework’s authentication middleware does it for you when authentication fails. For example, if you have JWT Bearer auth and the token is missing or wrong, the middleware will short-circuit and return 401 automatically. You can also explicitly return Unauthorized() from a controller if needed:
if (User.Identity?.IsAuthenticated != true) { // also covers a null Identity
    return Unauthorized(); // returns 401
}
Typically, though, [Authorize] attributes on controllers handle this. It’s worth noting that Unauthorized() in .NET corresponds to 401, whereas there is a separate helper, Forbid(), for 403 Forbidden.
403 Forbidden
What it means: 403 Forbidden means the server understood the request and the user is authenticated, but they do not have permission to perform this action. It’s an authorization issue – “You’re logged in, but you’re not allowed to do this.”
When to use: Return 403 when the user’s credentials are recognized but they don’t have the right privileges or access level for the resource or operation. Examples:
The client’s API token is valid but does not include the scope to delete a dog, and they attempted DELETE /api/dogs/123. Your Dog service should return 403 Forbidden in this case.
A user is trying to access a dog profile that they don’t own or shouldn’t see. If authentication succeeded (the user is logged in) but this particular dog is off-limits, 403 is the right response.
Any operation where the request is well-formed and the user is authenticated, but the authorization policy says “nope, not allowed.”
Why it matters: Using 403 appropriately, in tandem with 401, completes the security story of your API. It tells the client, “You can’t have this even though we know who you are.” If you always returned 401 for both unauthenticated and unauthorized cases, clients would get confused (do I need to re-authenticate, or is it fundamentally not allowed?).
A spike in 403 errors might indicate attempted access violations or misconfigured permissions, whereas 401 spikes might indicate an authentication problem (like an auth server down or tokens expired). They are different scenarios and should be distinguished. Following the principle from Vinay’s API guidelines: “403 Forbidden – when authentication succeeded but authenticated user doesn’t have access to the resource”.
.NET example: Similar to 401, ASP.NET will often handle 403 via the [Authorize] attribute and your authorization configuration. For example, if you use role- or policy-based authorization and a user lacks a required role, the framework will return 403 Forbidden. You can also manually return Forbid() in a controller to send a 403. For example:
[Authorize] // user must be logged in
[HttpDelete("api/dogs/{id}")]
public IActionResult DeleteDog(int id) {
    var dog = _dogService.FindById(id);
    if (dog == null) return NotFound();
    if (!User.HasClaim("CanDeleteDog", "true")) {
        return Forbid(); // 403 if user is not allowed to delete
    }
    _dogService.Remove(id);
    return NoContent();
}
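In this snippet, we check an imaginary claim and return Forbid() if the user isn’t allowed to delete the dog. Forbid() hands the decision to the authentication handler: with JWT bearer authentication the result is a 403 Forbidden, while cookie authentication redirects to an access-denied page by default.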
404 Not Found
What it means: 404 Not Found means the server can’t find the requested resource. The client might be requesting an endpoint that doesn’t exist or an entity by ID that isn’t present.
When to use: Use 404 when:
The URL is wrong or no longer exists (like /api/dogz/123 with a typo, or an outdated endpoint).
The resource ID doesn’t exist. In our Dog API, if a client requests GET /api/dogs/99999 but there is no dog with ID 99999, return 404. Similarly, if they try to update or delete a non-existent record, 404 is appropriate.
Essentially, whenever a resource cannot be found on the server.
Note that if the resource exists but the user isn’t allowed to see it, and you want to hide its existence, you might also return 404 to that user. But generally, 404 is straightforward: record not found.
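If you do choose to conceal existence, the important part is that the forbidden case and the missing case return the same code. A minimal sketch, assuming the caller is already authenticated; ReadDecision and the concealForbidden flag are hypothetical:

```csharp
public static class ReadDecision
{
    // Assumes authentication already succeeded (otherwise it's a 401).
    // With concealForbidden on, "exists but off-limits" and "doesn't exist"
    // both return 404, so a caller can't probe for valid IDs.
    public static int StatusFor(bool exists, bool canView, bool concealForbidden)
    {
        if (!exists) return 404;
        if (!canView) return concealForbidden ? 404 : 403;
        return 200;
    }
}
```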
Why it matters: You’ve probably seen 404 errors just navigating the web. It’s important because it immediately tells the client that either they have a mistake in the URI, or the resource has been deleted. In microservices, this can happen for legitimate reasons (an ID was valid but the record was deleted by another service or user). Handling 404 correctly improves the user experience. For example, an app can show “Dog not found” to the user instead of generic failure. From an observability standpoint, 404s are usually not alerts (they often indicate user input error or outdated references), so filtering them out of error alerts is common. As an API guideline: “404 Not Found – when a non-existent resource is requested.”
.NET example: We saw this in the GetDog example earlier. Using return NotFound(); will produce a 404. To reinforce:
[HttpGet("api/dogs/{id}")]
public ActionResult<Dog> GetDog(int id) {
    var dog = _dogService.FindById(id);
    if (dog == null) {
        return NotFound(); // returns 404 Not Found
    }
    return Ok(dog); // returns 200 OK if found
}
This pattern of checking for null and returning NotFound is very common in Web API controllers. It clearly separates the “not found” case from the success case. In a list-fetch scenario (say, GET /api/dogs?name=Diesel), if there are no results, one might choose to return 200 with an empty list instead of 404 (the endpoint exists; the query just has no data to return). 404 is more for a singular resource that isn’t present.
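A minimal in-memory sketch of that split, with a hypothetical DogQueries class standing in for a real data store: a lookup by ID returns 404 when nothing matches, while a search returns 200 with a possibly empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTO for this example.
public record Dog(int Id, string Name);

public sealed class DogQueries
{
    private readonly List<Dog> _dogs;

    public DogQueries(IEnumerable<Dog> dogs) => _dogs = dogs.ToList();

    // Single resource: absent means 404.
    public (int Status, Dog? Found) GetById(int id)
    {
        Dog? dog = _dogs.FirstOrDefault(d => d.Id == id);
        return dog is null ? (404, null) : (200, dog);
    }

    // Collection query: the endpoint exists, so an empty result is still 200.
    public (int Status, IReadOnlyList<Dog> Dogs) SearchByName(string name)
    {
        var matches = _dogs
            .Where(d => string.Equals(d.Name, name, StringComparison.OrdinalIgnoreCase))
            .ToList();
        return (200, matches);
    }
}
```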
409 Conflict
What it means: 409 Conflict indicates that the request could not be processed because of a conflict with the current state of the resource. The server is basically saying “there’s a logical conflict, so I can’t do this unless you resolve the conflict.”
When to use: Common use cases for 409:
Unique constraint violations: If the client attempts to create a resource that conflicts with an existing one. For example, if dogs are identified by name and the client tries to create another dog with the same name (assuming uniqueness), you might return 409 to indicate this conflict. (We showed an example check in our CreateDog code: we returned 409 if a dog with the same name exists.)
Edit conflicts / concurrency control: Suppose the Dog API supports optimistic locking (each dog profile has a version number). If two clients try to update the same dog simultaneously, one update might conflict with the other. The second update could receive a 409 Conflict indicating “the state you tried to update has changed, your update conflicts.” The client might then fetch the latest version and retry. This is a classic use of 409 in REST to handle concurrent updates.
Basically, anytime the request can’t be completed due to some resource state that the client might not be aware of.
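A minimal sketch of the version check behind an edit conflict, assuming each dog profile carries a version number the client read earlier. VersionedDogStore is hypothetical and not thread-safe; a real store would do the compare-and-update atomically (for example, with a row version in the database).

```csharp
using System.Collections.Generic;

public sealed class VersionedDogStore
{
    private readonly Dictionary<int, (string Name, int Version)> _dogs = new();

    public void Seed(int id, string name) => _dogs[id] = (name, 1);

    public int VersionOf(int id) => _dogs[id].Version;

    // expectedVersion is the version the client last read.
    public int Update(int id, int expectedVersion, string newName)
    {
        if (!_dogs.TryGetValue(id, out var current)) return 404;
        if (current.Version != expectedVersion) return 409; // someone else updated first
        _dogs[id] = (newName, current.Version + 1);
        return 204;
    }
}
```

Two clients that both read version 1 can race: the first update wins and bumps the version to 2, and the second gets 409, refetches, and retries against version 2.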
As Martin Fowler has pointed out in a discussion on API design, 409 is a useful code for situations like business rule violations too. For instance, one could consider an attempt to perform an operation that violates a business invariant as a conflict. In an example from a banking context, Fowler favored using 409 when a withdrawal request couldn’t be processed due to insufficient funds (rather than 400), treating it as a state conflict with the account’s resource state. In our dog context, that might be like trying to register a dog twice for the same event – one could argue that’s a conflict.
Why it matters: It’s not a client-format error (400), not an auth issue (401/403), and not a server bug (500); it’s a logical conflict. 409 informs the client that repeating the exact same request will not succeed until something about the resource’s state changes. This often prompts either user action or client logic to resolve the conflict. For example, a client that gets 409 on creating “Diesel” because there can only be one Diesel can tell the user “choose a different name” rather than blindly retrying. Or, for an update conflict, the client knows to GET the latest state and merge changes. By using 409, your API communicates that there’s nothing wrong with the request format and the server is fine, but the requested action can’t be done in the current state. This is very helpful in microservices where concurrent writes or uniqueness constraints across services can happen. It’s also great for observability: you can track 409s to see how often clients hit conflicts.
.NET example: There is a Conflict() helper in ASP.NET for 409. We used it in the earlier CreateDog example:
if (_dogService.Exists(newDog.Name)) {
    return Conflict("A dog with that name already exists.");
}
This returns a 409 Conflict with a message in the body. Another scenario:
[HttpPut("api/dogs/{id}")]
public IActionResult UpdateDog(int id, [FromBody] DogUpdateDto dto) {
    if (!_dogService.Exists(id)) return NotFound();
    try {
        // Assume UpdateDog throws a (hypothetical) ConcurrencyException
        // when the stored version no longer matches the client's version
        _dogService.UpdateDog(id, dto);
        return NoContent(); // 204
    } catch (ConcurrencyException) {
        return Conflict("Dog profile was updated by someone else. Please refresh and retry."); // 409
    }
}
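Here, if our service layer throws a ConcurrencyException because, say, an ETag or version check failed, we catch it and return Conflict() to inform the client of the edit conflict. (ConcurrencyException is a stand-in; with Entity Framework Core, the equivalent is DbUpdateConcurrencyException.)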
422 Unprocessable Entity
What it means: 422 Unprocessable Entity means the server understands the content type of the request and its syntax is correct, but it was unable to process the contained instructions. In simpler terms, the request is well-formed, but semantic errors in its content make it unprocessable.
When to use: 422 is often used for validation errors where the request format is correct (hence not a 400), but the content fails business rules or more complex validation:
For example, in the Dog API, suppose POST /api/dogs requires a valid birth date for the dog. If the client provides a date in the future, that’s semantically invalid (a dog can’t be born in the future). The server could respond with 422 Unprocessable Entity, with a message like “Birth date cannot be in the future.”
Another example: the client attempts an action that is well-formed but not allowed, such as an “update vaccination” request where the vaccination data is internally inconsistent or violates a rule (e.g., a vaccination date before the dog’s birth date, or adding a vaccination the dog already has). The server might return 422 to indicate “I understood your request, but I can’t process these specifics.”
In practice, some teams choose to use 400 for all kinds of validation errors (treating “missing required field” and “field value out of range” both as 400). Others use 422 to mean “the request payload was syntactically correct JSON and maybe partially valid, but there are domain-specific issues with it.” It’s a nuanced distinction. You did set up bounded contexts when determining your services, right? Right? Vinay Sahni’s guidelines list “422 Unprocessable Entity – Used for validation errors”, which reflects this common usage.
Why it matters: If you choose to use 422, it gives client developers a clue that “your request was understood and validated, but there are issues you need to correct.” The difference between 400 and 422 can be subtle, but it can help in large systems to separate pure format errors from semantic validation. For example, monitoring a spike in 400s might indicate a bug in how clients are formatting requests (or a change in the API spec), whereas a spike in 422s might indicate lots of users hitting a business rule (maybe an overly strict rule or a UI that allows invalid data to be submitted). It also allows the response body to focus on detailed validation errors, since a 422 is clearly about that. Using 422 is a way of saying “all your syntax was correct, but the request as a whole is unacceptable in its current form.”
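One way to encode the split, as a minimal sketch: structural problems (required data missing) return 400, and a well-formed request that breaks a domain rule returns 422. NewDog and DogValidator are hypothetical, DateOnly needs .NET 6 or later, and today is passed in so the check is deterministic.

```csharp
using System;

// Hypothetical payload; nullable fields make "missing" detectable.
public record NewDog(string? Name, DateOnly? BirthDate);

public static class DogValidator
{
    // Returns null when valid; otherwise a status code and message.
    public static (int Status, string Message)? Validate(NewDog dog, DateOnly today)
    {
        // Structural problems: required data absent -> 400.
        if (string.IsNullOrWhiteSpace(dog.Name)) return (400, "name is required.");
        if (dog.BirthDate is null) return (400, "birthDate is required.");

        // Semantic problem: well-formed, but violates a domain rule -> 422.
        if (dog.BirthDate.Value > today) return (422, "Birth date cannot be in the future.");

        return null;
    }
}
```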
.NET example: ASP.NET Core provides an UnprocessableEntity() helper on ControllerBase (added in ASP.NET Core 2.1), so we can use it just like the other helpers:
[HttpPost("api/dogs/{id}/vaccinations")]
public IActionResult AddVaccination(int id, [FromBody] VaccinationRecord record) {
    if (!_dogService.Exists(id)) return NotFound();
    try {
        _dogService.AddVaccination(id, record);
        return NoContent();
    } catch (InvalidOperationException ex) {
        // The vaccination record is invalid (maybe the vaccine doesn't apply at the dog's age)
        return UnprocessableEntity(new { error = ex.Message });
    }
}
In this example, if the service throws an exception because the vaccination data didn’t pass some business rule (maybe the dog is too young for rabies vaccine, etc.), we catch it and return 422 Unprocessable Entity with an error message. The client sees a 422 and knows “my data was understood, but it failed validation; I need to adjust it.”
If you use [ApiController], you could also customize the automatic validation response (via ApiBehaviorOptions.InvalidModelStateResponseFactory) to return 422 instead of 400 for certain cases, but that’s beyond our scope here. The main idea is: use 422 (if you choose to) for semantic validation failures.
A note on 400 vs 422: There is some debate on using 422 vs 400 for validation. Documentation and consistency are key here when you design your API. If you opt not to use 422, it’s fine to return 400 for all invalid input cases. Just ensure your clients know how to differentiate error causes via error messages or error codes in the response body. The advantage of 422 is simply one extra layer of clarity.
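If you go the error-code route, here is a minimal sketch of a body with a stable, machine-readable code next to the human-readable message. ApiError and ErrorBodies are hypothetical names; ASP.NET Core’s built-in ProblemDetails type (RFC 7807 problem details) is the standard alternative worth considering.

```csharp
using System.Text.Json;

// Hypothetical error body: a stable code for programs, a message for people.
public record ApiError(string Code, string Message);

public static class ErrorBodies
{
    private static readonly JsonSerializerOptions WebJson = new(JsonSerializerDefaults.Web);

    // Serializes with camelCase names, e.g. {"code":"...","message":"..."}.
    public static string ToJson(ApiError error) => JsonSerializer.Serialize(error, WebJson);

    public static ApiError? FromJson(string json) => JsonSerializer.Deserialize<ApiError>(json, WebJson);
}
```

A client can then branch on error.Code (say, “BIRTHDATE_IN_FUTURE”) without parsing the message text, which stays free to change.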
5xx Server Error Codes: When Things Go Wrong on the Server (or Beyond)
The 5xx class indicates the server failed to fulfill a valid request. These are not the client’s fault; something went wrong on the server side or in a downstream service. Clients typically can’t fix these – but they might retry later. For microservices, distinguishing 5xx errors is vital for operations: a spike in 5xx means something needs investigation on the server side. Let’s cover the two big ones in our context: 500 and 504.
500 Internal Server Error
What it means: 500 Internal Server Error is the generic catch-all for “the server encountered an unexpected condition that prevented it from fulfilling the request.” It’s essentially “something blew up on our end.”
When to use: Return 500 when no other specific 5xx code fits, and the error is indeed on the server. Common scenarios:
Unhandled exceptions in code (this never happens to you though). For example, a null reference exception, or an overflow, or any bug that wasn’t caught will typically result in a 500.
Database connection failures. If your service tries to fetch data and the database is down or throws an error that isn’t specifically handled, that might bubble up as a 500.
Essentially, any time your service logic fails unexpectedly. If you anticipated the failure, you might choose a more specific code: 503 Service Unavailable when the service (or something it depends on) is temporarily unable to handle requests, for example during planned downtime or overload, though load balancers often return it for you; or 504 Gateway Timeout if a downstream service is timing out. But if it’s a surprise, it’s a 500.
In our API, if a GET request triggers an exception (maybe the database query threw), the user would get a 500. If a POST triggers a bug in business logic resulting in an exception not caught, 500 is returned. Ideally, your code catches exceptions and maybe converts them to a nicer error response (possibly a 400 if it was due to bad data, or 503 if a dependency is not available). But anything unanticipated bubbles up as 500.
Why it matters: 500 is how your service barks for help (I’m trying to keep the dog theme going). Monitoring systems will flag 500s as errors needing attention. As a rule, a well-designed microservice should minimize how often it returns 500 by handling expected error scenarios gracefully, using the appropriate 4xx or 5xx code for each specific condition. That way, when a 500 does occur, it’s usually a true bug or outage. For the client, a 500 means “you did everything right, but the server failed; you can’t fix this from your end.” Clients might then give up or retry after some delay, depending on the operation. From a maintainability perspective, 500s in the logs are your cue to dive into server-side debugging, and they often correlate with exceptions in your logs. One mantra: 500 errors should not be part of normal business logic. If you find yourself intentionally returning 500 for expected conditions, consider using a different code. 500 should be reserved for “unexpected” failures; it’s literally in the definition.
.NET example: By default, if an ASP.NET controller throws an uncaught exception, the framework will return a 500 Internal Server Error (possibly with a generic error payload, or none, depending on your settings). You typically don’t manually return StatusCode(500) unless you caught an exception and want to wrap it. For example:
try {
    // ...some operation...
} catch (Exception ex) {
    _logger.LogError(ex, "Unexpected error in UpdateDog");
    return StatusCode(StatusCodes.Status500InternalServerError, "An unexpected error occurred.");
}
This catches any exception and returns 500 with a message. In many cases, you’d let the global exception handler or middleware handle it instead. The key point is that you, as a developer, focus on preventing these. But you might use StatusCode(500, ...) if you have custom error handling logic and want to provide a custom error body.
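Whether you catch exceptions in the action or in global middleware, the decision usually comes down to a mapping from exception type to status code. Here is a minimal sketch of such a mapping, written as a plain function so it runs outside ASP.NET. Mapping HttpRequestException to 503 is an illustrative assumption (that exception can also signal other upstream problems); in a real app this logic would live in your exception-handling middleware.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Maps an exception that escaped an action to the status code to return.
// Anticipated failures get a specific code; everything else is a 500.
int StatusCodeFor(Exception ex) => ex switch
{
    // An upstream call we were waiting on ran out of time.
    TimeoutException => 504,
    // HttpClient.Timeout elapsing (on .NET 5+: TaskCanceledException wrapping TimeoutException).
    TaskCanceledException { InnerException: TimeoutException } => 504,
    // Assumption: the dependency could not be reached at all (e.g. connection refused).
    HttpRequestException => 503,
    // Unanticipated: by definition a 500 Internal Server Error.
    _ => 500
};

Console.WriteLine(StatusCodeFor(new TimeoutException()));              // 504
Console.WriteLine(StatusCodeFor(new HttpRequestException("refused"))); // 503
Console.WriteLine(StatusCodeFor(new NullReferenceException()));        // 500
```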
504 Gateway Timeout
What it means: 504 Gateway Timeout indicates that a server, acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to contact in order to complete the request. In essence, one server was calling (or waiting on) another, the other didn’t respond in time, and so the chain timed out.
When to use: Use 504 in a microservice when your service is dependent on an upstream service or external API and that call times out. Scenarios:
Your service calls an external Pedigree API (let’s say a third-party service that gives detailed lineage info). The client calls your endpoint GET /api/dogs/123/pedigree, and your service in turn calls the Pedigree API to fetch data. If the Pedigree API doesn’t respond within your timeout window, you should return 504 Gateway Timeout to the client. This tells the client that the server did not get a response from an upstream dependency.
A more internal example: Service A calls Service B as part of a request. Service B hangs or is offline, so Service A’s request to B times out. Service A can return 504 to its caller (which might be a user or another service) to indicate “I couldn’t complete your request because a downstream service didn’t respond.”
Also, API Gateways or load balancers themselves often return 504 if one of the downstream microservices doesn’t respond in time. For instance, an Nginx proxy might give a 504 if the backend took too long. But here we’re focusing on your microservice actively returning 504 when it waits on something else.
Why it matters (especially in vendor-dependent systems): In microservice ecosystems that rely on third-party vendors (payment gateways, mapping APIs, etc.), timeouts are a fact of life. Emphasizing the 504 scenario is critical because it’s about graceful degradation. If a vendor API is slow or down, your service should not hang indefinitely, nor should it pretend everything is fine. It should fail fast and inform the client with a 504. This has several benefits:
Clarity: The client (or calling service) knows the error is due to an upstream timeout. They might choose to implement a retry strategy with backoff, or present a specific message to the user (“The service is experiencing delays from a downstream provider, please try again later.”). If you simply returned 500, the client wouldn’t know it was a timeout vs a bug in your code.
Resource freeing: By timing out and returning 504, your service frees up resources (threads, memory) that would otherwise be stuck waiting. It’s better to fail and report than to tie up resources on a lost cause. Sam Newman states that setting timeouts is key to building resilient services – without them, calls could hang forever and cascade issues.
Observability & Monitoring: A rise in 504 errors specifically can alert you (and the vendor) that something is wrong with the upstream service’s performance. You might have dashboards showing 504s separate from 500s. This is gold for quickly diagnosing issues in a complex chain. For example, if Service A returns a bunch of 504s, you immediately check Service B or the third-party system it depends on. It narrows down the problem domain.
Maintainability: Designing with 504 in mind forces you to think about timeout strategies and fallback plans, which leads to more robust code. Perhaps you implement a circuit breaker: after several 504s, you stop calling the vendor for a while and fail immediately (or degrade functionality) to avoid cascading latency. Martin Fowler notes that circuit breakers help “avoid waiting on timeouts for the client” and prevent overloading a struggling upstream by short-circuiting calls. In practice, a circuit breaker might internally treat repeated timeouts as errors and, for a period, return an error immediately (maybe a 503 or 504) without attempting the upstream call, until the upstream seems healthy again. This spares your system extra load and gives the upstream time to recover.
User Experience: If you return 504 quickly, the user isn’t left staring at a spinning loader for a minute only to get an error anyway. Failing fast lets the client call a fallback service, or at least inform the user promptly.
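The circuit breaker idea can be sketched in a few lines. This is a minimal, single-threaded illustration, not Polly or any other library's API: after a hypothetical threshold of three consecutive upstream timeouts the circuit opens, and for the next 30 seconds calls fail fast with 503 without touching the upstream; after that, a trial call is let through. The threshold, the cool-down, the 503 choice, and the `callUpstream` delegate (returning false on timeout) are all assumptions. The current time is passed in so the behavior is easy to test.

```csharp
using System;

// Hypothetical settings for this sketch.
const int failureThreshold = 3;
TimeSpan openDuration = TimeSpan.FromSeconds(30);

// Breaker state (single-threaded sketch: no locking).
int consecutiveFailures = 0;
DateTimeOffset? openedAt = null;

// callUpstream: hypothetical stand-in for the vendor call; returns false on timeout.
int Execute(Func<bool> callUpstream, DateTimeOffset now)
{
    // Open: fail fast without touching the struggling upstream.
    if (openedAt is not null && now - openedAt < openDuration)
        return 503;

    // Closed, or cool-down over (half-open): try the real call.
    if (callUpstream())
    {
        consecutiveFailures = 0;
        openedAt = null; // healthy again: close the circuit
        return 200;
    }

    consecutiveFailures++;
    // Trip after too many timeouts, or re-trip if the half-open trial call failed.
    if (consecutiveFailures >= failureThreshold || openedAt is not null)
        openedAt = now;
    return 504; // this particular call timed out upstream
}

var start = DateTimeOffset.UtcNow;
for (int i = 0; i < 4; i++)
    Console.WriteLine(Execute(() => false, start));          // 504, 504, 504, then 503
Console.WriteLine(Execute(() => true, start.AddSeconds(31))); // 200: trial call succeeded
```

A production breaker also needs thread safety and usually limits the half-open state to a single trial call; libraries such as Polly handle those details.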
In summary, a 504 is the correct way to propagate an upstream timeout condition. It says: “I, the gateway, timed out waiting for a response from another server.” Contrast this with a 503 Service Unavailable, which typically means “the server itself is temporarily overloaded or down.” A 504 pinpoints the problem to an upstream dependency.
.NET example: Suppose our Dog API has an endpoint to get a dog’s pedigree from an external service:
// Here the Dog API acts as a gateway to the third-party pedigree service
[HttpGet("api/dogs/{id}/pedigree")]
public async Task<IActionResult> GetPedigree(int id) {
    if (!_dogService.Exists(id)) {
        return NotFound(); // 404 if dog not found
    }
    try {
        var pedigree = await _pedigreeService.GetPedigreeAsync(id);
        return Ok(pedigree); // 200 OK with data if successful
    } catch (Exception ex) when (ex is TimeoutException
                              || ex is TaskCanceledException { InnerException: TimeoutException }) {
        // The call to the 3rd-party pedigree service timed out.
        // HttpClient.Timeout surfaces as a TaskCanceledException wrapping a
        // TimeoutException on .NET 5+, so we catch both shapes.
        return StatusCode(StatusCodes.Status504GatewayTimeout,
            "Pedigree service did not respond in time"); // 504
    } catch (Exception ex) {
        // Some other error in calling external service or processing
        _logger.LogError(ex, "Unexpected error getting pedigree");
        return StatusCode(StatusCodes.Status500InternalServerError, "Internal error"); // 500
    }
}
In this snippet, _pedigreeService.GetPedigreeAsync(id) represents a call to the external vendor (perhaps using HttpClient under the hood). We wrap it in a try/catch. If it throws a TimeoutException (meaning we hit our timeout without a response), we return a 504 Gateway Timeout with a message. We treat any other exception as a generic 500. Notice that we check whether the dog exists first, to handle 404 separately: a missing dog is not an upstream timeout issue.
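It’s important that we set a timeout on the external call. If you never set one deliberately, you might never get that TimeoutException, and the request could hang for a long time (HttpClient’s default Timeout is 100 seconds; other clients may have no timeout at all). Best practice is to use a timeout mechanism on HttpClient, such as the HttpClient.Timeout property, or a CancellationTokenSource (but we’ll talk about CancellationTokens some other time). Be aware that when HttpClient.Timeout elapses, HttpClient throws a TaskCanceledException rather than a TimeoutException (on .NET 5 and later, the TimeoutException is its InnerException), so either your service wrapper translates it or the controller must catch it. By setting a timeout, you ensure that after X seconds of no response, you abandon the call and return 504. This is the fail fast principle in action. As Newman stresses in Building Microservices, timeouts are your first line of defense: they prevent your system from waiting indefinitely.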
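To see the timeout-to-504 translation end to end without a real vendor, the sketch below points HttpClient at a local TcpListener that accepts connections but never answers, which mimics a hung upstream. It assumes .NET 5 or later (where the timeout surfaces as TaskCanceledException with an inner TimeoutException). The 200 ms budget, the URL path, and FetchPedigreeStatusAsync are hypothetical; the function returns the status code our own endpoint would send.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading.Tasks;

// Simulated hung vendor: the OS completes the TCP handshake, but nobody ever
// reads the request or writes a response.
var hungUpstream = new TcpListener(IPAddress.Loopback, 0); // port 0 = any free port
hungUpstream.Start();
int port = ((IPEndPoint)hungUpstream.LocalEndpoint).Port;

// A short, deliberate timeout (hypothetical value); UseProxy = false keeps the
// request on loopback even if proxy environment variables are set.
using var http = new HttpClient(new HttpClientHandler { UseProxy = false })
{
    Timeout = TimeSpan.FromMilliseconds(200)
};

// Hypothetical helper: returns the status code our endpoint would send back.
async Task<int> FetchPedigreeStatusAsync(int dogId)
{
    try
    {
        using var response = await http.GetAsync($"http://127.0.0.1:{port}/pedigree/{dogId}");
        response.EnsureSuccessStatusCode(); // other upstream failures throw and fall through to the 500 path
        return 200;
    }
    catch (TaskCanceledException ex) when (ex.InnerException is TimeoutException)
    {
        // HttpClient.Timeout elapsed: report 504 Gateway Timeout instead of hanging.
        return 504;
    }
}

Console.WriteLine(await FetchPedigreeStatusAsync(123)); // 504, after roughly 200 ms
```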
Handling 504 in microservices: Beyond just returning 504, a robust service might also:
Retry the upstream call a couple of times before giving up (especially if the operation is read-only and idempotent). If a transient slowdown caused the timeout, a quick retry might succeed. If all retries fail, then return 504.
Implement a circuit breaker as mentioned, so that if the upstream is consistently timing out, you stop hammering it for a while. The circuit breaker could trigger a fallback – for example, return cached data or a default response if available, instead of an outright error. If no fallback is possible, 504 is still returned, but the circuit breaker ensures you recover faster when the upstream is back.
Log the timeout with context (which upstream, how long we waited) and possibly trigger alerts if it crosses a threshold.
Communicate with the vendor: if this is a third-party, your devops team might contact the vendor when seeing sustained 504s, while your service keeps returning 504 to clients to be transparent about the issue.
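The retry-then-504 behavior from the first item above might look like this. It's a minimal sketch assuming the upstream call is an idempotent read that signals a timeout with TimeoutException (for example, because a hypothetical wrapper translates HttpClient's exception). The attempt count, the exponential delays, and the helper names are illustrative.

```csharp
using System;
using System.Threading.Tasks;

// Retries an idempotent upstream read on timeout, with exponential backoff.
// Returns the status code our endpoint would send: 200 on success, 504 once retries run out.
async Task<int> GetWithRetriesAsync(Func<Task> callUpstream, int maxAttempts = 3, int baseDelayMs = 100)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await callUpstream();
            return 200;
        }
        catch (TimeoutException) when (attempt < maxAttempts)
        {
            // Maybe a transient slowdown: wait 100 ms, 200 ms, 400 ms, ... and try again.
            await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
        }
        catch (TimeoutException)
        {
            // Out of attempts: surface the upstream timeout to our caller.
            return 504;
        }
    }
}

// Hypothetical flaky upstream: times out twice, then answers.
int calls = 0;
Task FlakyUpstream()
{
    calls++;
    return calls < 3 ? Task.FromException(new TimeoutException()) : Task.CompletedTask;
}

Console.WriteLine(await GetWithRetriesAsync(FlakyUpstream)); // 200 (third attempt)
Console.WriteLine(await GetWithRetriesAsync(() => Task.FromException(new TimeoutException()))); // 504
```

Adding random jitter to the delay helps keep many callers from retrying in lockstep.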
From the client’s perspective, a 504 might mean they should try again later. If it’s a user-facing scenario, you might show a friendly error like “We’re experiencing delays from our data provider. Please try again in a few minutes.” If it’s service-to-service, the calling service might catch 504 and decide to either propagate it further up or implement its own fallback.
To put it in the words of an error reference: “A 504 error indicates that the web server (acting as a gateway) was waiting too long for a response from another server and timed out”. This is precisely why we emphasize it when leveraging the services of third parties.
Status Codes Matter
HTTP status codes might seem like small numeric signals, but as I’ve shown, they carry a lot of weight in microservice-based systems. When you build distributed systems, thinking deliberately about which code to return in each scenario is part of designing a clear and maintainable API. And taking that thought further, nearly everything is a distributed system.
By using the correct codes:
You make your APIs self-explanatory (a new developer can read the code or API docs and immediately understand what 401 vs 403 or 409 means in your context).
You enhance observability, since tools can rely on the status codes to measure the health and behavior of your services (e.g., tracking 5xx rates for instability, 4xx for client misuse, etc.).
You improve client handling of errors: well-behaved clients will read a 409 and not retry immediately (perhaps prompting the user instead), but might retry on 503 or 504 after a delay. They’ll follow the Location header of a 201 to fetch the newly created resource if needed, or prompt for authentication on a 401. In essence, you play into the HTTP ecosystem’s established patterns.
You ensure maintainability and consistency across services. If every vendor and team follows these practices, services can work together more easily, and new engineers can be onboarded quickly because the APIs follow industry standards.
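On the consuming side, the client behaviors listed above can be captured in a small decision function. This is a minimal sketch of one possible policy, not a standard; the returned action names are hypothetical labels, and a real client would attach behavior (backoff, re-authentication, user prompts) to each branch.

```csharp
using System;

// One possible client-side policy; the returned labels are hypothetical.
string ActionFor(int status) => status switch
{
    >= 200 and < 300 => "use-response",
    >= 300 and < 400 => "follow-redirect",
    401 => "authenticate",             // prompt for or refresh credentials
    409 => "surface-conflict",         // don't blindly retry; let the user resolve it
    >= 400 and < 500 => "fix-request", // our request is wrong; retrying won't help
    503 or 504 => "retry-later",       // transient: retry after a delay, with backoff
    _ => "fail"                        // other 5xx (and anything unexpected): report it
};

foreach (var code in new[] { 201, 401, 409, 422, 504, 500 })
    Console.WriteLine($"{code} -> {ActionFor(code)}");
```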
Martin Fowler and Sam Newman often remind us that the first rule of distributed system design is to acknowledge that the system is distributed: things will fail. Using status codes, contracts, and HTTP verbs properly is part of your API design. David Farley cautions against the pitfalls of misapplying microservices, and one such pitfall is neglecting fundamentals like clear API communication. On the flip side, embracing these fundamentals (like clear status codes) helps unlock the benefits of microservices by making services loosely coupled but strongly coherent in protocol.
Look, I’m not that creative; plenty of people smarter than me wrote these protocols. Follow the industry-standard definitions for status codes as outlined (there’s a reason these codes exist!). Document your API’s error responses. For critical integrations (like with vendors), establish and handle timeouts and use the appropriate codes. Leverage your framework: as we saw with ASP.NET, many helpers exist (Ok(), NotFound(), BadRequest(), etc.) to make doing the right thing easy.
By treating HTTP status codes not as an afterthought but as a core part of your API design, you’ll create services that are easier to debug, scale, and integrate. The result is a more resilient microservice ecosystem, one where, if something goes wrong, everyone knows exactly what’s going on just by looking at the HTTP responses. And as a bonus, the next developer to maintain your service will thank you for those clear 4xx/5xx signals instead of a mysterious "error": "something went wrong" with a 200 OK.
Let your microservices speak the language of HTTP clearly. A well-placed status code is worth a thousand words (or at least saves a trip to the logs). So whether you’re fetching a dog profile or updating a vaccination record, make sure your service barks the right code!
People and Sources that are Smarter than Me:
Newman, Sam. Building Microservices – Emphasizes designing services with clear contracts and handling failures gracefully (timeouts, etc.).
Fowler, Martin. Richardson Maturity Model – Discusses the importance of using HTTP verbs and codes in REST APIs.
Sahni, Vinay. Best Practices for REST API – Provides practical guidelines on status codes (e.g., 201 for create with Location header, 422 for validation).
Farley, Dave. Continuous Delivery & Microservices – Stresses getting the fundamentals right and avoiding complexity when it’s not adding value.
Microsoft ASP.NET Team (led by Scott Guthrie) – Built frameworks for proper use of HTTP status codes, highlighting their importance in API design.
Written by