Python - Different WSGI and ASGI Running Performance
This blog post gives an account of a problem I had to solve: comparing the performance of different Gunicorn worker modes in order to decide which worker mode to use for different kinds of workloads.
It demonstrates the performance of different worker models in Gunicorn (a Python WSGI HTTP server) - sync, gthread, and gevent - as well as an ASGI setup with Uvicorn and FastAPI, for a given spread of server work (a mix of CPU + IO).
The idea is to introduce a dummy workload that simulates some CPU work and some IO work (remote API calls). You can tune the test code below to the nature of the workload you expect to run: for a more CPU-heavy workload, increase the CPU time spent in the recur_fibo function; for an IO-bound workload with long-tailing API calls, increase the delay requested via the Latency request header.
Simplistic Latency Server
First, we need to simulate the external API calls that our server is going to make. To do this, estimate the total external-service IO time and stand up a latency server that simulates it. Here is the latency server, written in Go:
package main

import (
    "fmt"
    "log"
    "net/http"
    "strconv"
    "time"
)

// healthHandler responds immediately - a plain liveness endpoint.
func healthHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "healthy!")
}

// healthWithLatencyHandler sleeps for the number of milliseconds given in the
// "Latency" request header before responding, simulating a slow upstream API.
func healthWithLatencyHandler(w http.ResponseWriter, r *http.Request) {
    str_latency := r.Header.Get("Latency")
    latency, _ := strconv.Atoi(str_latency)
    time.Sleep(time.Duration(latency) * time.Millisecond)
    fmt.Fprintf(w, "%d, healthy!", latency)
}

func main() {
    http.HandleFunc("/health", healthHandler)
    http.HandleFunc("/healthWithLatency", healthWithLatencyHandler)
    fmt.Printf("Server started at port 8080\n")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}
As shown above, the healthWithLatencyHandler function is an HTTP handler that reads the Latency header value from an incoming request and responds after a delay of that many milliseconds. It is used to simulate latency in API responses for testing or debugging purposes.
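To sanity-check the latency server, a quick client like the following can be used (a minimal sketch - it assumes the Go server above is running locally on port 8080); it should show a round trip of roughly 150ms:
import time
import requests

start = time.perf_counter()
# ask the latency server to hold the response for ~150ms
res = requests.get(
    "http://localhost:8080/healthWithLatency",
    headers={"Latency": "150"}
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(res.text)                        # "150, healthy!"
print(f"round trip: {elapsed_ms:.1f} ms")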
Flask Server to do some CPU and IO work
Now we'll write a Flask server that does some CPU-bound work and some IO-bound work, as follows:
import requests
from flask import Flask
from pprint import pformat
import time

app = Flask(__name__)
LATENCY_SERVER = "<hosted endpoint of above go server>"

# Python program to display the Fibonacci sequence
def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

@app.route('/')
def remote_latencycall():
    # consider ext service latency = 150ms
    # consider local latency = 25ms
    # do some work
    ans = recur_fibo(25)
    res = requests.get(
        f"http://{LATENCY_SERVER}/healthWithLatency",
        headers={"Latency": "150"}
    )
    return pformat([res, res.text, ans]), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)
The above code is representative of the actual server code we're supposed to run. We simulate a remote API call latency of 150ms and dummy CPU-bound work of roughly 25ms. I timed the recur_fibo function as follows:
import timeit
import time

def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

starttime = timeit.default_timer()
recur_fibo(25)
print("Time taken = ", str((timeit.default_timer() - starttime)*1000))
This took about 25ms on average.
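If your expected workload has a different CPU budget, you can pick the Fibonacci depth empirically. A minimal sketch (the range of n values and the helper name are just illustrative):
import timeit

def recur_fibo(n):
    if n <= 1:
        return n
    return recur_fibo(n-1) + recur_fibo(n-2)

def fibo_time_ms(n, repeats=5):
    # best-of-N wall time for a single recur_fibo(n) call, in milliseconds
    return min(timeit.repeat(lambda: recur_fibo(n), number=1, repeat=repeats)) * 1000

# print timings so you can pick the n closest to your target CPU time
for n in range(20, 30):
    print(n, round(fibo_time_ms(n), 2), "ms")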
Running the test with ab for sync, gthread and gevent workers
We'll use the ApacheBench tool, i.e. ab, to run a load test against each of the above worker modes.
sync worker test - 9.80 [#/sec] (mean)
Using command (assuming the Flask app above is saved as wsgi.py) - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 -k sync --access-logfile -
And simulating the load via ab with -
timeout -s INT 20s ab -n 100000 -c 10 http://localhost:8080/
Output of the ab command is as follows -
Concurrency Level: 10
Time taken for tests: 19.991 seconds
Complete requests: 196
Failed requests: 0
...
Requests per second: 9.80 [#/sec] (mean)
Time per request: 1019.925 [ms] (mean)
Time per request: 101.993 [ms] (mean, across all concurrent requests)
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 2
Processing: 421 971 176.1 988 1460
Waiting: 421 970 175.8 985 1460
Total: 422 971 176.0 988 1460
Percentage of the requests served within a certain time (ms)
50% 988
66% 1086
75% 1119
80% 1126
90% 1145
95% 1172
98% 1414
99% 1420
100% 1460 (longest request)
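A quick sanity check on these numbers: ab keeps a fixed number of requests in flight, so throughput ≈ concurrency / mean time per request (Little's law), and since each of the 4 sync workers serves one request at a time, we can also back out the effective service time per request. A rough back-of-envelope check in Python, using the figures from the ab report above:
concurrency = 10
mean_time_per_request_s = 1.020   # "Time per request (mean)" from the ab report above
workers = 4

throughput = concurrency / mean_time_per_request_s    # ~9.8 req/s, matches the report
service_time_ms = workers / throughput * 1000          # ~408 ms of worker time per request
print(round(throughput, 1), round(service_time_ms))
The effective ~408ms per request is a lot more than the nominal 25ms CPU + 150ms IO, which suggests the real per-request cost on this test machine (Python and framework overhead, contention between the workers and the load generator) is noticeably higher than the back-of-envelope estimate.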
gthread worker test - 23.31 [#/sec] (mean)
Command - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 --threads 4 -k gthread --access-logfile -
Output from the ab test -
Time taken for tests: 19.990 seconds
Complete requests: 466
...
Requests per second: 23.31 [#/sec] (mean)
Time per request: 428.971 [ms] (mean)
Time per request: 42.897 [ms] (mean, across all concurrent requests)
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 343 415 48.3 404 744
Waiting: 343 414 48.2 402 742
Total: 343 416 48.3 404 744
Percentage of the requests served within a certain time (ms)
50% 404
66% 417
75% 427
80% 436
90% 453
95% 486
98% 599
99% 656
100% 744 (longest request)
gevent worker test - 23.71 [#/sec] (mean)
Command - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 --threads 4 -k gevent --access-logfile -
Output from ab -
Concurrency Level: 10
Time taken for tests: 19.992 seconds
Complete requests: 474
Failed requests: 0
...
Requests per second: 23.71 [#/sec] (mean)
Time per request: 421.780 [ms] (mean)
Time per request: 42.178 [ms] (mean, across all concurrent requests)
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 339 409 49.9 400 713
Waiting: 339 409 50.0 399 713
Total: 339 410 50.0 400 713
Percentage of the requests served within a certain time (ms)
50% 400
66% 410
75% 416
80% 422
90% 440
95% 468
98% 668
99% 693
100% 713 (longest request)
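For context on why gevent helps here: Gunicorn's gevent worker monkey-patches the standard library, so a blocking call like requests.get yields to other greenlets while it waits on the network, whereas CPU work such as recur_fibo still occupies the worker. A minimal standalone sketch of that mechanism (not part of the benchmark setup, just an illustration - it assumes the Go latency server from above is reachable on localhost:8080):
import gevent
from gevent import monkey

monkey.patch_all()  # make socket/ssl/time cooperative so blocking IO yields to other greenlets

import requests

def fetch():
    # looks blocking, but with the patched socket it yields while waiting ~150ms
    return requests.get(
        "http://localhost:8080/healthWithLatency",
        headers={"Latency": "150"},
    ).text

# ten "blocking" calls run concurrently in greenlets
jobs = [gevent.spawn(fetch) for _ in range(10)]
gevent.joinall(jobs)
print([job.value for job in jobs])
The ten "blocking" calls finish in roughly one 150ms window instead of ten, which is the same effect the gevent worker gives the outbound call in our Flask handler.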
ASGI with FastAPI and uvicorn test - 31.71 [#/sec] (mean)
For the ASGI server, we'll write similar code with FastAPI and run it with Uvicorn ASGI workers, as follows -
import http3  # async HTTP client (the package later renamed to httpx)
from pprint import pformat
import time
import uvicorn
from fastapi import FastAPI

app = FastAPI()
LATENCY_SERVER = "<hosted endpoint of above go server>"
client = http3.AsyncClient()

# Python program to display the Fibonacci sequence
async def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return await recur_fibo(n-1) + await recur_fibo(n-2)

@app.get('/')
async def remote_latencycall():
    # consider ext service latency = 150ms
    # consider local latency = 25ms
    # do some work
    ans = await recur_fibo(25)
    res = await client.get(
        f"http://{LATENCY_SERVER}/healthWithLatency",
        headers={"Latency": "150"}
    )
    # FastAPI returns 200 by default, so just return the formatted body
    return pformat([res, res.text, ans])

if __name__ == "__main__":
    # FastAPI apps have no app.run(); use uvicorn directly when run as a script
    uvicorn.run(app, host='0.0.0.0', port=8080)
Command to run the above ASGI app (assuming the file is saved as pyfastapiclient.py) -
python3.10 -m gunicorn 'pyfastapiclient:app' -b 0.0.0.0:8080 -w 4 -k uvicorn.workers.UvicornH11Worker --access-logfile -
Output from running the ab test -
Concurrency Level: 10
Time taken for tests: 19.993 seconds
Complete requests: 634
Failed requests: 0
...
Requests per second: 31.71 [#/sec] (mean)
Time per request: 315.347 [ms] (mean)
Time per request: 31.535 [ms] (mean, across all concurrent requests)
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 2
Processing: 267 308 31.9 302 574
Waiting: 267 308 31.9 302 573
Total: 267 308 32.0 302 575
Percentage of the requests served within a certain time (ms)
50% 302
66% 309
75% 313
80% 316
90% 330
95% 347
98% 419
99% 460
100% 575 (longest request)
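One caveat with the ASGI version above: recur_fibo is pure CPU work, so while it runs it blocks that worker's event loop and no other request on the same worker makes progress. A common mitigation is to push CPU-bound work off the event loop, e.g. with asyncio.to_thread (Python 3.9+). A minimal sketch of that variant - note I've swapped http3 for its successor httpx purely for illustration, and the LATENCY_SERVER placeholder is the same as above:
import asyncio
import httpx
from fastapi import FastAPI

app = FastAPI()
LATENCY_SERVER = "<hosted endpoint of above go server>"
client = httpx.AsyncClient()

def recur_fibo(n: int) -> int:
    # plain synchronous CPU work
    if n <= 1:
        return n
    return recur_fibo(n-1) + recur_fibo(n-2)

@app.get('/')
async def remote_latencycall():
    # run the CPU-bound part in a worker thread so the event loop stays free
    ans = await asyncio.to_thread(recur_fibo, 25)
    res = await client.get(
        f"http://{LATENCY_SERVER}/healthWithLatency",
        headers={"Latency": "150"},
    )
    return {"latency_response": res.text, "fib": ans}
This doesn't buy extra CPU parallelism under the GIL, but it keeps each worker responsive to other IO-bound requests while the Fibonacci work runs in a thread.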
Conclusion
For the nature of work I expected to run - typically 25ms of CPU work and 150ms of IO work - I get the following results -
sync == 9.80 [#/sec] (mean)
gthread == 23.31 [#/sec] (mean)
gevent == 23.71 [#/sec] (mean)
uvicorn == 31.71 [#/sec] (mean)
Hence, for the workload we've simulated, the best option is FastAPI + Uvicorn, assuming the application can be written with an ASGI framework.
If the application is already written against WSGI, the best option is the gevent worker model.
This is how I simulated the workload on the different worker modes and decided which worker model and framework best fits the purpose. If you have run a similar experiment, do share your implementation for wider learning.