Python - Comparing WSGI and ASGI Worker Performance


This blog post describes a problem I faced: comparing the performance of Gunicorn's different worker modes in order to pick the right one for a given kind of workload.

It benchmarks Gunicorn (a Python WSGI HTTP server) with the sync, gthread, and gevent worker models, plus an ASGI setup using Uvicorn and FastAPI, against a fixed spread of server work (a mix of CPU and IO).

The idea is to introduce a dummy workload that simulates some CPU work and some IO work (remote API calls). You can tune the test code below to match the workload you expect to run: for CPU-heavy services, increase the CPU time burned by the fake recur_fibo function; for services dominated by long-tailed API calls and IO-bound work, increase the value sent in the Latency request header.
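As a concrete (if simplified) sketch of that tuning model - this is my framing, not the test code itself - think of each request as two independent knobs:

import time

def cpu_work(ms):
    # stand-in for recur_fibo: busy-wait to burn CPU for ~ms milliseconds
    end = time.perf_counter() + ms / 1000
    while time.perf_counter() < end:
        pass

def io_work(ms):
    # stand-in for the remote call: sleep, releasing the CPU while waiting
    time.sleep(ms / 1000)

def handle_request(cpu_ms=25, io_ms=150):
    cpu_work(cpu_ms)  # raise this for CPU-heavy services
    io_work(io_ms)    # raise this for long-tailed, IO-bound services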

Simplistic Latency Server

First, we need to simulate the external API calls that our server is going to make. To do this, estimate the total external-service IO time per request. We'll write a latency server, in Go, to simulate this external IO time:

package main

import (
    "fmt"
    "log"
    "net/http"
    "strconv"
    "time"
)

func healthHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "healthy!")
}

// healthWithLatencyHandler reads the Latency header (in milliseconds)
// from the incoming request and sleeps for that long before responding.
func healthWithLatencyHandler(w http.ResponseWriter, r *http.Request) {
    strLatency := r.Header.Get("Latency")
    latency, _ := strconv.Atoi(strLatency) // missing/invalid header => 0ms
    time.Sleep(time.Duration(latency) * time.Millisecond)
    fmt.Fprintf(w, "%d, healthy!", latency)
}

func main() {

    http.HandleFunc("/health", healthHandler)
    http.HandleFunc("/healthWithLatency", healthWithLatencyHandler)

    fmt.Printf("Server started at port 8080\n")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}

As shown above, healthWithLatencyHandler is an HTTP handler that reads the Latency header from an incoming request and responds after a delay of v * time.Millisecond, where v is the header's value. It lets us simulate latency in API responses for testing or debugging.
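To sanity-check the latency server, you can hit it with curl (assuming it's running locally on port 8080); the response should arrive roughly 150ms after the request -

curl -H "Latency: 150" http://localhost:8080/healthWithLatency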

Flask Server to do some CPU and IO work

Now, we'll write a Flask server that does some CPU-bound work and some IO-bound work, as follows:

import requests
from flask import Flask
from pprint import pformat

app = Flask(__name__)

LATENCY_SERVER = "<hosted endpoint of above go server>"

# Recursive Fibonacci - deliberately slow, used to burn CPU time
def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

@app.route('/')
def remote_latencycall():
    # consider ext service latency = 150ms
    # consider local latency = 25ms

    # do some CPU-bound work (~25ms)
    ans = recur_fibo(25)

    res = requests.get(
        f"http://{LATENCY_SERVER}/healthWithLatency", 
        headers={"Latency": "150"}
    )

    return pformat([res, res.text, ans]), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)

The above code stands in for the actual server code we intend to run: it simulates a remote API call latency of 150ms and roughly 25ms of dummy CPU-bound work. (I'm assuming the Flask code is saved as wsgi.py, to match the 'wsgi:app' target used in the gunicorn commands below.) I timed the recur_fibo function as follows -

import timeit

def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

starttime = timeit.default_timer()
recur_fibo(25)
print("Time taken = ", str((timeit.default_timer() - starttime)*1000))

The above averaged ~25ms per call.
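Before running the tests, a rough back-of-envelope helps frame the results (my own sketch, ignoring all scheduling and network overhead): each request costs ~25ms of CPU plus ~150ms of IO wait, so a worker that blocks for the full 175ms has a much lower ceiling than one that overlaps IO waits across requests.

# Back-of-envelope throughput ceilings (ignores all overhead)
cpu_ms, io_ms, workers = 25, 150, 4

# A blocking worker holds a request for the full CPU + IO time
blocking_ceiling = workers * 1000 / (cpu_ms + io_ms)

# A worker that overlaps IO waits is limited mainly by the CPU portion
overlapped_ceiling = workers * 1000 / cpu_ms

print(f"blocking: ~{blocking_ceiling:.0f} req/s")     # ~23 req/s
print(f"overlapped: ~{overlapped_ceiling:.0f} req/s") # ~160 req/s

The measured numbers below land well under these ceilings, since real requests also pay for parsing, scheduling, and network overhead, and the client itself keeps only 10 requests in flight.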

Running the test with ab for sync, gthread and gevent workers

We'll use the ApacheBench tool, i.e. ab, to run a load test against each of the worker modes above.

sync worker test - 9.80 [#/sec] (mean)

Using command - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 -k sync --access-logfile -

With -k sync, each of the 4 worker processes handles a single request at a time, so the 150ms IO wait blocks the whole worker.

And simulating the load via ab with -

timeout -s INT 20s ab -n 100000 -c 10 http://localhost:8080/

Here -c 10 keeps 10 requests in flight, and -n 100000 is deliberately high: the 20s timeout interrupts the run, so the "Complete requests" figure below reflects 20 seconds of sustained load rather than all 100000 requests.

Output of ab command is as follows -

Concurrency Level:      10
Time taken for tests:   19.991 seconds
Complete requests:      196
Failed requests:        0
...
Requests per second:    9.80 [#/sec] (mean)
Time per request:       1019.925 [ms] (mean)
Time per request:       101.993 [ms] (mean, across all concurrent requests)
...

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       2
Processing:   421  971 176.1    988    1460
Waiting:      421  970 175.8    985    1460
Total:        422  971 176.0    988    1460

Percentage of the requests served within a certain time (ms)
  50%    988
  66%   1086
  75%   1119
  80%   1126
  90%   1145
  95%   1172
  98%   1414
  99%   1420
 100%   1460 (longest request)

gthread worker test - 23.31 [#/sec] (mean)

Command - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 --threads 4 -k gthread --access-logfile -

With 4 workers × 4 threads, up to 16 requests can be in flight at once, and the GIL is released while a thread waits on the remote call, so IO no longer serializes requests within a worker.

Output from ab test -

Time taken for tests:   19.990 seconds
Complete requests:      466
...
Requests per second:    23.31 [#/sec] (mean)
Time per request:       428.971 [ms] (mean)
Time per request:       42.897 [ms] (mean, across all concurrent requests)
...

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   343  415  48.3    404     744
Waiting:      343  414  48.2    402     742
Total:        343  416  48.3    404     744

Percentage of the requests served within a certain time (ms)
  50%    404
  66%    417
  75%    427
  80%    436
  90%    453
  95%    486
  98%    599
  99%    656
 100%    744 (longest request)

gevent worker test - 23.71 [#/sec] (mean)

Command - python3.10 -m gunicorn 'wsgi:app' -b 0.0.0.0:8080 -w 4 --threads 4 -k gevent --access-logfile -

(The --threads flag only applies to the gthread worker, so it is effectively ignored here; see the note on monkey-patching after the results below.)

Output from ab -

Concurrency Level:      10
Time taken for tests:   19.992 seconds
Complete requests:      474
Failed requests:        0
...
Requests per second:    23.71 [#/sec] (mean)
Time per request:       421.780 [ms] (mean)
Time per request:       42.178 [ms] (mean, across all concurrent requests)
...

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   339  409  49.9    400     713
Waiting:      339  409  50.0    399     713
Total:        339  410  50.0    400     713

Percentage of the requests served within a certain time (ms)
  50%    400
  66%    410
  75%    416
  80%    422
  90%    440
  95%    468
  98%    668
  99%    693
 100%    713 (longest request)
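A note on why the blocking requests call works under gevent at all: gunicorn's gevent worker monkey-patches the standard library on startup, turning blocking socket calls into cooperative greenlet switches. If you ever run gevent outside gunicorn, you have to patch yourself, and it must happen before anything imports socket. A minimal sketch:

from gevent import monkey
monkey.patch_all()  # must run before requests/socket are imported

import requests  # now uses gevent's cooperative sockets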

ASGI with FastAPI and uvicorn test - 31.71 [#/sec] (mean)

For the ASGI server, we'll write similar code with FastAPI and run it with uvicorn ASGI workers, as follows -

import http3  # early incarnation of what is now httpx
import uvicorn
from pprint import pformat

from fastapi import FastAPI

app = FastAPI()

LATENCY_SERVER = "<hosted endpoint of above go server>"

client = http3.AsyncClient()

# Recursive Fibonacci - CPU-bound; declaring it async does not help,
# since it never awaits real IO and so blocks the event loop for ~25ms
async def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return (await recur_fibo(n-1)) + (await recur_fibo(n-2))

@app.get('/')
async def remote_latencycall():
    # consider ext service latency = 150ms
    # consider local latency = 25ms

    # do some CPU-bound work (~25ms); this still blocks the event loop
    ans = await recur_fibo(25)

    res = await client.get(
        f"http://{LATENCY_SERVER}/healthWithLatency", 
        headers={"Latency": "150"}
    )

    # FastAPI has no Flask-style (body, status) tuple convention;
    # return the body and let the default 200 status apply
    return pformat([res, res.text, ans])

if __name__ == "__main__":
    # FastAPI apps have no .run() method; serve directly with uvicorn
    uvicorn.run(app, host='0.0.0.0', port=8080)

Command to run the above ASGI app (assuming the file is saved as pyfastapiclient.py) -

python3.10 -m gunicorn 'pyfastapiclient:app' -b 0.0.0.0:8080 -w 4 -k uvicorn.workers.UvicornH11Worker --access-logfile -

Output running the ab test -

Concurrency Level:      10
Time taken for tests:   19.993 seconds
Complete requests:      634
Failed requests:        0
...
Requests per second:    31.71 [#/sec] (mean)
Time per request:       315.347 [ms] (mean)
Time per request:       31.535 [ms] (mean, across all concurrent requests)
...

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:   267  308  31.9    302     574
Waiting:      267  308  31.9    302     573
Total:        267  308  32.0    302     575

Percentage of the requests served within a certain time (ms)
  50%    302
  66%    309
  75%    313
  80%    316
  90%    330
  95%    347
  98%    419
  99%    460
 100%    575 (longest request)
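As an aside (not measured above): uvicorn also ships uvicorn.workers.UvicornWorker, which uses uvloop and httptools when they are installed and is generally faster than the pure-Python H11 worker. It may be worth re-running the test with -

python3.10 -m gunicorn 'pyfastapiclient:app' -b 0.0.0.0:8080 -w 4 -k uvicorn.workers.UvicornWorker --access-logfile -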

Conclusion

For the kind of work I expected to run - roughly 25ms of CPU work and 150ms of IO work per request - I got the following results -

sync         ==         9.80 [#/sec] (mean)
gthread      ==         23.31 [#/sec] (mean)
gevent       ==         23.71 [#/sec] (mean)
uvicorn      ==         31.71 [#/sec] (mean)

Hence, for the workload we've simulated, the best option is FastAPI + uvicorn, assuming the application can be written against an ASGI framework.

If the application is already written against WSGI, the best option is the gevent worker model.

This is how I simulated the workload on different worker modes and decided which worker model and framework best fit the purpose. If you run your own tests, do share the implementation for wider learning.
