Using AI to Optimise Database Queries in Django Applications


Introduction
As your Django app grows, be it the number of users, the amount of data, or the complexity of your business logic, database queries often become a real headache. Slow queries can drag down your entire user experience, making pages load sluggishly and frustrating everyone involved.
Traditionally, you’d solve this by hand: adding indexes, tweaking queries, or maybe caching results. While these methods work, they can be a bit of a faff and don’t always adapt as your site evolves.
This is where artificial intelligence comes in. By logging your queries, training a machine learning model, and letting AI make sense of all that data, you can:
Quickly spot which queries might slow down in the future.
Get suggestions for how to speed them up, potentially before they even become a noticeable problem.
Rely less on guesswork and spend fewer hours digging through logs manually.
In this article, we’ll look at how to set up a basic system that logs your queries, trains an AI model to predict slow ones, and then uses that model in your Django application to offer practical suggestions.
Step 1: Logging and Collecting Query Data
First things first, we need data—without it, the AI model’s got nothing to learn from. Django can log all the queries run during a request. We’ll use a bit of middleware to write these queries and their execution times to a file so we can analyse them later.
Example Middleware (middleware.py):
import time

from django.db import connection
from django.utils.deprecation import MiddlewareMixin


class QueryLoggerMiddleware(MiddlewareMixin):
    def process_request(self, request):
        # Store the start time on the request so concurrent requests don't interfere.
        request._query_logger_start = time.time()

    def process_response(self, request, response):
        total_time = time.time() - getattr(request, "_query_logger_start", time.time())
        # Write queries to a CSV file for future analysis.
        # Note: connection.queries is only populated when DEBUG is True.
        with open("query_logs.csv", "a") as f:
            for query in connection.queries:
                # Replace commas and newlines so each query stays on one CSV row.
                safe_sql = query["sql"].replace(",", ";").replace("\n", " ")
                f.write(f"{safe_sql},{query['time']}\n")
        print(f"Total Request Time: {total_time:.3f}s")
        return response
Add this to your MIDDLEWARE in settings.py and let your site run normally for a bit. Over time, you'll gather a nice collection of queries and execution times. Bear in mind that connection.queries is only populated when DEBUG is True, so this kind of logging is best suited to development or a staging environment. Keep an eye on sensitive data, too: if you've got personal user info in these queries, consider masking it.
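For reference, registering the middleware just means adding its dotted path to MIDDLEWARE. Here's a minimal sketch, assuming the class lives in an app called myapp:

# settings.py ("myapp" is an assumed app name; use wherever your middleware actually lives)
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... keep the rest of your existing middleware here ...
    "myapp.middleware.QueryLoggerMiddleware",
]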
Step 2: Analysing Your Logged Data
After a few days or weeks, you'll have a decent number of logged queries. Each line in query_logs.csv should have the SQL query and how long it took.
This is where we start turning that raw data into something an AI model can use. For instance, you might look at things like:
How long the query is (the number of characters).
How many JOINs it uses.
Any particular SQL patterns that could slow it down.
You can load query_logs.csv into Pandas, preprocess it, and come up with extra features that help the model understand which queries are generally slow.
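As a rough sketch of that preprocessing step (assuming the comma-separated log format written by the middleware above, with no header row), you might start with something like this:

import pandas as pd

# The middleware writes "sql,execution_time" rows with no header.
df = pd.read_csv("query_logs.csv", names=["query", "execution_time"])
df["execution_time"] = df["execution_time"].astype(float)

# Simple hand-crafted features.
df["query_length"] = df["query"].apply(len)
df["num_joins"] = df["query"].str.upper().str.count("JOIN")

# A quick look at what the slowest queries have in common.
print(df.sort_values("execution_time", ascending=False).head(10))
print(df[["execution_time", "query_length", "num_joins"]].describe())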
Step 3: Training the AI Model (Offline)
We’ll do the training in a separate script. The idea is straightforward: we want to classify queries as “slow” or “not slow” based on the data we’ve got. Once trained, we’ll save the model and load it in Django.
Example (model_training.py):
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load your query logs
df = pd.read_csv("query_logs.csv", names=["query", "execution_time"])
df['execution_time'] = df['execution_time'].astype(float)

# Add simple features
df['query_length'] = df['query'].apply(len)
df['num_joins'] = df['query'].str.count("JOIN")

# Define what's "slow" (execution times are in seconds)
threshold = 0.2
df['slow'] = df['execution_time'] > threshold

X = df[['query_length', 'num_joins']]
y = df['slow']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy}")

# Save the model
joblib.dump(model, "trained_model.joblib")
print("Model saved as trained_model.joblib")
We’ve now got a model that can, at least roughly, tell us which queries might be slow. Sure, it’s a simplistic example, but you can get fancy by adding more clever features.
Step 4: Integrating the Model Into Django
With the model ready, we can load it into our Django app and use it whenever we want. For instance, let’s say we build a simple endpoint that, given a query, predicts whether it’s slow and suggests improvements.
Example View (views.py):
import joblib
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

model = joblib.load("trained_model.joblib")


@csrf_exempt
def optimise_query(request):
    # In a real setup, you'd get the query from the request body or a proper logging system.
    query_text = request.POST.get(
        "query",
        "SELECT * FROM users JOIN orders ON users.id = orders.user_id",
    )
    query_length = len(query_text)
    num_joins = query_text.count("JOIN")

    feature_values = [query_length, num_joins]
    is_slow = model.predict([feature_values])[0]

    suggestion = (
        "Consider adding an index or rethinking the JOIN"
        if is_slow
        else "Query seems fine."
    )
    return JsonResponse({"is_slow": bool(is_slow), "suggestion": suggestion})
Of course, in a real application, you’d integrate this logic with your actual query data. You might have a dashboard that shows recent queries and their predicted performance, or a development tool that lets your team test queries before shipping changes.
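To try this out end to end, you'd wire the view into your URLconf and send it a query. Here's a minimal sketch, where the "myapp" module path and the URL are assumptions for illustration:

# urls.py
from django.urls import path

from myapp import views

urlpatterns = [
    path("optimise-query/", views.optimise_query, name="optimise_query"),
]

You could then POST a query to it, for example with curl -X POST -d "query=SELECT * FROM users JOIN orders ON users.id = orders.user_id" http://localhost:8000/optimise-query/, and get the prediction and suggestion back as JSON.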
Keeping It Going
Continuous Updates: Every so often, gather more logs and retrain your model. The system improves over time, adapting to new data patterns.
More Features: Add extra features to your model. Maybe check whether the query has ORDER BY or GROUP BY, or how many rows it usually returns. The better the features, the smarter your model's predictions.
Explainability: Give developers some insight into why the model thinks a query is slow. For example, "High number of JOINs detected" or "Very long query string" helps them know what to fix (a rough sketch of both of these ideas follows after this list).
Security and Privacy: Ensure any sensitive data is stripped or anonymised in the logs, so you're not exposing user info inadvertently.
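As a rough illustration of the "more features" and "explainability" points, you could centralise feature extraction and attach simple human-readable hints. The extra features and thresholds below are assumptions for the sketch, not recommendations; you'd tune them against your own logs and retrain the model on whichever features you keep.

def extract_features(sql: str) -> dict:
    # Turn a raw SQL string into the features the model is trained on.
    sql_upper = sql.upper()
    return {
        "query_length": len(sql),
        "num_joins": sql_upper.count("JOIN"),
        "has_order_by": int("ORDER BY" in sql_upper),
        "has_group_by": int("GROUP BY" in sql_upper),
    }


def explain(features: dict) -> list:
    # Simple rule-based hints shown alongside the model's prediction.
    reasons = []
    if features["num_joins"] >= 3:
        reasons.append("High number of JOINs detected")
    if features["query_length"] > 500:
        reasons.append("Very long query string")
    if features["has_order_by"]:
        reasons.append("ORDER BY present; check that the ordered column is indexed")
    return reasons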
Why Bother?
This approach takes some time to set up, but it can pay off nicely:
Saves Developer Time: Instead of manually digging into logs and guesswork, you get automated insights.
Proactive Optimisation: You find and fix issues before users notice slow performance.
Scalability: As your data grows, the model keeps learning, ensuring performance isn’t just a short-term fix but an ongoing, adaptive process.
Conclusion
Turning to AI for query optimisation in Django is a neat idea that can seriously smooth out performance tuning. While this was a simplistic example, the general idea is flexible and can be tailored to your needs. As you refine the approach, adding better features, training on more data, and integrating it more tightly with your workflow, you'll have a system that keeps your app nippy and responsive, even as it scales.
The best part is that it’s not just about immediate fixes. It’s about building a feedback loop where data informs models, models inform decisions, and those decisions make your application run better now and in the future.
Written by
TemiTope Kayode
Seasoned software engineer and founder specialising in web and mobile applications, enterprise applications, cloud computing, and DevOps, using tools like Django, React, Flutter, AWS, and DigitalOcean. Currently a Senior Software Developer and a mentor, I balance coding with family and leisure. I hold a distinction in a Masters in Computer Science from Coventry University, blending education with practical prowess. Passionate about technology and innovation, and eager to connect and explore new possibilities.