Django and Semgrep: Enforcing a Service Layer Using Static Analysis

Simon Crowe

In my previous post about implementing a service layer in Django, I wrote about a simple pattern that "plays nice" with the mountain of functionality that comes with Django out-of-the-box, particularly the ORM.

In this implementation, business logic is grouped into modules containing functions. Although logic can be grouped based on whatever makes sense in terms of encapsulating functionality and providing useful APIs to the rest of a codebase, this article is only concerned with service modules that map onto Django data models.

This article will show you a rough-and-ready example of how to enforce the simplest Service Layer pattern using Semgrep rules that can be checked in CI pipelines. By automatically enforcing adherence to a basic high-level pattern, static analysis can take some of the burden away from code reviewers and free them to focus on business logic rather than patterns.

Enforcing the Rules

The rules map a particular Django ORM model to a service module and ensure that all, or a subset of, manager methods for that model are only called within that module.

The simplest way to enforce the service layer pattern is to prevent developers from calling any ORM methods from outside of the service module. Below is check_rules.sh, which builds a YAML list of Semgrep rules dynamically by appending the rendered output of a Jinja template once per model.

#!/bin/sh

# This is expected to be run from repo root
template_path=semgrep/python/django_service_pattern_strict/rule.yaml.jinja
service_rules_filename=/tmp/django-service-rules.yaml

echo "rules:" > $service_rules_filename
# One rule per model; model_name must be unique because it becomes the rule id
jinja -D model_name User \
      -D service_file_path apps/core/service/user.py \
      -D model_class_path apps.core.models.User \
      $template_path >> $service_rules_filename
jinja -D model_name Campaign \
      -D service_file_path apps/advertising/service/campaign.py \
      -D model_class_path apps.advertising.models.Campaign \
      $template_path >> $service_rules_filename

semgrep --error -f $service_rules_filename touchsurgery
semgrep_exit_code=$?

rm $service_rules_filename

exit $semgrep_exit_code

The resulting rules ensure that ORM calls that access the database via apps.advertising.models.Campaign can only be made from the apps.advertising.service.campaign module, and likewise that calls via apps.core.models.User can only be made from the apps.core.service.user module.

Here is the template used to generate rules for each data model/service module pair.

  - id: {{ model_name|lower }}-service-strict
    languages:
      - python
    message: |
      Call methods on the {{ model_name }} model's manager(s) in the appropriate service module:
      {{ service_file_path }}
    pattern-either:
      - pattern: {{ model_class_path }}(...)
      - pattern: {{ model_class_path }}.$MANAGER.$METHOD(...)
    severity: ERROR
    paths:
      exclude:
        - {{ service_file_path }}
        - conftest.py
        - test*.py
        - tests/*.py

As you can see, it's not particularly complicated. The trade-off of this strict approach is having to wrap even safe SELECT queries in service functions; after a while this can become laborious.
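To make the rendering step concrete, here is a minimal sketch in Python of the same idea: one rule rendered per model/service pair and appended under a single rules: key. It is not the script used above. It swaps Jinja for the standard library's string.Template so that it runs without extra dependencies, and RULE_TEMPLATE, PAIRS and render_rules are hypothetical names chosen for illustration.

```python
from string import Template

# Simplified stand-in for rule.yaml.jinja. string.Template uses ${name}
# placeholders, so Semgrep metavariables are escaped as $$MANAGER.
RULE_TEMPLATE = Template("""\
  - id: ${rule_id}-service-strict
    languages:
      - python
    message: |
      Call methods on the ${model_name} model's manager(s) in the appropriate service module:
      ${service_file_path}
    pattern-either:
      - pattern: ${model_class_path}(...)
      - pattern: ${model_class_path}.$$MANAGER.$$METHOD(...)
    severity: ERROR
    paths:
      exclude:
        - ${service_file_path}
        - conftest.py
        - test*.py
        - tests/*.py
""")

# (model name, service module path, dotted model class path)
PAIRS = [
    ("User", "apps/core/service/user.py", "apps.core.models.User"),
    (
        "Campaign",
        "apps/advertising/service/campaign.py",
        "apps.advertising.models.Campaign",
    ),
]


def render_rules(pairs):
    """Return the text of a Semgrep rules file with one rule per pair."""
    rendered = ["rules:\n"]
    for model_name, service_file_path, model_class_path in pairs:
        rendered.append(
            RULE_TEMPLATE.substitute(
                rule_id=model_name.lower(),
                model_name=model_name,
                service_file_path=service_file_path,
                model_class_path=model_class_path,
            )
        )
    return "".join(rendered)


if __name__ == "__main__":
    print(render_rules(PAIRS))
```

Keeping the pairs in one list makes it harder to reuse a model name by accident, which matters because the model name becomes the rule id.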

Testing the Rules

It's good to have some confidence that static analysis works as intended. Semgrep has us covered here, allowing us to run rules against test files that are annotated with the names of rules.

#!/bin/sh

# This is expected to be run from platform repo root
template_path=semgrep/python/django_service_pattern_strict/rule.yaml.jinja
test_dir=/tmp/django-service-test
rules_filename=${test_dir}/rule.yaml
rules_test_filename=${test_dir}/rule.py

mkdir $test_dir
cp semgrep/python/django_service_pattern_strict/tests.py $rules_test_filename
echo "rules:" > $rules_filename
jinja -D model_name SomeModel \
      -D service_file_path /dev/null \
      -D model_class_path app.models.SomeModel \
      $template_path >> $rules_filename

semgrep --quiet --test $test_dir

rm -r $test_dir

Here is tests.py with various code snippets that will trigger the strict Semgrep rule.

import random

from django.db import transaction
from django.db.models import F, Q

from app.models import SomeModel

# ruleid: somemodel-service-strict
instance = SomeModel(foo="bar")

# ruleid: somemodel-service-strict
result = SomeModel.objects.get(pk=1)

# ruleid: somemodel-service-strict
results = SomeModel.objects.filter(foo="bar")

# ruleid: somemodel-service-strict
values = SomeModel.objects.filter(foo="bar").values_list("baz", flat=True)

# ruleid: somemodel-service-strict
obj_one, obj_two = SomeModel.objects.bulk_create(
    # ruleid: somemodel-service-strict
    (SomeModel(foo="bar"), SomeModel(foo="baz"))
)

obj_one.foo = obj_two.foo = "flob"
# ruleid: somemodel-service-strict
SomeModel.some_custom_manager.bulk_update([obj_one, obj_two], ["foo"])

# ruleid: somemodel-service-strict
SomeModel.objects.create(foo="wobble")

# ruleid: somemodel-service-strict
SomeModel.sllyMngrNme.filter(foo__icontains="ob").exclude(foo__icontains="f").delete()

# ruleid: somemodel-service-strict
obj, _ = SomeModel.objects.get_or_create(foo="wibble")

# ruleid: somemodel-service-strict
qs = SomeModel.objects.select_for_update().filter(foo="bar")
with transaction.atomic():
    for i, obj in enumerate(qs):
        obj.foo = f"bar_{i:03d}"
        obj.save()

# ruleid: somemodel-service-strict
SomeModel.all_objects.update(foo="wubble")

# ruleid: somemodel-service-strict
SomeModel.objects.update_or_create(id=2, foo="bar")

# ruleid: somemodel-service-strict
unpersisted_obj = SomeModel()
unpersisted_obj.foo = "fuzzle"
unpersisted_obj.save()

# ruleid: somemodel-service-strict
doomed_qs = SomeModel.objects.all()
mercy = bool(random.getrandbits(1))
if mercy:
    doomed_qs = doomed_qs.annotate(odd=F("id") % 2).filter(odd=False)
doomed_qs.delete()

# ruleid: somemodel-service-strict
doomed_obj = SomeModel.objects.filter(
    Q(foo__contains="z") | Q(foo__contains="a")
).last()
doomed_obj.delete()

# ruleid: somemodel-service-strict
obj, _ = SomeModel.objects.all()[:2]
obj.delete()
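To spell out how the annotations in tests.py are read: as I understand Semgrep's convention, a # ruleid: comment says that the named rule should report a finding on the line directly below it. The sketch below only illustrates that convention. It is not how Semgrep itself is implemented, and expected_findings and SAMPLE are hypothetical names.

```python
import re

# Matches comments such as "# ruleid: somemodel-service-strict".
RULEID_COMMENT = re.compile(r"#\s*ruleid:\s*(?P<rule_id>[\w.-]+)")


def expected_findings(source):
    """Map each rule id to the 1-indexed lines its annotations point at.

    Each annotation refers to the line directly below the comment.
    """
    expected = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = RULEID_COMMENT.search(line)
        if match is None:
            continue
        expected.setdefault(match.group("rule_id"), []).append(lineno + 1)
    return expected


SAMPLE = '''\
from app.models import SomeModel

# ruleid: somemodel-service-strict
result = SomeModel.objects.get(pk=1)

# ok: somemodel-service-strict
unrelated = len("no ORM call here")
'''

if __name__ == "__main__":
    print(expected_findings(SAMPLE))
```

The # ok: comment in the sample marks a line that the rule should not flag, so it does not appear in the result.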

A Less Restrictive Approach

There is an alternative approach, which only prevents developers from calling ORM methods that modify database state outside of the service module. This less strict version doesn’t force developers to wrap safe ORM calls that just generate SELECT queries.

I am not including an example rule because of the verbosity necessitated by covering a subset of Django QuerySet and Manager methods, as well as the chaining of these methods using the ORM's fluent interface. Not to mention the myriad ways in which Model instances can be instantiated, mutated and persisted.
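To give a flavour of that verbosity, here is a rough Python sketch that generates only the most direct patterns: a write method called straight on a manager. It is not a tested rule. The WRITE_METHODS list is an assumption and is not exhaustive, and pattern_either_block is a hypothetical helper. Chained calls such as .filter(...).delete(), and save() or delete() on model instances, are deliberately left uncovered.

```python
# Manager/QuerySet methods that write to the database. This list is an
# assumption for illustration and is not exhaustive.
WRITE_METHODS = (
    "create",
    "bulk_create",
    "bulk_update",
    "get_or_create",
    "update_or_create",
    "update",
    "delete",
)


def pattern_either_block(model_class_path):
    """Return a pattern-either block that flags direct manager writes only."""
    lines = ["    pattern-either:"]
    for method in WRITE_METHODS:
        # e.g. app.models.SomeModel.$MANAGER.create(...)
        lines.append(
            f"      - pattern: {model_class_path}.$MANAGER.{method}(...)"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(pattern_either_block("app.models.SomeModel"))
```

Even this partial version needs one pattern per write method, and it would miss most of the snippets in tests.py that build up a QuerySet before mutating it.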

Conclusion

We tend to use static analysis to help us get the details right, to enforce best practices, and to spot security risks and code smells; it's of particular use when working with a dynamic language like Python. Hopefully this post goes some way towards showing how static analysis can also be used to enforce specific high-level patterns within a codebase.

The example code is little more than Bash glue tying several CLIs together. It's not yet clear whether it would be worth writing a purpose-built abstraction layer on top of Semgrep. In the meantime, I encourage you to get creative and try to use it to enforce patterns and conventions within your codebases.
