Dynamic creation of REST APIs from third-party databases with Django-Schema-Sprout

Working with third-party or legacy databases can be a real headache.

We've all been there - you're stuck dealing with a database you didn't design, and every change feels like a nightmare.

Django's inspectdb helps create models from an existing schema, but it's not perfect, especially when the legacy database has tons of tables and views. What's more, if there is one change in the database, you're back to square one, needing to regenerate everything manually.

That's why, some time ago, I decided to create django-schema-sprout. It's a package designed to make your life easier, allowing you to dynamically create Django models, views, and serializers straight from the database schema.

Currently, there is only support for Postgres databases, but MySQL support and some additional features are on the roadmap.

Live example

(TL;DR You can find the repository with code here https://github.com/grumpy-miner/django-schema-sprout-example)

One of the third-party databases that I used to work with was AACT (Aggregate Analysis ofClinicalTrials.gov). It contains all information (protocol and result in data elements) about every study registered in ClinicalTrials.gov

You can get access to the Postgres database and start querying all the results, but what if you want to create Django ORM models and use them in your application? Well, let's start with creating a new Django project. (Since AACT uses Postgres v11 we'll be using Django==4.1.13)

django-admin startproject django-schema-sprout-example

Install django-schema-sprout:

pip install django-schema-sprout==0.1.1

If needed also install (psycopg2).

Updating settings.py

In your settings, add rest_framework, drf_yasg and django_schema_sprout into the list of installed apps

INSTALLED_APPS = [
    ...
    "rest_framework",
    "drf_yasg",
    "django_schema_sprout",
]

Set up your database and add SchemaSproutDBRouter in your database routers

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    },
    "aact": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "aact",
        "USER": os.getenv("AACT_USER"),
        "PASSWORD": os.getenv("AACT_PASSWORD"),
        "HOST": os.getenv("AACT_HOST"),
        "PORT": os.getenv("AACT_PORT"),
    },
}

DATABASE_ROUTERS = [
    "django_schema_sprout.router.SchemaSproutDBRouter",
]

Last but not least configure STATIC_ROOT and REST_FRAMEWORK

STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")

REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.LimitOffsetPagination",
    "PAGE_SIZE": 100, # we don't want to limit page size as there are a lot of records in AACT database
}

Add schema_sprout.py file

from django_schema_sprout.schema_sprout import SchemaSprout

AACT_PUBLIC_SPROUT = SchemaSprout("aact") # pass the name of your database from settings
AACT_PUBLIC_SPROUT.create_models(readonly=True) # Default False, we have readonly access to AACT db

By the end of the day, it doesn't matter where you instantiate SchemaSprout as it is a Singeltone based on args, additionally create_models and create_model function will do nothing if the model is already dynamically created (unless there were new tables added).

Add your sprout router URLs to urlpatterns

What we want to do now, is to update our urls.py. Let's add URLs for swagger and router URLs from AACT_PUBLIC_SPROUT

from django.urls import path, re_path, include
from rest_framework import permissions
from drf_yasg.views import get_schema_view
from drf_yasg import openapi

from sprout_schema_example.schema_sprout import AACT_PUBLIC_SPROUT

schema_view = get_schema_view(
    openapi.Info(
        title="AACT API",
        default_version="1.0.0",
        description="Documentation for the AACT API",
    ),
    public=False,
    permission_classes=(permissions.AllowAny,),
)

urlpatterns = [
    path("api/", include(AACT_PUBLIC_SPROUT.router.urls)),
    re_path(
        r"^swagger(?P<format>\.json|\.yaml)$",
        schema_view.without_ui(cache_timeout=0),
        name="schema-json",
    ),
    path(
        "swagger/",
        schema_view.with_ui("swagger", cache_timeout=0),
        name="schema-swagger-ui",
    ),
    path("redoc/", schema_view.with_ui("redoc", cache_timeout=0), name="schema-redoc"),
]

Run collectstatic and runserver

./manage.py collectstatic
./manage.py runserver

Since we create all models, views, and serializers at the start it will take some time (especially since there are tons of tables in the database).

You can see some warnings at the beginning as there are not all correct permissions set to visible tables.

WARNING:root:Couldn't get table description for table univ_als_studies. Reason: permission denied for table univ_als_studies
WARNING:root:Couldn't get table description for table univ_studies. Reason: permission denied for table univ_studies
WARNING:root:Couldn't get table description for table universities. Reason: permission denied for table universities
WARNING:root:Couldn't get table description for table covid_study_ids. Reason: permission denied for table covid_study_ids

Django-Schema-Sprout should create an endpoint for each table in third-party database.
Url schema for those tables follows the pattern api/<schema-name>/<table-name>

You can go to check out Swagger at http://127.0.0.1:8000/swagger/ and see all endpoints that were created.

Let's try out requesting one of the endpoints:

Creating only specific models.

If you want to create endpoints just for specific tables instead of everything you can use create_model instead of create_models

Let's modify schema_sprout.py, let's also see what will happen if we don't want to create a model with readonly=True

from django_schema_sprout.schema_sprout import SchemaSprout

AACT_PUBLIC_SPROUT = SchemaSprout("aact")
AACT_PUBLIC_SPROUT.create_model(table_name='studies', table_nspname='ctgov', readonly=False)

Now we have all CRUD methods for studies

In this case, trying to update or delete any records will raise an error as we don't have permission to do that

Using Django-Schema-Sprout's dynamically created models

Sometimes we need to create a view that will apply some additional business logic.
Let's see how we can use the models that django-schema-sprout generated.

Let's run ./manage.py shell

from django_schema_sprout.schema_sprout import SchemaSprout

schema_sprout = SchemaSprout("aact")
schema_sprout.create_models(readonly=True)

# let's get model, use '<schema>_<table_name>' to access model
CtgovDesigns = schema_sprout.models.get('ctgov_designs')

CtgovDesigns.objects.all().count()
# >>> 482207

That's all folks

So, that's the scoop on django-schema-sprout. If you've ever felt the pain of dealing with third-party databases in Django, this tool might just make your day a bit better.

I'm keen to see how django-schema-sprout evolves with your input. It's open source for a reason – to get better with every contribution, idea, or piece of feedback. So, if you've got something to say or want to add your touch to the project, don't hold back, feel free to open a Pull Request or open an Issue at https://github.com/grumpy-miner/django-schema-sprout

Give django-schema-sprout a whirl in your next project.

1
Subscribe to my newsletter

Read articles from Paweł Puchalski directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Paweł Puchalski
Paweł Puchalski

Senior Data Engineer by profession. World explorer by curiosity. Music Producer by aspiration. Lazy potato by heart.