Dynamic creation of REST APIs from third-party databases with Django-Schema-Sprout
Working with third-party or legacy databases can be a real headache.
We've all been there - you're stuck dealing with a database you didn't design, and every change feels like a nightmare.
Django's inspectdb
helps create models from an existing schema, but it's not perfect, especially when the legacy database has tons of tables and views. What's more, if there is one change in the database, you're back to square one, needing to regenerate everything manually.
That's why, some time ago, I decided to create django-schema-sprout. It's a package designed to make your life easier, allowing you to dynamically create Django models, views, and serializers straight from the database schema.
Currently, there is only support for Postgres databases, but MySQL support and some additional features are on the roadmap.
Live example
(TL;DR You can find the repository with code here https://github.com/grumpy-miner/django-schema-sprout-example)
One of the third-party databases that I used to work with was AACT (Aggregate Analysis ofClinicalTrials.gov). It contains all information (protocol and result in data elements) about every study registered in ClinicalTrials.gov
You can get access to the Postgres database and start querying all the results, but what if you want to create Django ORM models and use them in your application? Well, let's start with creating a new Django project. (Since AACT uses Postgres v11 we'll be using Django==4.1.13)
django-admin startproject django-schema-sprout-example
Install django-schema-sprout:
pip install django-schema-sprout==0.1.1
If needed also install (psycopg2).
Updating settings.py
In your settings, add rest_framework
, drf_yasg
and django_schema_sprout
into the list of installed apps
INSTALLED_APPS = [
...
"rest_framework",
"drf_yasg",
"django_schema_sprout",
]
Set up your database and add SchemaSproutDBRouter
in your database routers
DATABASES = {
"default": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": BASE_DIR / "db.sqlite3",
},
"aact": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
"NAME": "aact",
"USER": os.getenv("AACT_USER"),
"PASSWORD": os.getenv("AACT_PASSWORD"),
"HOST": os.getenv("AACT_HOST"),
"PORT": os.getenv("AACT_PORT"),
},
}
DATABASE_ROUTERS = [
"django_schema_sprout.router.SchemaSproutDBRouter",
]
Last but not least configure STATIC_ROOT
and REST_FRAMEWORK
STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")
REST_FRAMEWORK = {
"DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.LimitOffsetPagination",
"PAGE_SIZE": 100, # we don't want to limit page size as there are a lot of records in AACT database
}
Add schema_sprout.py
file
from django_schema_sprout.schema_sprout import SchemaSprout
AACT_PUBLIC_SPROUT = SchemaSprout("aact") # pass the name of your database from settings
AACT_PUBLIC_SPROUT.create_models(readonly=True) # Default False, we have readonly access to AACT db
By the end of the day, it doesn't matter where you instantiate SchemaSprout
as it is a Singeltone based on args, additionally create_models
and create_model
function will do nothing if the model is already dynamically created (unless there were new tables added).
Add your sprout router URLs to urlpatterns
What we want to do now, is to update our urls.py
. Let's add URLs for swagger and router URLs from AACT_PUBLIC_SPROUT
from django.urls import path, re_path, include
from rest_framework import permissions
from drf_yasg.views import get_schema_view
from drf_yasg import openapi
from sprout_schema_example.schema_sprout import AACT_PUBLIC_SPROUT
schema_view = get_schema_view(
openapi.Info(
title="AACT API",
default_version="1.0.0",
description="Documentation for the AACT API",
),
public=False,
permission_classes=(permissions.AllowAny,),
)
urlpatterns = [
path("api/", include(AACT_PUBLIC_SPROUT.router.urls)),
re_path(
r"^swagger(?P<format>\.json|\.yaml)$",
schema_view.without_ui(cache_timeout=0),
name="schema-json",
),
path(
"swagger/",
schema_view.with_ui("swagger", cache_timeout=0),
name="schema-swagger-ui",
),
path("redoc/", schema_view.with_ui("redoc", cache_timeout=0), name="schema-redoc"),
]
Run collectstatic and runserver
./manage.py collectstatic
./manage.py runserver
Since we create all models, views, and serializers at the start it will take some time (especially since there are tons of tables in the database).
You can see some warnings at the beginning as there are not all correct permissions set to visible tables.
WARNING:root:Couldn't get table description for table univ_als_studies. Reason: permission denied for table univ_als_studies
WARNING:root:Couldn't get table description for table univ_studies. Reason: permission denied for table univ_studies
WARNING:root:Couldn't get table description for table universities. Reason: permission denied for table universities
WARNING:root:Couldn't get table description for table covid_study_ids. Reason: permission denied for table covid_study_ids
Django-Schema-Sprout should create an endpoint for each table in third-party database.
Url schema for those tables follows the pattern api/<schema-name>/<table-name>
You can go to check out Swagger at http://127.0.0.1:8000/swagger/ and see all endpoints that were created.
Let's try out requesting one of the endpoints:
Creating only specific models.
If you want to create endpoints just for specific tables instead of everything you can use create_model
instead of create_models
Let's modify schema_sprout.py
, let's also see what will happen if we don't want to create a model with readonly=True
from django_schema_sprout.schema_sprout import SchemaSprout
AACT_PUBLIC_SPROUT = SchemaSprout("aact")
AACT_PUBLIC_SPROUT.create_model(table_name='studies', table_nspname='ctgov', readonly=False)
Now we have all CRUD methods for studies
In this case, trying to update or delete any records will raise an error as we don't have permission to do that
Using Django-Schema-Sprout's dynamically created models
Sometimes we need to create a view that will apply some additional business logic.
Let's see how we can use the models that django-schema-sprout generated.
Let's run ./manage.py shell
from django_schema_sprout.schema_sprout import SchemaSprout
schema_sprout = SchemaSprout("aact")
schema_sprout.create_models(readonly=True)
# let's get model, use '<schema>_<table_name>' to access model
CtgovDesigns = schema_sprout.models.get('ctgov_designs')
CtgovDesigns.objects.all().count()
# >>> 482207
That's all folks
So, that's the scoop on django-schema-sprout
. If you've ever felt the pain of dealing with third-party databases in Django, this tool might just make your day a bit better.
I'm keen to see how django-schema-sprout
evolves with your input. It's open source for a reason – to get better with every contribution, idea, or piece of feedback. So, if you've got something to say or want to add your touch to the project, don't hold back, feel free to open a Pull Request or open an Issue at https://github.com/grumpy-miner/django-schema-sprout
Give django-schema-sprout
a whirl in your next project.
Subscribe to my newsletter
Read articles from Paweł Puchalski directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Paweł Puchalski
Paweł Puchalski
Senior Data Engineer by profession. World explorer by curiosity. Music Producer by aspiration. Lazy potato by heart.