Ah.... Django UUIDs. I had a recent project at my company, where I was migrating a model from our old monolith codebase, to a microservice. In order to uniquely identify the rows across both services, I decided I would add a UUID column to the monolith model (which would be the primary key of the model in the microservice). Sounds simple so far, right? Now, this was a bit of an interesting case, since, as I wasn't adding it as a primary key, I wasn't sure if I should add the unique=True constraint to the field (which is added by default when it is used as a primary key field). I had a discussion with the one of the senior staff engineers, and he was of the opinion that there really wasn't any need, after all, for there to be even a 50% chance of two UUIDs colliding, more than 2.71 quintillion would have to be generated. Eventually, though, just to be safe, I decided, "hey... why not, no harm in being completely safe, I need the column to be indexed anyway, might as well make it a unique constraint, right?". Right?!

So, I got started. After editing the model, the field (pre-migration generation) looked like: uuid = models.UUIDField(unique=True, default=uuid.uuid4) (for those unfamiliar, in Django's ORM, you make a change to your models.py file, then run python makemigrations <app> and Django generates a migration file for you, which you can edit, and then use to generate some SQL with the command python migrate <app> which it applies to the database for you.) The migration file generated by this looked like:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models, migrations
import main.fields
import uuid


class Migration(migrations.Migration):

    dependencies = [
        ('<appname>', 'PREVIOUSDEPENDENCY'),
    ]

    operations = [
        migrations.AddField(
            model_name='MyModel',
            name='uuid',
            field=main.fields.UnicodeUUIDField(default=uuid.uuid4,
                                               unique=True, 
                                               max_length=32),
        ),
    ]

I proceeded to apply the migration, and that's when I hit upon an interesting error:

django.db.utils.IntegrityError: (1062, "Duplicate entry UUID for key MY_UUID_COLUMN)

...

Had the unique constraint messed something up? I decided to investigate our database table directly. After viewing the UUID column, I noticed something very interesting. Turns out, every column had the same UUID! I decided this must be some form of known issue, and after some investigation, I found this piece of Django's documentation, which I decided to follow. I generated my three migration files, applied the migrations on our test environments, and even ran the migration on a copy of our production database, to get time estimates for each individual migration. At this point, I was sure I was as prepped as any person could be, and so, the day to apply the migration to the production database arrived (as our site receives a lot of traffic at all times, we decided to apply the migration when our traffic was the lowest, which was usually about 5 am).

The first migration ran fine, and we started running the second one. As the second migration was a data migration, it took about 3 hours or so, and we encountered no major issues. However, right at the end, we came across a rather strange error:

django.db.utils.DatabaseError: Save with update_fields did not affect any rows

Why exactly had this error occurred? After a few hours of investigation I came across an interesting piece of documentation in Django, related to RunPython. The key part to focus on is this: "By default, RunPython will run its contents inside a transaction on databases that do not support DDL transactions (for example, MySQL and Oracle). This should be safe, but may cause a crash if you attempt to use the schema_editor provided on these backends; in this case, pass atomic=False to the RunPython operation". This was pretty much the whole issue in a nutshell. But... why was this the problem? Well, it has to do with how Django applies transactions to the database itself, so lets take a look at that.

The key is that whenever Django is opening a connection to a SQL database, especially if it is using a transaction, it creates a cursor pointing to the database. However, for reads (which is what was needed before updating the DB), if the default InnoDB config of REPEATABLE_READ was used, as can be seen in the MySql 8.0 docs, it meant that when the transaction was opened in RunPython, until it was closed, the code would be running on a copy of the database. This was a problem, as the migration would run for about three hours on the table copy, and by the time it had to be applied to the actual table, due to the large volume of changes, there were inevitable deletions and creations, which would lead to the transaction not being applied. The final solution was to apply atomic=False to RunPython, as indicated by the docs, and use MyModel.filter(FILTER_KWARGS)[:BATCH_SIZE] repeatedly, so that transactions would only be applied in batches to the actual db, reducing the impact of failed transactions.

Conclusion

So... this was quite a journey, starting from what I thought was quite a simple problem, to going quite deep into the implementation details of Django's ORM. I think the biggest learning experience for me here was, to always try and understand your tools and abstractions as deeply as possible, because some decisions made by them are not always intuitive at first glance. Also, it was a lot of fun!

Django UUIDs... Simple, right?

Table of contents

Conclusion

Subscribe to my newsletter

Arman Jasuja

Arman Jasuja