Building a Simple DB Migration Tool in Just 200 Lines

Dhwanil ShahDhwanil Shah
5 min read

In this blog entry, I will express my reflections on developing the database migration tool, the restrictions of my migration tool, and several additional insights.

Why Did I Reinvent the wheel myself

I am presently engaged in a somewhat intricate backend project. I am utilizing the Sanic framework alongside a Postgres database and the asyncpg driver. I explored the Python ecosystem for migration tools and discovered just two active choices:
1. Alembic:- A tool for migrating databases with SQLAlchemy. Since I wasn't utilizing SQLAlchemy, this choice was outside my range.
2. Yoyo Migrations:- A universal migration solution. It did not support the asyncpg driver, which was what I had been using. If I utilized Yoyo Migrations, I would need to change to a different driver only for migration reasons, which I found unacceptable.

Therefore, I chose to create my own migration tool.

Choices I Made

  1. Python based migration file:- I appreciated the straightforward nature of Yoyo's migration files, which were simply single Python scripts outlining the SQL steps to be executed. Therefore, I drew motivation from that.

  2. Different Migration directory for different module/app/entity:- I opted for this approach because my project had many entities/tables, and consolidating the migrations of all these entities in one place could complicate understanding the changes that an entity has undergone over time and make it difficult to recognize the final structure of my entity after a long period. It draws inspiration from the method of creating migrations in Django.

Generating Migration File

Each migration's name features a timestamp and a slug to prevent generating migrations with identical names and leading to confusion during either migration or rollback.
The migration files also contain dependencies, which are migrations that need to be executed prior to the present migration. I automatically include the last migration of the same entity as a dependency for the newly created migration, thereby facilitating the migration process.
This is the code snippet used for creating migrations:

   def generate(self, module_name: str, slug: str):
    """
    Generates a new migration file for the given module and slug.
    Args:
        module_name (str): Name of the module where new migration file is to be created
        slug (str): Name of the migration
    """
    template = """
    # List of dependencies (migration that must be applied before this one)
    dependencies = ["{last_migration}"]
    # SQL to apply the migration
    apply = []
    # SQL to rollback the migration
    rollback = []
    """
    migrations_path = Path(f"{self.app_name}/{module_name}/migrations")
    migrations_path.mkdir(exist_ok=True)
    # To get last migration if any for the module to add it is dependency for current migration
    generated_migrations = self.get_all_migrations(migrations_path)
    if len(generated_migrations) != 0:
        template = template.format(last_migration=generated_migrations[-1])
    else:
        template = template.replace('"{last_migration}"', "")
    timestamp = datetime.now().strftime("%Y%m%d%H%M")
    migration_file = migrations_path.joinpath(f"{timestamp}_{slug}.py")
    migration_file.touch()
    migration_file.write_text(dedent(template))

Applying a Migrations

The implementation does not roll back the dependencies that are migrated if the current migration fails, which I find acceptable since I will ultimately have to apply it eventually. If a dependency caused the failure, it can be undone and subsequently reapplied after adjustments are made.

async def apply(self, module_name: str, slug: str):
    """
    Applies the given migration.
    Args:
        module_name (str): Name of the module where migration is located
        slug (str): Name of the migration
    Raises:
        e: If any error occurs while executing the migration
    """
    await self.load_applied_migrations()
    migration_name = f"{module_name}.{slug}"
    conn = await connect(self.POSTGRES_DSN)
    # Checks if migration is already applied or not
    if migration_name not in self.applied_migrations:
        module = self.import_module(module_name, slug)
        # Resolve the dependency and first applies them and then applies current migration
        for dependency in module.dependencies:
            if dependency not in self.applied_migrations:
                dep_module, dep_slug = dependency.split(".")
                await self.apply(dep_module, dep_slug)
        async with conn.transaction():
            print("Applying Migration:- ", migration_name)
            for sql in module.apply:
                try:
                    await conn.execute(sql)
                except Exception as e:
                    print("Error occured while executing sql:- ", sql)
                    raise e
            module_hash = sha3_512(
                Path(f"{self.app_name}/{module_name}/migrations/{slug}.py")
                .read_text()
                .encode("UTF-8")
            ).digest()
            await conn.execute(
                "INSERT INTO migrations(name,hash) VALUES($1,$2)",
                migration_name,
                module_hash,
            )
            self.applied_migrations.append(migration_name)

It saves the hash of the migration file within the database. This was performed to verify if the migration file has been altered or not. If it has been altered, it can be reapplied. Nonetheless, I haven't put this feature into action so far.

Drawback of Current Implementation

A significant limitation of the mentioned implementation is its inability to detect a circular dependency. In the scenario of a circular dependency, the code would become trapped in an endless loop, leading to a freeze. This issue can be resolved by generating a dependency graph prior to executing a migration to detect any cycles.

Rollbacking a Migration

The implementation at present, it undoes all rollbacks if even one of the rollbacks is unsuccessful, which might be the situation it should address. I haven't considered it deeply, as I believe it ought to be this way. Don't hesitate to express your opinions.

async def rollback(self, module_name: str, slug: str):
    """
    Rollback the given migration
    Args:
        module_name (str): Name of the module where migration is located
        slug (str): Name of the migration
    Raises:
        e: If any error occurs while executing the migration
    """
    await self.load_applied_migrations()
    migration_name = f"{module_name}.{slug}"
    if migration_name in self.applied_migrations:
        # Generates list of migrations that are needed to be rollback before rollbacking the current one
        # Currently it rollbacks all the migration applied after the migration that is to be rollbacked
        dependencies = self.applied_migrations[
            self.applied_migrations.index(migration_name) :
        ]
        dependencies.reverse()
        conn = await connect(self.POSTGRES_DSN)
        for dependency in dependencies:
            dep_module, dep_slug = dependency.split(".")
            module = self.import_module(dep_module, dep_slug)
            print("Rollbacking Migration:- ", dependency)
            async with conn.transaction():
                for sql in module.rollback:
                    try:
                        await conn.execute(sql)
                    except Exception as e:
                        print("Error occured while executing sql:- ", sql)
                        raise e
                await conn.execute(
                    "DELETE FROM migrations WHERE name=$1", dependency
                )
                self.applied_migrations.remove(dependency)

Drawback of Current Implementation

At present, it undoes all the executed migrations after the rollback, which is not needed. It should solely revert the migrations that are reliant on it. This can be addressed by building a dependency graph that includes all migrations, or we may keep track of the names of the migrations that rely on it when they are applied. Therefore, we can directly obtain a list of migrations that need to be reverted.

That’s It

You can find source code of full implementation here GitHub Gist - Migrations.py
Please feel free share your views on my implementation on my socials

0
Subscribe to my newsletter

Read articles from Dhwanil Shah directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Dhwanil Shah
Dhwanil Shah