I maintain an application with a beefy MySQL database. This database is old. Like… old. And it’s had a lot of changes applied to it over the years.

Now it needs to change again. It’s been a while. And the fella changing it has never seen it done. So I set out to explain how the things are wired together. It takes a few cycles, but now we’re on the same page. Mostly. Because things have changed and he soon finds out that our shared page is outdated.

You see, we don’t just go into the database and add columns to tables. We have a change log. Or changelog. The changelog is a file describing all the changes over the database’s lifetime. New tables, column additions, nullability constraints, values for enum, etc. This file is processed by Liquibase, you see. Liquibase reads the changelog and applies the changes.

And of course, we don’t use the MySQL instance in our unit tests. We swap it out for an in-memory H2 database. This database is ephemeral and is blown away after our tests run.

This has benefits. I don’t have to boot up a copy of MySQL for my tests; don’t have to make sure that the copy is in a good state; don’t have to clean up after the tests. And there are downsides. For example, we’ve found previously that some more obscure MySQL syntax is unsupported and that the behavior of the driver your application uses to talk to the database is slightly different, especially in regards to how Java types map to database types. However, I wouldn’t be surprised if these issues have been solved long ago.

The bottom line is that H2 makes it a lot easier to unit test your SQL. You just need to put the same tables and relevant data into the H2 instance, and we can do this with Liquibase.

The Setup

The challenge for new users is to get comfortable with this setup. It’s hard to examine a database which gets deleted after your unit tests finish. I’m not too proud to admit that I used to put a 10 minute sleep into the test suite so that I could connect to H2 using SQuirrel. But don’t do this, it’s hack as hell.

In order to start playing with these tools, we’ll use a file-backed H2 database instead. I use DBeaver as my SQL client. DBeaver Community edition knows where to get the H2 driver, making the setup easy. But you can use just about anything you like if you’re willing to go hunting for the drivers yourself.

When configuring the DBeaver connection, it will ask for the database type. Select H2 Embedded. Next, set “connect by” to URL and set the URL to jdbc:h2:~/test-h2;AUTO_SERVER=TRUE.

Notice the AUTO_SERVER parameter. This is necessary to allow other applications to connect to the same instance of the database.

Start the connection in DBeaver. Congrats! You can now execute SQL on this empty database (e.g. create tables or SELECT 1).

Now, let’s create an application that interacts with H2. I’m writing Scala, but I’ll avoid anything advanced, so it remains easy to rewrite into any other JVM language.

You’ll have to include the H2 and Liquibase dependencies. For Scala (building with SBT), I’ll add the following to build.sbt:

libraryDependencies ++= Seq(
  "com.h2database" % "h2" % "2.3.232",
  "org.liquibase" % "liquibase-core" % "4.31.1"
)

Here are the imports we’ll need:

import liquibase.Liquibase
import liquibase.database.DatabaseFactory
import liquibase.database.jvm.JdbcConnection
import liquibase.resource.DirectoryResourceAccessor
import java.io.File
import java.sql.{Connection, DriverManager, ResultSet}

And here is the skeleton for for our main:

val connection = 
    DriverManager
        .getConnection("jdbc:h2:~/test-h2;AUTO_SERVER=TRUE", "", "")
try {
  val db = new Database(connection)

  // TODO: all database interactions will go here

} finally {
  connection.close()
}

The above snippet connects to the database we started in DBeaver and closes the connection before exiting. All further snippets are executed from inside the try block, where you see the TODO.

I also defined a helper to query the database and parse its results:

class Database(conn: Connection) {
  def query[T](sql: String)(parser: ResultSet => T): List[T] = {
    val builder = List.newBuilder[T]
    val resultSet = conn.createStatement().executeQuery(sql)
    // The ResultSet starts before the first row of data. You must
    // advance it to read the first row.
    while (resultSet.next()) {
      builder += parser(resultSet)
    }
    builder.result()
  }
}

This query method takes a SQL query and a parser - a way to extract values from each row of the query result set. For example, we could call it like this:

db.query("SELECT 1 AS my_num")(_.getInt("my_num"))

Now, let’s create a changelog.xml file in your project’s root directory. This file will tell Liquibase how to modify the database.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">

    <changeSet id="create first table" author="your name">
        <createTable tableName="first_table">
            <column name="id" type="varchar(255)"/>
        </createTable>
    </changeSet>
</databaseChangeLog>

In the above example, we add one change set which creates a table called first_table which has a single string column called id.

There are ways of running these changes from command line. However, we are interested in setting up the database for tests, so I’ll focus on running the change log from the JVM app.

val database = 
    DatabaseFactory
        .getInstance()
        .findCorrectDatabaseImplementation(new JdbcConnection(connection))

// where your changelog.xml file is, relative to current directory
// (probably project root)
val directory = new File(".")

val liquibase = new Liquibase(
    "changelog.xml",
    new DirectoryResourceAccessor(directory),
    database
)

liquibase.update()

After you run the code above, go back to DBeaver and refresh. You should see some tables: first_table and a couple more. Most interesting for us is the DATABASECHANGELOG table. This is where Liquibase records the migrations it has applied. If you run main again, you’ll see output informing you that no new updates were applied, but there were previously executed updates. Liquibase figures this out by looking into the DATABASECHANGELOG table. Mine looks like this after migration (I’ll make it easier to read by splitting it into three sections below):

ID                |AUTHOR   |FILENAME     |DATEEXECUTED           |ORDEREXECUTED|
create first table|your name|changelog.xml|2025-05-04 14:33:14.813|            1|


EXECTYPE|MD5SUM                            |DESCRIPTION                      |
EXECUTED|9:249a20e9db93783921faac26d5aaaa76|createTable tableName=first_table|


COMMENTS|TAG|LIQUIBASE|CONTEXTS|LABELS|DEPLOYMENT_ID|
        |   |4.31.1   |        |      |6387194337   |

Once the migration has been run, your application can query the new table as well.

println(db.query("SELECT COUNT(*) cnt FROM first_table")(_.getInt("cnt")))

You can imagine how your test suite sets up a new database connection, runs Liquibase change logs, and then validates your queries by running them against the database. You’ll probably want to switch to an in-memory mode (adding :mem to the JDBC URL), so that the database is rebuilt fresh on every test run. Or you could cache the database state by keeping the file around - just make sure your test cases clean up after themselves and don’t depend on each other’s data.

Warnings and Discussion

A word of warning: do not modify change sets once they’ve been applied. Liquibase will compute a new checksum. If the change set ID is the same as before, it will complain that the change set is different. If the ID is different, it will consider it a new change set and try to apply it - the second time from your, wiser, human perspective, and the first from Liquibase’s.

You may be tempted to work around this by making database edits manually and updating the change sets in ways that do not affect the checksum. But “what affects the checksum?” That’s exactly the type of question you don’t want to have to answer, especially when you ask it right: “what goes into computing the checksum in the version of Liquibase that I’m using today, but won’t be in a year?” Just don’t do it.

It’s worth noting that Liquibase actually has tools for dealing with you if you read the above and did it anyway. For example, you can add other acceptable checksums to existing change sets. I won’t show you how to do this because I don’t want you to play with this dangerous toy which I’m putting right here in the cupboard. Forget about it.

Finally, Liquibase gradually evolves how it computes change set checksums. See that “9:” prefix in the MD5SUM column? That’s an algo version, the one that the current version of Liquibase uses at the time of this writing. I’ve recently updated a database where old change sets had “4:” checksum prefixes. Not going to lie: so much changed about how we handle this old database, that we were not able to reconcile the checksums, but this had to do with moving the change logs from an archived repository to a maintained one and abandoning a custom way of referencing change logs from other change logs. The point is this: if you don’t do anything crazy, your checksums will remain stable, Liquibase will work intuitively and correctly, and you’ll have fun!

Database Migration and Testing: Liquibase and H2

The Setup

Warnings and Discussion

Subscribe to my newsletter

uskob

uskob