Originally posted on rodloboz.com

Introduction

In the evolving landscape of web and application development, reliable, scalable, and efficient ways of identifying data matter more than ever. As applications grow and handle increasingly complex data across distributed systems, traditional record identifiers, such as auto-incrementing integers or universally unique identifiers (UUIDs), begin to show their limitations. This is where ULIDs, or Universally Unique Lexicographically Sortable Identifiers, come into play, offering a compelling alternative for developers seeking to optimize their databases and application performance.

ULIDs are designed to address several key scenarios where traditional identifiers fall short:

Distributed Systems: When data is spread across multiple databases or servers, coordinating unique identifiers is hard. ULIDs can be generated independently on any node, and their 80 bits of randomness per millisecond make collisions extremely unlikely, which suits distributed architectures.
Time-Ordered Records: Applications that sort records by creation time, such as logging systems, social media platforms, or any system tracking temporal events, benefit directly from ULIDs. Because the timestamp comes first, sorting ULIDs alphabetically also sorts them chronologically (to millisecond resolution), a significant advantage over random UUIDs.
High-Performance Requirements: A ULID combines a timestamp with a random component. New values therefore tend to land near the end of an index rather than at random positions, which helps insertion speed and index performance in high-throughput systems.
Scalability and Future-Proofing: As applications scale, the overhead of managing and indexing traditional identifiers can become a bottleneck. ULIDs have a fixed size and sort by time, so they stay efficient to index and query as tables grow.
Human-Friendly: A ULID is 26 characters long, compared with 36 for a UUID in its canonical text form, and its alphabet omits the easily confused letters I, L, O, and U. This makes debugging and manual inspection of records slightly more straightforward without sacrificing uniqueness or information density.
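
The time-ordering property can be illustrated with a few lines of plain Ruby. The sketch below is not part of the Rails setup: it only encodes a millisecond timestamp into the 10-character time prefix of a ULID and checks that sorting the strings matches sorting the numbers. The helper name encode_time and the sample timestamps are made up for illustration.

```ruby
# Crockford's Base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Hypothetical helper: encode a millisecond timestamp as the 10-character
# time prefix of a ULID (5 bits per character, zero-padded on the left).
def encode_time(milliseconds)
  chars = Array.new(10) do
    milliseconds, digit = milliseconds.divmod(32)
    CROCKFORD[digit]
  end
  # Digits were produced least significant first, so reverse them.
  chars.reverse.join
end

# Arbitrary sample timestamps, deliberately out of order.
timestamps = [1_709_478_721_608, 1_600_000_000_000, 1_709_478_721_609]
prefixes = timestamps.map { |ms| encode_time(ms) }

# Plain string sorting yields the same order as numeric sorting.
puts prefixes.sort == timestamps.sort.map { |ms| encode_time(ms) }
```

This works because the alphabet is in ascending ASCII order and the prefix has a fixed width. Two ULIDs created within the same millisecond share a prefix, so their relative order depends on the random part.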

Prerequisites

Ruby on Rails (version 7 or later)
PostgreSQL database
Basic understanding of Rails migrations

Step 1: Creating a ULID Domain in PostgreSQL

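First, we'll define a custom domain in PostgreSQL to represent the ULID format. As defined here, the domain caps every stored value at 26 characters (shorter values are space-padded) and gives ULID columns a descriptive type name; it does not validate the characters themselves.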

Generate the migration file:

rails generate migration CreateUlidDomain

Edit the generated migration file to include the CREATE DOMAIN and DROP DOMAIN SQL commands:

class CreateUlidDomain < ActiveRecord::Migration[7.1]
  def up
    execute <<-SQL.squish
      CREATE DOMAIN public.ulid AS character(26);
    SQL
  end

  def down
    execute <<-SQL.squish
      DROP DOMAIN public.ulid;
    SQL
  end
end
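
As a rough illustration of what a well-formed ULID looks like, the following plain-Ruby sketch checks a string against the Crockford Base32 alphabet. The constant and method names are hypothetical and nothing in this guide depends on them. A similar pattern could in principle back a CHECK constraint on the domain, which this guide does not set up.

```ruby
# Hypothetical format check for a canonical ULID string:
# 26 characters from Crockford's Base32 alphabet (no I, L, O or U).
# The first character is limited to 0-7 because 26 * 5 = 130 bits
# have to hold a 128-bit value.
ULID_FORMAT = /\A[0-7][0-9A-HJKMNP-TV-Z]{25}\z/

def valid_ulid?(value)
  value.is_a?(String) && ULID_FORMAT.match?(value)
end

puts valid_ulid?("01HR2DYE28BCSVAR5ENXK6AWDF") # sample ID used later in this guide
puts valid_ulid?("01HR2DYE28BCSVAR5ENXK6AWDU") # contains U, which is not in the alphabet
puts valid_ulid?("01HR2DYE28")                 # too short
```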

Step 2: Creating a ULID Generation Function

Next, we'll create a PostgreSQL function to generate ULIDs. This function utilizes the pgcrypto extension for generating random bytes and encodes the timestamp and randomness into a ULID format.

Generate the migration file:

rails generate migration CreateUlidFunction

Edit the generated migration file to include the CREATE FUNCTION SQL command:

class CreateUlidFunction < ActiveRecord::Migration[7.1]
  def up
    execute <<-'SQL'
      CREATE EXTENSION IF NOT EXISTS pgcrypto WITH SCHEMA public;

      CREATE OR REPLACE FUNCTION public.gen_ulid() RETURNS public.ulid
          LANGUAGE plpgsql
          AS $$
      declare
        -- crockford's base32
        encoding   bytea = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
        timestamp  bytea = E'\\000\\000\\000\\000\\000\\000';
        output     ulid = '';

        unix_time  bigint;
        ulid       bytea;
      begin
        -- 6 timestamp bytes
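        -- extract the epoch from the timestamptz directly; casting to timestamp
        -- first would shift the value by the session's time zone offset
        unix_time = (extract(epoch from clock_timestamp()) * 1000)::bigint;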
        timestamp = set_byte(timestamp, 0, (unix_time >> 40)::bit(8)::integer);
        timestamp = set_byte(timestamp, 1, (unix_time >> 32)::bit(8)::integer);
        timestamp = set_byte(timestamp, 2, (unix_time >> 24)::bit(8)::integer);
        timestamp = set_byte(timestamp, 3, (unix_time >> 16)::bit(8)::integer);
        timestamp = set_byte(timestamp, 4, (unix_time >> 8)::bit(8)::integer);
        timestamp = set_byte(timestamp, 5, unix_time::bit(8)::integer);

        -- 10 entropy bytes
        ulid = timestamp || gen_random_bytes(10);

        -- encode the timestamp
        output = output || chr(get_byte(encoding, (get_byte(ulid, 0) & 224) >> 5));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 0) & 31)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 1) & 248) >> 3));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 1) & 7) << 2) | ((get_byte(ulid, 2) & 192) >> 6)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 2) & 62) >> 1));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 2) & 1) << 4) | ((get_byte(ulid, 3) & 240) >> 4)));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 3) & 15) << 1) | ((get_byte(ulid, 4) & 128) >> 7)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 4) & 124) >> 2));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 4) & 3) << 3) | ((get_byte(ulid, 5) & 224) >> 5)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 5) & 31)));

        -- encode the entropy
        output = output || chr(get_byte(encoding, (get_byte(ulid, 6) & 248) >> 3));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 6) & 7) << 2) | ((get_byte(ulid, 7) & 192) >> 6)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 7) & 62) >> 1));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 7) & 1) << 4) | ((get_byte(ulid, 8) & 240) >> 4)));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 8) & 15) << 1) | ((get_byte(ulid, 9) & 128) >> 7)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 9) & 124) >> 2));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 9) & 3) << 3) | ((get_byte(ulid, 10) & 224) >> 5)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 10) & 31)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 11) & 248) >> 3));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 11) & 7) << 2) | ((get_byte(ulid, 12) & 192) >> 6)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 12) & 62) >> 1));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 12) & 1) << 4) | ((get_byte(ulid, 13) & 240) >> 4)));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 13) & 15) << 1) | ((get_byte(ulid, 14) & 128) >> 7)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 14) & 124) >> 2));
        output = output || chr(get_byte(encoding, ((get_byte(ulid, 14) & 3) << 3) | ((get_byte(ulid, 15) & 224) >> 5)));
        output = output || chr(get_byte(encoding, (get_byte(ulid, 15) & 31)));

        return output;
      end $$;
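    SQL
  end

  # Drops only the function; the pgcrypto extension is left installed.
  def down
    execute <<-SQL.squish
      DROP FUNCTION public.gen_ulid();
    SQL
  end
end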

The gen_ulid() function we've defined in PostgreSQL serves to generate ULIDs, which are 26-character, URL-safe, time-sortable identifiers. The function operates as follows:

Timestamp Generation: The function reads the current time with clock_timestamp(), extracts the Unix epoch in seconds, and multiplies it by 1000 to obtain whole milliseconds. This value is written into 6 bytes (48 bits) that form the most significant part of the ULID, which is what makes ULIDs sortable by time.
Randomness: After the timestamp, 10 bytes (80 bits) of cryptographically strong randomness are generated with gen_random_bytes(10) from PostgreSQL's pgcrypto extension. This makes a collision between two ULIDs generated in the same millisecond extremely unlikely, although the function does not guarantee that such ULIDs sort in the order they were generated.
Encoding: The function then encodes the resulting 16 bytes in Crockford's Base32, chosen for its readability and URL-friendliness. Each group of 5 bits maps to one of the 32 characters of the alphabet, and because the alphabet is in ascending ASCII order, the encoded string keeps the lexicographical sortability of the binary value.
Concatenation: The 10 encoded timestamp characters and the 16 encoded random characters are appended in order to form the final ULID. The result is a 26-character string that identifies a record, with the added benefit of being sortable by creation time.
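
For readers who find the bit manipulation hard to follow, the same four steps can be sketched in plain Ruby. This is an illustrative equivalent, not the code that runs in the database: the method name generate_ulid and its optional time argument are made up here, and SecureRandom stands in for pgcrypto's gen_random_bytes.

```ruby
require "securerandom"

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Hypothetical Ruby counterpart of gen_ulid(), for illustration only.
def generate_ulid(time = Time.now)
  # Timestamp: whole milliseconds since the Unix epoch (fits in 48 bits).
  milliseconds = (time.to_r * 1000).floor
  # Randomness: 80 bits, the same amount as 10 random bytes.
  randomness = SecureRandom.random_number(2**80)
  # Timestamp goes in the high bits, randomness in the low bits.
  value = (milliseconds << 80) | randomness

  # Emit 26 characters of 5 bits each, most significant group first.
  25.downto(0).map { |i| CROCKFORD_ALPHABET[(value >> (5 * i)) & 31] }.join
end

puts generate_ulid
puts generate_ulid(Time.at(0)) # the time prefix is "0000000000"
```

Treating the ULID as one 128-bit integer avoids the per-byte masks used in the SQL version, which has to work on a bytea value. Like the SQL function, this sketch makes no ordering promise for IDs created within the same millisecond.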

Generating ULIDs directly in the database layer has several advantages. Every application instance, and every row inserted outside Rails (for example through raw SQL or a bulk import), gets its ID from the same function, and the application needs no ID-generation logic of its own. IDs are assigned as part of the insert itself, so sortability and the very low collision risk hold regardless of which client wrote the row.

Step 3: Configuring Rails to Use ULIDs for Primary Keys

In your Rails application, configure the generators to use ULIDs for primary keys by default. Edit the config/application.rb file:

module YourAppName
  class Application < Rails::Application
    # ...
    config.generators do |generate|
      generate.orm :active_record, primary_key_type: :ulid
    end
    # ...
  end
end

Step 4: Generating a Model with a ULID Primary Key

When you generate a new model, Rails will automatically use a ULID as the primary key type.

rails generate model User name:string

Edit the generated migration to set the default value of the ULID id using the gen_ulid() function:

class CreateUsers < ActiveRecord::Migration[7.1]
  def change
    # Add `default: -> { 'gen_ulid()' }` to the line below
    create_table :users, id: :ulid, default: -> { 'gen_ulid()' } do |t|
      t.string :name

      t.timestamps
    end
  end
end

Step 5: Adding ULID References in Other Tables

When you've configured your Rails application to use ULIDs as the default primary key type, Rails automatically handles the creation of reference columns with the correct type. This simplification is due to the config.generators settings in config/application.rb, which instruct Rails to use ULID for primary keys and, by extension, for reference keys in associated models.

Here's how Rails manages the generation of models with references when ULIDs are set as the default primary key type:

rails g model Post user:references title:string

This command generates a migration file for the posts table with a user_id column that uses ULID as its type, thanks to the global configuration you've applied. After adding the gen_ulid() default to the create_table line, as in Step 4, the migration looks like this:

class CreatePosts < ActiveRecord::Migration[7.1]
  def change
    # Add `default: -> { 'gen_ulid()' }` to the line below
    create_table :posts, id: :ulid, default: -> { 'gen_ulid()' } do |t|
      t.references :user, null: false, foreign_key: true, type: :ulid
      t.string :title

      t.timestamps
    end
  end
end

Testing it out

We can now test the ULID generation by creating a new user and a post, then inspecting the generated ULIDs.

After applying the migrations, run the rails console in --sandbox mode:

rails console --sandbox

And then in the console:

user = User.create(name: "Loboz")
user.id
# => "01HR2DYE28BCSVAR5ENXK6AWDF"
post = Post.create(user_id: user.id, title: "ULIDs on Rails")
post.user_id
# => "01HR2DYE28BCSVAR5ENXK6AWDF"
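
Because the first 10 characters encode the creation time, the timestamp can be recovered from an ID. The snippet below is a plain-Ruby sketch that decodes the ID from the console session above. The helper name ulid_time is hypothetical and is not provided by Rails or PostgreSQL.

```ruby
CROCKFORD_DIGITS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Hypothetical helper: decode the 10-character time prefix of a ULID
# into a UTC Time with millisecond precision.
def ulid_time(ulid)
  milliseconds = ulid[0, 10].each_char.reduce(0) do |acc, char|
    digit = CROCKFORD_DIGITS.index(char)
    raise ArgumentError, "invalid ULID character: #{char}" if digit.nil?

    # Each character contributes 5 bits, most significant first.
    acc * 32 + digit
  end
  Time.at(Rational(milliseconds, 1000)).utc
end

puts ulid_time("01HR2DYE28BCSVAR5ENXK6AWDF").strftime("%Y-%m-%d %H:%M:%S.%L UTC")
# prints "2024-03-03 15:12:01.608 UTC"
```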

Conclusion

You've successfully integrated ULID generation into your Ruby on Rails application with PostgreSQL. Your tables now get unique, time-sortable primary keys generated by the database itself, a good fit for distributed systems and for any records you want to sort by creation time.

Implementing ULIDs in Ruby on Rails with PostgreSQL
