Databricks Environment Splits

Three-tier architecture was the norm when I started my IT career. It consisted of the presentation layer (the web servers), the application layer (the app servers), and the data layer (the databases).

We could spin up as many environments as needed for the presentation and application layers — most engineers had their own local setups for these. But the database layer? That was usually shared.

This setup works fine when the schema is stable and the only operations are data manipulation (DML: inserts, updates, deletes). But in my experience, that’s rarely the case.

Data-heavy applications are always evolving — new tables here, an extra column there, another index for that slow join. Sharing a database across multiple developers and environments quickly becomes a source of friction.

Modern Workarounds for Database Isolation

Today, there are far more elegant ways to manage this. I've worked on systems where each developer spun up their own isolated data layer using Docker containers for PostgreSQL or MySQL. For Oracle, we found it more practical to use PDBs (Pluggable Databases) instead. In AWS-heavy environments, some teams spun up databases from RDS snapshots using CloudFormation or Terraform.
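
As a rough illustration of the per-developer Docker approach, here's a minimal sketch using the Docker SDK for Python (pip install docker). The container name, port mapping, and password are made-up placeholders, not from any real project.

```python
# Minimal sketch: a disposable Postgres instance per developer.
# Assumes Docker is running locally; all names and credentials are illustrative.
import docker

client = docker.from_env()

container = client.containers.run(
    "postgres:16",                       # any recent Postgres image works
    name="dev-alice-pg",                 # one container per developer
    environment={"POSTGRES_PASSWORD": "dev-only-password"},
    ports={"5432/tcp": 5433},            # host port 5433 -> container port 5432
    detach=True,
)
print(f"Started {container.name}; connect on localhost:5433")
```

When you're done, container.remove(force=True) throws the whole thing away, which is exactly the point: these environments are disposable.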

There are also broader solutions involving full-stack virtualization, infrastructure-as-code, and automation tools. These approaches help bring the data layer up to the same level of flexibility as the app and presentation layers — and reduce cross-team collisions along the way.

Databricks and the Environment Problem

But what happens when your “database layer” isn’t a traditional RDBMS at all?

What if it’s Databricks, where data lives in cloud-managed storage, and the interface revolves around notebooks and distributed compute jobs? In this world, environment isolation takes a different form — and requires different strategies.

What Is an Environment in Databricks?

The first important question is: how do you define an environment in Databricks? There are several valid approaches.

A common pattern is to define an environment as a Databricks workspace — e.g., kurdapyo-prd, kurdapyo-uat, kurdapyo-sit, kurdapyo-dev. This gives you clean isolation. Pre–Unity Catalog, you could even reuse catalog names across workspaces. But with Unity Catalog, catalog names must be unique within the metastore (which is typically shared across all workspaces in a region), so it's now common to prefix them — e.g., gold_uat, gold_prd.
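
One small habit that helps with the prefixing convention: resolve catalog names from an environment label in one place, rather than hard-coding gold_prd in every notebook. A sketch, where the gold layer name and env values are illustrative assumptions:

```python
# Sketch: derive Unity Catalog names from an environment label.
# Layer and environment names here are illustrative.

def catalog_for(layer: str, env: str) -> str:
    """Map a logical layer plus environment to a catalog name, e.g. gold_uat."""
    return f"{layer}_{env}"

# Inside a Databricks notebook, `spark` is provided by the runtime.
env = "uat"  # typically injected via a job parameter or widget
spark.sql(f"USE CATALOG {catalog_for('gold', env)}")
```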

The downside? Spinning up a new workspace environment isn’t cheap. On AWS, it often involves provisioning a new account and VPC, setting up DNS, configuring SCIM and user permissions — the kind of setup best left to platform teams or DevOps experts.

Splitting a Workspace into Logical Environments

The next logical approach is to carve out logical environments within a single Databricks workspace. We explored two main options:

  • Splitting at the catalog level

  • Splitting at the schema level

I personally prefer the schema-level split. I even advocated for building automation to create schemas on demand and using zero-copy cloning to generate like-for-like environments — perfect for safe, isolated testing.
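
For context, zero-copy cloning on Databricks maps to shallow clones: the new table references the source's data files rather than copying them, and writes to the clone create new files. A hedged sketch of the schema-level idea (catalog, schema, and table names are all made up):

```python
# Sketch: a like-for-like test schema built from zero-copy (shallow) clones.
# Runs in a Databricks notebook where `spark` exists; names are illustrative.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.dev_001")
spark.sql(
    "CREATE TABLE IF NOT EXISTS main.dev_001.orders "
    "SHALLOW CLONE main.prd_core.orders"
)
```

Because the clone is copy-on-write, tests can freely mutate dev_001 without ever touching the production tables.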

However, that approach wasn’t prioritized at the time. Instead, we went with a catalog-level split, managed through Terraform to keep things clean and avoid ad-hoc changes by engineers.

Naming Your Logical Environments

Once you've defined your environment boundaries, the next step is naming. While I lean toward a “treat them as cattle, not pets” mindset, stakeholders usually prefer meaningful names.

A practical trick I’ve learned: always include a zero-padded numeric suffix, like dev-001 or uat-002. It sounds simple, but it's incredibly effective — because no matter how many environments you start with, you'll always need more later. This naming convention makes it easy to scale your environments without resorting to naming gymnastics.
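
The suffix trick also lends itself to automation. A purely illustrative helper that picks the next free name for a prefix:

```python
# Sketch: compute the next environment name for a prefix,
# e.g. ["dev-001", "dev-002"] -> "dev-003". Zero-padding keeps names sortable.
import re

def next_env_name(existing: list[str], prefix: str, width: int = 3) -> str:
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    numbers = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}-{max(numbers, default=0) + 1:0{width}d}"

print(next_env_name(["dev-001", "dev-002", "uat-001"], "dev"))  # dev-003
```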

Aspirations: Automation and Production Parity

Once your environments and naming conventions are in place, the long-term goal is automation:

  • Automate environment creation and teardown

  • Make it easy to spin up production-like environments

  • Enable regression testing in consistent, isolated setups
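
Putting the first two bullets together with the shallow-clone idea from earlier, creation and teardown can be a pair of small functions. This is a sketch under the same assumptions as before (a Databricks notebook, illustrative names), not a hardened implementation:

```python
# Sketch: spin up and tear down a production-like schema via shallow clones.
# `spark` is the notebook-provided session; catalog/schema names are illustrative.

def create_env(source_schema: str, env_schema: str, catalog: str = "main") -> None:
    """Shallow-clone every table in source_schema into a fresh env_schema."""
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{env_schema}")
    for row in spark.sql(f"SHOW TABLES IN {catalog}.{source_schema}").collect():
        spark.sql(
            f"CREATE TABLE IF NOT EXISTS {catalog}.{env_schema}.{row.tableName} "
            f"SHALLOW CLONE {catalog}.{source_schema}.{row.tableName}"
        )

def drop_env(env_schema: str, catalog: str = "main") -> None:
    """Drop an environment once testing is finished."""
    spark.sql(f"DROP SCHEMA IF EXISTS {catalog}.{env_schema} CASCADE")

create_env("prd_core", "dev_003")   # spin up a like-for-like environment
drop_env("dev_003")                 # ...and throw it away when done
```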

These efforts can be time-consuming upfront but pay off significantly in delivery speed and engineering confidence. Of course, it’s essential to balance these aspirations with business priorities. Delivering value to users comes first — but carving out time for automation is what sets teams up for long-term success.

One tool I’m particularly curious about is SQLMesh. It promises branch-based environments, automated model testing, and declarative change tracking — all of which seem like a natural fit for solving the isolation problem at the data transformation layer. I haven’t tried it in anger yet, but it’s definitely on my list to explore as we continue refining our approach to Databricks environments.

Final Thoughts

The challenges of environment isolation aren’t new — they just evolve with the tools we use. Whether it's a traditional database in a three-tier architecture or a modern platform like Databricks, the underlying principles remain the same: avoid shared mutable state, automate what you can, and build with repeatability in mind.

