What's with a name (in Databricks)

Naming conventions are often a contentious topic among architects. In Databricks, these debates become even more pronounced due to the extensive platform work required to establish a Databricks implementation.

In an AWS-based Databricks setup, naming conventions extend to AWS resources such as VPCs, S3 buckets, IAM roles, and IAM policies. Additionally, with SCIM integration, service principals and groups are synced into Databricks and keep their original names.

In Databricks, you'll encounter a mix of naming styles, including snake_case, kebab-case, camelCase, and PascalCase. This mix can include inconsistently named resources and occasionally misspelled ones that, for various reasons, aren't worth the effort to correct. It can be quite frustrating.
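To get a feel for how mixed an existing environment is, a rough style detector can help. The following is a minimal sketch, not a definitive classifier: the patterns and the `detect_style` helper are my own assumptions, and single lowercase words such as `sales` are reported as "unknown" because they fit several conventions at once.

```python
import re

# Hypothetical patterns, one per naming style; each must match the whole name.
STYLE_PATTERNS = {
    "snake_case": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)+"),
    "UPPER_SNAKE_CASE": re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)+"),
    "kebab-case": re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)+"),
    "camelCase": re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+"),
    "PascalCase": re.compile(r"([A-Z][a-z0-9]+)+"),
}


def detect_style(name: str) -> str:
    """Return the first naming style whose pattern matches, else 'unknown'."""
    for style, pattern in STYLE_PATTERNS.items():
        if pattern.fullmatch(name):
            return style
    return "unknown"


if __name__ == "__main__":
    for example in ["sales_orders", "sales-orders", "salesOrders", "SalesOrders", "Sales_orders"]:
        print(example, "is", detect_style(example))
```

Names that mix conventions, such as `Sales_orders`, fall through to "unknown", which is usually where the frustrating ones live.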

Give to Caesar the things which are Caesar’s

Typically, different teams manage the cloud and Databricks environments, so it's essential to follow the cloud team's established naming conventions. The cloud team serves technology needs well beyond data, and they know their world best. Cloud resources often use kebab-case because bucket names become part of URLs, and S3 bucket names don't allow underscores in the first place. Google also recommends hyphens over underscores in URLs, because underscores are not reliably treated as word separators (e.g., my_site may be read as mysite).
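A kebab-case rule for buckets is easy to check before anything gets created. This is a minimal sketch under stated assumptions: the `is_kebab_bucket_name` helper is hypothetical, it enforces only the S3 length limit (3 to 63 characters) plus a strict kebab-case shape, and it deliberately rejects dots even though S3 itself allows them. Your cloud team's actual convention takes precedence.

```python
import re

# Lowercase letters and digits, with words separated by single hyphens.
KEBAB_BUCKET = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_kebab_bucket_name(name: str) -> bool:
    """Check a candidate bucket name against a strict kebab-case convention."""
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        return False
    return KEBAB_BUCKET.fullmatch(name) is not None


if __name__ == "__main__":
    # The bucket names below are made up for illustration.
    for candidate in ["acme-prod-raw-data", "acme_prod_raw", "AcmeProd"]:
        print(candidate, is_kebab_bucket_name(candidate))
```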

On the Databricks side, naming starts with the workspace name. That name is used in a URL, so it should be in kebab-case.

For groups and users, it is common practice to implement SCIM (System for Cross-domain Identity Management) integration. Groups and users are synced into Databricks from an identity provider such as Active Directory or IdentityNow, and they carry over whatever names they have there. For service principals, I've always seen kebab-case used. There is also the option of creating Databricks groups that are separate from the SCIM-synced groups; these often just follow the same convention as service principals.

Enter Unity Catalog

Unity Catalog introduces additional complexity. Most of its resources, such as catalogs, schemas, tables, and views, are referenced in SQL, where snake_case or UPPER_SNAKE_CASE is preferred. SQL handles these well, because many databases treat unquoted identifiers as case-insensitive, and Unity Catalog itself stores object names in lowercase. Using kebab-case is problematic because the dash is parsed as a subtraction sign, so every such name has to be quoted: with double quotes in ANSI SQL, and with backticks in Databricks SQL.
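The cost of a dash in a SQL name shows up as soon as you build a fully qualified table name in code. Here is a minimal sketch, assuming Databricks SQL, where delimited identifiers are wrapped in backticks and an embedded backtick is escaped by doubling it. The `quote_identifier` and `qualified_name` helpers are hypothetical, and the "plain identifier" pattern is a conservative assumption that does not account for reserved words.

```python
import re

# Assumption: names made only of letters, digits, and underscores
# (not starting with a digit) are safe to use without quoting.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks only when it is not a plain identifier."""
    if PLAIN_IDENTIFIER.fullmatch(name):
        return name
    # Escape embedded backticks by doubling them, then delimit the name.
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level catalog.schema.table name, quoting where needed."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


if __name__ == "__main__":
    print(qualified_name("dev", "sales", "daily_orders"))
    print(qualified_name("dev", "sales", "daily-orders"))
```

With snake_case the helper never has to add quoting; with kebab-case every query, notebook, and config file that mentions the table has to remember the backticks.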

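PascalCase and camelCase are popular among Java developers and in SQL Server, but in most SQL engines preserving mixed case requires quoting every identifier, and any name containing spaces must always be quoted. In Unity Catalog the casing is lost anyway because names are stored in lowercase, so CustomerOrders becomes customerorders and the word boundaries disappear.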
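When source systems arrive with PascalCase or camelCase names, converting them to snake_case on the way in keeps the word boundaries readable. The sketch below is one common regex-based approach, not the only one; `to_snake_case` is a hypothetical helper, and the example names are made up.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert camelCase, PascalCase, kebab-case, or spaced names to snake_case."""
    # Split an acronym from a following capitalized word: HTTPResponse -> HTTP_Response
    result = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: salesOrders -> sales_Orders
    result = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", result)
    # Treat runs of dashes and whitespace as a single separator.
    result = re.sub(r"[-\s]+", "_", result)
    return result.lower()


if __name__ == "__main__":
    for original in ["SalesOrders", "salesOrders", "HTTPResponse", "Customer Orders"]:
        print(original, "becomes", to_snake_case(original))
```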

Beyond SQL objects, resources like storage credentials and external locations often follow kebab-case, especially when linked to S3 buckets.

ABCs

Ultimately, the key to effective naming is to Always Be Consistent. While smaller projects often achieve this, larger implementations may struggle due to the involvement of many people, time constraints, and limited reviews. Therefore, in addition to striving for consistency, it's important to Also Be Considerate. People generally do their best with the knowledge and resources available, and while names that violate standards can be changed, consistency might not always be prioritized due to other pressing business needs.

Written by

Kurdapyo Data Engineer

I’m the kuya at Kurdapyo Labs — a recovering Oracle developer who saw the light and helped migrate legacy systems out of Oracle (and saved a lot of money doing it). I used to write PL/SQL, Perl, ksh, Bash, and all kinds of hand-crafted ETL. These days, I wrestle with PySpark, Airflow, Terraform, and YAML that refuses to cooperate. I’ve been around long enough to know when things were harder… and when they were actually better. This blog is where I write (and occasionally rant) about modern data tools — especially the ones marketed as “no-code” that promise simplicity, but still break in production anyway. Disclaimer: These are my thoughts—100% my own, not my employer’s, my client’s, or that one loud guy on tech Twitter. I’m just sharing what I’ve learned (and unlearned) along the way. No promises, no warranties—just real talk, some opinions, and the occasional coffee/beer-fueled rant. If something here helps you out, awesome! If you think I’ve missed something or want to share your own take, I’d love to hear from you. Let’s learn from each other.