Naming conventions are often a contentious topic among architects. With Databricks, the debate becomes even more pronounced because standing up an implementation involves extensive platform work, and each piece of that work needs a name.

In an AWS-based Databricks setup, naming conventions extend to AWS resources like VPCs, S3 buckets, IAM Roles, and IAM Policies. Additionally, with SCIM integration, service principals and groups are synced into Databricks, retaining their original names.

In Databricks, you'll encounter a mix of naming styles, including snake_case, kebab-case, camelCase, and PascalCase. This mix can include inconsistently named resources and occasionally misspelled ones that, for various reasons, aren't worth the effort to correct. It can be quite frustrating.
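To get a feel for how mixed an existing environment is, a quick inventory helps. The following is a minimal sketch that sorts names into the four styles mentioned above. The regular expressions and the `classify_style` helper are my own assumptions for illustration, not anything Databricks provides.

```python
import re

# Hypothetical patterns for the four styles named above; adjust to taste.
# Single lowercase words such as "sales" are ambiguous, so they fall through
# to "unknown" rather than being forced into one style.
STYLE_PATTERNS = {
    "snake_case": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)+"),
    "kebab-case": re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)+"),
    "camelCase": re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+"),
    "PascalCase": re.compile(r"([A-Z][a-z0-9]+)+"),
}


def classify_style(name: str) -> str:
    """Return the first style whose pattern matches the whole name."""
    for style, pattern in STYLE_PATTERNS.items():
        if pattern.fullmatch(name):
            return style
    return "unknown"


if __name__ == "__main__":
    for example in ["sales_raw", "data-eng-team", "salesRaw", "SalesRaw", "Sales_raw-Data"]:
        print(example, "->", classify_style(example))
```

Running this over a list of existing resource names gives a rough count per style, which is usually enough to see which convention is already winning.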

Give to Caesar the things which are Caesar’s

Typically, different teams manage the cloud and Databricks environments. It's essential to adhere to the cloud team's established naming conventions. They serve the organization's other technology needs beyond data, so they know their world best. Cloud resources often use kebab-case because bucket names become part of URLs. Interestingly, Google advises against using underscores in URLs, as it doesn't treat them as word separators (e.g., my_site is read as mysite).

On the Databricks side, naming starts with the workspace name. Because that name is used in a URL, it should be in kebab-case.
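If a proposed name arrives in another style, it can be normalized before it is used in a URL. This is a minimal sketch; `to_kebab_case` is a hypothetical helper, and it deliberately keeps runs of capitals together, so an acronym-led name like `AWSProd` becomes `awsprod` rather than `aws-prod`.

```python
import re


def to_kebab_case(name: str) -> str:
    """Convert snake_case, camelCase, PascalCase, or spaced names to kebab-case."""
    # Insert a space at lower-to-upper boundaries: "salesRaw" -> "sales Raw"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Collapse every run of non-alphanumeric characters into a single dash
    dashed = re.sub(r"[^A-Za-z0-9]+", "-", spaced)
    # Drop leading/trailing dashes and lowercase the result
    return dashed.strip("-").lower()


if __name__ == "__main__":
    for example in ["my_site", "DataEngProd", "Data_Eng Prod"]:
        print(example, "->", to_kebab_case(example))
```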

For groups and users, it is common practice to implement SCIM (System for Cross-domain Identity Management) integration. Groups and users are synced from Active Directory or IdentityNow into Databricks, and they carry over whatever names they have at the source. For service principals, I've always seen kebab-case used. There is also the option of creating Databricks groups that are separate from the SCIM-synced groups; these usually follow the same convention as service principals.

Enter Unity Catalog

Unity Catalog introduces additional complexity. Most resources, like Catalogs, Schemas, Tables, and Views, are used in SQL, where snake_case or UPPER_SNAKE_CASE is preferred. SQL syntax generally handles these well, as many databases aren't case-sensitive unless names are delimited (double quotes in standard SQL). Using kebab-case is problematic because the dash is parsed as a subtraction sign unless the name is delimited, which in Databricks SQL means wrapping it in backticks.
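To make the cost of a dash concrete, here is a minimal sketch of what code that builds SQL has to do when names are not plain snake_case. It assumes Databricks SQL's backtick delimiters, with a literal backtick escaped by doubling it. `quote_identifier` and `qualified_name` are hypothetical helpers, and the check is deliberately conservative: it does not know about reserved words, and it quotes anything that is not letters, digits, and underscores.

```python
import re

# Assumption: names made only of letters, digits, and underscores (not starting
# with a digit) are safe to write without delimiters. Everything else is quoted.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks only when it needs them."""
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    # Escape embedded backticks by doubling them, then delimit the name
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level catalog.schema.table reference."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


if __name__ == "__main__":
    print(qualified_name("dev", "sales_raw", "orders"))
    print(qualified_name("dev-catalog", "sales", "orders"))
```

The snake_case name passes through untouched, while the kebab-case catalog has to be delimited in every query that references it, which is exactly the friction described above.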

PascalCase and camelCase are popular among Java developers and in SQL Server, but using these conventions requires delimited names wherever case must be preserved or a name contains spaces.

Beyond SQL objects, resources like storage credentials and external locations often follow kebab-case, especially when linked to S3 buckets.
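Putting the pieces together, the conventions described so far can be written down as a small lookup and checked automatically. This is only a sketch: the resource type labels, the `EXPECTED` mapping, and `find_violations` are my own illustrative names, and SCIM-synced groups and users are intentionally left out because they keep the names from their source system.

```python
import re

KEBAB = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
SNAKE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Hypothetical mapping of resource type to expected style, following the
# conventions discussed in this post. Types not listed here are not checked.
EXPECTED = {
    "workspace": KEBAB,
    "service_principal": KEBAB,
    "storage_credential": KEBAB,
    "external_location": KEBAB,
    "catalog": SNAKE,
    "schema": SNAKE,
    "table": SNAKE,
    "view": SNAKE,
}


def find_violations(resources):
    """Return the (resource_type, name) pairs that break the expected style."""
    return [
        (rtype, name)
        for rtype, name in resources
        if rtype in EXPECTED and not EXPECTED[rtype].fullmatch(name)
    ]


if __name__ == "__main__":
    inventory = [
        ("catalog", "dev_sales"),
        ("catalog", "dev-sales"),
        ("external_location", "raw-landing-zone"),
        ("table", "OrdersRaw"),
        ("group", "Whatever Name"),
    ]
    for rtype, name in find_violations(inventory):
        print(f"{rtype}: {name}")
```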

ABCs

Ultimately, the key to effective naming is to Always Be Consistent. While smaller projects often achieve this, larger implementations may struggle due to the involvement of many people, time constraints, and limited reviews. Therefore, in addition to striving for consistency, it's important to Also Be Considerate. People generally do their best with the knowledge and resources available, and while names that violate standards can be changed, consistency might not always be prioritized due to other pressing business needs.

What's with a name (in Databricks)


Kurdapyo Data Engineer
