Data Analysis Journey with Databricks & Google Sheets
Challenge / Issue
Our Business Analyst team has been skillfully utilizing Google Sheets to collect and manage data, demonstrating their proficiency with this tool. They thoughtfully organize processes, items, and charts within Google Sheets.
Meanwhile, my Data Engineering team has developed data pipelines to process information from various sources using Databricks.
The Business Analyst team has been exporting Gold tables from Databricks and then importing them into Google Sheets. Although Databricks could serve their analysis needs, the team tends to favor the tools they are familiar with, as these tools greatly boost our productivity.
What is the Gold table
In Databricks, a "Gold table" is often seen as the ultimate, polished data set that has been meticulously transformed and checked for quality. This table is a key component in a multi-layer data architecture, which typically consists of:
Bronze Table: This is where raw data from various sources is initially collected.
Silver Table: The data is cleaned and enriched, possibly involving some aggregations or transformations.
Gold Table: This is the most refined data, specifically structured for analytical or business intelligence purposes, and is ready for reporting and decision-making.
Characteristics of a Gold Table:
High Quality: The data is thoroughly validated and cleansed to ensure it is accurate and consistent.
Optimized for Performance: It is often indexed or partitioned to allow faster queries.
Business-Ready: It includes only the essential fields needed for analysis and reporting.
Documentation: It has a clearly defined schema and metadata to facilitate easy understanding and use.
Use Cases:
- Conducting advanced analytics.
https://www.databricks.com/glossary/medallion-architecture
Exploring Solutions for productivity
The pipelines on Databricks are automatically executed and generate the latest tables. It is a great platform for all data engineers. Our business analysis team needs to access the latest tables to analyze data, so it's necessary for them to export/import several times a day.
To make this process more efficient, we discussed and clarified our requirements:
Automatically update Google Sheets
Importing data from Databricks into Google Sheets allows us to work seamlessly within Google Sheets.
No need to execute complex SQL (the team lacks technical skills for it)
Appropriate access rights for the table on Databricks
Continue using the collaborative features on Google Sheets
We searched for a good solution using Google and found many articles about importing from Google Sheets to Databricks, but not much about the opposite direction.
The Ideal Solution for Our Needs
In our situation, we found the best solution for us. The tool is Bricksheet, which is specifically designed to connect Google Sheets and Databricks.
Web URL : https://bricksheet.amukin.com/
Install : https://workspace.google.com/marketplace/app/bricksheet/979793077657
The tool met most of our requirements. We can import tables directly into Google Sheets in just four clicks without needing any SQL statements. Catalogs and schemas are displayed in a drop-down menu on the user interface. We need to configure Databricks parameters in the tool, and once configured, catalogs and schemas are automatically displayed. We simply select a catalog, and then the schema is automatically shown in a drop-down menu under the catalog. Once the schema is selected, the table list appears under it. Just select the table with a radio button, then click "Import Table." Depending on the table size, it takes a few seconds to complete the import to Google Sheets.
The Result - Decision
Through testing Bricksheet, we decided to use the tool. It only imports tables, making it safer for users because there's no risk of changing data on Databricks. For streamlining and processing data, Databricks is the best choice for data engineering. Another benefit we found is that there are no downloaded files containing confidential table data. Some members have many downloaded files on their laptops. From a security perspective, storing files on individual laptops is not recommended. Bricksheet helps prevent security incidents and saves unnecessary time and cost.
You can try this tool for free, so it's worth a try for anyone using the Databricks warehouse!
#data
Subscribe to my newsletter
Read articles from Reny Kamar directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by