Snowflake: Architecture & Ecosystem
Hello Data Engineers, welcome to the next post in my 'Zero to Snowflake' blog series. Since you're diving into Snowflake, let's start by understanding its architecture.
Architecture
Snowflake offers a fully-managed, scalable, and secure solution for storing and analyzing large amounts of data. It is designed to handle a variety of data types, from structured to semi-structured and unstructured data, and to support complex data analytics workloads.
At a high level, Snowflake's architecture consists of three layers: Storage, Compute, and Services.
1. Storage (Database) Layer
The storage layer is responsible for storing all of the data in Snowflake. Data is stored in a proprietary columnar format, optimized for analytical workloads. Snowflake uses a scalable storage layer that can automatically scale up or down based on the data being stored. The storage layer is also designed to be fault-tolerant, with multiple copies of data stored in different availability zones for high availability and durability.
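To see why a columnar layout suits analytics, here is a toy Python sketch. It is only an illustration: Snowflake's actual format is proprietary, and the `rows` data and the `to_columns` helper are made up for this example.

```python
# Toy illustration only; this is not Snowflake's real storage format.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column only needs that column's values,
# so the other columns never have to be read.
total_amount = sum(columns["amount"])
print(total_amount)
```

In a row layout, the same sum would have to walk every full record even though it needs a single field.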
Storage is provided by the underlying cloud platform: Amazon S3 on AWS, Azure Blob Storage on Azure, or Google Cloud Storage on GCP. Storage costs are based on the daily average of all compressed data stored, including data retained for Time Travel and Fail-safe.
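As a rough sketch of the "daily average" idea, the Python below averages a month of daily storage figures and prices the result. The price per TB, the binary definition of a terabyte, and the function name `monthly_storage_cost` are assumptions for illustration, not Snowflake's published rates or API.

```python
TB = 1024 ** 4  # bytes per terabyte; binary definition assumed for this sketch


def monthly_storage_cost(daily_bytes, price_per_tb_month):
    """Average the daily compressed byte counts, then price that average.

    Each daily figure is assumed to already include active data plus
    the data retained for Time Travel and Fail-safe.
    """
    if not daily_bytes:
        return 0.0
    average_bytes = sum(daily_bytes) / len(daily_bytes)
    return (average_bytes / TB) * price_per_tb_month


# Hypothetical month: 10 days at 2 TB, then 20 days at 5 TB.
daily = [2 * TB] * 10 + [5 * TB] * 20
cost = monthly_storage_cost(daily, price_per_tb_month=23.0)  # made-up rate
print(round(cost, 2))
```

The point to notice is that a short spike in stored data raises the bill only in proportion to the days it lasted.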
2. Compute (Query Processing) Layer
The compute layer is responsible for processing data in Snowflake. It consists of virtual warehouses, which are clusters of compute resources that are used to run queries and perform analytics. Each virtual warehouse is completely isolated from others, allowing for concurrency and parallelism. Users can create multiple virtual warehouses of varying sizes and configurations to handle different workloads. This means Snowflake can scale compute independently of storage, providing more flexibility and cost-effectiveness.
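For example, a team might keep one warehouse for ETL and a smaller one for BI so the two workloads never compete. The Python sketch below only builds the `CREATE WAREHOUSE` statements as strings; the warehouse names, sizes, and the helper `create_warehouse_sql` are hypothetical, and the names are hard-coded rather than taken from user input.

```python
def create_warehouse_sql(name, size, auto_suspend_seconds=60):
    """Return a CREATE WAREHOUSE statement for one isolated warehouse."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"AUTO_RESUME = TRUE"
    )


# Hypothetical workloads, each with its own independently sized warehouse.
workloads = {"etl_wh": "LARGE", "bi_wh": "SMALL"}
statements = [create_warehouse_sql(name, size) for name, size in workloads.items()]

for statement in statements:
    print(statement + ";")
```

`AUTO_SUSPEND` and `AUTO_RESUME` let each warehouse pause when idle and restart on the next query, which keeps separate warehouses from costing money while unused.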
Compute runs on the underlying cloud platform: Amazon EC2 on AWS, Azure Virtual Machines on Azure, or Compute Engine on GCP. The cost of a virtual warehouse depends on its size (the larger the warehouse, the more credits it consumes per hour) and on how long it runs. A running warehouse is billed for its uptime, not for the number of queries it executes, although a multi-cluster warehouse that starts extra clusters to handle concurrency consumes credits for each running cluster. The price of a credit also depends on the region and edition of the Snowflake account.
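Here is a minimal sketch of how size and running time turn into cost. It assumes the standard credit table (credits per hour double with each size step), per-second metering with a 60-second minimum each time a warehouse starts, and a single cluster. The price per credit and the function names are placeholders, since the real price varies by region and edition.

```python
# Credits per hour for standard warehouses; each size step doubles the rate.
CREDITS_PER_HOUR = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
}


def credits_used(size, seconds_running):
    """Credits for one continuous run of a single-cluster warehouse."""
    if seconds_running <= 0:
        return 0.0  # the warehouse never started
    billed_seconds = max(seconds_running, 60)  # 60-second minimum per start
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def run_cost(size, seconds_running, price_per_credit):
    """Convert credits into currency using an assumed price per credit."""
    return credits_used(size, seconds_running) * price_per_credit


medium_half_hour = credits_used("MEDIUM", 1800)   # 30 minutes on MEDIUM
short_query = credits_used("XSMALL", 10)          # billed as a full minute
large_cost = run_cost("LARGE", 900, price_per_credit=3.0)  # made-up price
print(medium_half_hour, short_query, large_cost)
```

Note that the number of queries does not appear anywhere in the calculation: ten queries or a thousand cost the same if the warehouse runs for the same time at the same size.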
3. Cloud Services Layer
This layer is known as the Brain of Snowflake. The services layer provides management, security, and governance features. It includes services such as authentication, authorization, metadata management, and query optimization. The services layer also includes features such as data sharing, which allows users to securely share data with external parties, and data governance, which allows organizations to enforce data policies and compliance requirements.
Ecosystem
The Snowflake ecosystem refers to the set of tools and technologies that are used in conjunction with the Snowflake cloud data platform. It helps users work with data in Snowflake more effectively through an extensive network of connectors, drivers, programming languages, and utilities.
Snowflake Data Marketplace: A marketplace where Snowflake users can find and access third-party data sets to use in their analyses.
Snowflake Partner Connect: A platform that enables Snowflake users to connect easily with partners who can provide data integration, ETL, and other services.
Snowflake Data Exchange: A platform that allows Snowflake users to securely share and monetize data sets with other Snowflake users.
The above image displays third-party partners and technologies that have been certified to provide native connectivity to Snowflake. They are categorized into Data Integration, ML & Data Science, BI, Security & Governance, SQL Development, and Programming Interfaces. Users can pick a partner from the relevant category based on cost and on how well it meets their requirements.
Must-Know Facts about Snowflake
Snowflake's processing engine supports ANSI SQL, the most familiar and widely used database query language. SQL capabilities are built natively into the product.
SQL functionality can be extended via SQL User Defined Functions (UDFs), JavaScript UDFs, session variables, Stored Procedures, and User Defined Procedures (UDPs).
Snowflake supports structured and semi-structured data within a single SQL data warehouse. Semi-structured data such as JSON can be stored in a column with the data type "VARIANT" and queried with SQL.
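To get a feel for semi-structured data in a VARIANT column, the Python below mimics path access on a JSON document. It is only an analogy, not how Snowflake implements it; the sample document, the `get_path` helper, and the `orders` table and `v` column in the SQL comment are all hypothetical.

```python
import json

# A JSON document as it might be loaded into a VARIANT column (made-up data).
raw = '{"customer": {"name": "Asha", "tags": ["gold", "eu"]}, "total": 42.5}'
doc = json.loads(raw)


def get_path(value, path):
    """Walk a dotted path such as 'customer.name'; return None if absent."""
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Roughly analogous Snowflake SQL:
#   SELECT v:customer.name::string FROM orders;
name = get_path(doc, "customer.name")

# A path that does not exist yields None here, much as SQL would yield NULL.
missing = get_path(doc, "customer.email")
print(name, missing)
```

The useful idea is that no schema has to be declared up front: the nested fields are addressed by path at query time.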
I've started a detailed series of blogs on Snowflake. Check out the previous post in the series. If you found this article helpful, please follow me on Hashnode and LinkedIn. Thank you for reading, and I look forward to sharing more with you soon!
Written by
Vipul Tripathi
Creator by Heart and Developer by Google