Deciphering Data Architectures: A Closer Look at Different Paradigms


Navigating the intricate world of data architecture can feel overwhelming, but it doesn't have to be. Let's break down key concepts like relational data warehouses, data lakes, modern data warehouses, data fabric, data lakehouses, and most importantly, data mesh. Along the way, we'll uncover their roles, challenges, and best use cases.
This blog post draws insights from James Serra's book, Deciphering Data Architectures, and reflects his perspectives on the various data architectures discussed.
Relational Data Warehouse (RDW)
What is it?
A relational data warehouse serves as a centralized hub for consolidating data from multiple sources. It's designed for historical analysis and provides the "single version of truth." Unlike operational databases, it's not intended for transactional (OLTP) purposes.
Why use it?
Consolidates data for unified insights (schema-on-write).
Reduces the load on production systems.
Provides reliable historical trend analysis.
Ensures enhanced security and data quality.
Questions to consider:
Does your organization depend on historical reporting?
Are production systems overburdened with analytical queries?
RDWs have both a compute engine and storage. The compute engine is the processing power used to query the data. The storage is relational storage, which holds data that is structured via tables, rows, and columns. The RDW's compute power can be used only on its relational storage; the two are tied together.
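To make schema-on-write concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a warehouse. The `sales` table, its columns, and the `load_row` helper are illustrative assumptions, not something taken from the book. The point is only that the schema is checked when data is written, so a non-conforming row never lands in storage.

```python
import sqlite3

# Hypothetical warehouse table; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT    NOT NULL,
        amount    REAL    NOT NULL CHECK (amount >= 0)
    )
    """
)


def load_row(row):
    """Try to insert one row; return True if the schema accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (sale_id, sale_date, amount) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = load_row((1, "2024-01-15", 120.0))  # conforms to the schema
rejected = load_row((2, None, 75.0))           # missing date violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Only the conforming row is stored. The trade-off is that every source has to be shaped to fit the table before it can be loaded.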
Data Lake
What is it?
A data lake stores raw, unprocessed data in its native format and uses a schema-on-read approach. It's an excellent option for exploration and experimentation.
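As a rough illustration of schema-on-read, the sketch below keeps raw JSON records exactly as they arrived and applies a schema only when they are read. The sample records, the field names, and the `read_with_schema` function are assumptions made for this example.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
raw_lines = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5}',
    '{"user": "c3", "note": "no amount recorded"}',
]


def read_with_schema(lines):
    """Apply a schema at read time: pick fields, cast types, fill defaults."""
    for line in lines:
        record = json.loads(line)
        yield {
            "user": str(record["user"]),
            "amount": float(record.get("amount", 0.0)),
            "channel": record.get("channel", "unknown"),
        }


rows = list(read_with_schema(raw_lines))
total = sum(r["amount"] for r in rows)
```

Nothing was rejected at write time, so inconsistent records coexist in storage. Each reader decides how to interpret them, which is what makes the lake flexible for exploration and also harder to govern.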
Why use it?
Cost-effective storage for vast data volumes.
Flexible access for data scientists and power users.
Frees up enterprise data warehouse resources.
Retains complete historical data in one place.
Things to think about:
How do you plan to manage semi-structured or unstructured data?
Do you prioritize flexibility and experimentation for data users?
Modern Data Warehouse (MDW)
How does it work?
The modern data warehouse combines the strengths of RDWs and data lakes, offering:
Low-latency, high-performance analytics.
Self-service business intelligence (BI) capabilities.
Interactive ad-hoc querying for business users.
Benefits:
Real-time data processing.
Compatibility with diverse data sources.
Enhanced compliance and security measures.
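One way to picture the combination of a lake and an RDW is a two-step flow: land raw data in the lake untouched, then load a curated subset into relational storage for BI queries. The sketch below assumes that pattern. A temporary directory stands in for the lake, in-memory SQLite stands in for the warehouse, and the function and table names are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def land_raw(lake_dir, events):
    """Write events unchanged to the lake as one JSON-lines file."""
    path = Path(lake_dir) / "orders.jsonl"
    path.write_text("\n".join(json.dumps(e) for e in events), encoding="utf-8")
    return path


def load_curated(path, conn):
    """Read the raw file, keep valid rows, and load them into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if "region" in event and "amount" in event:
            rows.append((event["region"], float(event["amount"])))
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return len(rows)


events = [
    {"region": "EU", "amount": 10},
    {"region": "EU", "amount": 5.5},
    {"region": "US", "amount": 7},
    {"debug": "malformed event"},  # stays in the lake, never reaches the warehouse
]

with tempfile.TemporaryDirectory() as lake_dir:
    raw_path = land_raw(lake_dir, events)
    warehouse = sqlite3.connect(":memory:")
    loaded = load_curated(raw_path, warehouse)

totals = dict(warehouse.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
```

The lake keeps everything, including the malformed event, while business users query only the cleaned relational copy.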
Ask yourself:
Are your current analytics tools meeting business demands?
How effectively does your infrastructure support real-time data?
Data Fabric
What is it?
Data fabric weaves together disparate data systems to create a unified, accessible layer. Think of it as a modern evolution of the traditional data warehouse, with added features like metadata cataloguing and data virtualization.
Key Features:
Streamlined data access policies.
Support for real-time data handling.
Integration through APIs and microservices.
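The idea of a unified access layer with a metadata catalog can be sketched in a few lines. This is a toy, not how any particular data fabric product works: the `DataFabric` class, its methods, and both sample sources are assumptions for illustration. Consumers ask the layer for a dataset by name and never touch the underlying system directly.

```python
import sqlite3


class DataFabric:
    """Toy access layer: a metadata catalog plus one query entry point."""

    def __init__(self):
        self._catalog = {}  # dataset name -> (metadata, reader function)

    def register(self, name, reader, **metadata):
        self._catalog[name] = (metadata, reader)

    def describe(self, name):
        return self._catalog[name][0]

    def read(self, name):
        """Fetch rows from wherever the dataset lives, as a list of dicts."""
        return self._catalog[name][1]()


# Source 1: a relational system (in-memory SQLite stands in for it).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])


def read_customers():
    cursor = crm.execute("SELECT id, name FROM customers ORDER BY id")
    return [{"id": i, "name": n} for i, n in cursor]


# Source 2: semi-structured records, such as files in a lake.
clickstream = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
]

fabric = DataFabric()
fabric.register("customers", read_customers, owner="sales", kind="relational")
fabric.register("clicks", lambda: list(clickstream), owner="web", kind="object")

# A consumer joins both datasets without knowing where either one is stored.
names = {c["id"]: c["name"] for c in fabric.read("customers")}
pages_by_customer = [(names[c["customer_id"]], c["page"]) for c in fabric.read("clicks")]
```

Because the data is read in place at query time, nothing is copied into a central store. That is the virtualization aspect. The catalog metadata is where access policies would attach in a fuller design.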
Questions to ponder:
How well do your systems integrate diverse data sources?
Is real-time data access critical for your organization?
Data Lakehouse
What is it?
The data lakehouse bridges the gap between data lakes and RDWs, combining the scalability of a data lake with the transactional capabilities of a data warehouse.
Key Features:
ACID transactions for data integrity.
Unified batch and streaming data processing.
Schema enforcement and evolution.
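Two of the features above, all-or-nothing commits and schema enforcement with evolution, can be shown with a small in-memory toy. This is not how a real lakehouse table format is implemented, and it covers only atomicity, leaving out isolation, durability, and concurrent writers. The `ToyLakehouseTable` class and `SchemaError` are hypothetical names.

```python
class SchemaError(ValueError):
    pass


class ToyLakehouseTable:
    """In-memory stand-in for a table backed by files plus a commit log."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> Python type
        self.log = []               # each entry is one committed batch

    def _validate(self, row):
        if set(row) != set(self.schema):
            raise SchemaError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
        for column, expected in self.schema.items():
            if not isinstance(row[column], expected):
                raise SchemaError(f"{column!r} must be {expected.__name__}")

    def append(self, batch):
        """Commit the whole batch or nothing (atomicity)."""
        batch = [dict(row) for row in batch]
        for row in batch:
            self._validate(row)  # any failure aborts before the log changes
        self.log.append(batch)
        return len(self.log)     # new table version

    def add_column(self, name, col_type):
        """Schema evolution: add a column; older rows read back as None."""
        self.schema[name] = col_type

    def rows(self):
        return [
            {col: row.get(col) for col in self.schema}
            for batch in self.log
            for row in batch
        ]


table = ToyLakehouseTable({"id": int, "amount": float})
table.append([{"id": 1, "amount": 9.5}])

try:
    # The second row has the wrong type, so the first row must not be committed either.
    table.append([{"id": 2, "amount": 3.0}, {"id": 3, "amount": "oops"}])
except SchemaError:
    pass

table.add_column("currency", str)
table.append([{"id": 4, "amount": 1.0, "currency": "EUR"}])
```

After these calls the table holds rows 1 and 4 only. The failed batch left no partial data behind, which is the reliability gap a plain data lake leaves open.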
Who should use it?
Organizations dealing with:
Reliability issues between data lakes and warehouses.
Governance challenges for large-scale data.
Considerations:
How important are transactional guarantees for your workflows?
Are you facing persistent challenges with siloed data?
Data Mesh
What is it?
Data mesh decentralizes data ownership, giving individual teams responsibility for their data while treating it as a product. This approach fosters scalability, agility, and collaboration.
Key Principles:
Domain Ownership: Teams closest to the data take responsibility.
Data as a Product: Prioritize accessibility, quality, and usability.
Self-Serve Infrastructure: Equip teams with tools to build and manage their pipelines.
Federated Governance: Maintain consistency with centralized standards.
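The principles above can be made a little more tangible with a sketch in which each domain publishes a data product descriptor and one shared function checks it against organization-wide rules. Everything here is an assumption for illustration: the `DataProduct` fields, the required metadata keys, and the allowed formats are not prescribed by data mesh itself, which is a set of principles rather than a specification.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide rules, agreed centrally and applied to every domain.
REQUIRED_METADATA = {"owner", "description", "freshness_hours"}
ALLOWED_FORMATS = {"parquet", "csv", "json"}


@dataclass
class DataProduct:
    domain: str
    name: str
    output_format: str
    schema: dict
    metadata: dict = field(default_factory=dict)


def governance_issues(product):
    """Return a list of rule violations; an empty list means the product complies."""
    issues = []
    missing = REQUIRED_METADATA - set(product.metadata)
    if missing:
        issues.append(f"missing metadata: {sorted(missing)}")
    if product.output_format not in ALLOWED_FORMATS:
        issues.append(f"unsupported format: {product.output_format}")
    if not product.schema:
        issues.append("schema must be published")
    return issues


# Each domain team owns and describes its own product.
orders = DataProduct(
    domain="sales",
    name="orders_daily",
    output_format="parquet",
    schema={"order_id": "int", "amount": "float"},
    metadata={
        "owner": "sales-data@example.com",
        "description": "Daily orders",
        "freshness_hours": 24,
    },
)
shipments = DataProduct(
    domain="logistics",
    name="shipments",
    output_format="xlsx",
    schema={},
    metadata={"owner": "logistics-team"},
)

report = {p.name: governance_issues(p) for p in (orders, shipments)}
```

Here `orders_daily` passes and `shipments` is flagged on three counts. The domains keep ownership of their data, while the shared checks are what keep the products interoperable.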
Why is it hard to implement?
Cultural and Organizational Barriers:
Shifting responsibilities requires a mindset overhaul.
Teams must adopt a product-oriented view of data.
Resistance is common from teams used to centralized ownership.
Governance Complexity:
Federated governance is difficult to enforce across domains.
Maintaining interoperability and data quality is a significant challenge.
Without coordination, data silos or duplication may emerge.
Technical Hurdles:
Self-serve infrastructure tools are still evolving.
Performance issues arise when aggregating data from domains.
Requires highly skilled engineers within each domain.
Why isn't it more popular?
High Cost: Organizational change and technical overhauls demand significant investment.
Uncertain ROI: Benefits may take years to materialize, making it harder for companies to justify.
Standardization Gaps: Lack of established tools and practices can result in inconsistent implementations.
Debates and Common Questions:
Concept or Tool Agnostic? Data mesh is a conceptual framework that relies on principles rather than specific tools, sparking debates about standardization.
Performance Concerns: Real-time insights can be delayed when aggregating data from multiple domains.
Pros:
Encourages collaboration and accountability.
Scales effectively by leveraging domain expertise.
Improves overall data quality and usability.
Enhances agility by decentralizing ownership.
Cons:
High implementation and organizational costs.
Potential for data silos and duplication.
Requires skilled engineers and cultural buy-in.
Performance and interoperability challenges.
Is it right for you?
Does your organization have the resources and commitment for such a transformation?
Are your domain teams equipped with the necessary skills and tools?
How will you ensure governance across all domains?
When to Use Each Architecture
Choosing the right data architecture is highly context-dependent and influenced by various factors such as organizational size, data complexity, team expertise, and business goals. The following examples are not exhaustive but serve as a general guide to help you consider potential directions:
Modern Data Warehouse: Ideal for organizations with smaller datasets and traditional business intelligence (BI) needs. It's best suited for scenarios requiring low latency and familiar relational database tools.
Data Fabric: Perfect for businesses needing to integrate diverse data sources. With its focus on real-time accessibility and governance, it's a strong choice for enterprises managing complex systems.
Data Lakehouse: A great fit for organizations prioritizing scalability, cost-effective storage, and advanced analytics. It offers a hybrid solution that balances flexibility and governance.
Data Mesh: Best for large, domain-oriented companies struggling with scalability and data ownership issues. It's ideal for organizations ready to invest in a cultural and technical transformation.
Comparison of Data Architectures
Characteristic | Relational Data Warehouse | Data Lake | Modern Data Warehouse | Data Fabric | Data Lakehouse | Data Mesh |
Year introduced | 1984 | 2010 | 2011 | 2016 | 2020 | 2019 |
Centralized/Decentralized | Centralized | Centralized | Centralized | Centralized | Centralized | Decentralized |
Storage type | Relational | Object | Relational and object | Relational and object | Object | Domain-specific |
Schema type | Schema-on-write | Schema-on-read | Schema-on-read and schema-on-write | Schema-on-read and schema-on-write | Schema-on-read | Domain-specific |
Data security | High | Low to medium | Medium to high | High | Medium | Domain-specific |
Data latency | Low | High | Low to high | Low to high | Medium to high | Domain-specific |
Time to Value | Medium | Low | Low | Low | Low | High |
Total cost of the solution | High | Low | Medium | Medium to high | Low to medium | High |
Supported use cases | Low | Low to medium | Medium | Medium to high | High | High |
Difficulty of development | Low | Medium | Medium | Medium | Medium to high | High |
Maturity of technology | High | Medium | Medium to high | Medium to high | Medium to high | Low |
Company skill set needed | Low | Low to medium | Medium | Medium to high | Medium to high | High |
Most companies will use pieces of each architecture to build a solution adapted to their specific needs.
Final Thoughts
Most organizations will find success in adopting a hybrid approach, blending aspects of these architectures to suit their unique needs. Each framework offers distinct benefits and challenges. The key is to evaluate your goals, resources, and scalability needs carefully. Which path will your organization choose?
The book covers the following chapters:
Big Data
Types of Data Architectures
The Architecture Design Session
The Relational Data Warehouse
Data Lake
Data Storage Solutions and Processes
Approaches to Design
Approaches to Data Modeling
Approaches to Data Ingestion
The Modern Data Warehouse
Data Fabric
Data Lakehouse
Data Mesh Foundation
Should You Adopt Data Mesh? Myths, Concerns, and the Future
People and Processes
Technologies
Each chapter delves into key concepts, offering valuable insights and practical guidance. For anyone navigating the world of data architecture, this book is a must-read.
Thanks for reading!
Written by

Nalaka Wanniarachchi
Nalaka Wanniarachchi is an accomplished data analytics and data engineering professional with over 18 years of experience. As a CIMA (ACMA/CGMA) UK-qualified ex-banker with strong analytical skills, he transitioned into building robust data solutions. Nalaka specializes in Microsoft Fabric and Power BI, delivering advanced analytics and engineering solutions. He holds a Microsoft certification as a Fabric Analytics Engineer and Power BI Professional, combining technical expertise with a deep understanding of financial and business analytics.