Deciphering Data Architectures: A Closer Look at Different Paradigms

Navigating the intricate world of data architecture can feel overwhelming, but it doesn’t have to be. Let’s break down key concepts like relational data warehouses, data lakes, modern data warehouses, data fabric, data lakehouses, and most importantly, data mesh. Along the way, we’ll uncover their roles, challenges, and best use cases.

This blog post draws insights from James Serra’s book, Deciphering Data Architectures, and reflects his perspectives on the various data architectures discussed.


Relational Data Warehouse (RDW) 📊

What is it?

A relational data warehouse serves as a centralized hub for consolidating data from multiple sources. It’s designed for historical analysis and provides the “single version of truth.” Unlike operational databases, it’s not intended for transactional (OLTP) purposes.

Why use it?

  • Consolidates data for unified insights (schema-on-write).

  • Reduces the load on production systems.

  • Provides reliable historical trend analysis.

  • Ensures enhanced security and data quality.

๐Ÿ” Questions to consider:

  • Does your organization depend on historical reporting?

  • Are production systems overburdened with analytical queries?

RDWs have both a compute engine and storage. The compute engine is the processing power used to query the data. The storage is relational storage, which holds data that is structured via tables, rows, and columns. The RDW’s compute power can be used only on its relational storage; the two are tied together.
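
To make schema-on-write concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse's relational storage. The sales table, its columns, and the load_row helper are assumptions made up for illustration; a real RDW would use its own engine and loading tools.

```python
import sqlite3

# In-memory database standing in for the warehouse's relational storage.
conn = sqlite3.connect(":memory:")

# Schema-on-write: structure and constraints are declared before any data lands.
conn.execute(
    """
    CREATE TABLE sales (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT    NOT NULL,
        amount    REAL    NOT NULL CHECK (amount >= 0)
    )
    """
)


def load_row(row):
    """Try to insert one row; return True if the table accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (sale_id, sale_date, amount) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = load_row((1, "2024-01-15", 120.50))  # fits the declared schema
rejected = load_row((2, None, 80.00))           # missing date violates NOT NULL

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(accepted, rejected, total)
```

The row that breaks the schema never reaches storage, which is why queries against the table can trust the shape of what they read.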


Data Lake 🌊

What is it?

A data lake stores raw, unprocessed data in its native format and uses a schema-on-read approach. It’s an excellent option for exploration and experimentation.
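
Schema-on-read can be sketched in a few lines of Python. The raw records, field names, and the read_with_schema helper below are hypothetical; the point is only that the data is stored untouched and each reader decides which fields and types it wants.

```python
import json

# Raw events kept exactly as they arrived; fields vary from record to record.
raw_lines = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5}',
    '{"user": "c3", "note": "no amount recorded"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and cast them, using None when absent."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field_name, cast in schema.items():
            value = record.get(field_name)
            row[field_name] = cast(value) if value is not None else None
        yield row


# Two consumers read the same raw data through different schemas.
finance_view = list(read_with_schema(raw_lines, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(raw_lines, {"user": str, "channel": str}))

print(finance_view)
print(marketing_view)
```

Nothing is rejected on the way in, so the cost of messy data shows up at read time instead, here as None values each consumer has to handle.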

Why use it?

  • Cost-effective storage for vast data volumes.

  • Flexible access for data scientists and power users.

  • Frees up enterprise data warehouse resources.

  • Retains complete historical data in one place.

๐Ÿ” Things to think about:

  • How do you plan to manage semi-structured or unstructured data?

  • Do you prioritize flexibility and experimentation for data users?


Modern Data Warehouse (MDW) ⚙️

How does it work?

The modern data warehouse combines the strengths of RDWs and data lakes, offering:

  • Low-latency, high-performance analytics.

  • Self-service business intelligence (BI) capabilities.

  • Interactive ad-hoc querying for business users.

Benefits:

  • Real-time data processing.

  • Compatibility with diverse data sources.

  • Enhanced compliance and security measures.

๐Ÿ” Ask yourself

  • Are your current analytics tools meeting business demands?

  • How effectively does your infrastructure support real-time data?


Data Fabric 🌐

What is it?

Data fabric weaves together disparate data systems to create a unified, accessible layer. Think of it as a modern evolution of the traditional data warehouse, with added features like metadata cataloguing and data virtualization.

Key Features:

  • Streamlined data access policies.

  • Support for real-time data handling.

  • Integration through APIs and microservices.
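
As a rough illustration of the metadata catalog and data virtualization ideas, the sketch below puts one access function in front of two different sources. Everything here (the catalog dictionary, the query function, the sample customers and orders) is a made-up assumption, not how any particular data fabric product works.

```python
import csv
import io
import sqlite3

# Source 1: a relational system (in-memory SQLite as a stand-in).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a file-based system (CSV text as a stand-in for object storage).
orders_csv = "customer_id,total\n1,250\n1,100\n2,75\n"


def read_customers():
    rows = crm.execute("SELECT id, name FROM customers")
    return [{"id": r[0], "name": r[1]} for r in rows]


def read_orders():
    reader = csv.DictReader(io.StringIO(orders_csv))
    return [
        {"customer_id": int(r["customer_id"]), "total": float(r["total"])}
        for r in reader
    ]


# Hypothetical metadata catalog: logical dataset name -> source label and reader.
catalog = {
    "customers": {"source": "crm-db", "reader": read_customers},
    "orders": {"source": "orders-files", "reader": read_orders},
}


def query(dataset):
    """Single access point; data is read from its source on demand, not copied ahead of time."""
    return catalog[dataset]["reader"]()


# Combine both sources through the unified layer.
names = {c["id"]: c["name"] for c in query("customers")}
totals = {}
for order in query("orders"):
    name = names[order["customer_id"]]
    totals[name] = totals.get(name, 0.0) + order["total"]

print(totals)
```

Consumers only need the logical dataset name; where the data lives and how it is read stays behind the catalog, which is also the natural place to attach access policies.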

๐Ÿ” Questions to ponder:

  • How well do your systems integrate diverse data sources?

  • Is real-time data access critical for your organization?


Data Lakehouse 🏠

What is it?

The data lakehouse bridges the gap between data lakes and RDWs, combining the scalability of a data lake with the transactional capabilities of a data warehouse.

Key Features:

  • ACID transactions for data integrity.

  • Unified batch and streaming data processing.

  • Schema enforcement and evolution.
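
Schema enforcement and evolution are easier to picture with a toy example. The ToyTable class below is purely illustrative and is not the API of any lakehouse table format; it only shows a batch being rejected as a whole when one row breaks the schema, and a column being added deliberately afterwards.

```python
class ToyTable:
    """Toy append-only table illustrating schema enforcement and evolution."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> Python type
        self.rows = []

    def append(self, batch):
        """All-or-nothing append: validate every row before writing any of them."""
        for row in batch:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for col, typ in self.schema.items():
                if row[col] is not None and not isinstance(row[col], typ):
                    raise ValueError(f"{col!r} expects {typ.__name__}")
        self.rows.extend(batch)

    def add_column(self, name, typ):
        """Schema evolution: add a column and backfill existing rows with None."""
        self.schema[name] = typ
        for row in self.rows:
            row[name] = None


table = ToyTable({"id": int, "amount": float})
table.append([{"id": 1, "amount": 9.5}])

try:
    # The second row has the wrong type, so the whole batch is rejected.
    table.append([{"id": 2, "amount": 3.0}, {"id": 3, "amount": "oops"}])
    batch_rejected = False
except ValueError:
    batch_rejected = True

# Evolve the schema on purpose, then write rows that use the new column.
table.add_column("currency", str)
table.append([{"id": 2, "amount": 3.0, "currency": "EUR"}])

print(batch_rejected, table.rows)
```

Validating the full batch before writing is a very loose stand-in for atomicity; real lakehouse formats get their ACID guarantees from a transaction log rather than from in-memory checks like these.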

Who should use it?

Organizations dealing with:

  • Reliability issues between data lakes and warehouses.

  • Governance challenges for large-scale data.

๐Ÿ” Considerations:

  • How important are transactional guarantees for your workflows?

  • Are you facing persistent challenges with siloed data?


Data Mesh 🥅

What is it?

Data mesh decentralizes data ownership, giving individual teams responsibility for their data while treating it as a product. This approach fosters scalability, agility, and collaboration.

Key Principles:

  1. Domain Ownership: Teams closest to the data take responsibility.

  2. Data as a Product: Prioritize accessibility, quality, and usability.

  3. Self-Serve Infrastructure: Equip teams with tools to build and manage their pipelines.

  4. Federated Governance: Maintain consistency with centralized standards.
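
One way to picture how domain ownership, data as a product, and federated governance fit together is a small descriptor that each domain publishes, plus a shared set of checks applied to all of them. The DataProduct fields and the rules in governance_issues are assumptions invented for this sketch; real organizations would agree on their own.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    domain: str            # owning domain
    owner_email: str       # who answers for quality and availability
    schema: dict           # column name -> type name, the consumer-facing contract
    freshness_hours: int   # how stale the data is allowed to get
    tags: list = field(default_factory=list)


def governance_issues(product):
    """Hypothetical organization-wide rules, applied the same way to every domain."""
    issues = []
    if not product.owner_email:
        issues.append("missing owner")
    if not product.schema:
        issues.append("missing schema")
    if product.freshness_hours > 24:
        issues.append("freshness target above 24h")
    if "pii" in product.tags and "masked" not in product.tags:
        issues.append("PII must be masked")
    return issues


orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner_email="sales-data@example.com",
    schema={"order_id": "int", "total": "float"},
    freshness_hours=6,
)
customers = DataProduct(
    name="customer_profiles",
    domain="marketing",
    owner_email="marketing-data@example.com",
    schema={"customer_id": "int", "email": "str"},
    freshness_hours=12,
    tags=["pii"],
)

report = {p.name: governance_issues(p) for p in (orders, customers)}
print(report)
```

Each domain owns its descriptor and its pipelines, while the checks stay shared, which is roughly the split that federated governance aims for.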

Why is it hard to implement?

  1. Cultural and Organizational Barriers:

    • Shifting responsibilities requires a mindset overhaul.

    • Teams must adopt a product-oriented view of data.

    • Resistance is common from teams used to centralized ownership.

  2. Governance Complexity:

    • Federated governance is difficult to enforce across domains.

    • Maintaining interoperability and data quality is a significant challenge.

    • Without coordination, data silos or duplication may emerge.

  3. Technical Hurdles:

    • Self-serve infrastructure tools are still evolving.

    • Performance issues arise when aggregating data from domains.

    • Requires highly skilled engineers within each domain.

Why isn’t it more popular?

  • High Cost: Organizational change and technical overhauls demand significant investment.

  • Uncertain ROI: Benefits may take years to materialize, making it harder for companies to justify.

  • Standardization Gaps: Lack of established tools and practices can result in inconsistent implementations.

Debates and Common Questions:

  • Concept or Tool Agnostic? Data mesh is a conceptual framework that relies on principles rather than specific tools, sparking debates about standardization.

  • Performance Concerns: Real-time insights can be delayed when aggregating data from multiple domains.

📈 Pros:

  • Encourages collaboration and accountability.

  • Scales effectively by leveraging domain expertise.

  • Improves overall data quality and usability.

  • Enhances agility by decentralizing ownership.

🔇 Cons:

  • High implementation and organizational costs.

  • Potential for data silos and duplication.

  • Requires skilled engineers and cultural buy-in.

  • Performance and interoperability challenges.

๐Ÿ” Is it right for you?

  • Does your organization have the resources and commitment for such a transformation?

  • Are your domain teams equipped with the necessary skills and tools?

  • How will you ensure governance across all domains?


When to Use Each Architecture ✅

Choosing the right data architecture is highly context-dependent and influenced by various factors such as organizational size, data complexity, team expertise, and business goals. The following examples are not exhaustive but serve as a general guide to help you consider potential directions:

Modern Data Warehouse: Ideal for organizations with smaller datasets and traditional business intelligence (BI) needs. It’s best suited for scenarios requiring low latency and familiar relational database tools.

Data Fabric: Perfect for businesses needing to integrate diverse data sources. With its focus on real-time accessibility and governance, it’s a strong choice for enterprises managing complex systems.

Data Lakehouse: A great fit for organizations prioritizing scalability, cost-effective storage, and advanced analytics. It offers a hybrid solution that balances flexibility and governance.

Data Mesh: Best for large, domain-oriented companies struggling with scalability and data ownership issues. It’s ideal for organizations ready to invest in a cultural and technical transformation.



Comparison of Data Architectures

| Characteristic | Relational Data Warehouse | Data Lake | Modern Data Warehouse | Data Fabric | Data Lakehouse | Data Mesh |
| --- | --- | --- | --- | --- | --- | --- |
| Year introduced | 1984 | 2010 | 2011 | 2016 | 2020 | 2019 |
| Centralized/Decentralized | Centralized | Centralized | Centralized | Centralized | Centralized | Decentralized |
| Storage type | Relational | Object | Relational and object | Relational and object | Object | Domain-specific |
| Schema type | Schema-on-write | Schema-on-read | Schema-on-read and schema-on-write | Schema-on-read and schema-on-write | Schema-on-read | Domain-specific |
| Data security | High | Low to medium | Medium to high | High | Medium | Domain-specific |
| Data latency | Low | High | Low to high | Low to high | Medium to high | Domain-specific |
| Time to value | Medium | Low | Low | Low | Low | High |
| Total cost of the solution | High | Low | Medium | Medium to high | Low to medium | High |
| Supported use cases | Low | Low to medium | Medium | Medium to high | High | High |
| Difficulty of development | Low | Medium | Medium | Medium | Medium to high | High |
| Maturity of technology | High | Medium | Medium to high | Medium to high | Medium to high | Low |
| Company skill set needed | Low | Low to medium | Medium | Medium to high | Medium to high | High |

Most companies will use pieces of each architecture to build a solution adapted to their specific needs.
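
If you want to use the comparison as a quick screening aid, a couple of its rows can be transcribed as data and filtered. The sketch below copies the "Company skill set needed" and "Time to value" ratings from the comparison above; the numeric ranking of the labels and the shortlist helper are assumptions added for illustration, and a real decision would weigh far more than one row.

```python
# Ordinal ranking of the rating labels used in the comparison (an assumption).
LEVELS = {
    "Low": 1,
    "Low to medium": 2,
    "Medium": 3,
    "Medium to high": 4,
    "High": 5,
}

# Two rows of the comparison, transcribed as data.
ARCHITECTURES = {
    "Relational Data Warehouse": {"skill": "Low", "time_to_value": "Medium"},
    "Data Lake": {"skill": "Low to medium", "time_to_value": "Low"},
    "Modern Data Warehouse": {"skill": "Medium", "time_to_value": "Low"},
    "Data Fabric": {"skill": "Medium to high", "time_to_value": "Low"},
    "Data Lakehouse": {"skill": "Medium to high", "time_to_value": "Low"},
    "Data Mesh": {"skill": "High", "time_to_value": "High"},
}


def shortlist(max_skill):
    """Return architectures whose required skill set is at or below max_skill."""
    limit = LEVELS[max_skill]
    return [
        name
        for name, ratings in ARCHITECTURES.items()
        if LEVELS[ratings["skill"]] <= limit
    ]


candidates = shortlist("Medium")
print(candidates)
```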

Final Thoughts 🌟

Most organizations will find success in adopting a hybrid approach, blending aspects of these architectures to suit their unique needs. Each framework offers distinct benefits and challenges; the key is to evaluate your goals, resources, and scalability needs carefully. Which path will your organization choose?

💡 This blog only scratches the surface of what James Serra covers in his exceptional book, Deciphering Data Architectures. He provides an in-depth exploration of these topics, breaking them down across the following chapters:
  1. Big Data

  2. Types of Data Architectures

  3. The Architecture Design Session

  4. The Relational Data Warehouse

  5. Data Lake

  6. Data Storage Solutions and Processes

  7. Approaches to Design

  8. Approaches to Data Modeling

  9. Approaches to Data Ingestion

  10. The Modern Data Warehouse

  11. Data Fabric

  12. Data Lakehouse

  13. Data Mesh Foundation

  14. Should You Adopt Data Mesh? Myths, Concerns, and the Future

  15. People and Processes

  16. Technologies

Each chapter delves into key concepts, offering valuable insights and practical guidance. For anyone navigating the world of data architecture, this book is a must-read.

Thanks for reading! 😊

Written by

Nalaka Wanniarachchi

Nalaka Wanniarachchi is an accomplished data analytics and data engineering professional with over 18 years of experience. As a CIMA(ACMA/CGMA) UK qualified ex-banker with strong analytical skills, he transitioned into building robust data solutions. Nalaka specializes in Microsoft Fabric and Power BI, delivering advanced analytics and engineering solutions. He holds a Microsoft certification as a Fabric Analytic Engineer and Power BI Professional, combining technical expertise with a deep understanding of financial and business analytics.