Simplifying Data Integration: Data Transformations with ADF to Merge Sources and Export to Parquet

Arpit Tyagi

Step 1: Inspect the CSV file in the data lake and the SQL table in Azure SQL DB.
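For reference, here is a minimal PySpark sketch of what inspecting these two sources looks like outside ADF. The container, storage account, table name, and credentials are placeholders, not the actual values used in this walkthrough, and the JDBC read assumes the SQL Server driver is available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inspect-sources").getOrCreate()

# CSV file in ADLS (placeholder container, storage account, and path).
csv_df = spark.read.option("header", True).csv(
    "abfss://raw@<storage-account>.dfs.core.windows.net/customers/customers.csv"
)
csv_df.printSchema()
csv_df.show(5)

# Table in Azure SQL DB via JDBC (placeholder server, database, table, and credentials).
sql_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
    .option("dbtable", "dbo.Customers")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
sql_df.printSchema()
sql_df.show(5)
```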

Step 2: Get an overview of the data flow for the task; we will then dig deeper into each step of this snapshot. Choose both sources, i.e. the SQL DB table and the CSV file in ADLS.

Step 3: Add a Join transformation next and select “Inner join” for the rest of the work.

Step 4: Join on customer id, as that is the common field between the two sources, and choose an inner join.
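As a rough PySpark equivalent of this Join transformation (the column name customer_id is an assumption; use whatever the common field is called in your schema):

```python
# Inner join the two sources on the shared customer id column.
joined_df = csv_df.join(sql_df, on="customer_id", how="inner")
joined_df.show(5)
```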

Step 5: Apply a Filter transformation to the data. I have filtered for customer ids between 29500 and 30000.
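Continuing the same sketch, the filter logic would look roughly like this in PySpark (inclusive bounds assumed):

```python
from pyspark.sql import functions as F

# Keep only rows whose customer id falls between 29500 and 30000.
filtered_df = joined_df.filter(F.col("customer_id").between(29500, 30000))
print(filtered_df.count())
```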

Step 6: The sink location will be a Parquet file in the data lake, so the sink dataset has been chosen accordingly.
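The equivalent write in PySpark might look like the snippet below; the curated container and folder name are placeholders for whatever Parquet location your sink dataset points to.

```python
# Write the filtered result to the Parquet folder in ADLS (placeholder path).
(
    filtered_df.write
    .mode("overwrite")
    .parquet("abfss://curated@<storage-account>.dfs.core.windows.net/customers_filtered/")
)
```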

Step 7: Integrate the data flow into a pipeline, directing the output to the Parquet folder in ADLS. The data must be saved in Parquet format in the data lake.

Step 8: Confirm the pipeline execution succeeds, ensuring smooth data transfer and transformation.
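If you want to trigger and check the run programmatically instead of using the monitoring UI, a hedged sketch with the azure-mgmt-datafactory SDK could look like this; the subscription, resource group, factory, and pipeline names are all placeholders.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Trigger the pipeline that wraps the data flow (names are placeholders).
run = adf_client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory>",
    pipeline_name="pl_merge_to_parquet",
)

# Poll until the run finishes, then print the final status.
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        "<resource-group>", "<data-factory>", run.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print(pipeline_run.status)
```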

Step 9: Verify the Parquet file in the data lake in Azure.
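You can also verify the output programmatically; continuing the PySpark sketch above with the same placeholder path:

```python
# Read the Parquet output back from the data lake to confirm the result.
result_df = spark.read.parquet(
    "abfss://curated@<storage-account>.dfs.core.windows.net/customers_filtered/"
)
result_df.show(5)
print(result_df.count())
```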


Written by

Arpit Tyagi

Experienced Data Engineer passionate about building and optimizing data infrastructure to fuel powerful insights and decision-making. With a deep understanding of data pipelines, ETL processes, and cloud platforms, I specialize in transforming raw data into clean, structured datasets that empower analytics and machine learning applications. My expertise includes designing scalable architectures, managing large datasets, and ensuring data quality across the entire lifecycle. I thrive on solving complex data challenges using modern tools and technologies like Azure, Tableau, Alteryx, and Spark. Through this blog, I aim to share best practices, tutorials, and industry insights to help fellow data engineers and enthusiasts master the art of building data-driven solutions.