CI/CD for Azure Data Factory: Selective Deployment & Full Deployment Using Azure DevOps YAML Pipelines (Dev → Test → Prod)

By Shubham Sahu
Tags: Azure Data Factory, CI/CD, Azure DevOps, Data Engineering, YAML

🚀 Introduction

Modern data engineering teams need a fast, reliable, and repeatable way to deploy Azure Data Factory (ADF) pipelines across multiple environments (dev, test, prod). Manual deployments are error-prone and slow. With the right CI/CD setup, we can deliver versioned, auditable, and environment-aware ADF pipelines using YAML pipelines in Azure DevOps.

In this article, we’ll walk through a CI/CD approach using:

  • ADF JSON files managed in Git

  • YAML pipelines for build and release

  • Environment-specific deployment filtering via CSV files

  • The SQLPlayer.DataFactoryTools DevOps extension


📊 Architecture Overview

The workflow is a custom CI/CD setup for Azure Data Factory built on Git integration and Azure DevOps YAML pipelines. Developers commit ADF JSON files to Git branches. Once a change is merged into the mainline, the build pipeline validates and packages the artifacts, and the release pipeline deploys them across the Dev, Test, and Prod environments using the SQLPlayer task. The adf_publish branch stores the published JSON artifacts, keeping it in sync with the live factory.

📂 Folder Structure

We can use a clean Git repo layout for maintainability:

adf-deployment-repo/
├── LinkedServices/
│   ├── AzureBlobStorageLS.json
│   ├── AzureSqlDatabaseLS.json
├── Datasets/
│   ├── OrdersDataset.json
│   ├── ProductsDataset.json
├── Pipelines/
│   ├── pipeline_TransformOrders.json
│   ├── pipeline_LoadToSQL.json
│   ├── pipeline_NotifyComplete.json
├── deployment/
│   ├── config-dev.csv
│   ├── config-test.csv
│   ├── config-prod.csv
├── build-dataFactory.yaml
├── release-dataFactory.yaml

📄 CSV Configuration Example

The config file defines exactly which ADF objects to deploy:

config-test.csv

Pipeline;pipeline_TransformOrders;True
Pipeline;pipeline_LoadToSQL;True
Dataset;OrdersDataset;True
LinkedService;AzureBlobStorageLS;True;StorageConnectionString=DefaultEndpointsProtocol=https;AccountName=testblob;AccountKey=***

🎓 Step 1: Build Pipeline (Validation + Packaging)

Create a pipeline named build-dataFactory.yaml:

trigger: none

pool:
  vmImage: ubuntu-latest

jobs:
- job: BuildADF
  steps:
    - task: SQLPlayer.DataFactoryTools.BuildADF.BuildADFTask@1
      displayName: 'Validate ADF JSON Files'
      inputs:
        DataFactoryCodePath: '$(Build.SourcesDirectory)'

    - task: CopyFiles@2
      displayName: 'Copy Files to Artifact'
      inputs:
        Contents: '**/*'
        TargetFolder: '$(Build.ArtifactStagingDirectory)'

    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: 'drop'

This pipeline validates ADF objects and publishes them as artifacts for release.
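
The build above uses trigger: none, so it runs only when started manually or from another pipeline. If you would rather have it run automatically on merges to the mainline, a minimal trigger sketch (assuming a develop mainline branch and the folder layout shown earlier) could be added at the top of build-dataFactory.yaml:

# Hypothetical trigger: run the build when ADF JSON or deployment config
# changes land on the develop branch.
trigger:
  branches:
    include:
      - develop
  paths:
    include:
      - LinkedServices/*
      - Datasets/*
      - Pipelines/*
      - deployment/*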


🚪 Step 2: Release Pipeline (Selective Deployment)

Create a pipeline named release-dataFactory.yaml that uses CSV files to control what gets deployed per environment.

parameters:
  - name: environment
    type: string
    default: dev

variables:
  - group: adf-${{ parameters.environment }}
  - name: configFile
    value: config-${{ parameters.environment }}.csv

stages:
- stage: DeployADF
  displayName: Deploy to ${{ parameters.environment }}
  jobs:
    - job: Deploy
      pool:
        vmImage: ubuntu-latest
      steps:
        - download: current
          artifact: drop

        - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
          displayName: 'Deploy ADF to ${{ parameters.environment }}'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            ResourceGroupName: '$(resourceGroupName)'
            DataFactoryName: '$(dataFactoryName)'
            DataFactoryCodePath: '$(Pipeline.Workspace)/drop/drop'
            Location: '$(region)'
            StageType: FilePath
            StageConfigFile: '$(Pipeline.Workspace)/drop/drop/deployment/$(configFile)'
            DeleteNotInSource: false
            CreateNewInstance: false
            IncrementalDeployment: false
            FilteringType: Inline
            FilterText: |
              -managedPrivateEndpoint*
              -IntegrationRuntim*
            TriggerStartMethod: KeepPreviousState
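
The pipeline resolves serviceConnectionName, resourceGroupName, dataFactoryName, and region from the adf-<environment> variable group (created under Pipelines → Library in Azure DevOps). If you prefer keeping the non-secret values in the repo, a hypothetical per-environment variables template could be used instead; the names and values below are illustrative assumptions:

# deployment/vars/adf-test.yaml (hypothetical): non-secret values only;
# keep secrets in a variable group or Key Vault.
variables:
  serviceConnectionName: 'sc-adf-test'       # Azure Resource Manager service connection
  resourceGroupName: 'rg-dataplatform-test'
  dataFactoryName: 'adf-dataplatform-test'
  region: 'westeurope'

The release pipeline would then reference it with - template: deployment/vars/adf-${{ parameters.environment }}.yaml in place of the - group: line.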

📦 Full Deployment (All Files)

To perform a full deployment (deploying all objects in the repo rather than a selected subset), omit the config CSV file and use StageType: Path instead of FilePath.

YAML Example for Full Deployment:

        - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
          displayName: 'Full Deployment to ${{ parameters.environment }}'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            ResourceGroupName: '$(resourceGroupName)'
            DataFactoryName: '$(dataFactoryName)'
            DataFactoryCodePath: '$(Pipeline.Workspace)/drop/drop'
            Location: '$(region)'
            StageType: Path
            DeleteNotInSource: false
            CreateNewInstance: false
            IncrementalDeployment: false
            FilteringType: Inline
            FilterText: |
              -managedPrivateEndpoint*
              -IntegrationRuntim*
            TriggerStartMethod: KeepPreviousState

✅ This method will deploy everything found in the artifact folder.
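
If you want to switch between selective and full deployment without maintaining two task definitions, template-expression conditional insertion can toggle the relevant inputs. This is a sketch under the assumption that the task accepts the same remaining inputs in both modes; the deploymentMode parameter is hypothetical:

parameters:
  - name: deploymentMode
    type: string
    default: selective
    values:
      - selective
      - full

# ...inside the Deploy job's steps:
        - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
          displayName: 'Deploy ADF (${{ parameters.deploymentMode }}) to ${{ parameters.environment }}'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            ResourceGroupName: '$(resourceGroupName)'
            DataFactoryName: '$(dataFactoryName)'
            DataFactoryCodePath: '$(Pipeline.Workspace)/drop/drop'
            Location: '$(region)'
            # Conditional insertion: selective runs use the CSV config, full runs do not
            ${{ if eq(parameters.deploymentMode, 'selective') }}:
              StageType: FilePath
              StageConfigFile: '$(Pipeline.Workspace)/drop/drop/deployment/$(configFile)'
            ${{ if eq(parameters.deploymentMode, 'full') }}:
              StageType: Path
            DeleteNotInSource: false
            CreateNewInstance: false
            IncrementalDeployment: false
            TriggerStartMethod: KeepPreviousState
            # FilteringType / FilterText from the earlier examples can be added here unchanged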


✅ Optional: Incremental Deployment Mode

You can enable incremental deployment to improve performance in large environments by avoiding redeployment of unchanged objects.

YAML Sample:

        - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
          displayName: 'Incremental Deploy ADF to ${{ parameters.environment }}'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            ResourceGroupName: '$(resourceGroupName)'
            DataFactoryName: '$(dataFactoryName)'
            DataFactoryCodePath: '$(Pipeline.Workspace)/drop/drop'
            Location: '$(region)'
            StageType: FilePath
            StageConfigFile: '$(Pipeline.Workspace)/drop/drop/deployment/$(configFile)'
            DeleteNotInSource: false
            CreateNewInstance: false
            IncrementalDeployment: true
            IncrementalDeploymentStorageUri: 'https://<yourstorage>.blob.core.windows.net/adf-deployment-state'
            FilteringType: Inline
            FilterText: |
              -managedPrivateEndpoint*
              -IntegrationRuntim*
            TriggerStartMethod: KeepPreviousState

How It Works:

  • Uses blob storage to store a deployment state JSON file

  • Compares current objects to previous deployment

  • Only changed objects are deployed

Requirements:

  • The storage container must exist and allow write access to your pipeline identity

  • Ensure no one edits ADF manually in Azure Portal (hash mismatch risk)

  • Remove the deployment state file to force a full redeploy (see the sketch below)
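
For the last point, the state file can be removed with a short Azure CLI step. This is a sketch assuming the container and file names used in this article and a service connection whose identity can write to that container:

# Hypothetical one-off step that forces the next run to redeploy everything
# by deleting the adftools deployment state blob.
        - task: AzureCLI@2
          displayName: 'Reset incremental deployment state'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            scriptType: bash
            scriptLocation: inlineScript
            inlineScript: |
              az storage blob delete \
                --account-name <yourstorage> \
                --container-name adf-deployment-state \
                --name adf-prod.adftools_deployment_state.json \
                --auth-mode login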


🔧 Assign Role to the Specific Container Only (More Secure)

Limiting access only to the container used for incremental deployment (e.g., adf-deployment-state) increases security and enforces the principle of least privilege.

How:

  1. Go to the Storage Account in the Azure Portal

  2. Select Containers → click on adf-deployment-state

  3. Click Access Control (IAM) for this container

  4. Click + Add role assignment

  5. Configure the following:

    • Role: Storage Blob Data Contributor

    • Assign to: Your DevOps service principal (used in the pipeline)

    • Scope: This container only

With this scope, the pipeline can access only the adf-deployment-state container, so minimal access is granted and the other containers in the same storage account remain isolated.
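
The same assignment can also be scripted. The sketch below wraps the Azure CLI call in a pipeline step for completeness, but it is typically run once by an administrator whose identity may create role assignments; the service connection name, subscription, resource group, and service principal IDs are placeholders:

# Hypothetical one-time step (run under an admin-capable service connection)
# that scopes Storage Blob Data Contributor to the single container.
        - task: AzureCLI@2
          displayName: 'Grant container-scoped blob access to the deployment identity'
          inputs:
            azureSubscription: '<admin-service-connection>'
            scriptType: bash
            scriptLocation: inlineScript
            inlineScript: |
              az role assignment create \
                --assignee <service-principal-object-id> \
                --role "Storage Blob Data Contributor" \
                --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<yourstorage>/blobServices/default/containers/adf-deployment-state"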

File Created in Blob:

adf-prod.adftools_deployment_state.json

🌌 Promotion Flow: Dev → Test → Prod

  1. Developer commits ADF JSON to Git (feature branch)

  2. PR is created and merged to develop (triggers build & deploy to dev)

  3. Code is promoted to release/test via PR, triggering deploy to test

  4. After approval, code is merged to main, triggering deployment to prod (see the gated-stage sketch after this list)
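
To enforce the approval in step 4 from the YAML side, the prod stage can target an Azure DevOps environment that has approvals and checks configured. A minimal sketch, assuming an environment named adf-prod and the variable-group naming used above:

stages:
- stage: DeployProd
  displayName: Deploy to prod
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
  variables:
    - group: adf-prod
  jobs:
    - deployment: Deploy
      pool:
        vmImage: ubuntu-latest
      environment: adf-prod   # approvals/checks on this environment gate the job
      strategy:
        runOnce:
          deploy:
            steps:
              - download: current
                artifact: drop
              # ...then the same PublishADFTask step as in release-dataFactory.yaml,
              # with StageConfigFile pointing at deployment/config-prod.csv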


🔧 Pro Tips

  • Use TriggerStartMethod: KeepPreviousState to avoid auto-starting triggers

  • Store secrets in Azure DevOps variable groups or Azure Key Vault (see the Key Vault sketch after this list)

  • Filter out integration runtimes, managed VNets, and private endpoints

  • Only list stable objects in production config files

  • Use IncrementalDeployment: false for full control; use true only if the state file is managed properly
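
For the Key Vault tip above, secrets can be pulled into the job at runtime with the AzureKeyVault task. A minimal sketch, assuming a vault named kv-adf-<environment> whose secrets the service connection identity is allowed to read:

# Hypothetical step: fetch Key Vault secrets as pipeline variables before deploying.
        - task: AzureKeyVault@2
          displayName: 'Fetch ADF secrets from Key Vault'
          inputs:
            azureSubscription: '$(serviceConnectionName)'
            KeyVaultName: 'kv-adf-${{ parameters.environment }}'   # assumed vault naming
            SecretsFilter: 'StorageConnectionString'               # comma-separated names, or *
            RunAsPreJob: false

Each fetched secret becomes a secret pipeline variable of the same name (for example $(StorageConnectionString)), which later task inputs or config substitution can reference.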


🌟 Benefits of This Method

  • ✅ Clean, traceable deployments via Git

  • ✅ Environment-specific filtering

  • ✅ No ARM templates or complicated parameters

  • ✅ Compatible with DataOps principles (audit, rollback, security)


🧩 Best Practices for ADF Deployment in a DataOps Project

  • ✅ Use a clean Git structure (LinkedServices/, Pipelines/, Datasets/, deployment/)

  • ✅ Maintain separate config-<env>.csv files for environment-level filtering

  • ✅ Store secrets in Azure DevOps variable groups or Azure Key Vault

  • ✅ Always validate ADF JSONs in build before attempting release

  • ✅ Use FilteringType in YAML to exclude integration runtimes, managed VNets, and private endpoints

  • ✅ Avoid IncrementalDeployment: true unless blob state tracking is configured correctly

  • ✅ Provide IncrementalDeploymentStorageUri when enabling incremental mode

  • ✅ Implement manual approval gates before prod deployment

  • ✅ Track all changes via Git PRs and tag releases

  • ✅ Promote from Dev → Test → Prod using separate configs

📖 Summary

This method of deploying ADF using Azure DevOps YAML pipelines + config files gives you:

  • Full control of what goes to each environment

  • Secure, repeatable, automated CI/CD

  • Fast onboarding for data engineers

Author: Shubham Sahu
Azure DevOps Lead | DevOps Enthusiast | DataOps Advocate
