Access Blob & ADLS2 data in Azure Synapse Dedicated pool

In my previous article, we explored various access options for ADLS2 and Blob storage in Azure Synapse serverless pool, focusing on how to access data using Managed Identity and Service Principal.

In this article, we will examine how to use Managed Identity and Service Principal to access data in Blob and ADLS2 storage through different endpoints and protocols in Azure Synapse dedicated pool.

SetUp

To get started we will first create a dedicated pool called SQLDB in the Synapse workspace.

Once created you can see the pool in SSMS.

We will use the same sample files that we had uploaded to the Blob and the ADLS2 storages in my previous article

Files on a blob storage are under a directory Customers

Synapse SQL Serverless pool

and so are the files on ADSL2

Synapse SQL Serverless pool

We will use the same Managed Identity and Service Principal as the one in my previous article.

Service Principal

Synapse SQL Serverless pool

Managed Identity

Synapse SQL Serverless pool

Create a master key in the pool

CREATE MASTER KEY ENCRYPTION BY PASSWORD='Any Strong Password'
GO

Please note that unlike serverless pool where you create the key in master database, you have to create the master key in the dedicated sql pool or in the session.

In dedicated pool we CANNOT use OPENROWSET syntax the way we use in serverless pool. You have to use External tables with External file format.

External File Format:

CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',', 
        FIRST_ROW  = 2,
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = False
        )
)

Grant the synapse workspace Storage Blob Data Contributor role to the blob storage.

Access Blob with Managed Identity and https protocol

First, create a database scoped credential

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY'; -- MANAGED IDENTITY IS THE KEYWORD

Then create an External data source. You can read about external data source here.

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
    -- location should be pointing to the container on the blob storage
      location = 'https://storageaccount.blob.core.windows.net/container',
      CREDENTIAL = MyCredential
);

Execute the query to the source Customers

CREATE EXTERNAL TABLE dbo.Customers
( [Customerid] [int] ,
    [Name] [varchar](40) ,
    [Email] [varchar](40) ,
    [city] [varchar](40)
    )
    WITH (
            LOCATION = 'Customers',
            DATA_SOURCE = SampleSource,
            FILE_FORMAT = [CsvFormatWithHeader]
            ) 

    SELECT * FROM  dbo.Customers

The query errors out : Msg 105097, Level 16, State 1, Line 1 Managed Service Identity authentication is only supported by abfss scheme.

This is because Manage Identity cannot access blob or ADLS2 using https on Synapse dedicated pool.

Well then you might ask , how to then access blob storage using Managed Identity through https ?

There is one way, albeit a very cheeky one..by using the COPY command. Basically what you do is COPY the data from the blob storage to table in the dedicated pool.

Note it has to be a physical table not and external one.

Create a Customers table

CREATE TABLE dbo.Customers
( [Customerid] [int] ,
    [Name] [varchar](40) ,
    [Email] [varchar](40) ,
    [city] [varchar](40)
)

Then use the COPY command.

COPY INTO  Customers  FROM 'https://storageaccount.blob.core.windows.net/container/Customers' 
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'Managed Identity')
)

Access Blob with Managed Identity and abfss protocol

Ok now lets try Managed Identity with abfss endpoint incase COPY is not an option for you.

Create database scoped credential

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY'; -- MANAGED IDENTITY IS THE KEYWORD

We modify the External data source to use abfss.

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
    -- location should be pointing to the container on the blob storage
       location = 'abfss://container@storageaccount.blob.core.windows.net',
    CREDENTIAL = MyCredential
);

We get an access error and not the protocol error.

Error path does not exist is quite strange.

I suspected that the issue must be with SSMS so I double checked the query with Synapse Studio and I get the same error.

But it was surprising that when I query for an individual file the query succeeds.

Then I thought that the issue must be because of the blob endpoint that I used in the External Data Source.

I dropped the existing Data Source and

DROP EXTERNAL DATA SOURCE SampleSource

I changed the endpoint from blob to dfs.

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
       location = 'abfss://container@storageaccount.dfs.core.windows.net',
    CREDENTIAL = MyCredential
);

and it worked because dfs endpoint generally respect the hierarchical namespace.

We have now established that Managed Identity does not work with https protocol.

If you dont want to through the route of the COPY option, you have to use dfs endpoint with abfss protocol to query the entire data and not just individual files.

Access ADLS2 with Managed Identity and https protocol

Lets now check ways to access ADLS2 using Managed Identity

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY';

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
       location = 'https://conatiner@adlsstorageaccount.dfs.core.windows.net/',
    CREDENTIAL = MyCredential
);

the query errors out

It means that irrespective of the storage types(Blob or ADLS2), Managed Identity is NOT supported with https in Synapse dedicated pool.

You can instead use the COPY option instead if you really would want to use https

COPY INTO  Customers  FROM 'https://adlsstorageaccount.dfs.core.windows.net/container/Customers' 
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'Managed Identity')
)

Access ADLS2 with Managed Identity and abfss protocol

Lets now try accessing ADLS2 through abfss endpoint with Managed Identity.

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY';

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (

    location = 'abfss://container@adlsstorgaeaccount.dfs.core.windows.net/',
    CREDENTIAL = MyCredential
);

As expected, this query succeeds.

Access Blob with Service Principal and https protocol

Drop the existing Data Source and Credentials

DROP EXTERNAL DATA SOURCE SampleSource
DROP DATABASE SCOPED CREDENTIAL MyCredential

As mentioned earlier, we will use the existing service principal called Fabric OAUTH2.

Synapse SQL Serverless pool

Ensure that you Grant the service principal Storage Blob Data Contributor role for the storage account.

Create database scoped credentials that uses Service Principal.

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' , 
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
       location = 'https://storageaccount.blob.core.windows.net/container',
    CREDENTIAL = MyCredential
);

The query errors out. We used https protocol with blob endpoints to access Blob storage. But is it possible to access blob storage using https in Service Principal ?

As was the case with Managed Identity where we used COPY option with https, we can use the same approach for Service Principal as well.

First create Customers table.

CREATE TABLE dbo.Customers
( [Customerid] [int] ,
    [Name] [varchar](40) ,
    [Email] [varchar](40) ,
    [city] [varchar](40)
)

and use the COPY option with the service principal URI

COPY INTO  Customers  FROM 'https://storageaccount.blob.core.windows.net/container/Customers' 
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token', 
SECRET='secret')
)

So, Service principal can access blob storage using https but it can do only using COPY option and not through Data Source. The only option is to use COPY if you really want to do it through the use of https.

Access Blob with Service Principal and abfss protocol

Lets try with abfss protocol with blob endpoints.

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' , 
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
    location = 'abfss://container@storageaccount.blob.core.windows.net/',
    CREDENTIAL = MyCredential
);

Once done ,the query did not error out and neither did it return any data which is quite interesting.

But querying individual files returns data

So lets change the endpoint from blob to dfs

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
    location = 'abfss://container@storageaccount.dfs.core.windows.net/',
    CREDENTIAL = MyCredential
);

Re executing the query returns all data across the directory.

Recall that earlier we had used blob endpoint through Managed Identity and the query errored out but when using blob endpoint with Service Principal the query did not error out and neither did it return any data. This would mean that the blob endpoint does not work with Service Principal as well even if you use abfss protocol.

You will have to use dfs endpoint alongside the abfss protocol.

Access ADLS2 with Service Principal and https protocol

Next lets try accessing ADLS2 storage with https protocol with blob endpoints.

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' , 
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (   
    location = 'https://container@adlsstorageaccount.blob.core.windows.net/',
    CREDENTIAL = MyCredential
);

As expected it would error out as we were using https to access ADLS2 storage.

Access ADLS2 with Service Principal and abfss protocol

CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' , 
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (   
    location = 'abfss://container@adlsstorageaccount.dfs.core.windows.net/',
    CREDENTIAL = MyCredential
);

The query works

Conclusion

In conclusion, both Managed Identity and Service Principal provide secure and flexible ways to access data in Azure Synapse dedicated pool. Understanding the differences in access methods across different endpoints and protocols is crucial for optimizing data workflows in Synapse dedicate pool. Implementing these options can help you tailor specific implementations for your custom solutions.

Thanks for reading !!!

0
Subscribe to my newsletter

Read articles from Sachin Nandanwar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sachin Nandanwar
Sachin Nandanwar