Access Blob & ADLS2 data in Azure Synapse Dedicated pool
data:image/s3,"s3://crabby-images/922fc/922fcf8de512f93149a9d0ae18c634a212d8cd1a" alt="Sachin Nandanwar"
Table of contents
- SetUp
- Access Blob with Managed Identity and https protocol
- Access Blob with Managed Identity and abfss protocol
- Access ADLS2 with Managed Identity and https protocol
- Access ADLS2 with Managed Identity and abfss protocol
- Access Blob with Service Principal and https protocol
- Access Blob with Service Principal and abfss protocol
- Access ADLS2 with Service Principal and https protocol
- Access ADLS2 with Service Principal and abfss protocol
- Conclusion
data:image/s3,"s3://crabby-images/0ad30/0ad303fb11510752a744655e45f1d710112fdf1c" alt=""
In my previous article, we explored various access options for ADLS2 and Blob storage in Azure Synapse serverless pool, focusing on how to access data using Managed Identity and Service Principal.
In this article, we will examine how to use Managed Identity and Service Principal to access data in Blob and ADLS2 storage through different endpoints and protocols in Azure Synapse dedicated pool.
SetUp
To get started we will first create a dedicated pool called SQLDB
in the Synapse workspace.
Once created you can see the pool in SSMS.
We will use the same sample files that we had uploaded to the Blob and the ADLS2 storages in my previous article
Files on a blob storage are under a directory Customers
and so are the files on ADSL2
We will use the same Managed Identity and Service Principal as the one in my previous article.
Service Principal
Managed Identity
Create a master key in the pool
CREATE MASTER KEY ENCRYPTION BY PASSWORD='Any Strong Password'
GO
Please note that unlike serverless pool where you create the key in master database, you have to create the master key in the dedicated sql pool or in the session.
In dedicated pool we CANNOT use OPENROWSET syntax the way we use in serverless pool. You have to use External tables with External file format.
External File Format:
CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = False
)
)
Grant the synapse workspace Storage Blob Data Contributor
role to the blob storage.
Access Blob with Managed Identity and https protocol
First, create a database scoped credential
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY'; -- MANAGED IDENTITY IS THE KEYWORD
Then create an External data source. You can read about external data source here.
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
-- location should be pointing to the container on the blob storage
location = 'https://storageaccount.blob.core.windows.net/container',
CREDENTIAL = MyCredential
);
Execute the query to the source Customers
CREATE EXTERNAL TABLE dbo.Customers
( [Customerid] [int] ,
[Name] [varchar](40) ,
[Email] [varchar](40) ,
[city] [varchar](40)
)
WITH (
LOCATION = 'Customers',
DATA_SOURCE = SampleSource,
FILE_FORMAT = [CsvFormatWithHeader]
)
SELECT * FROM dbo.Customers
The query errors out : Msg 105097, Level 16, State 1, Line 1 Managed Service Identity authentication is only supported by abfss scheme.
This is because Manage Identity cannot access blob or ADLS2 using https
on Synapse dedicated pool.
Well then you might ask , how to then access blob storage using Managed Identity through https
?
There is one way, albeit a very cheeky one..by using the COPY
command. Basically what you do is COPY
the data from the blob storage to table in the dedicated pool.
Note it has to be a physical table not and external one.
Create a Customers table
CREATE TABLE dbo.Customers
( [Customerid] [int] ,
[Name] [varchar](40) ,
[Email] [varchar](40) ,
[city] [varchar](40)
)
Then use the COPY
command.
COPY INTO Customers FROM 'https://storageaccount.blob.core.windows.net/container/Customers'
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'Managed Identity')
)
Access Blob with Managed Identity and abfss protocol
Ok now lets try Managed Identity with abfss
endpoint incase COPY
is not an option for you.
Create database scoped credential
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY'; -- MANAGED IDENTITY IS THE KEYWORD
We modify the External data source to use abfss
.
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
-- location should be pointing to the container on the blob storage
location = 'abfss://container@storageaccount.blob.core.windows.net',
CREDENTIAL = MyCredential
);
We get an access error and not the protocol error.
Error path does not exist
is quite strange.
I suspected that the issue must be with SSMS so I double checked the query with Synapse Studio and I get the same error.
But it was surprising that when I query for an individual file the query succeeds.
Then I thought that the issue must be because of the blob
endpoint that I used in the External Data Source.
I dropped the existing Data Source and
DROP EXTERNAL DATA SOURCE SampleSource
I changed the endpoint from blob
to dfs
.
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'abfss://container@storageaccount.dfs.core.windows.net',
CREDENTIAL = MyCredential
);
and it worked because dfs
endpoint generally respect the hierarchical namespace.
We have now established that Managed Identity does not work with https
protocol.
If you dont want to through the route of the COPY
option, you have to use dfs
endpoint with abfss
protocol to query the entire data and not just individual files.
Access ADLS2 with Managed Identity and https protocol
Lets now check ways to access ADLS2 using Managed Identity
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY';
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'https://conatiner@adlsstorageaccount.dfs.core.windows.net/',
CREDENTIAL = MyCredential
);
the query errors out
It means that irrespective of the storage types(Blob or ADLS2), Managed Identity is NOT supported with https
in Synapse dedicated pool.
You can instead use the COPY
option instead if you really would want to use https
COPY INTO Customers FROM 'https://adlsstorageaccount.dfs.core.windows.net/container/Customers'
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'Managed Identity')
)
Access ADLS2 with Managed Identity and abfss protocol
Lets now try accessing ADLS2 through abfss
endpoint with Managed Identity.
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY = 'MANAGED IDENTITY';
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'abfss://container@adlsstorgaeaccount.dfs.core.windows.net/',
CREDENTIAL = MyCredential
);
As expected, this query succeeds.
Access Blob with Service Principal and https protocol
Drop the existing Data Source and Credentials
DROP EXTERNAL DATA SOURCE SampleSource
DROP DATABASE SCOPED CREDENTIAL MyCredential
As mentioned earlier, we will use the existing service principal called Fabric OAUTH2
.
Ensure that you Grant the service principal Storage Blob Data Contributor
role for the storage account.
Create database scoped credentials that uses Service Principal.
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' ,
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'https://storageaccount.blob.core.windows.net/container',
CREDENTIAL = MyCredential
);
The query errors out. We used https
protocol with blob
endpoints to access Blob storage. But is it possible to access blob storage using https
in Service Principal ?
As was the case with Managed Identity where we used COPY
option with https
, we can use the same approach for Service Principal as well.
First create Customers table.
CREATE TABLE dbo.Customers
( [Customerid] [int] ,
[Name] [varchar](40) ,
[Email] [varchar](40) ,
[city] [varchar](40)
)
and use the COPY
option with the service principal URI
COPY INTO Customers FROM 'https://storageaccount.blob.core.windows.net/container/Customers'
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token',
SECRET='secret')
)
So, Service principal can access blob storage using https
but it can do only using COPY
option and not through Data Source. The only option is to use COPY
if you really want to do it through the use of https
.
Access Blob with Service Principal and abfss protocol
Lets try with abfss
protocol with blob
endpoints.
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' ,
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'abfss://container@storageaccount.blob.core.windows.net/',
CREDENTIAL = MyCredential
);
Once done ,the query did not error out and neither did it return any data which is quite interesting.
But querying individual files returns data
So lets change the endpoint from blob
to dfs
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'abfss://container@storageaccount.dfs.core.windows.net/',
CREDENTIAL = MyCredential
);
Re executing the query returns all data across the directory.
Recall that earlier we had used blob
endpoint through Managed Identity and the query errored out but when using blob
endpoint with Service Principal the query did not error out and neither did it return any data. This would mean that the blob
endpoint does not work with Service Principal as well even if you use abfss
protocol.
You will have to use dfs
endpoint alongside the abfss
protocol.
Access ADLS2 with Service Principal and https protocol
Next lets try accessing ADLS2 storage with https
protocol with blob
endpoints.
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' ,
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'https://container@adlsstorageaccount.blob.core.windows.net/',
CREDENTIAL = MyCredential
);
As expected it would error out as we were using https
to access ADLS2 storage.
Access ADLS2 with Service Principal and abfss protocol
CREATE DATABASE SCOPED CREDENTIAL MyCredential
WITH IDENTITY='ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token' ,
SECRET='secret'
CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
location = 'abfss://container@adlsstorageaccount.dfs.core.windows.net/',
CREDENTIAL = MyCredential
);
The query works
Conclusion
In conclusion, both Managed Identity and Service Principal provide secure and flexible ways to access data in Azure Synapse dedicated pool. Understanding the differences in access methods across different endpoints and protocols is crucial for optimizing data workflows in Synapse dedicate pool. Implementing these options can help you tailor specific implementations for your custom solutions.
Thanks for reading !!!
Subscribe to my newsletter
Read articles from Sachin Nandanwar directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
data:image/s3,"s3://crabby-images/922fc/922fcf8de512f93149a9d0ae18c634a212d8cd1a" alt="Sachin Nandanwar"