COPY command in Microsoft Fabric

Let's say you want to import data from an ADLS2 or any other external storage location to a Fabric Data Warehouse on the fly instead of the usual data pipeline approach. Something you would want to do through SQL that provides lightspeed data ingestion and provide the flexibility of making quick setting changes.

One option through which it can be achieved is by using the COPY command.

Yes you heard it right...the COPY command from Synapse Analytics can be used to move data into Microsoft Fabric warehouse but it works only for data warehouse and not for database or data lake in Fabric.

Lets see how it can be done.

SetUp

The given ADLS2 location is a secured location and has three csv files with each file having five records. In total fifteen records.

COPY command In Fabric

Select one of the existing service principal. We will use Fabric OAUTH2 service principal that exists in my Entra account.

COPY command In Fabric

We require ClientId, TenantId and Secret of the Service Principal.

COPY command In Fabric

Ensure that the Service Principal has atleast Storage Blob Data Contributor access to the ADLS2 location.

COPY command In Fabric

Create the destination table that matches the structure of the source data.

   CREATE TABLE Customers
   ( [Customerid] [int] ,
    [Name] [varchar](40) ,
    [Email] [varchar](40) ,
    [city] [varchar](40) 
 )

and then use the following COPY statement to move the data across.

COPY INTO  Customers  FROM 'https://storageaccount.blob.core.windows.net/container/Customers' 
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'ClientId@https://login.microsoftonline.com/tenantId/oauth2/v2.0/token', 
SECRET='secret')

COPY command In Fabric

Note that I have use https protocol with blob endpoints. You can use abfss protocols with dfs endpoint. Similar to use of COPY command on Synapse dedicated pool, there are no limitations wrt use of endpoints and protocols in Fabric for use of COPY command.

You can refer to this section of one of my article on Synapse dedicated pool for further details on COPY command in Synapse Analytics.

You can even run this command in TSQL Notebooks

Does COPY command work in Fabric Database ?

Unfortunately it doesn’t. When I execute the statement in SSMS it fails with this error message

COPY command In Fabric

and running the same on Fabric UI fails with a different error message.

COPY command In Fabric

I came across this documentation of use of COPY command

https://learn.microsoft.com/en-us/fabric/data-warehouse/ingest-data-copy

But the article uses data that is publicly available and it does not mention how to use it with secured data sources.

Database scoped credentials and External data source

But then is it possible to use the other option database scoped credentials with external data source that works in Azure Synapse Analytics ?

I tried creating them. Creation of Database Scoped Credentials and External Data Source do succeed.

Database Scoped Credentials

COPY command In Fabric

External Data Source

CREATE EXTERNAL DATA SOURCE SampleSource
WITH (
    TYPE=BLOB_STORAGE,
    location = 'abfss://adlsfilesystem@sachinadls.dfs.core.windows.net/',
    CREDENTIAL = MyCredential
);

COPY command In Fabric

and when I use OPENROWSET to query the data that External Data Source points to, the query errors out

SELECT 
    CustomerID,Name,Email,City
FROM 
    OPENROWSET(BULK 'Customers',format='csv',
    firstrow = 2) 
    WITH (
    [Customerid] [int] ,
    [Name] [varchar](40) ,
    [Email] [varchar](40) ,
    [city] [varchar](40) 
) AS [r]
go

COPY command In Fabric

I checked the feature comparison between Azure SQL Database and Fabric SQL Database

Features comparison: Azure SQL Database and SQL database (preview) - Microsoft Fabric | Microsoft Learn

and I found out that Database Scoped Credentials are indeed supported

but OPENROWSET isn’t..similar to Synapse dedicated pool

and there is no mention if External Data Source is supported or not.

So I am quite unsure the purpose Database Scoped Credentials and External Data Source serve in Fabric Database.

Just out of curiosity I tried using Managed Identity in the COPY command knowing that Managed Identity is now replaced with Workspace Identity in Fabric

COPY INTO  Customers  FROM 'abfss://adlsfilesystem@sachinadls.dfs.core.windows.net/' 
WITH ( FILE_TYPE='CSV', FIRSTROW =2,
CREDENTIAL=(IDENTITY= 'Managed Identity')

and as expected it errored out with a never seen error message which was a first one for me.

COPY command In Fabric

Msg 15858, Level 16, State 1, Line 1 Cannot obtain AAD token to access storage. Error message: 'Server identity is not configured. Please follow the steps in "Assign an Azure AD identity to your server and add Directory Reader permission to your identity" (https://aka.ms/sqlaadsetup)'.

Risks with Use of COPY command

A major issue with the COPY command approach is that using it in your production environment is far from ideal. You definitely wouldn't want to hardcode the secret values in your code or share it with anyone in first place.

And since there is nothing equivalent to Azure Key Vault in Fabric there is no way to encrypt your secrets or securely store them.

The feature is on the roadmap to be released in Q1 2025.

https://learn.microsoft.com/en-us/fabric/release-plan/data-factory#data-source-identity-management-azure-key-vault

But then this would mean you have to store credentials on Azure Key vault and access them from Fabric and key vault wont be inherently a Fabric thing.Though it is possible to access Azure Key vault in a Fabric Notebook using the MSSPARKUTILS package.

As it's not possible to access key vaults from SQL. So if you don't want to risk exposing the secrets then it's prudent not to use COPY command in prod environment.

Conclusion

The COPY command is great command if you want to quickly move data across without going through the overhead of creating data pipeline and works well with all protocols and endpoints.

But given the limitations that comes with its use limited only to Data warehouse I would definitely like that it should be supported in Fabric Database as well.

Thanks for reading !!!

0
Subscribe to my newsletter

Read articles from Sachin Nandanwar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sachin Nandanwar
Sachin Nandanwar