Using ADLS Gen2 APIs to export object hierarchy and metadata in Microsoft Fabric


In an earlier article, I demonstrated the use of DataLakeServiceClient from the Azure SDK to manage OneLake storage. In this post, we'll use the ADLS Gen2 REST APIs directly to retrieve metadata for the underlying objects across all lakehouses within a given workspace.
OneLake supports the same ADLS Gen2 APIs and SDKs, which lets you treat OneLake as a single, unified ADLS storage account.
We use MSAL with the client ID of a Microsoft Entra app registration (the service principal) to acquire a bearer token, which is then used to authenticate against OneLake. A custom GetAsync method wraps the built-in HttpClient GET call so that every request carries the token and returns its response asynchronously.
Note: If you would rather skip the write-up, you can watch the code walkthrough video here instead.
Code
To get started, create a new console application and add the following references:
using Microsoft.Identity.Client;
using Newtonsoft.Json.Linq;
using System.Net.Http.Headers;
using File = System.IO.File;
Declare the variables used throughout the application:
private static string RedirectURI = "http://localhost";
private static string clientId = "Service Principal Client ID";
private static string workSpace = "Your Workspace";
private static readonly HttpClient client = new HttpClient();
private static string response = "";
private static string access_token = "";
private static string[] scopes = new string[] { "https://storage.azure.com/.default" };
private static string Authority = "https://login.microsoftonline.com/organizations";
Next, create a method ReturnAuthenticationResult that uses MSAL to return the bearer token:
public async static Task<AuthenticationResult> ReturnAuthenticationResult()
{
    PublicClientApplicationBuilder PublicClientAppBuilder =
        PublicClientApplicationBuilder.Create(clientId)
            .WithAuthority(Authority)
            .WithCacheOptions(CacheOptions.EnableSharedCacheOptions)
            .WithRedirectUri(RedirectURI);
    IPublicClientApplication PublicClientApplication = PublicClientAppBuilder.Build();
    var accounts = await PublicClientApplication.GetAccountsAsync();
    AuthenticationResult result;
    try
    {
        // Try the token cache first; fall back to interactive sign-in on failure.
        result = await PublicClientApplication.AcquireTokenSilent(scopes, accounts.First())
            .ExecuteAsync()
            .ConfigureAwait(false);
    }
    catch
    {
        result = await PublicClientApplication.AcquireTokenInteractive(scopes)
            .ExecuteAsync()
            .ConfigureAwait(false);
    }
    return result;
}
Create a customized GET method that attaches the bearer token to every request:
public async static Task<string> GetAsync(string url)
{
    AuthenticationResult result = await ReturnAuthenticationResult();
    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", result.AccessToken);
    access_token = result.AccessToken;
    HttpResponseMessage response = await client.GetAsync(url);
    if (!response.IsSuccessStatusCode)
    {
        // Log the error payload before throwing.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
        response.EnsureSuccessStatusCode();
    }
    return await response.Content.ReadAsStringAsync();
}
Create a method to traverse all the lakehouses in the given workspace:
public static async Task<JObject> TraverseAllLakeHousesInWorkspace(string path)
{
    string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=false";
    response = await GetAsync(dfsendpoint);
    JObject jsonObject_lakehouse = JObject.Parse(response);
    return jsonObject_lakehouse;
}
Create a method to traverse all the directories in a lakehouse:
public static async Task<JObject> TraverseAllDirectoriesInLakeHouse(string path)
{
    string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=false";
    response = await GetAsync(dfsendpoint);
    JObject jsonObject_lakehouse = JObject.Parse(response);
    return jsonObject_lakehouse;
}
Create a method to export the object metadata to a CSV file:
public static void WriteToCsv(string path, string values)
{
    string delimiter = ", ";
    // Fields arrive as "Key|| Value" pairs separated by '~'.
    string[] parts = values.Split('~');
    if (!File.Exists(path))
    {
        string createText = "LakeHouse" + delimiter + "Name" + delimiter + "Type" + delimiter + "Path" + delimiter + "CreationTime" + delimiter + "ModifiedTime" + delimiter + "Permissions" + Environment.NewLine;
        File.WriteAllText(path, createText);
    }
    // The numeric creationTime value is interpreted here as .NET ticks.
    DateTime dateTime = new DateTime(Convert.ToInt64(parts[4].Split("||")[1]));
    string appendText = parts[0].Split("||")[1] + delimiter + parts[1].Split("||")[1] + delimiter + parts[2].Split("||")[1] + delimiter + parts[3].Split("||")[1] + delimiter + dateTime + delimiter + parts[5].Split("||")[1].Split(",")[1] + delimiter + parts[6].Split("||")[1] + Environment.NewLine;
    File.AppendAllText(path, appendText);
}
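The `~`/`||` encoding that WriteToCsv expects can be confusing, so here is a minimal, self-contained sketch of how an encoded string is pulled apart. The values are made up, and only a few of the seven fields are shown:

```csharp
// Illustrative only: made-up sample of "Key|| Value" fields separated by '~'.
string values = "LakeHouse|| LH1~Name|| Sales~Type || Directory~Path || https://onelake.dfs.fabric.microsoft.com/ws/LH1.Lakehouse/Files/Sales";

// First split on '~' to get the individual fields.
string[] parts = values.Split('~');           // 4 fields in this sample

// Then split each field on "||" and take the value half.
string lakehouse = parts[0].Split("||")[1].Trim();
string name = parts[1].Split("||")[1].Trim();

Console.WriteLine($"{lakehouse}, {name}");    // prints "LH1, Sales"
```

Each field in the encoded string becomes one column in the resulting CSV row.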
The Main method of the console application:
static async Task Main(string[] args)
{
    JObject jsonObject = await TraverseAllLakeHousesInWorkspace(workSpace);
    JArray pathsArray = (JArray)jsonObject["paths"];
    foreach (JObject path in pathsArray)
    {
        // Only items with the .Lakehouse extension are lakehouses.
        if (path["name"].ToString().Contains(".Lakehouse"))
        {
            string lakehouse = path["name"].ToString();
            JObject jsonObject_l = await TraverseAllDirectoriesInLakeHouse($"{workSpace}/{lakehouse}/Files");
            JArray pathsArray_l = (JArray)jsonObject_l["paths"];
            foreach (JObject path_l in pathsArray_l)
            {
                if (path_l["isDirectory"] != null)
                {
                    int lastSlashIndex = path_l["name"].ToString().LastIndexOf('/');
                    string foldername_n = path_l["name"].ToString().Substring(lastSlashIndex + 1);
                    int firstSlashIndex = path_l["name"].ToString().IndexOf('/');
                    string pathname = path_l["name"].ToString().Substring(firstSlashIndex + 1);
                    WriteToCsv(@"Location\DirectoryMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + foldername_n + "~Type || Directory " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_l["creationTime"] + "~LastModified ||" + path_l["lastModified"] + "~Permissions ||" + path_l["permissions"]);
                    string path_ = $"{workSpace}/{lakehouse}/Files/" + foldername_n;
                    await returnMetadata(lakehouse, path_);
                }
                else
                {
                    int lastSlashIndex = path_l["name"].ToString().LastIndexOf('/');
                    string filename_n = path_l["name"].ToString().Substring(lastSlashIndex + 1);
                    string path_ = $"{workSpace}/{lakehouse}/Files/" + filename_n;
                    await returnMetadata(lakehouse, path_);
                }
            }
        }
    }
}
Let's now dissect the Main method. The first function executed in Main, the entry point of the application, is TraverseAllLakeHousesInWorkspace, which traverses all the lakehouses of the workspace.
JObject jsonObject = await TraverseAllLakeHousesInWorkspace(workSpace);
The method returns a JSON payload containing a paths array; we use JObject to parse and manage the response.
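The exact payload depends on what the workspace contains; an illustrative response (abridged, with placeholder names and values) looks like this:

```json
{
  "paths": [
    {
      "name": "Lakehouse_1.Lakehouse",
      "isDirectory": "true",
      "creationTime": "638500000000000000",
      "lastModified": "Thu, 06 Jun 2024 10:15:12 GMT",
      "permissions": "rwxr-x---"
    },
    {
      "name": "Warehouse_1.Warehouse",
      "isDirectory": "true",
      "creationTime": "638500000000000000",
      "lastModified": "Thu, 06 Jun 2024 10:20:45 GMT",
      "permissions": "rwxr-x---"
    }
  ]
}
```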
Next, we iterate through the collection and filter for objects of type lakehouse. We achieve this by keeping only those JSON elements whose name property carries the .Lakehouse extension.
foreach (JObject path in pathsArray)
{
    if (path["name"].ToString().Contains(".Lakehouse"))
    {
        //Rest of code
    }
}
While iterating through the collection, we fetch the lakehouse name and pass it as an argument to the function TraverseAllDirectoriesInLakeHouse.
foreach (JObject path in pathsArray)
{
    if (path["name"].ToString().Contains(".Lakehouse"))
    {
        string lakehouse = path["name"].ToString();
        JObject jsonObject_l = await TraverseAllDirectoriesInLakeHouse($"{workSpace}/{lakehouse}/Files");
        JArray pathsArray_l = (JArray)jsonObject_l["paths"];
        //Rest of code
    }
}
Below is the structure of one of the lakehouses in the collection, Lakehouse_1. Let's continue the explanation keeping the lakehouse Lakehouse_1 in context. TraverseAllDirectoriesInLakeHouse will return all files and folders located in the root directory of the lakehouse currently being processed, Lakehouse_1.
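As an illustration only (the actual hierarchy comes from your workspace; the names below are placeholders), a lakehouse root might look like this:

```
Lakehouse_1.Lakehouse/
└── Files/
    ├── Sales/
    │   ├── 2024/
    │   │   └── orders.csv
    │   └── summary.parquet
    └── readme.txt
```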
We first return the metadata of all the directories and files at the root level. The returnMetadata function:
public static async Task<JObject> returnMetadata(string lakehouse, string path)
{
    string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=true";
    response = await GetAsync(dfsendpoint);
    if (response != "")
    {
        JObject jsonObject = JObject.Parse(response);
        JArray pathsArray = (JArray)jsonObject["paths"];
        foreach (JObject path_ in pathsArray)
        {
            if (path_["isDirectory"] != null && path != "")
            {
                int lastSlashIndex = path_["name"].ToString().LastIndexOf('/');
                string foldername_n = path_["name"].ToString().Substring(lastSlashIndex + 1);
                int firstSlashIndex = path_["name"].ToString().IndexOf('/');
                string pathname = path_["name"].ToString().Substring(firstSlashIndex + 1);
                WriteToCsv(@"Location\DirectoryMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + foldername_n + "~Type || Directory " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_["creationTime"] + "~LastModified ||" + path_["lastModified"] + "~Permissions ||" + path_["permissions"]);
            }
            else
            {
                int lastSlashIndex = path_["name"].ToString().LastIndexOf('/');
                string filename_n = path_["name"].ToString().Substring(lastSlashIndex + 1);
                int firstSlashIndex = path_["name"].ToString().IndexOf('/');
                string pathname = path_["name"].ToString().Substring(firstSlashIndex + 1);
                WriteToCsv(@"Location\FileMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + filename_n + "~Type || File " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_["creationTime"] + "~LastModified ||" + path_["lastModified"] + "~Permissions ||" + path_["permissions"]);
            }
        }
        return jsonObject;
    }
    else
    {
        return null;
    }
}
To do that, we filter on the path_l["isDirectory"] tag. The traversal must effectively recurse, since the hierarchy can be N levels deep; to achieve this, we call returnMetadata, which takes the lakehouse name and the path as arguments. Also note that files can exist at the root level of the lakehouse, so the method needs an else branch. Based on the arguments passed, returnMetadata returns the metadata, in JSON format, for all the objects under the Files folder.
The key aspect here is the recursive=true option in the OneLake endpoint URI, which enables traversing all folders under the given path recursively.
string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=true";
Complete code
using Microsoft.Identity.Client;
using Newtonsoft.Json.Linq;
using System.Net.Http.Headers;
using File = System.IO.File;

namespace ReturnLakeHouseMetaData
{
    internal class Program
    {
        private static string RedirectURI = "http://localhost";
        private static string clientId = "Service Principal Client Id";
        private static string workSpace = "Your Workspace";
        private static readonly HttpClient client = new HttpClient();
        private static string response = "";
        private static string access_token = "";
        private static string[] scopes = new string[] { "https://storage.azure.com/.default" };
        private static string Authority = "https://login.microsoftonline.com/organizations";

        static async Task Main(string[] args)
        {
            JObject jsonObject = await TraverseAllLakeHousesInWorkspace(workSpace);
            JArray pathsArray = (JArray)jsonObject["paths"];
            foreach (JObject path in pathsArray)
            {
                // Only items with the .Lakehouse extension are lakehouses.
                if (path["name"].ToString().Contains(".Lakehouse"))
                {
                    string lakehouse = path["name"].ToString();
                    JObject jsonObject_l = await TraverseAllDirectoriesInLakeHouse($"{workSpace}/{lakehouse}/Files");
                    JArray pathsArray_l = (JArray)jsonObject_l["paths"];
                    foreach (JObject path_l in pathsArray_l)
                    {
                        if (path_l["isDirectory"] != null)
                        {
                            int firstSlashIndex = path_l["name"].ToString().IndexOf('/');
                            string pathname = path_l["name"].ToString().Substring(firstSlashIndex + 1);
                            int lastSlashIndex = path_l["name"].ToString().LastIndexOf('/');
                            string foldername_n = path_l["name"].ToString().Substring(lastSlashIndex + 1);
                            WriteToCsv(@"Location\DirectoryMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + foldername_n + "~Type || Directory " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_l["creationTime"] + "~LastModified ||" + path_l["lastModified"] + "~Permissions ||" + path_l["permissions"]);
                            string path_ = $"{workSpace}/{lakehouse}/Files/" + foldername_n;
                            await returnMetadata(lakehouse, path_);
                        }
                        else
                        {
                            int lastSlashIndex = path_l["name"].ToString().LastIndexOf('/');
                            string filename_n = path_l["name"].ToString().Substring(lastSlashIndex + 1);
                            string path_ = $"{workSpace}/{lakehouse}/Files/" + filename_n;
                            await returnMetadata(lakehouse, path_);
                        }
                    }
                }
            }
        }

        public static async Task<JObject> TraverseAllDirectoriesInLakeHouse(string path)
        {
            string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=false";
            response = await GetAsync(dfsendpoint);
            JObject jsonObject_lakehouse = JObject.Parse(response);
            return jsonObject_lakehouse;
        }

        public static async Task<JObject> TraverseAllLakeHousesInWorkspace(string path)
        {
            string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=false";
            response = await GetAsync(dfsendpoint);
            JObject jsonObject_lakehouse = JObject.Parse(response);
            return jsonObject_lakehouse;
        }

        public static async Task<JObject> returnMetadata(string lakehouse, string path)
        {
            string dfsendpoint = $"https://onelake.dfs.fabric.microsoft.com/{path}?resource=filesystem&recursive=true";
            response = await GetAsync(dfsendpoint);
            if (response != "")
            {
                JObject jsonObject = JObject.Parse(response);
                JArray pathsArray = (JArray)jsonObject["paths"];
                foreach (JObject path_ in pathsArray)
                {
                    if (path_["isDirectory"] != null && path != "")
                    {
                        int lastSlashIndex = path_["name"].ToString().LastIndexOf('/');
                        string foldername_n = path_["name"].ToString().Substring(lastSlashIndex + 1);
                        int firstSlashIndex = path_["name"].ToString().IndexOf('/');
                        string pathname = path_["name"].ToString().Substring(firstSlashIndex + 1);
                        WriteToCsv(@"Location\DirectoryMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + foldername_n + "~Type || Directory " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_["creationTime"] + "~LastModified ||" + path_["lastModified"] + "~Permissions ||" + path_["permissions"]);
                    }
                    else
                    {
                        int lastSlashIndex = path_["name"].ToString().LastIndexOf('/');
                        string filename_n = path_["name"].ToString().Substring(lastSlashIndex + 1);
                        int firstSlashIndex = path_["name"].ToString().IndexOf('/');
                        string pathname = path_["name"].ToString().Substring(firstSlashIndex + 1);
                        WriteToCsv(@"Location\FileMetaData.csv", "LakeHouse|| " + lakehouse + "~Name|| " + filename_n + "~Type || File " + "~Path ||" + $"https://onelake.dfs.fabric.microsoft.com/{pathname}" + "~CreationTime || " + path_["creationTime"] + "~LastModified ||" + path_["lastModified"] + "~Permissions ||" + path_["permissions"]);
                    }
                }
                return jsonObject;
            }
            else
            {
                return null;
            }
        }

        public static void WriteToCsv(string path, string values)
        {
            string delimiter = ", ";
            // Fields arrive as "Key|| Value" pairs separated by '~'.
            string[] parts = values.Split('~');
            if (!File.Exists(path))
            {
                string createText = "LakeHouse" + delimiter + "Name" + delimiter + "Type" + delimiter + "Path" + delimiter + "CreationTime" + delimiter + "ModifiedTime" + delimiter + "Permissions" + Environment.NewLine;
                File.WriteAllText(path, createText);
            }
            // The numeric creationTime value is interpreted here as .NET ticks.
            DateTime dateTime = new DateTime(Convert.ToInt64(parts[4].Split("||")[1]));
            string appendText = parts[0].Split("||")[1] + delimiter + parts[1].Split("||")[1] + delimiter + parts[2].Split("||")[1] + delimiter + parts[3].Split("||")[1] + delimiter + dateTime + delimiter + parts[5].Split("||")[1].Split(",")[1] + delimiter + parts[6].Split("||")[1] + Environment.NewLine;
            File.AppendAllText(path, appendText);
        }

        public async static Task<string> GetAsync(string url)
        {
            AuthenticationResult result = await ReturnAuthenticationResult();
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", result.AccessToken);
            access_token = result.AccessToken;
            HttpResponseMessage response = await client.GetAsync(url);
            if (!response.IsSuccessStatusCode)
            {
                // Log the error payload before throwing.
                Console.WriteLine(await response.Content.ReadAsStringAsync());
                response.EnsureSuccessStatusCode();
            }
            return await response.Content.ReadAsStringAsync();
        }

        public async static Task<AuthenticationResult> ReturnAuthenticationResult()
        {
            PublicClientApplicationBuilder PublicClientAppBuilder =
                PublicClientApplicationBuilder.Create(clientId)
                    .WithAuthority(Authority)
                    .WithCacheOptions(CacheOptions.EnableSharedCacheOptions)
                    .WithRedirectUri(RedirectURI);
            IPublicClientApplication PublicClientApplication = PublicClientAppBuilder.Build();
            var accounts = await PublicClientApplication.GetAccountsAsync();
            AuthenticationResult result;
            try
            {
                // Try the token cache first; fall back to interactive sign-in on failure.
                result = await PublicClientApplication.AcquireTokenSilent(scopes, accounts.First())
                    .ExecuteAsync()
                    .ConfigureAwait(false);
            }
            catch
            {
                result = await PublicClientApplication.AcquireTokenInteractive(scopes)
                    .ExecuteAsync()
                    .ConfigureAwait(false);
            }
            return result;
        }
    }
}
Code Walkthrough
Conclusion
Using the ADLS Gen2 APIs brings several advantages, and in this article I have focused on just one key aspect: managing OneLake storage within a Fabric workspace. This approach simplifies managing and tracking all objects across the various lakehouses in the workspace, particularly when dealing with a complex directory structure. By leveraging the ADLS Gen2 APIs, you can efficiently organize and keep track of the objects (files and folders) across the lakehouses.
The above use case is just one of many possibilities with the ADLS Gen2 APIs, which provide a robust set of features for managing objects across a Fabric workspace. In upcoming articles I will explore more features of the ADLS Gen2 APIs.
Thanks for reading!