Overcoming Limitations of Get Metadata Activity in Azure Data Factory / Synapse

Harsh Bakshi
Oct 5, 2023


Problem Statement

When working with files stored in Azure Blob Storage or Azure Data Lake Storage, the Get Metadata Activity in Azure Data Factory / Synapse exposes only a limited set of file properties.

The Get Metadata Activity can only retrieve a subset of properties, such as item name, item type, size, last modified time, and content MD5.

However, there is a way to retrieve other properties such as Creation Time and Content-Type in Synapse / Data Factory pipelines.

Prerequisites

  1. Azure Data Factory / Synapse
  2. Azure Blob Storage / Azure Data Lake Storage

Solution

1. To retrieve additional blob file properties, we can leverage the Get Blob operation of the Azure Blob Storage REST API.

2. To authenticate via Managed Identity, grant the Synapse / Data Factory managed identity the Storage Blob Data Reader role on the Azure Blob Storage account.

a) Go to Access Control (IAM) of the Azure Blob Storage account, click Add, and select Add role assignment.

b) Search for the Storage Blob Data Reader role, select it, and continue through the wizard to assign it to the Synapse / Data Factory managed identity.
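
The same role assignment can also be scripted instead of clicked through in the portal. The following is a minimal Python sketch that only builds the Azure Resource Manager request (URL and JSON body) for the assignment; it does not send it. The function name `build_role_assignment`, the placeholder IDs, and the API version `2022-04-01` are assumptions for illustration, and the role definition GUID should be checked against the Azure built-in roles list before use.

```python
import json
import uuid

# Built-in role definition ID for "Storage Blob Data Reader"
# (assumption: verify against the Azure built-in roles list).
STORAGE_BLOB_DATA_READER = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"


def build_role_assignment(subscription_id, resource_group, storage_account, principal_id):
    """Return (url, body) for an ARM 'create role assignment' PUT request.

    principal_id is the object ID of the Synapse / Data Factory managed identity.
    """
    # The assignment is scoped to the storage account itself.
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )
    # Each role assignment is identified by a new GUID.
    assignment_name = str(uuid.uuid4())
    url = (
        f"https://management.azure.com{scope}"
        f"/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
        "?api-version=2022-04-01"
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers"
                f"/Microsoft.Authorization/roleDefinitions/{STORAGE_BLOB_DATA_READER}"
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        }
    }
    return url, body


# Hypothetical placeholder values, for illustration only.
url, body = build_role_assignment(
    "00000000-0000-0000-0000-000000000000",
    "my-resource-group",
    "mystorageaccount",
    "11111111-1111-1111-1111-111111111111",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending this request would additionally need a bearer token for `https://management.azure.com/` and a caller with permission to create role assignments, which is why the portal steps above are usually the simpler route for a one-off setup.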

3. Create a pipeline within Synapse / Data Factory and use a Web Activity to trigger the REST API.

URL

In case of Azure Blob Storage

https://<storage-account>.blob.core.windows.net/<container>/<blob-path>

In case of Azure Data Lake Storage

https://<storage-account>.dfs.core.windows.net/<container>/<file-path>

Method: GET

Authentication: System Assigned Managed Identity

Resource: https://storage.azure.com/

Headers:

x-ms-version: 2017-11-09
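
To see how these settings fit together, here is a minimal Python sketch that assembles the request URL and a Web Activity definition as it might appear in the pipeline JSON. The helper names `blob_url` and `web_activity`, the activity name, and the sample account, container, and path values are hypothetical; the JSON shape follows the usual Web Activity layout with `MSI` authentication and should be compared with the JSON generated by your own pipeline.

```python
import json
from urllib.parse import quote

X_MS_VERSION = "2017-11-09"  # header value used in this article


def blob_url(account, container, path, data_lake=False):
    """Build the request URL for a file in Blob Storage or Data Lake Storage."""
    endpoint = "dfs" if data_lake else "blob"
    # Percent-encode special characters in the path but keep folder separators.
    encoded_path = quote(path.lstrip("/"), safe="/")
    return f"https://{account}.{endpoint}.core.windows.net/{container}/{encoded_path}"


def web_activity(name, url):
    """Return a Web Activity definition using managed identity authentication."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {"x-ms-version": X_MS_VERSION},
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com/",
            },
        },
    }


# Hypothetical values, for illustration only.
activity = web_activity(
    "Get Blob Properties",
    blob_url("mystorageaccount", "raw", "sales/2023/orders.csv"),
)
print(json.dumps(activity, indent=2))
```

Passing `data_lake=True` switches the host to the `dfs` endpoint, matching the Azure Data Lake Storage URL shown above.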

Output

Get Metadata Activity output

Web Activity Output (Azure Blob Storage)
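
The Get Blob operation returns properties such as Creation Time and Content-Type as HTTP response headers. The Python sketch below shows one way to pull them out of a Web Activity output, assuming the headers are exposed under the `ADFWebActivityResponseHeaders` key and that the creation time arrives in the `x-ms-creation-time` header. The function name `blob_properties` and the sample values are made up for illustration and are not real pipeline output.

```python
from email.utils import parsedate_to_datetime


def blob_properties(activity_output):
    """Extract creation time and content type from a Web Activity output dict."""
    headers = activity_output.get("ADFWebActivityResponseHeaders", {})
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    created = lowered.get("x-ms-creation-time")
    return {
        # Header dates use the HTTP date format, e.g. "Wed, 04 Oct 2023 10:15:30 GMT".
        "creation_time": parsedate_to_datetime(created) if created else None,
        "content_type": lowered.get("content-type"),
    }


# Hypothetical sample output, for illustration only.
sample_output = {
    "ADFWebActivityResponseHeaders": {
        "x-ms-creation-time": "Wed, 04 Oct 2023 10:15:30 GMT",
        "Content-Type": "text/csv",
    }
}

props = blob_properties(sample_output)
print(props["creation_time"].isoformat(), props["content_type"])
```

Inside the pipeline itself, the equivalent lookup would be an expression along the lines of `@activity('Get Blob Properties').output.ADFWebActivityResponseHeaders['x-ms-creation-time']`, where the activity name is again an assumption.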

Thanks, Harsh
Founder | CEO — Skrots

Original post: https://blog.skrots.com/overcoming-limitations-of-get-metadata-activity-in-azure-data-factory-synapse/?feed_id=262&_unique_id=651f0ede0e0d7
