Deploy Python Script to Azure Functions App
The scenario for this post is to create a Python script that fetches JSON files from the SportsRadar API, filters out some unwanted information, and saves the results into an Azure Data Lake Storage Gen2 container. Notice that the original JSON file contains a lot of unwanted information nested inside the JSON structure, which is why customized Python code is better suited than the built-in pipeline activities of Azure Synapse or Azure Data Factory.
We will then implement notebooks for the follow-on transformations that load the Silver and Gold layer tables; however, the transformation part is not the focus of this post.
We will go through the steps of building the Azure Function, as well as a potential issue when trying to trigger the Function App from Azure Synapse.
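To make the scenario concrete, here is a minimal sketch of the fetch-and-filter step, assuming a hypothetical endpoint URL and a simplified schema; the exact routes and fields should be taken from the SportsRadar documentation.
import requests

API_KEY = "xxxxxxxx"  # hypothetical SportsRadar API key
# Hypothetical endpoint; check the SportsRadar docs for the real route
url = f"https://api.sportradar.com/mlb/trial/v7/en/games/2023/07/01/schedule.json?api_key={API_KEY}"

raw = requests.get(url, timeout=30).json()

# Keep only the fields we need, dropping the deeply nested extras
slim = [
    {
        "id": game.get("id"),
        "status": game.get("status"),
        "home": game.get("home", {}).get("name"),
        "away": game.get("away", {}).get("name"),
    }
    for game in raw.get("games", [])
]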
Prerequisites
Azure subscription
Having an Azure account with an active subscription is mandatory for creating any resource on Azure; a free account can be created here.
Visual Studio Code (Optional)
Visual Studio Code is an IDE designed by Microsoft, so it is no surprise that VS Code has the best Azure integration, which can truly improve the user experience when deploying resources. However, it is totally optional, because writing and editing scripts directly on the Azure Portal is also an option.
After installing VS Code, be sure to install the Azure extensions and the Python extension. As of the time of writing this post, support for Python 3.11 is still in public preview, so Python 3.8 or Python 3.9 is suggested for stable development and deployment.
Azure Functions Core Tools v4 (Optional)
Azure Functions Core Tools improves the local development experience by enabling developers to run and test functions locally (for example, with the func start command) before deploying to Azure.
Create Azure Function App on Azure
An Azure Function App can be created either through VS Code or directly on the Azure Portal. Just remember that when creating from VS Code, select Create Function App in Azure (Advanced), so that the full list of settings can be configured to suit your production needs.
Some configurations you may set:
- Subscription (your available one)
- Resource Group (create new or choose an existing one)
- Function App Name (the app name needs to be unique)
- Code or Container Image (deploy code or a container image)
- Runtime Stack (Python, Java, ...)
- Version (Python 3.9, ...)
- Region (choose the same one as the resource group)
- OS (Linux or Windows)
- Hosting Option (Consumption for serverless mode)
- Storage Account (choose one in the same region)
- Deployment (for GitHub integration)
Create Local Project
Now, start by creating a project locally: go to WORKSPACE, click on the Function App icon, and select Create Function.
It will then prompt you to create a new project and choose the local directory path. For the details of creating a Function project from a template, the Microsoft documentation is a great reference.
Python Libraries
Libraries for accessing Azure Key Vault and the Azure Storage account, and for sending HTTP requests:
azure-functions
requests==2.31.0
azure-core==1.28.0
azure-identity==1.13.0
azure-storage-blob==12.17.0
azure-storage-file-datalake==12.12.0
azure-keyvault-secrets==4.7.0
Project Directory Structure
Based on the documentation, requirements.txt should be included under the root directory and list all the necessary libraries. All other detailed information can be found in the documentation as well.
<project_root>/
| - .venv/
| - .vscode/
| - my_first_function/
| | - __init__.py
| | - function.json
| | - example.py
| - my_second_function/
| | - __init__.py
| | - function.json
| - shared_code/
| | - __init__.py
| | - my_first_helper_function.py
| | - my_second_helper_function.py
| - tests/
| | - test_my_second_function.py
| - .funcignore
| - host.json
| - local.settings.json
| - requirements.txt
| - Dockerfile
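For orientation, below is a hedged sketch of what my_first_function could contain, following the v1 programming model implied by the function.json files in the tree above; the binding settings and the fetch_date parameter are illustrative assumptions rather than the exact project content. A minimal function.json for an HTTP trigger:
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "authLevel": "function",
            "type": "httpTrigger",
            "direction": "in",
            "name": "req",
            "methods": ["get", "post"]
        },
        {
            "type": "http",
            "direction": "out",
            "name": "$return"
        }
    ]
}
And a matching __init__.py sketch:
import logging
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # fetch_date is a hypothetical parameter; try the query string first,
    # then fall back to the JSON body (used by the POST scenario later on)
    fetch_date = req.params.get("fetch_date")
    if not fetch_date:
        try:
            fetch_date = req.get_json().get("fetch_date")
        except ValueError:
            pass
    if not fetch_date:
        return func.HttpResponse("Please pass a fetch_date.", status_code=400)
    logging.info("Fetching MLB data for %s", fetch_date)
    return func.HttpResponse(f"Triggered fetch for {fetch_date}.", status_code=200)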
Access Credentials
What I really want to focus on in this post is reading data from and writing data to Azure Data Lake Storage Gen2. Whenever deploying an application to a cloud service like Azure or AWS, role-based or identity-based access control is the most commonly used permission mechanism. A more intuitive approach would be assigning a role to the Function App's managed identity for accessing the Storage Account directly. However, in some use cases we may need to store secrets in a Key Vault for centralized management, and our app may need a secret for calling an external API, as in the scenario we have here (calling the SportsRadar API). Therefore, in the following sections, we are going to demonstrate how to set up a managed identity for the Azure Function App and assign it a role that allows reading secret contents from the Key Vault, so that the Function App can obtain the access key to the storage account for the purpose of manipulating data in ADLS.
Turn on managed identity
Go to your Azure Function App; under the Settings tab on the left side, click on Identity.
Enable the system-assigned managed identity by turning the status On.
Then, go to your Key Vault and enter all the Secrets that you need.
Role Assignment
Then go to the Access Control (IAM) tab of your Key Vault and select Role assignments. Click Add, then select Add role assignment.
Select Key Vault Secrets User; notice that the permission model of the Key Vault must be set to Azure role-based access control for this role to take effect. Then click Next.
Select Managed identity, then click Select members; a pane will pop out from the right side. Pick your subscription, choose Function App in the Managed identity section, select your Function App, and click Assign.
All set; the Function App is now granted access to the Key Vault.
Accessing the Key Vault in Python
Now, in the Python script, you can use the DefaultAzureCredential class to create a credential object and authenticate a SecretClient with it. You can then use this SecretClient object to retrieve secret contents from the Key Vault.
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

KVUri = "xxxxxxxx"  # the vault URI, e.g. https://<vault-name>.vault.azure.net
name = "xxxxxxxx"   # the name of the secret stored in the Key Vault

# DefaultAzureCredential picks up the managed identity when running on Azure
credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)

# get_secret returns a KeyVaultSecret; the secret string itself is in .value
api_access_key = client.get_secret(name).value
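With the secret in hand, a sketch of writing a JSON file into an ADLS Gen2 container could look like the following; the storage account name, the bronze container, and the file path are illustrative assumptions.
from azure.storage.filedatalake import DataLakeServiceClient

account_name = "xxxxxxxx"  # hypothetical storage account name
service_client = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=api_access_key,  # the storage access key retrieved above
)

# Upload the raw JSON into a hypothetical "bronze" container
file_system_client = service_client.get_file_system_client("bronze")
file_client = file_system_client.get_file_client("mlb/2023-07-01.json")
file_client.upload_data('{"example": "payload"}', overwrite=True)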
Issues when triggering the Function App from Synapse or Data Factory
Since the function needs to take a date as input before sending the HTTP request that fetches the MLB play-by-play data, we need to pass the date as a dynamic parameter. However, when developing a pipeline activity with Azure Function and choosing the HTTP GET method, there is no field for users to pass dynamic parameters, so we have to use the HTTP POST method, which commonly takes a JSON body or form data as the input payload. Forming a dynamic body is not that intuitive in Azure Synapse or Data Factory; after spending quite some time going through this document, I found out that when using a dynamic parameter as a JSON value, the format should be as follows:
{
    "fetch_date": "@{formatDateTime(pipeline().parameters.fetch_date,'yyyy-MM-dd')}"
}
The above expression takes the input date parameter, converts it into a string in 'yyyy-MM-dd' format, and uses it as the value of the key fetch_date. Notice that curly brackets wrap the expression inside the string value; compared with plain JSON, this @{...} interpolation syntax is not very intuitive.
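For instance, if the pipeline parameter fetch_date were 2023-07-01T00:00:00Z (a hypothetical value), the body actually sent at runtime would render as plain JSON:
{
    "fetch_date": "2023-07-01"
}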