# Azure Blob Storage

> Configure Azure Blob Storage as the backing object storage for Firebolt Operator engines.

Every `FireboltEngine` requires object storage for managed tablet data. The Firebolt Operator does not support local filesystem storage mode, so an engine does not start until you point it at object storage. On a production cluster running on Azure, that backing store is an Azure Blob Storage container.

With Azure Blob Storage as the backing store for table data, durability does not depend on the per-pod data volumes mounted to each engine node. Even a complete loss of those volumes does not cause data loss, because the authoritative copy of managed table data lives in object storage.

You configure Azure Blob Storage on the engine through `spec.customEngineConfig.storage`. The engine reads Azure credentials from the pod's Azure identity, which you provide with [Microsoft Entra Workload ID](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview).

## Prerequisites

Before you begin, make sure that the following are in place:

* A [Kubernetes](https://kubernetes.io/) cluster (v1.28+) running on Azure Kubernetes Service (AKS) with workload identity and the OIDC issuer enabled.
* The Firebolt Operator installed in the cluster. See [Installation](../../installation).
* A `FireboltInstance` in the `Ready` phase. See the [Quickstart](../../quickstart).
* `kubectl` command-line tool configured to access your cluster.
* `helm` (v3+) installed on your local machine.
* `az` command-line tool configured for your subscription.
* An Azure subscription with permissions to create storage accounts, containers, and managed identities.
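
As a quick pre-flight check, the sketch below reports which of the command-line tools listed above are missing from your `PATH`. The helper name `missing_tools` is hypothetical and only checks that each binary exists; it does not verify versions, cluster access, or Azure permissions.

```bash
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools kubectl helm az)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```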

## Use Azure Blob Storage

The following examples use a storage account named `fireboltenginedemo` with a container named `firebolt-engine-demo-data`, but you can choose other names. Storage account names must be globally unique, 3 to 24 characters long, and contain only lowercase letters and numbers.
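
Because an invalid name only fails once `az storage account create` runs, it can help to check the format locally first. The following is a minimal sketch; the function name `is_valid_storage_account_name` is hypothetical, and it checks only the format (lowercase letters and digits, 3 to 24 characters), not global uniqueness.

```bash
# Return 0 if the name has a valid storage account format:
# 3 to 24 characters, lowercase ASCII letters and digits only.
is_valid_storage_account_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

for candidate in fireboltenginedemo Firebolt-Engine-Demo; do
  if is_valid_storage_account_name "$candidate"; then
    echo "${candidate}: format ok"
  else
    echo "${candidate}: invalid format"
  fi
done
```

To check whether a well-formed name is still available, `az storage account check-name --name <name>` queries Azure directly.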

### Create a storage account and container

```bash theme={null}
export RESOURCE_GROUP=firebolt-demo
export LOCATION=eastus
export STORAGE_ACCOUNT=fireboltenginedemo
export CONTAINER_NAME=firebolt-engine-demo-data

az storage account create \
  --name "${STORAGE_ACCOUNT}" \
  --resource-group "${RESOURCE_GROUP}" \
  --location "${LOCATION}" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --allow-blob-public-access false

az storage container create \
  --name "${CONTAINER_NAME}" \
  --account-name "${STORAGE_ACCOUNT}" \
  --auth-mode login
```

### Create a managed identity and grant container access

Create a user-assigned managed identity for the engine and assign it the `Storage Blob Data Contributor` role so that it can read, write, and delete blobs. The commands below scope the role to the storage account, so it covers every container in that account, including the one you just created. Then use [Microsoft Entra Workload ID](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview) to federate this identity with the Kubernetes ServiceAccount that the engine pods run as. The next step shows how to attach that ServiceAccount to the engine.

```bash theme={null}
export IDENTITY_NAME=firebolt-engine
export AKS_CLUSTER=my-aks-cluster
export K8S_NAMESPACE=firebolt
export K8S_SA=my-engine

az identity create \
  --name "${IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}"

export IDENTITY_CLIENT_ID=$(az identity show \
  --name "${IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --query clientId -o tsv)

export STORAGE_ACCOUNT_ID=$(az storage account show \
  --name "${STORAGE_ACCOUNT}" \
  --resource-group "${RESOURCE_GROUP}" \
  --query id -o tsv)

az role assignment create \
  --assignee "${IDENTITY_CLIENT_ID}" \
  --role "Storage Blob Data Contributor" \
  --scope "${STORAGE_ACCOUNT_ID}"

# Federate the managed identity with the Kubernetes ServiceAccount.
export OIDC_ISSUER=$(az aks show \
  --name "${AKS_CLUSTER}" \
  --resource-group "${RESOURCE_GROUP}" \
  --query oidcIssuerProfile.issuerUrl -o tsv)

az identity federated-credential create \
  --name firebolt-engine \
  --identity-name "${IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --issuer "${OIDC_ISSUER}" \
  --subject "system:serviceaccount:${K8S_NAMESPACE}:${K8S_SA}" \
  --audience api://AzureADTokenExchange
```
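
The federated credential only works if its subject matches the namespace and name of the ServiceAccount exactly. The sketch below builds the expected subject string so you can compare it with what is stored on the identity. The helper names `federated_subject` and `subject_matches` are hypothetical; the commented `az` call assumes the variables exported in the previous block.

```bash
# Build the subject claim for a ServiceAccount: system:serviceaccount:<ns>:<name>.
federated_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Return 0 if the subject in $3 matches namespace $1 and ServiceAccount $2.
subject_matches() {
  [ "$(federated_subject "$1" "$2")" = "$3" ]
}

expected=$(federated_subject "${K8S_NAMESPACE:-firebolt}" "${K8S_SA:-my-engine}")
echo "Expected subject: ${expected}"

# Against a live identity (requires az and the variables from the step above):
#   actual=$(az identity federated-credential show \
#     --name firebolt-engine \
#     --identity-name "${IDENTITY_NAME}" \
#     --resource-group "${RESOURCE_GROUP}" \
#     --query subject -o tsv)
#   subject_matches "${K8S_NAMESPACE}" "${K8S_SA}" "${actual}" && echo "subject OK"
```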

### Configure the engine to use Azure Blob Storage

Point the engine at the container through `spec.customEngineConfig.storage` and run its pods under the ServiceAccount that carries the Azure identity.

The engine merges `customEngineConfig` into its rendered configuration, so the `storage` block sets the storage backend (`type`), the storage scheme (`api_scheme`), and the container name (`bucket_name`). For Azure Blob Storage, set `type` to `abs`. The default scheme for `abs` is `azure://`.

The following manifest creates the ServiceAccount and a `FireboltEngine` that references it. The engine runs the operator's default image, so no `FireboltEngineClass` is required. Replace the managed identity client ID and container name with your values, and reference an existing `Ready` instance through `instanceRef`. The `azure.workload.identity/use` pod label is required for Workload ID to inject credentials.

```yaml theme={null}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-engine
  namespace: firebolt
  annotations:
    # Bind the managed identity from the previous step (Workload ID).
    azure.workload.identity/client-id: <managed-identity-client-id>
---
apiVersion: compute.firebolt.io/v1alpha1
kind: FireboltEngine
metadata:
  name: my-engine
  namespace: firebolt
spec:
  instanceRef: quickstart
  serviceAccountName: my-engine
  replicas: 2
  customEngineConfig:
    storage:
      type: abs
      api_scheme: "azure://"
      bucket_name: firebolt-engine-demo-data
      azure:
        storage_account_name: fireboltenginedemo
  template:
    metadata:
      # Required: Workload ID only injects credentials into labeled pods.
      labels:
        azure.workload.identity/use: "true"
    spec:
      containers:
        - name: engine
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
```

Save the manifest as `engine-abs.yaml`, then apply it:

```bash theme={null}
kubectl apply -f engine-abs.yaml
```

The engine resolves Azure credentials from the pod's Azure identity, which Workload ID provides. On AKS, the Workload ID admission webhook injects a federated token and the matching `AZURE_*` environment variables into pods that carry the `azure.workload.identity/use: "true"` label and run under a ServiceAccount annotated with `azure.workload.identity/client-id`.
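
One way to confirm that injection happened is to look for the environment variables that Workload ID sets on the pod. The sketch below reads `env` output on standard input and prints any expected variable that is absent. The function name `missing_workload_identity_vars` is hypothetical, and the commented `kubectl exec` call assumes that the engine image ships an `env` binary, which this page does not confirm.

```bash
# Read `env` output on stdin and print each Workload ID variable that is absent.
missing_workload_identity_vars() {
  local env_dump var
  env_dump=$(cat)
  for var in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE; do
    printf '%s\n' "$env_dump" | grep -q "^${var}=" || printf '%s\n' "$var"
  done
}

# Demo with sample input in which the token file variable is missing.
sample='AZURE_CLIENT_ID=00000000-0000-0000-0000-000000000000
AZURE_TENANT_ID=11111111-1111-1111-1111-111111111111'
printf '%s\n' "$sample" | missing_workload_identity_vars
# prints AZURE_FEDERATED_TOKEN_FILE

# Against a running pod (replace <engine-pod> with an engine pod name):
#   kubectl exec <engine-pod> -n firebolt -- env | missing_workload_identity_vars
```

Empty output means all three variables are present. Any name printed usually points to a missing pod label or ServiceAccount annotation.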

For the full set of engine fields, including `customEngineConfig` and `serviceAccountName`, see the [FireboltEngine CRD reference](../../crd-reference/engine-crd-reference).

### Confirm that object storage is working

To confirm that managed storage works, create a table and check that new blobs appear in your container. Engine pods follow the name pattern `<engine>-g<generation>-<index>`, so the first pod of generation `0` for `my-engine` is `my-engine-g0-0`.

```bash theme={null}
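# Run the port-forward in the background so the commands below can use the same shell.
# Stop it afterwards with: kill "${PORT_FORWARD_PID}"
kubectl port-forward pod/my-engine-g0-0 3473:3473 -n firebolt &
PORT_FORWARD_PID=$!
sleep 2  # give the tunnel a moment to come up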

curl -s "http://localhost:3473" --data-binary "create table test (val int);"
curl -s "http://localhost:3473" --data-binary "insert into test values (1);"

az storage blob list \
  --container-name firebolt-engine-demo-data \
  --account-name fireboltenginedemo \
  --auth-mode login \
  --output table
```

If the queries hang or return errors, check the engine pod logs for Azure access-denied errors:

```bash theme={null}
kubectl logs my-engine-g0-0 -n firebolt
```
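
With more than one replica, you may want to check every pod rather than only the first. The sketch below expands the `<engine>-g<generation>-<index>` pattern into a list of pod names. The helper names are hypothetical, and the loop assumes that indexes are contiguous and start at `0`, which this page confirms only for the first pod.

```bash
# Print the pod name for an engine, generation, and replica index.
engine_pod_name() {
  printf '%s-g%s-%s\n' "$1" "$2" "$3"
}

# Print one pod name per replica for a given engine and generation.
engine_pod_names() {
  local engine="$1" generation="$2" replicas="$3" index=0
  while [ "$index" -lt "$replicas" ]; do
    engine_pod_name "$engine" "$generation" "$index"
    index=$((index + 1))
  done
}

engine_pod_names my-engine 0 2

# Tail the logs of every replica (requires kubectl and cluster access):
#   for pod in $(engine_pod_names my-engine 0 2); do
#     kubectl logs "$pod" -n firebolt --tail=50
#   done
```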

## Restrict external access with an intermediary service principal

The container you configure under `storage` holds an engine's managed tablet data. The engine reaches it with the engine pod's own Azure identity. Queries that read from or write to *external* locations, such as external tables that point at a different container, follow a separate credential path.

By default, external access uses the engine pod's own identity. That identity is tied to the engine deployment, so it is not a convenient identity for the owner of an external container to reference when they grant access.

An intermediary service principal gives external access a stable identity instead. When you configure one, the engine uses the intermediary service principal for external access, rather than its own pod identity. Because the service principal is stable and known ahead of time, you can share it with third parties and reference it in container role assignments, including on Azure subscriptions outside your own organization.

### How the credential chain works

The engine selects the external credential path based on what you configure:

* **Intermediary service principal set.** The engine uses the intermediary service principal for external access. The service principal is the stable identity you grant access to external data.
* **Intermediary service principal not set.** The engine uses its own pod identity for external access.

Access to the managed `storage` container always uses the engine pod's own identity. The intermediary service principal applies only to external locations.
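
The selection rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior, not the engine's implementation; the function name `select_identity` and its return strings are hypothetical.

```bash
# Print which identity serves a request.
#   $1: "managed" (the storage container) or "external" (any other location)
#   $2: intermediary service principal client ID, empty if not configured
select_identity() {
  local target="$1" intermediary="$2"
  if [ "$target" = "external" ] && [ -n "$intermediary" ]; then
    echo "intermediary-service-principal"
  else
    echo "pod-identity"
  fi
}

select_identity managed  "35f11db5-082b-46e8-9f2f-5466d8630003"
select_identity external "35f11db5-082b-46e8-9f2f-5466d8630003"
select_identity external ""
```

The first and third calls resolve to the pod identity. Only external access with a configured service principal resolves to the intermediary.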

### Configure the intermediary service principal

Create the intermediary service principal and grant it the permissions required to reach the external data. Provide its application (client) ID to the engine.

Set the intermediary service principal client ID under `storage.azure.intermediary_service_principal_client_id`:

```yaml theme={null}
apiVersion: compute.firebolt.io/v1alpha1
kind: FireboltEngine
metadata:
  name: my-engine
  namespace: firebolt
spec:
  instanceRef: quickstart
  serviceAccountName: my-engine
  replicas: 2
  customEngineConfig:
    storage:
      type: abs
      api_scheme: "azure://"
      bucket_name: firebolt-engine-demo-data
      azure:
        storage_account_name: fireboltenginedemo
        intermediary_service_principal_client_id: 35f11db5-082b-46e8-9f2f-5466d8630003
```

The `storage.azure` block is valid when `type` is `abs` or `azurite`.
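
Before applying a manifest, you can sanity-check the values that go into the `storage.azure` block. The sketch below is a local check only: the function names are hypothetical, and the GUID test relies on the fact that Microsoft Entra application (client) IDs are GUIDs rather than on any validation rule documented for the operator.

```bash
# Return 0 if the argument looks like a GUID (8-4-4-4-12 hex digits).
is_guid() {
  printf '%s' "$1" | LC_ALL=C grep -Eiq \
    '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Check a storage type and an optional intermediary client ID.
check_azure_block() {
  local type="$1" client_id="$2"
  case "$type" in
    abs|azurite) ;;
    *) echo "storage.azure is not valid for type '${type}'"; return 1 ;;
  esac
  if [ -n "$client_id" ] && ! is_guid "$client_id"; then
    echo "client ID is not a GUID: ${client_id}"
    return 1
  fi
  echo "ok"
}

check_azure_block abs 35f11db5-082b-46e8-9f2f-5466d8630003
check_azure_block s3 "" || true
```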
