# fabric-performance-monitoring

`npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-performance-monitoring --openclaw`

## Microsoft Fabric Performance Monitoring
Toolkit for monitoring, diagnosing, and optimizing Microsoft Fabric capacity and workload performance across Spark, Data Warehouse, Lakehouse, and Pipeline workloads.
## When to Use This Skill
- Checking Fabric capacity utilization or CU consumption
- Diagnosing throttling errors (HTTP 430 / TooManyRequestsForCapacity)
- Monitoring Spark VCore usage and concurrency limits
- Querying Fabric REST APIs for capacity and workspace health
- Generating capacity performance reports
- Tuning Spark resource profiles (readHeavy, writeHeavy, balanced)
- Investigating job failures in the Monitoring Hub
- Analyzing autoscale billing vs capacity-based billing
- Reviewing background vs interactive operation patterns
- Planning capacity SKU sizing or rightsizing
## Prerequisites
- PowerShell 7+ with Az.Fabric module installed
- Microsoft Entra ID app registration with Fabric API permissions
- Fabric Capacity Admin or Workspace Admin role
- Fabric Capacity Metrics app installed (for visual monitoring)
## Core Concepts

### Capacity Units and Spark VCores
One Capacity Unit (CU) equals two Apache Spark VCores. Fabric capacity is shared across all workspaces assigned to it, and Spark VCores are shared among notebooks, Spark job definitions, and lakehouses within those workspaces.
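The CU-to-VCore relationship can be expressed directly. A minimal sketch (the SKU example follows the capacity table later in this document):

```python
def spark_vcores(capacity_units: int) -> int:
    """One Fabric Capacity Unit (CU) equals two Apache Spark VCores."""
    return capacity_units * 2

# An F64 SKU provides 64 CUs, i.e. 128 Spark VCores shared by every
# workspace assigned to that capacity.
print(spark_vcores(64))
```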
### Operation Types
Fabric classifies operations as interactive (on-demand, like DAX queries) or background (scheduled, like refreshes and Spark jobs). Background operations are smoothed over a 24-hour period. All Spark operations are background operations.
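Smoothing means a background operation's total CU cost is spread evenly across the smoothing window rather than hitting the capacity all at once. A minimal sketch of that arithmetic:

```python
def smoothed_cu_rate(total_cu_seconds: float, window_hours: float = 24.0) -> float:
    """Background operations are smoothed: their total CU cost is spread
    evenly over the smoothing window (24 hours for background work)."""
    return total_cu_seconds / (window_hours * 3600)

# A Spark job that consumes 172,800 CU-seconds adds only 2 CU/s of
# smoothed load over the 24-hour window.
print(smoothed_cu_rate(172_800))
```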
### Throttling Behavior
When capacity is fully utilized, new Spark jobs receive HTTP 430 with TooManyRequestsForCapacity. With queueing enabled, pipeline-triggered and scheduled jobs enter a FIFO queue and retry automatically when capacity becomes available.
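The admit/queue/reject decision can be illustrated with a toy FIFO model. This is a sketch of the behaviour described above, not Fabric's actual scheduler; the class and method names are invented for illustration:

```python
from collections import deque

HTTP_TOO_MANY_REQUESTS_FOR_CAPACITY = 430  # Fabric's capacity-throttling status

class SparkJobQueue:
    """Toy model: run a job if VCores are free, queue it (FIFO) while the
    queue has room, otherwise reject with the throttling status code."""

    def __init__(self, vcore_limit: int, queue_limit: int):
        self.vcore_limit = vcore_limit
        self.queue_limit = queue_limit
        self.running: list[tuple[str, int]] = []
        self.queued: deque[tuple[str, int]] = deque()

    def submit(self, job: str, vcores: int) -> str:
        used = sum(v for _, v in self.running)
        if used + vcores <= self.vcore_limit:
            self.running.append((job, vcores))
            return "running"
        if len(self.queued) < self.queue_limit:
            self.queued.append((job, vcores))
            return "queued"
        return f"rejected (HTTP {HTTP_TOO_MANY_REQUESTS_FOR_CAPACITY})"
```

With F2-sized limits (4 VCores, queue depth 4), the first 4-VCore job runs, the next four queue, and the sixth is rejected.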
### Capacity SKU Limits
| SKU | Spark VCores | Queue Limit |
|---|---|---|
| F2 | 4 | 4 |
| F4 | 8 | 4 |
| F8 | 16 | 8 |
| F16 | 32 | 16 |
| F32 | 64 | 32 |
| F64 | 128 | 64 |
| F128 | 256 | 128 |
| F256 | 512 | 256 |
| F512 | 1024 | 512 |
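For SKU sizing or rightsizing, the table above can be turned into a small lookup helper. A sketch (the function name is invented; the numbers mirror the table):

```python
# (SKU, Spark VCores) pairs from the capacity SKU table above.
SKU_VCORES = [("F2", 4), ("F4", 8), ("F8", 16), ("F16", 32), ("F32", 64),
              ("F64", 128), ("F128", 256), ("F256", 512), ("F512", 1024)]

def recommend_sku(peak_vcores: int) -> str:
    """Return the smallest SKU whose Spark VCore allocation covers the
    observed peak concurrent VCore demand."""
    for sku, vcores in SKU_VCORES:
        if vcores >= peak_vcores:
            return sku
    raise ValueError("peak exceeds the largest listed SKU")
```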
### Spark Resource Profiles
Fabric supports predefined Spark resource profiles for workload optimization. New workspaces default to writeHeavy. Available profiles: readHeavy, writeHeavy, balanced. When writeHeavy is used, VOrder is disabled by default and must be manually enabled.
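As a sketch of applying a profile programmatically, the helper below builds a Spark session config for one of the profiles named above. The property names `spark.fabric.resourceProfile` and `spark.sql.parquet.vorder.enabled` are assumptions; verify them against the Fabric documentation for your runtime version.

```python
# Profile names as used in this skill; confirm the exact identifiers
# against the Fabric docs for your runtime version.
RESOURCE_PROFILES = {"readHeavy", "writeHeavy", "balanced"}

def spark_profile_conf(profile: str) -> dict:
    """Build session config for a resource profile. Property names here
    are assumptions -- verify before use."""
    if profile not in RESOURCE_PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    conf = {"spark.fabric.resourceProfile": profile}
    if profile == "writeHeavy":
        # VOrder is disabled by default under writeHeavy; enable it
        # explicitly if read performance on the written tables matters.
        conf["spark.sql.parquet.vorder.enabled"] = "true"
    return conf
```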
## Step-by-Step Workflows

### Workflow 1: Capacity Health Check
Run the capacity health check script to retrieve current capacity status, SKU details, and state.
`./scripts/Get-FabricCapacityHealth.ps1 -SubscriptionId "<sub-id>" -ResourceGroupName "<rg>" -CapacityName "<name>"`
See capacity-health-reference.md for detailed API response schemas and interpretation guidance.
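The same capacity resource can be read over the ARM REST API from other languages. A hedged Python sketch, assuming the `Microsoft.Fabric/capacities` ARM resource type; the `API_VERSION` value and the response fields shown are assumptions to verify against capacity-health-reference.md:

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # assumption -- verify the current ARM API version

def capacity_url(sub_id: str, rg: str, name: str) -> str:
    """ARM endpoint for a Fabric capacity resource (the same resource
    the PowerShell health-check script reads via Az.Fabric)."""
    return ("https://management.azure.com"
            f"/subscriptions/{sub_id}/resourceGroups/{rg}"
            f"/providers/Microsoft.Fabric/capacities/{name}"
            f"?api-version={API_VERSION}")

def get_capacity_health(sub_id: str, rg: str, name: str, token: str) -> dict:
    """Fetch SKU and state; needs an Entra ID bearer token for the
    https://management.azure.com scope."""
    req = urllib.request.Request(
        capacity_url(sub_id, rg, name),
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    # 'state' is typically Active, Paused, or Provisioning.
    return {"sku": body["sku"]["name"], "state": body["properties"]["state"]}
```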
### Workflow 2: Spark Concurrency Analysis
Run the Spark concurrency analyzer to check active sessions, queued jobs, and throttling status.
`./scripts/Get-FabricSparkConcurrency.ps1 -WorkspaceId "<workspace-id>"`
### Workflow 3: Monitoring Hub Job Audit
Run the job audit script to retrieve recent job executions, durations, and failure details.
`./scripts/Get-FabricJobHistory.ps1 -WorkspaceId "<workspace-id>" -HoursBack 24`
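Once job records are retrieved, a small aggregation surfaces the recurring failures. A sketch; the record shape (`itemName`, `status`) is an assumption about the script's output, not a documented schema:

```python
from collections import Counter
from typing import Iterable

def failure_summary(jobs: Iterable[dict]) -> Counter:
    """Count failed runs per item from Monitoring Hub job records.
    Assumed record shape: {'itemName': ..., 'status': ...}."""
    return Counter(j["itemName"] for j in jobs if j["status"] == "Failed")

jobs = [
    {"itemName": "nb_ingest", "status": "Failed"},
    {"itemName": "nb_ingest", "status": "Failed"},
    {"itemName": "pipeline_load", "status": "Completed"},
]
print(failure_summary(jobs).most_common())
```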
### Workflow 4: Generate Performance Report
Use the performance report template to query the SQL analytics endpoint for Lakehouse operation metrics, then generate a summary with the report generator.
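The summarisation step can be sketched as follows, assuming the SQL analytics endpoint query returns a list of operation durations (the function name and report fields are illustrative, not the report generator's actual output):

```python
import statistics

def operation_report(durations_s: list[float]) -> dict:
    """Summarise Lakehouse operation durations into report-ready stats."""
    return {
        "count": len(durations_s),
        "p50_s": statistics.median(durations_s),
        "max_s": max(durations_s),
        "mean_s": round(statistics.fmean(durations_s), 2),
    }

print(operation_report([1, 2, 3, 4, 5]))
```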
### Workflow 5: Autoscale vs Capacity Cost Analysis
See cost-analysis-reference.md for guidance on comparing autoscale billing vs capacity-based models using Azure Cost Management.
## Troubleshooting and Remediation
| Symptom | Likely Cause | Resolution |
|---|---|---|
| HTTP 430 errors | Capacity fully utilized | Scale SKU, cancel idle sessions, enable queueing |
| Jobs stuck in queue | All VCores consumed | Check Monitoring Hub, stop idle notebooks |
| Slow Spark startup | Using custom pool with cold start | Switch to starter pool for quick sessions |
| High CU consumption | Inefficient queries or unoptimized code | Review Capacity Metrics app, optimize DAX/Spark |
| Unexpected autoscale charges | Spark jobs billed independently | Check Azure Cost Analysis with the Autoscale meter |
| VOrder disabled | writeHeavy profile active | Manually enable VOrder if read performance needed |
## References
- Capacity Health Reference - REST API schemas and interpretation
- Cost Analysis Reference - Autoscale vs capacity billing comparison
- Fabric Capacity Metrics App
- Monitor Spark Capacity Consumption
- Fabric REST API Documentation
- Concurrency Limits and Queueing
## Source

https://github.com/PatrickGallucci/fabric-skills/blob/main/skills/fabric-performance-monitoring/SKILL.md

## Overview
Microsoft Fabric Performance Monitoring provides tooling to monitor, diagnose, and optimize capacity and workload performance across Spark, Data Warehouse, Lakehouse, and Pipeline workloads. It helps you detect throttling, track VCore usage, review Monitoring Hub jobs, and plan capacity SKU sizing.
## How This Skill Works
The skill uses PowerShell (Az.Fabric), REST APIs, and T-SQL workflows to collect capacity metrics, retrieve Spark VCore consumption, and analyze CU usage. It provides ready-to-run scripts for capacity health checks, concurrency analysis, and job audits, plus guidance to tune Spark resource profiles (readHeavy, writeHeavy, balanced).
## When to Use It
- Check Fabric capacity utilization or CU consumption
- Diagnose throttling errors (HTTP 430 / TooManyRequestsForCapacity)
- Monitor Spark VCore usage and concurrency limits
- Query Fabric REST APIs for capacity and workspace health
- Generate capacity performance reports and SKU sizing plans
## Quick Start
- Step 1: Install prerequisites (PowerShell 7+, Az.Fabric module, Entra ID app with Fabric API permissions, and appropriate Fabric roles)
- Step 2: Run a capacity health check script to fetch current capacity status
- Step 3: Analyze results, adjust Spark resource profiles or SKU sizing, then re-run health checks
## Best Practices
- Run the capacity health check script `Get-FabricCapacityHealth.ps1` to verify the current state
- Tune Spark resource profiles (readHeavy, writeHeavy, balanced) based on workload; start with writeHeavy as the default
- Enable queueing for throttling scenarios; understand HTTP 430 and retry behavior
- Regularly compare autoscale billing vs capacity-based billing to optimize cost
- Review Monitoring Hub job results to identify failures and optimization opportunities
## Example Use Cases
- Diagnose an HTTP 430 throttling event during a data refresh and reallocate capacity
- Size a new Fabric SKU (e.g., from F8 to F16) after observing peak Spark VCore usage
- Tune a workspace from readHeavy to balanced to reduce VCore contention
- Audit Monitoring Hub jobs to identify recurring failures and adjust scheduling
- Generate a quarterly capacity performance report for governance reviews