fabric-performance-monitoring

```shell
npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-performance-monitoring --openclaw
```

Microsoft Fabric Performance Monitoring

Toolkit for monitoring, diagnosing, and optimizing Microsoft Fabric capacity and workload performance across Spark, Data Warehouse, Lakehouse, and Pipeline workloads.

When to Use This Skill

  • Checking Fabric capacity utilization or CU consumption
  • Diagnosing throttling errors (HTTP 430 / TooManyRequestsForCapacity)
  • Monitoring Spark VCore usage and concurrency limits
  • Querying Fabric REST APIs for capacity and workspace health
  • Generating capacity performance reports
  • Tuning Spark resource profiles (readHeavy, writeHeavy, balanced)
  • Investigating job failures in the Monitoring Hub
  • Analyzing autoscale billing vs capacity-based billing
  • Reviewing background vs interactive operation patterns
  • Planning capacity SKU sizing or rightsizing

Prerequisites

  • PowerShell 7+ with Az.Fabric module installed
  • Microsoft Entra ID app registration with Fabric API permissions
  • Fabric Capacity Admin or Workspace Admin role
  • Fabric Capacity Metrics app installed (for visual monitoring)

Core Concepts

Capacity Units and Spark VCores

One Capacity Unit (CU) equals two Apache Spark VCores. Fabric capacity is shared across all workspaces assigned to it, and Spark VCores are shared among notebooks, Spark job definitions, and lakehouses within those workspaces.

Operation Types

Fabric classifies operations as interactive (on-demand, like DAX queries) or background (scheduled, like refreshes and Spark jobs). Background operations are smoothed over a 24-hour period. All Spark operations are background operations.
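The 24-hour smoothing above is simple arithmetic: a background operation's total CU consumption is divided evenly across the smoothing window, so a large job raises smoothed utilization only a little in any single hour. A minimal sketch:

```python
def smoothed_cu_per_hour(total_cu_hours: float, window_hours: int = 24) -> float:
    """Spread a background operation's total CU consumption evenly across
    the smoothing window (24 hours for Fabric background operations)."""
    return total_cu_hours / window_hours

# e.g. a Spark job that consumed 8 CU-hours contributes 8 / 24 (about 0.33 CU)
# to smoothed utilization in each of the following 24 hours
```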

Throttling Behavior

When capacity is fully utilized, new Spark jobs receive HTTP 430 with TooManyRequestsForCapacity. With queueing enabled, pipeline-triggered and scheduled jobs enter a FIFO queue and retry automatically when capacity becomes available.
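For ad-hoc submissions that don't go through the built-in queue, a client-side retry with exponential backoff mirrors the same behavior. This is a generic sketch, not the Fabric SDK: `submit` stands in for your actual submission call, assumed here to raise an error whose message contains "430" while the capacity is saturated.

```python
import time

def submit_with_retry(submit, max_attempts: int = 5, base_delay: float = 2.0):
    """Retry a job submission that fails with capacity throttling (HTTP 430,
    TooManyRequestsForCapacity), backing off exponentially between attempts.
    `submit` is any callable; non-throttling errors are re-raised immediately."""
    for attempt in range(max_attempts):
        try:
            return submit()
        except RuntimeError as exc:
            if "430" not in str(exc) or attempt == max_attempts - 1:
                raise
            # 2s, 4s, 8s, ... before the next attempt
            time.sleep(base_delay * (2 ** attempt))
```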

Capacity SKU Limits

| SKU  | Spark VCores | Queue Limit |
|------|--------------|-------------|
| F2   | 4            | 4           |
| F4   | 8            | 4           |
| F8   | 16           | 8           |
| F16  | 32           | 16          |
| F32  | 64           | 32          |
| F64  | 128          | 64          |
| F128 | 256          | 128         |
| F256 | 512          | 256         |
| F512 | 1024         | 512         |
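For rightsizing exercises, the table can be encoded as a small lookup. The numbers below are copied directly from the table above (base VCores, before any bursting); `min_sku_for_vcores` is a convenience helper, not part of any Fabric API.

```python
# (spark_vcores, queue_limit) per F-SKU, taken from the table above
SKU_LIMITS = {
    "F2": (4, 4), "F4": (8, 4), "F8": (16, 8), "F16": (32, 16),
    "F32": (64, 32), "F64": (128, 64), "F128": (256, 128),
    "F256": (512, 256), "F512": (1024, 512),
}

def spark_limits(sku: str) -> tuple[int, int]:
    """Return (spark_vcores, queue_limit) for a Fabric F-SKU."""
    return SKU_LIMITS[sku]

def min_sku_for_vcores(required_vcores: int) -> str:
    """Smallest SKU whose base Spark VCore pool covers the requirement."""
    for sku, (vcores, _) in SKU_LIMITS.items():
        if vcores >= required_vcores:
            return sku
    raise ValueError(f"no single SKU provides {required_vcores} VCores")
```

For example, a workload that peaks at 40 concurrent Spark VCores does not fit an F16 (32 VCores) and would be sized at F32.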

Spark Resource Profiles

Fabric supports predefined Spark resource profiles for workload optimization. New workspaces default to writeHeavy. Available profiles: readHeavy, writeHeavy, balanced. When writeHeavy is used, VOrder is disabled by default and must be manually enabled.
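In a notebook, switching profiles and re-enabling VOrder comes down to a couple of Spark session settings. The property names below are assumptions for illustration; verify them against the current Fabric documentation for your runtime before relying on them.

```python
# Inside a Fabric notebook (PySpark session already available as `spark`).
# Property names are assumptions -- check current Fabric docs.
spark.conf.set("spark.fabric.resourceProfile", "readHeavy")   # switch away from the writeHeavy default
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")    # re-enable VOrder, which writeHeavy disables
```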

Step-by-Step Workflows

Workflow 1: Capacity Health Check

Run the capacity health check script to retrieve current capacity status, SKU details, and state.

```powershell
./scripts/Get-FabricCapacityHealth.ps1 -SubscriptionId "<sub-id>" -ResourceGroupName "<rg>" -CapacityName "<name>"
```

See capacity-health-reference.md for detailed API response schemas and interpretation guidance.
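If you prefer raw REST over the script, capacity state is exposed through Azure Resource Manager under the `Microsoft.Fabric/capacities` provider. A minimal sketch, assuming you already have an ARM bearer token; the `api-version` value is an assumption, so confirm it against the current REST reference.

```python
import urllib.request

API_VERSION = "2023-11-01"  # assumption: check the current Microsoft.Fabric api-version

def capacity_url(subscription_id: str, resource_group: str, capacity_name: str) -> str:
    """Build the ARM URL for a Fabric capacity resource."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Fabric/capacities/{capacity_name}"
        f"?api-version={API_VERSION}"
    )

def get_capacity(token: str, url: str) -> bytes:
    """Fetch the capacity resource; the JSON response carries the SKU and state."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```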

Workflow 2: Spark Concurrency Analysis

Run the Spark concurrency analyzer to check active sessions, queued jobs, and throttling status.

```powershell
./scripts/Get-FabricSparkConcurrency.ps1 -WorkspaceId "<workspace-id>"
```

Workflow 3: Monitoring Hub Job Audit

Run the job audit script to retrieve recent job executions, durations, and failure details.

```powershell
./scripts/Get-FabricJobHistory.ps1 -WorkspaceId "<workspace-id>" -HoursBack 24
```
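Once job instances are retrieved, the audit reduces to filtering by window and counting outcomes. A sketch of that post-processing; the record keys (`status`, `startTimeUtc`, `id`) are assumptions and should be matched to the actual payload your script or API call returns.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def summarize_jobs(jobs: list[dict], hours_back: int = 24) -> dict:
    """Summarize job outcomes within the last `hours_back` hours.
    Each record is expected to carry 'status' and an ISO-8601 'startTimeUtc'
    (field names are assumptions -- adjust to your payload)."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours_back)
    recent = [j for j in jobs if datetime.fromisoformat(j["startTimeUtc"]) >= cutoff]
    return {
        "total": len(recent),
        "by_status": dict(Counter(j["status"] for j in recent)),
        "failed": [j.get("id") for j in recent if j["status"] == "Failed"],
    }
```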

Workflow 4: Generate Performance Report

Use the performance report template to query the SQL analytics endpoint for Lakehouse operation metrics, then generate a summary with the report generator.
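The summary step can be sketched as a plain aggregation over the rows the metrics query returns: group by item, total the CU cost, and rank the heaviest consumers. Column names here (`item`, `cu_seconds`, `duration_s`) are hypothetical placeholders for whatever your SQL analytics endpoint query actually projects.

```python
from collections import defaultdict

def build_report(rows: list[dict]) -> list[dict]:
    """Aggregate per-operation rows into per-item totals, sorted by CU cost
    (heaviest first). Row keys are placeholders -- map them to your query's
    actual column names."""
    totals = defaultdict(lambda: {"cu_seconds": 0.0, "runs": 0, "duration_s": 0.0})
    for r in rows:
        t = totals[r["item"]]
        t["cu_seconds"] += r["cu_seconds"]
        t["runs"] += 1
        t["duration_s"] += r["duration_s"]
    return sorted(
        ({"item": k, **v} for k, v in totals.items()),
        key=lambda x: x["cu_seconds"], reverse=True,
    )
```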

Workflow 5: Autoscale vs Capacity Cost Analysis

See cost-analysis-reference.md for guidance on comparing autoscale billing vs capacity-based models using Azure Cost Management.

Troubleshooting

| Symptom | Likely Cause | Resolution |
|---------|--------------|------------|
| HTTP 430 errors | Capacity fully utilized | Scale SKU, cancel idle sessions, enable queueing |
| Jobs stuck in queue | All VCores consumed | Check Monitoring Hub, stop idle notebooks |
| Slow Spark startup | Custom pool with cold start | Switch to starter pool for quick sessions |
| High CU consumption | Inefficient queries or unoptimized code | Review Capacity Metrics app, optimize DAX/Spark |
| Unexpected autoscale charges | Spark jobs billed independently | Check Azure Cost Analysis with the Autoscale meter |
| VOrder disabled | writeHeavy profile active | Manually enable VOrder if read performance is needed |


Source

```shell
git clone https://github.com/PatrickGallucci/fabric-skills
```

The skill file lives at skills/fabric-performance-monitoring/SKILL.md in the cloned repository.

Overview

Microsoft Fabric Performance Monitoring provides tooling to monitor, diagnose, and optimize capacity and workload performance across Spark, Data Warehouse, Lakehouse, and Pipeline workloads. It helps you detect throttling, track VCore usage, review Monitoring Hub jobs, and plan capacity SKU sizing.

How This Skill Works

The skill uses PowerShell (Az.Fabric), REST APIs, and T-SQL workflows to collect capacity metrics, retrieve Spark VCore consumption, and analyze CU usage. It provides ready-to-run scripts for capacity health checks, concurrency analysis, and job audits, plus guidance to tune Spark resource profiles (readHeavy, writeHeavy, balanced).


Quick Start

  1. Install prerequisites (PowerShell 7+, Az.Fabric module, Entra ID app with Fabric API permissions, and appropriate Fabric roles)
  2. Run the capacity health check script to fetch current capacity status
  3. Analyze results, adjust Spark resource profiles or SKU sizing, then re-run health checks

Best Practices

  • Run the capacity health check script Get-FabricCapacityHealth.ps1 to verify current state
  • Tune Spark resource profiles (readHeavy, writeHeavy, balanced) based on workload; start with writeHeavy as the default
  • Enable queueing for throttling scenarios; understand HTTP 430 and retry behavior
  • Regularly compare autoscale billing vs capacity-based billing to optimize cost
  • Review Monitoring Hub job results to identify failures and optimization opportunities

Example Use Cases

  • Diagnose an HTTP 430 throttling event during a data refresh and reallocate capacity
  • Size a new Fabric SKU (e.g., from F8 to F16) after observing peak Spark VCore usage
  • Tune a workspace from readHeavy to balanced to reduce VCore contention
  • Audit Monitoring Hub jobs to identify recurring failures and adjust scheduling
  • Generate a quarterly capacity performance report for governance reviews

