fabric-data-factory-perf-remediate
npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-data-factory-perf-remediate --openclaw

Microsoft Fabric Data Factory Performance Remediation
Systematic approach to diagnosing and resolving performance issues in Microsoft Fabric Data Factory pipelines, copy activities, and dataflows.
When to Use This Skill
- Pipeline execution takes longer than expected
- Copy activities are slow or appear stuck
- Activities show "Not Started" status for extended periods
- Capacity throttling errors (HTTP 430, TooManyRequestsForCapacity)
- Throughput is lower than expected for copy operations
- Dataflow Gen2 refresh is slow or timing out
- Pipeline monitoring shows performance degradation over time
- Need to optimize parallelism, DIU, or partitioning settings
Prerequisites
- Access to Microsoft Fabric workspace with Contributor or higher role
- Familiarity with the Fabric Monitoring Hub
- Understanding of Fabric capacity SKUs and their limits
- PowerShell 7+ for running diagnostic scripts
Diagnostic Workflow
Step 1: Identify the Bottleneck Category
Determine which category your issue falls into:
| Category | Symptoms | Start Here |
|---|---|---|
| Copy Activity Slow | Low throughput, long transfer duration | copy-activity-tuning.md |
| Pipeline Stuck | Activity shows In Progress with no movement | pipeline-stuck-resolution.md |
| Capacity Throttling | HTTP 430 errors, jobs queued | capacity-throttling-guide.md |
| Dataflow Slow | Dataflow Gen2 refresh takes too long | dataflow-optimization.md |
| Spark Job Queue | Jobs stuck in "Not Started" status | capacity-throttling-guide.md |
Step 2: Collect Diagnostics
Run the diagnostic script to gather baseline metrics:
```powershell
./scripts/Get-FabricPipelineDiagnostics.ps1 -WorkspaceId "<guid>" -PipelineName "MyPipeline"
```
Or manually collect from the Monitoring Hub:
- Open Fabric portal and navigate to Monitoring Hub
- Filter by pipeline name and time range
- Select the run details (glasses icon) for the slow run
- Capture the Duration Breakdown for copy activities
- Note the queue time, transfer time, and pre/post-copy script duration
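Once you have the Duration Breakdown figures, the first question is which stage dominates: a queue-heavy run points at capacity throttling, while a transfer-heavy run points at copy tuning. A minimal sketch of that triage step (the field names below are illustrative placeholders, not the portal's exact schema):

```python
# Hypothetical duration-breakdown figures captured from the Monitoring Hub;
# the key names are illustrative, not the portal's exact JSON schema.
breakdown = {
    "queue_s": 310,
    "pre_copy_script_s": 12,
    "transfer_s": 95,
    "post_copy_script_s": 4,
}

def dominant_stage(stages: dict) -> tuple:
    """Return the stage consuming the largest share of total duration."""
    total = sum(stages.values())
    stage = max(stages, key=stages.get)
    return stage, stages[stage] / total

stage, share = dominant_stage(breakdown)
print(f"{stage} accounts for {share:.0%} of the run")
```

With these sample numbers the queue dominates, so the capacity-throttling guide, not copy tuning, is the right next stop.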
Step 3: Apply Targeted Fixes
Based on the bottleneck category, apply the appropriate optimization from the reference guides.
Quick Fixes for Common Issues
Copy Activity Running Slowly
- Set Intelligent Throughput Optimization to Maximum (or a custom value from 4-256)
- Configure Degree of Copy Parallelism based on source type
- Enable Partition Option for SQL sources (Dynamic Range or Physical)
- Pre-calculate partition upper/lower bounds to avoid overhead
- Enable Staging when sink is Fabric Warehouse
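Taken together, these knobs land in the copy activity's JSON definition. A sketch in the ADF-style copy activity schema that Fabric pipelines follow (the source type, column name `OrderId`, and the bound values are placeholders for your own table):

```json
{
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "SqlServerSource",
      "partitionOption": "DynamicRange",
      "partitionSettings": {
        "partitionColumnName": "OrderId",
        "partitionLowerBound": "1",
        "partitionUpperBound": "50000000"
      }
    },
    "enableStaging": true,
    "parallelCopies": 32,
    "dataIntegrationUnits": 256
  }
}
```

Note `parallelCopies` is capped at 32 here in line with the Fabric Warehouse sink recommendation in the settings table below.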
Pipeline Activity Stuck
- Cancel the stuck activity and retry
- Check source/sink connectivity and credentials
- Verify Fabric capacity is not in throttled state
- Review if payload exceeds 896 KB limit
- Check for connection timeout or network interruption
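The 896 KB payload limit is easy to trip with large inline parameters, and cheap to check locally before deployment. A minimal sketch (the activity structure here is hypothetical; serialize whatever your real definition contains):

```python
import json

PAYLOAD_LIMIT_BYTES = 896 * 1024  # activity payload limit noted above

def payload_size_ok(activity_json: dict) -> bool:
    """Serialize the activity definition and compare against the limit.
    Oversized inline parameter values are the usual culprit."""
    size = len(json.dumps(activity_json).encode("utf-8"))
    return size <= PAYLOAD_LIMIT_BYTES

# A deliberately oversized parameter to show the check firing:
big_activity = {"name": "CopyOrders", "parameters": {"rows": "x" * (900 * 1024)}}
small_activity = {"name": "CopyOrders", "parameters": {"rows": "x" * 100}}

print(payload_size_ok(small_activity))  # True
print(payload_size_ok(big_activity))    # False
```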
Capacity Throttling (HTTP 430)
- Check current Spark concurrency against SKU limits
- Cancel unnecessary active Spark jobs via Monitoring Hub
- Consider upgrading to a larger capacity SKU
- Distribute pipeline trigger times to avoid burst load
- Use job queueing for non-interactive Spark workloads
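Distributing trigger times is mostly a scheduling exercise: instead of every nightly pipeline firing at the same minute, offset each start so concurrent load on the capacity stays flat. A small sketch of computing staggered start times (gap and count are whatever your SKU headroom allows):

```python
from datetime import datetime, timedelta

def staggered_starts(base: datetime, pipeline_count: int, gap_minutes: int = 10):
    """Spread pipeline trigger times so they don't hit the capacity at once."""
    return [base + timedelta(minutes=gap_minutes * i) for i in range(pipeline_count)]

starts = staggered_starts(datetime(2025, 1, 1, 2, 0), 4, gap_minutes=15)
print([t.strftime("%H:%M") for t in starts])  # ['02:00', '02:15', '02:30', '02:45']
```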
Dataflow Gen2 Performance
- Reduce data volume with query folding and filters
- Avoid unnecessary data type conversions
- Minimize the number of transformation steps
- Use staging for large datasets
- Check for connector-specific throttling
Capacity SKU Quick Reference
| SKU | Max Spark Cores | Queue Limit | Equivalent Power BI |
|---|---|---|---|
| F2 | Limited | 4 | - |
| F4 | Limited | 4 | - |
| F8 | Limited | 8 | - |
| F16 | Limited | 16 | - |
| F32 | Limited | 32 | - |
| F64 | Standard | 64 | P1 |
| F128 | Standard | 128 | P2 |
| F256 | Standard | 256 | P3 |
| F512 | Standard | 512 | P4 |
| F1024 | Large | 1024 | - |
| F2048 | Large | 2048 | - |
| Trial | P1 equiv | N/A (no queue) | P1 |
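When planning capacity, the queue-limit column above can be turned into a quick sizing helper. The numbers below are transcribed from the table as a planning aid, not an authoritative source of current service limits:

```python
# Queue limits per SKU, transcribed from the table above.
QUEUE_LIMITS = {
    "F2": 4, "F4": 4, "F8": 8, "F16": 16, "F32": 32,
    "F64": 64, "F128": 128, "F256": 256, "F512": 512,
    "F1024": 1024, "F2048": 2048,
}

def smallest_sku_for_queue(depth: int) -> str:
    """Pick the smallest SKU whose queue limit covers an expected job backlog."""
    for sku, limit in QUEUE_LIMITS.items():
        if limit >= depth:
            return sku
    raise ValueError(f"No listed SKU supports a queue depth of {depth}")

print(smallest_sku_for_queue(40))  # F64
```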
Copy Activity Performance Settings Reference
| Setting | Property | Range | Recommendation |
|---|---|---|---|
| Intelligent Throughput Optimization | dataIntegrationUnits | Auto, Standard (64), Balanced (128), Maximum (256), Custom (4-256) | Start with Auto, increase for large datasets |
| Degree of Copy Parallelism | parallelCopies | 1-256 | Auto for most; limit to 32 for Fabric Warehouse sink |
| Partition Option | Source settings | None, Physical, Dynamic Range | Use Dynamic Range for large SQL tables |
| Enable Staging | enableStaging | true/false | Required for Fabric Warehouse sink |
| Source Retry Count | sourceRetryCount | Integer | Set 2-3 for transient failures |
| Fault Tolerance | enableSkipIncompatibleRow | true/false | Enable for non-critical loads |
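For Dynamic Range partitioning, pre-computing the bucket bounds (for example from a one-off `MIN`/`MAX` query on the partition column) avoids the extra range-discovery query the service otherwise issues at run time. A sketch of splitting a known key range into contiguous buckets:

```python
def partition_ranges(lower: int, upper: int, partitions: int):
    """Split [lower, upper] into contiguous buckets for DynamicRange
    partitioning, distributing any remainder across the first buckets."""
    span = upper - lower + 1
    base, extra = divmod(span, partitions)
    ranges, start = [], lower
    for i in range(partitions):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(partition_ranges(1, 10_000_000, 4))
# [(1, 2500000), (2500001, 5000000), (5000001, 7500000), (7500001, 10000000)]
```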
Error Code Quick Reference
| Error | Meaning | Action |
|---|---|---|
| HTTP 430 | Capacity compute limit reached | Reduce concurrent jobs or upgrade SKU |
| Payload too large | Activity config exceeds 896 KB | Reduce parameter sizes |
| TooManyRequestsForCapacity | Spark compute or API rate limit | Cancel active jobs or wait |
| Connection timeout | Source/sink unreachable | Check network, credentials, firewall |
| Deflate64 unsupported | Compression format not supported | Re-compress with deflate algorithm |
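Since HTTP 430 signals a transient capacity condition rather than a permanent failure, job submission logic can retry with exponential backoff instead of failing outright. A minimal sketch, with a simulated capacity standing in for the real submission call:

```python
import time

def run_with_backoff(submit, max_attempts: int = 4, base_delay_s: float = 1.0):
    """Retry a job submission while the capacity answers with HTTP 430,
    doubling the wait between attempts. `submit` is any callable that
    returns an HTTP-like status code."""
    for attempt in range(max_attempts):
        status = submit()
        if status != 430:
            return status
        time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("Capacity still throttled after retries")

# Simulated capacity that throttles twice, then accepts the job:
responses = iter([430, 430, 200])
print(run_with_backoff(lambda: next(responses), base_delay_s=0.01))  # 200
```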
Monitoring Setup
Enable workspace monitoring for ongoing performance analysis:
- Go to Workspace Settings > Monitoring
- Add a Monitoring Eventhouse and enable Log workspace activity
- Query the ItemJobEventLogs table with KQL for pipeline-level insights
Example KQL query for failure trends:
```kusto
ItemJobEventLogs
| where ItemKind == "Pipeline"
| summarize count() by JobStatus
```
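Building on the same table, a sketch of a daily failure-trend query. The `Timestamp` column name is an assumption about the log schema; adjust it to whatever your Monitoring Eventhouse actually exposes:

```kusto
ItemJobEventLogs
| where ItemKind == "Pipeline" and JobStatus == "Failed"
| summarize Failures = count() by bin(Timestamp, 1d)
| order by Timestamp asc
```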
See workspace-monitoring-setup.md for detailed configuration.
References
- Copy Activity Tuning Guide
- Pipeline Stuck Resolution
- Capacity Throttling Guide
- Dataflow Optimization
- Workspace Monitoring Setup
- Remediation Runbook Template
External Resources
Source
https://github.com/PatrickGallucci/fabric-skills/blob/main/skills/fabric-data-factory-perf-remediate/SKILL.md
Overview
Systematic approach to diagnosing and resolving performance issues in Microsoft Fabric Data Factory pipelines, copy activities, and dataflows. It covers bottleneck classification, tuning knobs such as parallelCopies, DIU, ITO, and partitioning, plus monitoring and dataflow optimization to prevent timeouts, stalls, and throttling.
How This Skill Works
Identify the bottleneck category (Copy Activity Slow, Pipeline Stuck, Capacity Throttling, Dataflow Slow, Spark Job Queue). Collect diagnostics with the Get-FabricPipelineDiagnostics.ps1 script or by inspecting the Monitoring Hub. Apply targeted fixes from the reference guides (copy activity tuning, capacity management, dataflow optimization) and validate improvements with fresh runs.
When to Use It
- Pipeline execution is slower than expected
- Copy activities are slow or appear stuck
- Activities show In Progress or Not Started for extended periods
- HTTP 430 / TooManyRequestsForCapacity throttling occurs
- Dataflow Gen2 refresh is slow or timing out
Quick Start
- Step 1: Identify the bottleneck category from symptoms using the diagnostic workflow (Copy Activity Slow, Pipeline Stuck, Capacity Throttling, Dataflow Slow, Spark Job Queue)
- Step 2: Collect diagnostics with Get-FabricPipelineDiagnostics.ps1 or via Monitoring Hub (note queue time, transfer time, and breakdowns)
- Step 3: Apply targeted fixes from the reference guides and validate by re-running the pipeline and monitoring performance
Best Practices
- Use Monitoring Hub to establish a performance baseline and capture duration breakdowns
- Run the diagnostic script Get-FabricPipelineDiagnostics.ps1 to collect baseline metrics
- Set Intelligent Throughput Optimization to Maximum (or a custom 4-256) and tune parallelism
- Configure Degree of Copy Parallelism and enable Partition Option for SQL sources
- Review capacity SKUs and quotas; adjust workspace capacity to match workload and reduce throttling
Example Use Cases
- A slow pipeline with Not Started tasks is resolved by adjusting capacity throttling settings and increasing the capacity SKU
- Copy throughput improves after enabling Intelligent Throughput Optimization and calibrating parallelism to source type
- Spark job queueing is alleviated by aligning SKUs and monitoring queue times via the Monitoring Hub
- Dataflow Gen2 refresh speeds up after enabling Partition Option for SQL sources
- Long queue times disappear after applying targeted fixes from the capacity-throttling guide and re-running the diagnostic