fabric-dataflows-perf-remediation
npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-dataflows-perf-remediate --openclaw

Dataflows Gen2 Performance Troubleshooting
Systematic diagnostic workflows for identifying and resolving performance issues in Microsoft Fabric Dataflow Gen2 workloads, covering data movement, transformation execution, staging configuration, and destination write optimization.
When to Use This Skill
- Dataflow Gen2 refresh takes longer than expected
- Fast Copy is not being utilized or is failing
- Query folding indicators show red or yellow steps
- Staging-to-destination data movement is a bottleneck
- Gateway-connected dataflows have high latency
- Incremental refresh is slower than full refresh
- Capacity throttling affects dataflow execution
- Refresh failures with permission or connector errors
- Need to automate dataflow monitoring and health checks
- Migrating from Dataflow Gen1 and seeing performance regressions
Prerequisites
- Microsoft Fabric workspace with Data Factory enabled
- Contributor or higher role on the workspace
- PowerShell 7+ with Az.Accounts module for automation scripts
- Access to Fabric Monitoring Hub for refresh history analysis
- Fabric Capacity Metrics app access for CU consumption review
Quick Diagnosis: Symptom-to-Solution Map
| Symptom | Likely Cause | Jump To |
|---|---|---|
| Refresh takes 2x+ longer than Gen1 | Delta Parquet output overhead or staging misconfiguration | Consideration 5 in Performance Guide |
| Fast Copy not activating | Unsupported connector or non-foldable transforms | Fast Copy Diagnostics |
| "Insufficient permissions for staging artifacts" | Creator token expired (90+ days) | Common Errors |
| Slow design-time previews | Large dataset loaded in editor | Design-Time Optimization |
| Staging-to-Lakehouse write is slow | Extra data hop through staging Warehouse | Staging Strategy |
| Gateway dataflows are slow | All transforms running on gateway host | Gateway Optimization |
| Incremental refresh slower than full | Too many small buckets creating overhead | Incremental Refresh Tuning |
| HTTP 430 / capacity throttled | Concurrent Spark jobs exhausting CUs | Capacity Management |
Diagnostic Workflow
Step 1: Check Refresh History
- Open the Fabric workspace and locate the dataflow
- Select the ellipsis (...) > Recent runs
- Review the refresh history for status, duration, and type
- Select a specific refresh Start time to drill into details
- Examine the Tables section for per-entity timing
- Examine the Activities section for destination write timing
- Download detailed logs (bottom-left button) for deep analysis
- Download CSV of refresh runs for trend analysis
Key metrics to capture: total duration, per-table duration, bytes read/written, rows read/written, engine type (Mashup vs CopyActivity vs SQL DW).
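The downloaded CSV of refresh runs can be summarized for trend analysis. The sketch below assumes illustrative column names ("Start time", "End time", "Status"); match them to the headers in your actual export before use.

```python
import csv
import io
from datetime import datetime
from statistics import mean

def summarize_refresh_runs(csv_text):
    """Summarize a refresh-runs CSV export: run count, failures,
    and average/max duration of successful runs in seconds.

    Column names are assumptions; adjust to the real export headers.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    durations = []
    failures = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Status"] != "Succeeded":
            failures += 1
            continue
        start = datetime.strptime(row["Start time"], fmt)
        end = datetime.strptime(row["End time"], fmt)
        durations.append((end - start).total_seconds())
    return {
        "runs": len(durations) + failures,
        "failures": failures,
        "avg_duration_s": mean(durations) if durations else None,
        "max_duration_s": max(durations) if durations else None,
    }

# Sample export: two successful runs (10 min and 25 min) and one failure
sample = """Start time,End time,Status
2024-05-01 02:00:00,2024-05-01 02:10:00,Succeeded
2024-05-02 02:00:00,2024-05-02 02:25:00,Succeeded
2024-05-03 02:00:00,2024-05-03 02:01:00,Failed
"""
stats = summarize_refresh_runs(sample)
```

A rising `avg_duration_s` across weekly exports is the clearest early signal that a bottleneck is developing.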
Step 2: Identify the Bottleneck Component
Dataflow Gen2 has three performance-critical components:
    [Data Source] --> [Dataflow Engine] --> [Data Destination]
          |                   |                     |
      Connectors       Mashup / Fast Copy /     Lakehouse /
      + Gateway        SQL DW Compute           Warehouse /
                                                SQL Database
- Data Source: Connector speed, network latency, gateway throughput
- Dataflow Engine: Query folding, staging, Fast Copy, Mashup engine
- Data Destination: Write performance, Delta Parquet conversion
Step 3: Apply Targeted Optimization
Based on the bottleneck identified, follow the relevant section below.
Fast Copy Diagnostics
Fast Copy provides up to 9x faster ingestion for supported scenarios.
Supported Connectors
ADLS Gen2, Azure Blob Storage, Azure SQL DB, Lakehouse, PostgreSQL, On-premises SQL Server, Warehouse, Oracle, Snowflake, SQL Database in Fabric.
Fast Copy Indicators
| Indicator | Meaning | Action |
|---|---|---|
| Green | Step supported by Fast Copy | No action needed |
| Yellow | Some steps may support Fast Copy | Split query at boundary |
| Red | Step NOT supported by Fast Copy | Move to referenced query |
When Fast Copy Is Disabled
- Transformations beyond select columns, change types, rename, or remove columns (for file sources)
- Non-supported connector in use
- "Require Fast Copy" set on incompatible query (causes failure)
- Destination is not Lakehouse (stage first, then reference)
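The checks above can be sketched as a small eligibility helper. The connector and transform sets mirror the lists in this section; treat them as a point-in-time snapshot, not an authoritative support matrix.

```python
# Snapshot of the supported-connector list from this guide (an assumption,
# not an authoritative matrix; verify against current documentation).
FAST_COPY_CONNECTORS = {
    "ADLS Gen2", "Azure Blob Storage", "Azure SQL DB", "Lakehouse",
    "PostgreSQL", "On-premises SQL Server", "Warehouse", "Oracle",
    "Snowflake", "SQL Database in Fabric",
}

# Transforms that keep Fast Copy active for file sources, per this guide
ALLOWED_FILE_TRANSFORMS = {
    "select columns", "change types", "rename", "remove columns",
}

def fast_copy_eligible(connector, transforms):
    """Return (eligible, reason) for a query's Fast Copy eligibility."""
    if connector not in FAST_COPY_CONNECTORS:
        return False, f"connector '{connector}' not supported"
    blocked = [t for t in transforms if t not in ALLOWED_FILE_TRANSFORMS]
    if blocked:
        return False, f"unsupported transforms {blocked}: split into a referenced query"
    return True, "eligible"
```

Running this against each query in a workspace inventory gives a quick triage list before opening the editor.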
Query Splitting Pattern for Fast Copy
- Remove any red-indicator steps and the destination from the original query
- Verify remaining steps show green indicators
- Right-click the query > Enable staging
- Right-click again > Reference to create a new query
- Add back transformations and destination to the referenced query
- Publish and refresh — first query uses Fast Copy, second uses SQL DW compute
See Performance Optimization Guide for detailed walkthrough.
Staging Strategy Decision Tree
    Is destination a Warehouse?
    ├── YES → Staging REQUIRED (enabled by default)
    │         Write goes directly via SQL DW compute
    │         This is the optimal path for Warehouse destinations
    └── NO → Is destination a Lakehouse?
             ├── YES, with staging enabled → Data moves:
             │         Source → Staging LH → Staging WH → Lakehouse
             │         ⚠ CONSIDER: disable staging to avoid the extra hop,
             │         OR switch the destination to Warehouse instead
             └── YES, with staging disabled → Data writes directly:
                       Source → Lakehouse (via Mashup engine)
                       ✓ Fewer hops, but no SQL DW compute for transforms
Key Decision: If your transforms fold to the source, disable staging for Lakehouse destinations. If transforms are complex (joins, aggregations), either enable staging or switch to Warehouse destination.
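The decision tree above can be encoded as a helper for scripted workspace audits. This is a sketch of the guidance in this section, not official product logic.

```python
def staging_recommendation(destination, transforms_fold_to_source,
                           complex_transforms):
    """Recommend a staging setting per the decision tree in this guide.

    destination: "Warehouse" or "Lakehouse" (other values get no guidance).
    """
    if destination == "Warehouse":
        # Staging is required and enabled by default for Warehouse
        return "Staging required (default); writes go via SQL DW compute."
    if destination == "Lakehouse":
        if transforms_fold_to_source:
            # Folding pushes work to the source; the staging hop adds no value
            return "Disable staging: avoid the extra Staging LH/Staging WH hop."
        if complex_transforms:
            # Joins/aggregations benefit from SQL DW compute
            return ("Enable staging for SQL DW compute, or switch the "
                    "destination to Warehouse.")
        return "Disable staging; the Mashup engine writes directly."
    return "No staging guidance for this destination."
```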
Design-Time Optimization
When working with large datasets in the dataflow editor:
- Use parameters for date filtering: create a `DesignDateFilter` parameter to limit preview data during authoring, then adjust to the full range before publishing
- Switch to Schema view: select Schema view in the editor toolbar to see structure without loading data
- Limit preview rows: Keep preview data small during development
Gateway Performance
When using on-premises data gateway:
- Split dataflows: Separate data movement (gateway → cloud) from transformations
- First dataflow: Use Fast Copy for high-throughput transfer from on-premises to Lakehouse/Warehouse staging
- Second dataflow: Apply transformations using cloud compute on the staged data
- Gateway version: Must be 3000.214.2+ for Fast Copy support; keep within last 6 supported versions
- Detailed logs: Not yet supported for on-premises gateway refreshes (supported for cloud/VNet gateways)
Incremental Refresh Tuning
If incremental refresh is slower than full refresh:
- Increase bucket size: Reduce total bucket count to lower partition management overhead
- Evaluate data volume: For small datasets, full refresh may outperform incremental
- Concurrency control: Adjust max concurrent requests in dataflow settings if source can't handle defaults
- Monitor bucket efficiency: Check if most buckets process zero rows (indicates over-partitioning)
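A quick way to reason about bucket sizing is to compute the bucket count a given range produces and to flag over-partitioning from per-bucket row counts. The threshold below is an assumption for illustration, not a product default.

```python
from datetime import date

def bucket_count(start, end, bucket_days):
    """Number of incremental-refresh buckets covering [start, end)."""
    total_days = (end - start).days
    return -(-total_days // bucket_days)  # ceiling division

def over_partitioned(rows_per_bucket, zero_row_ratio=0.5):
    """Flag over-partitioning when most buckets process zero rows.

    The 0.5 threshold is an illustrative assumption; tune it for your data.
    """
    zero = sum(1 for rows in rows_per_bucket if rows == 0)
    return zero / len(rows_per_bucket) > zero_row_ratio

# Example: two years of history in 1-day buckets creates 731 buckets of
# partition-management overhead; 30-day buckets cut that to 25.
daily = bucket_count(date(2023, 1, 1), date(2025, 1, 1), 1)
monthly = bucket_count(date(2023, 1, 1), date(2025, 1, 1), 30)
```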
Capacity and Throttling
Dataflow Gen2 Compute Meters
| Engine | When Used | Billing Basis |
|---|---|---|
| Standard Compute (Mashup) | Staging disabled or non-foldable queries | Query evaluation time |
| High Scale Compute (SQL DW) | Staging enabled | Lakehouse + Warehouse duration |
| Fast Copy | Supported connectors with Fast Copy enabled | Copy job duration |
All operations are background operations smoothed over 24 hours.
CU Consumption Formula (Standard Compute)
    If QueryDuration < 600s:
        CU_seconds = QueryDuration × 12
    Else:
        CU_seconds = (QueryDuration - 600) × 1.5 + 600 × 12
Reducing Capacity Impact
- Avoid inefficient Power Query logic (expensive merges and sorts)
- Maximize query folding to push work to source systems
- Disable staging for small data volumes or simple transforms
- Don't refresh more frequently than source data updates
- Use data destinations instead of dataflow connectors for consumption
Common Refresh Errors
| Error | Cause | Resolution |
|---|---|---|
| "Insufficient permissions for staging artifacts" | Creator inactive 90+ days or left org | Have creator log in to Fabric; if gone, open support ticket |
| "Expression.Error: import matches no exports" | Unsupported connector in Premium workspace | Check connector compatibility list |
| "Gateway version not supported" | On-premises gateway out of support | Update to latest gateway version |
| "Staging lakehouse couldn't be found" | CI/CD branch workspace missing staging | Create a new Dataflow Gen2 in workspace to trigger staging creation |
| Validation failure on save | Query schema can't be determined in 10 min | Simplify query or check source connectivity |
Modern Evaluator (Preview)
The Modern Evaluator engine can provide significant performance improvements:
- Large data volumes: Shorter processing time, reduced memory usage
- Complex transformations: Improved execution plans for joins across large tables
- Frequent schedules: Cumulative time savings across multiple daily runs
Enable via dataflow settings. Monitor results after enabling — some connectors may not yet be fully optimized.
Automation and Monitoring
Run the Dataflow Health Check Script to programmatically audit dataflow configurations and recent refresh performance across a workspace.
Run the Dataflow Refresh Monitor Script to poll and track active refresh status in real-time.
See REST API Reference for complete API documentation covering CRUD operations, scheduling, refresh triggering, and monitoring.
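For scripted monitoring, requests are built against the Fabric REST API base URL. The endpoint shape below follows the general Fabric item-listing pattern and is an assumption for illustration; verify paths and query parameters against the REST API Reference before relying on them.

```python
from urllib.parse import quote

# Fabric REST API base (verify against the REST API Reference)
API_BASE = "https://api.fabric.microsoft.com/v1"

def list_items_url(workspace_id, item_type="Dataflow"):
    """Build a URL to list items of a given type in a workspace.

    Endpoint shape is an assumption based on the Fabric item-listing
    pattern; authentication (a bearer token) is handled separately.
    """
    return (f"{API_BASE}/workspaces/{quote(workspace_id)}"
            f"/items?type={quote(item_type)}")

url = list_items_url("aaaa-1111")
```

A health-check script would issue a GET against such URLs with an Azure AD bearer token, then page through results and cross-reference each dataflow with its recent refresh history.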
See Performance Optimization Guide for deep-dive scenarios including Fast Copy benchmarks, staging architecture patterns, and query folding analysis techniques.
References
- Performance Optimization Guide — Deep-dive optimization scenarios
- Dataflow REST API Reference — API operations for automation
- Get-DataflowHealthReport.ps1 — Workspace health audit
- Watch-DataflowRefresh.ps1 — Real-time refresh monitor
- Microsoft Learn: Best Practices for Dataflow Gen2 Performance
- Microsoft Learn: Fast Copy in Dataflow Gen2
- Microsoft Learn: View Refresh History and Monitor Dataflows
- Microsoft Learn: Dataflow Gen2 Pricing
Source
git clone https://github.com/PatrickGallucci/fabric-skills