fabric-lakehouse-views-perf-remediate
npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-lakehouse-views-perf-remediate --openclawFabric Materialized Lake Views Performance remediate
Diagnose and resolve performance issues with materialized lake views (MLVs) in Microsoft Fabric lakehouses. This skill covers refresh optimization, Spark job diagnostics, data quality constraint tuning, and lineage execution remediate.
When to Use This Skill
- MLV refresh runs are taking longer than expected
- Incremental refresh is falling back to full refresh unexpectedly
- MLV lineage execution shows Failed or Skipped nodes
- Spark jobs for MLV refresh are failing with errors
- "Delta table not found" errors during MLV creation or refresh
- Data quality constraints causing unexpected pipeline failures
- Need to enable or verify optimal refresh configuration
- Custom Spark environment tuning for MLV workloads
- Monitoring and interpreting MLV run history
Prerequisites
- Microsoft Fabric workspace with Lakehouse items
- Schema-enabled lakehouse (recommended for MLV support)
- Fabric notebook for executing Spark SQL commands
- Workspace Admin or Contributor role for scheduling and monitoring
- Access to Monitor Hub for viewing MLV run details
Quick Diagnostics Checklist
Run through these checks in order when remediate MLV performance:
| Step | Check | Action |
|---|---|---|
| 1 | Identify refresh mode | Verify optimal refresh toggle is enabled in lineage view |
| 2 | Check CDF status | Confirm delta.enableChangeDataFeed=true on ALL source tables |
| 3 | Review query patterns | Ensure only supported SQL constructs are used (see supported expressions) |
| 4 | Inspect run history | Open lineage view dropdown to see last 25 runs and their states |
| 5 | Check node failures | Click failed nodes in lineage to view error messages |
| 6 | Review Spark logs | Follow Detailed Logs link to Monitor Hub for Spark error logs |
| 7 | Validate data quality | Check if FAIL constraints are causing "delta table not found" errors |
| 8 | Assess source data | Determine if source has updates/deletes (forces full refresh) |
Step-by-Step Workflows
Workflow 1: Diagnose Slow Refresh
- Determine current refresh strategy - Navigate to Manage materialized lake views in the lakehouse ribbon. Check if Optimal refresh toggle is ON.
- Verify change data feed - Run the diagnostic script to check CDF status on all source tables. See scripts/check-cdf-status.sql.
- Check for unsupported expressions - Review the MLV definition for constructs that force full refresh. See Supported Expressions.
- Inspect source data patterns - If source tables have UPDATE or DELETE operations, Fabric always performs full refresh regardless of CDF status.
- Review partition strategy - Consider adding
PARTITIONED BYto large MLVs to improve refresh parallelism. - Attach custom environment - Configure a custom Spark environment with optimized compute for heavy workloads. See references/custom-environment-guide.md.
Workflow 2: Resolve Failed Refresh Runs
- Open lineage view - Navigate to Manage materialized lake views, select the failed run from the dropdown (last 25 runs available).
- Identify failed node - Click the failed node in the lineage graph. Review the error message in the right-side panel.
- Access Spark logs - Click the Detailed Logs link to navigate to Monitor Hub. Review Apache Spark error logs.
- Common failure patterns - See references/common-failure-patterns.md for resolution steps.
- Retry or recreate - For transient Spark failures, retry the run. For persistent errors, drop and recreate the MLV.
Workflow 3: Enable Optimal Refresh
- Enable CDF on all source tables:
ALTER TABLE bronze.customers SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
ALTER TABLE bronze.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
- Verify CDF is enabled:
DESCRIBE DETAIL bronze.customers;
-- Check properties column for delta.enableChangeDataFeed=true
- Create MLV with CDF property:
CREATE OR REPLACE MATERIALIZED LAKE VIEW silver.customer_orders
TBLPROPERTIES (delta.enableChangeDataFeed=true)
AS
SELECT
c.customerID,
c.customerName,
c.region,
o.orderDate,
o.orderAmount
FROM bronze.customers c INNER JOIN bronze.orders o
ON c.customerID = o.customerID;
- Enable optimal refresh toggle - Navigate to Manage materialized lake views and verify the Optimal refresh toggle is ON (enabled by default).
Supported Expressions for Incremental Refresh
MLVs using only these constructs qualify for incremental refresh:
| SQL Construct | Notes |
|---|---|
SELECT | Only deterministic built-in functions. Non-deterministic and window functions force full refresh. |
FROM | Standard table references |
WHERE | Only deterministic built-in functions |
INNER JOIN | Supported for incremental |
WITH (CTE) | Common table expressions supported |
UNION ALL | Supported |
CONSTRAINT ... CHECK | Only deterministic built-in functions in constraint conditions |
Unsupported constructs that force full refresh:
LEFT JOIN,RIGHT JOIN,FULL OUTER JOIN- Window functions (
ROW_NUMBER,RANK,LAG,LEAD, etc.) - Non-deterministic functions (
current_timestamp(),rand(), etc.) - Subqueries in
SELECTorWHERE GROUP BYwithHAVINGDISTINCT- User-defined functions (UDFs)
Key Spark SQL Reference
List All MLVs
SHOW MATERIALIZED LAKE VIEWS IN silver;
View MLV Definition
SHOW CREATE MATERIALIZED LAKE VIEW silver.customer_orders;
Force Full Refresh
REFRESH MATERIALIZED LAKE VIEW silver.customer_orders FULL;
Drop and Recreate
DROP MATERIALIZED LAKE VIEW silver.customer_orders;
CREATE OR REPLACE MATERIALIZED LAKE VIEW silver.customer_orders
TBLPROPERTIES (delta.enableChangeDataFeed=true)
AS
SELECT ...;
Known Issues and Limitations
| Issue | Impact | Workaround |
|---|---|---|
| FAIL constraint + "delta table not found" | MLV creation or refresh fails | Recreate MLV using DROP action instead of FAIL |
| Schema names with ALL CAPS | MLV creation fails | Use lowercase or mixed-case schema names |
| Session-level Spark properties | Not applied during scheduled refresh | Set properties in custom environment instead |
| Delta time travel in MLV definition | Not supported | Remove VERSION AS OF or TIMESTAMP AS OF from queries |
| DML statements on MLVs | Not supported | MLVs are read-only; modify source tables instead |
| UDFs in SELECT | Not supported | Use built-in Spark SQL functions |
| Temporary views as MLV source | Not supported | Reference base tables or other MLVs directly |
| Cross-lakehouse lineage | Not supported | Keep all related MLVs within same lakehouse |
| Updating data quality constraints | Not supported | Drop and recreate the MLV with new constraints |
| LIKE/regex in constraint conditions | Not supported | Use simple comparison operators in constraints |
| Service principal authentication | Not supported for MLV APIs | Use Microsoft Entra user authentication |
| Single schedule per lineage | UI instability if exceeded | Maintain only one active schedule per lineage |
| South Central US region | Feature not available | Use a workspace in a different region |
Run States Reference
| State | Meaning | Action |
|---|---|---|
| Completed | All nodes executed successfully | No action needed |
| Failed | One or more nodes failed; child nodes skipped | Review failed node error, check Spark logs |
| Skipped | Previous run still in progress | Wait for current run to complete, or cancel it |
| In Progress | Run started, not yet terminal | Monitor progress in lineage view |
| Canceled | Manually canceled from Monitor Hub | Re-trigger if needed |
remediate
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Refresh always full, never incremental | CDF not enabled on source tables | Run ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true) on ALL sources |
| Refresh always full despite CDF enabled | Unsupported SQL constructs in MLV | Review definition for window functions, LEFT JOIN, non-deterministic functions |
| Refresh always full despite CDF and supported SQL | Source has UPDATE/DELETE operations | Incremental only supports append-only; redesign ETL to be append-only or accept full refresh |
| "Delta table not found" on create | FAIL constraint issue | Recreate MLV without FAIL; use DROP action instead |
| Node shows Skipped state | Parent node failed | Fix the parent node failure first |
| Schedule not running | Previous run still in progress | Cancel the stuck run from Monitor Hub, or wait |
| Environment not accessible | User lacks access to selected environment | Select an accessible environment from dropdown |
| Deleted environment error | Associated environment was removed | Choose a new environment in lineage settings |
| MLV names changed to lowercase | Expected behavior | MLV names are case-insensitive and stored as lowercase |
| Workspace names with spaces cause errors | Backtick syntax required | Use backtick notation:`My Workspace`.lakehouse.schema.view_name |
References
- Common Failure Patterns - Detailed resolution for frequent MLV errors
- Custom Environment Guide - Spark environment tuning for MLV workloads
- Monitoring Deep Dive - Advanced Monitor Hub usage for MLV diagnostics
- CDF Status Check Script - T-SQL/Spark SQL script to audit CDF property
- MLV Health Check Script - Comprehensive MLV configuration audit
- MLV Diagnostic Template - Notebook template for systematic MLV remediate
Source
git clone https://github.com/PatrickGallucci/fabric-skills/blob/main/skills/fabric-lakehouse-views-perf-remediate/SKILL.mdView on GitHub Overview
Diagnose and resolve performance issues with Microsoft Fabric lakehouse materialized lake views (MLVs). This skill covers slow refreshes, incremental vs full refresh behavior, Spark job failures, and data quality constraints, guiding you to optimize refresh configuration and monitor runs in Monitor Hub.
How This Skill Works
Follow a structured diagnostic flow: verify the refresh configuration and optimal toggle, check Change Data Feed (CDF) status on all source tables, review MLV definitions for unsupported expressions, and inspect Spark logs in Monitor Hub. Apply targeted fixes such as enabling CDF, adjusting partitioning (PARTITIONED BY), and tuning a custom Spark environment to improve incremental refresh eligibility and overall performance.
When to Use It
- MLV refresh runs are noticeably slow
- Incremental refresh falls back to a full refresh
- MLV lineage shows Failed or Skipped nodes
- Spark jobs for MLV refresh fail with errors
- Delta table not found errors during MLV creation or refresh
Quick Start
- Step 1: Open Manage materialized lake views and verify the Optimal refresh toggle is ON
- Step 2: Run scripts/check-cdf-status.sql to confirm CDF is enabled on all source tables
- Step 3: Review MLV definitions for unsupported expressions and consult Monitor Hub Spark logs for errors
Best Practices
- Verify the Optimal refresh toggle is ON in the lineage view
- Ensure delta.enableChangeDataFeed=true on all source tables
- Review MLV definitions for unsupported expressions that trigger full refresh
- Consult Monitor Hub spark logs for error details
- Apply PARTITIONED BY to large MLVs and tune the Spark environment for workloads
Example Use Cases
- Diagnosing a slow MLV refresh by analyzing Monitor Hub run history and enabling CDF
- Fixing incremental refresh by removing unsupported expressions in the MLV
- Resolving 'delta table not found' by correcting CDF settings and data quality constraints
- Diagnosing Spark job failures and applying a custom Spark environment for MLV workloads
- Adding PARTITIONED BY to large MLVs to improve parallelism and refresh duration