fabric-network-remediate

npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-network-remediate --openclaw

Microsoft Fabric Network Performance Remediation

Systematic toolkit for diagnosing and resolving network performance issues across Microsoft Fabric workloads including Spark, OneLake, Data Warehouse, Pipelines, and Dataflows.

When to Use This Skill

  • Fabric Spark sessions take longer than expected to start (>10 seconds)
  • Connection timeouts to external data sources from notebooks or pipelines
  • Managed private endpoint status shows Pending or Failed
  • DNS resolution returns public IPs instead of private IPs
  • Outbound access protection blocks required dependencies (PyPI, Conda)
  • On-premises data gateway connectivity failures
  • OneLake API calls returning 403 or timeout errors
  • Capacity throttling errors (HTTP 430)
  • Dataflow Gen2 staging failures behind firewalls
  • Cross-workspace environment attachment failures due to network mismatch

Prerequisites

  • PowerShell 7+ with Az module installed (Install-Module Az -Scope CurrentUser)
  • Fabric Admin or Workspace Admin role for network configuration changes
  • Azure portal access for Private Link Service and DNS zone management
  • Network access to run nslookup, Test-NetConnection, and Resolve-DnsName

Step-by-Step Workflows

Workflow 1: Diagnose Spark Session Startup Delays

Spark startup times vary based on networking configuration. Consult the reference table:

| Scenario | Typical Startup Time |
| --- | --- |
| Default settings, no libraries | 5-10 seconds |
| Default settings + library dependencies | 5-10 sec + 30 sec-5 min |
| High traffic in region, no libraries | 2-5 minutes |
| High traffic + library dependencies | 2-5 min + 30 sec-5 min |
| Network security (Private Links/VNet) | 2-5 minutes |
| Network security + library dependencies | 2-5 min + 30 sec-5 min |

Run the diagnostic script for automated assessment:

.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType SparkStartup

When Private Links or Managed VNets are enabled, Starter Pools are unavailable and Fabric must create clusters on demand, adding 2-5 minutes to session start time.
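To judge whether a measured startup time is within the documented range, the reference table can be encoded as a small lookup. The following is a minimal sketch, not part of the skill's scripts. The function names and the "expected"/"investigate" labels are hypothetical, and the sketch treats the high-traffic and network-security rows as one "on-demand cluster" case because the table gives them the same range.

```python
# Expected Spark session startup ranges, in seconds, taken from the reference table.
BASE_DEFAULT = (5, 10)        # default settings, Starter Pools available
BASE_ON_DEMAND = (120, 300)   # high regional traffic, or Private Links/VNet enabled
LIBRARY_OVERHEAD = (30, 300)  # additional time when library dependencies are installed


def expected_startup_range(on_demand_cluster: bool, has_libraries: bool) -> tuple[int, int]:
    """Return the (min, max) expected startup time in seconds for a scenario."""
    low, high = BASE_ON_DEMAND if on_demand_cluster else BASE_DEFAULT
    if has_libraries:
        low += LIBRARY_OVERHEAD[0]
        high += LIBRARY_OVERHEAD[1]
    return low, high


def classify_startup(measured_seconds: float, on_demand_cluster: bool, has_libraries: bool) -> str:
    """Label a measured startup time (hypothetical labels, for illustration only)."""
    _, high = expected_startup_range(on_demand_cluster, has_libraries)
    return "expected" if measured_seconds <= high else "investigate"
```

For example, `classify_startup(240, on_demand_cluster=True, has_libraries=False)` treats a four-minute start behind Private Links as expected, while the same delay with default settings is flagged for investigation.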

Workflow 2: Validate Managed Private Endpoint Connectivity

  1. Navigate to Fabric workspace Settings > Network security
  2. Under Managed private endpoints, verify Status shows Approved
  3. If Pending or Failed, see private-endpoint-remediate.md
  4. Validate DNS routing from a Fabric Notebook:
nslookup sqlserver.corp.contoso.com

     Confirm the returned IP is in a private range (10.x.x.x, 172.16-31.x.x, or 192.168.x.x), not public.

  5. Run the automated validation:
.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType PrivateEndpoint
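The DNS check in step 4 can also be scripted from a notebook cell. The sketch below uses only the Python standard library and tests each resolved address against the RFC 1918 private ranges. The function names are hypothetical, and the `resolver` parameter exists only so the logic can be exercised without a live DNS lookup.

```python
import ipaddress
import socket

# RFC 1918 private IPv4 ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a hostname to its distinct IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_private_resolution(hostname: str, resolver=resolve_ipv4) -> dict[str, bool]:
    """Map each resolved address to True (private) or False (public)."""
    return {address: is_rfc1918(address) for address in resolver(hostname)}


# Example (performs a real DNS lookup, so it is left commented out):
# print(check_private_resolution("sqlserver.corp.contoso.com"))
```

Any `False` value in the result suggests that the name still resolves to a public address from where the code runs.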

Workflow 3: Configure Firewall Allowlisting

Fabric requires specific endpoints and service tags. Run the firewall audit script:

.\scripts\Test-FabricNetworkHealth.ps1 -CheckType FirewallEndpoints

For the complete endpoint reference, see firewall-endpoints.md.

Key service tags for Azure Firewall / NSG rules:

| Tag | Purpose | Direction |
| --- | --- | --- |
| Power BI | Fabric core services | Both |
| DataFactory | Pipeline operations | Both |
| PowerQueryOnline | Dataflow processing | Both |
| SQL | Warehouse connectivity | Outbound |
| EventHub | Real-Time Analytics | Outbound |
| KustoAnalytics | Real-Time Analytics | Both |
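After firewall rules are updated, a quick TCP reachability test from a notebook can confirm that a required port is actually open, much as Test-NetConnection does in PowerShell. This is a minimal sketch using the standard library. The function name is hypothetical, and it only proves that a TCP handshake succeeds, not that the service behind the port is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (opens a real connection, so it is left commented out):
# print(tcp_port_open("sqlserver.corp.contoso.com", 1433))
```

A `False` result does not distinguish between a blocked port, a DNS failure, and a host that is down, so it is best paired with the DNS check from Workflow 2.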

Workflow 4: Troubleshoot Outbound Access Protection

When outbound access protection is enabled, public repositories (PyPI, Conda) are blocked. To install libraries in secured environments:

  1. Prepare a requirements.txt on a machine with internet access
  2. Download packages and dependencies using pip:
pip download -r requirements.txt -d ./packages
  3. Upload packages as custom libraries in the Fabric Environment
  4. See outbound-access-guide.md for detailed steps
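Before uploading, it can help to confirm that every package named in requirements.txt has a matching file in the download folder. The sketch below is a heuristic, not part of the skill. The helper names are hypothetical, it matches files by normalized name prefix followed by a version digit, and it does not verify transitive dependencies or version pins.

```python
import re

_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def _normalize(name: str) -> str:
    """Lower-case a name and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Extract normalized package names, skipping blanks, comments, and pip options."""
    names = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(_normalize(match.group(0)))
    return names


def missing_packages(requirement_lines, downloaded_files):
    """Return requirement names with no matching wheel or sdist file name."""
    normalized_files = [_normalize(name) for name in downloaded_files]
    missing = []
    for name in requirement_names(requirement_lines):
        prefix = name + "-"
        # Require a digit after the prefix so 'requests' does not match 'requests_toolbelt'.
        found = any(
            f.startswith(prefix) and f[len(prefix):len(prefix) + 1].isdigit()
            for f in normalized_files
        )
        if not found:
            missing.append(name)
    return missing
```

In practice the two arguments would come from reading requirements.txt line by line and listing the file names in ./packages.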

Workflow 5: Resolve Capacity Throttling (HTTP 430)

When all Spark VCores are consumed, new jobs receive HTTP 430 errors. Formula: 1 Capacity Unit = 2 Spark VCores.

  1. Check current utilization in the Monitoring Hub
  2. Cancel idle or stuck Spark sessions
  3. Consider upgrading capacity SKU if sustained
  4. Enable queueing for pipeline and Spark Job Definition workloads

For queue limits by SKU, see capacity-throttling.md.
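The capacity formula lends itself to a quick worked check. As a minimal sketch, assuming an F64 capacity provides 64 Capacity Units, the stated ratio gives 128 Spark VCores. The function names are hypothetical, and the sketch applies only the 1 CU = 2 VCores formula, ignoring any platform bursting or queueing behavior.

```python
SPARK_VCORES_PER_CU = 2  # from the formula: 1 Capacity Unit = 2 Spark VCores


def spark_vcores(capacity_units: int) -> int:
    """Return the Spark VCores provided by a capacity with the given Capacity Units."""
    if capacity_units <= 0:
        raise ValueError("capacity_units must be positive")
    return capacity_units * SPARK_VCORES_PER_CU


def fits_in_capacity(capacity_units: int, vcores_in_use: int, vcores_requested: int) -> bool:
    """Return True if a new job's VCores fit within the remaining capacity."""
    return vcores_in_use + vcores_requested <= spark_vcores(capacity_units)
```

With 100 VCores already in use on a 64 CU capacity, a job requesting 28 VCores fits exactly, while one requesting 32 does not and would be a candidate for an HTTP 430 response.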

Remediation Quick Reference

| Symptom | Likely Cause | First Action |
| --- | --- | --- |
| Spark startup >2 min | Private Link/VNet enabled | Expected; Starter Pools unavailable |
| Connection timeout from Spark | Firewall blocking Fabric subnet | Open required ports (1433 for SQL) |
| DNS resolves to public IP | Private DNS zone not linked | Add A record pointing to private IP |
| MPE status = Failed | PLS rejected or deleted | Re-create MPE, verify PLS exists |
| HTTP 430 error | Capacity VCores exhausted | Cancel jobs or upgrade SKU |
| PyPI install blocked | Outbound access protection | Upload packages as custom libraries |
| Cross-workspace env fails | Network settings mismatch | Ensure same capacity and network config |
| OneLake API 403 | Endpoint URL validation | Use *.dfs.fabric.microsoft.com |

References

Source

https://github.com/PatrickGallucci/fabric-skills/blob/main/skills/fabric-network-remediate/SKILL.md

Overview

A systematic toolkit for diagnosing and resolving Microsoft Fabric network performance issues across Spark, OneLake, Data Warehouse, Pipelines, and Dataflows. It walks through connectivity, DNS, firewall, and Spark startup checks to reduce timeouts and throttling.

How This Skill Works

The skill provides structured workflows and automated checks via Test-FabricNetworkHealth.ps1 to surface DNS, endpoint, and latency problems. It covers Spark startup delays, private endpoint connectivity, and firewall allowlisting. Hands-on remediation requires PowerShell 7+, the Az module, and the appropriate Fabric roles.

Quick Start

  1. Run Spark startup diagnosis: .\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType SparkStartup
  2. Validate Managed Private Endpoint connectivity: .\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType PrivateEndpoint
  3. Audit firewall endpoints: .\scripts\Test-FabricNetworkHealth.ps1 -CheckType FirewallEndpoints

Best Practices

  • Verify prerequisites and permissions before making network changes
  • Run automated health checks with Test-FabricNetworkHealth.ps1 for each scenario
  • Validate that DNS resolution returns private IPs (10.x.x.x, 172.16-31.x.x, or 192.168.x.x) when using private endpoints
  • Align firewall rules and service tags with the documented endpoint references
  • Document remediation steps and re-run health checks to confirm improvements

Example Use Cases

  • Diagnose and remediate Spark startup delays caused by Private Link/VNet configurations
  • Resolve DNS resolution returning public IPs by adjusting DNS zone or private link routing
  • Approve Managed Private Endpoints and re-run connectivity tests to restore access
  • Address OneLake API 403 or timeout errors by auditing firewall endpoints and service tags
  • Mitigate capacity throttling (HTTP 430) in cross-workspace network scenarios
