fabric-network-remediate
npx machina-cli add skill PatrickGallucci/fabric-skills/fabric-network-remediate --openclaw
Microsoft Fabric Network Performance Remediation
Systematic toolkit for diagnosing and resolving network performance issues across Microsoft Fabric workloads including Spark, OneLake, Data Warehouse, Pipelines, and Dataflows.
When to Use This Skill
- Fabric Spark sessions take longer than expected to start (>10 seconds)
- Connection timeouts to external data sources from notebooks or pipelines
- Managed private endpoint status shows Pending or Failed
- DNS resolution returns public IPs instead of private IPs
- Outbound access protection blocks required dependencies (PyPI, Conda)
- On-premises data gateway connectivity failures
- OneLake API calls returning 403 or timeout errors
- Capacity throttling errors (HTTP 430)
- Dataflow Gen2 staging failures behind firewalls
- Cross-workspace environment attachment failures due to network mismatch
Prerequisites
- PowerShell 7+ with the Az module installed (Install-Module Az -Scope CurrentUser)
- Fabric Admin or Workspace Admin role for network configuration changes
- Azure portal access for Private Link Service and DNS zone management
- Network access to run nslookup, Test-NetConnection, and Resolve-DnsName
Step-by-Step Workflows
Workflow 1: Diagnose Spark Session Startup Delays
Spark startup times vary based on networking configuration. Consult the reference table:
| Scenario | Typical Startup Time |
|---|---|
| Default settings, no libraries | 5-10 seconds |
| Default settings + library dependencies | 5-10 sec + 30 sec-5 min |
| High traffic in region, no libraries | 2-5 minutes |
| High traffic + library dependencies | 2-5 min + 30 sec-5 min |
| Network security (Private Links/VNet) | 2-5 minutes |
| Network security + library dependencies | 2-5 min + 30 sec-5 min |
Run the diagnostic script for automated assessment:
.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType SparkStartup
When Private Links or Managed VNets are enabled, Starter Pools are unavailable and Fabric must create clusters on demand, adding 2-5 minutes to session start time.
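The reference table can be encoded as a small lookup so that an observed startup time is compared against the expected range for its scenario. This is a minimal sketch: the scenario keys and function names are hypothetical, and the numbers are simply the table values converted to seconds.

```python
# Expected Spark session startup ranges in seconds, transcribed from the table above.
# The scenario keys are illustrative names, not Fabric identifiers.
BASE_RANGE = {
    "default": (5, 10),
    "high_traffic": (120, 300),
    "network_security": (120, 300),
}
# Library dependencies add roughly 30 seconds to 5 minutes in every scenario.
LIBRARY_OVERHEAD = (30, 300)


def expected_startup_seconds(scenario: str, has_libraries: bool = False) -> tuple[int, int]:
    """Return the (low, high) expected startup time in seconds for a scenario."""
    low, high = BASE_RANGE[scenario]
    if has_libraries:
        low += LIBRARY_OVERHEAD[0]
        high += LIBRARY_OVERHEAD[1]
    return low, high


def is_startup_abnormal(observed_seconds: float, scenario: str, has_libraries: bool = False) -> bool:
    """True when the observed startup time exceeds the upper bound for the scenario."""
    _, high = expected_startup_seconds(scenario, has_libraries)
    return observed_seconds > high
```

With these values, a 4-minute startup is within the expected range for a workspace with Private Links enabled, but would be worth investigating under default settings.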
Workflow 2: Validate Managed Private Endpoint Connectivity
- Navigate to Fabric workspace Settings > Network security
- Under Managed private endpoints, verify Status shows Approved
- If Pending or Failed, see private-endpoint-remediate.md
- Validate DNS routing from a Fabric Notebook:
nslookup sqlserver.corp.contoso.com
Confirm the returned IP is in a private range (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16), not a public address.
- Run the automated validation:
.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType PrivateEndpoint
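The DNS check in this workflow can also be done programmatically from a notebook cell. The sketch below resolves a hostname and reports whether each returned IPv4 address falls in an RFC 1918 private range. The function names are hypothetical, and it assumes the notebook uses the same resolver that the Spark workload does.

```python
import ipaddress
import socket

# RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """True if the address is inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a hostname to its unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_private_resolution(hostname: str) -> dict[str, bool]:
    """Map each resolved address to whether it is private."""
    return {address: is_rfc1918(address) for address in resolve_ipv4(hostname)}
```

Calling `check_private_resolution("sqlserver.corp.contoso.com")` should return only `True` values when the managed private endpoint is routing correctly. Any `False` entry means the name still resolves to a public address.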
Workflow 3: Configure Firewall Allowlisting
Fabric requires specific endpoints and service tags. Run the firewall audit script:
.\scripts\Test-FabricNetworkHealth.ps1 -CheckType FirewallEndpoints
For the complete endpoint reference, see firewall-endpoints.md.
Key service tags for Azure Firewall / NSG rules:
| Tag | Purpose | Direction |
|---|---|---|
| PowerBI | Fabric core services | Both |
| DataFactory | Pipeline operations | Both |
| PowerQueryOnline | Dataflow processing | Both |
| SQL | Warehouse connectivity | Outbound |
| EventHub | Real-Time Analytics | Outbound |
| KustoAnalytics | Real-Time Analytics | Both |
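After the firewall rules are updated, a plain TCP connection test confirms whether a given host and port are actually reachable. This is roughly the check that Test-NetConnection performs with -Port. The sketch below is not part of the skill's scripts, and its helper names are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe(targets: list[tuple[str, int]], timeout: float = 5.0) -> dict[str, bool]:
    """Check several (host, port) pairs and report reachability for each."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in targets}
```

For example, `probe([("sqlserver.corp.contoso.com", 1433)])` tests SQL connectivity on its default port. A `False` result points at a firewall, NSG, or DNS problem, not at an authentication issue.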
Workflow 4: Troubleshoot Outbound Access Protection
When outbound access protection is enabled, public repositories (PyPI, Conda) are blocked. To install libraries in secured environments:
- Prepare a requirements.txt on a machine with internet access
- Download packages and dependencies using pip:
pip download -r requirements.txt -d ./packages
- Upload packages as custom libraries in the Fabric Environment
- See outbound-access-guide.md for detailed steps
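Before uploading, it can help to confirm that every package named in requirements.txt has a matching file in the download directory. The sketch below is an illustrative check with hypothetical helper names. It compares top-level requirement names only and assumes standard wheel and sdist file naming, so it does not verify versions or transitive dependencies.

```python
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503 style) so that - _ . compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def required_names(requirements_text: str) -> set[str]:
    """Extract top-level package names from requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def downloaded_names(package_dir: str) -> set[str]:
    """Collect distribution names from wheel and sdist files in a directory."""
    names = set()
    for path in Path(package_dir).iterdir():
        if path.suffix == ".whl":
            # Wheel files are named <name>-<version>-<tags>.whl
            names.add(normalize(path.name.split("-", 1)[0]))
        elif path.name.endswith((".tar.gz", ".zip")):
            # Source distributions are named <name>-<version>.<ext>
            stem = re.sub(r"\.(tar\.gz|zip)$", "", path.name)
            names.add(normalize(stem.rsplit("-", 1)[0]))
    return names


def missing_packages(requirements_text: str, package_dir: str) -> list[str]:
    """Return required package names with no downloaded file."""
    return sorted(required_names(requirements_text) - downloaded_names(package_dir))
```

An empty result from `missing_packages(Path("requirements.txt").read_text(), "./packages")` suggests the download step covered every listed package.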
Workflow 5: Resolve Capacity Throttling (HTTP 430)
When all Spark VCores are consumed, new jobs receive HTTP 430 errors. Formula: 1 Capacity Unit = 2 Spark VCores.
- Check current utilization in the Monitoring Hub
- Cancel idle or stuck Spark sessions
- Consider upgrading capacity SKU if sustained
- Enable queueing for pipeline and Spark Job Definition workloads
For queue limits by SKU, see capacity-throttling.md.
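The capacity formula can be worked through in code to estimate whether a new job fits. This is a simplified sketch with hypothetical function names. It applies only the 1 CU = 2 Spark VCores rule stated above and ignores queueing and any other admission behavior, so treat the result as a rough guide.

```python
SPARK_VCORES_PER_CU = 2  # 1 Capacity Unit = 2 Spark VCores


def spark_vcores(capacity_units: int) -> int:
    """Total Spark VCores provided by a capacity of the given size."""
    return capacity_units * SPARK_VCORES_PER_CU


def would_exceed_capacity(capacity_units: int, vcores_in_use: int, vcores_requested: int) -> bool:
    """True if the request would push usage past the capacity's Spark VCores."""
    return vcores_in_use + vcores_requested > spark_vcores(capacity_units)
```

For an F64 SKU (64 Capacity Units), `spark_vcores(64)` gives 128 VCores. With 120 VCores already in use, a job requesting 16 more would exceed the limit under this simplified model.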
Remediation Quick Reference
| Symptom | Likely Cause | First Action |
|---|---|---|
| Spark startup >2 min | Private Link/VNet enabled | Expected; Starter Pools unavailable |
| Connection timeout from Spark | Firewall blocking Fabric subnet | Open required ports (1433 for SQL) |
| DNS resolves to public IP | Private DNS zone not linked | Add A record pointing to private IP |
| MPE status = Failed | PLS rejected or deleted | Re-create MPE, verify PLS exists |
| HTTP 430 error | Capacity VCores exhausted | Cancel jobs or upgrade SKU |
| PyPI install blocked | Outbound access protection | Upload packages as custom libraries |
| Cross-workspace env fails | Network settings mismatch | Ensure same capacity and network config |
| OneLake API 403 | Endpoint URL validation | Use *.dfs.fabric.microsoft.com |
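For the OneLake 403 row, a quick check that a URL targets the expected host can rule out endpoint mistakes before a deeper investigation. The sketch below is illustrative: the function name is hypothetical, and accepting only the `https` and `abfss` schemes is an assumption, not a documented rule.

```python
from urllib.parse import urlparse

ONELAKE_DFS_SUFFIX = ".dfs.fabric.microsoft.com"
ALLOWED_SCHEMES = {"https", "abfss"}  # assumption: the schemes commonly used with OneLake


def is_onelake_dfs_url(url: str) -> bool:
    """True if the URL's host matches *.dfs.fabric.microsoft.com."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ALLOWED_SCHEMES and host.endswith(ONELAKE_DFS_SUFFIX)
```

The check inspects the parsed hostname and does not search the whole string, so a URL that only mentions the OneLake domain in its path is rejected.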
References
Source
https://github.com/PatrickGallucci/fabric-skills/blob/main/skills/fabric-network-remediate/SKILL.md
Overview
A systematic toolkit for diagnosing and resolving Microsoft Fabric network performance issues across Spark, OneLake, Data Warehouse, Pipelines, and Dataflows. It guides you through connectivity, DNS, firewall, and Spark startup delays to reduce timeouts and throttling.
How This Skill Works
The skill provides structured workflows and automated checks, run through Test-FabricNetworkHealth.ps1, that surface DNS, endpoint, and latency problems. It covers Spark startup delays, private endpoint connectivity, and firewall allowlisting, and requires PowerShell 7+, the Az module, and the appropriate Fabric roles.
Quick Start
- Step 1: Run Spark startup diagnosis: .\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType SparkStartup
- Step 2: Validate Managed Private Endpoint connectivity: .\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType PrivateEndpoint
- Step 3: Audit firewall endpoints: .\scripts\Test-FabricNetworkHealth.ps1 -CheckType FirewallEndpoints
Best Practices
- Verify prerequisites and permissions before making network changes
- Run automated health checks with Test-FabricNetworkHealth.ps1 for each scenario
- Validate that DNS resolution returns private IPs (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16) when using private endpoints
- Align firewall rules and service tags with the documented endpoint references
- Document remediation steps and re-run health checks to confirm improvements
Example Use Cases
- Diagnose and remediate Spark startup delays caused by Private Link/VNet configurations
- Resolve DNS resolution returning public IPs by adjusting DNS zone or private link routing
- Approve Managed Private Endpoints and re-run connectivity tests to restore access
- Address OneLake API 403 or timeout errors by auditing firewall endpoints and service tags
- Mitigate capacity throttling (HTTP 430) in cross-workspace network scenarios