Workload Isolation & the Blast Radius: When to Split Capacities
By Jonathan Flach · Published 2026-06-20 · Reviewed 2026-06-20
An idle F64 capacity costs $11.52 per hour. If one department's overnight Spark job burns through the 24-hour smoothed CU budget before 9 AM, every other team on that capacity — their report loads, pipeline runs, warehouse queries — hits background rejection until the debt clears. That is the blast radius in practice: a single workload's debt, a capacity-wide outage.
Fabric workload isolation is not a switch you turn on — it is an architectural choice with real cost on both sides. There is no native per-workspace CU reservation. The only hard isolation boundary is a separate capacity, and each capacity that needs free Power BI viewers must independently reach F64 ($8,409.60/month PAYG, as of June 2026). This article gives you the mechanism, the math, and a structured decision guide for when splitting capacities actually justifies that cost — and when pooling with surge protection is the smarter answer.
For the full monitoring picture — including how to detect blast-radius events in real time — see the complete guide to Microsoft Fabric capacity monitoring. For the topology cost trade-offs, see the single-vs-multiple-capacities analysis. For the throttling mechanics that drive the blast radius, see Fabric throttling explained.
How the blast radius actually works
Every workload on a Fabric capacity draws from one shared CU pool. Fabric applies smoothing to spread usage across time: interactive operations are smoothed over a 5-minute minimum window; background operations (Spark jobs, pipeline runs, semantic model refreshes) are smoothed over 24 hours (Understand your Fabric capacity throttling, Microsoft Learn, checked June 2026). This smoothing is what lets a capacity "burst" — a workload can temporarily use more CUs than the per-second SKU limit, borrowing against future capacity — but it is also what creates carry-forward debt that triggers throttling.
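To make the 24-hour smoothing concrete, here is a minimal sketch that spreads one background job's total CU-seconds evenly across 24 hours and expresses the result as a share of the capacity's per-second CU rate. It assumes that an F-SKU's number equals its CU count (F64 = 64 CUs) and that smoothing is a simple even spread. The function name and the example job size are illustrative and do not come from Microsoft's documentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400-second background smoothing window


def smoothed_background_share(job_cu_seconds: float, sku_cus: int) -> float:
    """Fraction of the capacity's per-second CU rate that one background job
    occupies while its usage is smoothed over 24 hours.

    Assumes an even spread across the window (a simplification).
    """
    smoothed_cu_per_second = job_cu_seconds / SECONDS_PER_DAY
    return smoothed_cu_per_second / sku_cus


# Hypothetical overnight Spark job that consumed 1,000,000 CU-seconds on an F64.
share = smoothed_background_share(1_000_000, sku_cus=64)
print(f"{share:.1%} of the F64 budget is committed for the next 24 hours")
```

Under these assumptions, a handful of jobs of this size would commit most of the day's budget before any interactive traffic arrives, which is how a single overnight run turns into carry-forward debt.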
Throttling is keyed to future-capacity time windows, not utilization percentages:
| Stage | Future-usage window exceeded | What every workspace on the capacity experiences |
|---|---|---|
| Overage protection | Up to 10 min of future capacity consumed | Burst absorbed silently; no user impact |
| Interactive delay | 10–60 min carry-forward | ~20-second delay added to every interactive request |
| Interactive rejection | 60 min–24 h carry-forward | All new interactive requests rejected across the capacity |
| Background rejection | >24 h carry-forward | All requests rejected (background and interactive) across the capacity |
Source: Understand your Fabric capacity throttling, Microsoft Learn, checked June 2026.
The key word in that table is "every workspace." The throttling system operates at the capacity level, not the workspace level. When the background rejection threshold is crossed because one Dataflow Gen2 job or one overnight Spark run consumed the 24-hour budget, production pipelines across the entire capacity freeze. Executive dashboards fail. Developers cannot deploy. The offending workspace and the carefully governed reporting workspace are treated identically.
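The stage table can be read as a simple lookup on carry-forward minutes. The sketch below encodes the thresholds from the table. The function name is hypothetical, and the handling of values that fall exactly on a boundary (10 or 60 minutes, 24 hours) is an assumption, since the table does not say which side they belong to.

```python
def throttling_stage(carry_forward_minutes: float) -> str:
    """Map minutes of future capacity already consumed to a throttling stage.

    Thresholds follow the table above; treating each boundary as inclusive
    of the milder stage is an assumption.
    """
    if carry_forward_minutes <= 10:
        return "overage protection"
    if carry_forward_minutes <= 60:
        return "interactive delay"
    if carry_forward_minutes <= 24 * 60:
        return "interactive rejection"
    return "background rejection"


for minutes in (5, 45, 300, 1500):
    print(minutes, "min ->", throttling_stage(minutes))
```

The function takes no workspace argument: the stage is a property of the capacity, so every workspace on it gets the same answer.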
There is no native mechanism that says "throttle this workspace's background operations but protect that one." Workspace-level surge protection (preview, announced January 2026) changes the picture somewhat, but it is specifically a blocking gate on background usage, not a CU reservation (Workspace-level surge protection controls, Microsoft Learn, checked June 2026). A workspace that hits its admin-set CU percentage cap gets blocked — its operations rejected — rather than gently throttled. And Mission Critical mode exempts a workspace from workspace-level blocking, but does not protect it from capacity-level throttling. If the capacity itself is exhausted, Mission Critical workspaces throttle too.
The only hard isolation is a separate capacity.
Workspace-level surge protection: what it does and doesn't do
Surge protection is worth configuring on any shared capacity — it is the lowest-cost blast-radius limiter available without buying a separate SKU. Understanding its actual scope matters.
What it does:
- Lets capacity admins set a per-workspace maximum CU percentage consumed over a 24-hour rolling window
- Automatically blocks any workspace that exceeds its threshold — rejecting all operations from that workspace until the block period expires
- Supports Mission Critical designation to exempt high-priority workspaces from workspace-level blocking rules
- Applies an independent background rejection threshold at the capacity level (e.g. block background operations at 70% of the 24-hour budget, not 100%)
What it does not do:
- Does not give a workspace a guaranteed minimum CU allocation
- Does not prevent capacity-level throttling if total aggregate demand exceeds the SKU
- Does not protect Mission Critical workspaces from capacity-level throttling
- Does not monitor interactive consumption: the CU detection threshold tracks background consumption only (24-hour rolling window). Once a workspace is placed in the Blocked state, however, all operations (interactive and background) are rejected until the block period expires
The practical configuration: mark your production reporting workspaces Mission Critical (so they are never auto-blocked by workspace surge protection), set the capacity-level background rejection threshold to something like 70–80% rather than waiting for full budget exhaustion, and cap dev/test workspaces at 20–30% of the daily CU budget. This does not eliminate the blast radius (a dev workspace that burns 30% of the budget still contributes to capacity-level exhaustion), but it prevents a single workspace from solo-consuming the entire pool.
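The interaction between workspace caps, Mission Critical, and the capacity-level threshold can be sketched as a toy model. This is not the Fabric admin API. The `Workspace` class, the `evaluate_surge_protection` function, and the sample percentages are all hypothetical, and the model only mirrors the rules described in this section.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Workspace:
    name: str
    background_pct: float             # % of the 24-hour CU budget used by background work
    cap_pct: Optional[float] = None   # admin-set workspace cap; None means uncapped
    mission_critical: bool = False    # exempt from workspace-level blocking only


def evaluate_surge_protection(
    workspaces: List[Workspace], capacity_threshold_pct: float
) -> Tuple[List[str], bool]:
    """Return (names of blocked workspaces, whether the capacity-level
    background rejection threshold has been reached)."""
    blocked = [
        w.name
        for w in workspaces
        if w.cap_pct is not None
        and not w.mission_critical
        and w.background_pct > w.cap_pct
    ]
    # The capacity-level check sums everyone's usage, Mission Critical included.
    total_pct = sum(w.background_pct for w in workspaces)
    return blocked, total_pct >= capacity_threshold_pct


workspaces = [
    Workspace("prod-reports", background_pct=25, mission_critical=True),
    Workspace("etl", background_pct=20, cap_pct=40),
    Workspace("dev-sandbox", background_pct=32, cap_pct=30),
]
blocked, capacity_rejecting = evaluate_surge_protection(workspaces, capacity_threshold_pct=75)
print(blocked, capacity_rejecting)
```

In this example the dev workspace is blocked for exceeding its cap, yet the capacity-level threshold is still reached because the combined usage is 77%. Caps limit any one workspace; they do not shrink aggregate demand.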
The F64 multiplier every split topology must price
Before deciding to split capacities, price the F64 penalty. The free-viewer entitlement is per capacity, not per tenant. A free Fabric license can view Power BI reports only on workspaces assigned to an F64 or larger capacity (Understand Microsoft Fabric Licenses, Microsoft Learn, checked June 2026). Workspaces on any sub-F64 capacity require Power BI Pro ($14/user/mo) or PPU ($24/user/mo) for every report viewer.
If you split one F64 into two capacities to isolate two departments, and both departments have report viewers, both capacities must be F64 or higher. That doubles the base capacity cost before you count any workload.
The break-even math, using PAYG prices as of June 2026 (estimates):
| Split scenario | Isolation premium vs. pooling | Pro viewer break-even (sub-F64 option) |
|---|---|---|
| F64 → two F64s | +$8,409.60/mo | — (both capacities are F64; no per-user cost) |
| F64 → F64 + F32 | +$4,204.80/mo capacity | 301 users on Pro vs. paying the F32→F64 incremental cost |
| F64 → F64 + F16 | +$2,102.40/mo capacity | 451 users on Pro vs. paying the F16→F64 incremental cost |
| F64 → F64 + F2 | +$262.80/mo capacity | 582 users on Pro vs. paying the F2→F64 incremental cost |
Break-even = ceil((F64 PAYG − smaller SKU PAYG) / $14 per user per month). For example: ceil(($8,409.60 − $4,204.80) / $14) = ceil(300.34) = 301 users. All figures PAYG, June 2026 estimates.
A department with 250 report viewers, isolated on an F32, pays $4,204.80/mo (capacity) + $3,500/mo (250 × $14 Pro) = $7,704.80/mo. Putting those same 250 viewers on an F64 costs $8,409.60/mo with no per-user license cost — $704.80 more per month for a full blast-radius boundary and 2× the CU headroom. At 301 viewers, an F64 is cheaper than F32 + Pro licenses. That is the break-even that makes the isolation upgrade self-financing.
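The break-even formula and the 250-viewer example can be reproduced with a few lines of Python. This sketch uses the article's June 2026 PAYG estimates as constants and `Decimal` to keep the currency arithmetic exact. The function names are illustrative.

```python
import math
from decimal import Decimal

F64_PAYG = Decimal("8409.60")      # $/month, article's June 2026 estimate
PRO_PER_USER = Decimal("14")       # $/user/month for Power BI Pro
SMALLER_SKUS = {
    "F32": Decimal("4204.80"),
    "F16": Decimal("2102.40"),
    "F2": Decimal("262.80"),
}


def break_even_viewers(smaller_sku_payg: Decimal) -> int:
    """Viewer count at which upgrading to F64 beats the smaller SKU plus Pro licenses."""
    return math.ceil((F64_PAYG - smaller_sku_payg) / PRO_PER_USER)


def sub_f64_monthly_cost(sku_payg: Decimal, viewers: int) -> Decimal:
    """Capacity cost plus one Pro license per report viewer."""
    return sku_payg + PRO_PER_USER * viewers


for sku, price in SMALLER_SKUS.items():
    print(sku, break_even_viewers(price))   # F32 301, F16 451, F2 582

f32_with_250 = sub_f64_monthly_cost(SMALLER_SKUS["F32"], 250)
print(f32_with_250, F64_PAYG - f32_with_250)  # 7704.80 704.80
```

Swap in your own negotiated or reserved prices before relying on the output; the break-even point moves with every dollar of discount on either side.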
The split-vs-pool decision guide
This decision guide is a structured scoring model for deciding when the blast radius justifies separate capacities and when pooling with surge protection is the right answer. It combines the F64 multiplier math, the limits of workspace-level surge protection, and the throttling stage mechanics into a single decision surface. Score each dimension for your situation.
Run through each factor and note whether it pushes toward Pool (stay on one capacity) or Split (buy a separate capacity).
Factor 1: Blast-radius evidence from your actual data
Pool if: Pulling 14 days of compute detail from the Capacity Metrics app shows no single workspace consuming more than 30–35% of the daily CU budget on a consistent basis. The blast radius exists in theory but has not materialized.
Split if: One workspace or department routinely accounts for 40%+ of daily CU consumption and the workload cannot be optimized — the smoothed budget is one bad run away from background rejection for everyone else.
Verdict weight: High. This is the deciding factor. Theoretical blast radius without empirical evidence is not a reason to double your capacity cost.
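One way to turn the Factor 1 test into a number is to count, per workspace, the days on which it accounted for at least 40% of that day's CU consumption. The sketch below assumes you have exported usage as rows of (date, workspace, CU-seconds). That row shape and the function name are hypothetical and do not reflect the Capacity Metrics app's actual export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (date, workspace, cu_seconds), an assumed shape


def days_over_threshold(rows: Iterable[Row], threshold: float = 0.40) -> Dict[str, int]:
    """Count the days on which each workspace's share of that day's total
    CU consumption met or exceeded the threshold."""
    day_total: Dict[str, float] = defaultdict(float)
    day_workspace: Dict[Tuple[str, str], float] = defaultdict(float)
    for day, workspace, cu_seconds in rows:
        day_total[day] += cu_seconds
        day_workspace[(day, workspace)] += cu_seconds

    counts: Dict[str, int] = defaultdict(int)
    for (day, workspace), cu_seconds in day_workspace.items():
        if day_total[day] > 0 and cu_seconds / day_total[day] >= threshold:
            counts[workspace] += 1
    return dict(counts)


sample = [
    ("2026-06-01", "spark-etl", 600.0), ("2026-06-01", "reports", 250.0), ("2026-06-01", "dev", 150.0),
    ("2026-06-02", "spark-etl", 350.0), ("2026-06-02", "reports", 350.0), ("2026-06-02", "dev", 300.0),
    ("2026-06-03", "spark-etl", 500.0), ("2026-06-03", "reports", 300.0), ("2026-06-03", "dev", 200.0),
]
print(days_over_threshold(sample))
```

This measures share of consumption. To measure share of the daily budget instead, divide by the capacity's 24-hour CU-second budget rather than by the day's total.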
Factor 2: Workload class mismatch
Pool if: Your workloads have complementary peak patterns — nightly Spark jobs run when daytime BI is quiet; weekend pipeline bursts settle before Monday morning report loads. The pooled SKU serves all of them with headroom to spare.
Split if: You have always-on, high-CU workloads that overlap with interactive BI peaks — Spark processing running at 9 AM when analysts are loading reports. No surge protection setting fixes overlapping peaks; only a separate capacity gives each workload its own uncontested CU pool.
Factor 3: Dev/test blast radius risk
Pool if: Dev/test workspaces are small, well-governed, and capped via workspace-level surge protection at a low percentage of the daily budget. Developers are not running overnight Spark experiments.
Split if: Dev/test workspaces run experimental notebooks, unoptimized Dataflow Gen2 refreshes, or large-scale pipeline tests that have caused or could cause prod throttling. A separate small PAYG dev capacity — an F4 or F8, pauseable when idle — costs $525.60 or $1,051.20/mo PAYG and keeps dev mistakes off the production blast radius entirely. Dev capacities do not need to reach F64; developers do not need free viewer licenses.
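The cost of a pauseable dev capacity depends on how many hours it actually runs. The sketch below estimates a monthly PAYG bill from active hours, assuming a 730-hour billing month (the basis implied by $8,409.60/mo and $11.52/hour for F64). The schedule in the example is hypothetical, and the estimate ignores any carry-forward overage settled when the capacity is paused.

```python
HOURS_PER_MONTH = 730  # implied by $8,409.60/mo divided by $11.52/hour for F64


def payg_cost_with_pausing(monthly_payg: float, active_hours: float) -> float:
    """Estimated monthly cost when the capacity runs only `active_hours`."""
    hourly_rate = monthly_payg / HOURS_PER_MONTH
    return round(hourly_rate * active_hours, 2)


# F4 running 10 hours a day on 22 working days (hypothetical schedule).
f4_paused_cost = payg_cost_with_pausing(525.60, active_hours=10 * 22)
print(f4_paused_cost)
```

At this schedule the F4 runs for well under a third of the month, so the isolation boundary for dev/test costs a fraction of the always-on price.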
Factor 4: Chargeback and governance requirements
Pool if: Cost allocation can be satisfied at the workspace/department level by the Fabric Chargeback app's daily-refresh, workspace-grain view. You need visibility, not hard ring-fencing.
Split if: Departments own their own P&Ls and require independent pause/resume cycles, hard cost caps, or the ability to audit and dispute their specific capacity spend without sharing a pool with other departments. The Chargeback app shows allocation from a shared pool; it does not prevent one department's workload from consuming another's share.
Factor 5: Viewer licensing and the F64 multiplier
Pool if: All report viewers are on a single capacity that is already F64 or higher. Splitting would require a second F64 and the isolation premium is larger than the blast-radius cost.
Split if: A department has enough viewers that Pro licenses on a sub-F64 capacity cost more than upgrading to F64. Run the break-even: if ceil((F64 PAYG − your current SKU PAYG) / $14) ≤ your viewer count, the F64 upgrade pays for itself in eliminated Pro licenses. At that point, the isolation boundary is essentially free.
Reading the guide
| Factors pushing Split | Recommendation |
|---|---|
| 0–1 | Pool. Configure workspace-level surge protection and monitor CU distribution. |
| 2 | Pool with aggressive surge protection caps. Revisit in 90 days with fresh Capacity Metrics data. |
| 3 | Hybrid: keep prod pooled on F64; move dev/test to a separate small PAYG capacity. |
| 4–5 | Split. The blast radius or cost structure justifies separate capacities. Price the F64 multiplier per split before committing. |
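The scoring table maps directly to a small lookup. This sketch counts how many of the five factors push toward Split and returns the matching recommendation. The factor keys and function name are illustrative, and the answers in the example are hypothetical.

```python
from typing import Dict


def recommend(split_factor_count: int) -> str:
    """Return the recommendation for a count of factors pushing toward Split."""
    if not 0 <= split_factor_count <= 5:
        raise ValueError("the guide scores exactly five factors")
    if split_factor_count <= 1:
        return "Pool"
    if split_factor_count == 2:
        return "Pool with aggressive surge protection caps"
    if split_factor_count == 3:
        return "Hybrid"
    return "Split"


# True means the factor pushes toward Split (hypothetical answers).
factors: Dict[str, bool] = {
    "blast_radius_evidence": True,
    "workload_class_mismatch": False,
    "dev_test_risk": True,
    "chargeback_governance": False,
    "viewer_licensing": True,
}
score = sum(factors.values())
print(score, recommend(score))
```

The count treats every factor equally. Since Factor 1 carries the highest weight in the guide, a Split vote there deserves more attention than the raw score suggests.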
What to do if a blast-radius event is happening right now
Throttling is active. Every team on the capacity is affected. Here is the correct response order:
- Open the Capacity Metrics app. The Compute page shows the top-consuming items over the current throttling window. Identify the offending workspace and item.
- Kill or pause the offending job. Cancel the Spark session, stop the pipeline run, or cancel the Dataflow Gen2 refresh. Stopping the workload stops new CU debt from accumulating; the carry-forward clears as time passes.
- Scale up the SKU temporarily if the workload is legitimate and time-sensitive. Scaling from F64 to F128 doubles the CU budget and can clear interactive throttling immediately. Scale back down once the job completes.
- Do not pause the capacity to clear the throttle. Pausing does technically end the active throttle — it resets the capacity. But it bills all accumulated smoothed carry-forward overage immediately at PAYG rates. On a reserved capacity you pay the reservation fee plus the PAYG overage bill for the settled debt. This is the pause trap: you traded a throttle for an instant, non-discountable overage charge. Fix the workload; do not pause.
- After the incident, block the offending workspace via workspace-level surge protection until you can fix the workload. Then configure a CU percentage cap for that workspace so it cannot solo-consume the daily budget again.
The enemy this article defeats: the throttling blast radius
The throttling blast radius is the canonical failure mode this guide is built around: one workload's CU debt, amplified by the 24-hour smoothing window, becomes a capacity-wide outage for every team on the shared pool. A single poorly optimized Dataflow Gen2 refresh or an overnight Spark job that nobody stopped can push the carry-forward past the 24-hour threshold. At that point, all requests across the capacity are rejected, background and interactive alike.
SpendWeave Pro surfaces this enemy because it keeps history beyond the 14-day window on the Capacity Metrics Compute page and, unlike the app's 30-day Item History preview, links CU consumption to specific pipeline runs and workspace ownership. A blast-radius event on Day 1 that repeats on Day 16 is invisible to the native tool; SpendWeave holds the pattern. The Capacity Metrics app attributes CU consumption at the item level, but it does not link OperationID to pipeline runs, and it cannot tell you which workspace has been responsible for 60% of your monthly CU spend across multiple incidents. SpendWeave's per-workspace CU attribution bridges that gap. You see the blast radius before it finds you.
What to do next
- Pull 14 days of compute detail from the Capacity Metrics app. Sort by workspace. If one workspace consistently represents 35%+ of daily CU consumption, you have a real blast-radius risk, not a theoretical one.
- Enable workspace-level surge protection today on any shared capacity. Set background rejection at 70–80% of the daily budget rather than 100%. Cap your highest-CU workspaces at 30–40% of the daily budget. Mark prod reporting workspaces Mission Critical. This is free and immediate.
- Run the split-vs-pool decision guide above. If you score 3 or higher, price the topology split. Factor in the F64 multiplier for every capacity that needs free viewers.
- If dev/test is your blast-radius risk: move it to a small PAYG capacity today. An F4 PAYG at $525.60/mo — paused outside working hours — is the cheapest isolation buy available.
- If you have never seen throttling: that does not mean the blast radius can't land. One unoptimized Power Query edit by a business analyst can consume the 24-hour budget. Know your largest workspace's CU percentage before you get the call.
Frequently asked questions
Does Microsoft Fabric support per-workspace CU isolation? Not natively. CUs are pooled across all workspaces on a single capacity. Workspace-level surge protection (preview, January 2026) can cap a workspace's background CU consumption against an admin-set threshold and block it if it exceeds that cap — but this is a blocking gate, not a CU reservation. The blocked workspace's operations are rejected; there is no soft guarantee of minimum CUs. The only true hard isolation is a separate capacity.
What is the blast radius in Microsoft Fabric? The blast radius is the scope of throttling impact when one workload over-consumes a capacity. Because CUs are shared with no per-workspace reservation, a single runaway pipeline or query can exhaust the 24-hour smoothed background budget for the entire capacity. At the background rejection stage (>24 h carry-forward), every request is refused for every workspace, not just the offending one: interactive report loads and queries as well as pipelines and Spark jobs. Well before that point, exceeding the 10-minute carry-forward window adds a ~20-second delay to every interactive request across the capacity.
When should I split into multiple Fabric capacities? Split when: (1) one department or workload routinely consumes more than 40% of the daily CU budget and you cannot fix the workload; (2) a dev/test workspace has caused or could cause prod throttling; (3) departments have hard chargeback requirements the Chargeback app's daily-refresh, workspace-grain view cannot satisfy; or (4) you need independent pause/resume cycles for cost control. Do not split purely for theoretical isolation if you haven't verified the blast radius from your actual Capacity Metrics data.
Does each Fabric capacity need its own F64 for free Power BI viewers? Yes. The free-viewer entitlement is per capacity, not per tenant. Workspaces on any capacity below F64 require Power BI Pro ($14/user/mo) or PPU ($24/user/mo) for every report viewer. If you split into two capacities and only one is F64, the other capacity's workspaces still require per-user licenses. Every capacity that needs free viewers must independently meet the F64 threshold.
Does pausing a capacity clear an active throttle? Technically yes — pausing resets the capacity and ends the throttle. But it is a costly trap: pausing settles all accumulated smoothed/carry-forward overage to your Azure bill immediately at PAYG rates. On a reserved capacity you pay the reservation fee plus the PAYG overage bill for the debt. The correct response to throttling is to scale up the SKU temporarily, kill the offending job, or fix the query — not to pause.
Researched with AI assistance, written and fact-checked by Jonathan Flach, verified against Microsoft Learn.