Pillar C
Alternatives to the Capacity Metrics App for Long-Term History & Attribution
The Capacity Metrics app keeps only 14 days of compute history. An honest build-vs-buy of the alternatives: SemPy vault, FUAM, and SpendWeave Pro.
You monitor Microsoft Fabric capacity with the free Capacity Metrics app — a Power BI app that shows CU utilization, throttling stages, and your top-consuming items. It keeps 14 days of 30-second compute detail on its Compute page, 30 days of item-level compute history on its Item History page (preview, August 2025), and 30 days of storage data. That native ceiling is enough for month-level triage and item-level trending, but it can't answer any question that spans more than 30 days, and it cannot link a cost spike to a specific pipeline run. Those gaps define where native monitoring stops and where external tooling begins. This guide maps exactly what the native tools do, where each one stops, and how to fill the gaps without throwing away the free stack that already works.
Native Fabric monitoring is not one tool — it's five, each built for a different question. The Capacity Metrics app answers "is my capacity overloaded right now and over the last 14 days." The Chargeback app answers "which department used what, refreshed daily." Real-Time hub capacity events answer "alert me the moment a capacity throttles." FUAM (the community Fabric Unified Admin Monitoring solution) and a SemPy extract answer "give me history the app deleted." Knowing which tool answers which question — and which questions none of them answer well — is the whole game.
The named enemy this pillar exists to defeat is the metrics-retention wall: the ceiling that makes long-term capacity history impossible natively. The Compute page keeps 14 days of 30-second timepoint detail; the Item History page (preview, August 2025) extends item-level compute visibility to 30 days; the Storage page keeps 30 days of storage data. None of those windows is configurable, and none reaches multi-year history or per-pipeline-run attribution. Two supporting enemies travel with the retention wall. The first is the attribution void: you can see that a pipeline cost CUs, but not reliably which run or whose. The second is the throttling blast-radius: one workload's debt throttles the whole capacity, because there's no native per-workspace CU isolation. Workspace-level surge protection shipped in preview in January 2026, but it throttles or limits a workspace's background usage against an admin-set threshold rather than giving each workspace a guaranteed CU reservation, so the capacity-wide blast radius still applies. Map the tools to the questions, name where each wall stands, and the monitoring strategy picks itself.
Five native (or near-native) surfaces carry Fabric capacity monitoring. Each is genuinely useful inside its lane.
The Capacity Metrics app is the operational source of truth. A capacity admin installs it as a Power BI app; its Compute page shows utilization and throttling over the last 14 days, and its Storage page tracks storage over 30 days (What is the Microsoft Fabric Capacity Metrics app?, Microsoft Learn, checked June 2026). Data is not live: "usage data becomes available within 10 to 15 minutes after the activity occurs," and dimensions like capacities, workspaces, and items refresh on a scheduled semantic-model refresh at midnight local time (Metrics app data latency, Microsoft Learn, checked June 2026). So a brand-new workspace won't even appear by name until the next nightly refresh. This is your sizing and triage tool — see the Capacity Metrics app walkthrough for reading it well — but it is a rolling 14-day window, not a record.
The Chargeback app answers the departmental question: which workspace or business unit consumed what share of the capacity. It's the right tool for showback — "Marketing used X% of the capacity, which is $Y of the monthly cost." But the data "isn't real-time; it's refreshed daily" (What is Microsoft Fabric Chargeback app?, Microsoft Learn, checked June 2026), and it inherits the metrics app's user-data masking: when an admin disables "Show user data in the Fabric Capacity Metrics app and reports," user emails are hidden across these surfaces (Capacity Metrics app, Microsoft Learn, checked June 2026). Useful for monthly cost allocation; not built for live or per-user views. We cover its limits in the Chargeback app vs. real attribution.
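The showback arithmetic behind "X% of the capacity, which is $Y of the monthly cost" can be sketched in a few lines. This is a minimal illustration, assuming you already hold CU-seconds per workspace for the month; the workspace names, CU figures, and monthly cost are made-up values, not Chargeback app output.

```python
def showback(cu_seconds_by_workspace, monthly_cost):
    """Split a monthly capacity cost across workspaces by their share of CU-seconds."""
    total = sum(cu_seconds_by_workspace.values())
    if total == 0:
        # Nothing ran, so nothing to allocate.
        return {name: (0.0, 0.0) for name in cu_seconds_by_workspace}
    allocation = {}
    for name, cu in cu_seconds_by_workspace.items():
        share = cu / total
        allocation[name] = (round(share * 100, 1), round(share * monthly_cost, 2))
    return allocation


# Hypothetical monthly figures, for illustration only.
usage = {"Marketing": 300_000, "Finance": 500_000, "Engineering": 200_000}
allocation = showback(usage, monthly_cost=5000.00)

for name, (pct, dollars) in allocation.items():
    print(f"{name}: {pct}% of CUs -> ${dollars:.2f}")
```

Allocating by consumed CU share is only one possible policy; it ignores idle capacity, which some teams spread evenly instead.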
Real-Time hub capacity overview events are the only near-real-time native signal. An active capacity emits a summary line every 30 seconds carrying CU, interactive-delay throttling percentage, and more, plus a state event when the capacity changes state: active, overloaded (throttling), or paused (Explore Fabric capacity overview events, Microsoft Learn, checked June 2026). You can route these into an Eventstream, store them in an Eventhouse for history, and fire a Data Activator alert on throttling, as covered in Fabric capacity events in the Real-Time hub. The feature is in preview, and a paused capacity emits nothing.
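To make the alerting idea concrete, here is a rough sketch of the kind of rule you might express in Data Activator, written as plain Python so the logic is visible. It is not Activator syntax. The field name `interactive_delay_pct`, the 80% threshold, and the three-event debounce are all assumptions chosen for illustration; the real event schema should be checked against the Real-Time hub documentation.

```python
from collections import deque


class ThrottleAlert:
    """Fire when the last N summary events all sit at or above a threshold."""

    def __init__(self, threshold_pct=80.0, consecutive=3):
        self.threshold_pct = threshold_pct
        # Remember only the most recent N pass/fail results.
        self.recent = deque(maxlen=consecutive)

    def observe(self, event):
        """Record one 30-second summary event; return True if the alert should fire."""
        # 'interactive_delay_pct' is a hypothetical field name.
        self.recent.append(event["interactive_delay_pct"] >= self.threshold_pct)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


alert = ThrottleAlert(threshold_pct=80.0, consecutive=3)
readings = [50.0, 85.0, 90.0, 95.0, 99.0]  # made-up sample values
fired = [alert.observe({"interactive_delay_pct": pct}) for pct in readings]
print(fired)
```

Requiring several consecutive readings trades about a minute of latency for fewer alerts on a single short burst.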
FUAM (Fabric Unified Admin Monitoring) is the community/Microsoft solution-accelerator pattern that wires the admin APIs and metrics into a lakehouse so you can keep history the app drops. And a SemPy / DAX extract is the do-it-yourself version: query the Capacity Metrics semantic model from a notebook with evaluate_dax, or call the Power BI executeQueries REST API, and land the result in your own store — see extracting Fabric metrics with SemPy. Both exist precisely because the app forgets.
Here is the honest map of what the native stack does and where it stops. Rows are the monitoring needs every Fabric admin eventually has; columns are the first four surfaces above plus Fabric Cost Analysis (FCA); the SemPy extract has no column of its own. ✓ = does it well; ~ = partial, with a catch; ✗ = doesn't do it. Marked as of June 2026 against Microsoft Learn.
| Monitoring need | Capacity Metrics app | Chargeback app | Real-Time hub events | FUAM | Fabric Cost Analysis (FCA) |
|---|---|---|---|---|---|
| Real-time CU usage | ~ (10–15 min lag) | ✗ (daily refresh) | ✓ (30-sec summary) | ✗ (batch ingest) | ✗ (Azure billing latency) |
| Throttling visibility | ✓ (Compute page, 14 d) | ✗ | ✓ (state + delay %) | ~ (only what you ingest) | ✗ |
| Long-term history (>14 days) | ✗ (14-day compute wall) | ~ (daily, limited window) | ~ (only if you persist to Eventhouse) | ✓ (you own the lakehouse) | ~ (cost only, not CU detail) |
| Per-pipeline / per-item attribution | ~ (item-level, not per-run) | ~ (by workspace/dept) | ✗ | ~ (item-level via APIs) | ✗ (subscription/SKU level) |
| Per-user attribution | ~ (masked if admin setting off) | ✗ (aggregated, can mask) | ✗ | ~ (activity events, item-scoped) | ✗ |
| Proactive alerting | ✗ (no alert engine) | ✗ | ✓ (Activator on events) | ✗ (report, not alerts) | ~ (Azure Budgets, cost only) |
| Cross-capacity rollup | ~ (per app install/scope) | ~ (per capacity) | ~ (per-capacity streams) | ✓ (tenant-wide by design) | ✓ (subscription-wide cost) |
Read the matrix down the long-term-history and per-run-attribution rows and the gap is unmistakable: the only ✓ for >14-day history is FUAM, which you have to build and operate yourself, and nothing delivers reliable per-run or clean per-user attribution. The native tools are excellent at the two things they're designed for — real-time health (events) and short-term operational triage (the metrics app) — and structurally weak at the two things FinOps actually needs over time: durable history and precise attribution. That's not a knock on free tools doing their job; it's the line where a purpose-built capacity vault begins.
The Capacity Metrics app has three retention windows. The Compute page is a rolling 14-day window of 30-second timepoint detail — the most granular view in the app. The Item History page (preview, August 2025) extends item-level compute visibility to a 30-day window, with workspace and item slicers but without sub-hour drill-through (Item History page, Microsoft Learn, checked June 2026). The Storage page is a rolling 30-day window for storage. None of these windows can be extended by any admin setting, and none reach multi-year history. Any question that spans more than 30 days — multi-month CU trend, quarterly forecasting, year-end chargeback reconciliation, "was last quarter worse than the same quarter a year ago" — is unanswerable from native tools. And any question that asks "which specific pipeline run caused a spike" is unanswerable at any retention depth, because the attribution is item-level, not run-level.
This is why FUAM and SemPy extracts exist at all. The fix is always the same shape: extract before the window closes. A scheduled SemPy notebook queries the Capacity Metrics semantic model with evaluate_dax (or the Power BI executeQueries API — one query per request, 100k-row cap) and appends each pull to a lakehouse or Eventhouse you control, building the multi-year record the app can't. Note: Microsoft marks external querying of the Capacity Metrics semantic model as not supported — treat this pattern as best-effort. The full extraction pattern is in extracting Fabric metrics with SemPy, and the retention limit is dissected in the Fabric metrics retention limit. The principle: native monitoring is a live gauge, not a flight recorder. If you need the recorder, you build or buy it.
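The "extract before the window closes" shape can be sketched as two small pieces: work out which days are still retrievable and not yet stored, then append each pull idempotently so overlapping runs don't duplicate rows. This is a sketch under stated assumptions: `fake_fetch` stands in for the real query, the row fields (`timepoint`, `item_id`, `cu_seconds`) are hypothetical, and a Python dict stands in for the lakehouse table.

```python
from datetime import date, timedelta

COMPUTE_RETENTION_DAYS = 14  # Compute page window stated in the text


def days_to_pull(today, last_pulled):
    """List completed days still inside the window that are not stored yet."""
    oldest_available = today - timedelta(days=COMPUTE_RETENTION_DAYS)
    if last_pulled is None:
        start = oldest_available
    else:
        # Anything older than oldest_available has already aged out.
        start = max(oldest_available, last_pulled + timedelta(days=1))
    return [start + timedelta(days=i) for i in range((today - start).days)]


def append_pull(store, rows):
    """Append rows keyed on (timepoint, item_id); return how many were new."""
    added = 0
    for row in rows:
        key = (row["timepoint"], row["item_id"])
        if key not in store:
            store[key] = row
            added += 1
    return added


def fake_fetch(day):
    # Stand-in for the real query. In a Fabric notebook this is where a call
    # such as sempy.fabric.evaluate_dax(...) against the Capacity Metrics
    # semantic model would go; the DAX and columns depend on the model version.
    return [{"timepoint": f"{day.isoformat()}T00:00:00",
             "item_id": "pipeline-a", "cu_seconds": 120.0}]


store = {}  # stand-in for a lakehouse table
today = date(2026, 6, 15)
for day in days_to_pull(today, last_pulled=date(2026, 6, 10)):
    append_pull(store, fake_fetch(day))

print(len(store), "rows stored")
```

Pulling one day per query also keeps each request small, which matters if you use executeQueries with its row cap. If `last_pulled` falls behind the retention window, the days in between are simply gone, which is the argument for scheduling the job well inside 14 days.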
Even inside the 14-day window, the metrics app answers "what is consuming CUs" better than "who or which run." Attribution in Fabric is item-level: you can see that a specific pipeline, notebook, or semantic model drew CUs. What you can't natively do is link a capacity-event OperationID back to the individual pipeline run that spiked, or cleanly attribute a spike to a single user. The Chargeback app aggregates daily and operates at the workspace/department grain, and user identities are masked whenever the "Show user data in the Fabric Capacity Metrics app" setting is off — and service-principal-driven workloads frequently show up without a human owner at all.
So the practical failure mode is: a capacity throttles, the metrics app says "this Dataflow Gen2 item is your top consumer," and you still can't tell which of the 40 daily runs — fired by which scheduler or which person — actually caused the incident. Closing this gap means joining capacity telemetry to the Activity Events admin API (tenant-wide audit events, ~30-day window, one day per request) on operation and time, in your own store. The full teardown of why the platform can't do this for you is in the Fabric attribution void; the Chargeback-specific limits are in Chargeback app vs. attribution.
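A rough sketch of that join, assuming both datasets have already been landed in your own store. The field names (`item_id`, `timepoint`, `start_time`, `run_id`, `user`) are hypothetical and will not match the real Activity Events payload as written; the five-minute tolerance is likewise an arbitrary choice. Time proximity yields candidate runs, not proof of causation.

```python
from datetime import datetime, timedelta


def candidate_runs(spike, activity_events, tolerance_minutes=5):
    """Return activity events for the spiking item that started near the spike,
    closest first."""
    tolerance = timedelta(minutes=tolerance_minutes)
    matches = [
        event for event in activity_events
        if event["item_id"] == spike["item_id"]
        and abs(event["start_time"] - spike["timepoint"]) <= tolerance
    ]
    return sorted(matches, key=lambda e: abs(e["start_time"] - spike["timepoint"]))


# Made-up sample data for illustration.
spike = {"item_id": "df-sales", "timepoint": datetime(2026, 6, 1, 9, 30)}
events = [
    {"item_id": "df-sales", "start_time": datetime(2026, 6, 1, 9, 28),
     "user": "scheduler@contoso.com", "run_id": "run-17"},
    {"item_id": "df-sales", "start_time": datetime(2026, 6, 1, 7, 0),
     "user": "ana@contoso.com", "run_id": "run-16"},
    {"item_id": "nb-other", "start_time": datetime(2026, 6, 1, 9, 29),
     "user": "bo@contoso.com", "run_id": "run-99"},
]

matches = candidate_runs(spike, events)
for m in matches:
    print(m["run_id"], m["user"])
```

Because smoothing spreads a background job's CUs across later timepoints, a wider tolerance or a look-back window may suit background workloads better than a symmetric five minutes.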
The third wall is architectural. Every workload on a capacity draws from one shared CU pool, and there's no native per-workspace CU isolation, so one workload's smoothed debt throttles everyone on the capacity, not just the offender. Workspace-level surge protection shipped in preview (January 2026), but it throttles or limits a workspace's background usage against an admin-set threshold rather than giving each workspace a guaranteed CU reservation, so the capacity-wide blast radius still applies. That makes monitoring throttling a whole-capacity concern, not a per-team one, and it's why proactive alerting matters so much: by the time the metrics app shows the throttle (10–15 minutes late), your users have already seen errors.
Throttling itself is staged on future-capacity time windows, not utilization percentages (Metrics app calculations, Microsoft Learn, checked June 2026):
| Stage | Future-usage window crossed | What users experience |
|---|---|---|
| Overage protection | up to ~10 min of future usage | Burst absorbed; no impact |
| Interactive delay | 10 min smoothed | A ~20-second throttle added to interactive requests |
| Interactive rejection | 60 min smoothed | New interactive requests rejected; users see errors |
| Background rejection | 24 h smoothed | All requests rejected, including background jobs |
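The table reads naturally as a lookup from "minutes of future capacity already consumed" to a stage. A minimal sketch follows; treating each boundary as inclusive of the lower stage is an assumption, and the function name is illustrative.

```python
def throttle_stage(future_minutes_consumed):
    """Map minutes of future capacity consumed to the stage in the table above."""
    if future_minutes_consumed <= 10:
        return "Overage protection"
    if future_minutes_consumed <= 60:
        return "Interactive delay"
    if future_minutes_consumed <= 24 * 60:
        return "Interactive rejection"
    return "Background rejection"


for minutes in (5, 30, 120, 1500):
    print(minutes, "->", throttle_stage(minutes))
```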
The monitoring takeaway: watch the throttling percentages on the metrics app's Compute page and stream the Real-Time hub events so you can alert before interactive-rejection hits. The deeper mechanics live in Fabric throttling explained, the no-isolation problem in workload isolation and the blast radius, and the response playbook in throttling triage. For predicting it before it lands — by tracking the carry-forward burndown — see predicting Fabric throttling.
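Layer the tools by the question each answers; don't expect one to do everything:
- Sizing and short-term triage: the Capacity Metrics app, read as a rolling 14-day window.
- Departmental showback: the Chargeback app, refreshed daily.
- Proactive alerting: Real-Time hub capacity events routed through an Eventstream to a Data Activator rule.
- Long-term history: schedule a SemPy or executeQueries pull into a lakehouse you own, or stand up FUAM. This is the only way to beat the retention wall and get multi-year trends.
For the cost angle of all this (what the monitoring finds and how to act on it), pair this pillar with the reduce Microsoft Fabric costs playbook, and for the SKU-sizing groundwork, the Fabric pricing & capacity-planning guide. For the billing-event edge cases the meter exposes, see PAYG overage billing in 2026.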
The named enemy this pillar defeats is the metrics-retention wall — the hard ceilings of 14-day 30-second compute detail and 30-day item-level history that make multi-year capacity trending and per-run attribution impossible natively — with the attribution void and the throttling blast-radius as its supporting cast. SpendWeave's stance is plain: the free native tools are genuinely good at real-time health and short-term triage, so use them; the moment you need multi-year history, per-run attribution, or alerts that fire before users feel the throttle, you've hit a wall the platform won't move for you — and that's the work a purpose-built capacity vault exists to do.
If this is the kind of plain-mechanics teardown you want, SpendWeave Pro keeps your capacity history, attribution, and throttling alerts running continuously inside your own tenant — picking up where the native retention walls end and building the multi-year record the app cannot.
How do I monitor Microsoft Fabric capacity usage?
Start with the free Microsoft Fabric Capacity Metrics app, a Power BI app a capacity admin installs that shows CU utilization, throttling, and top-consuming items across your capacities. Its Compute page holds 14 days of 30-second compute detail, its Item History page (preview, August 2025) holds 30 days of item-level compute trends, and its Storage page holds 30 days of storage data, per Microsoft Learn. Data lags about 10–15 minutes behind live. For near-real-time signals, add Fabric capacity overview events in the Real-Time hub, which emit a capacity summary line every 30 seconds. For history beyond 30 days, per-run attribution, or proactive alerts, you extract the data to your own store, because the native tools don't retain or attribute it that way.
How long does the Fabric Capacity Metrics app keep data?
The Capacity Metrics app keeps 14 days of 30-second compute detail on its Compute page, 30 days of item-level compute history on its Item History page (preview, August 2025), and 30 days of storage data on its Storage page, as of June 2026 per Microsoft Learn. All are rolling windows, with no native setting to extend them. The native ceiling is 30 days, which covers monthly item-level trending but not multi-month forecasting, quarterly chargeback, or per-run attribution. Those needs drive teams to build or buy an external capacity store.
Can I get real-time Fabric capacity alerts?
Not from the Capacity Metrics app itself: its data lags 10–15 minutes and it has no alerting engine. The near-real-time path is Fabric capacity overview events in the Real-Time hub: an active capacity emits a summary line every 30 seconds (CU, interactive-delay and throttling percentages) plus a state event when it changes (active, overloaded, paused). Route those into an Eventstream and wire a Data Activator rule, and you get alerts on throttling or overload as it happens. That's the only native way to alert proactively rather than discover throttling after users have already hit errors.
Why can't the Fabric Metrics app tell me which pipeline cost me money?
Because attribution in Fabric is item-level, not run-level or user-level. The Capacity Metrics app shows CU by item (a specific pipeline, notebook, or semantic model), but it doesn't link an OperationID back to the individual pipeline run that caused a spike, and the Chargeback app aggregates daily and can mask user identities when the "Show user data in the Fabric Capacity Metrics app" admin setting is off. So you can see that a pipeline burned CUs, but not reliably which run, triggered by whom; that's the attribution void.
What is the best tool for monitoring Microsoft Fabric capacity?
There's no single best tool; each native option covers a different need. The Capacity Metrics app is the source of truth for CU and throttling (14-day compute detail; 30-day item history in preview; 30-day storage); the Chargeback app does daily departmental showback; Fabric capacity overview events give near-real-time alerting; FUAM (the community Fabric Unified Admin Monitoring solution) and a SemPy-based extract give you multi-year history if you build the pipeline. The native stack covers real-time and 30-day monitoring well, and leaves multi-year history, per-run attribution, and turnkey alerting to whatever you assemble: a custom Eventhouse, FUAM, or a purpose-built vault.
Researched with AI assistance, written and fact-checked by Jonathan Flach, verified against Microsoft Learn.