Microsoft Fabric Capacity Monitoring: The Complete Guide

You monitor Microsoft Fabric capacity with the free Capacity Metrics app — a Power BI app that shows CU utilization, throttling stages, and your top-consuming items. It keeps 14 days of 30-second compute detail on its Compute page, 30 days of item-level compute history on its Item History page (preview, August 2025), and 30 days of storage data. That native ceiling is enough for month-level triage and item-level trending, but it can't answer any question that spans more than 30 days, and it cannot link a cost spike to a specific pipeline run. Those gaps define where native monitoring stops and where external tooling begins. This guide maps exactly what the native tools do, where each one stops, and how to fill the gaps without throwing away the free stack that already works.

Native Fabric monitoring is not one tool — it's five, each built for a different question. The Capacity Metrics app answers "is my capacity overloaded right now and over the last 14 days." The Chargeback app answers "which department used what, refreshed daily." Real-Time hub capacity events answer "alert me the moment a capacity throttles." FUAM (the community Fabric Unified Admin Monitoring solution) and a SemPy extract answer "give me history the app deleted." Knowing which tool answers which question — and which questions none of them answer well — is the whole game.

The named enemy this pillar exists to defeat is the metrics-retention wall: the ceiling that makes long-term capacity history impossible natively. The Compute page keeps 14 days of 30-second timepoint detail; the Item History page (preview, August 2025) extends item-level compute visibility to 30 days; the Storage page keeps 30 days of storage data. But none of those windows is configurable, and none reaches multi-year history or per-pipeline-run attribution. Two supporting enemies travel with the retention wall. The first is the attribution void: you can see that a pipeline cost CUs, but not reliably which run or whose. The second is the throttling blast-radius: one workload's debt throttles the whole capacity, because there's no native per-workspace CU isolation. Workspace-level surge protection shipped in preview in January 2026, but it throttles or limits a workspace's background usage against an admin-set threshold rather than giving each workspace a guaranteed CU reservation, so the capacity-wide blast radius still applies. Map the tools to the questions, name where each wall stands, and the monitoring strategy picks itself.

The native monitoring stack, tool by tool

Five native (or near-native) surfaces carry Fabric capacity monitoring. Each is genuinely useful inside its lane.

The Capacity Metrics app is the operational source of truth. A capacity admin installs it as a Power BI app; its Compute page shows utilization and throttling over the last 14 days, and its Storage page tracks storage over 30 days (What is the Microsoft Fabric Capacity Metrics app?, Microsoft Learn, checked June 2026). Data is not live: "usage data becomes available within 10 to 15 minutes after the activity occurs," and dimensions like capacities, workspaces, and items are updated only by a scheduled semantic-model refresh at midnight local time (Metrics app data latency, Microsoft Learn, checked June 2026). So a brand-new workspace won't even appear by name until the next nightly refresh. This is your sizing and triage tool (see the Capacity Metrics app walkthrough for reading it well), but it is a rolling 14-day window, not a record.

The Chargeback app answers the departmental question: which workspace or business unit consumed what share of the capacity. It's the right tool for showback — "Marketing used X% of the capacity, which is $Y of the monthly cost." But the data "isn't real-time; it's refreshed daily" (What is Microsoft Fabric Chargeback app?, Microsoft Learn, checked June 2026), and it inherits the metrics app's user-data masking: when an admin disables "Show user data in the Fabric Capacity Metrics app and reports," user emails are hidden across these surfaces (Capacity Metrics app, Microsoft Learn, checked June 2026). Useful for monthly cost allocation; not built for live or per-user views. We cover its limits in the Chargeback app vs. real attribution.
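The showback arithmetic behind "X% of the capacity, which is $Y" can be sketched in a few lines. This is a minimal illustration, not the Chargeback app's own calculation: the department names, CU-second totals, and monthly cost are made-up figures, and it assumes cost is split purely in proportion to CU-seconds consumed.

```python
def showback(cu_seconds_by_dept: dict[str, float], monthly_cost: float) -> dict[str, dict[str, float]]:
    """Split a fixed monthly capacity cost in proportion to CU-seconds consumed."""
    total = sum(cu_seconds_by_dept.values())
    if total <= 0:
        raise ValueError("no consumption to allocate")
    allocation = {}
    for dept, cu_seconds in cu_seconds_by_dept.items():
        share = cu_seconds / total
        allocation[dept] = {
            "share_pct": round(share * 100, 1),
            "cost": round(share * monthly_cost, 2),
        }
    return allocation


# Hypothetical figures for illustration only.
usage = {"Marketing": 1_200_000, "Finance": 600_000, "Data Platform": 2_200_000}
allocation = showback(usage, monthly_cost=8_000.00)
for dept, row in allocation.items():
    print(f"{dept}: {row['share_pct']}% -> ${row['cost']:,.2f}")
```

Proportional allocation is the simplest defensible rule; whether idle capacity should be spread across departments or held centrally is a policy choice this sketch does not make.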

Real-Time hub capacity overview events are the only near-real-time native signal. An active capacity emits a summary line every 30 seconds carrying CU, interactive-delay throttling percentage, and more, plus a state event when the capacity changes state — active, overloaded (throttling), or paused (Explore Fabric capacity overview events, Microsoft Learn, checked June 2026). You can route these into an Eventstream, store them in an Eventhouse for history, and fire a Data Activator alert on throttling — covered in Fabric capacity events in the Real-Time hub. This is preview, and a paused capacity emits nothing.
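To make the alerting idea concrete, here is a minimal sketch of a "sustained throttling" rule evaluated over a sequence of 30-second summaries. It is not the Data Activator rule syntax and not the real event schema: the field name interactive_delay_pct, the 80% threshold, and the four-summary window are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def sustained_throttle_alerts(
    summaries: Iterable[dict],
    threshold_pct: float = 80.0,
    sustain: int = 4,
) -> Iterator[dict]:
    """Yield a summary each time the delay percentage has stayed at or above
    threshold_pct for `sustain` consecutive summaries (4 x 30 s = 2 minutes)."""
    recent: deque[bool] = deque(maxlen=sustain)
    for summary in summaries:
        # "interactive_delay_pct" is a hypothetical field name.
        recent.append(summary["interactive_delay_pct"] >= threshold_pct)
        if len(recent) == sustain and all(recent):
            yield summary
            recent.clear()  # re-arm so one long incident does not alert on every summary


# Made-up readings, one per 30-second summary.
readings = [50, 85, 90, 95, 99, 70, 85]
stream = [{"seq": i, "interactive_delay_pct": pct} for i, pct in enumerate(readings)]
alerts = list(sustained_throttle_alerts(stream))
print([a["seq"] for a in alerts])
```

Requiring several consecutive readings trades a little latency for fewer false alarms from a single noisy summary; a rule that fires on the first reading above the threshold would be the opposite trade.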

FUAM (Fabric Unified Admin Monitoring) is the community/Microsoft solution-accelerator pattern that wires the admin APIs and metrics into a lakehouse so you can keep history the app drops. And a SemPy / DAX extract is the do-it-yourself version: query the Capacity Metrics semantic model from a notebook with evaluate_dax, or call the Power BI executeQueries REST API, and land the result in your own store — see extracting Fabric metrics with SemPy. Both exist precisely because the app forgets.

The native-vs-gap monitoring matrix

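Here is the honest map of what the native stack does and where it stops. Rows are the monitoring needs every Fabric admin eventually has. Columns are four of the surfaces above (the Capacity Metrics app, the Chargeback app, Real-Time hub events, and FUAM) plus Fabric Cost Analysis (FCA), which works from Azure billing data and so reports cost rather than CU detail. The do-it-yourself SemPy extract has no column of its own. ✓ = does it well; ~ = partial, with a catch; ✗ = doesn't do it. Marked as of June 2026 against Microsoft Learn.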

Monitoring need	Capacity Metrics app	Chargeback app	Real-Time hub events	FUAM	Fabric Cost Analysis (FCA)
Real-time CU usage	~ (10–15 min lag)	✗ (daily refresh)	✓ (30-sec summary)	✗ (batch ingest)	✗ (Azure billing latency)
Throttling visibility	✓ (Compute page, 14 d)	✗	✓ (state + delay %)	~ (only what you ingest)	✗
Long-term history (>14 days)	✗ (14-day compute wall)	~ (daily, limited window)	~ (only if you persist to Eventhouse)	✓ (you own the lakehouse)	~ (cost only, not CU detail)
Per-pipeline / per-item attribution	~ (item-level, not per-run)	~ (by workspace/dept)	✗	~ (item-level via APIs)	✗ (subscription/SKU level)
Per-user attribution	~ (masked if admin setting off)	✗ (aggregated, can mask)	✗	~ (activity events, item-scoped)	✗
Proactive alerting	✗ (no alert engine)	✗	✓ (Activator on events)	✗ (report, not alerts)	~ (Azure Budgets, cost only)
Cross-capacity rollup	~ (per app install/scope)	~ (per capacity)	~ (per-capacity streams)	✓ (tenant-wide by design)	✓ (subscription-wide cost)

Read the matrix down the long-term-history and per-run-attribution rows and the gap is unmistakable: the only ✓ for >14-day history is FUAM, which you have to build and operate yourself, and nothing delivers reliable per-run or clean per-user attribution. The native tools are excellent at the two things they're designed for — real-time health (events) and short-term operational triage (the metrics app) — and structurally weak at the two things FinOps actually needs over time: durable history and precise attribution. That's not a knock on free tools doing their job; it's the line where a purpose-built capacity vault begins.

Enemy #1 — the metrics-retention wall

The Capacity Metrics app has three retention windows. The Compute page is a rolling 14-day window of 30-second timepoint detail, the most granular view in the app. The Item History page (preview, August 2025) extends item-level compute visibility to a 30-day window, with workspace and item slicers but without sub-hour drill-through (Item History page, Microsoft Learn, checked June 2026). The Storage page is a rolling 30-day window for storage. None of these windows can be extended by any admin setting, and none reaches multi-year history. Any question that spans more than 30 days (a multi-month CU trend, quarterly forecasting, year-end chargeback reconciliation, "was last quarter worse than the same quarter a year ago") is unanswerable from the app itself. And any question that asks "which specific pipeline run caused a spike" is unanswerable at any retention depth, because the attribution is item-level, not run-level.

This is why FUAM and SemPy extracts exist at all. The fix is always the same shape: extract before the window closes. A scheduled SemPy notebook queries the Capacity Metrics semantic model with evaluate_dax (or the Power BI executeQueries API — one query per request, 100k-row cap) and appends each pull to a lakehouse or Eventhouse you control, building the multi-year record the app can't. Note: Microsoft marks external querying of the Capacity Metrics semantic model as not supported — treat this pattern as best-effort. The full extraction pattern is in extracting Fabric metrics with SemPy, and the retention limit is dissected in the Fabric metrics retention limit. The principle: native monitoring is a live gauge, not a flight recorder. If you need the recorder, you build or buy it.
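The extract-and-append shape can be sketched without any network calls. The example below is a best-effort illustration under stated assumptions: the 'Usage' table and its column names are placeholders (the Capacity Metrics semantic model's schema is not a supported contract), and an in-memory dict stands in for the lakehouse or Eventhouse table. The request and response shapes follow the Power BI executeQueries REST API; in a Fabric notebook, sempy.fabric.evaluate_dax would replace the HTTP round trip.

```python
import json

# Endpoint template for the Power BI executeQueries REST API.
EXECUTE_QUERIES_URL = "https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries"


def build_request(dax: str) -> str:
    """JSON body for one executeQueries call; the API accepts one query per request."""
    return json.dumps({"queries": [{"query": dax}], "serializerSettings": {"includeNulls": True}})


def rows_from_response(payload: dict) -> list[dict]:
    """Pull the row dicts out of an executeQueries response."""
    return payload["results"][0]["tables"][0]["rows"]


def append_new(store: dict, rows: list[dict], key_fields: tuple[str, ...]) -> int:
    """Append only rows whose key is not stored yet, so overlapping pulls stay idempotent."""
    added = 0
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in store:
            store[key] = row
            added += 1
    return added


# Placeholder query and column names; the real model's names will differ.
body = build_request("EVALUATE 'Usage'")
KEY = ("Usage[Timepoint]", "Usage[ItemId]")

first_pull = {"results": [{"tables": [{"rows": [
    {"Usage[Timepoint]": "2026-06-01T00:00:00", "Usage[ItemId]": "a", "Usage[CU]": 12.5},
    {"Usage[Timepoint]": "2026-06-01T00:00:30", "Usage[ItemId]": "a", "Usage[CU]": 14.0},
]}]}]}
second_pull = {"results": [{"tables": [{"rows": [
    {"Usage[Timepoint]": "2026-06-01T00:00:30", "Usage[ItemId]": "a", "Usage[CU]": 14.0},
    {"Usage[Timepoint]": "2026-06-01T00:01:00", "Usage[ItemId]": "a", "Usage[CU]": 9.0},
]}]}]}

history: dict = {}
added_first = append_new(history, rows_from_response(first_pull), KEY)
added_second = append_new(history, rows_from_response(second_pull), KEY)
print(added_first, added_second, len(history))
```

Keying on timepoint plus item lets each scheduled pull overlap the previous one, which is the safer default: a missed run costs nothing as long as the next pull still lands inside the retention window.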

Enemy #2 — the attribution void

Even inside the 14-day window, the metrics app answers "what is consuming CUs" better than "who or which run." Attribution in Fabric is item-level: you can see that a specific pipeline, notebook, or semantic model drew CUs. What you can't natively do is link a capacity-event OperationID back to the individual pipeline run that spiked, or cleanly attribute a spike to a single user. The Chargeback app aggregates daily and operates at the workspace/department grain, and user identities are masked whenever the "Show user data in the Fabric Capacity Metrics app" setting is off — and service-principal-driven workloads frequently show up without a human owner at all.

So the practical failure mode is: a capacity throttles, the metrics app says "this Dataflow Gen2 item is your top consumer," and you still can't tell which of the 40 daily runs — fired by which scheduler or which person — actually caused the incident. Closing this gap means joining capacity telemetry to the Activity Events admin API (tenant-wide audit events, ~30-day window, one day per request) on operation and time, in your own store. The full teardown of why the platform can't do this for you is in the Fabric attribution void; the Chargeback-specific limits are in Chargeback app vs. attribution.
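The join described above can be sketched in two parts: generating one request window per UTC day (since the Activity Events API takes one day per request), and matching a capacity spike to activity events on the same item within a time tolerance. The record shapes, the item_id join key, and the five-minute tolerance are assumptions for illustration; real capacity telemetry and activity events use their own field names.

```python
from datetime import date, datetime, timedelta, timezone


def utc_day_windows(last_day: date, days: int) -> list[tuple[str, str]]:
    """One (start, end) pair per UTC day, oldest first, for day-at-a-time requests."""
    windows = []
    for offset in range(days - 1, -1, -1):
        day = (last_day - timedelta(days=offset)).isoformat()
        windows.append((f"{day}T00:00:00Z", f"{day}T23:59:59Z"))
    return windows


def candidate_events(spike: dict, events: list[dict], tolerance: timedelta) -> list[dict]:
    """Activity events on the same item whose timestamp is within `tolerance` of the spike."""
    return [
        event for event in events
        if event["item_id"] == spike["item_id"]
        and abs(event["time"] - spike["time"]) <= tolerance
    ]


def at(hour: int, minute: int) -> datetime:
    return datetime(2026, 6, 3, hour, minute, tzinfo=timezone.utc)


# Hypothetical records; field names are placeholders.
spike = {"item_id": "dataflow-1", "time": at(10, 2), "cu": 950.0}
events = [
    {"item_id": "dataflow-1", "time": at(10, 0), "user": "alice@example.com"},
    {"item_id": "dataflow-1", "time": at(14, 0), "user": "scheduler-spn"},
    {"item_id": "notebook-7", "time": at(10, 1), "user": "bob@example.com"},
]

windows = utc_day_windows(date(2026, 6, 3), days=3)
matches = candidate_events(spike, events, tolerance=timedelta(minutes=5))
print(windows[0], [m["user"] for m in matches])
```

A time-window match narrows the suspects rather than proving causation: with 40 runs a day on one item, a tight tolerance usually leaves one or two candidates, which is often enough to know whom to ask.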

Enemy #3 — the throttling blast-radius

The third wall is architectural. Every workload on a capacity draws from one shared CU pool, and there's no native per-workspace CU isolation, so one workload's smoothed debt throttles everyone on the capacity, not just the offender. Workspace-level surge protection shipped in preview (January 2026), but it throttles or limits a workspace's background usage against an admin-set threshold rather than giving each workspace a guaranteed CU reservation, so the capacity-wide blast radius still applies. That makes monitoring throttling a whole-capacity concern, not a per-team one, and it's why proactive alerting matters so much: by the time the metrics app shows the throttle (10–15 minutes late), your users have already seen errors.

Throttling itself is staged on future-capacity time windows, not utilization percentages (Metrics app calculations, Microsoft Learn, checked June 2026):

Stage	Future capacity consumed (smoothed usage)	What users experience
Overage protection	≤ 10 min	Burst absorbed; no impact
Interactive delay	> 10 min and ≤ 60 min	A ~20-second delay added to interactive requests
Interactive rejection	> 60 min and ≤ 24 h	New interactive requests rejected; users see errors
Background rejection	> 24 h	All requests rejected, including background jobs
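The stage table above reduces to a threshold lookup. The sketch below takes the minutes of future capacity already consumed and returns the stage name; the function name and the idea of feeding it a single "future minutes" number are illustrative assumptions, since the metrics app reports throttling as percentages per window rather than as one value.

```python
def throttling_stage(future_minutes: float) -> str:
    """Map minutes of future capacity consumed to the throttling stage in the table."""
    if future_minutes < 0:
        raise ValueError("future_minutes cannot be negative")
    if future_minutes <= 10:
        return "Overage protection"
    if future_minutes <= 60:
        return "Interactive delay"
    if future_minutes <= 24 * 60:
        return "Interactive rejection"
    return "Background rejection"


for minutes in (5, 30, 90, 1500):
    print(minutes, "->", throttling_stage(minutes))
```

Because the stages are defined on time windows, a capacity can sit at high utilization without throttling, and can throttle at moderate utilization if it is still paying down earlier debt.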

The monitoring takeaway: watch the throttling percentages on the metrics app's Compute page and stream the Real-Time hub events so you can alert before interactive-rejection hits. The deeper mechanics live in Fabric throttling explained, the no-isolation problem in workload isolation and the blast radius, and the response playbook in throttling triage. For predicting it before it lands — by tracking the carry-forward burndown — see predicting Fabric throttling.

How to build a monitoring strategy that survives the walls

Layer the tools by the question each answers — don't expect one to do everything:

Install the Capacity Metrics app first. It's free, it's the source of truth for CU and throttling, and it's your sizing evidence. Read it the way the metrics app guide describes. Accept its 10–15 minute lag and 14-day compute window as facts, not failures.
Stream Real-Time hub capacity events for alerting. The metrics app can't alert; the events can. Route the 30-second summary into an Eventstream, persist to an Eventhouse, and set a Data Activator rule on the throttling percentage and state changes — the only native path to throttling alerts before users feel them.
Extract for history before the window closes. The Item History page (preview) gives 30 days of item-level trends natively. For sub-day forensic detail beyond 14 days, or multi-month and quarterly history, schedule a SemPy/executeQueries pull into a lakehouse you own, or stand up FUAM. This is the only way to beat the retention wall and get multi-year trends.
Join telemetry to activity events for attribution. To get past item-level to per-run/per-user, you have to correlate capacity data with the Activity Events admin API in your own store — no native tool does this join.
Decide build vs. buy honestly. If you have the data-engineering capacity to run a custom Eventhouse + extract + join pipeline and keep it healthy, FUAM is a strong start. If you'd rather not operate that pipeline, a purpose-built vault does it continuously in your tenant — that's where capacity-metrics-app alternatives and the broader Fabric cost-monitoring tools landscape come in.

For the cost angle of all this — what the monitoring finds and how to act on it — pair this pillar with the reduce Microsoft Fabric costs playbook, and for the SKU-sizing groundwork, the Fabric pricing & capacity-planning guide. For the billing-event edge cases the meter exposes, see PAYG overage billing in 2026.

What to do next

Install the Capacity Metrics app today and find your 24-hour smoothed background peak — that's your live gauge.
Stand up the Real-Time hub capacity event stream and set one Activator alert on throttling. Stop discovering throttling from user complaints.
Schedule a metrics extract before the 14-day window closes — SemPy notebook to a lakehouse, or FUAM — so you actually own your history.
Map your attribution honestly. Accept item-level natively; join activity events if you need per-run.
Pick build or buy for continuous monitoring based on whether you want to operate the pipeline or have it operated for you.

The named enemy this pillar defeats is the metrics-retention wall — the hard ceilings of 14-day 30-second compute detail and 30-day item-level history that make multi-year capacity trending and per-run attribution impossible natively — with the attribution void and the throttling blast-radius as its supporting cast. SpendWeave's stance is plain: the free native tools are genuinely good at real-time health and short-term triage, so use them; the moment you need multi-year history, per-run attribution, or alerts that fire before users feel the throttle, you've hit a wall the platform won't move for you — and that's the work a purpose-built capacity vault exists to do.

If this is the kind of plain-mechanics teardown you want, SpendWeave Pro keeps your capacity history, attribution, and throttling alerts running continuously inside your own tenant — picking up where the native retention walls end and building the multi-year record the app cannot.

Frequently asked questions

How do I monitor Microsoft Fabric capacity usage? Start with the free Microsoft Fabric Capacity Metrics app — a Power BI app a capacity admin installs that shows CU utilization, throttling, and top-consuming items across your capacities. Its Compute page holds 14 days of 30-second compute detail, its Item History page (preview, August 2025) holds 30 days of item-level compute trends, and its Storage page holds 30 days of storage data, per Microsoft Learn. Data lags about 10–15 minutes behind live. For near-real-time signals, add Fabric capacity overview events in the Real-Time hub, which emit a capacity summary line every 30 seconds. For history beyond 30 days, per-run attribution, or proactive alerts, you extract the data to your own store, because the native tools don't retain or attribute it that way.

How long does the Fabric Capacity Metrics app keep data? The Capacity Metrics app keeps 14 days of 30-second compute detail on its Compute page, 30 days of item-level compute history on its Item History page (preview, August 2025), and 30 days of storage data on its Storage page, as of June 2026 per Microsoft Learn. All are rolling windows, and no native setting extends them. The native ceiling is 30 days, which covers monthly item-level trending but not multi-month forecasting, quarterly chargeback, or per-run attribution. Those needs drive teams to build or buy an external capacity store.

Can I get real-time Fabric capacity alerts? Not from the Capacity Metrics app itself — its data lags 10–15 minutes and it has no alerting engine. The near-real-time path is Fabric capacity overview events in the Real-Time hub: an active capacity emits a summary line every 30 seconds (CU, interactive-delay and throttling percentages) plus a state event when it changes (active, overloaded, paused). Route those into an Eventstream and wire a Data Activator rule, and you get alerts on throttling or overload as it happens. That's the only native way to alert proactively rather than discover throttling after users have already hit errors.

Why can't the Fabric Metrics app tell me which pipeline cost me money? Because attribution in Fabric is item-level, not run-level or user-level. The Capacity Metrics app shows CU by item (a specific pipeline, notebook, or semantic model), but it doesn't link an OperationID back to the individual pipeline run that caused a spike, and the Chargeback app aggregates daily and can mask user identities when the "Show user data in the Fabric Capacity Metrics app" admin setting is off. So you can see that a pipeline burned CUs, but not reliably which run, triggered by whom — that's the attribution void.

What is the best tool for monitoring Microsoft Fabric capacity? There's no single best tool — each native option covers a different need. The Capacity Metrics app is the source of truth for CU and throttling (14-day compute detail; 30-day item history in preview; 30-day storage); the Chargeback app does daily departmental showback; Fabric capacity overview events give near-real-time alerting; FUAM (the community Fabric Unified Admin Monitoring solution) and a SemPy-based extract give you multi-year history if you build the pipeline. The native stack covers real-time and 30-day monitoring well, and leaves multi-year history, per-run attribution, and turnkey alerting to whatever you assemble — a custom Eventhouse, FUAM, or a purpose-built vault.

Researched with AI assistance, written and fact-checked by Jonathan Flach, verified against Microsoft Learn.
