Controlling Copilot & AI Costs in Fabric: The New Billing Meters

By Jonathan Flach · Published 2026-06-20 · Reviewed 2026-06-20

A single Copilot request with 2,000 input tokens and 500 output tokens costs 400 CU-seconds — about 0.11 CU-hours on your Fabric capacity (Copilot consumption, Microsoft Learn, checked June 2026). At $0.18/CU-hour that is roughly two cents per prompt, which sounds harmless. The problem is that Copilot, AI Functions, and AI Services all draw from the same shared CU pool as your pipelines, reports, and Spark notebooks — and there is no native spending cap that stops an automated agent loop from draining that pool.

The Runaway AI threat in Fabric's cost landscape is not interactive users running too many prompts. It is automated workflows: a Data Activator rule firing on every row of a high-frequency event stream, a notebook AI Function iterating over millions of records, or a data agent retrying on failure in a tight loop. These scenarios can consume thousands of CU-hours on the Copilot-and-AI meter before anyone opens the Capacity Metrics app. This article explains the billing mechanics, quantifies the realistic blast radius, and gives you a containment checklist. For the broader cost-reduction playbook this belongs to, start with how to reduce Microsoft Fabric costs.

How the Copilot-and-AI meter actually works

Microsoft Fabric does not charge a separate Copilot license. Instead, all AI consumption draws from the same CU pool as everything else on your capacity, reported under a single meter called Copilot and AI. That meter covers three distinct workload types (Fabric operations, Microsoft Learn, checked June 2026):

| Workload | What it covers | Operation class |
| --- | --- | --- |
| Copilot in Fabric | User prompts and completions across all Fabric experiences | Background |
| AI Functions | LLM-backed data transformations in notebooks (pandas or PySpark) | Interactive and background |
| AI Services | Prebuilt Azure AI text analytics and translation calls | Background |

The Capacity Metrics app shows these as three named line items so you can distinguish them — but they all share the same CU pool and the same smoothing windows as every other workload.

The token-billing formula

Copilot billing is driven entirely by tokens, not by session time or request count:

  • Input tokens: 100 CU-seconds per 1,000 tokens
  • Output tokens: 400 CU-seconds per 1,000 tokens, which is 4x the input rate. A separate Microsoft Learn narrative page describes output tokens as "three times more expensive than input tokens", but the consumption rate table lists 100 vs 400 CU-seconds, a 4:1 ratio. This article treats the rate table as the authoritative source (Copilot consumption, Microsoft Learn, checked June 2026).

For the reference request (2,000 input, 500 output tokens):

(2,000 × 100 + 500 × 400) / 1,000 = 400 CU-seconds = 6.67 CU-minutes
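The formula is easy to wrap in a small helper for quick estimates. The following is a minimal Python sketch, not an official Microsoft calculator: the function name `copilot_cu_seconds` and the constant names are our own labels for the two rates quoted above.

```python
# Rates quoted in this article, in CU-seconds per 1,000 tokens.
INPUT_CU_SECONDS_PER_1K = 100
OUTPUT_CU_SECONDS_PER_1K = 400


def copilot_cu_seconds(input_tokens: int, output_tokens: int) -> float:
    """Estimate the CU-seconds billed for one request from its token counts."""
    return (
        input_tokens * INPUT_CU_SECONDS_PER_1K
        + output_tokens * OUTPUT_CU_SECONDS_PER_1K
    ) / 1000


reference = copilot_cu_seconds(2000, 500)
print(reference)          # CU-seconds for the reference request
print(reference / 60)     # the same figure in CU-minutes
print(reference / 3600)   # the same figure in CU-hours
```

Keeping the rates as named constants makes it simple to update the estimate if Microsoft changes the rate table.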

Since Copilot is classified as a background job, its usage is smoothed over 24 hours. That 6.67 CU-minute job is spread across the following 24 hours, so it draws only about 0.28 CU-minutes (6.67 / 24) from any single hour. Measured against the daily budget, each request costs about 0.11 CU-hours, meaning an F64 (1,536 CU-hours/day) can absorb over 13,800 requests per day at that token profile before hitting background rejection (Copilot consumption, Microsoft Learn, checked June 2026).
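The daily headroom figure can be reproduced with a few lines of arithmetic. This sketch assumes the capacity runs nothing but Copilot requests, which is never the case in production, so treat the result as an upper bound. The helper name `requests_per_day` is hypothetical.

```python
def requests_per_day(capacity_cus: int, input_tokens: int, output_tokens: int) -> int:
    """Whole requests that fit in one day's CU budget at a fixed token profile."""
    cu_seconds_per_request = (input_tokens * 100 + output_tokens * 400) / 1000
    daily_budget_cu_seconds = capacity_cus * 24 * 3600  # CUs x seconds in a day
    return int(daily_budget_cu_seconds // cu_seconds_per_request)


print(requests_per_day(64, 2000, 500))  # F64 at the reference token profile
print(requests_per_day(2, 2000, 500))   # F2 at the same profile, for contrast
```

Running the same helper for a smaller SKU shows how quickly the headroom shrinks as capacity size drops.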

That headroom is deceptive. Manual-user Copilot stays well within it. Automated workflows do not.

The F64 line for Data agents

Most Copilot experiences run on F2 or higher (Copilot for Data Engineering and Data Science, Microsoft Learn, checked June 2026). The exception is Data agents: they require a workspace assigned to an F64 or larger capacity to run natively — or the user must be assigned to a Fabric Copilot capacity, which explicitly supports "Data agents on a Fabric capacity workspace where the capacity SKU is smaller than F64" (Fabric Copilot capacity, Microsoft Learn, checked June 2026). Sub-F64 teams deploying data agents have two paths: upgrade the workspace capacity to F64+, or route users through a designated Fabric Copilot capacity and have agent consumption bill to that pool instead.

Fabric Copilot capacity lets you route Copilot (prompts and completions) and Data agent consumption to a dedicated billing pool regardless of where the content workspace lives (Fabric Copilot capacity, Microsoft Learn, checked June 2026). This isolates Copilot utilization so it cannot throttle the production capacity hosting your pipelines and semantic models. Important limitation: Fabric Copilot capacity does not support AI Functions (Fabric Copilot capacity, Microsoft Learn, checked June 2026). To isolate AI Functions from production workloads, you must assign the notebook workspaces that run AI Functions to a separate regular Fabric capacity; the Copilot capacity feature will not contain them.

The runaway-AI cost scenario

This worked scenario is our own: it does not appear in Microsoft documentation, and no sibling article covers it.

Scenario: looping agent on a high-frequency event stream

Setup: A team deploys a data agent on an F64 capacity to classify incoming support tickets from an event stream. The agent calls Fabric AI Functions in a notebook loop: one LLM call per ticket, average 1,500 input tokens (system prompt + ticket text) and 400 output tokens (classification + reason). The event stream peaks at 3,000 tickets/hour during business hours, 8 hours/day, 22 business days/month.

Per-request cost (using the token formula):

(1,500 × 100 + 400 × 400) / 1,000 = 310 CU-seconds = 0.086 CU-hours

Monthly CU consumption (estimate, as of June 2026):

3,000 requests/hour × 8 hours × 22 days = 528,000 requests/month
528,000 × 0.086 CU-hours = ~45,400 CU-hours/month
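The monthly figure can be recomputed directly from the scenario inputs. The sketch below is illustrative only; `monthly_ai_load` is a hypothetical helper, and it assumes a perfectly steady request rate during business hours.

```python
def monthly_ai_load(requests_per_hour, hours_per_day, days_per_month,
                    input_tokens, output_tokens):
    """Return (requests per month, CU-hours per month) for a steady workload."""
    requests = requests_per_hour * hours_per_day * days_per_month
    cu_seconds_per_request = (input_tokens * 100 + output_tokens * 400) / 1000
    return requests, requests * cu_seconds_per_request / 3600


requests, cu_hours = monthly_ai_load(3000, 8, 22, 1500, 400)
print(f"{requests:,} requests/month, {cu_hours:,.0f} CU-hours/month")
```

The unrounded result is about 45,467 CU-hours, slightly above the ~45,400 quoted above, because the article rounds the per-request cost to 0.086 CU-hours before multiplying.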

Cost against an F64 PAYG capacity (64 CUs × 730 hours = 46,720 CU-hours/month):

That 45,400 CU-hours of AI Functions consumption alone accounts for 97% of the F64's monthly CU budget, leaving only 1,320 CU-hours for all pipelines, semantic model refreshes, and user queries. The F64 PAYG base price is $8,409.60/month (June 2026), and by default that figure does not grow when usage exceeds the 24-hour smoothed background limit: Fabric adds no line-item charge. Instead it triggers background rejection, blocking jobs across the entire capacity until the debt clears. There is no automatic overage billing unless the capacity admin has opted into the capacity overage preview feature. If that feature is enabled, consumption beyond the limit is billed at 3x the PAYG rate ($0.54/CU-hour) rather than 1x, so even a 5% overage on an F64 (2,336 CU-hours) would add ~$1,261/month in overage charges (estimate, as of June 2026, scoped to capacity-overage-enabled tenants). Without that feature, the cost consequence is operational: blocked pipelines, not a larger invoice.
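The budget share and the overage estimate follow from two short calculations. This is a sketch under the prices and multipliers quoted in this article (a $0.18/CU-hour PAYG rate, a 3x overage rate, 730 hours per month). The function names are hypothetical, and actual prices vary by region.

```python
PAYG_RATE_USD = 0.18      # USD per CU-hour, as quoted in this article
OVERAGE_MULTIPLIER = 3    # overage billed at 3x PAYG, per the text above
HOURS_PER_MONTH = 730


def budget_share(ai_cu_hours: float, capacity_cus: int) -> float:
    """Fraction of the monthly CU budget consumed by the AI workload."""
    return ai_cu_hours / (capacity_cus * HOURS_PER_MONTH)


def overage_charge_usd(capacity_cus: int, overage_fraction: float) -> float:
    """Monthly overage charge if usage exceeds the budget by overage_fraction."""
    overage_cu_hours = capacity_cus * HOURS_PER_MONTH * overage_fraction
    return overage_cu_hours * PAYG_RATE_USD * OVERAGE_MULTIPLIER


print(f"{budget_share(45_400, 64):.0%}")         # share of an F64's budget
print(f"${overage_charge_usd(64, 0.05):,.2f}")   # 5% overage on an F64
```

The overage function only applies to tenants that have opted into the overage preview; for everyone else the relevant output is the budget share.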

What makes this a runaway scenario: if the notebook contains a retry loop on API timeout (common in AI Functions code), and the event stream has a bad-data spike that generates repeated timeouts, the loop can fire orders of magnitude more requests than expected. Because background-classified AI usage is smoothed over 24 hours, the Capacity Metrics app's interactive view does not alarm immediately. The debt accumulates quietly until it crosses the background rejection threshold (>24 hours of smoothed background CUs outstanding), at which point all operations on the entire capacity are rejected (background and interactive), blocking pipelines and model refreshes that have nothing to do with the AI workload.

The containment checklist

Work through this before enabling any automated AI workflow on a production capacity:

  1. Estimate worst-case token consumption first. Multiply your peak event volume per hour by the hours per day the workflow runs, then by the per-request cost. Use the formula: (input_tokens × 100 + output_tokens × 400) / 1,000 = CU-seconds per request. Convert to CU-hours and compare against your capacity's daily CU budget.
  2. Isolate AI workloads onto dedicated capacities — but use the right mechanism. For Copilot prompts and Data agents: designate a Fabric Copilot capacity to route that consumption to an isolated billing pool. For AI Functions (notebook LLM loops like this scenario): Fabric Copilot capacity does not support them — instead, assign the notebook workspaces to a separate regular Fabric capacity so runaway AI Functions cannot throttle production. Size each dedicated capacity to the worst-case estimate, not the average.
  3. Add a row-count or request-count ceiling in your code. No native kill switch exists in Fabric, so the ceiling lives in your notebook or pipeline logic: if request_count > MAX_REQUESTS, raise an exception and stop the loop (in Python, prefer a RuntimeError or a custom exception over StopIteration, which signals normal iterator exhaustion). Build this into every AI Functions loop.
  4. Disable retry loops for AI calls. Each retry on LLM timeout issues the request again, so retries multiply request volume and, with it, token consumption. Use dead-letter patterns or manual reprocessing instead.
  5. Gate Data Activator rules to low-frequency events only. Data Activator suppresses repeat alerts within a state, but a poorly designed rule that fires on every row of a high-cardinality stream still triggers downstream AI Functions at stream frequency. Test alert firing rate under peak load before production.
  6. Monitor the Copilot-and-AI meter daily in the first week. The Capacity Metrics app shows Copilot, AI Functions, and AI Services as separate line items. Set a manual review cadence and look at the 24-hour smoothed background percentage — not just the current utilization spike. See Microsoft Fabric capacity monitoring for the monitoring mechanics.
  7. Never pause a capacity to clear AI-driven throttling. Pausing settles all smoothed background debt to your Azure bill immediately at full PAYG rates. If AI workloads have pushed 24-hour smoothed background past 100%, pausing writes that debt to the bill rather than clearing it. Size up or fix the loop.
  8. Audit which users and workspaces have Copilot enabled. The Copilot tenant setting allows admins to restrict Copilot to specific security groups. Narrowing the enabled set to teams with actual AI use cases reduces exposure. Workspace and capacity admins can also override the tenant setting for their scope.
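Checklist items 3 and 4 can be combined in one loop: a hard request ceiling plus a dead-letter list instead of retries. The following is a minimal sketch, not a Fabric API. `classify` is a hypothetical stand-in for whatever AI call your notebook makes, and `RequestCeilingExceeded` and `fake_classifier` exist only for this illustration.

```python
from typing import Callable, Iterable, List, Tuple


class RequestCeilingExceeded(RuntimeError):
    """Raised when a loop would exceed its configured request budget."""


def classify_with_ceiling(
    tickets: Iterable[str],
    classify: Callable[[str], str],  # hypothetical stand-in for the AI call
    max_requests: int,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Classify tickets one by one, with a hard ceiling and no retries."""
    results: List[Tuple[str, str]] = []
    dead_letter: List[str] = []
    request_count = 0
    for ticket in tickets:
        if request_count >= max_requests:
            raise RequestCeilingExceeded(
                f"stopped after {request_count} requests (ceiling {max_requests})"
            )
        request_count += 1
        try:
            results.append((ticket, classify(ticket)))
        except TimeoutError:
            # No retry: park the ticket for later, manual reprocessing.
            dead_letter.append(ticket)
    return results, dead_letter


def fake_classifier(ticket: str) -> str:
    """Toy classifier used only to exercise the loop; it calls no real service."""
    if "timeout" in ticket:
        raise TimeoutError
    return "billing" if "invoice" in ticket else "other"


done, parked = classify_with_ceiling(
    ["invoice missing", "timeout please", "password reset"],
    fake_classifier,
    max_requests=100,
)
print(done)
print(parked)
```

Raising an exception discards the in-memory results, so a production loop would likely persist results as it goes. Set `max_requests` from the worst-case estimate in item 1, not from the average day.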

What the Metrics app shows (and what it hides)

The Capacity Metrics app surfaces Copilot and AI consumption in two places. The Compute page shows the Copilot-and-AI meter's contribution to total CU usage, with a 14-day compute history window (30 days for storage). The Item History page breaks out CU-seconds by Copilot processing separately from AI Functions and AI Services, helping you isolate which sub-type is the driver.

What the Metrics app does not show by default: user names are masked unless the tenant admin enables the "Show user data in the Fabric Capacity Metrics app and reports" setting (Audit and usage admin settings, Microsoft Learn, checked June 2026). When masking is on, operations appear with "Masked user" as the actor; enabling the setting surfaces user names and email addresses tied to item-level operations. Note that attribution remains item-level (tied to which item and operation type consumed the CUs) — you can see that a specific user triggered Copilot activity on a specific item, but OperationID is not linked to individual agent runs, so tracing a CU spike to a specific prompt in a looping workflow still requires additional telemetry. This is the attribution void applied to AI: you can see that the Copilot-and-AI meter is consuming capacity and which user was active on which item, but not a granular per-prompt cost breakdown without external tooling.

The 14-day compute detail window is also a constraint here. A slow-burning AI agent that takes three weeks to accumulate meaningful 24-hour background debt is invisible in the native tool's history by the time the throttling event arrives. External storage of the Metrics app telemetry, or a continuous-monitoring product, is the only way to catch slow-burn patterns before they become throttling events. The hidden costs of Microsoft Fabric covers the attribution and retention gaps in more detail.

Copilot and AI Functions — how smoothing and throttling differ

Copilot in Fabric and AI Services are classified as background jobs, so their CU usage is smoothed over a 24-hour rolling window. AI Functions, however, can be classified as either interactive or background depending on how they are invoked (Fabric operations, Microsoft Learn, checked June 2026). Interactive AI Functions calls are smoothed over minutes (not 24 hours) and can trigger interactive delay or rejection stages independently of the background smoothing window.

The throttling consequences for background-mode AI (Copilot, AI Services, and background AI Functions):

| Smoothed background window | What happens |
| --- | --- |
| ≤24 h of future capacity | Absorbed: background jobs run normally (interactive requests are subject to the shorter 10-minute and 60-minute thresholds) |
| >24 h of future capacity | Background rejection: all requests (background and interactive) are rejected until the debt clears |

For interactive-mode AI Functions, the relevant thresholds are the interactive delay (10–60 min future capacity consumed) and interactive rejection (>60 min) stages — a separate path that can block interactive work even when the 24-hour background window is healthy.
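The thresholds mentioned in this section can be expressed as a simple lookup. This is a sketch of the stage boundaries as described in this article, assuming the stages are evaluated against a single figure (minutes of future capacity already consumed). The function name and the handling of the exact boundary values are our assumptions, not Fabric behavior we have verified.

```python
def throttling_stage(future_minutes: float) -> str:
    """Map minutes of future capacity already consumed to a throttling stage."""
    if future_minutes > 24 * 60:
        return "background rejection"
    if future_minutes > 60:
        return "interactive rejection"
    if future_minutes > 10:
        return "interactive delay"
    return "no throttling"


for minutes in (5, 30, 300, 1500):
    print(minutes, throttling_stage(minutes))
```

A helper like this is mainly useful for alerting: feed it the value you read from your monitoring export and page someone well before the last stage.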

An AI agent that burns roughly 45,400 CU-hours in a month on a 46,720 CU-hour capacity is operating at 97% of available capacity. A single bad day (say, 3x normal ticket volume) pushes the 24-hour smoothed background over 100%, and every pipeline, model refresh, and scheduled notebook on that capacity stops until the debt works itself down. That is the throttling blast radius applied to AI: one misconfigured agent can take down an entire capacity's data operations.
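One caveat worth checking is that a monthly comparison averages over nights and weekends, while smoothing works on a 24-hour window. The sketch below compares a single day's AI consumption with a single day's CU budget for the ticket-classification scenario. It is a rough approximation that ignores other workloads and the exact mechanics of smoothing, and `daily_load_ratio` is a hypothetical helper.

```python
def daily_load_ratio(requests_per_day: int, input_tokens: int,
                     output_tokens: int, capacity_cus: int) -> float:
    """One day's AI consumption divided by one day's CU budget."""
    cu_seconds = requests_per_day * (input_tokens * 100 + output_tokens * 400) / 1000
    return (cu_seconds / 3600) / (capacity_cus * 24)


normal_day = daily_load_ratio(3000 * 8, 1500, 400, 64)
bad_day = daily_load_ratio(3 * 3000 * 8, 1500, 400, 64)
print(f"normal business day: {normal_day:.0%} of the daily budget")
print(f"3x volume day: {bad_day:.0%} of the daily budget")
```

Under these assumptions a normal business day already comes out at about 135% of an F64's daily budget, which suggests the monthly figure understates the risk and that the daily view is the one to size against.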

The throttling stages (future-capacity time windows, not utilization percentages) are detailed in the Fabric capacity monitoring pillar.

What to do next

The practical sequence for controlling Fabric Copilot and AI costs:

  1. Check the Copilot-and-AI meter today. Open the Capacity Metrics app, navigate to the Compute page, and find the Copilot-and-AI line. If it is nonzero and you are not sure why, start there.
  2. Inventory every automated AI workflow — notebooks with AI Functions loops, Data Activator rules that call agents, scheduled data agents — and estimate their worst-case token consumption against the checklist above.
  3. Separate AI workloads onto dedicated capacity. For Copilot prompts and Data agents, use a Fabric Copilot capacity. For AI Functions notebook loops, move those workspaces to a separate regular Fabric capacity — Fabric Copilot capacity does not support AI Functions. Do this if any automated AI workflow could consume more than ~20% of your production capacity's CU budget at peak.
  4. Add hard ceilings in code for every AI Functions loop. The platform will not stop a runaway loop; your code must.
  5. Enable user data visibility in the admin portal ("Show user data in the Fabric Capacity Metrics app and reports") if you need to see which users are driving the Copilot-and-AI meter. This surfaces user names and email addresses tied to item-level operations — attribution remains at the item level, not per-prompt, but it narrows the investigation from "which capacity?" to "which user on which item?"
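The ~20% rule in step 3 can be applied mechanically once the inventory from step 2 exists. This sketch interprets "CU budget at peak" as the daily budget, which is our assumption. The workflow names and the second inventory value are hypothetical examples, and `workflows_to_isolate` is not a Fabric API.

```python
def workflows_to_isolate(worst_case_cu_hours_per_day: dict, capacity_cus: int,
                         threshold: float = 0.20) -> list:
    """Names of workflows whose worst case exceeds the threshold share of the daily budget."""
    daily_budget = capacity_cus * 24
    return sorted(
        name
        for name, cu_hours in worst_case_cu_hours_per_day.items()
        if cu_hours / daily_budget > threshold
    )


inventory = {  # hypothetical example values, in CU-hours per day
    "ticket-classifier-notebook": 2067.0,
    "weekly-summary-agent": 40.0,
}
print(workflows_to_isolate(inventory, capacity_cus=64))
```

Anything the function returns is a candidate for a dedicated capacity; anything it omits still needs a hard ceiling in code.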

The enemy this defeats is Runaway AI: automated CU consumption with no native kill switch — smoothed silently as a background job (Copilot, AI Services) or capable of triggering interactive rejection stages (AI Functions) — until it blocks everything else on the capacity. SpendWeave's audit reads the Copilot-and-AI meter in your real capacity and tells you what the actual exposure looks like — before it becomes a billing event.

Frequently asked questions

How much does Copilot cost in Microsoft Fabric? Copilot draws from your existing Fabric CUs — no separate license. The rate is 100 CU-seconds per 1,000 input tokens and 400 CU-seconds per 1,000 output tokens. A typical 2,000-input/500-output-token request costs 400 CU-seconds (~0.11 CU-hours), about $0.02 at $0.18/CU-hour. Manual use stays cheap; automated agent loops can compound this to thousands of CU-hours per month (estimate, as of June 2026).

Does Copilot require an F64 capacity in Fabric? Most Copilot experiences run on F2 or higher. Data agents require F64+ when running natively on the workspace capacity, but users assigned to a Fabric Copilot capacity can run Data agents in workspaces whose capacity is smaller than F64, with consumption billed to the Copilot capacity pool instead (Fabric Copilot capacity, Microsoft Learn, checked June 2026).

What is the Copilot-and-AI meter in Fabric? A single CU billing meter covering Copilot in Fabric (prompts), AI Functions (notebook LLM calls), and AI Services (prebuilt text analytics). All three draw from the same shared pool as your other Fabric workloads. The Capacity Metrics app labels them as separate line items so you can isolate the driver.

Can I disable or limit Copilot in Fabric to control costs? Admins can disable Copilot tenant-wide or by security group via the admin portal. There is no real-time per-run spending cap or automatic kill switch for a running agent loop. Cost control is preventive: selective enablement, isolated capacities, and hard ceilings in your code.

Does Copilot consumption count as a background or interactive job in Fabric? Copilot in Fabric and AI Services are classified as background jobs, smoothed over 24 hours. Sustained high consumption quietly accumulates background debt; crossing the 24-hour rejection threshold blocks all jobs on the capacity — including pipelines and model refreshes unrelated to the AI workload. AI Functions can be either interactive or background depending on how they are invoked; interactive AI Functions calls use a shorter smoothing window and can trigger interactive delay or rejection independently.

Researched with AI assistance, written and fact-checked by Jonathan Flach, verified against Microsoft Learn.