Celebal Technologies

Optimising the Cloud Bill: Governing Data and AI Spend through FinOps

7 min read | June 10, 2026

Delivering FinOps through Eagle Eye IQ and Databricks Unity AI Gateway

Every enterprise utilising cloud infrastructure encounters challenges with cost predictability. While elastic compute accelerates business operations, it often increases expenditure, and attributing these costs can be difficult. Currently, organisations face two simultaneous challenges. First, standard cloud infrastructure is easy to provision but difficult to track, leading to ongoing cost sprawl. Second, the integration of large language models (LLMs) is impacting budgets at a significantly faster rate than unmanaged cloud compute did previously.

Consequently, FinOps must evolve beyond traditional cloud infrastructure management. On platforms like Databricks, where data engineering, analytics, and AI utilise the same compute and governance resources, FinOps must encompass data pipelines and models. This requires comprehensive observability. Celebal's Eagle Eye IQ™ accelerator, combined with Databricks Unity Catalog and Mosaic AI Gateway, provides the necessary foundation for this unified approach.

The Compounding Challenge of Cloud Costs

The structural drivers of excessive cloud expenditure on data platforms are well documented. Celebal consistently identifies three primary issues during Databricks engagements:

Duplication

Independent business units often recreate existing assets, resulting in duplicate pipelines, redundant feature stores, and parallel model training. Compute and storage costs scale alongside this duplication without delivering additional business value.

Visibility

Workspace-level cost attribution is often fragmented, and pipeline-level observability is frequently absent. This lack of oversight allows idle clusters, inefficient jobs, and over-provisioned compute to remain undetected.

Governance

While a Centre of Excellence can define best practices, it often lacks the authority to enforce them. Business units make independent platform decisions, causing operational expenditure to grow linearly with usage while value capture lags.

This is a solvable engineering challenge when properly instrumented. Recent optimisation engagements by Celebal Technologies have delivered a 15 to 20 percent platform-wide cost reduction through right-sizing, alongside a 40 to 50 percent compute reduction on critical jobs. For instance, downsizing a primary pipeline from E8 to E4 instances reduced compute by half, and enabling Photon on Databricks where it provided over 50 percent improvement saved approximately $2,200 per month on a single workflow. These savings were achieved through precise measurement, including monitoring dashboards built on Databricks System Tables and verifying query tuning against actual scan volumes.
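The reason measurement matters for right-sizing can be shown with a little arithmetic. The following is a minimal sketch, not Celebal's tooling: the `JobProfile` fields, the DBU rates, and the price per DBU are all hypothetical, and the model covers only DBU charges (cloud VM charges follow the same shape). It shows that halving the per-node rate, as in an E8-to-E4 style move, halves cost only if the measured runtime stays flat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Measured profile of one scheduled job (all values are illustrative)."""
    nodes: int                 # number of worker nodes
    dbu_per_node_hour: float   # DBUs emitted per node per hour for the instance type
    runtime_hours: float       # measured wall-clock runtime per run
    runs_per_month: int


def monthly_cost(job: JobProfile, usd_per_dbu: float) -> float:
    """Monthly DBU cost: nodes x DBU rate x runtime x runs x price."""
    return (job.nodes * job.dbu_per_node_hour * job.runtime_hours
            * job.runs_per_month * usd_per_dbu)


def saving_fraction(before: JobProfile, after: JobProfile, usd_per_dbu: float) -> float:
    """Fraction of monthly cost removed by a change (negative means it got worse)."""
    base = monthly_cost(before, usd_per_dbu)
    return (base - monthly_cost(after, usd_per_dbu)) / base


# Hypothetical numbers: the smaller instance emits half the DBUs per hour.
before = JobProfile(nodes=4, dbu_per_node_hour=2.0, runtime_hours=1.0, runs_per_month=30)
same_runtime = JobProfile(nodes=4, dbu_per_node_hour=1.0, runtime_hours=1.0, runs_per_month=30)
slower = JobProfile(nodes=4, dbu_per_node_hour=1.0, runtime_hours=1.5, runs_per_month=30)

print(saving_fraction(before, same_runtime, usd_per_dbu=0.5))  # 0.5
print(saving_fraction(before, slower, usd_per_dbu=0.5))        # 0.25
```

If the job runs 50 percent longer on the smaller instance, the saving in this sketch drops from one half to one quarter, which is why each change needs to be verified against observed runtimes rather than assumed.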

Cloud Cost Attribution Framework

Figure 1. Cost attribution to continuous optimisation. Surfacing hidden spend across four levels of the data and AI stack and then closing the loop with engineering actions.

Assessing the Financial Impact of LLMs

Historically, AI expenditure was concentrated on model training and GPU clusters managed by specific teams. Generative AI has altered this dynamic. With agentic coding tools and LLM-powered applications, costs are now distributed across employees and individual tokens, scaling directly with adoption.

A prominent example occurred in mid-2026. Uber deployed agentic coding tools, including Anthropic's Claude Code, to approximately 5,000 engineers. Adoption was rapid, with monthly active usage reaching 84 to 95 percent by April 2026. Consequently, per-engineer API costs reached $500 to $2,000 per month, consuming the company's entire 2026 AI tooling budget within four months. In response, Uber implemented a $1,500 monthly per-tool spending cap.

Microsoft encountered a similar situation. Claude Code was introduced to its Experiences and Devices group in December 2025. Six months later, the company cancelled most direct Claude Code licenses and migrated engineers to GitHub Copilot CLI. This decision was driven by excessive usage that exceeded allocated budgets.

These instances highlight that deploying AI tools without per-user and per-workload visibility creates significant financial liabilities. A governed AI platform is essential to address this gap.
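A back-of-envelope calculation shows how quickly per-seat costs of this size add up. The sketch below simply multiplies the Uber figures quoted above (about 5,000 engineers, 84 to 95 percent monthly active usage, $500 to $2,000 per engineer per month); the resulting range is an illustration, not a reported total, and the runway example uses an entirely hypothetical team budget.

```python
def monthly_spend(engineers: int, active_percent: int, usd_per_active_user: int) -> int:
    """Monthly spend in whole dollars for a given adoption level."""
    return engineers * active_percent * usd_per_active_user // 100


def months_of_runway(annual_budget: float, spend_per_month: float) -> float:
    """How many months an annual budget lasts at a constant monthly spend."""
    return annual_budget / spend_per_month


# Range implied by the figures quoted in the text (illustrative only).
low = monthly_spend(5000, 84, 500)
high = monthly_spend(5000, 95, 2000)
print(f"${low:,} to ${high:,} per month")  # $2,100,000 to $9,500,000 per month

# Hypothetical team: a 1.2M annual budget at 300k per month lasts four months.
print(months_of_runway(1_200_000, 300_000))  # 4.0
```

The same two functions, fed with per-user data instead of averages, are the kind of projection that per-user visibility makes possible before a budget is exhausted.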

The Necessity of Comprehensive Observability

The challenges faced by Uber and Microsoft share a root cause: expenditure was only visible in aggregate after budgets were exhausted. Aggregate billing indicates overspending but fails to identify the source or the responsible party.

FinOps on a unified data and AI platform requires observability across five distinct levels:

| Operational Level | Required Visibility | Common Sources of Waste |
|---|---|---|
| Data and Storage | Delta table growth, small-file proliferation, vacuum lag, and retention sprawl. | Silent storage cost growth and high scan costs from uncompacted files. |
| Pipeline and Job | DBU consumption by workflow, runtime drift, and duplicate workloads. | SLA-breaching jobs, redundant pipelines, and idle clusters. |
| Code | Compute time per function, memory allocation profiling, API/database query volumes. | Inefficient algorithms (e.g., O(n²) time complexity), memory leaks requiring oversized instances, N+1 query problems. |
| Model and Endpoint | Tokens, requests, and expenditure per user, team, application, and provider. | Unattributed inference costs and high per-engineer LLM expenditure. |
| Organisation | Expenditure attributed to business units, projects, and environments. | Unowned spend and discrepancies between budgets and actuals. |
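The Model and Endpoint and Organisation rows above come down to the same operation: rolling request-level usage up to an owner and making unowned spend visible. A minimal sketch follows, assuming a hypothetical record shape (`user`, `team`, `model`, token counts) and hypothetical per-million-token prices; none of these names come from a specific product.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {"model-a": (3.0, 15.0)}


def request_cost(record: dict) -> float:
    """Cost of one request from its token counts and the model's price."""
    price_in, price_out = PRICES[record["model"]]
    return (record["input_tokens"] / 1e6 * price_in
            + record["output_tokens"] / 1e6 * price_out)


def spend_by(records: list, key: str) -> dict:
    """Total spend grouped by `key`; records without an owner are made explicit."""
    totals = defaultdict(float)
    for record in records:
        owner = record.get(key) or "UNATTRIBUTED"
        totals[owner] += request_cost(record)
    return dict(totals)


usage = [
    {"user": "alice", "team": "search", "model": "model-a",
     "input_tokens": 1_000_000, "output_tokens": 200_000},
    {"user": "bob", "team": None, "model": "model-a",
     "input_tokens": 500_000, "output_tokens": 100_000},
    {"user": "alice", "team": "search", "model": "model-a",
     "input_tokens": 2_000_000, "output_tokens": 0},
]

print(spend_by(usage, "team"))
print(spend_by(usage, "user"))
```

Grouping by `team` surfaces the request with no team as its own `UNATTRIBUTED` line instead of letting it disappear into the aggregate, which is the distinction between aggregate billing and attributed billing.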

Industry guidelines from the FinOps Foundation and Databricks emphasise that cost attribution and reporting must precede optimisation and control. Furthermore, technical telemetry requires a culture of accountability and clear ownership to be effective.

Eagle Eye IQ: An Observability Foundation from Celebal

Celebal developed Eagle Eye IQ to address these observability gaps. Operating as a central control plane built on Databricks Apps and governed by Unity Catalog, it monitors both data sources and subsequent pipelines.

Eagle Eye IQ integrates five core modules with an agentic layer on top:

DQ Guardian

A data quality and rules engine featuring anomaly detection and profiling.

Lineage Lens

Provides column-level lineage and impact analysis.

ObserveIQ

Delivers real-time platform and cost monitoring with alerting capabilities.

Contract Vault

Manages data contracts, schema governance, and controlled sharing.

Code Inspector

Optimises code and defines code rules.

Aquila AI

Facilitates autonomous root-cause analysis and remediation.

For FinOps purposes, the ObserveIQ layer tracks DBU consumption by workspace, job, and user. It provides pipeline cost attribution, identifies idle clusters, detects job-runtime drift, flags duplicate workloads, and monitors storage growth. Implementations typically yield a 15 to 25 percent reduction in idle compute, full pipeline-to-cost traceability, and significantly faster root-cause analysis for cost spikes.
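Two of these checks, idle-cluster identification and job-runtime drift, can be expressed as simple rules over telemetry. The sketch below is illustrative only and does not describe how ObserveIQ is implemented: the input shapes, the 5 percent CPU threshold, the four-hour idle window, and the 1.5x drift tolerance are all assumptions.

```python
from statistics import median


def idle_clusters(cpu_by_cluster: dict, cpu_threshold: float = 5.0,
                  min_idle_hours: int = 4) -> list:
    """Clusters whose last `min_idle_hours` hourly CPU averages are all below the threshold.

    cpu_by_cluster maps a cluster id to hourly average CPU percentages, oldest first.
    """
    flagged = []
    for cluster_id, cpu in cpu_by_cluster.items():
        tail = cpu[-min_idle_hours:]
        # Require a full window so newly started clusters are not flagged.
        if len(tail) == min_idle_hours and all(v < cpu_threshold for v in tail):
            flagged.append(cluster_id)
    return sorted(flagged)


def runtime_drift(history_minutes: list, latest_minutes: float,
                  tolerance: float = 1.5) -> bool:
    """True when the latest run exceeds `tolerance` times the median of earlier runs."""
    if not history_minutes:
        return False
    return latest_minutes > tolerance * median(history_minutes)


samples = {
    "etl": [60, 50, 2, 1, 3, 2],   # busy earlier, idle for the last four hours
    "adhoc": [1, 2, 80, 1],        # one busy hour inside the window
    "new": [1, 1],                 # too little history to judge
}
print(idle_clusters(samples))                  # ['etl']
print(runtime_drift([30, 32, 31, 29], 50))     # True
print(runtime_drift([30, 32, 31, 29], 40))     # False
```

Using the median rather than the mean keeps a single unusually slow historical run from masking genuine drift.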

Because Eagle Eye IQ attributes cost down to the pipeline level and is governed through Unity Catalog, it enables organisations to understand the exact cost of their data operations.

Beyond cost attribution, the platform leverages automated metadata extraction to construct a complete and structured data taxonomy. This foundational taxonomy illuminates clear consumption patterns across the ecosystem.

While infrastructure management is foundational to FinOps, unoptimised platform code remains a significant driver of excess compute and inflated cloud costs. To address this execution-layer inefficiency, Code Inspector enables the definition and enforcement of strict, automated coding rules.

Aquila AI and Aquila AI Chat

Figure 2. Aquila AI provides a natural language frontend that, among other things, facilitates autonomous root-cause analysis and remediation.

Unity Catalog and the Unity AI Gateway

Observability highlights systemic issues, but governance provides the mechanism to manage them proactively. On Databricks, Unity Catalog serves as the unified governance plane for data and AI assets, while the Unity AI Gateway manages models, agents, and MCPs.

Unity Catalog provides centralised access control, column-level lineage, and the tagging infrastructure required for accurate cost attribution. Databricks guidelines state that robust tagging, enforced through Compute Policies, is the foundation of cost reporting.
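As a sketch of what tag enforcement can look like, the snippet below builds a compute policy definition as a JSON document that maps attribute paths to rules. The attribute paths and rule types (`allowlist`, `regex`, `range`) follow the policy definition format as it is generally documented by Databricks; the tag names, allowed values, and limits are hypothetical, and `tag_violations` is a local illustration for pre-checking tags, not the way the platform itself evaluates a policy.

```python
import json
import re

# Hypothetical policy: every cluster must carry a team and a cost-centre tag.
POLICY = {
    "custom_tags.team": {"type": "allowlist", "values": ["search", "payments"]},
    "custom_tags.cost_center": {"type": "regex", "pattern": "CC-[0-9]{4}"},
    "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
}


def tag_violations(custom_tags: dict, policy: dict = POLICY) -> list:
    """Local pre-check of cluster tags against the tag rules in the policy."""
    problems = []
    for path, rule in policy.items():
        if not path.startswith("custom_tags."):
            continue  # only tag rules are checked in this sketch
        name = path.split(".", 1)[1]
        value = custom_tags.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif rule["type"] == "allowlist" and value not in rule["values"]:
            problems.append(f"{name}: '{value}' not allowed")
        elif rule["type"] == "regex" and not re.fullmatch(rule["pattern"], value):
            problems.append(f"{name}: '{value}' does not match {rule['pattern']}")
    return problems


print(json.dumps(POLICY, indent=2))  # the definition to attach to a policy
print(tag_violations({"team": "search", "cost_center": "CC-1234"}))  # []
print(tag_violations({"team": "growth"}))
```

Because every cluster created under such a policy carries the same tag keys, usage records can be grouped by those keys without manual reconciliation.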

The Mosaic AI Gateway extends this control to LLMs, providing a governed entry point for all models. It offers:

Usage Tracking

Visibility into token consumption and expenditure per request, user, and application.

Rate Limiting

Enforces user and endpoint caps to prevent budget overruns.

Cost Attribution

Links model expenditure directly to specific teams and projects.

Traffic Routing

Directs workloads to the most cost-appropriate model.

Guardrails

Applies PII detection and safety filters centrally.

Together, Unity Catalog and the AI Gateway enforce governance at the platform boundary, mitigating cost sprawl for both data pipelines and AI models.
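The rate-limiting idea can be illustrated with a small fixed-window limiter that caps tokens per user. This is a conceptual sketch under stated assumptions, not the AI Gateway's implementation or configuration syntax: the class name, the per-minute window, and the token limit are hypothetical, and a gateway would apply the equivalent check centrally before forwarding a request to a model.

```python
class TokenBudgetLimiter:
    """Fixed-window limiter: each user may spend `tokens_per_window` tokens per window."""

    def __init__(self, tokens_per_window: int, window_seconds: int = 60):
        self.limit = tokens_per_window
        self.window = window_seconds
        self._state = {}  # user -> (window index, tokens used in that window)

    def allow(self, user: str, tokens: int, now: float) -> bool:
        """Return True and record the usage if the request fits the user's budget."""
        index = int(now // self.window)
        seen_index, used = self._state.get(user, (index, 0))
        if seen_index != index:
            used = 0  # a new window has started, so the budget resets
        if used + tokens > self.limit:
            return False
        self._state[user] = (index, used + tokens)
        return True


limiter = TokenBudgetLimiter(tokens_per_window=1000, window_seconds=60)
print(limiter.allow("alice", 600, now=0))    # True
print(limiter.allow("alice", 500, now=10))   # False: would exceed 1000 in this window
print(limiter.allow("bob", 500, now=10))     # True: budgets are per user
print(limiter.allow("alice", 500, now=61))   # True: next window
```

Passing `now` explicitly keeps the example deterministic; a production limiter would read a clock and would typically also enforce an endpoint-wide cap alongside the per-user one.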

Unity Catalog and AI Gateway Architecture

Figure 3. Eagle Eye IQ overlaying the Databricks FinOps control plane. Four cooperating layers. Governance sets the rules, observability watches the workload, usage delivers the value.

Organisational Culture and Accountability

Technological solutions require appropriate cultural alignment. Organisations that successfully implement FinOps share three characteristics:

  • Clear ownership for platform administration and monitoring.
  • Proactive consideration of costs throughout the project lifecycle.
  • A commitment to continuous improvement and optimisation.

These organisations typically adopt an operating model tailored to their structure, such as a Centralised Centre of Excellence or a Distributed Budget Centres model. Celebal implements this through a four-pillar framework: Observability, Right-Sizing, Workload Engineering, and Governance.

Conclusion

Data and AI expenditures now share similar characteristics: they scale rapidly with adoption, remain obscured in aggregate billing, and create financial liabilities when observability lags behind usage.

FinOps on Databricks addresses these challenges by providing comprehensive visibility through Eagle Eye IQ, centralising governance with Unity Catalog and the Mosaic AI Gateway, and fostering an organisational culture that ties technical expenditure directly to business value.

References

  1. Eagle Eye IQ: https://celebaltech.com/products/eagle-eye
  2. Unity AI Gateway: https://www.databricks.com/product/artificial-intelligence/ai-gateway