Niladri Ray
Flexera
Use this paper to better understand how to choose an AI approach with FinOps principles, capabilities, and outcomes in mind. These guidelines will help FinOps Practitioners and Engineers align deployment choices with their FinOps maturity level and technical readiness.
Given the recent prominence of Generative AI, the diagram below represents where it sits within the broader gamut of AI/ML capabilities, key context before we dive into the “FinOps for AI” focus of this paper.
Reference: https://www.ibm.com/think/topics/artificial-intelligence
While traditional ML still dominates structured-data tasks and real-world industry applications, LLMs are rapidly gaining adoption, particularly in NLP, generative AI, and multimodal applications, by enabling automation, personalization, and advanced data processing.
Despite this broad potential, compute costs and interpretability remain major barriers to LLM adoption in many use cases.
Choosing the right AI model is a critical part of solving the business problem at hand. The following table shows a few example scenarios where the requirements favor either a traditional ML model or an LLM (a simple rule-of-thumb sketch follows the table).
| Traditional ML models | LLMs |
| --- | --- |
| Structured-data tasks such as fraud detection, demand forecasting, or credit scoring on tabular data | NLP and generative tasks such as chatbots, document summarization, or content generation |
| Scenarios where interpretability is a hard requirement (e.g., regulated decisioning) | Multimodal applications combining text, images, and audio |
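To make these scenario-driven choices concrete, here is a minimal rule-of-thumb sketch in Python; the attributes and decision rules are illustrative assumptions, not a prescriptive selection framework.

```python
# Illustrative rule of thumb only: the attributes and rules below are
# assumptions for demonstration, not a prescriptive selection framework.
def suggest_model_family(data_type: str, needs_generation: bool,
                         interpretability_required: bool) -> str:
    """Suggest a model family from coarse workload attributes."""
    if needs_generation or data_type in ("text", "multimodal"):
        # Open-ended generation and language/multimodal tasks favor LLMs.
        return "LLM"
    if interpretability_required or data_type in ("tabular", "time-series"):
        # Structured data and explainability requirements favor traditional ML.
        return "Traditional ML"
    return "Benchmark both for cost vs. quality"

print(suggest_model_family("tabular", needs_generation=False,
                           interpretability_required=True))  # -> Traditional ML
```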
AI typically entails a multi-stage lifecycle, with each stage carrying a unit-economic cost impact on the TCO, and thereby an opportunity to apply the FinOps principles of Inform, Optimize, and Operate at scale.
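To make the unit-economics framing concrete, here is a minimal sketch that rolls per-stage unit costs up into a monthly TCO; every stage name, unit price, and volume is a placeholder assumption, not a benchmark figure.

```python
# Minimal sketch: stage-level unit economics rolled up into a monthly TCO.
# Every stage name, unit price, and volume below is a hypothetical
# placeholder, not a benchmark figure.
stages = [
    # (stage, unit, $ per unit, monthly volume)
    ("data preparation", "GB processed", 0.02, 10_000),
    ("training",         "GPU-hour",     30.00, 400),
    ("fine-tuning",      "GPU-hour",     30.00, 60),
    ("inference",        "1K requests",  0.50, 25_000),
    ("storage",          "GB-month",     0.023, 50_000),
]

total = 0.0
for stage, unit, unit_cost, volume in stages:
    cost = unit_cost * volume
    total += cost
    print(f"{stage:>16}: {volume:>10,} {unit} x ${unit_cost} = ${cost:,.2f}")
print(f"{'monthly TCO':>16}: ${total:,.2f}")
```

Tracking cost per unit at each stage, rather than only the total, is what lets the Inform, Optimize, and Operate phases target the most expensive stage first.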
The next section delves into how and why AI-driven recommendations can significantly impact infrastructure decisions (and vice versa), along with use cases that interweave the types of infrastructure with associated personas and the indicative KPIs that can be used to measure and manage improvements.
Fully Managed AI Infrastructure (Example: AWS Bedrock, Google Vertex AI, Azure OpenAI Service)
Pros & Cons
Partially Managed AI Infrastructure (Example: AWS SageMaker, Google Kubernetes Engine (GKE) with AI, Azure Machine Learning)
Self-Managed AI Infrastructure (Example: Dedicated Instances, On-Prem NVIDIA DGX, Bare Metal AI Clusters)
The choice between these models significantly impacts cost efficiency, infrastructure maintenance, and performance scaling.
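To illustrate how the operating model shifts cost efficiency with scale, here is a minimal Python sketch comparing effective cost per 1K inferences; all prices and fixed costs are hypothetical placeholders, not vendor quotes.

```python
# Minimal sketch comparing effective cost per 1K inferences across the three
# operating models as monthly volume varies. All prices and fixed costs are
# hypothetical placeholders, not vendor quotes.
def cost_per_1k(model: str, monthly_queries: int) -> float:
    """Effective $ per 1K queries for a given operating model."""
    if model == "fully_managed":
        return 0.60  # flat pay-per-use API price, no fixed infra cost
    fixed_monthly = {
        "partially_managed": 4_000.0,   # reserved cloud GPUs plus some ops
        "self_managed": 12_000.0,       # amortized hardware, power, staffing
    }[model]
    return fixed_monthly / (monthly_queries / 1_000)

for volume in (100_000, 1_000_000, 50_000_000):
    costs = {m: round(cost_per_1k(m, volume), 3)
             for m in ("fully_managed", "partially_managed", "self_managed")}
    print(f"{volume:>10,} queries/month -> {costs}")
```

Under these assumptions, the pay-per-use model wins at low volume, while amortized fixed costs win at sustained high volume, which is the crossover the Crawl/Walk/Run progression below reflects.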
| Stage | Crawl (Beginner / Early Adoption) | Walk (Intermediate / Scaling AI Workloads) | Run (Advanced / Enterprise AI Maturity) |
| --- | --- | --- | --- |
| AI Infrastructure Model | Fully Managed (e.g., AWS Bedrock, Google Vertex AI, Azure OpenAI) | Partially Managed (e.g., AWS SageMaker, Google Kubernetes Engine with AI, Azure ML) | Self-Managed (e.g., On-Prem NVIDIA DGX, Dedicated AI Clusters) |
| Technical Readiness | Low – focus on AI adoption with minimal infra complexity | Medium – some DevOps & ML engineering expertise required | High – requires in-depth infrastructure & AI workload management |
| FinOps Maturity | Basic cost visibility, pay-as-you-go, minimal optimization | Cost monitoring, workload optimization, right-sizing resources | Advanced FinOps – CapEx vs. OpEx trade-offs, custom cost models |
| Use Cases | Experimentation, AI research, Proof of Concept (PoC) | Scaling AI workloads, optimizing AI cost-performance trade-offs | Enterprise AI at scale, mission-critical AI applications |
| Cost Considerations | High per-unit costs, but low operational overhead | Balanced cost-efficiency, requires hands-on cost control | High upfront investment, lower long-term costs with optimization |
| Performance Optimization | Auto-scaling, but limited customization | Customizable compute resources (GPUs, TPUs, networking) | Full control over hardware and performance tuning |
| Security & Compliance | Managed security by cloud providers | Shared responsibility, governance policies required | Full control over security, compliance, and data privacy |
| Persona | AI researchers, innovation teams, early adopters | ML engineers, FinOps teams, scaling organizations | AI-heavy enterprises, regulated industries, large-scale AI deployments |
| Persona | Use case | Fully Managed AI Infra | Partially Managed AI Infra | Self-Managed AI Infra |
| --- | --- | --- | --- | --- |
| FinOps Practitioner | Implement cost controls and AI budget tracking to optimize cloud AI expenses | Optimizes AI spend through resource utilization and financial planning. | Tracks and optimizes AI infrastructure costs while ensuring governance over spend. | Tracks AI infrastructure costs, optimizes CapEx vs. OpEx, and ensures financial governance. |
| Engineer – AI Researchers & Data Scientists | Runs AI training jobs with automated scaling in cloud AI services. | Focused on model development without worrying about infrastructure. | Configures AI environments, selects compute resources, and fine-tunes performance. | Designs, deploys, and optimizes AI models on dedicated hardware for maximum performance. |
| Engineer – ML/Development Engineer | Deploys AI-powered chatbots or recommendation engines for real-time customer interactions. | Deploys AI models with minimal operational overhead and high scalability. | Manages AI model deployments with some infrastructure tuning. | Handles end-to-end AI deployment with full control over infrastructure. |
| Engineer – DevOps & Cloud Engineers | Automates AI model deployment pipelines with cost-effective resource scaling. | Minimal DevOps involvement, as scaling and maintenance are managed by the provider. | Manages infrastructure setup, networking, and scaling for AI workloads. | Handles provisioning, networking, scaling, and maintenance of AI infrastructure. |
| Finance | Tracks AI expenditures, ensuring budget adherence and financial reporting. | Oversees AI budget planning, monitors cloud AI spend, and ensures financial transparency. | Manages financial planning for AI infrastructure, forecasts AI-related costs. | Plans and justifies large capital investments (CapEx) while balancing OpEx. |
| Procurement | Negotiates infrastructure contracts and optimizes procurement decisions. | Ensures cost-effective managed AI services procurement. | Monitors AI compute costs, optimizes budget allocations, and negotiates cloud pricing. | Manages vendor selection, negotiates AI infra costs, and ensures optimal purchasing strategies. |
| Product Owners | Aligns AI investments with business goals for cost-effective innovation. | Aligns AI investments with business goals for cost-effective innovation. | Balances AI cost efficiency with performance goals. | Ensures AI infra investments align with long-term business strategies. |
| Enterprise Innovation Teams | Tests and validates AI use cases before scaling. | Tests and validates AI use cases before scaling. | Experiments with AI while balancing cost and control. | Drives AI innovation with full control over models and data. |
| Sales Field | Market segmentation, customer insights, and automation. | AI services automate buyer insights, content generation, and competitor analysis, allowing sales teams to focus on execution rather than managing AI infrastructure. | Sales teams can fine-tune AI models for segmentation and integrate AI insights with their workflows while maintaining some control over data handling. | Enterprises needing deep control over AI-driven sales intelligence. |
| Metric | Self-Managed AI | Fully Managed AI | Hybrid AI (Partially Managed) |
| --- | --- | --- | --- |
| Training Cost | Direct GPU control, requires capacity planning | High per-hour cloud costs, but fully managed | Local infra for standard training, cloud for large-scale runs |
| Fine-Tuning Cost per Million Parameters | Lower cost, but needs infrastructure | High API-based cost per parameter | Run lightweight fine-tuning locally, API fine-tuning for major updates |
| Retraining ROI (Accuracy Gain per $ Spent) | High control over retraining efficiency but limited scalability | Optimized retraining cycles, but costly due to auto-scaling | Strategic retraining approach, balancing cloud efficiency and local control |
| Inference Cost | Lower long-term cost, needs infra | High per-query pricing, zero infra setup | On-prem for frequent inference, cloud APIs for burst traffic |
| Latency-to-Cost Efficiency | Low latency, requires dedicated resources | Cloud inference introduces network dependency | Edge computing solutions help maintain low latency while reducing cloud dependency |
| Storage Cost per Model | On-prem cheaper long-term | Cloud storage scales with high cost | Active models in cloud, old versions archived on-prem |
| Model Downtime Cost | Less frequent downtime but requires in-house maintenance | Downtime risk depends on cloud SLAs, often mitigated by redundancy | Lower downtime risk by distributing workloads between cloud and on-premise |
| Regulatory Compliance Cost – CapEx/OpEx | High internal effort, low external cost | Cloud compliance tools reduce manual effort | Balance between in-house teams and cloud governance |
| AI Bias & Fairness Audit Cost per Model | Internal compliance teams manage fairness checks | Cloud fairness audit tools (AWS, Google AI Governance) are available at a cost | Split compliance checks: sensitive audits on-prem, non-sensitive in cloud |
| Energy Consumption per Training Cycle | High energy usage, can be optimized with renewables | Cloud providers optimize for efficiency but at a higher cost | Balanced approach: local compute for energy savings, cloud for scaling |
| Model Versioning & Maintenance Cost | Requires manual version control, leading to increased infra costs | Automated versioning, but increasing storage costs over time | Versioning maintained locally for core models, cloud for scalability |
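The hybrid rows above (for example, on-prem for frequent inference with cloud APIs for burst traffic) can be reasoned about with a simple cost function. A minimal sketch, assuming hypothetical capacity and price figures:

```python
# Minimal sketch of the hybrid pattern above: steady inference served
# on-prem, burst traffic overflowing to a cloud API. Capacity and price
# figures are hypothetical placeholders.
ONPREM_CAPACITY = 2_000_000   # queries/month the local cluster can absorb
ONPREM_FIXED = 9_000.0        # $/month: amortized hardware, power, ops
API_PRICE_PER_1K = 0.50       # $ per 1K overflow queries sent to the API

def hybrid_monthly_cost(total_queries: int) -> float:
    """Fixed on-prem cost plus pay-per-use cost for overflow traffic."""
    overflow = max(0, total_queries - ONPREM_CAPACITY)
    return ONPREM_FIXED + (overflow / 1_000) * API_PRICE_PER_1K

for q in (500_000, 2_000_000, 6_000_000):
    print(f"{q:>9,} queries/month -> ${hybrid_monthly_cost(q):,.2f}")
```

Under these assumptions the fixed on-prem cost dominates until capacity is exceeded, after which spend grows linearly with overflow traffic.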
To practice FinOps for AI effectively, organizations must accurately identify AI-related workloads among all the workloads running in their cloud or data center. AI workloads themselves can run in any of SaaS, PaaS, IaaS, or on-premises setups. Typically, workloads can be considered AI-related in one of the following three ways.
It is important for FinOps practitioners to have a reasonably robust method of identifying AI-related workloads, preferably using a combination of all these approaches. Below is a mind map of how these approaches could be leveraged, with some examples that typify the identification of AI workloads in different contexts.
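As a complement to the mind map, here is a minimal sketch of how such identification might be automated against a normalized billing export. The heuristics below (managed AI service names, tag conventions, GPU-bearing SKUs) are common examples rather than the paper's enumerated approaches, and all field names, tag keys, and service strings are assumptions for illustration.

```python
# Minimal sketch of automating AI-workload identification against a
# normalized billing export. Heuristics, field names, tag keys, and
# service strings are assumptions for illustration, not a standard schema.
AI_SERVICES = {"Amazon SageMaker", "Amazon Bedrock", "Vertex AI",
               "Azure OpenAI", "Azure Machine Learning"}
AI_TAG_KEYS = {"workload-type", "ai-project"}  # hypothetical tagging policy

def is_ai_workload(row: dict) -> bool:
    """Classify a billing row as AI-related using three common heuristics."""
    if row.get("service") in AI_SERVICES:                    # managed AI service
        return True
    if any(k in row.get("tags", {}) for k in AI_TAG_KEYS):   # tag conventions
        return True
    return "gpu" in row.get("sku", "").lower()               # accelerator SKUs

billing = [  # toy rows standing in for a real cost-and-usage export
    {"service": "Amazon Bedrock", "sku": "on-demand tokens", "tags": {}, "cost": 120.0},
    {"service": "Amazon EC2", "sku": "p4d.24xlarge (GPU)", "tags": {}, "cost": 900.0},
    {"service": "Amazon S3", "sku": "standard storage", "tags": {}, "cost": 40.0},
]
ai_spend = sum(r["cost"] for r in billing if is_ai_workload(r))
print(f"AI-attributed spend: ${ai_spend:,.2f} of ${sum(r['cost'] for r in billing):,.2f}")
```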
AI adoption is growing fast, but choosing the right infrastructure is key to keeping costs under control while ensuring performance and scalability. Whether you go for a fully managed, partially managed, or self-managed setup, the decision should align with your team’s technical readiness and FinOps maturity.
Here are the biggest takeaways:
| Metric | Self-Managed AI | Fully Managed AI | Hybrid AI (Partially Managed) |
| --- | --- | --- | --- |
| Regulatory Compliance Cost – OpEx | High internal effort, low external cost | Cloud compliance tools reduce manual effort | Balance between in-house teams and cloud governance |
| Training Cost per Million Parameters | Lower cost but manual infra required | Expensive per GB, fully managed services | Process sensitive data locally, bulk processing in cloud |
| Inference Cost per 1K Predictions | Lower long-term cost, needs infra | High per-query pricing, zero infra setup | On-prem for frequent inference, cloud APIs for burst traffic |
| API Cost vs. Self-Hosted Model | Self-hosting eliminates API costs but needs infra | Cloud APIs costly but no infra required | Hybrid: frequently used models on-prem, occasional API calls |
| Compute Cost per Training Session | Direct GPU control requires capacity planning | High per-hour cloud costs, but scalable and fully managed | Uses local infra for standard training, cloud for large-scale runs |
We’d like to thank the following people for their work on this Paper:
We’d also like to thank our FinOps Foundation staff for their support: Rob Martin, Samantha White, and Andrew Nhem.