Exercise 04 - Implement FinOps Framework for AI Cost Management
Introduction
The FinOps Framework provides a structured approach to cloud financial management, helping organizations optimize AI costs while maintaining performance. In this exercise, you’ll implement cost tracking, quota management, and automated alerts using Azure API Management token limits, Azure Monitor custom tables, and Logic Apps for automated subscription management.
By integrating retail pricing data with token consumption metrics, you’ll gain visibility into actual AI costs per subscription and automatically enforce cost quotas to prevent budget overruns.
Description
In this exercise, you will implement the FinOps framework for AI cost management by:
- Loading retail pricing data into Azure Monitor custom tables
- Configuring cost quotas for APIM product subscriptions
- Setting up automated monitoring and alerting
- Testing the complete cost management workflow
- Analyzing costs using Azure Workbooks
Success Criteria
- You have successfully loaded retail pricing data into Azure Monitor custom tables
- You have configured cost quotas for each APIM product subscription
- You have tested API calls and verified token consumption tracking
- You have viewed cost analysis in Azure Workbooks showing actual spending vs. quotas
- You understand how the automated alert system works to disable subscriptions exceeding quotas
Learning Resources
- FinOps Framework
- Azure API Management Token Limit Policy
- Azure Monitor Custom Logs
- Azure Monitor Data Collection Rules
- Azure Retail Prices API
- Azure OpenAI Pricing
- Azure Monitor Alerts with Logic Apps
Key Tasks
01: Open the FinOps Framework Jupyter Notebook
The FinOps implementation is automated through a Jupyter notebook that orchestrates the entire cost management workflow.
Expand this section to view the solution
- In VS Code, navigate to the `src/02-api-gateway-policy` folder in your workspace.
- Open the file `finops-framework.ipynb`.
- Ensure you have the Jupyter extension installed in VS Code (it should have been installed during the initial setup).
- Select the Python kernel. We will create a new Python kernel for the Jupyter notebook:
  - Select Create Python Environment to create a new environment.
  - Select Quick Create.
  - Wait for the environment to be created.
- Review the notebook structure - it contains the following key sections:
- Initialize notebook variables: Configure deployment settings
- Verify Azure CLI: Ensure proper authentication
- Verify azd deployment: Confirm resources are deployed
- Get environment values: Retrieve deployment outputs
- Display retail pricing: View current AI model pricing
- Load pricing data: Import pricing into Azure Monitor
- Load subscription quotas: Configure cost limits
- Execute test runs: Generate sample usage data
- View dashboards: Analyze costs in workbooks
This notebook automates the FinOps framework implementation. You can run all cells sequentially or execute them step by step for better understanding.
02: Verify Azure Developer CLI deployment
Before running the FinOps notebook, ensure that your Azure resources have been deployed successfully using azd up.
Due to cost and capacity limitations, not everyone may be able to run `azd up`. As a workaround, if you did not run `azd up` yourself, copy the environment files from someone in your group who did: copy the `.azure` folder into your workspace root directory (see image below).

Expand this section to view the solution
- Open a terminal in VS Code (Terminal > New Terminal).
- Navigate to the repository root:
  `cd c:\DEV\TechWorkshop\TechWorkshop-L300-MCP-AI-Gateway`
- Check the azd environment status:
  `azd env get-values`
- You should see output containing key deployment values such as:
  - `apim_gateway_url`: Your API Management gateway URL
  - `apim_service_name`: Your APIM service name
  - `pricingDCREndpoint`: Data Collection Rule endpoint for pricing
  - `subscriptionQuotaDCREndpoint`: Data Collection Rule endpoint for quotas
- If the command fails or returns empty values, run the deployment: `azd up`
- Once deployment completes successfully, proceed to the next step.
The `azd up` command deploys all required Azure resources including APIM, Azure AI Foundry, Data Collection Rules, Log Analytics Workspace, and Azure Monitor workbooks.
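The notebook's "Get environment values" step retrieves these same deployment outputs programmatically. A minimal sketch of how that could be done, assuming `azd env get-values` emits `KEY="value"` lines (the helper names and the subprocess approach are illustrative, not necessarily the notebook's actual code):

```python
import subprocess

def parse_azd_env_values(output: str) -> dict[str, str]:
    """Parse `azd env get-values` output (KEY="value" per line) into a dict."""
    values = {}
    for line in output.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

def get_azd_env_values() -> dict[str, str]:
    """Shell out to the Azure Developer CLI and parse its environment values."""
    result = subprocess.run(
        ["azd", "env", "get-values"], capture_output=True, text=True, check=True
    )
    return parse_azd_env_values(result.stdout)
```

With values parsed into a dict, the notebook can pick out `apim_gateway_url`, `pricingDCREndpoint`, and the other outputs by key.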
03: Initialize and configure the FinOps notebook
In this step, you’ll configure the notebook variables to match your deployment settings and verify connectivity to Azure.
Here is the APIM configuration.
We will set up a monthly subscription quota (how much of its budget each subscription can use per month).
Expand this section to view the solution
- In the `finops-framework.ipynb` notebook, locate Cell 3 (Initialize notebook variables).
- Review the configuration variables:
  - `aiservices_config`: AI Foundry location configuration
  - `models_config`: Azure OpenAI model definitions with pricing SKUs
  - `apim_products_config`: Product tiers with token limits and cost quotas
  - `apim_subscriptions_config`: Subscription assignments to products
- Optional: Modify the configuration to match your requirements:

```python
apim_products_config = [
    {"name": "platinum", "displayName": "Platinum Product", "tpm": 2000, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 15},
    {"name": "gold", "displayName": "Gold Product", "tpm": 1000, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 10},
    {"name": "silver", "displayName": "Silver Product", "tpm": 500, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 5},
]
```
The configuration defines three product tiers (Platinum, Gold, Silver) with different cost quotas ($15, $10, $5). The FinOps framework will automatically enforce these limits.
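To sanity-check a tier's `costQuota` against its `tokenQuota`, you can estimate how many tokens a dollar quota buys at a given per-1K-token price. A rough sketch (the helper is illustrative; real spending mixes input and output token prices):

```python
def tokens_within_quota(cost_quota_usd: float, price_per_1k_usd: float) -> int:
    """Approximate token budget: how many tokens the cost quota buys
    at a flat per-1K-token price."""
    return int(cost_quota_usd / price_per_1k_usd * 1000)

# A $5 Silver quota at $0.0025 per 1K input tokens buys ~2,000,000 tokens,
# well above the 1,000,000 tokenQuota - so the cost quota is the looser limit
# for input-only traffic, but output tokens cost more and shift the balance.
silver_budget = tokens_within_quota(5, 0.0025)
```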
04: Load retail pricing data into Azure Monitor
This step fetches current AI model pricing from the Azure Retail Prices API and loads it into a custom Azure Monitor table for cost calculations. We will insert pricing data into a Log Analytics custom table (`PricingData_CL`) so that we can convert token usage into cost in USD.

Expand this section to view the solution
- In the notebook, run Cell 9 (Display retail pricing info) to view current pricing:
  `prices = requests.get(f"https://prices.azure.com/api/retail/prices?...")`
- You should see a table displaying pricing for each model in your configured region:

  | Region | SKU | Retail Price (per 1K tokens) |
  | --- | --- | --- |
  | swedencentral | GPT 5.2 chat inp Gl | $0.0025 |
  | swedencentral | GPT 5.2 chat opt Gl | $0.0100 |

- Run Cell 11 (Load pricing data into Azure Monitor) to import prices:
  `client = LogsIngestionClient(endpoint=pricing_dcr_endpoint, ...)`
- The script will:
  - Fetch retail prices for all configured models
  - Calculate per-1K-token costs for input and output tokens
  - Upload pricing data to an Azure Monitor custom table using Data Collection Rules
- Verify the upload succeeded - you should see messages like:
  `ℹ️ Adding model gpt-5.2-chat (source: 1M) with input / output tokens price 2.5 / 10.0`
  `✓ Upload succeeded for model gpt-5.2-chat`
- Understanding the pricing data:
  - Input tokens are typically cheaper than output tokens
  - Prices vary significantly between models (e.g., GPT-5-nano vs GPT-5.2-chat)
  - Global Standard SKUs may have different pricing than regional deployments
The pricing data is loaded into a custom Azure Monitor table called `PricingData_CL`. This data is used by Kusto queries in the Cost Analysis workbook to calculate actual spending based on token consumption.
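The log message above shows prices quoted per 1M tokens (2.5 / 10.0) while the table displays per-1K prices ($0.0025 / $0.0100), so the load step normalizes units. A sketch of that conversion, assuming the unit labels `"1M"` and `"1K"` (the function and labels are illustrative, not the notebook's exact code):

```python
def per_1k_price(retail_price: float, source_unit: str) -> float:
    """Normalize a retail price to a per-1K-token price.

    Retail meters may be priced per 1M or per 1K tokens; the unit
    strings ("1M", "1K") here are illustrative labels.
    """
    if source_unit == "1M":
        return retail_price / 1000.0
    if source_unit == "1K":
        return retail_price
    raise ValueError(f"unknown unit: {source_unit}")

# gpt-5.2-chat input priced at $2.50 per 1M tokens -> $0.0025 per 1K
input_per_1k = per_1k_price(2.5, "1M")
```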
05: Configure subscription cost quotas
In this step, you’ll load cost quota limits for each APIM subscription into Azure Monitor, enabling automated quota enforcement.
Expand this section to view the solution
- In the notebook, run Cell 13 (Load Subscription Quota) to configure quotas:
  `client = LogsIngestionClient(endpoint=subscription_quota_dcr_endpoint, ...)`
- The script will:
  - Match each subscription to its product tier
  - Extract the cost quota from the product configuration
  - Upload quota data to an Azure Monitor custom table
- Verify the upload succeeded - you should see messages like:
  `ℹ️ Adding subscription1 with cost quota 15`
  `✓ Upload succeeded for subscription1`
- How quotas work in the FinOps framework:
  - Each product tier has a defined `costQuota` (in USD)
  - Subscriptions assigned to a product inherit that quota
  - Azure Monitor alerts compare actual spending against quotas
  - Logic Apps automatically disable subscriptions when quotas are exceeded
- Example quota configuration:
  - Platinum ($15): High-value applications with generous budgets
  - Gold ($10): Standard production applications
  - Silver ($5): Development/testing environments
The quota data is stored in a custom Azure Monitor table called `SubscriptionQuota_CL`. Azure Monitor alert rules query this table along with token consumption data to detect quota violations.
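The matching step in Cell 13 pairs each subscription with its product's `costQuota` before uploading. A minimal sketch, assuming the config shapes from Cell 3 (the emitted field names are illustrative, not the exact `SubscriptionQuota_CL` schema):

```python
from datetime import datetime, timezone

def build_quota_records(subscriptions: list[dict], products: list[dict]) -> list[dict]:
    """Match each subscription to its product tier and emit one quota
    record per subscription, shaped for log ingestion."""
    quota_by_product = {p["name"]: p["costQuota"] for p in products}
    now = datetime.now(timezone.utc).isoformat()
    return [
        {
            "TimeGenerated": now,
            "SubscriptionName": sub["name"],
            "CostQuota": quota_by_product[sub["product"]],
        }
        for sub in subscriptions
    ]
```

The cell then sends these records through the Data Collection Rule endpoint via the `LogsIngestionClient` shown above.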
06: Test the FinOps implementation with HTTP requests
Now you’ll generate test traffic using the REST Client extension in VS Code to simulate real-world API usage and validate cost tracking.
Expand this section to view the solution
- In VS Code, navigate to the `src/02-api-gateway-policy` folder in your workspace.
- Open the file `test.http`.
- Ensure you have the REST Client extension installed in VS Code.
- Create or verify your `.env` file in the same directory with the following variables:

  APIM_GATEWAY_URL=https://YOUR-APIM-NAME.azure-api.net
  INFERENCE_API_PATH=inference
  API_VERSION=2025-03-01-preview
  SUBSCRIPTION1_KEY=YOUR-SUBSCRIPTION1-KEY-HERE
  SUBSCRIPTION2_KEY=YOUR-SUBSCRIPTION2-KEY-HERE
  SUBSCRIPTION3_KEY=YOUR-SUBSCRIPTION3-KEY-HERE
  SUBSCRIPTION4_KEY=YOUR-SUBSCRIPTION4-KEY-HERE

  You can look up the subscription keys in your Azure API Management instance as shown below.
- Get your subscription keys:
  - Navigate to your APIM instance in the Azure Portal → Subscriptions → Copy the primary keys
- Understanding the test file structure:
  - The file contains multiple HTTP requests, one for each subscription
  - Each request tests a different product tier (Platinum, Gold, Silver)
  - Variables are loaded from the `.env` file using the REST Client variable syntax
- Run a single test request:
  - Place your cursor on any request (e.g., "Test with Subscription 1")
  - Click the Send Request link above the request, or press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (Mac)
- You should see a response window appear with output like:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1738880234,
  "model": "gpt-5.2-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I cannot tell you the current time..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 42,
    "total_tokens": 57
  }
}
```

- Test different subscription tiers:
- Run the Subscription 1 (Platinum) request - should have highest TPM limit (2000)
- Run the Subscription 2 (Gold) request - medium TPM limit (1000)
- Run the Subscription 3 & 4 (Silver) requests - lower TPM limit (500)
- Testing rate limits:
  - Rapidly send multiple requests using the same subscription
  - You may encounter a `429 Too Many Requests` error when hitting the TPM limit
  - This demonstrates the token limit policy enforcement
- Understanding the test results:
- HTTP 200: Request processed successfully
- HTTP 429: Rate limit exceeded (TPM quota for that product tier)
- HTTP 401/403: Invalid subscription key or suspended subscription
- Each successful request generates token consumption data tracked in Azure Monitor
- What happens behind the scenes:
- APIM receives the request and validates the subscription key
- Token limit policy checks TPM quota for the product
- Request is forwarded to Azure AI Foundry model endpoint
- Response tokens are counted and logged to Azure Monitor
- Emit token metric policy records consumption for cost calculation
- Optional: Modify the test requests:
  - Change the `model` parameter to test different AI models
  - Adjust `max_completion_tokens` to control response length
  - Modify the `content` message to test different prompts
Using the REST Client extension provides a quick way to test individual subscription keys and product tiers without writing code. For automated testing scenarios, you can still use the notebook cell or create a Python script.
07: Analyze costs using Azure Workbooks
Finally, you’ll use Azure Workbooks to visualize costs, compare spending against quotas, and identify optimization opportunities.
Expand this section to view the solution
- Navigate to the Azure Portal, then open the dashboard.

- The workbook displays:
- Total Costs by Subscription: Bar chart showing spending per subscription
- Cost vs. Quota: Comparison of actual costs against configured limits
- Model Usage: Breakdown of costs by AI model
- Token Consumption Trends: Time-series chart of token usage
- Quota Compliance: Percentage of quota consumed per subscription
The Cost Analysis workbook uses Kusto queries to join token consumption logs with pricing data and quota configurations, providing real-time cost visibility without requiring manual calculations.
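The core calculation those Kusto queries perform - multiply each request's token counts by the model's per-1K prices and sum per subscription - can be sketched in a few lines of Python (the record shapes are illustrative, not the actual table schemas):

```python
def usage_cost(usage: dict, prices: dict) -> float:
    """Cost of one request: token counts times per-1K input/output prices."""
    p = prices[usage["model"]]
    return (usage["prompt_tokens"] / 1000.0) * p["input_per_1k"] \
         + (usage["completion_tokens"] / 1000.0) * p["output_per_1k"]

def total_cost_by_subscription(usages: list[dict], prices: dict) -> dict[str, float]:
    """Aggregate per-request costs into a per-subscription total,
    mirroring the workbook's join of usage logs with PricingData_CL."""
    totals: dict[str, float] = {}
    for u in usages:
        totals[u["subscription"]] = totals.get(u["subscription"], 0.0) + usage_cost(u, prices)
    return totals
```

Comparing each total against the subscription's quota from `SubscriptionQuota_CL` yields the quota-compliance view.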
08: FinOps best practices and optimization tips
Expand this section to view the solution
Based on the FinOps Framework principles, here are recommended practices for ongoing AI cost management:
1. Visibility & Allocation
- Tag all resources: Use Azure tags to track costs by team, project, or cost center
- Implement showback/chargeback: Use Azure Cost Management to allocate AI costs to business units
- Regular cost reviews: Schedule monthly reviews of the Cost Analysis workbook
- Monitor trends: Watch for unexpected spikes in token consumption
2. Optimization
- Right-size model selection: Use smaller models (e.g., GPT-5-nano) for simple tasks
- Optimize prompts: Reduce unnecessary tokens in system messages and user prompts
- Enable caching: Use Azure OpenAI’s prompt caching to reduce redundant token consumption
- Implement semantic caching: Cache similar queries at the application layer
- Batch operations: Process multiple requests together when possible
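An application-layer cache can be as simple as keying completions on the normalized prompt; true semantic caching would compare embeddings rather than exact strings. A minimal exact-match sketch (the function names are illustrative):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    """Return a cached answer for an identical (normalized) prompt,
    invoking the model only on a cache miss."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]
```

Every cache hit is a request that consumes zero tokens, which directly reduces spend against the subscription's cost quota.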
3. Governance & Control
- Implement tiered access: Assign users to appropriate product tiers based on needs
- Regular quota reviews: Adjust quotas based on actual usage patterns and business value
- Cost anomaly detection: Set up additional alerts for unusual spending patterns
- Approval workflows: Require approval for quota increases above certain thresholds
4. Automation
- Auto-scaling quotas: Implement Logic Apps to automatically adjust quotas based on business rules
- Scheduled reports: Send weekly cost summaries to stakeholders via email
- Predictive alerting: Use Azure Machine Learning to forecast quota exhaustion
- Resource tagging automation: Auto-tag API requests with metadata for granular cost tracking
5. Cultural Change (FinOps Principle)
- Cost awareness training: Educate developers on AI cost implications
- Cost-conscious development: Include cost considerations in sprint planning
- Shared responsibility: Make cost optimization everyone’s responsibility, not just finance
- Celebrate wins: Recognize teams that successfully optimize AI spending
6. Model-Specific Optimizations
- GPT-5-nano: Use for simple classification, extraction, and summarization
- GPT-5-mini: Balance of cost and capability for most tasks
- GPT-5.2: Reserve for complex reasoning, coding, and creative tasks
- Completion vs. Chat models: Choose based on actual use case requirements
7. Load Balancing & Resilience
- Multiple backend pools: Distribute traffic across regions for cost optimization
- Failover strategies: Implement graceful degradation to cheaper models during high load
- Priority queuing: Route high-value requests to premium models, others to cost-effective options
8. Continuous Improvement
- A/B testing: Compare model performance vs. cost regularly
- Feedback loops: Collect user satisfaction data to validate model selection
- Regular audits: Review and remove unused subscriptions and products
- Stay updated: Monitor Azure pricing changes and new model releases
The FinOps Framework emphasizes that cost optimization is an ongoing practice, not a one-time setup. Schedule quarterly reviews of your implementation and adjust based on business changes and new Azure capabilities.
Summary
Congratulations! You have successfully implemented the FinOps Framework for AI cost management using Azure API Management and Azure Monitor. You now have:
✅ Automated pricing data collection from Azure Retail Prices API
✅ Cost quota enforcement per APIM subscription and product tier
✅ Real-time cost tracking with custom Azure Monitor tables
✅ Automated alerts and remediation using Logic Apps
✅ Comprehensive cost visibility through Azure Workbooks
Key Takeaways
-
FinOps is a cultural shift: Cost optimization requires collaboration between engineering, finance, and business teams.
-
Visibility enables optimization: You can’t optimize what you can’t measure - comprehensive cost tracking is the foundation.
-
Automation scales governance: Manual quota enforcement doesn’t scale; automated alerts and remediation ensure consistent policy enforcement.
-
Right-sizing matters: Using the appropriate AI model for each task can reduce costs by 10-100x without sacrificing quality.
-
Continuous monitoring: AI cost management is an ongoing practice that requires regular review and adjustment.
Next Steps
- Exercise 05: Explore advanced MCP (Model Context Protocol) integration patterns
- Production deployment: Adapt this implementation to your organization’s requirements
- Advanced features: Implement predictive cost forecasting using Azure Machine Learning
- Integration: Connect cost data to your existing FinOps or cloud management platforms
Additional Resources
- FinOps Foundation
- Azure FinOps Guide
- Azure OpenAI Cost Management
- APIM Token Limit Policy
- Azure Monitor Custom Logs
Clean up resources: When you’re finished with this lab, use the clean-up notebook to remove all deployed resources and avoid unnecessary charges:
src/02-api-gateway-policy/clean-up-resources.ipynb