Want to save money on Azure Virtual Machines (VMs) while improving performance? Here’s how small and medium-sized businesses (SMEs) can optimize their Azure VM setup without needing a dedicated cloud team:
- Cut Costs: Use tools like Azure Spot VMs (up to 90% savings) and Reserved Instances (up to 72% savings) for predictable workloads.
- Right-Size VMs: Avoid overpaying for underutilized resources by resizing VMs based on workload needs.
- Automate Resource Management: Implement autoscaling to adjust VM capacity during traffic surges or quiet periods.
- Track and Optimize Usage: Use Azure Cost Management, Azure Advisor, and monitoring tools to identify idle resources and reduce expenses.
- Schedule Downtime: Shut down non-critical VMs during off-hours to save up to 70%.
With these strategies, SMEs can align their cloud spending with actual usage while maintaining performance. Let’s dive deeper into these tips.

Reviewing Your Current Azure VM Setup
Before tweaking your Azure Virtual Machine (VM) setup, it’s crucial to understand where your money is going. Thankfully, Azure offers built-in tools to help you track costs and pinpoint underused resources.
Using Azure Cost Management and Billing

The Cost Analysis tool in Azure Cost Management gives you a detailed view of your VM spending through time-based charts and tables. To focus specifically on Virtual Machines, filter by "Service name." You can also group costs by region, resource group, or custom tags to allocate expenses more effectively. For example, tagging resources with labels like "Environment: Production" or "Owner: Marketing" can make it easier to spot anomalies.
Want to avoid overspending? Set up budgets with automatic alerts at thresholds like 50%, 75%, or 90% of your budget. The cost data updates roughly every four hours, and you can even schedule automated exports to Azure Storage for deeper analysis in tools like Excel or Power BI.
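The threshold logic behind those alerts is simple enough to sketch. Here's a minimal Python illustration (the function name and shape are our own, not an Azure API):

```python
def crossed_thresholds(spend, budget, thresholds=(0.50, 0.75, 0.90)):
    """Return which budget-alert thresholds the current spend has crossed.

    Mirrors the 50% / 75% / 90% alert levels described above; Azure
    evaluates these against cost data that refreshes roughly every 4 hours.
    """
    return [t for t in thresholds if spend >= budget * t]
```

For a $1,000 monthly budget, $800 of spend would have tripped the 50% and 75% alerts but not yet the 90% one.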
This spending data serves as the foundation for the performance recommendations you’ll get from Azure Advisor.
Using Azure Advisor

Once you’ve reviewed your spending, it’s time to evaluate VM performance and usage. Azure Advisor builds on your cost data by offering actionable optimization tips. This free tool uses machine learning to analyze your usage patterns and suggests steps like resizing or shutting down VMs. For instance, it flags VMs with consistently low utilization – those with CPU usage at or below 5% and network activity under 7 MB for four days or more. It also identifies VMs with a 95th percentile CPU usage below 3% and network activity under 2% over seven days.
The Cost Optimization Workbook within Advisor provides a one-stop dashboard for spotting idle resources, improperly deallocated VMs (those in a "Stopped" state but still incurring charges), and opportunities for discounts through Reserved Instances or Savings Plans. You can adjust the analysis period between 7 and 90 days, but keep in mind that it may take up to 48 hours for recommendations to refresh after making changes. Plus, the "Quick Fix" button allows you to apply optimizations – like enabling Azure Hybrid Benefit – directly from the dashboard.
| Tool | What It Shows | Best Used For |
|---|---|---|
| Cost Management & Billing | Historical spending, budgets, and cost trends | Tracking expenses and setting spending limits |
| Azure Advisor | Right-sizing and shutdown recommendations | Identifying VMs to resize or turn off |
| Cost Optimization Workbook | Idle resources, deallocated VMs, and discount opportunities | Achieving quick wins with one-click fixes |
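Advisor's two low-utilization rules quoted above boil down to simple threshold checks. A Python sketch (the helper names are our own; the real evaluation happens inside Advisor):

```python
def advisor_low_utilization(avg_cpu_pct, net_mb, days):
    """First rule from the text: CPU at or below 5% and network
    activity under 7 MB, sustained for four days or more."""
    return days >= 4 and avg_cpu_pct <= 5 and net_mb < 7

def advisor_idle(p95_cpu_pct, net_pct, days):
    """Second rule: 95th-percentile CPU below 3% and network
    activity under 2% over a seven-day window."""
    return days >= 7 and p95_cpu_pct < 3 and net_pct < 2
```

A VM averaging 4% CPU with 5 MB of network traffic over five days would be flagged by the first rule; drop the observation window to six days and the second rule no longer applies.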
Choosing the Right VM Size for Your Needs
Selecting the right virtual machine (VM) size is all about matching your resources – like CPU, memory, storage, and network capacity – to what your workloads actually need. Many businesses over-provision their VMs to handle rare peak loads, which can unnecessarily inflate costs. To avoid this, it’s crucial to review workload-specific VM families and choose options that balance capacity with cost efficiency.
The financial consequences of mis-sizing can be steep. Benjamin Thomas, CTO at Sedai, points out, "A mis-sized VM can silently waste thousands monthly, while an undersized one degrades performance". And here’s a startling fact: fewer than 10% of cloud transformations achieve their full potential, often due to poor sizing decisions and infrastructure complexity.
Understanding VM SKUs can also help you make better decisions. For example, in a SKU like "Standard_D8s_v5", the "D" indicates a general-purpose VM, "8" represents eight virtual CPUs (vCPUs), "s" means Premium SSD support, and "v5" refers to the fifth-generation hardware. Generally, newer generations (like v5 compared to v4) offer better performance and cost efficiency due to updated processors and memory.
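Decoding a SKU name can be automated with a small parser. This sketch handles only the common `Standard_<family><vCPUs><features>_v<gen>` shape; real SKU names have more variants (constrained-core sizes, promo suffixes), so treat it as illustrative:

```python
import re

def parse_sku(sku):
    """Parse SKU names like 'Standard_D8s_v5' (common shapes only)."""
    m = re.fullmatch(r"Standard_([A-Z]+)(\d+)([a-z]*)_v(\d+)", sku)
    if not m:
        raise ValueError(f"unrecognized SKU: {sku}")
    family, vcpus, features, gen = m.groups()
    return {
        "family": family,              # e.g. D = general purpose
        "vcpus": int(vcpus),
        "premium_ssd": "s" in features,
        "generation": int(gen),        # newer is usually cheaper per unit
    }
```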
To determine the right size for your VM, analyze 30 days of usage data to identify workload patterns. If your peak CPU usage consistently stays below 50%, you might downsize. On the other hand, if CPU or memory usage regularly hits 90%, or input/output operations per second (IOPS) reach 95% of their limits, it’s time to upsize to avoid bottlenecks.
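Those thresholds translate directly into a decision rule. A sketch using the figures from the paragraph above (the function is a hypothetical helper, not an Azure API):

```python
def sizing_recommendation(peak_cpu_pct, peak_mem_pct, iops_pct_of_limit):
    """Apply the rule of thumb from the text: upsize if CPU or memory
    regularly hits 90% or IOPS reach 95% of the limit; consider
    downsizing if peak CPU consistently stays below 50%."""
    if peak_cpu_pct >= 90 or peak_mem_pct >= 90 or iops_pct_of_limit >= 95:
        return "upsize"
    if peak_cpu_pct < 50:
        return "consider-downsize"
    return "keep"
```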
Matching VM Sizes to Your Workloads
Once you’ve reviewed your usage data and cost analysis, the next step is aligning VM sizes to specific workload types. Azure groups its VMs into families designed for different tasks:
- B-series: These are great for workloads with low baseline demands but occasional spikes, like development environments or small web servers.
- D-series: Known for balanced CPU-to-memory ratios, these work well for web and application servers.
- E-series: Designed for memory-intensive tasks, such as large databases or in-memory caching with SQL Server or SAP HANA.
- F-series: Ideal for compute-heavy jobs like batch processing or analytics.
- L-series: Best for storage-heavy workloads, such as NoSQL databases, thanks to NVMe storage.
- N-series: Tailored for GPU-accelerated tasks like machine learning or video rendering.
For example, if your database is constrained by memory, switching from a D-series to an E-series VM can boost performance while potentially lowering costs by reducing the number of vCPUs. Similarly, Azure Advisor can help distinguish between workloads that are user-facing (requiring high responsiveness) and those that are not. For user-facing tasks, Advisor aims for a P95 CPU utilization of 40% or lower, while batch jobs can go up to 80% utilization.
One thing to note: Azure doesn’t automatically collect in-guest memory metrics. To get accurate RAM usage data, you’ll need to install the Azure Monitor agent. Without it, you won’t have the full picture for making informed sizing decisions.
Azure Advisor Recommendations Table
Azure Advisor uses your workload data to recommend specific VM size adjustments. After monitoring usage patterns for a sufficient period, you can use the table below to see some common resizing scenarios and their potential savings based on East US pricing:
| Current VM SKU | vCPU / RAM | Recommended SKU | vCPU / RAM | Est. Monthly Savings |
|---|---|---|---|---|
| Standard_D4s_v5 | 4 / 16 GB | Standard_D2s_v5 | 2 / 8 GB | ~$70.00 |
| Standard_E8ds_v5 | 8 / 64 GB | Standard_E4ds_v5 | 4 / 32 GB | ~$150.00+ |
| Standard_F8s_v2 | 8 / 16 GB | Standard_F4s_v2 | 4 / 8 GB | ~$60.00 |
Keep in mind, resizing a VM requires a restart, so it’s best to plan these changes during maintenance windows or outside of business hours to avoid disruptions. If the desired VM size isn’t available in your current hardware cluster, you may need to deallocate the VM first to allow it to move to a cluster that supports the new size.
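The restart-versus-deallocate decision can be expressed as a small planning helper. This is a hypothetical function, not Azure tooling; in practice you would check available sizes with `az vm list-vm-resize-options` or the portal's Size blade:

```python
def plan_resize(current_size, target_size, sizes_on_cluster):
    """Decide how to apply a VM resize. Resizing always restarts the VM;
    if the target size isn't offered on the VM's current hardware
    cluster, the VM must be deallocated first so Azure can place it
    on a cluster that supports the new size."""
    if target_size == current_size:
        return "no-op"
    if target_size in sizes_on_cluster:
        return "resize-with-restart"      # schedule a maintenance window
    return "deallocate-then-resize"       # brief downtime, new cluster
```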
Setting Up Autoscaling for Flexible Resource Management
Autoscaling lets your Azure VMs adapt automatically to fluctuating workloads, eliminating the need for manual adjustments. This means you avoid paying for unused capacity during quiet times and don’t scramble to add resources during traffic surges. For small and medium-sized businesses juggling tight budgets and limited IT support, this approach can make a big difference.
Azure Monitor keeps an eye on key performance metrics like CPU usage, memory, and disk I/O. For example, when CPU usage goes beyond 70%, the system triggers a "scale-out" action to add more VMs. Conversely, when demand drops and CPU usage falls below 30%, it "scales in" by removing VMs. As Hari Chandrasekhar from Sedai points out:
"Without a clear autoscaling strategy, teams often overpay for underutilized resources, run into performance issues during traffic spikes, or spend extra time managing complex configurations".
To avoid unnecessary scaling adjustments, maintain a 40-point gap between scale-out (70%) and scale-in (30%) thresholds, and set a cooldown period of 5–10 minutes. You can also schedule scaling for predictable patterns – like adding capacity before 9-to-5 business hours and scaling down overnight or on weekends. These strategies create a solid foundation for integrating Virtual Machine Scale Sets into your setup.
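Azure's autoscale engine implements this logic for you; the sketch below just shows why the 40-point gap and the cooldown prevent "flapping" (rapid scale-out followed by immediate scale-in):

```python
def scale_decision(cpu_pct, minutes_since_last_action,
                   cooldown_min=10, scale_out_at=70, scale_in_at=30):
    """Hysteresis plus cooldown, per the thresholds in the text."""
    if minutes_since_last_action < cooldown_min:
        return "wait"          # still cooling down from the last action
    if cpu_pct > scale_out_at:
        return "scale-out"
    if cpu_pct < scale_in_at:
        return "scale-in"
    return "hold"              # inside the 30-70% dead band
```

Because the dead band is wide, a VM hovering around 50% CPU triggers no action at all, and the cooldown stops a burst of traffic from firing several scale-outs in a row.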
Using Virtual Machine Scale Sets

Virtual Machine Scale Sets (VMSS) are the backbone of an effective autoscaling strategy. They automate the process of adding and removing VMs, managing a group of identical, load-balanced instances that adjust based on your rules. You’re billed only for the compute, storage, and networking resources you actually use.
To set up a scale set, you’ll need to define a few key settings. Start with instance limits: a minimum of 2 VMs for production to ensure availability, a maximum to control costs, and a default count for when metrics aren’t available. Then, create paired rules – one to scale out when CPU usage exceeds 70% for 10 minutes, and one to scale in when it drops below 30% for the same duration. Microsoft Learn highlights the benefits:
"Autoscale helps reduce the number of unnecessary VMs when demand is low… This ability helps reduce costs and efficiently create Azure resources as required".
For cost savings, you can mix in Azure Spot VMs for non-critical workloads. These interruptible instances can save up to 90% compared to standard rates. VMSS supports up to 1,000 VM instances and offers a service level agreement (SLA) of up to 99.99%, making it a reliable option for even large-scale applications. To enhance reliability, enable the Application Health Extension to detect and replace unhealthy instances automatically, and distribute VMs across multiple availability zones to safeguard against data center outages.
Reducing Costs with Spot VMs and Reserved Instances

When it comes to managing cloud costs, choosing the right pricing model for your workloads is key. Azure offers two powerful tools for this: Spot VMs and Reserved Instances (RIs). Spot VMs are perfect for tasks that can handle interruptions, while RIs are ideal for steady, predictable workloads. Using these models strategically can lead to substantial savings.
Spot VMs for Non-Critical Workloads
Spot VMs allow you to tap into unused Azure capacity at discounts of up to 90% compared to standard pay-as-you-go rates. However, there’s a catch: Azure can reclaim these resources with just 30 seconds’ notice when capacity is needed. To use Spot VMs effectively, you need to design your workloads to handle these interruptions.
These VMs are best suited for tasks that are fault-tolerant, stateless, or time-flexible. Examples include batch processing, CI/CD build agents, development and testing environments, and media rendering. Perry Leong from Microsoft sums it up well:
"If the workload is stateless, scalable, or time, location, and hardware-flexible, then they may be a good fit for Spot VMs."
To maximize cost efficiency:
- Set your maximum price to -1 to ensure eviction happens only when Azure reclaims capacity, not because of price changes.
- Use the Deallocate eviction policy to preserve disk storage for later restarts.
- Deploy Spot VMs in Virtual Machine Scale Sets to automatically replace evicted instances or leverage the Spot Priority Mix feature for a blend of regular and Spot VMs.
For instance, a Standard_D4s_v5 VM in East US, typically priced at $0.192 per hour on a pay-as-you-go basis, can cost as little as $0.020 to $0.040 per hour with Spot pricing. This translates to monthly savings of $111 to $125 compared to the baseline cost of about $140. However, note that Spot discounts are not available for certain VM types like B-series (burstable) or promo sizes such as Dv2, NV, and H-series.
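Those figures check out arithmetically, assuming Azure's 730-hour billing month:

```python
HOURS_PER_MONTH = 730  # Azure's standard monthly billing approximation

paygo = 0.192 * HOURS_PER_MONTH            # ~$140/mo, Standard_D4s_v5, East US
spot_low = 0.020 * HOURS_PER_MONTH         # best-case Spot price
spot_high = 0.040 * HOURS_PER_MONTH        # worst-case Spot price
savings_low = paygo - spot_high            # ~$111/mo saved at the high Spot rate
savings_high = paygo - spot_low            # ~$126/mo saved at the low Spot rate
```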
Reserved Instances for Predictable Workloads
For workloads that require consistent availability – like core databases or always-on applications – Reserved Instances (RIs) are a better option. By committing to a 1-year or 3-year term, you can save up to 72%. As Rajkishore, a Microsoft Certified IT Consultant, puts it:
"Strategic RI purchasing provides 25-65% savings for predictable workloads."
RIs offer flexibility in instance size, meaning a reservation for one VM size can be applied to another within the same flexibility group. For example, a Standard D4s v3 VM (4 vCPUs, 16GB RAM) can see its annual cost drop from $1,681.92 (pay-as-you-go) to $1,261.44 with a 1-year RI (a 25% reduction) or to $1,009.15 with a 3-year RI (around 40% savings).
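The percentages follow directly from the annual figures quoted above:

```python
paygo_annual = 1681.92          # Standard D4s v3, pay-as-you-go, per year
ri_1yr = 1261.44                # 1-year Reserved Instance
ri_3yr = 1009.15                # 3-year Reserved Instance

pct_1yr = (1 - ri_1yr / paygo_annual) * 100   # ~25% off
pct_3yr = (1 - ri_3yr / paygo_annual) * 100   # ~40% off
```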
Here’s how to approach RIs:
- Start with a 1-year commitment to evaluate workload stability before committing to a 3-year term for critical applications.
- Use Azure Advisor to identify underutilized VMs (e.g., those with under 5% CPU usage) to right-size your environment before purchasing RIs.
- Combine RIs with the Azure Hybrid Benefit for savings of up to 80%.
Alternatively, Azure Savings Plans provide up to 65% savings for consistent compute spend across services like VMs, Container Instances, and App Service. These plans offer more flexibility than traditional RIs.
Pricing Model Comparison Table
| Pricing Model | Typical Discount | Ideal for SMEs | Commitment | Monthly Savings Example |
|---|---|---|---|---|
| Pay-As-You-Go | 0% (Baseline) | Testing, short-term projects, variable demand | None | $70.08 (Baseline) |
| Spot VMs | Up to 90% | Batch processing, CI/CD, stateless web apps, non-critical dev/test | None | ~$7.01 (~90% savings) |
| Reserved Instances | 25-72% | Steady-state production workloads, mission-critical apps | 1 or 3 Years | ~$28.03 (~60% savings, 3-year) |
| Savings Plans | Up to 65% | Consistent compute spend across services | 1 or 3 Years | ~$24.53 (~65% savings) |
Source: Azure pricing examples
Turning Off Unused Resources and Scheduling VMs
One effective way to reduce Azure costs is by shutting down virtual machines (VMs) that aren’t being used. For small and medium-sized enterprises (SMEs), turning off development and test environments outside of regular business hours can slash compute costs by up to 70%. This approach works well alongside strategies like right-sizing and autoscaling.
Identifying Idle VMs
Start by pinpointing VMs that aren’t actively in use. Azure Advisor can help by flagging VMs that have been idle for seven days, with metrics like a P95 CPU utilization below 3% and outbound network usage under 2%. You can also set up custom alerts in Azure Monitor to notify you when CPU usage drops below 2% over a 15-minute window. To keep track of ownership and responsibility, use resource tags like "Owner" or "Department" before deciding to deallocate any VM.
Scheduling Auto-shutdown
Once you’ve identified idle VMs, use Azure’s Auto-shutdown feature to schedule regular downtime. This is particularly useful for development or test environments that only need to run during business hours. Note that Auto-shutdown only powers a VM off at a set time each day (say, 5:00 PM) – it won’t start it again in the morning, so pair it with a manual or automated start. You’ll find the feature under the Operations section of the VM’s menu in the Azure Portal. Make sure to set the correct time zone and enable the 30-minute email alert before shutdown. Most importantly, confirm the VM reaches the "Stopped (Deallocated)" state, as shutting down the guest OS alone won’t stop compute charges.
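The intended business-hours window can be sketched as a schedule check (hypothetical helper; in practice Auto-shutdown or the Start/Stop VMs v2 solution enforces the schedule for you):

```python
from datetime import time

def should_be_running(now, weekday, start=time(9, 0), stop=time(17, 0)):
    """Dev/test VM runs 9:00-17:00 on weekdays only.
    `weekday` follows Python's convention: 0=Monday .. 6=Sunday."""
    if weekday >= 5:
        return False               # weekend: stay deallocated
    return start <= now < stop
```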
Advanced Scheduling Options
If you need more flexibility – such as starting VMs automatically or managing a large number of VMs across multiple resource groups – Azure Automation Runbooks or the Start/Stop VMs v2 solution can help. These tools allow for dynamic power state management using tags like "AutoShutdown: true". They can even include a holiday calendar to prevent VMs from starting on public holidays. For example, running 10 Standard_D4s_v5 VMs only during business hours instead of around the clock could cut monthly compute costs from roughly $1,400 to about $340, saving over $1,000.
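The off-hours arithmetic can be sanity-checked with the per-hour rate cited earlier (~$0.192/hour for Standard_D4s_v5 in East US, and Azure's 730-hour billing month):

```python
monthly_cost = 0.192 * 730                 # one Standard_D4s_v5, ~$140/month
always_on = 10 * monthly_cost              # ten VMs running 24/7, ~$1,400
business_fraction = (5 * 8) / (7 * 24)     # 40 of 168 weekly hours, ~24%
scheduled = always_on * business_fraction  # ~$340/month
savings = always_on - scheduled            # over $1,000/month
```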
Additional Considerations
While deallocating a VM stops compute billing, keep in mind that charges for OS and data disks (Managed Disks) and static Public IP addresses will continue. Also, if the VM uses a dynamic IP, it will likely receive a new IP address when restarted. If a consistent IP address is essential, consider switching to a static IP to avoid disruptions.
Tracking Performance and Making Ongoing Improvements
Optimization isn’t something you do just once and forget about. Your Azure VM environment needs regular monitoring to keep costs in check and ensure performance stays on track as workloads evolve. The Azure Well-Architected Framework emphasizes that "Performance efficiency is the ability of your workload to adapt to changes in load". Without continuous monitoring, you might end up paying for over-provisioned resources or dealing with performance issues that could impact your business. By keeping an eye on performance metrics, you can ensure your optimization efforts last over time.
Using Azure Monitor and Log Analytics

Azure Monitor offers two key tools for tracking VM performance: Metrics, which provide real-time numerical data, and Logs, which capture detailed system events for deeper analysis. Activating VM Insights simplifies the process by deploying the Azure Monitor Agent and offering preconfigured performance charts and dependency maps right out of the box. This eliminates the hassle of manual setup and gives you instant visibility into the guest operating system.
Set up alerts during the initial VM configuration to get notified about potential issues. For example, if CPU usage exceeds 80% for more than five minutes, it might indicate an overloaded workload. Similarly, memory availability dropping below 15% could signal memory leaks or under-provisioning. Use Log Analytics and Kusto Query Language (KQL) to dig into performance data across multiple machines and pinpoint the root causes of issues. Tools like Metrics Explorer allow you to visualize multiple metrics on a single graph, making it easier to identify patterns, such as how spikes in network traffic might affect CPU performance.
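A sustained-usage alert like "CPU above 80% for more than five minutes" amounts to checking a sliding window of samples. Azure Monitor alert rules do this natively; the sketch below only illustrates the logic:

```python
def cpu_alert(samples_pct, threshold=80, window=5):
    """Fire when every 1-minute CPU sample in the last `window`
    minutes exceeds the threshold (sustained usage, not a spike)."""
    recent = samples_pct[-window:]
    return len(recent) == window and all(s > threshold for s in recent)
```

A single one-minute spike to 85% doesn't fire; five consecutive minutes above 80% does, which is what distinguishes an overloaded workload from normal bursts.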
To keep monitoring costs under control, apply Data Collection Rules (DCRs) to filter out unnecessary event logs or performance counters. Azure Monitor Metrics retains data for 93 days by default, while Log Analytics can extend retention up to two years. Consider creating an "Agent Heartbeat" alert to notify you if the Azure Monitor Agent stops sending data, so you can address any disruptions quickly.
By integrating these insights with automated workflows, you can simplify ongoing maintenance and boost efficiency.
Automating Tasks with Azure Automation

Once you’ve set up real-time performance tracking, automation can help you handle routine tasks and maintain efficiency without constant manual intervention. For example, you can configure Azure Automation runbooks to trigger based on alerts from Azure Monitor. If CPU utilization drops below 2% for 15 minutes, a runbook could automatically deallocate the idle VM to save costs. Use System-Assigned Managed Identities for your Automation Account to securely manage resources without needing to store credentials.
For large-scale operations, tags like "Environment: Dev" can help you target specific VM groups for maintenance or staggered start/stop schedules. The Start/Stop VMs v2 solution is another great option – it allows you to manage VM power states across multiple subscriptions based on custom schedules, keeping costs low and operations smooth. Additionally, use Azure Policy to ensure consistent monitoring by automatically deploying the Azure Monitor Agent and assigning Data Collection Rules to both new and existing VMs.
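Tag-based targeting is just a filter over your VM inventory. A minimal sketch (the dictionary shape here is hypothetical; with the Azure SDK you would read each VM's `tags` property):

```python
def vms_with_tag(vms, key, value):
    """Return names of VMs whose tags match, e.g. Environment=Dev,
    so a runbook can apply a staggered start/stop schedule to them."""
    return [vm["name"] for vm in vms if vm.get("tags", {}).get(key) == value]
```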
Key Takeaways
Making smart, incremental adjustments to your virtual machines (VMs) can lead to considerable cost savings. For example, right-sizing VMs to align with actual workload demands prevents paying for underutilized resources, which often hover at just 5–15% utilization. Combine this with Reserved Instances for predictable workloads (offering up to 72% savings) and Spot VMs for non-critical tasks (up to 90% off), and you’ve got a flexible pricing strategy tailored to your business needs.
Beyond resizing, smaller tweaks can also bring meaningful savings. Automating start/stop schedules for non-production environments can reduce expenses by as much as 73%. Shifting infrequently accessed data to Cool or Archive storage tiers can cut storage costs by up to 97.7%. Additionally, leveraging the Azure Hybrid Benefit for Windows Server or SQL Server can save up to 85% on licensing costs. Real-world examples include Akamai, which cut Kubernetes costs by 40–70%, and Branch.io, which reported saving millions annually as of 2024 through similar optimizations.
To maintain these savings, treat optimization as an ongoing process. Tools like Azure Advisor and Azure Monitor can help you regularly review your setup and spot inefficiencies. Setting budget alerts at 90%, 100%, and 110% thresholds helps catch potential issues before they escalate.
For small and medium-sized businesses (SMEs) with teams of 15–40 people, Azure’s complexity can be overwhelming. From choosing the right VM families (like B-series, D-series, or E-series) to implementing autoscaling and governance policies, managing it all internally can stretch resources thin. If you need expert guidance on digital transformation and cloud optimization, Growth Shuttle offers tailored advisory services for SMEs. Starting at $600/month, their experienced advisors can help establish efficient processes, improve operations, and design scalable cloud strategies that align with your business objectives.
FAQs
How do I choose between Spot VMs, Reserved Instances, and Savings Plans?
Choosing between Spot VMs, Reserved Instances, and Savings Plans comes down to your workload’s predictability, your budget, and how much flexibility you need.
- Reserved Instances are ideal for steady, predictable workloads. They require a 1- or 3-year commitment but can save you up to 72% compared to pay-as-you-go pricing.
- Savings Plans offer more flexibility. They work across different VM types and regions, providing savings of up to 65%, without locking you into specific instances.
- Spot VMs are perfect for tasks that can handle interruptions. They come with massive discounts – up to 90% – but there’s always a chance your VM could be interrupted based on demand.
Each option has its strengths, so the right choice depends on how stable your workload is and how much risk you’re willing to take for savings.
What’s the safest way to right-size VMs without causing downtime?
The safest approach depends on whether the new size is available on the VM’s current hardware cluster. Resizing a running Azure Virtual Machine (VM) always triggers a restart, which can disrupt stateful workloads or in-flight processes; and if the target size isn’t supported on the same cluster, you’ll need to deallocate the VM first so Azure can move it to compatible hardware.
By deallocating the VM, you ensure the resizing process happens smoothly, without unexpected interruptions. However, if deallocation isn’t possible, you’ll need to plan for a restart. To minimize disruptions, schedule the resizing during a maintenance window when workloads are less active. This approach keeps your operations running as smoothly as possible.
Which Azure Monitor metrics should I track to catch waste early?
To spot inefficiencies early, keep an eye on critical Azure VM metrics such as CPU usage, RAM usage, disk space, network traffic (in/out), and inbound/outbound flows. These metrics can reveal areas where resources are being underused or overused. By monitoring them regularly, you can avoid wasted spending and ensure your VMs are running smoothly.