Azure Kubernetes Service for $1/day

If you need a development cluster to learn Kubernetes or to run a small, non-production workload, Azure Kubernetes Service (AKS) is an appealing option. Microsoft manages the platform, the control plane is free, and you get all of the enterprise features.

Unfortunately, it's expensive. However, with some elbow grease, it can run for about $1/day. Let me show you how.

Design Goals

My requirement is to host some small workloads. Performance is not important. I had 3 design goals:

  1. Keep the costs low
  2. Run 24/7, with no evictions
  3. Use Linux nodes

Approach

Inspired by @staff0rd and @tr_stringer, I focused on the 7 areas with the greatest impact on AKS cost: 5 are build options, and 2 are additional configuration.

Build options

Use your provisioner of choice to create a new AKS cluster with the following properties.

  1. Node Count = 1. AKS hosts the control plane, so this single node is dedicated to running workloads (alongside the AKS system pods).
  2. Node Size = Standard_B2s. This is the smallest VM size available in my region that meets the node requirements of 2 vCPUs and 4 GB of RAM.
  3. Node type = AvailabilitySet. The default (VirtualMachineScaleSets) doesn't give us access to the disk configuration of the underlying VMs, which we'll need later.
  4. OS Disk Size = 32 GB. This is the minimum supported by the node.
  5. Load Balancer = basic. This one is important: the default is standard, which is billed while basic is free, and the SKU can't be changed without rebuilding the cluster.
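If you provision with the Azure CLI instead, the five build options map onto `az aks create` flags roughly as follows. This is a sketch: the helper name `create_cheap_aks` and its arguments are hypothetical, and depending on your CLI version and region some of these options (AvailabilitySet node pools, the Basic load balancer) may no longer be accepted.

```bash
#!/bin/bash
# Sketch: create a single-node AKS cluster with the five cost-saving build options.
# create_cheap_aks is a hypothetical helper; pass your own resource group and cluster name.

create_cheap_aks() {
    local resource_group="$1"
    local cluster_name="$2"

    # 1 node, Standard_B2s, AvailabilitySet, 32 GB OS disk, Basic load balancer
    az aks create \
        --resource-group "$resource_group" \
        --name "$cluster_name" \
        --node-count 1 \
        --node-vm-size Standard_B2s \
        --vm-set-type AvailabilitySet \
        --node-osdisk-size 32 \
        --load-balancer-sku basic \
        --generate-ssh-keys
}
```

After sourcing the file, `create_cheap_aks my-rg k8s-cluster` creates the cluster in an existing resource group.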

Here is a Bicep file with these settings. Note that I also set the nodeResourceGroup name so that it appears alongside the primary resource group in the portal; this is optional.

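// Minimal, cost-tuned AKS cluster; comments note which build option each setting covers
resource aksCluster 'Microsoft.ContainerService/managedClusters@2021-05-01' = {
  name: 'k8s-cluster'
  location: resourceGroup().location
  properties: {
    // 1.21.2 was current when this was written; list supported versions with `az aks get-versions --location <region>`
    kubernetesVersion: '1.21.2'
    dnsPrefix: 'k8s-cluster'
    // Optional: keeps the node resource group next to the primary RG in the portal
    nodeResourceGroup: '${resourceGroup().name}-aksrg'
    agentPoolProfiles: [
      {
        name: 'nodepool'
        osDiskSizeGB: 32          // smallest OS disk
        count: 1                  // single node
        vmSize: 'Standard_B2s'    // 2 vCPU / 4 GB burstable VM
        osType: 'Linux'
        type: 'AvailabilitySet'   // exposes the node VM's managed disk for later changes
        mode: 'System'
      }
    ]
    networkProfile: {
      loadBalancerSku: 'basic'    // cannot be changed after the cluster is created
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}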
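To deploy the Bicep file and point `kubectl` at the new cluster, something like the following should work. It assumes the file is saved as `aks.bicep` (a hypothetical name), that the resource group already exists, and that your Azure CLI is recent enough to accept `.bicep` files directly; `deploy_cluster` is likewise a hypothetical helper.

```bash
#!/bin/bash
# Sketch: deploy the Bicep file into an existing resource group, then fetch kubeconfig.
# deploy_cluster and the default file name aks.bicep are hypothetical.

deploy_cluster() {
    local resource_group="$1"
    local template_file="${2:-aks.bicep}"

    # Stop early if the deployment fails, so we don't fetch credentials for a missing cluster
    az deployment group create \
        --resource-group "$resource_group" \
        --template-file "$template_file" || return

    # 'k8s-cluster' is the cluster name hard-coded in the Bicep file
    az aks get-credentials --resource-group "$resource_group" --name k8s-cluster
}
```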

Additional Configuration

Our configuration so far is a good start, but the node is still billed at Pay-As-You-Go rates and runs on a Premium SSD. Let's make some tweaks.

Reserved Instances

Azure Reserved Instances are advance commitments to use a specific VM SKU for a one- or three-year term. In return for the commitment, the price is lower (though it can still be billed monthly). In my case, I expect to run this cluster 24/7 indefinitely, so I can take advantage of the discount.

(Image: Azure Reservation)

Make sure the Product name matches the SKU of the cluster VM (in my case, Standard_B2s), and Azure applies the discount automatically.
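Before buying, it may be worth confirming which VM size the cluster's node pools actually use. The helper below (hypothetical name `matches_reservation`) asks `az aks nodepool list` for each pool's `vmSize` and reports whether all of them match the size you intend to reserve.

```bash
#!/bin/bash
# Sketch: check that every node pool uses the VM size you plan to reserve.
# matches_reservation is a hypothetical helper, not part of the Azure CLI.

matches_reservation() {
    local resource_group="$1"
    local cluster_name="$2"
    local reserved_size="$3"
    local sizes size

    # Declared separately so a failed az call is not masked by 'local'
    sizes=$(az aks nodepool list \
        --resource-group "$resource_group" \
        --cluster-name "$cluster_name" \
        --query "[].vmSize" --output tsv) || return 1

    if [ -z "$sizes" ]; then
        echo "No node pools found"
        return 1
    fi

    for size in $sizes; do
        if [ "$size" != "$reserved_size" ]; then
            echo "Node pool size $size does not match reserved size $reserved_size"
            return 1
        fi
    done

    echo "All node pools use $reserved_size"
}
```

For example, `matches_reservation my-rg k8s-cluster Standard_B2s` succeeds only if every pool runs on Standard_B2s.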

Standard SSD

Finally, let's look at the disk. When creating a typical VM, you can choose the disk type (Standard or Premium), with the default derived from the selected VM size. AKS offers no such option, so we end up with (performant but expensive) Premium Managed SSDs. We need a workaround.
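To see which node disks are still on Premium storage, a JMESPath filter on `az disk list` is enough. This sketch assumes you pass the node resource group (the one named by nodeResourceGroup) and that the disks report the `Premium_LRS` SKU; `list_premium_disks` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: list managed disks in the node resource group that still use Premium_LRS.
# list_premium_disks is a hypothetical helper; pass the node resource group name.

list_premium_disks() {
    local node_resource_group="$1"

    az disk list \
        --resource-group "$node_resource_group" \
        --query "[?sku.name=='Premium_LRS'].name" \
        --output tsv
}
```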

When we created the cluster, we chose AvailabilitySet, which lets us change the disks attached to the node VMs. Simply deallocate the VM, change the disk type, and start it again. No data is lost, and Kubernetes returns to a healthy state on its own. With a single node, expect your workloads to be unavailable while the VM is deallocated.
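Once the VM is running again, you can block until Kubernetes reports the node as healthy. This sketch uses `kubectl wait` on the node `Ready` condition; the helper name `wait_for_nodes_ready` and the five-minute default timeout are assumptions.

```bash
#!/bin/bash
# Sketch: wait for every node to report Ready after the VM restarts, then show pod status.
# wait_for_nodes_ready and its 300s default timeout are hypothetical choices.

wait_for_nodes_ready() {
    local wait_timeout="${1:-300s}"

    kubectl wait --for=condition=Ready node --all --timeout="$wait_timeout" || return 1

    # Check that system and workload pods are running again
    kubectl get pods --all-namespaces
}
```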

Once complete, the disk configuration will look like this:

(Image: AKS nodepool disk configuration)

If automation is your jam, here's a simple bash script you can run with the Azure CLI. Pass it the node resource group (where the cluster's VMs and availability set reside), and it switches each node's OS disk to StandardSSD_LRS.

Many thanks to this comment by @jdudleyie for the inspiration.

#!/bin/bash

# Downgrade the OS disks of the AKS node VMs to StandardSSD_LRS to save costs.
# Usage: ./downgrade-disks.sh <node-resource-group>

# Exit when any command fails
set -e

resourceGroupName="${1:?Usage: $0 <node-resource-group>}"
sku=StandardSSD_LRS

vmIds=$(az vm list --resource-group "$resourceGroupName" --query "[].id" --output tsv)
echo "$vmIds"

for vmId in $vmIds
do
    echo "---"
    echo "Checking VM $vmId"

    vmName=$(az vm show --ids "$vmId" --query "name" --output tsv)
    echo "VM name $vmName"

    vmDiskId=$(az vm show --ids "$vmId" --query "storageProfile.osDisk.managedDisk.id" --output tsv)
    echo "VM disk $vmDiskId"

    vmDiskSku=$(az disk show --ids "$vmDiskId" --query "sku.name" --output tsv)
    echo "VM disk SKU $vmDiskSku"

    if [ "$vmDiskSku" = "$sku" ]
    then
        echo "VM disk SKU doesn't need to be changed"
        continue
    fi

    # The disk SKU can only be changed while the VM is deallocated
    echo "Deallocate VM"
    az vm deallocate --ids "$vmId"

    # Removing creationData.galleryImageReference is part of the workaround this script is based on
    echo "Updating VM disk SKU"
    az disk update --ids "$vmDiskId" --sku "$sku" --remove creationData.galleryImageReference

    echo "Start VM"
    az vm start --ids "$vmId"
done

echo "---"
echo "Done"
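The script needs the node resource group, not the cluster's own resource group. If you don't remember its name, `az aks show` can look it up. In this sketch the helper name `node_resource_group` and the script file name `downgrade-disks.sh` are hypothetical.

```bash
#!/bin/bash
# Sketch: look up a cluster's node resource group so it can be passed to the disk script.
# node_resource_group and downgrade-disks.sh are hypothetical names.

node_resource_group() {
    local resource_group="$1"
    local cluster_name="$2"

    az aks show \
        --resource-group "$resource_group" \
        --name "$cluster_name" \
        --query nodeResourceGroup \
        --output tsv
}

# Example, assuming the script above is saved as downgrade-disks.sh:
#   ./downgrade-disks.sh "$(node_resource_group my-rg k8s-cluster)"
```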

Conclusion

That's it! In my experience, this configuration runs about $1/day (including the cost of the reserved instance), performs well for a few small workloads, and provides all of the capabilities we know and love about AKS.
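To compare the $1/day figure with your own bill, convert the monthly charge into an average daily cost. A small sketch, assuming a flat monthly total; the `daily_cost` helper and the sample amounts are illustrative only.

```bash
#!/bin/bash
# Sketch: convert a monthly charge to an average daily cost (12 months / 365 days).
# daily_cost is a hypothetical helper; the amount is in whatever currency you pass in.

daily_cost() {
    local monthly="$1"

    awk -v monthly="$monthly" 'BEGIN { printf "%.2f\n", monthly * 12 / 365 }'
}
```

For example, `daily_cost 30.42` prints `1.00`, and `daily_cost 36.50` prints `1.20`.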