Azure, Cloud, IaC

Deploy a Kubernetes Cluster using 3 different IaC technologies (Terraform, ARM Templates and Pulumi)


Written by Freddy Ayala · 6 min read

Infrastructure as code (IaC) is the process of managing and provisioning cloud resources through machine-readable definition files rather than through physical hardware configuration or interactive configuration tools. Each cloud provider has its own flavor of IaC, which in most cases is really "infrastructure as configuration": technology that lacks basic "code" constructs such as conditionals and loops.

Admittedly, technologies such as Terraform and ARM templates have implemented their own ways of doing these sorts of things, with count and ternary operators in Terraform and copy blocks in ARM templates.
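For instance, combining count with a ternary expression lets Terraform approximate a conditional resource. A quick sketch (the environment variable and the extra node pool here are hypothetical, not part of the deployment below):

```hcl
# Hypothetical sketch: create an extra AKS node pool only in "prod",
# by driving count with a ternary (conditional) expression.
resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  count                 = var.environment == "prod" ? 1 : 0
  name                  = "extrapool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
}
```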

But what about using "real" code to define and deploy your infrastructure? There are new players in the market, such as Pulumi, that offer the possibility of using programming languages such as C# and TypeScript (I refuse to call JavaScript a proper programming language) to deploy resources to most cloud providers.

AWS has started to develop its own solution, the Cloud Development Kit (CDK), which I'm eager to test once I confirm that it is less buggy than CloudFormation.

As of now the market leader in IaC is Terraform: you can use it to deploy resources to almost every cloud provider, as well as to other platforms that are not necessarily cloud. It has become the de facto standard for IaC, and the new version 0.13 brings even more features.

However, today we will test 3 of these technologies by deploying the same kind of resource, an Azure Kubernetes Service (AKS) cluster, to compare the different solutions and weigh their advantages and disadvantages.

Terraform

With Terraform we will use a local state and define some input variables in terraform.tfvars. Our cluster will be defined in the following files:

main.tf

terraform {
  required_version = ">= 0.12.6"
  backend "local" {
    path = "./terraform.tfstate"
  }
}

provider "azurerm" {
  version = "~> 2.5.0"
  features {}
}

resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-aks"
  location            = var.location
  resource_group_name = var.resource_group
  address_space       = var.node_address_space
}


resource "azurerm_kubernetes_cluster" "k8s" {
  name                = var.cluster_name
  location            = var.location
  resource_group_name = var.resource_group
  dns_prefix          = var.dns_prefix

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "Standard"
  }

  role_based_access_control {
    enabled = true
  }

  default_node_pool {
    name            = "agentpool"
    node_count      = var.agent_count
    vm_size         = var.vm_size
    os_disk_size_gb = var.os_disk_size_gb
  }

  identity {
    type = "SystemAssigned"
  }
}

variables.tf

variable "vm_size" {}

variable "os_disk_size_gb" {
  default = 100
}

variable "resource_group" {
  description = "(Required) Contains the name of the resource group"
}

variable "agent_count" {
  default = 3
}

variable "ssh_public_key_data" {}

variable "dns_prefix" {
  default = "k8stest"
}

variable "cluster_name" {
  default = "k8stest"
}

variable "location" {
  description = "Azure region to create the resources"
  type        = string
}

variable "node_address_space" {}

# Declared so that the values set in terraform.tfvars are not
# flagged as undeclared variables.
variable "private_cluster_enabled" {
  default = false
}

variable "node_address_prefix" {}

variable "node_resource_group" {}

terraform.tfvars

# Basic settings
location       = "francecentral"
resource_group = "rg-frce-aks-test-01"
# Uses a system-assigned managed identity for the AKS cluster
# Uses a custom VNET in order to automate the peering between the spoke and the hub

agent_count             = 3
os_disk_size_gb         = 100
vm_size                 = "Standard_DS2_v2"
dns_prefix              = "aks-test-itg"
cluster_name            = "aks-test-itg"
private_cluster_enabled = false
node_address_space      = ["11.1.0.0/16"]
node_address_prefix     = "11.1.0.0/24"
node_resource_group     = "rg-frce-aks-nodes-test-itg-01"


So far the code is simple enough, and Terraform gives us the possibility to customise our cluster to our liking.

Deployment

The deployment is easy enough: just use the classic Terraform commands and we are good to go.
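A minimal sequence, assuming the Azure CLI session is already authenticated with az login (the resource group and cluster names match terraform.tfvars above):

```shell
# Standard Terraform workflow: initialise, preview, apply.
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Once the cluster is up, fetch its credentials and check the nodes.
az aks get-credentials --resource-group rg-frce-aks-test-01 --name aks-test-itg
kubectl get nodes
```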

Pulumi

Program.cs

// Copyright 2016-2020, Pulumi Corporation.  All rights reserved.

using System.Threading.Tasks;
using Pulumi;

class Program
{
    static Task<int> Main() => Deployment.RunAsync<AksStack>();
}

With Pulumi things start to get interesting. First of all, the code is C#, and we define our deployment inside the constructor of a class that inherits from the Stack base class.

AksStack.cs

// Copyright 2016-2020, Pulumi Corporation.  All rights reserved.

using Pulumi;
using Pulumi.AzureAD;
using Pulumi.Azure.ContainerService;
using Pulumi.Azure.ContainerService.Inputs;
using Pulumi.Azure.Core;
using Pulumi.Azure.Network;
using Pulumi.Azure.Authorization;
using Pulumi.Random;
using Pulumi.Tls;

class AksStack : Stack
{
    public AksStack()
    {
        var config = new Pulumi.Config();
        var kubernetesVersion = config.Get("kubernetesVersion") ?? "1.16.9";

        var resourceGroup = new ResourceGroup("rg-aks-pulumi");

        var password = new RandomPassword("password", new RandomPasswordArgs
        {
            Length = 20,
            Special = true,
        }).Result;

        var sshPublicKey = new PrivateKey("ssh-key", new PrivateKeyArgs
        {
            Algorithm = "RSA",
            RsaBits = 4096,
        }).PublicKeyOpenssh;

        // Create the AD service principal for the K8s cluster.
        var adApp = new Application("aks");
        var adSp = new ServicePrincipal("aksSp", new ServicePrincipalArgs {ApplicationId = adApp.ApplicationId});
        var adSpPassword = new ServicePrincipalPassword("aksSpPassword", new ServicePrincipalPasswordArgs
        {
            ServicePrincipalId = adSp.Id,
            Value = password,
            EndDate = "2099-01-01T00:00:00Z",
        });

        // Grant networking permissions to the SP (needed e.g. to provision Load Balancers)
        var assignment = new Assignment("role-assignment", new AssignmentArgs
        {
            PrincipalId = adSp.Id,
            Scope = resourceGroup.Id,
            RoleDefinitionName = "Network Contributor"
        });

        // Create a Virtual Network for the cluster
        var vnet = new VirtualNetwork("vnet", new VirtualNetworkArgs
        {
            ResourceGroupName = resourceGroup.Name,
            AddressSpaces = {"10.2.0.0/16"},
        });

        // Create a Subnet for the cluster
        var subnet = new Subnet("subnet", new SubnetArgs
        {
            ResourceGroupName = resourceGroup.Name,
            VirtualNetworkName = vnet.Name,
            AddressPrefix = "10.2.1.0/24",
        });

        // Now allocate an AKS cluster.
        var cluster = new KubernetesCluster("aksCluster", new KubernetesClusterArgs
        {
            ResourceGroupName = resourceGroup.Name,
            DefaultNodePool = new KubernetesClusterDefaultNodePoolArgs
            {
                Name = "aksagentpool",
                NodeCount = 3,
                VmSize = "Standard_B2s",
                OsDiskSizeGb = 30,
                VnetSubnetId = subnet.Id,
            },
            DnsPrefix = "sampleaks",
            LinuxProfile = new KubernetesClusterLinuxProfileArgs
            {
                AdminUsername = "aksuser",
                SshKey = new KubernetesClusterLinuxProfileSshKeyArgs
                {
                    KeyData = sshPublicKey,
                },
            },
            ServicePrincipal = new KubernetesClusterServicePrincipalArgs
            {
                ClientId = adApp.ApplicationId,
                ClientSecret = adSpPassword.Value,
            },
            KubernetesVersion = kubernetesVersion,
            RoleBasedAccessControl = new KubernetesClusterRoleBasedAccessControlArgs {Enabled = true},
            NetworkProfile = new KubernetesClusterNetworkProfileArgs
            {
                NetworkPlugin = "azure",
                DnsServiceIp = "10.2.2.254",
                ServiceCidr = "10.2.2.0/24",
                DockerBridgeCidr = "172.17.0.1/16",
            },
        });

        this.KubeConfig = cluster.KubeConfigRaw;
    }

    [Output] public Output<string> KubeConfig { get; set; }
}

Some parameters are hardcoded, but we could read them from the stack configuration (as we already do for kubernetesVersion) or from an appsettings.json file.

Pulumi.yaml

name: azure-cs-aks
description: Creates an Azure Kubernetes Service (AKS) cluster
runtime: dotnet
template:
  config:
    azure:location:
      description: The Azure location to use
      default: westeurope
    kubernetesVersion:
      description: The Kubernetes version to deploy
      default: 1.16.9

Deployment

For the deployment, an account and an access token are required from the site https://app.pulumi.com/

Then we need to install Pulumi and initialise the stack:

curl -fsSL https://get.pulumi.com | sh
pulumi stack init
pulumi config set azure:location francecentral

Then we have pulumi up, the equivalent of terraform apply, which will compile the C# code and then deploy the resources:

pulumi up
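Once the deployment finishes, the KubeConfig stack output exported by AksStack can be used to talk to the cluster. A sketch (the file name is arbitrary):

```shell
# Write the stack's kubeconfig output to a file and point kubectl at it.
pulumi stack output KubeConfig > kubeconfig.yaml
KUBECONFIG=./kubeconfig.yaml kubectl get nodes
```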

ARM Templates

For the ARM templates we have to use two files, deploy.json and parameters.json, where we define the resources and their parameters.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.1",
    "parameters": {
        "clusterName": {
            "type": "string",
            "defaultValue":"aks101cluster",
            "metadata": {
                "description": "The name of the Managed Cluster resource."
            }
        },
        "location": {
            "type": "string",
            "defaultValue": "[resourceGroup().location]",
            "metadata": {
                "description": "The location of the Managed Cluster resource."
            }
        },
        "dnsPrefix": {
            "type": "string",
            "metadata": {
                "description": "Optional DNS prefix to use with hosted Kubernetes API server FQDN."
            }
        },
        "osDiskSizeGB": {
            "type": "int",
            "defaultValue": 0,
            "metadata": {
                "description": "Disk size (in GB) to provision for each of the agent pool nodes. This value ranges from 0 to 1023. Specifying 0 will apply the default disk size for that agentVMSize."
            },
            "minValue": 0,
            "maxValue": 1023
        },
        "agentCount": {
            "type": "int",
            "defaultValue": 3,
            "metadata": {
                "description": "The number of nodes for the cluster."
            },
            "minValue": 1,
            "maxValue": 50
        },
        "agentVMSize": {
            "type": "string",
            "defaultValue": "Standard_DS2_v2",
            "metadata": {
                "description": "The size of the Virtual Machine."
            }
        },
        "linuxAdminUsername": {
            "type": "string",
            "metadata": {
                "description": "User name for the Linux Virtual Machines."
            }
        },
        "sshRSAPublicKey": {
            "type": "string",
            "metadata": {
                "description": "Configure all linux machines with the SSH RSA public key string. Your key should include three parts, for example 'ssh-rsa AAAAB...snip...UcyupgH azureuser@linuxvm'"
            }
        },
        "servicePrincipalClientId": {
            "metadata": {
                "description": "Client ID (used by cloudprovider)"
            },
            "type": "securestring"
        },
        "servicePrincipalClientSecret": {
            "metadata": {
                "description": "The Service Principal Client Secret."
            },
            "type": "securestring"
        },
        "osType": {
            "type": "string",
            "defaultValue": "Linux",
            "allowedValues": [
                "Linux"
            ],
            "metadata": {
                "description": "The type of operating system."
            }
        }        
    },
    "resources": [
        {
            "apiVersion": "2020-03-01",
            "type": "Microsoft.ContainerService/managedClusters",
            "location": "[parameters('location')]",
            "name": "[parameters('clusterName')]",
            "properties": {
                "dnsPrefix": "[parameters('dnsPrefix')]",
                "agentPoolProfiles": [
                    {
                        "name": "agentpool",
                        "osDiskSizeGB": "[parameters('osDiskSizeGB')]",
                        "count": "[parameters('agentCount')]",
                        "vmSize": "[parameters('agentVMSize')]",
                        "osType": "[parameters('osType')]",
                        "storageProfile": "ManagedDisks"
                    }
                ],
                "linuxProfile": {
                    "adminUsername": "[parameters('linuxAdminUsername')]",
                    "ssh": {
                        "publicKeys": [
                            {
                                "keyData": "[parameters('sshRSAPublicKey')]"
                            }
                        ]
                    }
                },
                "servicePrincipalProfile": {
                    "clientId": "[parameters('servicePrincipalClientId')]",
                    "secret": "[parameters('servicePrincipalClientSecret')]"
                }
            }
        }
    ],
    "outputs": {
        "controlPlaneFQDN": {
            "type": "string",
            "value": "[reference(parameters('clusterName')).fqdn]"
        }
    }
}

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "clusterName": {
            "value": "GEN-UNIQUE"
        },
        "dnsPrefix": {
            "value": "GEN-UNIQUE"
        },
        "linuxAdminUsername": {
            "value": "GEN-UNIQUE"
        },
        "sshRSAPublicKey": {
            "value": "GEN-SSH-PUB-KEY"
        },
        "servicePrincipalClientId": {
            "value": "GEN-AZUREAD-AKS-APPID"
        },
        "servicePrincipalClientSecret": {
            "value": "GEN-AZUREAD-AKS-APPID-SECRET"
        }
    }
}

We can deploy them using either the Azure CLI or the Azure portal; simple enough.
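With the Azure CLI, the resource-group-scoped deployment command does the job. A sketch (the resource group name is a placeholder, and the parameter file must have its GEN-* values filled in first):

```shell
# Create a target resource group and deploy the template with its
# parameter file.
az group create --name rg-aks-arm-test --location francecentral
az deployment group create \
  --resource-group rg-aks-arm-test \
  --template-file deploy.json \
  --parameters @parameters.json
```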

Conclusions (IMHO)

  • Terraform
    • Terraform has very good support for most Azure resources; the syntax is simple and it is easy to use in general.
    • It covers most use cases and it is multi-cloud; you can consider it a Swiss Army knife: learn it once, use it everywhere.
    • Creating modules requires several files, such as variables.tf and outputs.tf, which seems a bit cumbersome.
    • In general it falls short when trying to write DRY code; it is often easier to duplicate code than to maintain modules.
    • Autocomplete is somewhat lacking with current tools.
  • Pulumi
    • A lot of potential for modularity, DRY code, and reusability; I imagine myself creating libraries with building blocks for most of my use cases.
    • For the moment it feels a bit experimental, and it does not support all Azure resources.
    • I have to investigate further the advanced automation cases that the features of languages such as C# make possible.
  • ARM Templates
    • We cannot always avoid them; ARM templates support almost all Azure resources.
    • Simple to deploy from the Azure CLI or the portal.
    • Reusability and modularity are limited: you have to create a master template that calls child templates, and those must be accessible over the internet, which is cumbersome if you have multiple templates.
    • JSON is very verbose and hard to read when you have more complex templates.

Conclusions

  • In general, the AKS cluster deployment is very similar across the three technologies because they share a similar naming convention and, obviously, the same parameters; we can customise every aspect of the cluster deployment, such as the AKS VNET and subnet, so no surprises here.
  • Pulumi code is easier to write thanks to autocomplete, and it is very refreshing to use a "real" programming language for IaC; in the example we used C#, but we could have used Python, TypeScript, or Go.
  • ARM templates are very verbose and sometimes hard to read, but they tend to work fine.

So in my opinion, for the moment, Terraform is king; Pulumi is a very interesting contender that we are going to follow closely.

AWS has upped its game with CDK, and Pulumi has joined the fray, so I wonder: Microsoft, what's your next move?
