
Monitor a Linux Azure Virtual Machine using Terraform

Written by Freddy Ayala · 13 min read

Azure Monitor helps you maximize the availability and performance of your applications and services. It delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. This information helps you understand how your applications are performing and proactively identify issues affecting them and the resources they depend on.

All data collected by Azure Monitor fits into one of two fundamental types: metrics and logs. Metrics are numerical values that describe some aspect of a system at a particular point in time. They are lightweight and capable of supporting near-real-time scenarios.

Azure provides some out-of-the-box metrics for VMs that we can use to monitor our resources, but in order to monitor guest-level metrics such as free disk space we need to configure performance counters (https://docs.microsoft.com/en-us/azure/azure-monitor/agents/data-sources-performance-counters).

In this article we will see how to monitor CPU, memory, and disk space.

Requirements:

  • A virtual machine
  • A Log Analytics workspace
  • The virtual machine connected to the Log Analytics workspace (https://faun.pub/hook-your-azure-vm-into-log-analytics-with-the-mma-agent-vm-extension-using-terraform-ca438d7e07dc)
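
The Terraform snippets later in this article reference the existing workspace and VM through data sources. A minimal sketch of those declarations, assuming the names `logs` and `vm-test` and a `var.vm_name` variable (all assumptions, adjust to your setup):

```hcl
# Look up the existing Log Analytics workspace
# (the naming convention here is an assumption).
data "azurerm_log_analytics_workspace" "logs" {
  name                = "${var.log_name}-${var.region}-${var.environment}"
  resource_group_name = var.resource_group
}

# Look up the virtual machine we want to monitor
# (var.vm_name is a hypothetical variable holding the VM name).
data "azurerm_virtual_machine" "vm-test" {
  name                = var.vm_name
  resource_group_name = var.resource_group
}
```

Without these data sources declared in the root module, the `data.azurerm_log_analytics_workspace.logs.id` and `data.azurerm_virtual_machine.vm-test.id` references used below will fail at plan time.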

Unfortunately, for the moment we cannot configure performance counters using Terraform, so we are forced to create an ARM template file to do so (which will then be deployed by Terraform):

{
    "$schema": "https://schema.management.azure.com/schemas/2019-08-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "type": "string",
            "metadata": {
              "description": "Name of the workspace."
            }
        },
        "location": {
          "type": "string",
          "metadata": {
            "description": "Specifies the location in which to create the workspace."
          }
        }
    },
    "resources": [
    {
        "apiVersion": "2020-08-01",
        "type": "Microsoft.OperationalInsights/workspaces",
        "name": "[parameters('workspaceName')]",
        "location": "[parameters('location')]",
        "resources": [
            {
                "apiVersion": "2020-08-01",
                "type": "datasources",
                "name": "LinuxPerformanceLogicalDisk",
                "dependsOn": [
                    "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
                ],
                "kind": "LinuxPerformanceObject",
                "properties": {
                    "objectName": "Logical Disk",
                    "instanceName": "*",
                    "intervalSeconds": 60,
                    "performanceCounters": [                        
                        {
                            "counterName": "% Used Inodes"
                        },
                        {
                            "counterName": "Free Megabytes"
                        },
                        {
                            "counterName": "% Used Space"
                        },
                        {
                            "counterName": "Disk Transfers/sec"
                        },
                        {
                            "counterName": "Disk Reads/sec"
                        },
                        {
                            "counterName": "Disk Writes/sec"
                        }
                    ]
                }
            },
            {
                "apiVersion": "2020-08-01",
                "type": "datasources",
                "name": "LinuxPerformanceProcessor",
                "dependsOn": [
                    "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
                ],
                "kind": "LinuxPerformanceObject",
                "properties": {
                    "objectName": "Processor",
                    "instanceName": "*",
                    "intervalSeconds": 60,
                    "performanceCounters": [
                        {
                            "counterName": "% Processor Time"
                        },
                        {
                            "counterName": "% Privileged Time"
                        }
                    ]
                }
            },
            {
                "apiVersion": "2020-08-01",
                "type": "datasources",
                "name": "LinuxPerformanceMemory",
                "dependsOn": [
                    "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
                ],
                "kind": "LinuxPerformanceObject",
                "properties": {
                    "objectName": "Memory",
                    "instanceName": "*",
                    "intervalSeconds": 60,
                    "performanceCounters": [
                        {
                            "counterName": "% Used Memory"
                        },
                        {
                            "counterName": "% Available Memory"
                        }
                    ]
                }
            },
            {
                "apiVersion": "2020-08-01",
                "type": "datasources",
                "name": "DataSource_LinuxPerformanceCollection",
                "dependsOn": [
                    "[concat('Microsoft.OperationalInsights/workspaces/', parameters('workspaceName'))]"
                ],
                "kind": "LinuxPerformanceCollection",
                "properties": {
                    "state": "Enabled"
                }
            }
        ]
      }
    ]
}

Create a new file named monitoring.tf.

First we will deploy our Log Analytics performance counters ARM template:


resource "random_string" "unique" {
  length  = 8
  special = false
  upper   = false
}

resource "azurerm_template_deployment" "deploy_log_analyitics_linux_performance_counters" {
  name                = "linux-perf-counter-${random_string.unique.result}"
  resource_group_name = var.resource_group
  template_body       = file("${path.module}/arm/PerformanceCountersLogAnalytics.json")

  parameters = {
    "workspaceName"   = "${var.log_name}-${var.region}-${var.environment}"
    "location"        = var.location_log_analytics
  }

  deployment_mode = "Incremental"
}
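
The snippets in this article rely on a handful of input variables. A minimal variables.tf sketch (the names follow the usage above; descriptions and types are assumptions):

```hcl
variable "resource_group" {
  description = "Name of the resource group containing the VM and workspace."
  type        = string
}

variable "log_name" {
  description = "Base name of the Log Analytics workspace."
  type        = string
}

variable "region" {
  description = "Short region code used in resource names."
  type        = string
}

variable "environment" {
  description = "Environment suffix, e.g. 'dev' or 'prod'."
  type        = string
}

variable "location_log_analytics" {
  description = "Azure location of the Log Analytics workspace."
  type        = string
}

variable "admin_email" {
  description = "Map of receiver name to email address for alert notifications."
  type        = map(string)
}
```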

Then we will create an action group to notify our users by email when an alert fires:

resource "azurerm_monitor_action_group" "action_group_alert" {
  name                = "action-group-test-alert-prod"
  resource_group_name = var.resource_group
  short_name          = "ag-botprod"

  dynamic "email_receiver" {
    for_each = var.admin_email

    content {
      name          = "sendto-${email_receiver.key}"
      email_address = email_receiver.value
    }
  }

  arm_role_receiver {
    name                    = "sentorolemonitoringreader"
    role_id                 = "43d0d8ad-25c7-4714-9337-8ba259a9fe05"
    use_common_alert_schema = true
  }

  arm_role_receiver {
    name                    = "sentorolemonitoringcontributor"
    role_id                 = "749f88d5-cbae-40b8-bcfc-e573ddc772fa"
    use_common_alert_schema = true
  }
}
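
The dynamic block above expands one email_receiver per entry in the `admin_email` map. For example, a terraform.tfvars entry such as (addresses are placeholders):

```hcl
# Each key becomes a receiver named "sendto-<key>".
admin_email = {
  ops  = "ops@example.com"
  devs = "devs@example.com"
}
```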

To monitor disk space we will create a scheduled query rules alert with a Log Analytics query:

resource "azurerm_monitor_scheduled_query_rules_alert" "monitor_disk_space" {
  name                = "monitor-disk-test-${var.environment}"
  location            = var.location_log_analytics
  resource_group_name = var.resource_group
  
  action {
    action_group           = [ azurerm_monitor_action_group.action_group_alert.id ]
    email_subject          = "Used Disk Space Over 85%"
  }

  data_source_id  = data.azurerm_log_analytics_workspace.logs.id
  description     = "Alert to monitor free disk space"
  enabled         = true
  query           = <<-QUERY
  Perf
  | where TimeGenerated > ago(5m)
  | where (ObjectName == "Logical Disk" or ObjectName == "LogicalDisk") and CounterName contains "% Used Space" and InstanceName != "_Total" and InstanceName != "HarddiskVolume1" and CounterValue >= 85
  | project TimeGenerated, Computer, ObjectName, CounterName, InstanceName, CounterValue
  QUERY
  severity        = 1
  frequency       = 5
  time_window     = 5
  
  trigger {
    operator  = "GreaterThan"
    threshold = 0
  }
}

Same thing for the CPU, this time using a metric alert, since Percentage CPU is available as a platform metric:

resource "azurerm_monitor_metric_alert" "cpu" {
  name                = "monitor-cpu-test-${var.environment}"
  resource_group_name = var.resource_group
  scopes              = [data.azurerm_virtual_machine.vm-test.id]
  description         = "Action will be triggered when average CPU is greater than 85%"
  severity            = 1

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 85
  }

  action {
    action_group_id = azurerm_monitor_action_group.action_group_alert.id
  }
}

And finally the memory:

resource "azurerm_monitor_scheduled_query_rules_alert" "monitor_memory" {
  name                = "monitor-memory-test-${var.environment}"
  location            = var.location_log_analytics
  resource_group_name = var.resource_group
  
  action {
    action_group           = [ azurerm_monitor_action_group.action_group_alert.id ]
    email_subject          = "Memory Over 80%"
  }

  data_source_id  = data.azurerm_log_analytics_workspace.logs.id
  description     = "Alert to monitor memory used"
  enabled         = true
  query           = <<-QUERY
  Perf
  | where TimeGenerated > ago(5m)
  | where CounterName contains "% Used Memory" and InstanceName != "_Total" and CounterValue >= 80
  | project TimeGenerated, Computer, ObjectName, CounterName, InstanceName, CounterValue
  QUERY
  severity        = 1
  frequency       = 5
  time_window     = 5
  
  trigger {
    operator  = "GreaterThan"
    threshold = 0
  }
}

And there you go: you will receive an email if your virtual machine has low disk space or high CPU/RAM usage.

Happy terraforming!

2 Replies to “Monitor a Linux Azure Virtual Machine using Terraform”

  1. Hi Freddy,

    Thanks for this wonderful piece, I’m still new to Terraform. Using your steps for the monitoring of VMs, I’m getting an error that the data sources referenced by “data_source_id = data.azurerm_log_analytics_workspace.logs.id” and “scopes = [data.azurerm_virtual_machine.backupvm.id]” have not been declared in the root module. How can I get this done to clear the error?

  2. Hi,

    I have a question: before all this, does the VM need to be connected to the Log Analytics workspace?
