Manage Azure Policy with Terraform

Brendan Thompson • 18 November 2022 • 17 min read

Today we are going to talk about managing Azure Policy using Terraform. Azure Policy has a few components: the Policy Definition, the Policy Definition Set (also known as an Initiative), the policy assignment, the policy exemption and policy remediation. We will talk about all of these except for policy remediation.

I am going to reverse the order in which I would normally explain a concept: in this blog we will look at the call to the module first and then dive into each of the components.

The Module Call

Below we have a call to our azure-policy-initiative module, which will create policies, assignments and exemptions for us, driven by an initiative definition. As can be seen, we are passing in three main things: assignment, exemptions, and the initiative_definition.

module "global_core" {
  source = "./modules/azure-policy-initiative"

  assignment = {
    assignments = [{
      id   = data.azurerm_resource_group.this.id
      name = "DefaultRG"
    }]
    scope = "rg"
  }

  exemptions = [{
    assignment_reference = "DefaultRG"
    category             = "Mitigated"
    id                   = data.azurerm_resource_group.this.id
    risk_id              = "R-001"
    scope                = "rg"
  }]

  environment           = "dev"
  initiative_definition = format("%s/initiatives/core.yaml", path.module)
}

The Initiative Definition

Let's step into the initiative definition next.

name: 'global-core-initiative'
display_name: Global Core Initiative
description: Core initiative scoped to the global level
policies:
  AllowedLocations:
    type: 'Custom'
    file: allowed_locations.json
    default:
      parameters:
        listOfAllowedLocations:
          value:
            - australiaeast
      effect: audit
    dev:
      parameters:
        listOfAllowedLocations:
          value:
            - australiaeast
            - westus
      effect: deny
  CostCentreTag:
    type: 'BuiltIn'
    id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
    default:
      parameters:
        tagName:
          value: 'CostCentre'
        tagValue:
          value: 'abc-123'
  OwnerTag:
    type: 'BuiltIn'
    id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
    default:
      parameters:
        tagName:
          value: 'Owner'
        tagValue:
          value: 'BLT'

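There are four top-level keys that we have to set: name, display_name, description, and policies. The first three describe the initiative itself, and policies is a map of the policies that the initiative will contain.

First, let's take a look at the policy properties. Each policy has a type (either Custom or BuiltIn), then a file for a Custom policy or an id for a BuiltIn policy, and finally a default block plus any number of environment-specific blocks such as dev.

Inside the default or environment block we have the following few properties: parameters, which holds the parameter values to pass to the policy, and effect, which sets the effect of a Custom policy.

With these properties we will be able to pass all of the relevant values into our policies for a range of environments. Further, we can also set a default if we don't have values specific to an environment. Let's take a look at the two types of policy definitions.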

Custom Policy Definition

AllowedLocations:
  type: 'Custom'
  file: allowed_locations.json
  default:
    parameters:
      listOfAllowedLocations:
        value:
          - australiaeast
    effect: audit
  dev:
    parameters:
      listOfAllowedLocations:
        value:
          - australiaeast
          - westus
    effect: deny

The key at the beginning, AllowedLocations, is how we will reference our policy and retrieve its components in the Terraform code. Being able to point file at a json file lets us easily create custom policies alongside the fantastic baseline policies that Microsoft already gives us. In the above example, if we were running our Terraform code in the uat environment, our code would use the properties we have defined in default as there are no environment-specific overrides for uat. Allowing this makes our Terraform more powerful: when we start with Azure Policy we don't necessarily understand what each environment requires, or all environments may require the same type of enforcement.
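The fallback from an environment block to default can be sketched in isolation. This is a minimal sketch, assuming the same try pattern that the module uses further down; the example_policy local and the output name are hypothetical and exist only for illustration.

```hcl
locals {
  # Hypothetical stand-in for one decoded policy entry from the initiative yaml.
  example_policy = {
    default = { effect = "audit" }
    dev     = { effect = "deny" }
  }

  # There is no "uat" key, so the first expression errors and try falls back to default.
  effect_uat = try(local.example_policy["uat"].effect, local.example_policy.default.effect)

  # A "dev" key exists, so the environment-specific value is used.
  effect_dev = try(local.example_policy["dev"].effect, local.example_policy.default.effect)
}

output "example_effects" {
  value = {
    uat = local.effect_uat
    dev = local.effect_dev
  }
}
```

With these values uat should resolve to audit and dev to deny, which is the behavior described for the AllowedLocations policy.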

Built-in Policy Definition

OwnerTag:
  type: 'BuiltIn'
  id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
  default:
    parameters:
      tagName:
        value: 'Owner'
      tagValue:
        value: 'BLT'

The above is how we would set our parameters for a built-in Azure policy. Remember that we cannot set the effect of this policy here as that is set by Microsoft. If you did need to alter that effect then it would be best to use a custom policy.

The Custom Policy Definition

We won't go into the mud on how to write an Azure Policy Definition. If you're interested in that, check out the Azure Policy definition structure article by Microsoft.

The main point here is that you have a json definition of the Azure Policy, either one that you have written from scratch or one that you're pulling from the Azure portal so that you're now able to change the effect. As you can see in the then block at the end of the definition below, the value for the effect key is a ${effect} template placeholder, which will be filled in by templatefile in our Terraform code later. This is how we are going to be setting the effect on a per-environment basis.

{
  "properties": {
    "displayName": "Allowed locations",
    "policyType": "Custom",
    "mode": "All",
    "description": "This policy enables you to restrict the locations your organization can specify when deploying resources. Use to enforce your geo-compliance requirements. Excludes resource groups, Microsoft.AzureActiveDirectory/b2cDirectories, and resources that use the 'global' region.",
    "metadata": {
      "version": "1.0.0",
      "category": "General"
    },
    "parameters": {
      "listOfAllowedLocations": {
        "type": "Array",
        "metadata": {
          "description": "The list of locations that can be specified when deploying resources.",
          "strongType": "location",
          "displayName": "Allowed locations"
        }
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "location",
            "notIn": "[parameters('listOfAllowedLocations')]"
          },
          {
            "field": "location",
            "notEquals": "global"
          },
          {
            "field": "type",
            "notEquals": "Microsoft.AzureActiveDirectory/b2cDirectories"
          }
        ]
      },
      "then": {
        "effect": "${effect}"
      }
    }
  }
}

The Module

Now we get into the fun stuff 🎉! Whilst going through the module I am going to split it up into some sub-sections to make it easier for us to talk through. Further, the module supports three scopes: Management Group (mg), Subscription (sub), and Resource Group (rg). I will just be referencing the resource group code below as it is almost identical to the code for the other scopes.

The Module Interface

If you've worked with me or read my articles before, you will know that I treat variables.tf as our documented API interface; think of it like an OpenAPI definition for a REST API. We will go through each variable one by one.

Our first variable is initiative_definition. This is where we pass the full path to our definition yaml file, as discussed in The Initiative Definition.

variable "initiative_definition" {
  type        = string
  description = <<DESC
    (Required) path to the initiative definition file
  DESC
}

Secondly, we need to pass in an environment. This must match whatever format you've used in the initiative definition (dev in our example), otherwise our Terraform code won't be able to retrieve the properties for that environment and will fall back to default.

variable "environment" {
  type        = string
  description = <<DESC
    (Required) environment that the initiatives should be applied to.
  DESC
}

The most important input variable for us is the assignment variable. This is where we pass in one or more resource IDs that we are going to assign the policy initiative to. Allowing a list of assignments means that we can deal with assignments on a larger scale than a single resource, which is especially powerful when operating in an enterprise environment.

We will use the name property of the assignments object as part of our exemptions process. This ensures there is an easy and intentional lookup for us when we are trying to exempt a resource from a given initiative.

We also have some validation ensuring that the scope passed in is valid for our scenario.

variable "assignment" {
  type = object({
    assignments = list(object({
      id   = string
      name = string
    }))
    scope = optional(string, "rg")
  })
  description = <<DESC
    (Required) assignment details for the policy.
    Properties:
      `assignments` (Required)    - list of assignments
        `id` (Required)   - resource ID
        `name` (Required) - friendly name/reference for the assignment
      `scope` (Optional)          - resource scope for the assignment [Default: `rg`]

  DESC

  validation {
    condition = contains(
      ["sub", "mg", "rg"],
      var.assignment.scope
    )
    error_message = "Err: invalid assignment scope."
  }
}
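To illustrate passing more than one assignment, here is a hedged sketch of a module call at subscription scope. The module name, the assignment names and the subscription IDs are hypothetical placeholders; only the shape of the assignment object comes from the variable definition above.

```hcl
module "global_core_subscriptions" {
  source = "./modules/azure-policy-initiative"

  assignment = {
    # Placeholder subscription IDs, for illustration only.
    assignments = [
      {
        id   = "/subscriptions/00000000-0000-0000-0000-000000000000"
        name = "Platform"
      },
      {
        id   = "/subscriptions/11111111-1111-1111-1111-111111111111"
        name = "Workloads"
      },
    ]
    # Must be one of "sub", "mg" or "rg" to pass the validation block.
    scope = "sub"
  }

  environment           = "dev"
  initiative_definition = format("%s/initiatives/core.yaml", path.module)
}
```

Each name here is what an exemption would later use as its assignment_reference.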

Where Azure Policy is concerned there is always going to be a requirement to exempt some resources from having a policy applied to or enforced on them. We manage that here through the exemptions variable, which allows us to pass in a list of exemption objects. Each one has an assignment_reference which, as we mentioned above, is a reference to name in the assignments object. This allows us to cleanly look up which assignment we are exempting a given resource from.

In this variable, we need to validate that our exemption scope is valid, not only for Azure but for our given scenario. For instance, Azure lets you exempt a single resource from a policy, but our module only supports the resource group as the most granular level. The second thing we are validating is that the category on the exemption is one of the two valid strings expected by Microsoft: Mitigated or Waiver.

variable "exemptions" {
  type = list(object({
    id                   = string
    risk_id              = string
    scope                = string
    category             = string
    assignment_reference = string
  }))
  description = <<DESC
    (Optional) List of exemption objects
    Properties:
      `id` (Required)                   - the resource ID for the exemption
      `risk_id` (Required)              - internal risk reference ID
      `scope` (Required)                - the scope for the exemption (sub, mg, rg)
      `category` (Required)             - exemption category
      `assignment_reference` (Required) - assignment-friendly name/reference
  DESC

  validation {
    condition = alltrue(
      [
        for exemption in var.exemptions :
        contains(
          ["sub", "mg", "rg"],
          exemption.scope
        )
      ]
    )
    error_message = "Err: invalid exemption scope."
  }

  validation {
    condition = alltrue(
      [
        for exemption in var.exemptions :
        contains(
          ["Mitigated", "Waiver"],
          exemption.category
        )
      ]
    )
    error_message = "Err: invalid exemption category."
  }

  default = []
}

Local Variables and Setup

The first piece of setup is to create some random_uuid resources that we can use for the unique names of our policies, assignments and exemptions. Some resources in the azurerm provider will auto-generate names for us, and others won't. In this instance, we are going to generate the names ourselves.

Next, we need to decode our initiative_definition yaml into a Terraform object that we can use throughout our module. The policies local variable is a convenience for us so that we can quickly access that property. Also, if the way we access the policies object/key from our yaml file changes, the code that consumes the policies doesn't need to know about that change.

resource "random_uuid" "policy" {
  for_each = {
    for k, v in local.policies :
    k => v
    if v.type == "Custom"
  }
}
resource "random_uuid" "exemptions" {}
resource "random_uuid" "assignment" {
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
  }
}

locals {
  initiative_definition = yamldecode(file(var.initiative_definition))
  policies              = local.initiative_definition.policies
}
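To make the shape of the decoded object concrete, here is a small sketch that decodes an inline yaml document instead of a file. The inline yaml is a trimmed copy of the initiative definition shown earlier, and the example_ locals and the output name are hypothetical.

```hcl
locals {
  # Inline yaml standing in for file(var.initiative_definition).
  example_definition = yamldecode(<<-YAML
    name: 'global-core-initiative'
    policies:
      OwnerTag:
        type: 'BuiltIn'
        id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
        default:
          parameters:
            tagName:
              value: 'Owner'
    YAML
  )

  # Same convenience accessor as in the module.
  example_policies = local.example_definition.policies
}

output "owner_tag_name_parameter" {
  # Walks the decoded object exactly as the module does for real policies.
  value = local.example_policies["OwnerTag"].default.parameters.tagName.value
}
```

The yaml mappings become Terraform objects, so every nested key is reachable with ordinary attribute or index syntax.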

Policy Definitions

We use an azurerm_policy_definition resource for a Custom policy and the azurerm_policy_definition data source for our BuiltIn policies. Doing so allows us to support both in our module.

When we are creating a Custom policy, the file property holds the filename of a policy json file, and before creating the policy instances we need to run templatefile on each one. We loop through our local.policies object, render each file with templatefile to apply the effect (the environment-specific one if it exists, otherwise the one under the default key from our initiative definition), and then decode the result from json. This only occurs when the type property is Custom. Then we simply take the properties from our json and plug them into the resource. Some properties, namely metadata, policy_rule, and parameters, need jsonencode applied to the value we retrieve from the policy json, because jsondecode has turned them into Terraform objects and the resource expects json strings.

resource "azurerm_policy_definition" "this" {
  for_each = {
    for k, v in local.policies :
    k => jsondecode(
      templatefile(
        "${path.root}/policies/${v.file}",
        { effect = try(v[var.environment].effect, v.default.effect) }
      )
    )
    if v.type == "Custom"
  }

  name         = random_uuid.policy[each.key].result
  policy_type  = each.value.properties.policyType
  mode         = each.value.properties.mode
  display_name = each.value.properties.displayName
  description  = each.value.properties.description
  metadata     = jsonencode(each.value.properties.metadata)
  policy_rule  = jsonencode(each.value.properties.policyRule)
  parameters   = jsonencode(each.value.properties.parameters)
}

For the data source, we simply need to loop through our local.policies object and filter to only use objects where the type property is BuiltIn. We do this by using a for expression within the for_each argument. You can read more about that in my post Terraform For Expressions.

data "azurerm_policy_definition" "this" {
  for_each = {
    for k, v in local.policies :
    k => v
    if v.type == "BuiltIn"
  }

  name = each.value.id
}
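The filtering for expression can be tried on its own with sample data. This is only a sketch; sample_policies and the two derived locals are hypothetical names.

```hcl
locals {
  # Hypothetical policies map with one entry of each type.
  sample_policies = {
    AllowedLocations = { type = "Custom" }
    OwnerTag         = { type = "BuiltIn" }
  }

  # Keeps only the entries whose type is BuiltIn, preserving the original keys.
  builtin_only = {
    for k, v in local.sample_policies :
    k => v
    if v.type == "BuiltIn"
  }

  # The mirror-image filter used for the custom policy resource.
  custom_only = {
    for k, v in local.sample_policies :
    k => v
    if v.type == "Custom"
  }
}

output "builtin_policy_keys" {
  value = keys(local.builtin_only)
}
```

Because both filters keep the original keys, the resource and the data source end up keyed by the same policy names used in the yaml.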

Policy Initiative

Now that we have all of our policies in the state we require them, it's time to create our initiative and pass in the parameter values to each policy.

First off we will merge all our policies, both the resource and the data source, which gives us a single object to operate on. Using the new all_policies object we will get the parameter values. These will be environment-specific if available, otherwise the default values are returned. Having a pre-populated local for this allows for easy access within the azurerm_policy_set_definition resource.

locals {
  all_policies = merge(azurerm_policy_definition.this, data.azurerm_policy_definition.this)

  parameters = {
    for k, v in local.all_policies :
    k => try(
      local.policies[k][var.environment].parameters,
      local.policies[k].default.parameters
    )
  }
}

Now we have two objects, all_policies and parameters, and these two combined are what allow us to set up all the policies within the initiative. Using a dynamic block we will iterate over each policy in local.all_policies and assign the parameter_values from the local.parameters variable based on the key from our for_each. This is easily possible because we created the local.parameters variable with a for expression over local.all_policies, which means that both the dynamic block and our parameters variable use the same keys.

resource "azurerm_policy_set_definition" "this" {
  name         = local.initiative_definition.name
  policy_type  = "Custom"
  display_name = local.initiative_definition.display_name
  description  = local.initiative_definition.description

  dynamic "policy_definition_reference" {
    for_each = local.all_policies

    content {
      policy_definition_id = policy_definition_reference.value.id
      parameter_values     = jsonencode(local.parameters[policy_definition_reference.key])
    }
  }
}
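It can help to see what ends up in parameter_values for a single policy. The sketch below assumes parameter values shaped like the OwnerTag entry in the initiative yaml; example_parameters and the output name are hypothetical.

```hcl
locals {
  # Hypothetical parameter values, keyed by policy name like local.parameters.
  example_parameters = {
    OwnerTag = {
      tagName  = { value = "Owner" }
      tagValue = { value = "BLT" }
    }
  }
}

output "owner_tag_parameter_values" {
  # Produces the json string handed to parameter_values for this policy reference.
  value = jsonencode(local.example_parameters["OwnerTag"])
}
```

For this input jsonencode yields {"tagName":{"value":"Owner"},"tagValue":{"value":"BLT"}}, which is the value wrapper format that the yaml already mirrors.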

Policy Assignment

The actual policy assignment portion of the module is most likely the simplest part. In this, we simply loop through the var.assignment.assignments list with a for expression and return a map where the key is the name property and the value is the id property of our assignments object.

We do however do a check on scope to ensure that we are operating on the right scope for the right resource type, in this instance the resource group. If we were doing this on the azurerm_management_group_policy_assignment resource then our check would be var.assignment.scope == "mg". You can see that in the full module code in the terraform-azurerm-policy-initiative repository on my GitHub.

resource "azurerm_resource_group_policy_assignment" "this" {
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
    if var.assignment.scope == "rg"
  }

  name                 = random_uuid.assignment[each.key].result
  resource_group_id    = each.value
  policy_definition_id = azurerm_policy_set_definition.this.id
}
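For comparison, a subscription-scoped assignment could look like the following. This is a sketch that assumes it simply mirrors the resource group version above; it is a fragment that relies on the module's other resources and variables, and the code in the repository may differ.

```hcl
resource "azurerm_subscription_policy_assignment" "this" {
  # Only produces instances when the caller asked for subscription scope.
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
    if var.assignment.scope == "sub"
  }

  name                 = random_uuid.assignment[each.key].result
  subscription_id      = each.value
  policy_definition_id = azurerm_policy_set_definition.this.id
}
```

When the scope does not match, the for expression returns an empty map and no assignments of this type are created.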

Policy Exemption

The exemptions are where things get a little funkier, as we need to be able to match zero or more exemptions to the correct assignment.

Our first problem to solve is how we reference the correct Terraform resource block given each assignment type (mg, sub, rg) has its own Terraform resource. We do this by using a local variable's ability to hold a reference to a resource rather than a string. The try is important as Terraform will evaluate each of these even if they're not used. That would be fine, except that they will never all exist at the same time given assignment can only be done on a single scope.

locals {
  assignments = {
    sub = try(azurerm_subscription_policy_assignment.this, "")
    mg  = try(azurerm_management_group_policy_assignment.this, "")
    rg  = try(azurerm_resource_group_policy_assignment.this, "")
  }
}

With the above we can now access the right Terraform resource with the following:

resource "azurerm_resource_group_policy_exemption" "this" {
  ...

  policy_assignment_id = local.assignments["rg"]["AssignmentReferenceName"].id
  // -> azurerm_resource_group_policy_assignment.this["AssignmentReferenceName"].id

  ...
}

To be honest, the ability to reference other resources with locals is INCREDIBLY powerful!!

Now that we can get the right policy assignment it's time to deal with the exemption side of things. For this, we are going to loop through our assignments and our exemptions variables with nested for expressions to create a new data structure containing all the relevant pieces of data. The assignment_id key will only ever return one value due to the use of the one function. This behavior is 💯 what we want: if there were ever more than one assignment ID for a specific assignment_reference we would know someone has made a mistake. At this stage, we also check that the scope is correct.

You can read more about the for expressions in my Terraform For Expressions post.

resource "azurerm_resource_group_policy_exemption" "this" {
  for_each = {
    for i in flatten([
      for assignment in var.assignment.assignments : [
        for exemption in var.exemptions : {
          id = format("%s_%s", assignment.name, element(
            split("/", exemption.id),
            length(split("/", exemption.id)) - 1
          ))
          data = {
            id       = exemption.id
            risk_id  = exemption.risk_id
            category = exemption.category
            assignment_id = one([
              for scope, assignment in local.assignments :
              assignment[exemption.assignment_reference].id
              if scope == var.assignment.scope
            ])
          }
        }
        if(
          exemption.assignment_reference == assignment.name
          && exemption.scope == "rg"
        )
      ]
    ]) : i.id => i.data
  }

  name = format(
    "%s_%s",
    random_uuid.exemptions.result,
    element(
      split("/", each.key),
      length(split("/", each.key)) - 1
    )
  )
  policy_assignment_id = each.value.assignment_id
  resource_group_id    = each.value.id
  exemption_category   = each.value.category
  description = jsonencode({
    risk_id = each.value.risk_id
  })
}
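The nested for expressions are easier to follow with the Azure resources stripped away. The sketch below keeps only the key-building and filtering logic; the sample_ locals, the names and the resource IDs are hypothetical placeholders.

```hcl
locals {
  # Hypothetical inputs; the IDs are placeholders, not real resources.
  sample_assignments = [
    {
      name = "DefaultRG"
      id   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-app"
    },
  ]
  sample_exemptions = [
    {
      assignment_reference = "DefaultRG"
      scope                = "rg"
      id                   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-app"
    },
    {
      # Dropped by the filter because no assignment is named "SomethingElse".
      assignment_reference = "SomethingElse"
      scope                = "rg"
      id                   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-data"
    },
  ]

  exemption_keys = {
    for i in flatten([
      for assignment in local.sample_assignments : [
        for exemption in local.sample_exemptions : {
          # Assignment name plus the last segment of the resource ID.
          key = format("%s_%s", assignment.name, element(
            split("/", exemption.id),
            length(split("/", exemption.id)) - 1
          ))
          id = exemption.id
        }
        if exemption.assignment_reference == assignment.name && exemption.scope == "rg"
      ]
    ]) : i.key => i.id
  }
}

output "exemption_keys" {
  value = local.exemption_keys
}
```

Only the first exemption survives the filter, so the map has a single key, DefaultRG_rg-app, pointing at that resource group's ID.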

The name property is something that we construct out of the random_uuid for the exemptions as well as the last component of the resource ID. In the instance of a resource group that will be the name of the resource group. We also use this same logic to generate the id, or key, field in our for_each. It is because of this that the resource we are referencing must exist before this code is run. If the resource does not exist then Terraform will error out saying that it is unable to determine the value of something that is part of a map key. Whilst this behavior is not ideal I also don't think that it is that bad. The reason is that should we ever try to exempt a policy on a resource that doesn't exist, Terraform/Azure is going to wig out, therefore the behavior is more or less the same, just at a different place in the run.

Closing Out

Today we have gone through a module I've created to deal with creating Azure Policy initiatives. We went through the initiative definition, the custom policy definition and the module itself. By using this module we are now easily able to deploy and manage Azure Policies and exemptions on our cloud platform at scale. We also ensured that we can have the right level of flexibility when it comes to setting the parameter values and the effects on an Azure Policy.

For me, this was not what I would call an easy module to write, as it required me to think about how I could get the most configuration information into the module without making it overly complex to consume. However, going back to My Development Workflow helped me through the process. This module had four iterations before it got to what we have here today.

You can find this module at BrendanThompson/terraform-azurerm-policy-initiative

I would love to hear from you on whether you think this module is useful and what you have done to manage something as complex as Azure Policy in your cloud environment!


Brendan Thompson

Principal Cloud Engineer
