Manage Azure Policy with Terraform
Brendan Thompson • 18 November 2022 • 17 min read
Today we are going to talk about managing Azure Policy using Terraform. Azure Policy has a few components to it: policy definitions, policy definition sets (also known as initiatives), policy assignments, policy exemptions and policy remediation. We will talk about all of these except policy remediation.
I am going to reverse the order in which I would normally explain a concept: in this post we will look at the call to the module first and then dive into each of the components.
The Module Call
Below we have a call to our `azure-policy-initiative` module, which creates the policies, assignments and exemptions for us from an initiative definition. As can be seen, we are passing in three main things: `assignment`, `exemptions`, and the `initiative_definition`, along with the `environment` we are deploying to.
```hcl
module "global_core" {
  source = "./modules/azure-policy-initiative"

  assignment = {
    assignments = [{
      id   = data.azurerm_resource_group.this.id
      name = "DefaultRG"
    }]
    scope = "rg"
  }

  exemptions = [{
    assignment_reference = "DefaultRG"
    category             = "Mitigated"
    id                   = data.azurerm_resource_group.this.id
    risk_id              = "R-001"
    scope                = "rg"
  }]

  environment           = "dev"
  initiative_definition = format("%s/initiatives/core.yaml", path.module)
}
```
The Initiative Definition
Let's step into the initiative definition next.
```yaml
name: 'global-core-initiative'
display_name: Global Core Initiative
description: Core initiative scoped to the global level
policies:
  AllowedLocations:
    type: 'Custom'
    file: allowed_locations.json
    default:
      parameters:
        listOfAllowedLocations:
          value:
            - australiaeast
      effect: audit
    dev:
      parameters:
        listOfAllowedLocations:
          value:
            - australiaeast
            - westus
      effect: deny
  CostCentreTag:
    type: 'BuiltIn'
    id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
    default:
      parameters:
        tagName:
          value: 'CostCentre'
        tagValue:
          value: 'abc-123'
  OwnerTag:
    type: 'BuiltIn'
    id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
    default:
      parameters:
        tagName:
          value: 'Owner'
        tagValue:
          value: 'BLT'
```
There are four top-level keys that we have to set:
- `name`: unique name for the initiative.
- `display_name`: the name displayed for the initiative.
- `description`: a description of what the initiative is for and does.
- `policies`: objects containing the value definitions of `BuiltIn` and `Custom` policies.
With these properties we can pass all of the relevant values into our policies for a range of environments. We can also set a `default` for when we don't have values specific to an environment. Let's take a look at the two types of policy definitions.
First, let's take a look at the policy properties:
- `type`: the type of policy that we are referencing, either `Custom` or `BuiltIn`.
- `file`: if the policy is `Custom`, the name of the file to import. The Terraform code always looks for this file in the `${path.root}/policies/` directory.
- `id`: the GUID of an existing Azure policy provided by Microsoft. This is required when `type` is set to `BuiltIn`.
- `default`: the default parameter and effect values.
- `dev`/`uat`/`prd`/...: the key is the environment name and its contents must use the same keys as `default`. This allows policy parameters to optionally be set per environment.
Inside the `default` or environment block we have the following properties:
- `parameters`: a parameters object that MUST align to the parameters required by either the custom policy or the Azure `BuiltIn` policy.
- `effect`: the effect this policy will have, `deny` for example. This property cannot be set on `BuiltIn` policies.
Custom Policy Definition
```yaml
AllowedLocations:
  type: 'Custom'
  file: allowed_locations.json
  default:
    parameters:
      listOfAllowedLocations:
        value:
          - australiaeast
    effect: audit
  dev:
    parameters:
      listOfAllowedLocations:
        value:
          - australiaeast
          - westus
    effect: deny
```
The key at the beginning, `AllowedLocations`, is how we reference our policy and retrieve its components in the Terraform code. Pointing `file` at a JSON file lets us easily create custom policies alongside the fantastic baseline policies that Microsoft already gives us. In the above example, if we were running our Terraform code in the `uat` environment, our code would use the properties defined in `default`, as there are no `uat`-specific overrides. This makes our Terraform more flexible. When we first start with Azure Policy we don't necessarily understand what each environment requires, and sometimes every environment genuinely requires the same enforcement.
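To make that fallback concrete, here is a minimal standalone sketch that runs the same `try` lookup the module performs. It needs no providers. It uses a trimmed, inline copy of the `AllowedLocations` entry, and the local and output names are illustrative only.

```hcl
locals {
  # Trimmed, inline copy of one policy entry from the initiative definition
  allowed_locations = yamldecode(<<-YAML
    type: Custom
    file: allowed_locations.json
    default:
      effect: audit
    dev:
      effect: deny
    YAML
  )

  # Same lookup as the module: environment block first, then default
  effect_by_env = {
    for env in ["dev", "uat"] :
    env => try(local.allowed_locations[env].effect, local.allowed_locations.default.effect)
  }
}

output "effect_by_env" {
  value = local.effect_by_env # { dev = "deny", uat = "audit" }
}
```

Because `uat` has no block of its own, indexing it fails inside `try`, which then returns the `default` value.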
Built-in Policy Definition
```yaml
OwnerTag:
  type: 'BuiltIn'
  id: 1e30110a-5ceb-460c-a204-c1c3969c6d62
  default:
    parameters:
      tagName:
        value: 'Owner'
      tagValue:
        value: 'BLT'
```
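The above is how we set our parameters for a built-in Azure policy. Remember that the module does not set the `effect` of a built-in policy, as that is defined by Microsoft. If you need to alter the effect, it is best to use a custom policy. Some built-in definitions do expose the effect as a parameter, and in that case it can be passed under `parameters` like any other value.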
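If you need to find the GUID and expected parameters of a built-in policy, the same `azurerm_policy_definition` data source the module uses can also look one up by display name. This sketch assumes a configured `azurerm` provider. It also assumes that "Require a tag and its value on resources" is the display name of the built-in referenced above, so check the portal if the name differs.

```hcl
# Assumption: this is the display name of built-in 1e30110a-5ceb-460c-a204-c1c3969c6d62
data "azurerm_policy_definition" "require_tag" {
  display_name = "Require a tag and its value on resources"
}

output "require_tag_guid" {
  # The GUID to use as `id` in the initiative definition
  value = data.azurerm_policy_definition.require_tag.name
}

output "require_tag_parameters" {
  # JSON describing the parameters the built-in expects
  value = data.azurerm_policy_definition.require_tag.parameters
}
```

The `parameters` output shows which keys belong under `parameters` in the initiative definition. It also shows whether the built-in happens to expose an `effect` parameter.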
The Custom Policy Definition
We won't go into the mud on how to write an Azure Policy definition. If you're interested in that, check out the Azure Policy definition structure article by Microsoft.
The main point here is that you have a JSON definition of the Azure Policy. You might have written it from scratch, or pulled it from the Azure portal so that you can now change the effect. As you can see in the `then` block below, the value for the `effect` key uses string interpolation (`${effect}`), which will be filled in by `templatefile` in our Terraform code later. This is how we set the effect on a per-environment basis.
```json
{
  "properties": {
    "displayName": "Allowed locations",
    "policyType": "Custom",
    "mode": "All",
    "description": "This policy enables you to restrict the locations your organization can specify when deploying resources. Use to enforce your geo-compliance requirements. Excludes resource groups, Microsoft.AzureActiveDirectory/b2cDirectories, and resources that use the 'global' region.",
    "metadata": {
      "version": "1.0.0",
      "category": "General"
    },
    "parameters": {
      "listOfAllowedLocations": {
        "type": "Array",
        "metadata": {
          "description": "The list of locations that can be specified when deploying resources.",
          "strongType": "location",
          "displayName": "Allowed locations"
        }
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "location",
            "notIn": "[parameters('listOfAllowedLocations')]"
          },
          {
            "field": "location",
            "notEquals": "global"
          },
          {
            "field": "type",
            "notEquals": "Microsoft.AzureActiveDirectory/b2cDirectories"
          }
        ]
      },
      "then": {
        "effect": "${effect}"
      }
    }
  }
}
```
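A quick way to check the template is to render it for each effect and read the value back. The sketch below assumes the JSON above is saved as `policies/allowed_locations.json` under the root module, which is where the module looks. The local and output names are hypothetical.

```hcl
locals {
  # Render and decode the template once per effect value
  rendered = {
    for effect in ["audit", "deny"] :
    effect => jsondecode(templatefile("${path.root}/policies/allowed_locations.json", {
      effect = effect
    }))
  }
}

output "rendered_effects" {
  # Read the effect back out of each rendered policy rule
  value = { for k, v in local.rendered : k => v.properties.policyRule.then.effect }
  # { audit = "audit", deny = "deny" }
}
```

Keep in mind that `templatefile` treats every `${...}` and `%{...}` sequence in the file as template syntax, so a literal `${` in a policy must be escaped as `$${`. ARM-style expressions such as `[parameters('listOfAllowedLocations')]` pass through untouched.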
The Module
Now we get into the fun stuff 🎉! Whilst going through the module I am going to split it up into some sub-sections to make it easier to talk through. The module supports three scopes: Management Group (`mg`), Subscription (`sub`), and Resource Group (`rg`). I will only reference the resource group code below, as it is almost identical to the other scopes.
The Module Interface
If you've worked with me or read my articles before, you will know that I treat `variables.tf` as our documented API interface. Think of it like an OpenAPI definition for a REST API. We will go through each variable one by one.
Our first variable is `initiative_definition`. This is where we pass the full path to our definition YAML file, as discussed in The Initiative Definition.
```hcl
variable "initiative_definition" {
  type        = string
  description = <<-DESC
    (Required) path to the initiative definition file
  DESC
}
```
Secondly, we need to pass in an `environment`. This must match whatever format you've used for the environment keys in the initiative definition. Otherwise our Terraform code won't be able to retrieve the properties for that environment and will fall back to `default` without warning.
```hcl
variable "environment" {
  type        = string
  description = <<-DESC
    (Required) environment that the initiatives should be applied to.
  DESC
}
```
The most important input variable for us is the `assignment` variable. This is where we pass in a single resource ID, or a list of them, that the policy initiative will be assigned to. Allowing a list of assignments means that we can deal with assignments on a larger scale than a single resource, which is especially powerful when operating in an enterprise environment.

A few things to note:
- A single `scope` is used for all of the `assignments`, so one module call assigns at one scope only.
- The `name` property of each assignment is used as part of our exemptions process. This gives us an easy and intentional lookup when we are trying to exempt a resource from a given initiative.
- The resource referenced by the `id` property MUST exist before the initiative code runs if there are exemptions defined, or Terraform will return an error!

We also have some validation ensuring that the `scope` passed in is valid for our scenario.
```hcl
variable "assignment" {
  type = object({
    assignments = list(object({
      id   = string
      name = string
    }))
    scope = optional(string, "rg")
  })
  description = <<-DESC
    (Required) assignment details for the policy.
    Properties:
      `assignments` (Required) - list of assignments
        `id` (Required) - resource ID
        `name` (Required) - friendly name/reference for the assignment
      `scope` (Optional) - resource scope for the assignment [Default: `rg`]
  DESC

  validation {
    condition = contains(
      ["sub", "mg", "rg"],
      var.assignment.scope
    )
    error_message = "Err: invalid assignment scope."
  }
}
```
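To show the list in action, here is a sketch that assigns the initiative to several resource groups at once. It is an alternative to the single assignment in the module call at the top of the post, not an addition alongside it. The resource group names are hypothetical, and `scope` is left out to exercise its `rg` default.

```hcl
# Hypothetical resource group names, for illustration only
data "azurerm_resource_group" "workloads" {
  for_each = toset(["rg-app-dev", "rg-data-dev"])
  name     = each.key
}

module "global_core" {
  source = "./modules/azure-policy-initiative"

  assignment = {
    # One assignment per resource group; the RG name doubles as the reference
    assignments = [
      for name, rg in data.azurerm_resource_group.workloads : {
        id   = rg.id
        name = name
      }
    ]
    # scope omitted: defaults to "rg"
  }

  environment           = "dev"
  initiative_definition = format("%s/initiatives/core.yaml", path.module)
}
```

Each resource group's name doubles as its assignment `name`, so an exemption can later reference it through `assignment_reference`.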
Where Azure Policy is concerned, there is always going to be a requirement to exempt some resources from having a policy applied or enforced on them. We manage that here through the `exemptions` variable, which takes a list of exemption objects. Each one has an `assignment_reference` which, as we mentioned above, refers to a `name` in the `assignments` object. This allows us to cleanly look up which assignment we are exempting a given resource from.

In this variable we need to validate that our exemption scope is valid, not only for Azure but for our given scenario. For instance, Azure lets you exempt a single resource from a policy, but our module only goes down to the resource group as the most granular level. The second thing we validate is that the `category` on the exemption is one of the two values Microsoft accepts: `Mitigated` or `Waiver`.
```hcl
variable "exemptions" {
  type = list(object({
    id                   = string
    risk_id              = string
    scope                = string
    category             = string
    assignment_reference = string
  }))
  description = <<-DESC
    (Optional) List of exemption objects
    Properties:
      `id` (Required) - the resource ID for the exemption
      `risk_id` (Required) - internal risk reference ID
      `scope` (Required) - the scope for the exemption (sub, mg, rg)
      `category` (Required) - exemption category
      `assignment_reference` (Required) - assignment-friendly name/reference
  DESC

  validation {
    condition = alltrue(
      [
        for exemption in var.exemptions :
        contains(
          ["sub", "mg", "rg"],
          exemption.scope
        )
      ]
    )
    error_message = "Err: invalid exemption scope."
  }

  validation {
    condition = alltrue(
      [
        for exemption in var.exemptions :
        contains(
          ["Mitigated", "Waiver"],
          exemption.category
        )
      ]
    )
    error_message = "Err: invalid exemption category."
  }

  default = []
}
```
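To see how the two validation rules evaluate, the standalone sketch below runs the same `alltrue`/`contains` checks against a hypothetical input. In that input, the second exemption uses an unsupported `resource` scope.

```hcl
locals {
  # Hypothetical input: the second entry targets a single resource
  sample_exemptions = [
    { scope = "rg", category = "Waiver" },
    { scope = "resource", category = "Mitigated" },
  ]

  # Same checks as the two validation blocks
  scopes_valid = alltrue([
    for exemption in local.sample_exemptions :
    contains(["sub", "mg", "rg"], exemption.scope)
  ])

  categories_valid = alltrue([
    for exemption in local.sample_exemptions :
    contains(["Mitigated", "Waiver"], exemption.category)
  ])
}

output "validation_results" {
  value = {
    scopes_valid     = local.scopes_valid     # false
    categories_valid = local.categories_valid # true
  }
}
```

`contains` is an exact, case-sensitive match, so a category of `waiver` would also be rejected.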
Local Variables and Setup
The first piece of setup is to create some `random_uuid` resources that we can use as unique names for our policies, assignments and exemptions. The `azurerm` policy resources don't generate names for us, so we handle name generation ourselves.

Next, we need to decode our `initiative_definition` YAML into a Terraform object that we can use throughout our module. The `policies` local variable is a convenience so that we can quickly access that property. Also, if the way we access the `policies` key in our YAML file changes, the code that consumes `local.policies` doesn't need to know about that change.
```hcl
resource "random_uuid" "policy" {
  for_each = {
    for k, v in local.policies :
    k => v
    if v.type == "Custom"
  }
}

resource "random_uuid" "exemptions" {}

resource "random_uuid" "assignment" {
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
  }
}

locals {
  initiative_definition = yamldecode(file(var.initiative_definition))
  policies              = local.initiative_definition.policies
}
```
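The module relies on both the `azurerm` and `random` providers. It also uses `optional()` object attributes with defaults, which require Terraform 1.3 or later. A `versions.tf` along these lines would pin that down. The provider version floors are assumptions rather than tested minimums.

```hcl
terraform {
  # optional() attributes with default values need Terraform 1.3 or later
  required_version = ">= 1.3.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0.0" # assumption: a 3.x release with the policy resources used here
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.0.0" # provides random_uuid
    }
  }
}
```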
Policy definitions
We use the `azurerm_policy_definition` resource for `Custom` policies and the `azurerm_policy_definition` data source for `BuiltIn` policies. Doing so allows us to support both in our module.
For a `Custom` policy, the `file` key holds the filename of a policy JSON file, and we need to run `templatefile` on each of those files before creating the definitions. We loop through the `local.policies` object, keeping only entries whose `type` is `Custom`. Each file is rendered with `templatefile`, which injects the `effect` from the environment-specific block if one exists, or from `default` otherwise. The result is then decoded with `jsondecode`.
Then we simply take the properties from our JSON and plug them into the resource. Because `jsondecode` turned the whole file into Terraform objects, properties such as `metadata`, `policy_rule` and `parameters` need to be passed back through `jsonencode`, as the resource expects those arguments as JSON strings.
```hcl
resource "azurerm_policy_definition" "this" {
  for_each = {
    for k, v in local.policies :
    k => jsondecode(
      templatefile(
        "${path.root}/policies/${v.file}",
        # Environment-specific effect if present, otherwise the default
        { effect = try(v[var.environment].effect, v.default.effect) }
      )
    )
    if v.type == "Custom"
  }

  name         = random_uuid.policy[each.key].result
  policy_type  = each.value.properties.policyType
  mode         = each.value.properties.mode
  display_name = each.value.properties.displayName
  description  = each.value.properties.description
  # These arguments expect JSON strings, so re-encode the decoded objects
  metadata    = jsonencode(each.value.properties.metadata)
  policy_rule = jsonencode(each.value.properties.policyRule)
  parameters  = jsonencode(each.value.properties.parameters)
}
```
For the data source, we simply loop through our `local.policies` object and filter to only the objects where the `type` property is `BuiltIn`. We do this by using a `for` expression within the `for_each` argument. You can read more about that in my post Terraform For Expressions.
```hcl
data "azurerm_policy_definition" "this" {
  for_each = {
    for k, v in local.policies :
    k => v
    if v.type == "BuiltIn"
  }

  name = each.value.id
}
```
Policy Initiative
Now that we have all of our policies in the state we require, it's time to create our initiative and pass in the parameter values for each policy.
First we merge all our policies, both the resource and the data source, which gives us a single object to operate on. Using the new `all_policies` object we then get the parameter values for each policy. These are the environment-specific values if available, otherwise the `default` ones. Having a pre-populated property for this allows easy access within the `azurerm_policy_set_definition` resource.
```hcl
locals {
  all_policies = merge(azurerm_policy_definition.this, data.azurerm_policy_definition.this)
  parameters = {
    for k, v in local.all_policies :
    k => try(
      local.policies[k][var.environment].parameters,
      local.policies[k].default.parameters
    )
  }
}
```
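As a worked example, take `environment = "uat"`. None of the policies in the initiative definition has a `uat` block, so every entry falls back to its `default` parameters. The illustrative local below (its name is hypothetical) spells out the value `local.parameters` would hold in that case.

```hcl
locals {
  # Illustrative only: what local.parameters evaluates to for environment = "uat"
  example_uat_parameters = {
    AllowedLocations = {
      listOfAllowedLocations = { value = ["australiaeast"] }
    }
    CostCentreTag = {
      tagName  = { value = "CostCentre" }
      tagValue = { value = "abc-123" }
    }
    OwnerTag = {
      tagName  = { value = "Owner" }
      tagValue = { value = "BLT" }
    }
  }
}
```

The `jsonencode` call in the initiative resource then turns the `OwnerTag` entry into `{"tagName":{"value":"Owner"},"tagValue":{"value":"BLT"}}`.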
Now we have two objects, `all_policies` and `parameters`, and together they allow us to set up all the policies within the initiative. Using a `dynamic` block, we iterate over each policy in `local.all_policies` and assign the `parameter_values` from `local.parameters` based on the key of the current iteration. This works because `local.parameters` was built with a `for` expression over `local.all_policies`, so the `dynamic` block and the `parameters` local share the same keys.
```hcl
resource "azurerm_policy_set_definition" "this" {
  name         = local.initiative_definition.name
  policy_type  = "Custom"
  display_name = local.initiative_definition.display_name
  description  = local.initiative_definition.description

  dynamic "policy_definition_reference" {
    for_each = local.all_policies
    content {
      policy_definition_id = policy_definition_reference.value.id
      parameter_values     = jsonencode(local.parameters[policy_definition_reference.key])
    }
  }
}
```
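One optional variation, not part of the module as shown, is to set `reference_id` on each reference to the YAML key. `CostCentreTag` and `OwnerTag` point at the same built-in definition, so a readable reference ID makes it easier to tell the two apart within the initiative. This sketch assumes a provider version where `policy_definition_reference` supports `reference_id`.

```hcl
resource "azurerm_policy_set_definition" "this" {
  name         = local.initiative_definition.name
  policy_type  = "Custom"
  display_name = local.initiative_definition.display_name
  description  = local.initiative_definition.description

  dynamic "policy_definition_reference" {
    for_each = local.all_policies
    content {
      policy_definition_id = policy_definition_reference.value.id
      # Use the YAML key (e.g. "OwnerTag") as a readable, unique reference ID
      reference_id     = policy_definition_reference.key
      parameter_values = jsonencode(local.parameters[policy_definition_reference.key])
    }
  }
}
```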
Policy Assignment
The actual policy assignment portion of the module is most likely the simplest part. We loop through the `var.assignment.assignments` list with a `for` expression and return a map where the key is the `name` property and the value is the `id` property of each assignment.

We do, however, check the `scope` to ensure that we are operating on the right scope for the right resource type, in this instance the resource group. If we were using the `azurerm_management_group_policy_assignment` resource instead, our check would be `if var.assignment.scope == "mg"`. You can see that in the full module code in the terraform-azurerm-policy-initiative repository on my GitHub.
```hcl
resource "azurerm_resource_group_policy_assignment" "this" {
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
    if var.assignment.scope == "rg"
  }

  name                 = random_uuid.assignment[each.key].result
  resource_group_id    = each.value
  policy_definition_id = azurerm_policy_set_definition.this.id
}
```
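Following the same pattern, the subscription-scoped assignment would plausibly look like the sketch below. It is inferred from the resource group version rather than copied from the repository. The only differences are the scope check and the `subscription_id` argument, which takes the full subscription resource ID.

```hcl
resource "azurerm_subscription_policy_assignment" "this" {
  for_each = {
    for assignment in var.assignment.assignments :
    assignment.name => assignment.id
    if var.assignment.scope == "sub" # only create when assigning at subscription scope
  }

  name                 = random_uuid.assignment[each.key].result
  subscription_id      = each.value # e.g. "/subscriptions/<subscription guid>"
  policy_definition_id = azurerm_policy_set_definition.this.id
}
```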
Policy Exemption
The exemptions are where things get a little funkier, as we need to be able to match zero or more exemptions to the correct assignment.
Our first problem to solve is how to reference the correct Terraform resource block, given that each assignment type (`mg`, `sub`, `rg`) has its own Terraform resource. We do this by using the ability of local values to hold a reference to a resource rather than just a string. The `try` is important. Terraform evaluates every entry in this map, even though we only ever use one. Yet only one kind of assignment will ever exist, because an assignment can only be done at a single scope.
```hcl
locals {
  assignments = {
    sub = try(azurerm_subscription_policy_assignment.this, "")
    mg  = try(azurerm_management_group_policy_assignment.this, "")
    rg  = try(azurerm_resource_group_policy_assignment.this, "")
  }
}
```
With the above we can now access the right Terraform resource with the following:
```hcl
resource "azurerm_resource_group_policy_exemption" "this" {
  # ... other arguments omitted ...
  policy_assignment_id = local.assignments["rg"]["AssignmentReferenceName"].id
  # -> azurerm_resource_group_policy_assignment.this["AssignmentReferenceName"].id
  # ...
}
```
To be honest, the ability to reference other resources with locals
is INCREDIBLY powerful!!
Now that we can get the right policy assignment, it's time to deal with the exemption side of things. For this, we loop through our `assignments` and `exemptions` variables with nested `for` expressions to create a new data structure containing all the relevant pieces of data. The `assignment_id` key will only ever hold one value due to the `one` function, which returns the single element of a list and raises an error if there is more than one. This behavior is 💯 what we want: if there were ever more than one assignment ID for a specific `assignment_reference`, we would know someone has made a mistake. At this stage we also filter the assignments down to the scope set in `var.assignment.scope`.
You can read more about `for` expressions in my Terraform For Expressions post.
```hcl
resource "azurerm_resource_group_policy_exemption" "this" {
  for_each = {
    for i in flatten([
      for assignment in var.assignment.assignments : [
        for exemption in var.exemptions : {
          # Key: "<assignment name>_<last segment of the exempted resource ID>"
          id = format("%s_%s", assignment.name, element(
            split("/", exemption.id),
            length(split("/", exemption.id)) - 1
          ))
          data = {
            id       = exemption.id
            risk_id  = exemption.risk_id
            category = exemption.category
            # one() errors if the list ever holds more than one assignment ID
            assignment_id = one([
              for scope, scoped_assignments in local.assignments :
              scoped_assignments[exemption.assignment_reference].id
              if scope == var.assignment.scope
            ])
          }
        }
        if exemption.assignment_reference == assignment.name && exemption.scope == "rg"
      ]
    ]) : i.id => i.data
  }

  # each.key contains no "/", so it is used directly as the name suffix
  name                 = format("%s_%s", random_uuid.exemptions.result, each.key)
  policy_assignment_id = each.value.assignment_id
  resource_group_id    = each.value.id
  exemption_category   = each.value.category
  description = jsonencode({
    risk_id = each.value.risk_id
  })
}
```
The `name` property is constructed from the exemptions `random_uuid` and the `for_each` key. That key combines the assignment name with the last component of the exempted resource ID, which for a resource group is the resource group's name. Because that last component is part of the `for_each` key, the resource we are referencing must exist before this code is run. If it does not exist yet (for example, because it is created in the same run), Terraform cannot determine the map keys during planning and errors out, as `for_each` keys must be known at plan time. Whilst this behavior is not ideal, I don't think it is that bad. Should we ever try to exempt a policy on a resource that doesn't exist, Terraform/Azure is going to wig out anyway, so the behavior is more or less the same, just at a different place in the run.
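To see exactly what key the exemption `for_each` produces for the module call at the top of the post, the standalone sketch below runs the same key construction against hypothetical sample IDs. The all-zero subscription GUID and `rg-default` are placeholders.

```hcl
locals {
  # Hypothetical sample inputs mirroring the module call
  sample_assignments = [{
    name = "DefaultRG"
    id   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-default"
  }]
  sample_exemptions = [{
    assignment_reference = "DefaultRG"
    id                   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-default"
    scope                = "rg"
  }]

  # Same key construction as the exemption resource's for_each
  exemption_keys = flatten([
    for assignment in local.sample_assignments : [
      for exemption in local.sample_exemptions :
      format("%s_%s", assignment.name, element(
        split("/", exemption.id),
        length(split("/", exemption.id)) - 1
      ))
      if exemption.assignment_reference == assignment.name && exemption.scope == "rg"
    ]
  ])
}

output "exemption_keys" {
  value = local.exemption_keys # ["DefaultRG_rg-default"]
}
```

The exemption's `name` then becomes `<exemptions uuid>_DefaultRG_rg-default`. If the resource group ID were only known after apply, this key would be unknown at plan time, which produces the error described above.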
Closing Out
Today we have gone through a module I've created to deal with creating Azure Policy initiatives. We went through the initiative definition, the custom policy definition and the module itself. By using this module we are now easily able to deploy and manage Azure Policies and exemptions on our cloud platform at scale. We also ensured that we can have the right level of flexibility when it comes to setting the parameter values and the effects on an Azure Policy.
For me, this was not what I would call an easy module to write, as it required me to think about how I could get as much configuration information as possible into the module without making it overly complex to consume. However, going back to My Development Workflow helped me through the process. This module had four iterations before it got to what we have here today.
You can find this module at BrendanThompson/terraform-azurerm-policy-initiative
I would love to hear from you on whether you think this module is useful, and what you have done to manage something as complex as Azure Policy in your cloud environment!