Rollback Terraform Cloud/Enterprise State

Brendan Thompson • 2 June 2021 • 6 min read

When something goes wrong with state in Terraform Cloud/Enterprise (TFC/E), the situation quickly becomes complex because there is no easy way to roll state back to a different version. This blog post aims to guide you through doing this via the API, as I feel that it is the cleanest and most controlled way to deal with state.

The code in this post can act as a baseline for a more complex utility that could help with a range of things, such as managing rollback/rollforward of the TF version of a workspace.

First of all we will need our imports. These come mostly from the standard library, with the addition of the go-tfe package for dealing with TFC/E and the cast package to help us convert some types.

import (
    "context"
    "crypto/md5"
    "encoding/base64"
    "encoding/json"
    "flag"
    "fmt"
    "net/http"

    "github.com/hashicorp/go-tfe"
    "github.com/spf13/cast"
)

Whilst this isn't strictly necessary, I am creating a custom type to represent state, which will make the code easier to read and write.

type State map[string]interface{}

We will use this type to decode our state file into something that we are able to more easily manipulate.
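As a minimal sketch of what that decoding looks like, the snippet below parses a made-up, heavily trimmed state document into the State type. The decodeState helper and the sample JSON are assumptions for illustration only and are not part of the utility. The point worth noting is that encoding/json stores every JSON number held in an interface{} as a float64, which matters later when we read the serial.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// State mirrors the custom type from the post.
type State map[string]interface{}

// decodeState is a hypothetical helper that parses a JSON state
// document into a State value.
func decodeState(raw string) (State, error) {
    var state State
    if err := json.NewDecoder(strings.NewReader(raw)).Decode(&state); err != nil {
        return nil, err
    }
    return state, nil
}

func main() {
    // A made-up state document, trimmed down for illustration.
    raw := `{"version": 4, "serial": 7, "lineage": "example-lineage", "resources": []}`

    state, err := decodeState(raw)
    if err != nil {
        panic(err)
    }

    // JSON numbers stored in an interface{} are decoded as float64.
    fmt.Printf("serial is %T\n", state["serial"])
    fmt.Println("lineage:", state["lineage"])
}
```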

We are going to produce 5 functions that will perform the primary logic of our utility: getCurrentState, getSpecificState, prepareState, uploadState, and rollbackToSpecificVersion.

Let's go through creating each of these, starting with getCurrentState.

The following function gets the most recent version of state for our workspace. It does this by reading the workspace, fetching its current state version, downloading the state file into memory, and then decoding it into our custom type.

getCurrentState()
func getCurrentState(workspaceID string, c *tfe.Client, ctx context.Context) (State, error) {
    var state State

    // Get workspace
    ws, err := c.Workspaces.ReadByID(ctx, workspaceID)
    if err != nil {
        return nil, err
    }

    // Get current state
    sv, err := c.StateVersions.Current(ctx, ws.ID)
    if err != nil {
        return nil, err
    }

    // Download state file into memory
    resp, err := http.Get(sv.DownloadURL)
    if err != nil {
        return nil, err
    }
    // Close the body once decoding is done so the connection is released
    defer resp.Body.Close()

    // Decode JSON body into the custom type
    err = json.NewDecoder(resp.Body).Decode(&state)
    if err != nil {
        return nil, err
    }

    return state, nil
}

Now that we have our latest version of state, we will want to grab the specific version of state that we intend to roll back to. This is done through the getSpecificState function. It does essentially the same thing as getCurrentState; however, instead of passing in a workspaceID we pass in the stateVersion.

getSpecificState()
func getSpecificState(stateVersion string, c *tfe.Client, ctx context.Context) (State, error) {
    var state State

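    // Get the requested state version
    sv, err := c.StateVersions.Read(ctx, stateVersion)
    if err != nil {
        return nil, err
    }

    // Download state file into memory
    resp, err := http.Get(sv.DownloadURL)
    if err != nil {
        return nil, err
    }
    // Close the body once decoding is done so the connection is released
    defer resp.Body.Close()

    // Decode JSON body into the custom type
    err = json.NewDecoder(resp.Body).Decode(&state)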
    if err != nil {
        return nil, err
    }

    return state, nil
}

The next function, prepareState, is not 100% necessary and could live within the rollbackToSpecificVersion function. However, I feel that it is useful to keep it separate, as the utility will likely be expanded to handle more interesting rollback scenarios and they would all need a function like this.

The function itself is fairly simple: we take the version of state that we intend to upload to TFC/E, marshal it back into JSON, compute its MD5 checksum, base64 encode it, and then create an instance of the StateVersionCreateOptions that is required by the state version creation function.

prepareState()
func prepareState(state State) (*tfe.StateVersionCreateOptions, error) {
    // Generate JSON state
    jsonState, err := json.Marshal(state)
    if err != nil {
        return nil, err
    }

    // Base64 encode JSON state
    base64EncodedState := base64.StdEncoding.EncodeToString(jsonState)

    // Create state version payload
    opts := tfe.StateVersionCreateOptions{
        MD5:     tfe.String(fmt.Sprintf("%x", md5.Sum(jsonState))),
        // JSON numbers decode as float64, so convert rather than type-assert
        Serial:  tfe.Int64(cast.ToInt64(state["serial"])),
        State:   tfe.String(base64EncodedState),
        Lineage: tfe.String(state["lineage"].(string)),
    }

    return &opts, nil
}

Now that prepareState has given us the object needed to create a state version, we need to actually upload it to the TFC/E instance. We will do this with the uploadState function. Like prepareState, this function is fairly simple: it locks the workspace, uploads the new (or old) version of state to TFC/E, and then unlocks the workspace so it can be used again.

uploadState()
func uploadState(opts *tfe.StateVersionCreateOptions, ctx context.Context,
    c *tfe.Client, workspaceID string) error {
    // Lock workspace
    lockOptions := tfe.WorkspaceLockOptions{
        Reason: tfe.String("Locking workspace in order to perform rollback."),
    }

    _, err := c.Workspaces.Lock(ctx, workspaceID, lockOptions)
    if err != nil {
        return err
    }

    // Create new state version
    _, err = c.StateVersions.Create(ctx, workspaceID, *opts)
    if err != nil {
        // Best-effort unlock so the workspace is not left locked
        _, _ = c.Workspaces.Unlock(ctx, workspaceID)
        return err
    }

    // Unlock workspace and report any failure to do so
    _, err = c.Workspaces.Unlock(ctx, workspaceID)

    return err
}

We now have all of the pieces needed to roll state back to a particular version, and we bring them together in the rollbackToSpecificVersion function. This function grabs the current state and the state we want to roll back to, sets the serial number to one more than the current state's, prepares our state object, and finally uploads state to TFC/E.

rollbackToSpecificVersion()
func rollbackToSpecificVersion(stateVersion string, ctx context.Context,
    c *tfe.Client, workspaceID string) error {
    var state State

    currentState, err := getCurrentState(workspaceID, c, ctx)
    if err != nil {
        return err
    }

    specificState, err := getSpecificState(stateVersion, c, ctx)
    if err != nil {
        return err
    }

    state = specificState
    state["serial"] = cast.ToInt64(currentState["serial"]) + 1

    opts, err := prepareState(state)
    if err != nil {
        return err
    }

    err = uploadState(opts, ctx, c, workspaceID)
    if err != nil {
        return err
    }

    return nil
}
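The serial handling is the subtle part of the rollback, so here it is in isolation as a sketch that needs no TFC/E instance. The serialOf and buildRollbackState helpers are hypothetical names. The lineage comparison is an extra guard that the function above does not perform; it is included on the assumption that rolling back to a state from a different lineage is not what you want.

```go
package main

import (
    "errors"
    "fmt"
)

// State mirrors the custom type from the post.
type State map[string]interface{}

// serialOf reads the serial, accepting the float64 that encoding/json
// produces as well as an int64 set by earlier code.
func serialOf(s State) (int64, error) {
    switch v := s["serial"].(type) {
    case float64:
        return int64(v), nil
    case int64:
        return v, nil
    default:
        return 0, fmt.Errorf("serial has unexpected type %T", v)
    }
}

// buildRollbackState copies target and gives the copy the serial that
// follows current's, leaving both inputs unmodified.
func buildRollbackState(current, target State) (State, error) {
    curLineage, _ := current["lineage"].(string)
    tgtLineage, _ := target["lineage"].(string)
    if curLineage == "" || curLineage != tgtLineage {
        return nil, errors.New("lineage mismatch between current and target state")
    }

    serial, err := serialOf(current)
    if err != nil {
        return nil, err
    }

    out := make(State, len(target))
    for k, v := range target {
        out[k] = v
    }
    out["serial"] = serial + 1
    return out, nil
}

func main() {
    // Made-up states: the workspace is at serial 12, the target at 9.
    current := State{"serial": float64(12), "lineage": "example-lineage"}
    target := State{"serial": float64(9), "lineage": "example-lineage"}

    state, err := buildRollbackState(current, target)
    if err != nil {
        panic(err)
    }
    fmt.Println(state["serial"]) // prints "13"
}
```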

The final piece to the puzzle is the main function, which will be the entrypoint into the utility. This function also deals with setting up our Terraform API client, ingesting our CLI arguments and making the call to our rollbackToSpecificVersion function.

main()
func main() {
    token := flag.String("token", "", "Terraform Token")
    workspaceID := flag.String("workspace-id", "", "Workspace ID")
    address := flag.String("address", "https://app.terraform.io", "Address of TFE host, defaults to TFC.")
    stateVersion := flag.String("state-version", "", "Version of state to rollback to")
    flag.Parse()

    ctx := context.Background()

    config := &tfe.Config{
        Address: *address,
        Token:   *token,
    }

    client, err := tfe.NewClient(config)
    if err != nil {
        panic(err)
    }

    err = rollbackToSpecificVersion(*stateVersion, ctx, client, *workspaceID)
    if err != nil {
        panic(err)
    }
}
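As written, main passes empty strings through to the API if a flag is left out. One possible refinement, sketched below under the assumption that the token, workspace ID, and state version are always required, is to parse the arguments with a flag.FlagSet and fail early. The options struct and parseOptions function are hypothetical names, and the argument values in the example are placeholders.

```go
package main

import (
    "errors"
    "flag"
    "fmt"
)

// options holds the parsed CLI arguments (hypothetical type).
type options struct {
    token        string
    workspaceID  string
    address      string
    stateVersion string
}

// parseOptions parses args and rejects missing required values.
func parseOptions(args []string) (*options, error) {
    fs := flag.NewFlagSet("rollback", flag.ContinueOnError)

    var o options
    fs.StringVar(&o.token, "token", "", "Terraform Token")
    fs.StringVar(&o.workspaceID, "workspace-id", "", "Workspace ID")
    fs.StringVar(&o.address, "address", "https://app.terraform.io", "Address of TFE host, defaults to TFC.")
    fs.StringVar(&o.stateVersion, "state-version", "", "Version of state to rollback to")

    if err := fs.Parse(args); err != nil {
        return nil, err
    }
    if o.token == "" || o.workspaceID == "" || o.stateVersion == "" {
        return nil, errors.New("token, workspace-id and state-version are all required")
    }
    return &o, nil
}

func main() {
    // Placeholder values; a real run would pass os.Args[1:] instead.
    opts, err := parseOptions([]string{
        "-token", "example-token",
        "-workspace-id", "ws-example",
        "-state-version", "sv-example",
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(opts.address) // prints "https://app.terraform.io"
}
```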

This ends our journey for rolling back state to a particular version. Admittedly this is a fairly basic scenario with some basic code, but it does prove that it is possible to programmatically roll back our state file. Ideally HashiCorp would release an API for dealing with state, but at the time of posting they have not.

Brendan Thompson

Principal Cloud Engineer
