# How Terraform (sort of) works
This document is written for provider developers on TPG. It's both incomplete and inaccurate. However, it's a useful model for explaining interactions between Terraform Core and the Terraform Plugin SDK. In particular, the Resource Instance Change Lifecycle document is a great summary of how the Terraform binary ("Core") understands interactions (most of the state terms here are drawn from there), but it doesn't map well to the SDK's provider framework that we use.

This model will freely cross between Core and the SDK: the goal is to map what users see to what developers write, not to explain the protocol, Core, or the SDK accurately.
Terraform users will generally use five commands to interact with Terraform: `terraform apply`, `terraform plan`, `terraform import`, `terraform refresh`, and `terraform destroy`. In addition, they'll have a statefile in their current directory (`terraform.tfstate`, generally) and 1+ config files like `main.tf`. Prior to any command, Terraform will perform a validation step where it calls `ValidateFunc`s directly on the raw values written in a user's config. If values are "unknown", such as values drawn from an interpolation, they're not validated.
Each of those 5 user-facing commands is roughly made up of some combination of two actions: "refresh" and "apply". A refresh is when Terraform finds the current state of a resource from the API and keeps track of it as the "prior state". It knows what resource and region to use based on the statefile. For most providers it only needs the `id`, which will contain a unique identifier; however, in TPG we tend to draw directly from fields like `project` and `name`. An apply is when Terraform performs the appropriate actions to bring the resource from its prior state to a new state, the "planned new state". The real state at the end of an apply is called the "new state".
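To make the refresh half concrete, here's a rough sketch of what a resource's Read function could look like in a TPG-style provider. The resource, field names, and API client are all invented for this example; real provider code is considerably more involved:

```go
package example

import (
	"fmt"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// fakeInstance and fakeAPI stand in for a real API client; both are
// invented for this sketch.
type fakeInstance struct {
	Name, Project, MachineType string
}

type fakeAPI struct{}

func (c *fakeAPI) GetInstance(project, name string) (*fakeInstance, error) {
	// A real client would call the cloud API here.
	return &fakeInstance{Name: name, Project: project, MachineType: "n1-standard-1"}, nil
}

// resourceInstanceRead is the "refresh" for a single resource: it reads the
// identifying fields out of state, asks the API for the current object, and
// writes what it finds back into state as the prior state.
func resourceInstanceRead(d *schema.ResourceData, meta interface{}) error {
	client := meta.(*fakeAPI)

	// TPG-style: identify the object from fields like project and name
	// rather than parsing everything out of d.Id().
	project := d.Get("project").(string)
	name := d.Get("name").(string)

	instance, err := client.GetInstance(project, name)
	if err != nil {
		return fmt.Errorf("reading instance %s/%s: %w", project, name, err)
	}

	d.Set("machine_type", instance.MachineType)
	return nil
}
```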
We can model those 5 commands with apply and refresh like the following:
- `terraform refresh` performs a refresh, writing the prior state into the statefile, replacing the old contents
- `terraform import` writes the supplied id into the statefile and then performs a refresh, writing the prior state into the statefile
- `terraform plan` performs a refresh, and compares the prior state to the "proposed new state" to create the planned new state. It displays the difference between the prior state and the planned new state to the user
- `terraform apply` implicitly performs `terraform plan`. If the user approves the change, it performs an apply.
- `terraform destroy` is a convenient way to call `terraform apply` where the "proposed new state" is empty (indicating that the resource should be deleted)
It's clear how Terraform uses the statefile: it's roughly a serialized state. However, the user's config isn't consumed directly. Instead, Terraform uses it to build the proposed new state. To do so, Terraform copies config values directly into the proposed state and copies `Computed` values from the prior state (when not present there, they get the special value "unknown"). Any other values are assumed to have the corresponding zero value for their type (`""`, `0`, `false`, etc.). `Optional`+`Computed` values are treated like a `Computed` value when unset, and a normal value when set.
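As a sketch, those three behaviours map onto schema flags roughly like this (the field names are invented for this example):

```go
package example

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

var exampleFields = map[string]*schema.Schema{
	// Optional: the config value is copied into the proposed new state;
	// when unset it becomes the zero value for the type ("" here).
	"description": {
		Type:     schema.TypeString,
		Optional: true,
	},
	// Computed (output-only): the value is copied from the prior state,
	// or "unknown" when the prior state doesn't have one yet.
	"self_link": {
		Type:     schema.TypeString,
		Computed: true,
	},
	// Optional+Computed: behaves like Computed when unset in config, and
	// like a normal Optional field when set.
	"region": {
		Type:     schema.TypeString,
		Optional: true,
		Computed: true,
	},
}
```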
It's also worth noting that any time Terraform creates a state, it will run `StateFunc`s on each field that has one, allowing the state to be modified. They let us modify values in the state, but they only have access to that single field's value. In practice we don't use them much.
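For illustration, a minimal `StateFunc` might just canonicalise the stored string; the field name here is invented:

```go
package example

import (
	"strings"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// A StateFunc sees only the single field's value and returns the string
// that will actually be stored in state. This one canonicalises case.
var zoneSchema = &schema.Schema{
	Type:     schema.TypeString,
	Optional: true,
	StateFunc: func(v interface{}) string {
		return strings.ToLower(v.(string))
	},
}
```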
During Apply, Terraform assumes that the planned new state and new state are identical by default. The `ResourceData` `d` has a few different meanings depending on the CRUD method.

- In Create, `d.Get` draws from the planned new state and `d.Set` sets the new state
- In Delete, `d.Get` draws from the prior state
- In Update, `d.Get` draws from the planned new state, `d.GetChange` from (prior state, planned new state). `d.Set` sets the new state
Otherwise, Apply isn't very exciting: Terraform calls the appropriate provider methods.
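As a sketch of what that means in practice, here are skeletal Create and Update functions in the classic (non-context) SDK style, with invented field names and a faked API call:

```go
package example

import (
	"fmt"
	"log"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// In Create, d.Get draws from the planned new state, while d.SetId and
// d.Set write into the new state. The field names and the "API call" are
// invented for this sketch.
func resourceInstanceCreate(d *schema.ResourceData, meta interface{}) error {
	project := d.Get("project").(string)
	name := d.Get("name").(string)

	// A real provider would call the API here; we fake the resulting ID.
	id := fmt.Sprintf("projects/%s/instances/%s", project, name)

	d.SetId(id)
	d.Set("self_link", "https://example.com/"+id)
	return nil
}

// In Update, d.GetChange exposes (prior state, planned new state) for a
// field, and d.HasChange reports whether they differ.
func resourceInstanceUpdate(d *schema.ResourceData, meta interface{}) error {
	if d.HasChange("machine_type") {
		oldType, newType := d.GetChange("machine_type")
		log.Printf("[DEBUG] machine_type changing from %v to %v", oldType, newType)
		// A real provider would issue the update API call here.
	}
	return nil
}
```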
During `terraform plan`, the planned new state is created and then diffed against the prior state to show a diff to the user. During an apply, the planned new state is the desired state for the resource to reach.

During `terraform plan`, the planned new state is created by:

- Filling in unset values in the proposed new state that have a `Default` with that value
- Running `CustomizeDiff`
- Running `DiffSuppressFunc`s (DSFs)
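Of those, `Default` is the simplest: it's just a flag on the field's schema. A minimal sketch, with an invented field:

```go
package example

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// When "port" is unset in config, the planned new state gets 80 filled in
// during plan; the field name and value are invented for this example.
var portWithDefault = &schema.Schema{
	Type:     schema.TypeInt,
	Optional: true,
	Default:  80,
}
```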
DSFs were added to the provider SDK before `CustomizeDiff`. They're very constrained, and can only see the value of the current field (or subfields, if that field is a block). They can return `true` to indicate that both values for the field are identical. If they are, Terraform discards the proposed new state's value and replaces it with the prior state's value.
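Here's a sketch of a DSF that treats a field as case-insensitive; the field name is invented, but the signature is the SDK's:

```go
package example

import (
	"strings"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// A DiffSuppressFunc sees the prior state value ("old") and the proposed
// value ("new") for a single field. Returning true tells Terraform the two
// are equivalent, so the prior state's value is kept.
func caseInsensitiveDiffSuppress(k, old, new string, d *schema.ResourceData) bool {
	return strings.EqualFold(old, new)
}

var typeSchema = &schema.Schema{
	Type:             schema.TypeString,
	Optional:         true,
	DiffSuppressFunc: caseInsensitiveDiffSuppress,
}
```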
`CustomizeDiff` is much more flexible. The `ResourceDiff` `diff` is available, which is roughly a superset of `d`. In addition to `d`'s ability to read a whole resource state, `diff` can modify the planned new state, return errors, and clear the diffs on fields like a DSF. In theory, `ValidateFunc`s, `DiffSuppressFunc`s, and `Default` could all be implemented in terms of `CustomizeDiff`.
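Here's a sketch of a `CustomizeDiff` that does a cross-field validation and fills in a conditional default. The field names and the default value are invented, and the signature shown is the SDK v2 one (older SDK versions don't take a context):

```go
package example

import (
	"context"
	"fmt"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// A CustomizeDiff can read (and modify) the whole planned new state.
func exampleCustomizeDiff(ctx context.Context, diff *schema.ResourceDiff, meta interface{}) error {
	// Read across fields, which a ValidateFunc can't do.
	if diff.Get("min_size").(int) > diff.Get("max_size").(int) {
		return fmt.Errorf("min_size must not be greater than max_size")
	}

	// Rewrite the planned new state for a field, here as a conditional default.
	if diff.Get("region").(string) == "" {
		if err := diff.SetNew("region", "us-central1"); err != nil {
			return err
		}
	}
	return nil
}
```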
As highlighted above, you've got a number of tools to modify a user's config and make it useful. To summarise them again, they're:

- `ValidateFunc`s allow you to reject configs based on a single field being invalid
- `Default` values fill in unset values during `terraform plan` diffs and `terraform apply`
- `StateFunc`s let you canonicalise values when states are created (but TPG doesn't use them much)
- `DiffSuppressFunc`s let you tell Terraform to keep the value from the prior state if it's identical to the one in config
- `Optional`+`Computed` fields tell Terraform to handle them as if they're output-only when unset, and configurable when set
Finally, `CustomizeDiff` is incredibly powerful, effectively allowing the provider to perform arbitrary transformations. It's somewhat dangerous to use, as it's very easy to make a transformation that Terraform Core will reject. Because of that, the other, more focused tools should be preferred. These are some cases where `CustomizeDiff` can solve otherwise unsolvable problems:
- Conditionally setting fields as `ForceNew` to indicate the resource should be recreated. For example, allowing disks to size up but not down (a sketch follows below).
- Adding custom error messages.
  - For example, App Engine applications can't be moved once created. Instead of an erroneous `ForceNew`, TPG returns an error if a user attempts to move one.
- Complex validations. For example, asserting that one value must not be greater than another, or that if one field is set, another must be.
- Adding conditional defaults based on the value of another field.
- Rewriting the planned new state for a value. For example, reordering a list to match the prior state when possible.
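As an example of the first case, here's a sketch of a `CustomizeDiff` that lets a disk grow in place but forces recreation when it shrinks. The field name is invented; the SDK also ships a `customdiff` helper package with functions like `ForceNewIfChange` for this pattern:

```go
package example

import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Conditional ForceNew: growing is an in-place update, shrinking is not.
func diskSizeCustomizeDiff(ctx context.Context, diff *schema.ResourceDiff, meta interface{}) error {
	oldRaw, newRaw := diff.GetChange("size_gb")
	oldSize, newSize := oldRaw.(int), newRaw.(int)

	// Shrinking isn't possible via the (hypothetical) API, so mark the
	// field ForceNew to turn the planned update into destroy-and-recreate.
	if newSize < oldSize {
		if err := diff.ForceNew("size_gb"); err != nil {
			return err
		}
	}
	return nil
}
```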
The split between Core and the SDK is still somewhat new, and we're in the middle of growing pains. Terraform 0.12 included a major overhaul of Core according to their holistic view of how providers and the SDK should work, even when that view went against SDK convention or was impossible to fulfill. For example, the Resource Instance Change Lifecycle page lists many assertions that values stay the same between states which providers today do not fulfill. Today they're all warnings, but Core's assertions are intended to become errors in the future.
One particularly amusing example is `Default` values. As implemented by the SDK, they cause the following error: `.port: planned value cty.NumberIntVal(80) does not match config value cty.NullVal(cty.Number)`.
Another is `Optional` + `Computed`. It was never intended to work quite the way it does, especially when used with `TypeList` and `TypeSet`. However, there's no viable alternative.

For TPG, the issue "Fit our state-setting model to the protocol" is the largest divide between the providers / SDK (and the model presented here) and Core.