From 6b4ed241d36f0a51f12b57a71ed3cc7d0ed7bf19 Mon Sep 17 00:00:00 2001 From: Martin Atkins Date: Mon, 31 Aug 2020 18:43:19 -0700 Subject: [PATCH] docs: Provider protocol wire format for values Although we have in this same directory the protocol buffers schemas for the static parts of the provider wire protocol, many of the protocol messages include DynamicValue messages that are presented in a nested dynamic serialization format. That format was previously not documented, and was thus defined only by the implementation. Terraform happens to use a third-party library to implement parts of this encoding, which means that the rules were even harder to track down from reading the code. Regardless of how Terraform happens to implement its serialization and deserialization of DynamicValue, it's the wire format that is contractual and so this document is an implementation-agnostic description of the mapping rules for serializing any Terraform Language value by reference to a provider-defined schema. All future changes to the rules described in here must be backward compatible until protocol major version 6, which is not currently planned. --- docs/plugin-protocol/object-wire-format.md | 210 +++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 docs/plugin-protocol/object-wire-format.md diff --git a/docs/plugin-protocol/object-wire-format.md b/docs/plugin-protocol/object-wire-format.md new file mode 100644 index 000000000..5e1809c81 --- /dev/null +++ b/docs/plugin-protocol/object-wire-format.md @@ -0,0 +1,210 @@ +# Wire Format for Terraform Objects and Associated Values + +The provider wire protocol (as of major version 5) includes a protobuf message +type `DynamicValue` which Terraform uses to represent values from the Terraform +Language type system, which result from evaluating the content of `resource`, +`data`, and `provider` blocks, based on a schema defined by the corresponding +provider. + +Because the structure of these values is determined at runtime, `DynamicValue` +uses one of two possible dynamic serialization formats for the values +themselves: MessagePack or JSON. Terraform most commonly uses MessagePack, +because it offers a compact binary representation of a value. However, a server +implementation of the provider protocol should fall back to JSON if the +MessagePack field is not populated, in order to support both formats. + +The remainder of this document describes how Terraform translates from its own +type system into the type system of the two supported serialization formats. +A server implementation of the Terraform provider protocol can use this +information to decode `DynamicValue` values from incoming messages into +whatever representation is convenient for the provider implementation. + +A server implementation must also be able to _produce_ `DynamicValue` messages +as part of various response messages. When doing so, servers should always +use MessagePack encoding, because Terraform does not consistently support +JSON responses across all request types and all Terraform versions. + +Both the MessagePack and JSON serializations are driven by information the +provider previously returned in a `Schema` message. Terraform will encode each +value depending on the type constraint given for it in the corresponding schema, +using the closest possible MessagePack or JSON type to the Terraform language +type. Therefore a server implementation can decode a serialized value using a +standard MessagePack or JSON library and assume it will conform to the +serialization rules described below. + +## MessagePack Serialization Rules + +The MessagePack types referenced in this section are those defined in +[The MessagePack type system specification](https://github.com/msgpack/msgpack/blob/master/spec.md#type-system). + +Note that MessagePack defines several possible serialization formats for each +type, and Terraform may choose any of the formats of a specified type. +The exact serialization chosen for a given value may vary between Terraform +versions, but the types given here are contractual. + +Conversely, server implementations that are _producing_ MessagePack-encoded +values are free to use any of the valid serialization formats for a particular +type. However, we recommend choosing the most compact format that can represent +the value without a loss of range. + +### `Schema.Block` Mapping Rules for MessagePack + +To represent the content of a block as MessagePack, Terraform constructs a +MessagePack map that contains one key-value pair per attribute and one +key-value pair per distinct nested block described in the `Schema.Block` message. + +The key-value pairs representing attributes have values based on +[the `Schema.Attribute` mapping rules](#Schema.Attribute-mapping-rules-for-messagepack). +The key-value pairs representing nested block types have values based on +[the `Schema.NestedBlock` mapping rules](#Schema.NestedBlock-mapping-rules-for-messagepack). + +### `Schema.Attribute` Mapping Rules for MessagePack + +The MessagePack serialization of an attribute value depends on the value of the +`type` field of the corresponding `Schema.Attribute` message. The `type` field is +a compact JSON serialization of a +[Terraform type constraint](https://www.terraform.io/docs/configuration/types.html), +which consists either of a single +string value (for primitive types) or a two-element array giving a type kind +and a type argument. + +The following table describes the type-specific mapping rules. Along with those +type-specific rules there are two special rules that override the mappings +in the table below, regardless of type: + +* A null value is represented as a MessagePack nil value. +* An unknown value (that is, a placeholder for a value that will be decided + only during the apply operation) is represented as a + [MessagePack extension](https://github.com/msgpack/msgpack/blob/master/spec.md#extension-types) + value whose type identifier is zero and whose value is unspecified and + meaningless. + +| `type` Pattern | MessagePack Representation | +|---|---| +| `"string"` | A MessagePack string containing the Unicode characters from the string value serialized as normalized UTF-8. | +| `"number"` | Either MessagePack integer, MessagePack float, or MessagePack string representing the number. If a number is represented as a string then the string contains a decimal representation of the number which may have a larger mantissa than can be represented by a 64-bit float. | +| `"bool"` | A MessagePack boolean value corresponding to the value. | +| `["list",T]` | A MessagePack array with the same number of elements as the list value, each of which is represented by the result of applying these same mapping rules to the nested type `T`. | +| `["set",T]` | Identical in representation to `["list",T]`, but the order of elements is undefined because Terraform sets are unordered. | +| `["map",T]` | A MessagePack map with one key-value pair per element of the map value, where the element key is serialized as the map key (always a MessagePack string) and the element value is represented by a value constructed by applying these same mapping rules to the nested type `T`. | +| `["object",ATTRS]` | A MessagePack map with one key-value pair per attribute defined in the `ATTRS` object. The attribute name is serialized as the map key (always a MessagePack string) and the attribute value is represented by a value constructed by applying these same mapping rules to each attribute's own type. | +| `["tuple",TYPES]` | A MessagePack array with one element per element described by the `TYPES` array. The element values are constructed by applying these same mapping rules to the corresponding element of `TYPES`. | +| `"dynamic"` | A MessagePack array with exactly two elements. The first element is a MessagePack binary value containing a JSON-serialized type constraint in the same format described in this table. The second element is the result of applying these same mapping rules to the value with the type given in the first element. This special type constraint represents values whose types will be decided only at runtime. | + +### `Schema.NestedBlock` Mapping Rules for MessagePack + +The MessagePack serialization of a collection of blocks of a particular type +depends on the `nesting` field of the corresponding `Schema.NestedBlock` message. +The `nesting` field is a value from the `Schema.NestingBlock.NestingMode` +enumeration. + +All `nesting` values cause the individual blocks of a type to be represented +by applying +[the `Schema.Block` mapping rules](#Schema.Block-mapping-rules-for-messagepack) +to the block's contents based on the `block` field, producing what we'll call +a _block value_ in the table below. + +The `nesting` value then in turn defines how Terraform will collect all of the +individual block values together to produce a single property value representing +the nested block type. For all `nesting` values other than `MAP`, blocks may +not have any labels. For the `nesting` value `MAP`, blocks must have exactly +one label, which is a string we'll call a _block label_ in the table below. + +| `nesting` Value | MessagePack Representation | +|---|---| +| `SINGLE` | The block value of the single block of this type, or nil if there is no block of that type. | +| `LIST` | A MessagePack array of all of the block values, preserving the order of definition of the blocks in the configuration. | +| `SET` | A MessagePack array of all of the block values in no particular order. | +| `MAP` | A MessagePack map with one key-value pair per block value, where the key is the block label and the value is the block value. | +| `GROUP` | The same as with `SINGLE`, except that if there is no block of that type Terraform will synthesize a block value by pretending that all of the declared attributes are null and that there are zero blocks of each declared block type. | + +For the `LIST` and `SET` nesting modes, Terraform guarantees that the +MessagePack array will have a number of elements between the `min_items` and +`max_items` values given in the schema, _unless_ any of the block values contain +nested unknown values. When unknown values are present, Terraform considers +the value to be potentially incomplete and so Terraform defers validation of +the number of blocks. For example, if the configuration includes a `dynamic` +block whose `for_each` argument is unknown then the final number of blocks is +not predictable until the apply phase. + +## JSON Serialization Rules + +The JSON serialization is a secondary representation for `DynamicValue`, with +MessagePack preferred due to its ability to represent unknown values via an +extension. + +The JSON encoding described in this section is also used for the `json` field +of the `RawValue` message that forms part of an `UpgradeResourceState` request. +However, in that case the data is serialized per the schema of the provider +version that created it, which won't necessarily match the schema of the +_current_ version of that provider. + +### `Schema.Block` Mapping Rules for JSON + +To represent the content of a block as JSON, Terraform constructs a +JSON object that contains one property per attribute and one property per +distinct nested block described in the `Schema.Block` message. + +The properties representing attributes have property values based on +[the `Schema.Attribute` mapping rules](#Schema.Attribute-mapping-rules-for-json). +The properties representing nested block types have property values based on +[the `Schema.NestedBlock` mapping rules](#Schema.NestedBlock-mapping-rules-for-json). + +### `Schema.Attribute` Mapping Rules for JSON + +The JSON serialization of an attribute value depends on the value of the `type` +field of the corresponding `Schema.Attribute` message. The `type` field is +a compact JSON serialization of a +[Terraform type constraint](https://www.terraform.io/docs/configuration/types.html), +which consists either of a single +string value (for primitive types) or a two-element array giving a type kind +and a type argument. + +The following table describes the type-specific mapping rules. Along with those +type-specific rules there is one special rule that overrides the rules in the +table regardless of type: + +* A null value is always represented as JSON `null`. + +| `type` Pattern | JSON Representation | +|---|---| +| `"string"` | A JSON string containing the Unicode characters from the string value. | +| `"number"` | A JSON number representing the number value. Terraform numbers are arbitrary-precision floating point, so the value may have a larger mantissa than can be represented by a 64-bit float. | +| `"bool"` | Either JSON `true` or JSON `false`, depending on the boolean value. | +| `["list",T]` | A JSON array with the same number of elements as the list value, each of which is represented by the result of applying these same mapping rules to the nested type `T`. | +| `["set",T]` | Identical in representation to `["list",T]`, but the order of elements is undefined because Terraform sets are unordered. | +| `["map",T]` | A JSON object with one property per element of the map value, where the element key is serialized as the property name string and the element value is represented by a property value constructed by applying these same mapping rules to the nested type `T`. | +| `["object",ATTRS]` | A JSON object with one property per attribute defined in the `ATTRS` object. The attribute name is serialized as the property name string and the attribute value is represented by a property value constructed by applying these same mapping rules to each attribute's own type. | +| `["tuple",TYPES]` | A JSON array with one element per element described by the `TYPES` array. The element values are constructed by applying these same mapping rules to the corresponding element of `TYPES`. | +| `"dynamic"` | A JSON object with two properties: `"type"` specifying one of the `type` patterns described in this table in-band, giving the exact runtime type of the value, and `"value"` specifying the result of applying these same mapping rules to the table for the specified runtime type. This special type constraint represents values whose types will be decided only at runtime. | + +### `Schema.NestedBlock` Mapping Rules for JSON + +The JSON serialization of a collection of blocks of a particular type depends +on the `nesting` field of the corresponding `Schema.NestedBlock` message. +The `nesting` field is a value from the `Schema.NestingBlock.NestingMode` +enumeration. + +All `nesting` values cause the individual blocks of a type to be represented +by applying +[the `Schema.Block` mapping rules](#Schema.Block-mapping-rules-for-json) +to the block's contents based on the `block` field, producing what we'll call +a _block value_ in the table below. + +The `nesting` value then in turn defines how Terraform will collect all of the +individual block values together to produce a single property value representing +the nested block type. For all `nesting` values other than `MAP`, blocks may +not have any labels. For the `nesting` value `MAP`, blocks must have exactly +one label, which is a string we'll call a _block label_ in the table below. + +| `nesting` Value | JSON Representation | +|---|---| +| `SINGLE` | The block value of the single block of this type, or `null` if there is no block of that type. | +| `LIST` | A JSON array of all of the block values, preserving the order of definition of the blocks in the configuration. | +| `SET` | A JSON array of all of the block values in no particular order. | +| `MAP` | A JSON object with one property per block value, where the property name is the block label and the value is the block value. | +| `GROUP` | The same as with `SINGLE`, except that if there is no block of that type Terraform will synthesize a block value by pretending that all of the declared attributes are null and that there are zero blocks of each declared block type. | + +For the `LIST` and `SET` nesting modes, Terraform guarantees that the JSON +array will have a number of elements between the `min_items` and `max_items` +values given in the schema.