terraform/helper/schema/field_writer_map.go

package schema

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
	"sync"

	"github.com/mitchellh/mapstructure"
)

// MapFieldWriter writes data into a single map[string]string structure.
type MapFieldWriter struct {
	Schema map[string]*Schema

	lock   sync.Mutex
	result map[string]string
}

// Map returns the underlying map that is being written to.
func (w *MapFieldWriter) Map() map[string]string {
	w.lock.Lock()
	defer w.lock.Unlock()
	if w.result == nil {
		w.result = make(map[string]string)
	}

	return w.result
}

// unsafeWriteField writes a value directly into the result map, bypassing
// schema validation and address checking. Callers must supply the final
// flattened key themselves.
func (w *MapFieldWriter) unsafeWriteField(addr string, value string) {
	w.lock.Lock()
	defer w.lock.Unlock()
	if w.result == nil {
		w.result = make(map[string]string)
	}

	w.result[addr] = value
}

// clearTree clears a field and any sub-fields of the given address out of the
// map. This should be used to reset complex structures (namely sets) before
// writing, to make sure that any conflicting data is removed (for example, if
// the set was previously written to this writer's layer).
func (w *MapFieldWriter) clearTree(addr []string) {
	prefix := strings.Join(addr, ".") + "."
	for k := range w.result {
		if strings.HasPrefix(k, prefix) {
			delete(w.result, k)
		}
	}
}
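
// exampleClearTree is a hedged illustration, not part of the original file,
// reusing the hash codes from the commit history: after a first write of a
// hypothetical set attribute "ports" whose elements hashed to 10 and 20,
// clearTree removes every "ports."-prefixed key (including the count key),
// so a second write with differently hashed elements cannot leave stale
// entries behind.
func exampleClearTree() map[string]string {
	w := &MapFieldWriter{result: map[string]string{
		"ports.#":  "2",
		"ports.10": "80",
		"ports.20": "443",
		"other":    "kept",
	}}
	w.clearTree([]string{"ports"})
	return w.result // map[string]string{"other": "kept"}
}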

func (w *MapFieldWriter) WriteField(addr []string, value interface{}) error {
	w.lock.Lock()
	defer w.lock.Unlock()
	if w.result == nil {
		w.result = make(map[string]string)
	}

	schemaList := addrToSchema(addr, w.Schema)
	if len(schemaList) == 0 {
		return fmt.Errorf("Invalid address to set: %#v", addr)
	}

	// If we're setting anything other than a list, map, or set root,
	// then disallow it.
	for _, schema := range schemaList[:len(schemaList)-1] {
		if schema.Type == TypeList {
			return fmt.Errorf(
				"%s: can only set full list",
				strings.Join(addr, "."))
		}

		if schema.Type == TypeMap {
			return fmt.Errorf(
				"%s: can only set full map",
				strings.Join(addr, "."))
		}

		if schema.Type == TypeSet {
			return fmt.Errorf(
				"%s: can only set full set",
				strings.Join(addr, "."))
		}
	}

	return w.set(addr, value)
}
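
// exampleWriteFieldRestriction is a hedged sketch, not part of the original
// file, using a hypothetical "tags" attribute: WriteField rejects any
// address that descends into a list, map, or set, so callers must always
// write the full container in one call.
func exampleWriteFieldRestriction() error {
	w := &MapFieldWriter{Schema: map[string]*Schema{
		"tags": {Type: TypeMap},
	}}
	if err := w.WriteField([]string{"tags"}, map[string]interface{}{"a": "b"}); err != nil {
		return err // writing the container root is allowed
	}
	// Writing a single element is rejected: "tags.a: can only set full map".
	return w.WriteField([]string{"tags", "a"}, "c")
}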

func (w *MapFieldWriter) set(addr []string, value interface{}) error {
	schemaList := addrToSchema(addr, w.Schema)
	if len(schemaList) == 0 {
		return fmt.Errorf("Invalid address to set: %#v", addr)
	}

	schema := schemaList[len(schemaList)-1]
	switch schema.Type {
	case TypeBool, TypeInt, TypeFloat, TypeString:
		return w.setPrimitive(addr, value, schema)
	case TypeList:
		return w.setList(addr, value, schema)
	case TypeMap:
		return w.setMap(addr, value, schema)
	case TypeSet:
		return w.setSet(addr, value, schema)
	case typeObject:
		return w.setObject(addr, value, schema)
	default:
		panic(fmt.Sprintf("Unknown type: %#v", schema.Type))
	}
}

func (w *MapFieldWriter) setList(
	addr []string,
	v interface{},
	schema *Schema) error {
	k := strings.Join(addr, ".")
	setElement := func(idx string, value interface{}) error {
		addrCopy := make([]string, len(addr), len(addr)+1)
		copy(addrCopy, addr)
		return w.set(append(addrCopy, idx), value)
	}

	var vs []interface{}
	if err := mapstructure.Decode(v, &vs); err != nil {
		return fmt.Errorf("%s: %s", k, err)
	}

	// Wipe the list from the current writer prior to writing if it exists.
	// Multiple writes to the same layer are a lot safer for lists than sets,
	// since indexes are always deterministic and the length is updated with
	// the current length on the last write, but starting from a clean
	// namespace removes any chance for edge cases to pop up and ensures that
	// the last write is the correct value.
	w.clearTree(addr)

	// Set the entire list.
	var err error
	for i, elem := range vs {
		is := strconv.FormatInt(int64(i), 10)
		err = setElement(is, elem)
		if err != nil {
			break
		}
	}
	if err != nil {
		for i := range vs {
			is := strconv.FormatInt(int64(i), 10)
			setElement(is, nil)
		}

		return err
	}

	w.result[k+".#"] = strconv.FormatInt(int64(len(vs)), 10)
	return nil
}
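
// exampleListEncoding is a hedged illustration, not part of the original
// file: it shows the flat encoding setList produces for a hypothetical
// attribute "listname" holding ["a", "b", "c"], with one key per index plus
// a ".#" count key. A shorter second write both clears the old index keys
// (via clearTree) and lowers the count, so stale trailing entries are never
// read back.
func exampleListEncoding() map[string]string {
	return map[string]string{
		"listname.#": "3",
		"listname.0": "a",
		"listname.1": "b",
		"listname.2": "c",
	}
}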

func (w *MapFieldWriter) setMap(
	addr []string,
	value interface{},
	schema *Schema) error {
	k := strings.Join(addr, ".")
	v := reflect.ValueOf(value)
	vs := make(map[string]interface{})

	if value == nil {
		// The empty string here means the map is removed.
		w.result[k] = ""
		return nil
	}

	if v.Kind() != reflect.Map {
		return fmt.Errorf("%s: must be a map", k)
	}
	if v.Type().Key().Kind() != reflect.String {
		return fmt.Errorf("%s: keys must be strings", k)
	}
	for _, mk := range v.MapKeys() {
		mv := v.MapIndex(mk)
		vs[mk.String()] = mv.Interface()
	}

	// Wipe this address tree. The contents of the map should always reflect
	// the last write made to it.
	w.clearTree(addr)

	// Remove the pure key since we're setting the full map value.
	delete(w.result, k)

	// Set each subkey.
	addrCopy := make([]string, len(addr), len(addr)+1)
	copy(addrCopy, addr)
	for subKey, v := range vs {
		if err := w.set(append(addrCopy, subKey), v); err != nil {
			return err
		}
	}

	// Set the count. Maps use ".%" rather than ".#" as the count key so that
	// an empty map remains distinguishable from an empty list in the flatmap.
	w.result[k+".%"] = strconv.Itoa(len(vs))

	return nil
}
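
// exampleMapEncoding is a hedged illustration, not part of the original
// file: setMap writes one key per entry for a hypothetical attribute
// "mapname", plus a ".%" count key that keeps an empty map distinguishable
// from an empty list (whose count key is ".#").
func exampleMapEncoding() map[string]string {
	return map[string]string{
		"mapname.%":    "2",
		"mapname.key1": "value1",
		"mapname.key2": "value2",
	}
}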

func (w *MapFieldWriter) setObject(
	addr []string,
	value interface{},
	schema *Schema) error {
	// Set the entire object. First decode into a proper structure.
	var v map[string]interface{}
	if err := mapstructure.Decode(value, &v); err != nil {
		return fmt.Errorf("%s: %s", strings.Join(addr, "."), err)
	}

	// Make space for additional elements in the address.
	addrCopy := make([]string, len(addr), len(addr)+1)
	copy(addrCopy, addr)

	// Set each element in turn.
	var err error
	for k1, v1 := range v {
		if err = w.set(append(addrCopy, k1), v1); err != nil {
			break
		}
	}
	if err != nil {
		for k1 := range v {
			w.set(append(addrCopy, k1), nil)
		}
	}

	return err
}

func (w *MapFieldWriter) setPrimitive(
	addr []string,
	v interface{},
	schema *Schema) error {
	k := strings.Join(addr, ".")

	if v == nil {
		// The empty string here means the value is removed.
		w.result[k] = ""
		return nil
	}

	var set string
	switch schema.Type {
	case TypeBool:
		var b bool
		if err := mapstructure.Decode(v, &b); err != nil {
			return fmt.Errorf("%s: %s", k, err)
		}

		set = strconv.FormatBool(b)
	case TypeString:
		if err := mapstructure.Decode(v, &set); err != nil {
			return fmt.Errorf("%s: %s", k, err)
		}
	case TypeInt:
		var n int
		if err := mapstructure.Decode(v, &n); err != nil {
			return fmt.Errorf("%s: %s", k, err)
		}
		set = strconv.FormatInt(int64(n), 10)
	case TypeFloat:
		var n float64
		if err := mapstructure.Decode(v, &n); err != nil {
			return fmt.Errorf("%s: %s", k, err)
		}
		set = strconv.FormatFloat(n, 'G', -1, 64)
	default:
		return fmt.Errorf("Unknown type: %#v", schema.Type)
	}

	w.result[k] = set
	return nil
}
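
// examplePrimitiveEncodings is a hedged sketch, not part of the original
// file: it shows the string forms setPrimitive writes for each primitive
// type, using the same strconv calls as the function above. A nil value is
// always written as "" to mark removal.
func examplePrimitiveEncodings() []string {
	return []string{
		strconv.FormatBool(true),              // TypeBool:   "true"
		strconv.FormatInt(int64(42), 10),      // TypeInt:    "42"
		strconv.FormatFloat(1.5, 'G', -1, 64), // TypeFloat:  "1.5"
		"x",                                   // TypeString: stored verbatim
	}
}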

func (w *MapFieldWriter) setSet(
	addr []string,
	value interface{},
	schema *Schema) error {
	addrCopy := make([]string, len(addr), len(addr)+1)
	copy(addrCopy, addr)
	k := strings.Join(addr, ".")

	if value == nil {
		w.result[k+".#"] = "0"
		return nil
	}

	// If it is a slice, then we have to turn it into a *Set so that
	// we get the proper order back based on the hash code.
	if v := reflect.ValueOf(value); v.Kind() == reflect.Slice {
		// Build a temp *ResourceData to use for the conversion.
		tempSchema := *schema
		tempSchema.Type = TypeList
		tempSchemaMap := map[string]*Schema{addr[0]: &tempSchema}
		tempW := &MapFieldWriter{Schema: tempSchemaMap}

		// Set the entire list; this lets us get sane values out of it.
		if err := tempW.WriteField(addr, value); err != nil {
			return err
		}

		// Build the set by going over the list items in order and
		// hashing them into the set. The reason we go over the list and
		// not the `value` directly is because this forces all types
		// to become []interface{} (generic) instead of []string, which
		// most hash functions are expecting.
		s := schema.ZeroValue().(*Set)
		tempR := &MapFieldReader{
			Map:    BasicMapReader(tempW.Map()),
			Schema: tempSchemaMap,
		}
		for i := 0; i < v.Len(); i++ {
			is := strconv.FormatInt(int64(i), 10)
			result, err := tempR.ReadField(append(addrCopy, is))
			if err != nil {
				return err
			}
			if !result.Exists {
				panic("set item just set doesn't exist")
			}

			s.Add(result.Value)
		}

		value = s
	}

	// Clear any keys that match the set address first. This is necessary
	// because it's always possible, and sometimes necessary, to write to a
	// given writer layer more than once with different set data each time.
	// Each write inserts differently hashed keys, which leads to determinism
	// problems when the old data isn't wiped first.
	w.clearTree(addr)

	for code, elem := range value.(*Set).m {
		if err := w.set(append(addrCopy, code), elem); err != nil {
			return err
		}
	}

	w.result[k+".#"] = strconv.Itoa(value.(*Set).Len())
	return nil
}
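
// exampleSetEncoding is a hedged illustration, not part of the original
// file, reusing the fictional hash codes from the commit history: setSet
// keys each element of a hypothetical attribute "setname" by its hash code,
// so two writes with different elements produce entirely different keys;
// this is why clearTree must run before each rewrite.
func exampleSetEncoding() map[string]string {
	return map[string]string{
		"setname.#":        "2",
		"setname.12312512": "value1",
		"setname.56345233": "value2",
	}
}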