terraform/configs/configload/getter.go

154 lines
4.8 KiB
Go
Raw Normal View History

package configload
import (
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
"fmt"
"log"
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
"os"
"path/filepath"
cleanhttp "github.com/hashicorp/go-cleanhttp"
getter "github.com/hashicorp/go-getter"
)
// We configure our own go-getter detector and getter sets here, because
// the set of sources we support is part of Terraform's documentation and
// so we don't want any new sources introduced in go-getter to sneak in here
// and work even though they aren't documented. This also insulates us from
// any meddling that might be done by other go-getter callers linked into our
// executable.
var goGetterDetectors = []getter.Detector{
new(getter.GitHubDetector),
new(getter.GitDetector),
new(getter.BitBucketDetector),
2019-04-18 19:47:13 +02:00
new(getter.GCSDetector),
new(getter.S3Detector),
new(getter.FileDetector),
}
var goGetterNoDetectors = []getter.Detector{}
var goGetterDecompressors = map[string]getter.Decompressor{
"bz2": new(getter.Bzip2Decompressor),
"gz": new(getter.GzipDecompressor),
"xz": new(getter.XzDecompressor),
"zip": new(getter.ZipDecompressor),
"tar.bz2": new(getter.TarBzip2Decompressor),
"tar.tbz2": new(getter.TarBzip2Decompressor),
"tar.gz": new(getter.TarGzipDecompressor),
"tgz": new(getter.TarGzipDecompressor),
"tar.xz": new(getter.TarXzDecompressor),
"txz": new(getter.TarXzDecompressor),
}
var goGetterGetters = map[string]getter.Getter{
"file": new(getter.FileGetter),
2019-04-18 19:47:13 +02:00
"gcs": new(getter.GCSGetter),
"git": new(getter.GitGetter),
"hg": new(getter.HgGetter),
"s3": new(getter.S3Getter),
"http": getterHTTPGetter,
"https": getterHTTPGetter,
}
var getterHTTPClient = cleanhttp.DefaultClient()
var getterHTTPGetter = &getter.HttpGetter{
Client: getterHTTPClient,
Netrc: true,
}
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
// A reusingGetter is a helper for the module installer that remembers
// the final resolved addresses of all of the sources it has already been
// asked to install, and will copy from a prior installation directory if
// it has the same resolved source address.
//
// The keys in a reusingGetter are resolved and trimmed source addresses
// (with a scheme always present, and without any "subdir" component),
// and the values are the paths where each source was previously installed.
type reusingGetter map[string]string
// getWithGoGetter retrieves the package referenced in the given address
// into the installation path and then returns the full path to any subdir
// indicated in the address.
//
// The errors returned by this function are those surfaced by the underlying
// go-getter library, which have very inconsistent quality as
// end-user-actionable error messages. At this time we do not have any
// reasonable way to improve these error messages at this layer because
// the underlying errors are not separatelyr recognizable.
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
func (g reusingGetter) getWithGoGetter(instPath, addr string) (string, error) {
packageAddr, subDir := splitAddrSubdir(addr)
log.Printf("[DEBUG] will download %q to %s", packageAddr, instPath)
realAddr, err := getter.Detect(packageAddr, instPath, goGetterDetectors)
if err != nil {
return "", err
}
var realSubDir string
realAddr, realSubDir = splitAddrSubdir(realAddr)
if realSubDir != "" {
subDir = filepath.Join(realSubDir, subDir)
}
if realAddr != packageAddr {
log.Printf("[TRACE] go-getter detectors rewrote %q to %q", packageAddr, realAddr)
}
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
if prevDir, exists := g[realAddr]; exists {
log.Printf("[TRACE] copying previous install %s to %s", prevDir, instPath)
err := os.Mkdir(instPath, os.ModePerm)
if err != nil {
return "", fmt.Errorf("failed to create directory %s: %s", instPath, err)
}
err = copyDir(instPath, prevDir)
if err != nil {
return "", fmt.Errorf("failed to copy from %s to %s: %s", prevDir, instPath, err)
}
} else {
log.Printf("[TRACE] fetching %q to %q", realAddr, instPath)
client := getter.Client{
Src: realAddr,
Dst: instPath,
Pwd: instPath,
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
Mode: getter.ClientModeDir,
configload: Don't download the same module source multiple times It is common for the same module source package to be referenced multiple times in the same configuration, either because there are literally multiple instances of the same module source or because a single package (or repository) contains multiple modules in sub-directories and many of them are referenced. To optimize this, here we introduce a simple caching behavior where the module installer will detect if it's asked to install multiple times from the same source and produce the second and subsequent directories by copying the first, rather than by downloading again over the network. This optimization is applied once all of the go-getter detection has completed and sub-directory portions have been trimmed, so it is also able to normalize differently-specified source addresses that all ultimately detect to the same resolved address. When installing, we always extract the entire specified package (or repository) and then reference the specified sub-directory, so we can safely re-use existing directories when the base package is the same, even if the sub-directory is different. However, as a result we do not yet address the fact that the same package will be stored multiple times _on disk_, which may still be problematic when referencing large repositories multiple times in disk-storage-constrained environments. We could address this in a subsequent change by investigating the use of symlinks where possible. Since the Registry installer is implemented just as an extra resolution step in front of go-getter, this optimization applies to registry modules too. This does not apply to local relative references, which will continue to just resolve into the already-prepared directory of their parent module. The cache of previously installed paths lives only for the duration of one call to InstallModules, so we will never re-use directories that were created by previous runs of "terraform init" and there is no risk that older versions will pollute the cache when attempting an upgrade from a source address that doesn't explicitly specify a version. No additional tests are added here because the existing module installer tests (when TF_ACC=1) already cover the case of installing multiple modules from the same source.
2018-06-22 04:02:27 +02:00
Detectors: goGetterNoDetectors, // we already did detection above
Decompressors: goGetterDecompressors,
Getters: goGetterGetters,
}
err = client.Get()
if err != nil {
return "", err
}
// Remember where we installed this so we might reuse this directory
// on subsequent calls to avoid re-downloading.
g[realAddr] = instPath
}
// Our subDir string can contain wildcards until this point, so that
// e.g. a subDir of * can expand to one top-level directory in a .tar.gz
// archive. Now that we've expanded the archive successfully we must
// resolve that into a concrete path.
var finalDir string
if subDir != "" {
finalDir, err = getter.SubdirGlob(instPath, subDir)
log.Printf("[TRACE] expanded %q to %q", subDir, finalDir)
if err != nil {
return "", err
}
} else {
finalDir = instPath
}
// If we got this far then we have apparently succeeded in downloading
// the requested object!
return filepath.Clean(finalDir), nil
}