Cloud Native Immutable Infrastructure Principles

P1 - If infrastructure is immutable, it is easily reproduced [1],[2], consistent [3], disposable [4],[5], will have a repeatable [6] provisioning process, and will not have configuration or artifacts that are modifiable in place.

Many qualities characterize an immutable process. Reproducibility, consistency, disposability and repeatability are mandatory attributes in any well designed infrastructure process, including an immutable one.

P2 - If the provisioning of cloud native infrastructure is dynamic, it will use unattended code execution [7][8][9] with declarative configuration [10],[11],[12],[13],[14].

The gold standard for cloud infrastructure is for it be able to be provisioned without any assistance. The tools that provision that infrastructure should accept declarative configuration as inputs.

P3 - If a cloud native infrastructure element [15],[16] (compute, storage, or network) is provisioned [17],[18],[19],[20],[21],[22], that element is ready for use.

P4 - If a cloud native infrastructure element is provisioned, it will be provisioned immutably.

Once immutable infrastructure \(the orchestrator and all of the software and hardware that it depends on\) is provisioned, the infrastructure is not changed after it is made ready for use. New changes to the infrastructure are rolled out as new instances of infrastructure.

P5 - If an infrastructure element’s provisioning is immutable, its base configuration [23],[24],[25] will not be changed.

In the immutable change management model [38], immutable infrastructure elements are built from scratch (or from artifacts and configuration with a known state) as a new instance of the element. The artifacts of an immutable infrastructure are composed of scripts, binaries, containers, images, and server templates while the configuration is the declaration of what that infrastructure should look like after it is instantiated. If the infrastructure element is hardware, such as a physical layer 1 networking device, it should be ‘flashed’ (a complete replacement of its software) for its artifact updates. Conversely changes to virtual infrastructure should be treated the same way, with a new virtual instance being deployed based on a current artifact. The _configuration_ for such a device should be managed with an atomic application of a versioned configuration file, which replaces all of the configuration on the device at once. This configuration should only apply to non base configuration and not the fundamental configuration used to define the provisioning of such a device itself. The choice of **change management model** is **separate** from a **deployment strategy**, such as a phoenix [46] or canary [47] deployment strategy. Any deployment strategy that supports immutable infrastructure (i.e. when infrastructure such as a server needs a configuration or artifact change, a brand new instance of that infrastructure is created) can be used.

P6 - If an infrastructure element is immutable [26],[27],[28],[29],[30],[31], its base configuration is stored as a template [32],[33],[34],[35],[36],[37].

An infrastructure **image** resides at the lowest level and usually includes an operating system, but may also include an **orchestrator** for the higher level applications or any other dependencies that have a low rate of change but are needed for applications. This **image** is managed using a **template** **system** with **versioning** (e.g. a versioned image of an operating system) and minimizes the MTTR[44] (mean time to recovery) and deployment time of the infrastructure.

P7 - If there is configuration outside of an infrastructure element’s template, it is versioned and stored in source control [41],[43].

Any configuration that is applied after an infrastructure’s base image/template has been created (also called **bootstrapping**) will be applied **before** the infrastructure is considered ready for use. After the element is in use, no more configuration is allowed.

P8 - If an infrastructure element is immutable, the dependencies of the applications that run on that infrastructure element will be decoupled [48],[49],[50],[51][52],[53] from that infrastructure element

An infrastructure element such as a generic host server, a network device, or a storage device, has a different rate of change with respect to, and is separate from, the applications that are deployed on or in relation to those elements. For instance, a generic host server [42] can act as a network router, but should still have separation between the dependencies that make it ready for use to the network applications deployed on that server. A microservice has all of its dependencies deployed with it during its deployment phase of the pipeline, which is separate from the infrastructure’s pipeline. These dependencies are decoupled from the infrastructure environment (e.g. a node) so the rate of change of the environment is separate from the microservices it hosts.

P9 - If an infrastructure element is immutable, the applications that run on that infrastructure element will run in an unprivileged mode [54],[55]

Applications should not require any elevated level of security permissions on the underlying infrastructure as they should have no need to make modifications to it.

P10 - If an application and its dependencies are decoupled from the infrastructure element that it runs on, the application will be orchestrated [56].

All application components that run on the infrastructure, except data [57],[58],[59],[60],[61], will be orchestrated and built from versioned artifacts and configuration templates during each deploy.

P11 - If an infrastructure element provides a service [62],[63],[64],[65], that service will be available via service discovery.

P12 - If an infrastructure is immutable, its changes will be managed via a change management pipeline [66[,[67],[68]

P13 - If artifacts [69],[70] are provisioned as new immutable infrastructure elements, they are deployed using templates.

The pipeline for low level infrastructure artifacts should combine them into templates and/or images which reduces the provisioning time for everything dependent on them.

P14 - If an immutable infrastructure has a change management pipeline, the change management pipeline applies tests [71],[72],[73],[74],[75],[76] to the codebase with increasing levels of complexity.

P15 - If the immutable or idempotent infrastructure has dependencies, the dependencies constrain the relationship structure [77] between multiple organizations

P16 - If the immutable or idempotent infrastructure has a dependency, that dependency will be delivered [78] from a provider to a consumer [79] in the form of a library [80][81] or a service instance [82] via a pipeline.

All software components delivered by a provider need to have the ability to be integration **tested** [85],[86] using the consumer’s pipeline [95] in a non-production environment. This should be via the consumer’s own pipeline via a library or a self-serviced instance, or a hosted environment [83] given by the provider. The software should resist **breaking** the **contract** formed **between** the **provider** of that software and the **consumer** of that software. This means paying attention to forward and backward compatibility [87],[88],[89],[90],[91],[92] with respect to interfaces between the provider and consumer. Any breaks in compatibility should force a major version [93],[94] change.

LICENSE

This work is licensed under a Creative Commons Attribution 4.0 International License.

LIST OF CONTRIBUTORS

If you would like credit for helping with these documents (for either this document or any of the other four documents linked above), please add your name to the list of contributors.

W Watson Vulk Coop

Taylor Carpenter Vulk Coop

Denver Williams Vulk Coop

Jeffrey Saelens Charter Communications

Endnotes

It should be possible to effortlessly and reliably rebuild any element of an infrastructure. Effortlessly means that there is no need to make any significant decisions about how to rebuild the thing. Decisions about which software and versions to install on a server, how to choose a hostname, and so on should be captured in the scripts and tooling that provision it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 349-352). O'Reilly Media. Kindle Edition.
When problems are discovered, fixes may not be rolled out to all of the systems that could be affected by them. Differences in versions and configurations across servers mean that software and scripts that work on some machines don’t work on others. This leads to inconsistency across the servers, called configuration drift. [...] Even when servers are initially created and configured consistently, differences can creep in over time: [...]. But variations should be captured and managed in a way that makes it easy to reproduce and to rebuild servers and services. Unmanaged variation between servers leads to snowflake servers and automation fear. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 278-290). O'Reilly Media. Kindle Edition.
Given two infrastructure elements providing a similar service for example, two application servers in a cluster the servers should be nearly identical. Their system software and configuration should be the same, except for those bits of configuration that differentiate them, like their IP addresses. Letting inconsistencies slip into an infrastructure keeps you from being able to trust your automation. If one file server has an 80 GB partition, while another has 100 GB, and a third has 200 GB, then you can’t rely on an action to work the same on all of them. This encourages doing special things for servers that don’t quite match, which leads to unreliable automation. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 380-384). O'Reilly Media. Kindle Edition.
One of the benefits of dynamic infrastructure is that resources can be easily created, destroyed, replaced, resized, and moved. In order to take advantage of this, systems should be designed to assume that the infrastructure will always be changing. Software should continue running even when servers disappear, appear, and when they are resized. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 357-359). O'Reilly Media. Kindle Edition.
A popular expression is to “treat your servers like cattle, not pets.” ,Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 362-363). O'Reilly Media. Kindle Edition.
Building on the reproducibility principle, any action you carry out on your infrastructure should be repeatable. This is an obvious benefit of using scripts and configuration management tools rather than making changes manually, but it can be hard to stick to doing things this way, especially for experienced system administrators. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 393-395). O'Reilly Media. Kindle Edition.
These are some characteristics of scripts and tasks that help to support reliable unattended execution: Idempotent It should be possible to execute the same script or task multiple times without bad effects. Pre-checks A task should validate its starting conditions are correct, and fail with a visible and useful error if not, leaving things in a usable state. Post-checks A task should check that it has succeeded in making the changes. This isn’t just a matter of checking return codes on commands, but proving that the end result is there. For example, checking that a virtual host has been added to a web server could involve making an HTTP request to the web server. Visible failure When a task fails to execute correctly, it should be visible to the team. This may involve an information radiator and/or integration with monitoring services (covered in “What Is An Information Radiator?” and “Alerting: Tell Me When Something Is Wrong” ). Parameterized Tasks should be applicable to multiple operations of a similar type. For example, a single script can be used to configure multiple virtual hosts, even ones with different characteristics. The script will need a way to find the parameters for a particular virtual host, and some conditional logic or templating to configure it for the specific situation. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1071-1086). O'Reilly Media. Kindle Edition.
In order to allow a tool to repeatedly run unattended, it needs to be idempotent. This means that the result of running the tool should be the same no matter how many times it’s run. Idempotent scripts and tools can be set to run continuously (for example, at a fixed time interval), which helps to prevent configuration drift and improve confidence in automation. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1090-1092). O'Reilly Media. Kindle Edition.
If you apply the Terraform definition five times, it will only create one web server. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Location 1291). O'Reilly Media. Kindle Edition.
A good domain-specific language (DSL) for server configuration works by having you define the state you want something to be in, and then doing whatever is needed to bring it into that state. This should happen without side effects from being applied to the same server many times. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1096-1097). O'Reilly Media. Kindle Edition.
So declarative definitions lend themselves to running idempotently. You can safely apply your definitions over and over again, without thinking about it too much. If something is changed to a system outside of the tool, applying the definition will bring it back into line, eliminating sources of configuration drift. When you need to make a change, you simply modify the definition, and then let the tooling work out what to do. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1275-1278). O'Reilly Media. Kindle Edition.
“Configuration definition file” is a generic term for the tool-specific files used to drive infrastructure automation tools. Most tools seem to have their own names: playbooks, cookbooks, manifests, templates, and so on. A configuration definition could be any one of these, or even a configuration file or script. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1156-1158). O'Reilly Media. Kindle Edition.
A configuration registry is a directory of information about the elements of an infrastructure. It provides a means for scripts, tools, applications, and services to find the information they need in order to manage and integrate with infrastructure. This is particularly useful with dynamic infrastructure because this information changes continuously as elements are added and removed.
There are many configuration registry products. Some examples include Zookeeper , Consul , and etcd . Many server configuration tool vendors provide their own configuration registry for example, Chef Server, PuppetDB, and Ansible Tower. These products are designed to integrate easily with the configuration tool itself, and often with other elements such as a dashboard. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1335-1339). O'Reilly Media. Kindle Edition.
A stack is a collection of infrastructure elements that are defined as a unit (the inspiration for choosing the term stack comes mainly from the term’s use by AWS CloudFormation). A stack can be any size. It could be a single server. It could be a pool of servers with their networking and storage. It could be all of the servers and other infrastructure involved in a given application. Or it could be everything in an entire data center. What makes a set of infrastructure elements a stack isn’t the size, but whether it’s defined and changed as a unit. The concept of a stack hasn’t been commonly used with manually managed infrastructures. Elements are added organically, and networking boundaries are naturally used to think about infrastructure groupings. But automation tools force more explicit groupings of infrastructure elements. It’s certainly possible to put everything into one large group. And it’s also possible to structure stacks by following network boundaries. But these aren’t the only ways to organize stacks. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3227-3230). O'Reilly Media. Kindle Edition.
The difficulty of a monolithic definition is that it becomes cumbersome to change. With most definition tools, the file can be organized into separate files. But if making a change involves running the tool against the entire infrastructure stack, things become dicey: It’s easy for a small change to break many things. It’s hard to avoid tight coupling between the parts of the infrastructure. Each instance of the environment is large and expensive. Developing and testing changes requires testing an entire stack at once, which is cumbersome. If many people can make changes to the infrastructure, there is a high risk someone will break something. On the other hand, if changes are limited to a small group to minimize the risk, then there are likely to be long delays waiting for changes to be made. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3357-3363). O'Reilly Media. Kindle Edition.
“Provisioning” is a term that can be used to mean somewhat different things [...] provisioning is used to mean making an infrastructure element such as a server or network device ready for use. Depending on what is being provisioned, this can involve: Assigning resources to the element. Instantiating the element. Installing software onto the element. Configuring the element. Registering the element with infrastructure services. At the end of the provisioning process, the element is fully ready for use.
“Provisioning” is sometimes used to refer to a more narrow part of this process. For instance, Terraform and Vagrant both use it to define the callout to a server configuration tool like Chef or Puppet to configure a server after it has been created. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1218-1220). O'Reilly Media. Kindle Edition.
The infrastructure element could be a server; a part of a server, such as a user account; network configuration, such as a load balancer rule; or many other things. Different tools have different terms for this: for example, playbooks (Ansible), recipes (Chef), or manifests (Puppet). The term “configuration definition file” is used [...] as a generic term for these. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 421-424). O'Reilly Media. Kindle Edition.
An infrastructure definition tool will create servers but isn’t responsible for what’s on the server itself. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1304-1305). O'Reilly Media. Kindle Edition.
The term “environment” is typically used when there are multiple stacks that are actually different instances of the same service or set of services. The most common use of environments is for testing. An application or service may have “development,” “test,” “preproduction,” and “production” environments. The infrastructure for these should generally be the same, with perhaps some variations due to scale. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3227-3230). O'Reilly Media. Kindle Edition.
This dependency on manual effort to keep multiple files up to date and consistent is the weakness of per-environment definition files. There is no way to ensure that changes are made consistently. Also, because the change is made manually to each environment, it isn’t possible to use an automated pipeline to test and promote the change from one environment to the next.
[...] server configuration should result in the following : A new server can be completely provisioned on demand, without waiting more than a few minutes. A new server can be completely provisioned without human involvement for example, in response to events. When a server configuration change is defined, it is applied to servers without human involvement. Each change is applied to all the servers it is relevant to, and is reflected in all new servers provisioned after the change has been made. The processes for provisioning and for applying changes to servers are repeatable, consistent, self-documented, and transparent. It is easy and safe to make changes to the processes used to provision servers and change their configuration. Automated tests are run every time a change is made to a server configuration definition, and to any process involved in provisioning and modifying servers. Changes to configuration, and changes to the processes that carry out tasks on an infrastructure, are versioned and applied to different environments, in order to support controlled testing and staged release strategies. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1440-1449). O'Reilly Media. Kindle Edition.
A new server is created by the dynamic infrastructure platform using an infrastructure definition tool [...] The server is created from a server template, which is a base image of some kind. This might be in a VM image format specific to the infrastructure platform (e.g., an AWS AMI image or VMware VM template), or it could be an OS installation disk image from a vendor (e.g., an ISO image of the Red Hat installation DVD). Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1460-1465). O'Reilly Media. Kindle Edition.
There are many use cases where new servers are created: A member of the infrastructure team needs to build a new server of a standard type for example, adding a new file server to a cluster. They change an infrastructure definition file to specify the new server. A user wants to set up a new instance of a standard application for example, a bug-tracking application. They use a self-service portal, which builds an application server with the bug-tracking software installed. A web server VM crashes because of a hardware issue. The monitoring service detects the failure and triggers the creation of a new VM to replace it. User traffic grows beyond the capacity of the existing application server pool, so the infrastructure platform’s autoscaling functionality creates new application servers and adds them to the pool to meet the demand. A developer commits a change to the software they are working on. The CI software (e.g., Jenkins or GoCD) automatically provisions an application server in a test environment with the new build of the software so it can run an automated test suite against it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1467-1475). O'Reilly Media. Kindle Edition.
The immutable server pattern mentioned in “Server Change Management Models” doesn’t make configuration updates to existing servers. Instead, changes are made by building a new server with the new configuration. With immutable servers, configuration is usually baked into the server template. When the configuration is updated, a new template is packaged. New instances of existing servers are built from the new template and used to replace the older servers. This approach treats server templates like software artifacts. Each build is versioned and tested before being deployed for production use. This creates a high level of confidence in the consistency of the server configuration between testing and production. Advocates of immutable servers view making a change to the configuration of a production server as bad practice, no better than modifying the source code of software directly on a production server. Immutable servers can also simplify configuration management, by reducing the area of the server that needs to be managed by definition files. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2239-2247). O'Reilly Media. Kindle Edition.
Using the term “immutable” to describe this pattern can be misleading. “Immutable” means that a thing can’t be changed, so a truly immutable server would be useless. As soon as a server boots, its runtime state changes processes run, entries are written to logfiles, and application data is added, updated, and removed. It’s more useful to think of the term “immutable” as applying to the server’s configuration, rather than to the server as a whole. This creates a clear line between configuration and data. It forces teams to explicitly define which elements of a server they will manage deterministically as configuration and which elements will be treated as data. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2918-2926). O'Reilly Media. Kindle Edition.
Antipattern: Handcrafted Server ... manually building servers leads almost immediately to configuration drift and snowflake servers. Server creation is not traceable, versionable, or testable, and is certainly not self-documenting. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2376-2391). O'Reilly Media. Kindle Edition.
Antipattern: Hot Cloned Server ... A server that has been cloned from a running server is not reproducible. You can’t create a third server from the same starting point, because both the original and the new server have moved on: they’ve been in use, so various things on them will have changed. Cloned servers also suffer because they have runtime data from the original server. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2412-2423). O'Reilly Media. Kindle Edition.
Antipattern: Snowflake Factory Many organizations adopt automated tools to provision servers, but a person still creates each one by running the tool and choosing options for the particular server. This typically happens when processes that were used for manual server provisioning are simply carried over to the automated tools. The result is that it may be quicker to build servers, but that servers are still inconsistent, which can make it difficult to automate the process of keeping them patched and updated. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2436-2441). O'Reilly Media. Kindle Edition.
The phrase lift and shift describes installing software that was written for static infrastructure (i.e., most software designed with legacy, pre-cloud assumptions) onto dynamic infrastructure. Although the software ends up running on a dynamic infrastructure platform, the infrastructure must be managed statically to avoid breaking anything. These applications are unlikely to be able to take advantage of advanced infrastructure capabilities such as automatic scaling and recovery, or the creation of ad hoc instances. Some characteristics of non-cloud-native software that require “lift and shift” migrations : Stateful sessions. Storing data on the local filesystem. Slow-running startup routines. Static configuration of infrastructure parameters Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5628-5634). O'Reilly Media. Kindle Edition.
In many cases, new servers can be built using off-the-shelf server template images. Infrastructure platforms, such as IaaS clouds, often provide template images for common operating systems. Many also offer libraries of templates built by vendors and third parties, who may provide images that have been preinstalled and configured for particular purposes, such as application servers. But many infrastructure teams find it useful to build their own server templates. They can pre-configure them with their team’s preferred tools, software, and configuration. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1506-1512). O'Reilly Media. Kindle Edition.
Packaging common elements onto a template makes it faster to provision new servers. Some teams take this further by creating server templates for particular roles such as web servers and application servers. One of the key trade-offs is that, as more elements are managed by packaging them into server templates, the templates need to be updated more often. This then requires more sophisticated processes and tooling to build and manage templates. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1510-1515). O'Reilly Media. Kindle Edition.
Keeping templates minimal makes sense when there is a lot of variation in what may be installed on a server. For example, if people create servers by self-service, choosing from a large menu of configuration options, it makes sense to provision dynamically when the server is created. Otherwise, the library of prebuilt templates would need to be huge to include all of the variations that a user might select. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2564-2567). O'Reilly Media. Kindle Edition.
At the other end of the provisioning spectrum is putting nearly everything into the server template. Building new servers then becomes very quick and simple, just a matter of selecting a template and applying instance-specific configuration such as the hostname. This can be useful for infrastructures where new instances need to be spun up very quickly for example, to support automated scaling. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2574-2577). O'Reilly Media. Kindle Edition.
With immutable servers, templates are treated like a software artifact in a continuous delivery pipeline. Any change to a server’s configuration is made by building a new version of the template. Each new template version is automatically tested before it is rolled out to production environments. This ensures that every production server’s configuration has been thoroughly and reliably tested. There is no opportunity to introduce an untested configuration change to production. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2587-2590). O'Reilly Media. Kindle Edition.
It may be useful to create a separate server template for each role, with the relevant software and configuration baked into it. This requires more sophisticated (i.e., automated) processes for building and managing templates but makes creating servers faster and simpler. Different server templates may be needed for reasons other than the functional role of a server. Different operating systems and distributions will each need their own server template, as will significant versions. For example, a team could have separate server templates for CentOS 6.5.x, CentOS 7.0.x, Windows Server 2012 R2, and Windows Server 2016. In other cases, it could make sense to have server templates tuned for different purposes. Database server nodes could be built from one template that has been tuned for high-performance file access, while web servers may be tuned for network I/O throughput. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2796-2802). O'Reilly Media. Kindle Edition.
The four models for making configuration changes servers are ad hoc, configuration synchronization, immutable servers, and containerized servers. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2887-2888). O'Reilly Media. Kindle Edition.
Ad hoc change management makes changes to servers only when a specific change is needed. This was the traditional approach before the automated server configuration tools became mainstream, and is still the most commonly used approach. It is vulnerable to configuration drift, snowflakes, and all of the evils described [...]. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1599-1602). O'Reilly Media. Kindle Edition.
Configuration synchronization repeatedly applies configuration definitions to servers, for example, by running a Puppet or Chef agent on an hourly schedule. This ensures that any changes to parts of the system managed by these definitions are kept in line. Configuration synchronization is the mainstream approach for infrastructure as code, and most server configuration tools are designed with this approach in mind. The main limitation of this approach is that many areas of a server are left unmanaged, leaving them vulnerable to configuration drift. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1605-1609). O'Reilly Media. Kindle Edition.
Immutable infrastructure makes configuration changes by completely replacing servers. Changes are made by building new server templates, and then rebuilding relevant servers using those templates. This increases predictability, as there is little variance between servers as tested, and servers in production. It requires sophistication in server template management. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1611-1614). O'Reilly Media. Kindle Edition.
Containerized services works by packaging applications and services in lightweight containers (as popularized by Docker). This reduces coupling between server configuration and the things that run on the servers. So host servers tend to be very simple, with a lower rate of change. One of the other change management models still needs to be applied to these hosts, but their implementation becomes much simpler and easier to maintain. Most effort and attention goes into packaging, testing, distributing, and orchestrating the services and applications, but this follows something similar to the immutable infrastructure model, which again is simpler than managing the configuration of full-blown virtual machines and servers. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1617-1621). O'Reilly Media. Kindle Edition.
Bootstrap Configuration with Immutable Servers: The purest use of immutable servers is to bake everything onto the server template and change nothing, even when creating server instances from the template. But some teams have found that for certain types of changes, the turnaround time needed to build a new template is too slow. An emerging practice is to put almost everything into the server template, but add one or two elements when bootstrapping a new server. This might be a configuration setting that is only known when the server is created, or it might be a frequently changing element such as an application build for testing. A small development team using continuous integration (CI) or continuous delivery (CD) is likely to deploy dozens of builds of their application a day, so building a new server template for every build may be unacceptably slow. Having a standard server template image that can pull in and start a specified application build when it is started is particularly useful for microservices. This still follows the immutable server pattern, in that any change to the server’s configuration (such as a new version of the microservice) is carried out by building a new server instance. It shortens the turnaround time for changes to a microservice, because it doesn’t require building a new server template. However, this practice arguably weakens the testing benefits from the pure immutable model. Ideally, a given combination of server template version and microservice version will have been tested through each stage of a change management pipeline. But there is some risk that the process of installing a microservice, or making other changes, when creating a server will behave slightly differently when done for different servers. This could cause unexpected behavior. So this practice trades some of the consistency benefits of baking everything into a template and using it unchanged in every instance in order to speed up turnaround times for changes made in this way. In many cases, such as those involving frequent changes, this trade-off works quite well. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3099-3116). O'Reilly Media. Kindle Edition.
The most effective measurement of a change management pipeline is the cycle time. Cycle time is the time between deciding on the need for a change to seeing that change in production use. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4867-4868). O'Reilly Media. Kindle Edition.
Blue-green replacement is the most straightforward pattern to replace an infrastructure element without downtime. This is the blue-green deployment pattern for software 4 applied to infrastructure. It requires running two instances of the affected infrastructure, keeping one of them live at any point in time. Changes and upgrades are made to the offline instance, which can be thoroughly tested before switching usage over to it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5681-5685). O'Reilly Media. Kindle Edition.
Phoenix replacement is the natural progression from blue-green using dynamic infrastructure. Rather than keeping an idle instance around between changes, a new instance can be created each time a change is needed. As with blue-green, the change is tested on the new instance before putting it into use. The previous instance can be kept up for a short time, until the new instance has been proven in use. But then the previous instance is destroyed. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5694-5697). O'Reilly Media. Kindle Edition.
The canary pattern involves deploying the new version of an element alongside the old one, and then routing some portion of usage to the new elements. For example, with version A of an application running on 20 servers, version B may be deployed to two servers. A subset of traffic, perhaps flagged by IP address or by randomly setting a cookie, is sent to the servers for version B. The behavior, performance, and resource usage of the new element can be monitored to validate that it’s ready for wider use. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5724-5728). O'Reilly Media. Kindle Edition.
The benefits of containerization include: Decoupling the runtime requirements of specific applications from the host server that the container runs on. Repeatably create consistent runtime environments by having a container image that can be distributed and run on any host server that supports the runtime. Defining containers as code (e.g.,in a Dockerfile) that can be managed in a VCS, used to trigger automated testing, and generally having all of the characteristics for infrastructure as code. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1633-1637). O'Reilly Media. Kindle Edition.
The benefits of decoupling runtime requirements from the host system are particularly powerful for infrastructure management. It creates a clean separation of concerns between infrastructure and applications. The host system only needs to have the container runtime software installed, and then it can run nearly any container image. Applications, services, and jobs are packaged into containers along with all of their dependencies [...]. These dependencies can include operating system packages, language runtimes, libraries, and system files. Different containers may have different, even conflicting dependencies, but still run on the same host without issues. Changes to the dependencies can be made without any changes to the host system. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1652-1658). O'Reilly Media. Kindle Edition.
Sharing the OS kernel means a container has less overhead than a hardware virtual machine. A container image can be much smaller than a VM image, because it doesn’t need to include the entire OS. It can start up in seconds, as it doesn’t need to boot a kernel from scratch. And it consumes fewer system resources, because it doesn’t need to run its own kernel. So a given host can run more container processes than full VMs. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1701-1703). O'Reilly Media. Kindle Edition.
The best way to think of a container is as a method to package a service, application, or job. It’s an RPM on steroids, taking the application and adding in its dependencies, as well as providing a standard way for its host system to manage its runtime environment . Rather than a single container running multiple processes, aim for multiple containers, each running one process. These processes then become independent, loosely coupled entities. This makes containers a nice match for microservice application architectures. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1708-1711). O'Reilly Media. Kindle Edition.
Containerization has the potential to create a clean separation between layers of infrastructure and the services and applications that run on it. Host servers that run containers can be kept very simple, without needing to be tailored to the requirements of specific applications, and without imposing constraints on the applications beyond those imposed by containerization and supporting services like logging and monitoring. So the infrastructure that runs containers consists of generic container hosts. These can be stripped down to a bare minimum, including only the minimum toolsets to run containers, and potentially a few agents for monitoring and other administrative tasks. This simplifies management of these hosts, as they change less often and have fewer things that can break or need updating. It also reduces the surface area for security exploits. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1723-1729). O'Reilly Media. Kindle Edition.
Containerization offers a different model for managing server processes. Processes are packaged so that they can be run on servers that haven’t been specifically built for the purpose. A pool of generic container hosts can be available to run a variety of different containerized processes or jobs. Assigning containerized processes to hosts is flexible and quick. The number of container hosts can be adjusted automatically based on the aggregated demand across many different types of services. However, this approach requires a scheduler to start and manage container instances. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2051-2055). O'Reilly Media. Kindle Edition.
“Don’t Make Changes Directly on the Production Environment: Most downtime in production environments is caused by uncontrolled changes. Production environments should be completely locked down, so that only your deployment pipeline can make changes to it. That includes everything from the configuration of the environment to the applications deployed on it and their data.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 273). Pearson Education. Kindle Edition.
“The CNF should run without privileges. Privileged actions should be managed by the scheduler and environment”, x-factor-cnfs, Fred Kautz, https://github.com/fkautz/x-factor-cnfs/blob/master/content/process-containers.md To achieve process isolation, X-factor CNFs are designed to run in process containers without privileges. Privileged actions may be requested by the container and performed by privileged delegate. Running unprivileged promotes loose coupling between environments, reduces the overall attack surface, and gives the scheduler the ability to clean up after the pod in the case of the pod failing. The X-factor CNF methodology recognizes the need for hardware which requires additional kernel modules. When possible, kernel modules must follow standard Linux kernel device driver standards [...] and do not affect the kernel's runtime environment beyond enabling the device. These devices must also not be bound directly from the CNF. Instead, they are listed as an interface mechanism and injected into the container runtime by the orchestrator. The existence of a hardware device should not affect other CNFs. Some kernel modifications may be acceptable, e.g. DPDK or drivers. This should be immutable infrastructure with a clean interface for pods. In short, pods should not be allowed to modify their infrastructure.
Container orchestration tools have emerged following the rise of containerization systems like Docker. Most of these run agents on a pool of container hosts and are able to automatically select hosts to run new container instances, replace failed instances, and scale numbers of instances up and down. Some tools also handle service discovery, network routing, storage, scheduled jobs, and other capabilities. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2063-2066). O'Reilly Media. Kindle Edition.
Data: Files generated and updated by the system, applications, and so on. It may change frequently. The infrastructure may have some responsibility for this data (e.g., distributing it, backing it up, replicating it, etc.). But the infrastructure will normally treat the contents of the files as a black box, not caring about what’s in the files. Database data files and logs files are examples of data in this sense. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2306-2309). O'Reilly Media. Kindle Edition.
The key difference between configuration and data is whether automation tools will automatically manage what’s inside the file. So even though some infrastructure tools do care about what’s in system logfiles, they’re normally treated as data files.
Data creates a particular challenge for zero-downtime change strategies. All of the patterns described involve running multiple versions of a component simultaneously, with the option to roll back and keep the first version of the element if there is a problem with the new one. However, if the components use read/write data storage, this can be a problem. The problem comes when the new version of the component involves a change to data formats so that it’s not possible to have both versions share the same data storage without issues. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5766-5770). O'Reilly Media. Kindle Edition.
An effective way to approach data for zero-downtime deployments is to decouple data format changes from software releases. This requires the software to be written so that it can work with two different data formats, the original version and the new version. It is first deployed and validated with the data in the original format. The data is then migrated as a background task while the software is running. The software is able to cope with whichever data format a given record is in. Once the data migration is complete, compatibility with the old data format should be removed from the next release of the software, to keep the code clean. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5776-5780). O'Reilly Media. Kindle Edition.
Any mechanism that efficiently replicates data across distributed storage will replicate corrupted data just as happily as good data. A data availability strategy needs to address how to preserve and restore previous versions of data. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5823-5826). O'Reilly Media. Kindle Edition.
[...] core infrastructure resources: compute, networking, and storage. These provide the basic building blocks for infrastructure as code. However, most infrastructures will need a variety of other supporting services and tools. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1821-1822). O'Reilly Media. Kindle Edition.
[...] principles of infrastructure as code for services can be summarized as: The service can be easily rebuilt or reproduced. The elements of the service are disposable. The infrastructure elements managed by the service are disposable. The infrastructure elements managed by the service are always changing. Instances of the service are configured consistently. Processes for managing and using the service are repeatable. Routine requests are fulfilled quickly, with little effort, preferably through self-service or automatically. Complex changes can be made easily and safely. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1832-1837). O'Reilly Media. Kindle Edition.
Some of the specific practices include: Use externalized definition files. Self-document systems and processes. Version all the things. Continuously test systems and processes. Make small changes rather than batches of them. Keep services available continuously. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1838-1840). O'Reilly Media. Kindle Edition.
Prefer Products with Cloud-Compatible Licensing Licensing can make dynamic infrastructure difficult with some products. Some examples of licensing approaches that work poorly include: A manual process to register each new instance, agent, node, etc., for licensing. Clearly, this defeats automated provisioning. If a product’s license does require registering infrastructure elements, there needs to be an automatable process for adding and removing them. Inflexible licensing periods. Some products require customers to buy a fixed set of licenses for a long period. For example, a monitoring tool may have licensing based on the maximum number of nodes that can be monitored. The licenses may need to be purchased on a monthly cycle. This forces the customer to pay for the maximum number of nodes they might use during a given month, even when they only run that number of nodes for a fraction of the time. This cloud-unfriendly pricing model discourages customers from taking advantage of the ability to scale capacity up and down with demand. Vendors pricing for cloud charge by the hour at most. Heavyweight purchasing process to increase capacity. This is closely related to the licensing period. When an organization is hit with an unexpected surge in business, they shouldn’t need to spend days or weeks to purchase the extra capacity they need to meet the demand. It’s common for vendors to have limits in place to protect customers against accidentally over-provisioning, but it should be possible to raise these limits quickly. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1880-1892). O'Reilly Media. Kindle Edition.
Continuous delivery for software is implemented using a deployment pipeline. A deployment pipeline is an automated manifestation of a release process. It builds the application code, and deploys and tests it on a series of environments before allowing it to be deployed to production. The same concept is applied to infrastructure changes. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3823-3826). O'Reilly Media. Kindle Edition.
The point of CD and the software deployment pipeline is to allow changes to be delivered in a continuous flow, rather than in large batches. Changes can be validated more thoroughly, not only because they are applied with an automated process, but also because changes are tested when they are small, and because they are tested immediately after being committed. The result, when done well, is that changes can be made more frequently, more rapidly, and more reliably. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4525-4529). O'Reilly Media. Kindle Edition.
Teams who embrace the pipeline as the way to manage changes to their infrastructure find a number of benefits: Their infrastructure management tooling and codebase is always production ready. There is never a situation where extra work is needed (e.g., merging, regression testing, and “hardening”) to take work live. Delivering changes is nearly painless. Once a change has passed the technical validation stages of the pipeline, it shouldn’t need technical attention to carry through to production unless there is a problem. There is no need to make technical decisions about how to apply a change to production, as those decisions have been made, implemented, and tested in earlier stages. It’s easier to make changes through the pipeline than any other way. Hacking a change manually other than to bring up a system that is down is more work, and scarier, than just pushing it through the pipeline. Compliance and governance are easy. The scripts, tools, and configuration for making changes are transparent to reviewers. Logs can be audited to prove what changes were made, when, and by whom. With an automated change management pipeline, a team can prove what process was followed for each and every change. This tends to be stronger than taking someone’s word that documented manual processes are always followed. Change management processes can be more lightweight. People who might otherwise need to discuss and inspect each change can build their requirements into the automated tooling and tests. They can periodically review the pipeline implementation and logs, and make improvements as needed. Their time and attention goes to the process and tooling, rather than inspecting each change one by one. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4551-4563). O'Reilly Media. Kindle Edition.
The important thing is how the artifact is treated, conceptually. A configuration artifact is an atomic, versioned collection of materials that provision and/or configure a system component. An artifact is: Atomic A given set of materials is assembled, tested, and applied together as a unit. Portable It can be progressed through the pipeline, and different versions can be applied to different environments or instances. It can be reliably and repeatably applied to any environment, and so any given environment has an unambiguous version of the component. Complete A given artifact should have everything needed to provision or configure the relevant component. It should not assume that previous versions of the component artifacts have been applied before. Consistent Applying the artifact to any two component instances should have the same results. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4669-4678). O'Reilly Media. Kindle Edition.
With immutable servers [...], the artifact is the server template image. The build stage checks a server template definition file, such as a Packer template, or script out from VCS, builds the server template image, and publishes it by making it available in the infrastructure platform. With AWS, for example, this results in an AMI image. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4697-4701). O'Reilly Media. Kindle Edition.
A pipeline typically applies tests with increasing levels of complexity. The earlier stages focus on faster and simpler tests, such as unit tests, and testing a single service. Later stages cover broader sections of a system, and often replicate more of the complexities of production, such as integration with other services. It’s important that the environments, tooling, and processes involved in applying changes and deploying software are consistent across all of the stages of the pipeline. This ensures that problems that might appear in production are discovered early. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3827-3831). O'Reilly Media. Kindle Edition.
A top-heavy test suite is difficult to maintain, slow to run, and doesn’t pinpoint errors as well as a more balanced suite. High-level tests tend to be brittle. One change in the system can break a large number of tests, which can be more work to fix than the original change. This leads to the test suite falling behind development, which means it can’t be run continuously. Higher-level tests are also slower to run than the more focused lower-level tests, which makes it impractical to run the full suite frequently. And because higher-level tests cover a broad scope of code and components, when a test fails it may take a while to track down and fix the cause. This usually comes about when a team puts a UI-based test automation tool at the core of their test automation strategy. This in turn often happens when testing is treated as a separate function from building. Testers who aren’t involved in building the system don’t have the visibility or involvement with the different layers of the stack. This prevents them from developing lower-level tests and incorporating them into the build and change process. For someone who only interacts with the system as a black box, the UI is the easiest way to interact with it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4025-4034). O'Reilly Media. Kindle Edition.
So it’s good sense to sanity check each new server when it’s created. Automated server smoke testing scripts can check the basic things you expect for all of your servers, things specific to the server’s role, and general compliance. For example: Is the server running and accessible? Is the monitoring agent running? Has the server appeared in DNS, monitoring, and other network services? Are all of the necessary services (web, app, database, etc.) running? Are required user accounts in place? Are there any ports open that shouldn’t be? Are any user accounts enabled that shouldn’t be? Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2511-2515). O'Reilly Media. Kindle Edition.
People managing projects to develop and deploy software have a bucket of requirements they call non-functional requirements, or NFRs ; these are also sometimes referred to as cross-functional requirements (CFRs). Performance, availability, and security tend to be swept into this bucket. NFRs related to infrastructure can be labeled operational qualities for convenience. These are things that can’t easily be described using functional terms: take an action or see a result. Operational requirements are only apparent to users and stakeholders when they go wrong. If the system is slow, flaky, or compromised by attackers, people notice. Automated testing is essential to ensuring operational requirements. Every time a change is made to a system or its infrastructure, it’s important to prove that the change won’t cause operational problems. The team should have targets and thresholds defined which it can write tests against. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4242-4249). O'Reilly Media. Kindle Edition.
Metrics are best used by the team to help itself, and should be continually reviewed to decide whether they are still providing value. Some common metrics used by infrastructure teams include: Cycle time The time taken from a need being identified to fulfilling it. This is a measure of the efficiency and speed of change management. [...] Mean time to recover (MTTR) The time taken from an availability problem (which includes critically degraded performance or functionality) being identified to a resolution, even where it’s a workaround. This is a measure of the efficiency and speed of problem resolution. Mean time between failures (MTBF) The time between critical availability issues. This is a measure of the stability of the system, and the quality of the change management process. Although it’s a valuable metric, over-optimizing for MTBF is a common cause of poor performance on other metrics. Availability The percentage of time that the system is available, usually excluding time the system is offline for planned maintenance. This is another measurement of system stability. It is often used as an SLA in service contracts. True availability The percentage of time that the system is available, not excluding planned maintenance.
Layered templates fit well, conceptually at least, with a change management pipeline (see Chapter 12 ), because changes to the base image can then ripple out to automatically build the role-specific templates. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2805-2807). O'Reilly Media. Kindle Edition.
Integration Models The design and implementation of pipelines for testing how systems and infrastructure elements integrate depends on the relationships between them, and the relationships between the teams responsible for them. There are several typical situations: Single team One team owns all of the elements of the system and is fully responsible for managing changes to them. In this case, a single pipeline, with fan-in as needed, is often sufficient. Group of teams A group of teams works together on a single system with multiple services and/or infrastructure elements. Different teams own different parts of the system, which all integrate together. In this case, a single fan-in pipeline may work up to a point, but as the size of the group and its system grows, decoupling may become necessary. Separate teams with high coordination Each team (which may itself be a group of teams) owns a system, which integrates with systems owned by other teams. A given system may integrate with multiple systems. Each team will have its own pipeline and manage its releases independently. But they may have a close enough relationship that one team is willing to customize its systems and releases to support another team’s requirements. This is often seen with different groups within a large company and with close vendor relationships. Separate teams with low coordination As with the previous situation, except one of the teams is a vendor with many other customers. Their release process is designed to meet the requirements of many teams, with little or no customizations to the requirements of individual customer teams. “X as a Service” vendors, providing logging, infrastructure, web analytics, and so on, tend to use this model. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4892-4907). O'Reilly Media. Kindle Edition.
Managing the configuration of hardware network devices to support a dynamic infrastructure is a particular challenge. For example, you may need to add and remove servers from a load balancer if they are created and destroyed automatically in response to demand. Network devices tend to be difficult to automatically configure, although many are able to load configuration files over the network for example, from a TFTP server. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 859-861). O'Reilly Media. Kindle Edition.
Given two integrated components, one provides a service, and the other consumes it. The provider component needs to test that it is providing the service correctly for its consumers. And the consumer needs to test that it is consuming the provider service correctly. For example, one team may manage a monitoring service, which is used by multiple application teams. The monitoring team is the provider, and the application teams are the consumers. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4930-4933). O'Reilly Media. Kindle Edition.
Pattern: Library Dependency One way that one component can provide a capability to another is to work like a library. The consumer pulls a version of the provider and incorporates it into its own artifact, usually in the build stage of a pipeline. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4937-4939). O'Reilly Media. Kindle Edition.
The important characteristic is that the library component is versioned, and the consumer can choose which version to use. If a newer version of the library is released, the consumer may opt to immediately pull it in, and then run tests on it. However, it has the option to “pin” to an older version of the library. This gives the consumer team the flexibility to release changes even if they haven’t yet incorporated new, incompatible changes to their provider library. But it creates the risk that important changes, such as security patches, aren’t integrated in a timely way. This is a major source of security vulnerability in IT systems. For the provider, this pattern gives freedom to release new changes without having to wait for all consumer teams to update their components. But it can result in having many different versions of the component in production, which increases the time and hassle of support. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4941-4947). O'Reilly Media. Kindle Edition.
Pattern: Self-Provisioned Service Instance The library pattern can be adapted for full-blown services. A well-known example of this is AWS’s Relational Database Service, or RDS , offered by AWS. A team can provision complete working database instances for itself, which it can use in a pipeline for a component that uses a database. As a provider, Amazon releases new database versions, while still making older versions available as “previous generation DB instances” . This has the same effect as the library pattern, in that the provider can release new versions without waiting for consumer teams to upgrade their own components. Being a service rather than a library, the provider is able to transparently release minor updates to the service. Amazon can apply security patches to its RDS offering, and new instances created by consumer teams will automatically use the updated version. The key is for the provider to keep close track of the interface contract, to make sure the service behaves as expected after updates have been applied. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4963-4973). O'Reilly Media. Kindle Edition.
Providing Test Instances of a Service to Consumers The provider of a hosted service needs to provide support for consumers to develop and test their integration with the service. This is useful to consumers for a number of purposes: To learn how to correctly integrate to the service. To test that integration still works after changes to the consumer system. To test that the consumer system still works after changes to the provider. To test and develop against new provider functionality before it is released. To reproduce production issues for troubleshooting. To run demonstrations of the consumer system without affecting production data. An effective way for a provider to support these is to provide self-provisioned service instances. If consumers can create and configure instances on-demand, then they can easily handle their own testing and demonstration needs. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4992-4999). O'Reilly Media. Kindle Edition.
[...] a common technique is to deploy changes into production without necessarily releasing them to end users. New versions of components are put into the production environment and integrated with other components in ways that won’t impact normal operations. This allows them to be tested in true production conditions, before the switch is flipped to put them into active use. They can even be put into use in a drip-feed fashion, measuring their performance and impact before rolling them out to the full user base. Techniques for doing this include those used to hide unfinished changes in the codebase, such as feature toggles, as well as zero-downtime replacement patterns. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5062-5067). O'Reilly Media. Kindle Edition.
Contract tests are automated tests that check whether a provider interface behaves as consumers expect. This is a much smaller set of tests than full functional tests, purely focused on the API that the service has committed to provide to its consumers. By running contract tests in their own pipeline, the provider team will be alerted if they accidentally make a change that breaks the contract. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5092-5095). O'Reilly Media. Kindle Edition.
Practice: Run Consumer-Driven Contract (CDC) Tests A variation on these previous practices is for a provider to run tests provided by consumer teams. These tests are written to formalize the expectations the consumer has for the provider’s interface. The provider runs the tests in a stage of its own pipeline, and fails any build that fails to pass these tests. A failure of a CDC test tells the provider they need to investigate the nature of the failed expectation. In some cases, the provider will realize they have made an error, so they can correct it. In others, they may see that the consumer’s expectations are incorrect, so they can let them know to change their own code. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5112-5118). O'Reilly Media. Kindle Edition.
Practice: Ensure Backward Compatibility of Interfaces Providers should work hard to avoid making changes that break existing interfaces. Once a release is published and in use, new releases should not change the interfaces used by consumers. It’s normally easy to add new functionality without breaking interfaces. A command-line tool can add a new argument without changing the behavior of existing arguments. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5045-5048). O'Reilly Media. Kindle Edition.
If there is a need to make drastic changes to an existing interface, it’s often better to create a new interface, leaving the old one in place. The old interface can be deprecated, with warnings to users that they should move to using the new version. This gives consumers time to develop and test their use of the new interface before they switch, which in turn gives the provider the flexibility to release and iterate on the new functionality without having to wait for all of their consumers to switch. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5052-5055). O'Reilly Media. Kindle Edition.
Backward-compatible changes are those that allow any future implementations to consume older versions of the design. For example, if a client that supports one or more of the new extensions receives a representation from a server that does not support these new extensions, the client implementation will continue to function successfully. Usually, this means the new extensions do not create a required dependency that causes implementations to break if that extension is missing. Amundsen, Mike. Building Hypermedia APIs with HTML5 and Node (p. 147). Kindle Edition.
Often backward-compatible changes mean that implementations need to be prepared to handle representations that are missing expected elements. This may mean that some functionality or feature is not available to an implementation and that implementation must work around that missing element or fall back to a mode that supports an older design of the media type. Amundsen, Mike. Building Hypermedia APIs with HTML5 and Node (p. 147). Kindle Edition.
An extension can be considered “forward compatible” if the changes do not break existing implementations. This means existing client applications will continue to work successfully when they receive a representation that contains the new features. Servers will continue to function properly when they receive a representation that contains the new features. Usually, this means the new features/functionality can be safely ignored by implementations that do not recognize them. Amundsen, Mike. Building Hypermedia APIs with HTML5 and Node (p. 146). Kindle Edition.
The flipside of maintaining backward compatibility for providers is for consumers to ensure version tolerance. A consumer team should ensure that a single build of their system can easily work with different versions of a provider that they integrate with, if it’s a likely situation. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5070-5071). O'Reilly Media. Kindle Edition.
In order to support both forward and backward compatibility, there are some general guidelines that should be followed when making changes to media type designs. Existing design elements cannot be removed [...] The meaning or processing of existing elements cannot be changed [...] New design elements must be treated as optional Amundsen, Mike. Building Hypermedia APIs with HTML5 and Node (pp. 147-148). Kindle Edition.
Versioning a media type means making changes to the media type that will likely cause existing implementations of the original media type to “break” or misbehave in some significant way. [...] Versioning should be seen as a last resort. Amundsen, Mike. Building Hypermedia APIs with HTML5 and Node (p. 148). Kindle Edition.
Practice: Decouple Pipelines When separate teams build different components of a system, such as microservices, joining pipeline branches for these components together with the fan-in pattern can create a bottleneck. The teams need to spend more effort on coordinating the way they handle releases, testing, and fixing. This may be fine for a small number of teams who work closely together, but the overhead grows exponentially as the number of teams grows. Decoupling pipelines involves structuring the pipelines so that a change to each component can be released independently. The components may still have dependencies between each other, so they may need integration testing. But rather than requiring all of the components to be released to production together in a “big bang” deployment, a change to one component could go ahead to production before changes to the second component are released. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4883-4889). O'Reilly Media. Kindle Edition.

PreviousCloud Native Microservice Principles NextCloud Native Declarative OSI Principles

Last updated 4 years ago

Was this helpful?