Cloud Native Principles

The concepts of immutable infrastructure, declarative apis, and microservices each deserve individual treatment. The following statements are high level generalizations that can help lead towards an understanding of these three concepts before diving deeply into each of them individually.

P1 - If a project is cloud native [1],[2], it uses immutable infrastructure [3], declarative apis, and microservices.

P2 - If infrastructure is immutable, it is easily reproduced [4],[5], consistent [6], disposable [7],[8], will have a repeatable [9] deployment process, and will not have configuration or artifacts that are modifiable in place.

The process that implements immutable infrastructure needs to be reproducible (without any needing to ‘think’ each time provisioning occurs) and repeatable (automated). This process also needs to be consistent (infrastructure elements should be identical) and disposable (designed to be easily created, destroyed, replaced, resized, etc). A project’s immutable infrastructure, which can include everything from the physical hardware at the lowests levels up to the platforms that the application is installed on, has its configuration protected from change (not modifiable) after it is deployed into an environment. This configuration is stored in such a way that it can be used to recreate the infrastructure as needed. Furthermore, the process should be idempotent allowing a state to be applied multiple times and the same desired state still being achieved.

P3 - If a project has an efficient and repeatable deployment process, its process is versioned [10], automated [11], and has low overhead/coarse grained packaging [12],[13],[14],[15]

The core of cloud native development rests in coarse-grained packaging such as that found in container technologies such as Docker. Any light weight / low overhead technology that satisfies the requirements for low overhead and coarse grained packaging (packaging all of the dependencies together with the application) can satisfy the deployment requirements for cloud native applications. Normal CI/CD best practices apply for the deployment practice itself.

P4 - If a project’s deployment is automated, configuration [16], environment [17], and artifacts [18] are completely managed by a pipeline.\

P5 - If a projects deployment is managed completely by a pipeline, the project’s environment is protected [19]

Production environments should be only directly modified by the automated pipeline process and therefore not directly modifiable by anyone. This protects against snowflake configuration.

P6 - If a project’s environment is protected, it provides observability [21] of the project’s internal components.

In order to maintain, debug, and have insight into a protected environment, its infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in some way that lends itself to metrics, tracing, and logging.

P7 - If a project's uses declarative APIs [22], its configuration is declarative [23],[24]

P8 - If a project’s configuration is declarative [25], it designates what to do, not how to do it.

Declarative APIs for an immutable infrastructure are anything that configures the infrastructure element. This declaration can come in the form of a YAML file or a script, as long as the configuration designates the desired outcome, not how to achieve said outcome.

P9 - If a project exists as a microservice [26],[28],[29], it is not monolithic, it is resilient, it follows 12-factor principles [30], and is discoverable [31].

When a service is monolithic, multiple business capabilities are tightly coupled, therefore requiring coordination with multiple groups within the organization that are developing the service. A microservice separates concerns based on business capability (features or groups of features). This allows for a more rapid deployment of services with a faster feedback loop.

P10 - If a microservice is resilient, it is self-healing and distributed [32].

A microservice is also resilient, in that it is accompanied by some kind of strategy for healing itself. This includes strategies for restarting after failures and distributive scaling in response to load. A microservice scales out to handle load (more processes are spawned on more machines) instead of scaling up (increasing the capacity of the individual machines)

P11 - If a microservice is self-healing [33], it is compatible with declarative configuration and orchestration [34].

Once a microservice is coupled with a declarative strategy (a strategy that outlines what the system should look like), it can then be handed over to an orchestrator in order to implement that strategy.

LICENSE

This work is licensed under a Creative Commons Attribution 4.0 International License.

LIST OF CONTRIBUTORS

If you would like credit for helping with these documents (for either this document or any of the other four documents linked above), please add your name to the list of contributors.

W Watson Vulk Coop Taylor Carpenter Vulk Coop

Denver Williams Vulk Coop

Jeffrey Saelens Charter Communications

Bill Mulligan Loodse

Endnotes

  1. Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

  2. Immutable infrastructure makes configuration changes by completely replacing servers. Changes are made by building new server templates, and then rebuilding relevant servers using those templates. This increases predictability, as there is little variance between servers as tested, and servers in production. It requires sophistication in server template management.” Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1611-1614). O'Reilly Media. Kindle Edition.

  3. It should be possible to effortlessly and reliably rebuild any element of an infrastructure. Effortlessly means that there is no need to make any significant decisions about how to rebuild the thing. Decisions about which software and versions to install on a server, how to choose a hostname, and so on should be captured in the scripts and tooling that provision it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 349-352). O'Reilly Media. Kindle Edition.

  4. When problems are discovered, fixes may not be rolled out to all of the systems that could be affected by them. Differences in versions and configurations across servers mean that software and scripts that work on some machines don’t work on others. This leads to inconsistency across the servers, called configuration drift. [...] Even when servers are initially created and configured consistently, differences can creep in over time: [...]. But variations should be captured and managed in a way that makes it easy to reproduce and to rebuild servers and services. Unmanaged variation between servers leads to snowflake servers and automation fear. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 278-290). O'Reilly Media. Kindle Edition. .

  5. Given two infrastructure elements providing a similar service for example, two application servers in a cluster the servers should be nearly identical. Their system software and configuration should be the same, except for those bits of configuration that differentiate them, like their IP addresses. Letting inconsistencies slip into an infrastructure keeps you from being able to trust your automation. If one file server has an 80 GB partition, while another has 100 GB, and a third has 200 GB, then you can’t rely on an action to work the same on all of them. This encourages doing special things for servers that don’t quite match, which leads to unreliable automation. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 380-384). O'Reilly Media. Kindle Edition.

  6. One of the benefits of dynamic infrastructure is that resources can be easily created, destroyed, replaced, resized, and moved. In order to take advantage of this, systems should be designed to assume that the infrastructure will always be changing. Software should continue running even when servers disappear, appear, and when they are resized. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 357-359). O'Reilly Media. Kindle Edition.

  7. A popular expression is to “treat your servers like cattle, not pets.” ,Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 362-363). O'Reilly Media. Kindle Edition.

  8. Building on the reproducibility principle, any action you carry out on your infrastructure should be repeatable. This is an obvious benefit of using scripts and configuration management tools rather than making changes manually, but it can be hard to stick to doing things this way, especially for experienced system administrators. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 393-395). O'Reilly Media. Kindle Edition.

  9. Everything you need to build, deploy, test, and release your application should be kept in some form of versioned storage. This includes requirement documents, test scripts, automated test cases, network configuration scripts, deployment scripts, database creation, upgrade, downgrade, and initialization scripts, application stack configuration scripts, libraries, toolchains, technical documentation, and so on. All of this stuff should be version-controlled, and the relevant version should be identifiable for any given build. That is, these change sets should have a single identifier, such as a build number or a version control changeset number, that references every piece.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 26). Pearson Education. Kindle Edition.

  10. “In general, your build process should be automated up to the point where it needs specific human direction or decision making. This is also true of your deployment process and, in fact, your entire software release process. Acceptance tests can be automated. Database upgrades and downgrades can be automated too. Even network and firewall configuration can be automated. You should automate as much as you possibly can.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 25). Pearson Education. Kindle Edition.

  11. Containerized services works by packaging applications and services in lightweight containers (as popularized by Docker). This reduces coupling between server configuration and the things that run on the servers. So host servers tend to be very simple, with a lower rate of change. One of the other change management models still needs to be applied to these hosts, but their implementation becomes much simpler and easier to maintain. Most effort and attention goes into packaging, testing, distributing, and orchestrating the services and applications, but this follows something similar to the immutable infrastructure model, which again is simpler than managing the configuration of full-blown virtual machines and servers. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1617-1621). O'Reilly Media. Kindle Edition.

  12. “The value of a containerization system is that it provides a standard format for container images and tools for building, distributing, and running those images. Before Docker, teams could isolate running processes using the same operating system features, but Docker and similar tools make the process much simpler.” Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1631-1633). O'Reilly Media. Kindle Edition.

  13. Configuration management refers to the process by which all artifacts relevant to your project, and the relationships between them, are stored, retrieved, uniquely identified, and modified.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 31). Pearson Education. Kindle Edition.

  14. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 25-26. “Containers leverage modern Linux kernel primitives such as control groups (cgroups) and namespaces to provide similar resource allocation and isolation features as those provided by virtual machines with much less overhead and much greater portability.”

  15. “... we consider it bad practice to inject configuration information at build or packaging time. This follows from the principle that you should be able to deploy the same binaries to every environment so you can ensure that the thing that you release is the same thing that you tested. The corollary of this is that anything that changes between deployments needs to be captured as configuration, and not baked in when the application is compiled or packaged.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (pp. 41-42). Pearson Education. Kindle Edition.

  16. “An environment is all of the resources that your application needs to work and their configuration.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 277). Pearson Education. Kindle Edition.

  17. “The key characteristic of binaries is that you should be able to copy them onto a new machine and, given an appropriately configured environment and the correct configuration for the application in that environment, start your application—without relying on any part of your development toolchain being installed on that machine.”Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 134). Pearson Education. Kindle Edition.

  18. Don’t Make Changes Directly on the Production Environment: Most downtime in production environments is caused by uncontrolled changes. Production environments should be completely locked down, so that only your deployment pipeline can make changes to it. That includes everything from the configuration of the environment to the applications deployed on it and their data.” Humble, Jez. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (p. 273). Pearson Education. Kindle Edition.

  19. --

  20. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 27–28. “Visibility: Our architectures must provide us with the tools necessary to see failure when it happens. We need the ability to measure everything, establish a profile for “what’s normal,” detect deviations from the norm (including absolute values and rate of change), and identify the components contributing to those deviations. Feature-rich metrics, monitoring, alerting, and data visualization frameworks and tools are at the heart of all cloud-native application architecture”:

  21. Declarative configuration is different from imperative configuration , where you simply take a series of actions (e.g., apt-get install foo ) to modify the world. Years of production experience have taught us that maintaining a written record of the system’s desired state leads to a more manageable, reliable system. Declarative configuration enables numerous advantages, including code review for configurations as well as documenting the current state of the world for distributed teams. Additionally, it is the basis for all of the self-healing behaviors in Kubernetes that keep applications running without user action.” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 892-896). Kindle Edition.

  22. “To understand these two approaches, consider the task of producing three replicas of a piece of software. With an imperative approach, the configuration would say: “run A, run B, and run C.” The corresponding declarative configuration would be “replicas equals three.” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 181-183). Kindle Edition.

  23. “The combination of declarative state stored in a version control system and Kubernetes’s ability to make reality match this declarative state makes rollback of a change trivially easy. It is simply restating the previous declarative state of the system. With imperative systems this is usually impossible, since while the imperative instructions describe how to get you from point A to point B, they rarely include the reverse instructions that can get you back. “Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 186-190). Kindle Edition.

  24. “Because it describes the state of the world, declarative configuration does not have to be executed to be understood. Its impact is concretely declared. Since the effects of declarative configuration can be understood before they are executed, declarative configuration is far less error-prone. Further, the traditional tools of software development, such as source control, code review, and unit testing, can be used in declarative configuration in ways that are impossible for imperative instructions. “ Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 183-186). Kindle Edition.

  25. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 16.. “As we decouple the business domain into independently deployable bounded contexts of capabilities, we also decouple the associated change cycles. As long as the changes are restricted to a single bounded context, and the service continues to fulfill its existing contracts, those changes can be made and deployed independent of any coordination with the rest of the business. The result is enablement of more frequent and rapid deployments, allowing for a continuous flow of value.”

  26. --

  27. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 27–28. “Adoption of new technology can be accelerated. Large monolithic application architectures are typically associated with long-term commitments to technical stacks. These commitments exist to mitigate the risk of adopting new technology by simply not doing it. Technology adoption mistakes are more expensive in a monolithic architecture, as those mistakes can pollute the entire enterprise architecture. If we adopt new technology within the scope of a single monolith, we isolate and minimize the risk in much the same way that we isolate and minimize the risk of runtime failure.”

  28. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 27–28. “Microservices offer independent, efficient scaling of services. Monolithic architectures can scale, but require us to scale all components, not simply those that are under heavy load. Microservices can be scaled if and only if their associated load requires it.”

  29. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'reilly, 2015, pp. 10–11 “Codebase Each deployable app is tracked as one codebase tracked in revision control. It may have many deployed instances across multiple environments. Dependencies An app explicitly declares and isolates dependencies via appropriate tooling (e.g., Maven, Bundler, NPM) rather than depending on implicitly realized dependencies in its deployment environment. Config Configuration, or anything that is likely to differ between deployment environments (e.g., development, staging, production) is injected via operating system-level environment variables. Backing services Backing services, such as databases or message brokers, are treated as attached resources and consumed identically across all environments. Build, release, run The stages of building a deployable app artifact, combining that artifact with configuration, and starting one or more processes from that artifact/configuration combination, are strictly separated. Processes The app executes as one or more stateless processes (e.g., master/workers) that share nothing. Any necessary state is externalized to backing services (cache, object store, etc.). Port binding The app is self-contained and exports any/all services via port binding (including HTTP). Concurrency Concurrency is usually accomplished by scaling out app processes horizontally (though processes may also multiplex work via internally managed threads if desired). Disposability Robustness is maximized via processes that start up quickly and shut down gracefully. These aspects allow for rapid elastic scaling, deployment of changes, and recovery from crashes. Dev/prod parity Continuous delivery and deployment are enabled by keeping development, staging, and production environments as similar as possible. Logs Rather than managing logfiles, treat logs as event streams, allowing the execution environment to collect, aggregate, index, and analyze the events via centralized services. Admin processes Administrative or management tasks, such as database migrations, are executed as one-off processes in environments identical to the app’s long-running processes.”

  30. Service discovery tools help solve the problem of finding which processes are listening at which addresses for which services. A good service discovery system will enable users to resolve this information quickly and reliably. A good system is also low-latency; clients are updated soon after the information associated with a service change. Finally, a good service discovery system can store a richer definition of what that service is. For example, perhaps there are multiple ports associated with the service.” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 1423-1426). Kindle Edition.

  31. “In many cases decoupling state from applications and building your microservices to be as stateless as possible results in maximally reliable, manageable systems. However, nearly every system that has any complexity has state in the system somewhere, from the records in a database to the index shards that serve results for a web search engine. At some point you have to have data stored somewhere. Integrating this data with containers and container orchestration solutions is often the most complicated aspect of building a distributed system. This complexity largely stems from the fact that the move to containerized architectures is also a move toward decoupled, immutable, and declarative application development. These patterns are relatively easy to apply to stateless web applications, but even “cloud-native” storage solutions like Cassandra or MongoDB involve some sort of manual or imperative steps to set up a reliable, replicated solution. “ Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 2908-2915). Kindle Edition.

  32. “A self-healing infrastructure is an inherently smart deployment that is automated to respond to known and common failures. Depending on the failure, the architecture is inherently resilient and takes appropriate measures to remediate the error.” Laszewski, Tom. Cloud Native Architectures: Design high-availability and cost-effective applications for the cloud (pp. 131-132). Packt Publishing. Kindle Edition.

  33. Container orchestration tools have emerged following the rise of containerization systems like Docker. Most of these run agents on a pool of container hosts and are able to automatically select hosts to run new container instances, replace failed instances, and scale numbers of instances up and down. Some tools also handle service discovery, network routing, storage, scheduled jobs, and other capabilities. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2063-2066). O'Reilly Media. Kindle Edition.