# Cloud Native Microservice Principles

## How cloud native organizations deliver software using pipelines

**Cloud Native Microservices Writ Large**

***P1*** *- If an organization or a set of organizations deliver* ***cloud native software**, the software’s features will be delivered as* ***microservice software (a microservice, set of microservices, or component of a microservice)**,* ***declarative APIs**, and* ***immutable infrastructure**.*

***P2*** *- If an organization develops microservice software, the features of the microservice software will be constrained by the* ***organization's business capabilities*** \[1],\[2],\[3],\[4] *and* ***structure*** \[5].

Conway’s law predicts that a system’s design will mirror the communication structure of the organization that created it. The various groups within an organization have different rates of change and different concerns with respect to business capability. The delivery of microservice software harnesses the existing group boundaries within an organization and works with, not against, the different rates of change and business capabilities residing within those boundaries.

***P3*** *- If an organization develops microservice software, the* ***responsibility*** *of the microservice software from* ***inception to delivery*** *will be with that organization.*

***P4*** *- If an organization has all of the responsibilities for the microservice software, that organization has the structure of a* ***product team*** \[6].

Microservice teams are product delivery teams. These teams are responsible for all parts of feature delivery, from requirements gathering to production deployment. This allows the team to deploy based on business capability and to be sensitive to those capabilities’ rate of change.

***P5*** *- If a product team has responsibility for microservice software, the* ***rate of change*** \[7],\[8],\[9],\[10],\[11],\[12]*,* ***cycle time*** \[13],\[14]*, and* ***pipeline*** \[15],\[16],\[17],\[18] *for that microservice software will be driven by that* *product* *team.*

Similar to how a building’s components have different **rates of change** (foundation, plumbing, exterior, etc.), software components and services also have different rates of change. When we split up services based on business capability, the **responsibility** for **changes** and the **actual rate of change** are **coupled**. At the same time, **conflicting** **agendas**, road maps, and concerns are **decoupled**.

Given this freedom over their rate of change, microservice teams can adopt techniques that are sensitive to **cycle time** and **MTTR** (mean time to recovery). This leads to software delivery methods and best practices that are compatible with deployment pipelines.
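As a rough illustration of the two metrics, the sketch below computes cycle time (from deciding a change is needed to seeing it in production) and MTTR (the average time from identifying an availability problem to resolving it), following the definitions quoted in the endnotes. The function names and the sample timestamps are hypothetical and exist only for this example.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import mean


def cycle_time(decided_at: datetime, in_production_at: datetime) -> timedelta:
    """Time from deciding a change is needed to seeing it in production use."""
    return in_production_at - decided_at


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - identified) across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    seconds = mean(
        (resolved - identified).total_seconds() for identified, resolved in incidents
    )
    return timedelta(seconds=seconds)


# Hypothetical sample data: one change and two incidents.
change = cycle_time(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 10, 15, 0))
incidents = [
    (datetime(2024, 1, 11, 10, 0), datetime(2024, 1, 11, 10, 30)),  # 30 minutes
    (datetime(2024, 1, 15, 22, 0), datetime(2024, 1, 15, 23, 30)),  # 90 minutes
]
mttr = mean_time_to_recovery(incidents)

print(change)  # 2 days, 6:00:00
print(mttr)    # 1:00:00
```

A team that owns its own pipeline can track both numbers per microservice, which is one way to see whether a change in practice actually shortens delivery or recovery.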

***P6*** *- If the microservice has dependencies, the* ***dependencies*** *constrain the* ***relationship*** *structure between* ***multiple organizations*** \[19],\[20]

Organizations and product teams that deliver software require varying levels of coordination with one another. Teams that have higher levels of coordination with other teams need to coordinate deployment pipelines and integration testing.

***P7*** *- If the microservice software has a dependency, it will be delivered from a* ***provider*** \[21] *to a* ***consumer*** *in the form of a* ***library*** \[22],\[23] *or a* ***service instance*** \[24],\[25],\[26] *via a pipeline.*

The rate of change between providers of microservice software and consumers of that software needs to be managed. When software is delivered as a library, it has a release number that can be referenced in the pipeline of the consumer. When software is delivered as a service instance, it can either be self-service (and therefore can be referenced via release number in the consumer’s pipeline) or it can be hosted. If microservice software is hosted, there needs to be a way to reference a test instance of that service in the consumer’s pipeline. The license registration process for service instances should be automated and flexible, and should not impede the development of a deployment pipeline.
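One way to picture the three delivery forms described above is as data that a consumer's pipeline can reference. The following is a minimal sketch, not a format used by any particular tool: the class names, the `pipeline_reference` helper, the reference string formats, and the example names and URL are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class LibraryDependency:
    name: str
    release: str  # release number the consumer's pipeline pins to


@dataclass(frozen=True)
class ServiceInstanceDependency:
    name: str
    release: Optional[str] = None            # set for self-service instances
    test_instance_url: Optional[str] = None  # set for hosted services

    def __post_init__(self) -> None:
        # A service instance is either self-service (release) or hosted (test URL).
        if (self.release is None) == (self.test_instance_url is None):
            raise ValueError("set exactly one of release or test_instance_url")


Dependency = Union[LibraryDependency, ServiceInstanceDependency]


def pipeline_reference(dep: Dependency) -> str:
    """Return the string a consumer pipeline would use to refer to the dependency."""
    if isinstance(dep, LibraryDependency):
        return f"{dep.name}=={dep.release}"
    if dep.release is not None:
        return f"{dep.name}@{dep.release}"
    return f"{dep.name} -> {dep.test_instance_url}"


# Hypothetical dependencies of a single consumer.
deps = [
    LibraryDependency("auth-client", "1.4.2"),
    ServiceInstanceDependency("database", release="15.3"),
    ServiceInstanceDependency("billing", test_instance_url="https://billing.test.invalid"),
]
for dep in deps:
    print(pipeline_reference(dep))
```

The point of the sketch is that every dependency resolves to something the pipeline can name explicitly: a release number where the consumer controls the version, and a test instance where the provider hosts the service.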

***P8*** *- If a microservice is deployed, the microservice will be* ***deployed*** *with* ***all of its library dependencies*** \[27],\[28],\[29],\[30],\[31],\[32]

The microservice has all of its dependencies deployed with it during the deployment phase of the pipeline. These dependencies are decoupled from the infrastructure environment (e.g. a node) so the rate of change of the environment is separate from the microservices it hosts.

***P9*** *- If a microservice is* ***deployed,*** \[33],\[34],\[35] *the pipeline artifacts and configuration for the microservice will be* ***versioned*** *and associated with the* ***stack*** \[36],\[37] *of infrastructure elements that were* ***provisioned*** *with* *it.*

The provisioning of infrastructure and the deployment of a microservice are related. The deployment of a microservice must know the version of the infrastructure that it was deployed and tested with.
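The association between a deployment and its infrastructure can be sketched as a small record that ties the artifact and configuration versions to the version of the stack they were tested with. This is only an illustration under assumed names: `StackVersion`, `DeploymentRecord`, `is_tested_combination`, and all version strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StackVersion:
    name: str     # e.g. the stack of infrastructure elements for an environment
    version: str  # version of the infrastructure definition that provisioned it


@dataclass(frozen=True)
class DeploymentRecord:
    service: str
    artifact_version: str
    config_version: str
    stack: StackVersion


def is_tested_combination(candidate: DeploymentRecord, tested: set[DeploymentRecord]) -> bool:
    """True if this exact artifact/config/stack combination passed the pipeline."""
    return candidate in tested


# Hypothetical records written by earlier pipeline stages.
stack_v7 = StackVersion("edge-cluster", "7")
tested = {DeploymentRecord("orders", "2.3.0", "41", stack_v7)}

same = DeploymentRecord("orders", "2.3.0", "41", stack_v7)
newer_stack = DeploymentRecord("orders", "2.3.0", "41", StackVersion("edge-cluster", "8"))

print(is_tested_combination(same, tested))         # True
print(is_tested_combination(newer_stack, tested))  # False
```

Because the records are frozen dataclasses, two records with the same versions compare equal, so a changed stack version alone is enough to mark a combination as untested.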

**Cloud Native in the Small: Microservices and Networking**

Cloud native network functions follow the same principles as cloud native microservices with few exceptions.

***P10*** *- If an organization or a set of organizations deliver* ***cloud native network functions**, the software’s features will be delivered as* ***microservice software (a microservice, set of microservices, or component of a microservice)**.*

***P11*** *- If a pipeline provisions network infrastructure\[38] (physical or virtual layer 1 and layer 2* \[39] *networking functions), it will be* ***provisioned*** \[40] *using* ***declarative configuration**.*

Network infrastructure (the platform that the cloud native network functions will be deployed into) is provisioned (instances are made available for use to consumers) using declarative configuration. Configuration should designate what the outcome is, while the tools that provision that network infrastructure should create that outcome.
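The split between configuration that states the outcome and tooling that produces it can be sketched as a function that compares desired state with actual state and derives the actions. This does not reflect the API of any real provisioning tool: the `plan` function, the resource names, and the `mtu` attribute are assumptions for illustration. Changed resources are replaced rather than edited in place, which is also an assumption chosen to match immutable provisioning.

```python
from __future__ import annotations


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Derive the actions needed to make `actual` match `desired`.

    The caller only declares the outcome (`desired`); the steps are computed here.
    """
    actions: list[tuple[str, str]] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            # Replace instead of modifying the running resource in place.
            actions.append(("replace", name))
    return actions


# Hypothetical layer 2 network definitions.
desired = {"vlan-100": {"mtu": 9000}, "vlan-200": {"mtu": 1500}}
actual = {"vlan-100": {"mtu": 1500}, "vlan-300": {"mtu": 1500}}

print(plan(desired, actual))
# [('replace', 'vlan-100'), ('create', 'vlan-200'), ('delete', 'vlan-300')]
```

Running `plan` again once the actual state matches the desired state yields an empty list, which is the property that makes declarative configuration safe to apply repeatedly from a pipeline.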

***P12*** *- If a pipeline* ***provisions network infrastructure**, it will be* ***provisioned*** *immutably.*

***P13*** *- If a* ***provider*** *for network infrastructure delivers* ***software*** *or* ***hardware**, it will be delivered to the* ***consumer*** *as a* ***library dependency*** *or* ***service instance*** *(whether self service or hosted).*

***P14*** *- If the provider of networking software delivers* ***cloud native service chains**, the service chains will be* ***composed*** *of* ***immutable microservices*** *with* ***declarative APIs*** \[41],\[42],\[43],\[44]*.*

Cloud native network functions can be composed with one another. In such a composition, each function’s configuration is not modified after deployment (immutable), and it describes the desired outcome for the network (declarative) rather than the steps needed to reach that outcome (imperative).
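A small sketch can make the immutable part of this concrete: a service chain is a fixed composition of function definitions, and a change produces a new definition instead of altering a deployed one. The `NetworkFunction` and `ServiceChain` classes and the image names are hypothetical and only stand in for whatever declarative format an operator actually uses.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class NetworkFunction:
    name: str
    image: str  # versioned artifact the function is deployed from


@dataclass(frozen=True)
class ServiceChain:
    name: str
    functions: tuple[NetworkFunction, ...]  # ordered, fixed composition


firewall = NetworkFunction("firewall", "example/firewall:1.0")
nat = NetworkFunction("nat", "example/nat:2.1")
chain_v1 = ServiceChain("edge", (firewall, nat))

# Modifying a deployed definition in place is rejected.
try:
    firewall.image = "example/firewall:1.1"
    mutated = True
except FrozenInstanceError:
    mutated = False

# A change is expressed as a new definition; the old one is left untouched.
firewall_v2 = replace(firewall, image="example/firewall:1.1")
chain_v2 = replace(chain_v1, functions=(firewall_v2, nat))

print(mutated)                       # False
print(chain_v1.functions[0].image)   # example/firewall:1.0
print(chain_v2.functions[0].image)   # example/firewall:1.1
```

Keeping both `chain_v1` and `chain_v2` as separate values mirrors how a pipeline would roll forward to a new chain or roll back to the previous one without editing anything that is already running.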

***P15*** *- If an* ***application developer*** *consumes a cloud native networking function, it will be* ***consumed*** *using a declarative API.*

A cloud native network function exposes its configuration using a declarative API, such as a YAML file. An application developer can reference cloud native functions at a higher level, using elements that were provided by operators.

***P16*** *- If an* ***operator*** *combines cloud native network functions into a service chain, they will be* ***combined*** ***using*** *a declarative API and will be* ***exposed*** *as a declarative API.*

Operators compose fine-grained cloud native functions and provide them as a coarse-grained element to consumers (e.g. application developers) via a declarative API.

***P17*** *- If a* ***cloud native network function developer*** *creates networking software, it will* ***expose*** *a declarative API.*

The cloud native network functions themselves are developed in such a way as to expose a way to configure them declaratively.

**LICENSE**

This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

**LIST OF CONTRIBUTORS**

If you would like credit for helping with these documents (for either this document or any of the other four documents linked above), please add your name to the list of contributors.

W Watson Vulk Coop

Taylor Carpenter Vulk Coop

Denver Williams Vulk Coop

Jeffrey Saelens Charter Communications

## Endnotes

1. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'Reilly, 2015, p. 16. “**Microservices represent the decomposition of monolithic business systems into independently deployable services that do “one thing well.”** That one thing usually represents a business capability, or the smallest, “atomic” unit of service that delivers business value.”
2. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'Reilly, 2015, p. 16. “As we **decouple** the **business domain** into independently deployable **bounded contexts** of **capabilities**, we also **decouple** the associated **change** **cycles**. As long as the changes are restricted to a single bounded context, and the service continues to **fulfill** its existing **contracts**, those changes can be made and **deployed** **independent** of any **coordination** with the rest of the business. The result is enablement of **more** frequent and rapid **deployments**, allowing for a continuous flow of value.”
3. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'Reilly, 2015, pp. 16–17. “Development can be accelerated by scaling the development organization itself. It’s very **difficult** to build software faster by **adding more people** due to the overhead of **communication** and coordination. Fred Brooks taught us years ago that **adding** more **people** to a **late** software project **makes it later**. However, rather than placing all of the developers in a single sandbox, we can create **parallel work streams** by building more **sandboxes** through **bounded contexts**.”
4. Stine, Matt. Migrating to Cloud-Native Application Architecture, O'Reilly, 2015, p. 17. “The new developers that we add to each sandbox can **ramp** **up** and become productive more **rapidly** due to the **reduced** **cognitive** **load** of learning the business domain and the existing code, and building relationships within a **smaller** **team**.”
5. **Conway’s** **law** describes the relationship between the **structure** of an **organization** and its systems: Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a **design** whose **structure** is a **copy** of the **organization’s communication structure**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4914-4922). O'Reilly Media. Kindle Edition.
6. **Cross-functional teams** put **all** of the **people** **responsible** for building and running an aspect of a system **together**. This may include testers, project managers, analysts, and a commercial or product owner, as well as different types of engineers. These **teams** should be **small**; Amazon uses the term “two-pizza teams,” meaning the team is small enough that two pizzas is enough to feed everyone. The advantage of this approach is that **people** are dedicated to a **single**, **focused** **service** or **small set of services**, avoiding the need to multitask between projects. Teams formed of a consistent set of people work far more effectively than those whose membership changes from day to day. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 6457-6462). O'Reilly Media. Kindle Edition.
7. The peculiarity of buildings that turned Architectural Digest into a contradiction of itself is that different parts of **buildings change at different rates**. Brand, Stewart. How Buildings Learn (p. 21). Penguin Publishing Group. Kindle Edition.
8. “Our basic argument is that there isn’t such a thing as a building,” says Duffy. “A **building** properly conceived is **several** **layers** of **longevity** of built **components**.” He distinguishes four layers, which he calls Shell, Services, Scenery, and Set. **Shell** is the **structure**, which lasts the **lifetime** of the building (fifty years in Britain, closer to thirty-five in North America). **Services** are the cabling, plumbing, air conditioning, and elevators (“lifts”), which have to be replaced every **fifteen years** or so. **Scenery** is the layout of partitions, dropped ceilings, etc., which **changes** every **five to seven years**. **Set** is the shifting of furniture by the occupants, often a matter of **months or weeks**. Brand, Stewart. How Buildings Learn (pp. 21-22). Penguin Publishing Group. Kindle Edition.
9. I’ve taken the liberty of expanding Duffy’s “four S’s”—which are oriented toward interior work in commercial buildings—into a slightly revised, general-purpose **“six S’s**”: • **SITE** - This is the geographical setting, the urban location, and the legally defined lot, whose boundaries and context outlast **generations** of ephemeral buildings. “Site is eternal,” Duffy agrees. • **STRUCTURE** - The foundation and load-bearing elements are perilous and expensive to change, so people don’t. These are the building. Structural life ranges from **30 to 300 years** (but few buildings make it past 60, for other reasons). • **SKIN** - Exterior surfaces now change every **20 years** or so, to keep up with fashion or technology, or for wholesale repair. Recent focus on energy costs has led to re-engineered Skins that are air-tight and better-insulated. • **SERVICES** - These are the working guts of a building: communications wiring, electrical wiring, plumbing, sprinkler system, HVAC (heating, ventilating, and air conditioning), and moving parts like elevators and escalators. They wear out or obsolesce every **7 to 15 years**. Many buildings are demolished early if their outdated systems are too deeply embedded to replace easily. • **SPACE PLAN** - The interior layout—where walls, ceilings, floors, and doors go. Turbulent commercial space can change every **3 years** or so; exceptionally quiet homes might wait **30 years**. • **STUFF** - Chairs, desks, phones, pictures; kitchen appliances, lamps, hair brushes; all the things that twitch around **daily** to monthly. Furniture is called mobilia in Italian for good reason. Brand, Stewart. How Buildings Learn (pp. 24-25). Penguin Publishing Group. Kindle Edition.
10. Frank Duffy: “**Thinking** about buildings in this **time-laden** way is very **practical**. As a **designer** you **avoid** such classic mistakes as **solving a five-minute problem with a fifty-year solution**, or vice versa. It **legitimizes** the existence of **different** **design** **skills**—architects, service engineers, space planners, interior designers—all with their **different agendas** defined by this **time** **scale**. It means you **invent** building **forms** which are very **adaptive**.” Brand, Stewart. How Buildings Learn (p. 32). Penguin Publishing Group. Kindle Edition.
11. The **layering** also **defines** how a **building** **relates** to **people**. **Organizational** **levels** of **responsibility** **match** the **pace** **levels**. The building interacts with individuals at the level of Stuff; with the tenant organization (or family) at the Space plan level; with the landlord via the Services (and slower levels) which must be maintained; with the public via the Skin and entry; and with the whole community through city or county decisions about the footprint and volume of the Structure and restrictions on the Site. The community does not tell you where to put your desk or your bed; you do not tell the community where the building will go on the Site (unless you’re way out in the country). Brand, Stewart. How Buildings Learn (pp. 32-33). Penguin Publishing Group. Kindle Edition.
12. O’Neill’s A **Hierarchical** **Concept** of **Ecosystems**. O’Neill and his co-authors noted that **ecosystems** could be **better understood** by **observing** the **rates of change** of different **components**. Hummingbirds and flowers are quick, redwood trees slow, and whole redwood forests even slower. Most **interaction** is **within** the same **pace** **level**—hummingbirds and flowers pay attention to each other, oblivious to redwoods, who are oblivious to them. Meanwhile the forest is attentive to climate change but not to the hasty fate of individual trees. Brand, Stewart. How Buildings Learn (p. 33). Penguin Publishing Group. Kindle Edition.
13. The most effective **measurement** of a change management **pipeline** is the **cycle** **time**. **Cycle** **time** is the **time** between **deciding** on the **need** for a **change** to seeing that **change** in **production** use. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4867-4868). O'Reilly Media. Kindle Edition.
14. **Metrics** are best used by the team to help itself, and should be continually reviewed to decide whether they are still providing value. Some common metrics used by infrastructure teams include: **Cycle time** The time taken from a **need** being **identified** to **fulfilling** it. This is a measure of the efficiency and speed of change management. Cycle time is discussed in more detail later in this chapter. **Mean time to recover (MTTR)** The **time** taken from an **availability** **problem** (which includes critically degraded performance or functionality) being identified to a **resolution**, even where it’s a workaround. This is a measure of the efficiency and speed of problem resolution. **Mean time between failures (MTBF)** The **time** **between** critical **availability** **issues**. This is a measure of the stability of the system, and the quality of the change management process. Although it’s a valuable metric, **over-optimizing for MTBF** is a common cause of **poor performance** on **other** **metrics**. **Availability** The percentage of time that the **system** is **available**, usually excluding time the system is offline for planned maintenance. This is another measurement of system stability. It is often used as an SLA in service contracts. True **availability** The percentage of time that the system is **available**, **not excluding planned maintenance.**
15. **Continuous** **delivery** for software is implemented using a **deployment** **pipeline**. A deployment pipeline is an **automated** **manifestation** of a **release** **process**. It **builds** the application code, and **deploys** and **tests** it on a **series** of **environments** **before** allowing it to be deployed to **production**. The **same** concept is applied to **infrastructure** changes. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3823-3826). O'Reilly Media. Kindle Edition.
16. The **point** of CD and the software deployment **pipeline** is to allow **changes** to be delivered in a **continuous** **flow, rather than in large batches**. Changes can be validated more thoroughly, not only because they are applied with an automated process, but also because **changes** are **tested** when they are **small**, and because they are **tested** **immediately** after being committed. The result, when done well, is that **changes** can be **made** **more** **frequently**, more **rapidly**, and more **reliably**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4525-4529). O'Reilly Media. Kindle Edition.
17. Teams who embrace the **pipeline** as the way to manage changes to their infrastructure find a number of **benefits**: Their infrastructure management tooling and **codebase** is **always** **production** **ready**. There is **never** a situation where **extra** **work** is needed (e.g., **merging, regression testing, and “hardening”**) to take work **live**. Delivering changes is nearly painless. Once a change has passed the technical validation stages of the pipeline, it **shouldn’t need technical attention** to carry through to production unless there is a problem. There is no need to make technical decisions about how to apply a change to production, as those decisions have been made, implemented, and tested in earlier stages. It’s easier to make changes through the pipeline than any other way. **Hacking** a change **manually** other than to bring up a system that is down is **more work,** and **scarier**, than just pushing it through the **pipeline**. **Compliance** and **governance** are easy. The **scripts**, tools, and **configuration** for making changes are **transparent** to **reviewers**. Logs can be **audited** to prove what changes were made, when, and by whom. With an automated change management pipeline, a team can **prove** what **process** was followed for each and every **change**. This tends to be **stronger** than taking someone’s word that **documented manual processes** are always followed. Change management processes can be more lightweight. People who might otherwise need to discuss and inspect each change can build their requirements into the automated tooling and tests. They can periodically review the pipeline implementation and logs, and make improvements as needed. Their time and attention goes to the process and tooling, rather than inspecting each change one by one. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4551-4563). O'Reilly Media. Kindle Edition.
18. The **design** of your change management **pipelines** is a **manifestation** of your system’s **architecture**. Both of these are a **manifestation** of your **team structure.** Conway’s law describes the relationship between the structure of an organization and its systems: Any **organization** that **designs a system** (defined more broadly here than just information systems) will inevitably **produce a design** whose **structure** is a **copy** of the **organization’s communication structure.** Organizations can take advantage of this to shape their teams, systems, and pipeline to optimize for the outcomes they want. This is sometimes called the **Inverse Conway Maneuver**. Ensure that the people needed to deliver a given change through to production are all a part of the same team. This may involve restructuring the team but may also be done by changing the system’s design. It can often be achieved by changing the service model, which is the goal of **self-service** **systems**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4914-4922). O'Reilly Media. Kindle Edition.
19. ***Integration Models*** The design and implementation of **pipelines** for testing how **systems** and infrastructure elements **integrate** **depends** on the **relationships** between them, and the relationships between the **teams** responsible for them. There are several typical situations: ***Single team*** One **team** owns all of the elements of the system and is **fully responsible** for managing changes to them. In this case, a single **pipeline, with fan-in** as needed, is often sufficient. ***Group of teams*** A group of teams works together on a **single system** with **multiple services** and/or infrastructure elements. Different teams own different parts of the system, which all integrate together. In this case, a **single fan-in pipeline** may work up to a point, but as the size of the group and its system **grows**, **decoupling** may become **necessary**. ***Separate teams with high coordination*** Each team (which may itself be a group of teams) **owns a system**, which **integrates** with **systems** owned by **other teams.** A given system may integrate with multiple systems. Each **team** will have its **own pipeline** and manage its releases independently. But they may have a close enough relationship that one **team** is willing to **customize** its **systems** and releases to **support** another **team’s requirements.** This is often seen with different groups within a large company and with **close vendor** relationships. ***Separate teams with low coordination*** As with the previous situation, except one of the teams is a **vendor** with **many other customers.** Their release process is designed to meet the requirements of many teams, with little or **no** **customizations** to the requirements of individual customer teams. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4892-4907). O'Reilly Media. Kindle Edition.
20. **Practice: Decouple Pipelines** When **separate** **teams** build **different** **components** of a system, such as **microservices**, joining pipeline branches for these components together with the **fan-in pattern** can create a **bottleneck**. The teams need to spend more **effort** on **coordinating** the way they handle **releases**, testing, and fixing. This may be fine for a small number of teams who work closely together, but the overhead grows exponentially as the number of teams grows. Decoupling pipelines involves structuring the **pipelines** so that a **change** to each **component** can be **released** **independently**. The components may still have dependencies between each other, so they may need integration testing. But rather than requiring all of the components to be released to production together in a “big bang” deployment, a **change** to **one component** could go ahead to **production** before changes to the second component are released. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4883-4889). O'Reilly Media. Kindle Edition.
21. Given **two integrated components**, one **provides** a service, and the other **consumes** it. The **provider** component needs to **test** that it is providing the service correctly for its consumers. And the **consumer** needs to **test** that it is consuming the provider service correctly. For example, one team may manage a monitoring service, which is used by multiple application teams. The monitoring team is the provider, and the application teams are the consumers. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4930-4933). O'Reilly Media. Kindle Edition.
22. **Pattern: Library Dependency** One way that one component can provide a capability to another is to work like a **library**. The **consumer** pulls a **version** of the **provider** and **incorporates** it into its own **artifact**, usually in the **build** stage of a **pipeline**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4937-4939). O'Reilly Media. Kindle Edition.
23. The important characteristic is that the **library** component is **versioned**, and the **consumer** can **choose** which **version** to use. If a newer version of the library is released, the consumer may opt to immediately pull it in, and then run tests on it. However, it has the option to “**pin**” to an **older version** of the library. This gives the **consumer** team the **flexibility** to release changes even if they **haven’t yet incorporated new,** incompatible changes to their provider library. But it creates the **risk** that important changes, such as **security** patches, **aren’t integrated** in a **timely** way. This is a major source of security vulnerability in IT systems. For the **provider**, this pattern gives **freedom** to **release** new changes without having to **wait** for all **consumer** teams to update their components. But it can result in having **many** different **versions** of the component in **production**, which increases the time and hassle of support. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4941-4947). O'Reilly Media. Kindle Edition.
24. **Pattern: Self-Provisioned Service Instance** The library pattern can be adapted for full-blown services. A well-known example of this is AWS’s Relational Database Service, or RDS, offered by AWS. A team can provision complete working database instances for itself, which it can use in a pipeline for a component that uses a database. As a provider, Amazon releases new database versions, while still making older versions available as “previous generation DB instances”. This has the **same** effect as the **library pattern**, in that the **provider** can **release new versions** **without waiting** for **consumer teams** to **upgrade** their own components. Being a **service** **rather** than a **library**, the provider is able to **transparently** **release** **minor updates** to the service. Amazon can apply **security** patches to its RDS offering, and **new instances** created by consumer teams will automatically **use the updated version**. The key is for the **provider** to keep close track of the **interface** **contract**, to make sure the service behaves as expected after updates have been applied. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4963-4973).
25. Prefer Products with **Cloud-Compatible Licensing** Licensing can make dynamic infrastructure difficult with some products. Some examples of **licensing** approaches that work **poorly** include: A **manual** process to **register** each new instance, agent, node, etc., for licensing. Clearly, this defeats automated provisioning. If a product’s license does require registering infrastructure elements, there needs to be an **automatable** **process** for adding and removing them. **Inflexible licensing periods.** Some products require customers to buy a **fixed** set of **licenses** for a **long** **period**. For example, a monitoring tool may have licensing based on the maximum number of nodes that can be monitored. The licenses may need to be purchased on a **monthly** cycle. This forces the customer to pay for the maximum number of nodes they might use during a given month, even when they only run that number of nodes for a fraction of the time. This **cloud-unfriendly** **pricing** model **discourages** customers from taking advantage of the ability to **scale capacity up and down with demand.** Vendors pricing for cloud **charge by the hour at most.** Heavyweight purchasing process to increase capacity. This is closely related to the licensing period. When an organization is hit with an unexpected surge in business, they **shouldn’t** **need** to spend **days or weeks** to **purchase the extra capacity** they need to meet the demand. It’s common for vendors to have limits in place to protect customers against accidentally over-provisioning, but it should be possible to raise these limits quickly. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1880-1892). O'Reilly Media. Kindle Edition.
26. **Providing Test Instances of a Service to Consumers** The **provider** of a **hosted** service needs to **provide** **support** for **consumers** to **develop** and **test** their **integration** with the **service**. This is useful to consumers for a number of purposes: To learn how to correctly integrate to the service. To **test** that integration still works after **changes** to the **consumer** system. To **test** that the consumer system still works **after** **changes** to the **provider**. To **test** and develop against **new** **provider** **functionality** **before** it is **released**. To **reproduce** **production** **issues** for troubleshooting. To run **demonstrations** of the consumer system without affecting production data. An effective way for a provider to support these is to provide self-provisioned service instances. If **consumers** can **create** and **configure** **instances** **on-demand**, then they can easily handle their own testing and demonstration needs. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4992-4999). O'Reilly Media. Kindle Edition.
27. The benefits of **decoupling** **runtime** **requirements** from the **host** **system** are particularly powerful for infrastructure management. It creates a clean **separation** of concerns between **infrastructure** and **applications**. The host system **only** needs to have the **container** **runtime** **software** installed, and then it can run nearly any container image. Applications, services, and jobs are packaged into containers along with all of their dependencies \[...]. These dependencies can include operating system packages, language runtimes, libraries, and system files. **Different** **containers** may have different, even **conflicting** **dependencies**, but still run on the **same** **host** without issues. **Changes** to the **dependencies** can be made **without** any **changes** to the **host** system. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1652-1658). O'Reilly Media. Kindle Edition.
28. The important thing is **how the artifact is treated**, conceptually. A **configuration** **artifact** is an **atomic**, **versioned** collection of materials that **provision and/or configure a system component**. An **artifact** is: ***Atomic*** A given **set** of **materials** is **assembled**, **tested**, and **applied** together as a unit. ***Portable*** It can be **progressed** through the **pipeline**, and different versions can be applied to **different environments** or instances. It can be reliably and **repeatably applied to any environment,** and so any given environment has an unambiguous version of the component. ***Complete*** A given artifact should have **everything needed** to **provision** or **configure** the relevant **component**. It should **not assume** that **previous versions** of the **component artifacts** have been **applied** **before**. ***Consistent*** **Applying** the artifact to any two component instances should have the **same results**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 4669-4678). O'Reilly Media. Kindle Edition.
29. The best way to think of a **container** is as a **method** to **package** a **service**, application, or job. It’s an RPM on steroids, taking the application and adding in its dependencies, as well as providing a standard way for its **host** system to **manage** its **runtime** environment. Rather than a single container running multiple processes, aim for **multiple** **containers**, each running **one** **process**. These processes then become **independent**, **loosely** **coupled** entities. This makes containers a nice match for microservice application architectures. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1708-1711). O'Reilly Media. Kindle Edition.
30. The **benefits** of **containerization** include: **Decoupling** the **runtime** **requirements** of specific applications from the **host** **server** that the container runs on. Repeatably create **consistent** **runtime** **environments** by having a **container** **image** that can be distributed and run on **any** **host** **server** that supports the runtime. Defining **containers** as **code** (e.g., in a **Dockerfile**) that can be **managed** in a **VCS**, used to trigger **automated** **testing**, and generally having all of the characteristics for infrastructure as code. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1633-1637). O'Reilly Media. Kindle Edition.

31. The **immutable server pattern** mentioned in “Server Change Management Models” **doesn’t make configuration updates to existing servers**. Instead, changes are made by **building a new server** with the new configuration. With **immutable servers**, **configuration** is **usually** **baked** into the **server template**. When the configuration is updated, a new template is **packaged**. **New instances** of **existing servers** are built from the **new template** and used to **replace** the **older servers**. This approach **treats** **server templates** like **software artifacts**. Each build is versioned and tested before being deployed for production use. This creates a high level of confidence in the consistency of the server configuration between testing and production. **Advocates** of **immutable servers** view making a **change** to the **configuration** of a **production** **server** as **bad** practice, no better than modifying the source code of software directly on a production server. Immutable servers can also **simplify configuration** management, by **reducing** the area of the server that **needs** to be managed by **definition files**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2239-2247). O'Reilly Media. Kindle Edition.

1. Using the term “**immutable**” to describe this pattern can be misleading. “Immutable” means that a thing can’t be changed, so a **truly immutable server would be useless**. As soon as a server boots, its **runtime** **state** **changes**: **processes** run, entries are written to logfiles, and **application data** is added, updated, and removed. It’s more **useful** to think of the term “**immutable**” as applying to the **server’s configuration,** rather than to the server as a whole. This creates a clear **line** between **configuration** and **data**. It forces teams to explicitly **define** which elements of a server they will **manage** deterministically as **configuration** and which elements will be treated as **data**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 2918-2926). O'Reilly Media. Kindle Edition.
2. **Blue-green** **replacement** is the most straightforward pattern to replace an infrastructure element **without** **downtime**. This is the blue-green deployment pattern for software applied to infrastructure. It requires running two instances of the affected infrastructure, **keeping** one of them **live** at any point in time. Changes and **upgrades** are made to the **offline** **instance**, which can be **thoroughly** **tested** before **switching** usage over to it. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5681-5685). O'Reilly Media. Kindle Edition.
3. **Phoenix** **replacement** is the natural progression from blue-green using dynamic infrastructure. Rather than keeping an idle instance around between changes, a **new** **instance** can be created each time a **change** is needed. As with blue-green, the change is **tested** on the new instance before putting it into use. The previous instance can be **kept** **up** for a **short** **time**, until the new instance has been **proven** in use. But then the **previous** **instance** is **destroyed**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5694-5697). O'Reilly Media. Kindle Edition.
4. The **canary** **pattern** involves deploying the **new** **version** of an element alongside the old one, and then **routing** some **portion** of usage to the new elements. For example, with version A of an application running on 20 servers, version B may be deployed to two servers. A subset of traffic, perhaps flagged by IP address or by randomly setting a cookie, is sent to the servers for version B. The behavior, performance, and resource usage of the new element can be monitored to validate that it’s ready for wider use. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 5724-5728). O'Reilly Media. Kindle Edition.
5. A **stack** is a **collection** of **infrastructure elements** that are **defined** as a **unit** (the inspiration for choosing the term stack comes mainly from the term’s use by AWS CloudFormation). A stack can be any size. It could be a single server. It could be a pool of servers with their networking and storage. It could be all of the servers and other infrastructure involved in a given application. Or it could be everything in an entire data center. What makes a set of infrastructure elements a **stack** isn’t the size, but **whether it’s defined and changed as a unit.** The concept of a stack **hasn’t** been commonly used with **manually managed infrastructures.** Elements are added organically, and **networking boundaries** are naturally used to think about infrastructure groupings. But automation tools force more explicit groupings of infrastructure elements. It’s certainly possible to put everything into one large group. And it’s also possible to structure stacks by following **network boundaries.** But these **aren’t the only ways to organize stacks**. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 3227-3230). O'Reilly Media. Kindle Edition.
6. “I Heard Calico Is Suggesting Layer 2: I Thought You Were Layer 3! What’s Happening?” Project Calico Documentation, docs.projectcalico.org/v3.5/usage/troubleshooting/faq#i-heard-calico-is-suggesting-layer-2-i-thought-you-were-layer-3-whats-happening. It’s important to distinguish what Calico provides to the workloads hosted in a data center (a purely layer 3 network) with what the Calico project recommends operators use to build their **underlying network fabric**. Calico’s core principle is that **applications** and **workloads** overwhelmingly **need only IP connectivity** to communicate. For this reason we build an **IP-forwarded network** to **connect** the tenant **applications** and **workloads** to **each other**, and the broader world. However, **the underlying physical fabric obviously needs to be set up too**. Here, Calico has discussed how both a layer 2 (see here) or a layer 3 (see here) fabric could be integrated with Calico. This is one of the great strengths of the Calico model: it allows the **infrastructure** to be **decoupled** from what we show to the **tenant applications** and **workloads**. We have some thoughts on different interconnect approaches (as noted above), but just because we say that there are **layer 2** and **layer 3** ways of **building the fabric**, and that those decisions may have an impact on **route scale**, does not mean that Calico is “going back to Ethernet” or that we’re recommending layer 2 for tenant applications. In all cases we forward on IP packets, no matter what architecture is used to build the fabric.
7. “Concerns over Ethernet at scale” Calico over an Ethernet interconnect fabric, <https://docs.projectcalico.org/v3.5/reference/private-cloud/l2-interconnect-fabric>. It has been acknowledged by the industry for years that, beyond a certain size, **classical Ethernet networks** are **unsuitable** for **production** deployment. Although there have been [multiple](https://en.wikipedia.org/wiki/Provider_Backbone_Bridge_Traffic_Engineering) [attempts](https://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_14-3/143_trill.html) [to address](https://en.wikipedia.org/wiki/Virtual_Private_LAN_Service) these issues, the scale-out networking community has, largely abandoned Ethernet for anything other than providing physical point-to-point links in the networking fabric. The principal reasons for **Ethernet** **failures** at **large scale** are:
   1. **Large numbers of end points** [1](https://docs.projectcalico.org/v3.5/reference/private-cloud/l2-interconnect-fabric#fn:1). Each **switch** in an Ethernet network must **learn** the **path** to **all Ethernet endpoints** that are connected to the Ethernet network. Learning this amount of state can become a **substantial** task when we are talking about **hundreds of thousands of end points**.
   2. **High rate** of **churn** or change in the network. With that many end points, most of them being **ephemeral** (such as virtual machines or containers), there is a large amount of churn in the network. That load of **re-learning** paths can be a **substantial** burden on the **control plane** processor of **most Ethernet switches**.
   3. High volumes of **broadcast** **traffic**. As each node on the **Ethernet** network **must** use **Broadcast packets** to **locate peers**, and many use broadcast for other purposes, the resultant packet replication to each and every end point can lead to broadcast storms in large Ethernet networks, effectively consuming most, if not all resources in the network and the attached end points.
   4. Spanning tree. **Spanning tree** is the protocol used to **keep** an Ethernet network **from** forming **loops**. The protocol was designed in the era of smaller, simpler networks, and it has not aged well. As the number of links and interconnects in an Ethernet network goes up, many implementations of spanning tree become more **fragile**. Unfortunately, **when** spanning tree **fails** in an Ethernet network, the effect is a **catastrophic** loop or partition (or both) in the network, and, in most cases, difficult to troubleshoot or resolve.

   While many of these issues are **crippling** at **VM scale** (tens of thousands of end points that live for hours, days, weeks), they will be absolutely **lethal** at **container** **scale** (**hundreds of thousands of end points that live for seconds, minutes, days**).
8. “**Provisioning”** is a term that can be used to mean somewhat different things \[...] provisioning is used to mean **making** an **infrastructure element** such as a **server** or **network device** **ready for use**. Depending on what is being provisioned, this can involve: Assigning **resources** to the element. **Instantiating** the element. **Installing** software onto the element. **Configuring** the element. **Registering** the element with infrastructure services. At the end of the provisioning process, the element is **fully ready for use.** Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1206-1208). O'Reilly Media. Kindle Edition.
9. “**Declarative** **configuration** is **different** from **imperative** **configuration**, where you simply take a series of actions (e.g., apt-get install foo) to modify the world. Years of production experience have taught us that maintaining a written **record** of the system’s **desired** **state** leads to a more **manageable**, **reliable** system. Declarative configuration enables numerous **advantages**, including **code** **review** for configurations as well as **documenting** the **current** **state** of the world for distributed teams. Additionally, it is the **basis** for all of the **self-healing** behaviors in Kubernetes that keep applications running **without user action.**” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 892-896). Kindle Edition.
10. “The **combination** of **declarative** **state** stored in a **version** control system and Kubernetes’s ability to make **reality** **match** this declarative **state** makes **rollback** of a change trivially **easy**. It is simply restating the previous declarative state of the system. With **imperative** **systems** this is usually **impossible**, since while the **imperative** **instructions** describe how to get you from point A to point B, they **rarely** **include** the **reverse** instructions that can get you back.” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 186-190). Kindle Edition.
11. “Because it describes the state of the world, **declarative** **configuration** does **not** have to be **executed** to be **understood**. Its impact is concretely declared. Since the effects of declarative configuration can be understood before they are executed, declarative configuration is far **less error-prone**. Further, the traditional tools of software development, such as **source control, code review, and unit testing**, can be used in **declarative** configuration in ways that are **impossible** for **imperative** instructions.” Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 183-186). Kindle Edition.
12. So **declarative** definitions lend themselves to running **idempotently**. You can **safely** **apply** your **definitions** **over and over again**, without thinking about it too much. If something is changed to a system outside of the tool, **applying** the definition will bring it back into line, **eliminating** sources of **configuration** **drift**. When you need to make a change, you simply modify the definition, and then let the tooling work out what to do. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud (Kindle Locations 1275-1278). O'Reilly Media. Kindle Edition.
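
The notes on declarative configuration describe definitions that can be applied repeatedly, that correct drift, and that make rollback a matter of restating an earlier desired state. The following is a minimal Python sketch of that idea, not any tool's actual implementation. The function name `apply_definition` and the dictionary-based "system state" are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def apply_definition(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Converge `actual` toward `desired` in place and return the changes made.

    Hypothetical illustration: a real tool would inspect and modify servers,
    not a dictionary.
    """
    changes: List[str] = []
    # Set every key the definition declares, but only if it differs.
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value
            changes.append(f"set {key}={value}")
    # Remove anything the definition does not declare.
    for key in list(actual):
        if key not in desired:
            del actual[key]
            changes.append(f"remove {key}")
    return changes


if __name__ == "__main__":
    desired = {"nginx": "1.25", "port": "8080"}   # assumed example values
    actual = {"nginx": "1.24", "debug": "on"}

    print(apply_definition(desired, actual))  # first run makes changes
    print(apply_definition(desired, actual))  # second run finds nothing to do

    actual["port"] = "9090"                   # simulate an out-of-band change
    print(apply_definition(desired, actual))  # drift is corrected
```

Because the function compares desired and actual state instead of replaying a list of actions, running it a second time changes nothing, and running it after an out-of-band edit restores the declared value. Rolling back under this model would mean calling `apply_definition` with the previous version of `desired`.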

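The canary pattern quoted in the notes above routes a subset of traffic, "perhaps flagged by IP address", to the new version. One way this could be sketched in Python is to hash the client address into a bucket and send a fixed share of buckets to version B. The function name `pick_version`, the 10 percent share, and the sample address are assumptions made for illustration; they do not come from the source.

```python
import hashlib


def pick_version(client_ip: str, canary_percent: int) -> str:
    """Return "B" for roughly `canary_percent` percent of client IPs, else "A".

    Hashing keeps the decision stable, so a given client always lands on the
    same version while the canary share is unchanged.
    """
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "B" if bucket < canary_percent else "A"


if __name__ == "__main__":
    # Assumed example: send about 10 percent of clients to version B.
    print(pick_version("203.0.113.7", 10))
```

Raising `canary_percent` step by step widens the rollout, and setting it to 0 sends all traffic back to version A.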