Simplify Deployment with Infrastructure Manifest (Part 1)

This is Part 1 in a short series about using a Manifest of your infrastructure for automation.

Part 1: Build the Infrastructure Manifest
Part 2: Manifest-Based Application Deployment

At the last few DevOps conferences I’ve attended, the lunch-time discussions have revolved around tying your test, build, and deploy workflows to your cloud infrastructure. A lot of people are trying to bend tools like Chef for this purpose and are generally unhappy with the result.

After a lot of trial and error, the strategy that we currently use at Signal is to

completely specify your infrastructure definitions in a simple JSON manifest; and
use your Cloud API(s) to transform this functional definition into a working Manifest which details all hosts in your infrastructure and which applications they run

Once we have the Manifest, we can build numerous tools on top of it to help automate our infrastructure. In addition to deployment, we also use this for configuration management, load balancing, and even DNS. Because it uses your Cloud API to know about the running hosts, you could even use it for service discovery in an autoscaled environment.

Here in Part 1, we’ll build the Infrastructure Manifest. In Part 2 we’ll show how we can use the Manifest to Simplify Deployment.

The important thing to note here is that this process can work regardless of which cloud(s) you’re using, what tools you fancy, or what your favorite color is.

The Manifest

What does this Manifest look like in practice? Well, it depends on how your system is organized.

Like many companies, we organize our deployments into environments and regions within an environment. This allows us to have isolated environments for development, staging, and production, as well as geographically distributed regions within each environment. Within each region, you may also have disparate availability zones for fault-tolerance and failover, if your cloud supports that sort of thing.

In this case, your Manifest may look like this:

{
  "prod": {
    "us-east-1": {
      "appserve01ea1": {
        "applications": [
          "appserve"
        ],
        "zone": "us-east-1a",
        "fqdn": "ec2-1-2-3-4.compute-1.amazonaws.com",
        "private ip": "10.9.8.7",
        "public ip": "1.2.3.4",
        "id": "i-a1234bc5"
      },
      "appserve02ea1": {
        "applications": [
          "appserve"
        ],
        "zone": "us-east-1b",
        "fqdn": "ec2-1-2-3-5.compute-1.amazonaws.com",
        "private ip": "10.9.8.6",
        "public ip": "1.2.3.5",
        "id": "i-b5678cd9"
      }
    }
  },
  "staging": {
    "rs-us-1": {
      "appserve01st1": {
        "applications": [
          "appserve"
        ],
        "zone": "rs-us-1a",
        "fqdn": "10.9.8.7",
        "private ip": "10.9.8.7",
        "public ip": "5.6.7.8",
        "id": "21234567"
      },
      "appserve02st1": {
        "applications": [
          "appserve"
        ],
        "zone": "rs-us-1a",
        "fqdn": "10.9.8.6",
        "private ip": "10.9.8.6",
        "public ip": "5.6.7.9",
        "id": "21234568"
      }
    }
  }
}

We run our environments in different clouds, from a private OpenStack cluster to Rackspace to Amazon. The important thing is for this JSON representation to be consistent regardless of which cloud each environment and region are located in. The other important thing is that the Manifest is dynamic. When your infrastructure changes, the Manifest should immediately and automatically reflect the changes. (Okay, the Cloud APIs are slow, so go ahead and cache it, but not for too long!)
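Because the Manifest is plain nested JSON, tools can consume it with a few lines of code. As a minimal sketch (the helper name hosts_running and the trimmed sample data are assumptions made for illustration, not part of our tooling), here is one way to list the hosts that run a given application in an environment:

```python
# Trimmed copy of the sample Manifest above; only the fields used here are kept.
MANIFEST = {
    "prod": {
        "us-east-1": {
            "appserve01ea1": {"applications": ["appserve"], "private ip": "10.9.8.7"},
            "appserve02ea1": {"applications": ["appserve"], "private ip": "10.9.8.6"},
        }
    }
}


def hosts_running(manifest, env, application):
    """Return sorted (region, hostname) pairs in `env` that run `application`.

    hosts_running is a hypothetical helper name used only for this sketch.
    """
    matches = []
    for region, hosts in manifest.get(env, {}).items():
        for hostname, info in hosts.items():
            if application in info.get("applications", []):
                matches.append((region, hostname))
    return sorted(matches)


print(hosts_running(MANIFEST, "prod", "appserve"))
```

An unknown environment simply yields an empty list, which keeps callers such as a load balancer config generator free of special cases.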

What’s going on behind the scenes to create this Manifest? Remember the two things that constitute this strategy: infrastructure definitions and Cloud API-based transformation.

Infrastructure Definitions

Your infrastructure should have a well-defined contract for the applications it hosts. For example, you may require that “all services expose an HTTP endpoint for health checks,” or that “all services must be located behind a load balancer with at least two machines.” These two stipulations are good examples because they’re critical for many companies’ high-availability architectures.

An application definition that satisfies this minimal contract for the “appserve” application shown in the Manifest could be:

{
  "appserve": { "type": "http", "backend": 8080, "frontend": 80, "healthcheck": {"port": 8080, "resource": "/hc"} }
}

Each service in your infrastructure should have a single application definition entry used for all environments and regions.
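To see how a tool might use such a definition, here is a small sketch that turns it into a health check URL for one host. It assumes that the "type" field can double as the URL scheme and that the check is a plain request to "resource" on "port"; the function name healthcheck_url is hypothetical.

```python
# The application definition from above, as a Python dict.
APPLICATIONS = {
    "appserve": {
        "type": "http",
        "backend": 8080,
        "frontend": 80,
        "healthcheck": {"port": 8080, "resource": "/hc"},
    }
}


def healthcheck_url(applications, name, host):
    """Build the health check URL for application `name` running on `host`.

    Assumption: the definition's "type" is usable as the URL scheme.
    """
    definition = applications[name]
    check = definition["healthcheck"]
    return "{}://{}:{}{}".format(
        definition["type"], host, check["port"], check["resource"]
    )


print(healthcheck_url(APPLICATIONS, "appserve", "10.9.8.7"))
```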

Now that we’ve defined the application, how do we map it to (virtual) machines running in each environment and region? Again, we have flexibility based on your organization. If you value consistency across environments and want each application to run on its own VM, then you could almost bypass this step. Let’s say you have two applications in addition to “appserve”; that is, you have a RESTful database service (“dataserve”) and a configuration UI (“uiserve”). Your mapping would essentially reduce to this:

{
  "appserve": ["appserve"],
  "dataserve": ["dataserve"],
  "uiserve": ["uiserve"]
}

Since this on its own isn’t very useful, it would likely just be coupled with your Cloud API transformation step.

However, if you’re a cost-sensitive startup like we are, then you might want the flexibility to run more applications on a single VM in some environments and regions; for example, you have a small development environment or you have multiple production regions for geographic distribution but each may receive different amounts of traffic. Thus, your mapping may look more like this:

{
  "staging": {
    "rs-us-1": {
      "confstack": ["dataserve", "uiserve"],
      "appserve": ["appserve"]
    }
  },
  "prod": {
    "us-east-1": {
      "dataserve": ["dataserve"],
      "uiserve": ["uiserve"],
      "appserve": ["appserve"]
    },
    "eu-west-1": {
      "confstack": ["dataserve", "uiserve"],
      "appserve": ["appserve"]
    }
  }
}

Notice that you can actually have differently named instances in each environment and region; however, consistency will help in designing the tools which consume the Manifest.
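Since the mapping and the application definitions live in separate structures, a cheap sanity check is to confirm that every application named in the mapping actually has a definition. The sketch below is one way to do that; undefined_applications is a hypothetical helper, and the DEFINED set deliberately omits "uiserve" to show a failure.

```python
# Part of the mapping from above, plus a deliberately incomplete set of definitions.
APPLICATION_MAPPINGS = {
    "staging": {
        "rs-us-1": {"confstack": ["dataserve", "uiserve"], "appserve": ["appserve"]}
    },
    "prod": {
        "eu-west-1": {"confstack": ["dataserve", "uiserve"], "appserve": ["appserve"]}
    },
}
DEFINED = {"appserve", "dataserve"}  # pretend the "uiserve" definition was forgotten


def undefined_applications(mappings, defined):
    """Return the sorted application names used in `mappings` but not in `defined`."""
    missing = set()
    for regions in mappings.values():          # environments
        for groups in regions.values():        # regions
            for apps in groups.values():       # server groups
                missing.update(app for app in apps if app not in defined)
    return sorted(missing)


print(undefined_applications(APPLICATION_MAPPINGS, DEFINED))
```

Running a check like this when the config is loaded catches typos before they turn into hosts with an empty application list.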

Now that we have some infrastructure definitions, let’s look at how we can transform them into the Manifest.

Cloud API Transformation

This is where things get interesting. Now we inject all the “runtime” information from the VMs running in your cloud into the definitions from above to create a working Manifest. If you run all your environments in one cloud, you can use the vendor-specific APIs. Or if you’re concerned with cloud portability, then you may want to choose a cloud-agnostic API. So far, my favorite uniform Cloud API is Apache Libcloud, but it’s still far from perfect at normalizing across the different Cloud platforms.

The first thing we’re going to need is a mapping from environment and region to their respective Libcloud “driver”. This is where you specify your keys and secrets for each driver.

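from libcloud.compute.providers import get_driver
from libcloud.compute.types import Provider

# Each entry is a factory: calling it returns a fresh driver instance.
# The credentials shown are placeholders.
DRIVERS = {
  "staging": {
    "rs-us-1": lambda: get_driver(Provider.RACKSPACE)("ima", "secret")
  },
  "prod": {
    "us-east-1": lambda: get_driver(Provider.EC2_US_EAST)("aint", "gonna"),
    "eu-west-1": lambda: get_driver(Provider.EC2_EU_WEST)("tell", "you")
  }
}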

Why do we use a lambda here? We found that some drivers (cough Rackspace cough) like to time out for no good reason after a while, so for now we just get a new driver instance every time we need it.

Let’s call the application definitions APPLICATIONS and the machine-to-application mappings APPLICATION_MAPPINGS. We’ll assume they’re defined as part of a single config.js file which we read, parse, and cache for 10 minutes:

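from json import load
from cachetools.func import ttl_cache  # assumption: ttl_cache comes from cachetools

@ttl_cache(ttl=600)  # re-read the file at most once every 10 minutes
def config():
  return load(open('config.js'))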
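If a caching library is not at hand, the same effect is easy to approximate with the standard library. The sketch below is only an illustration of the idea: simple_ttl_cache, its clock parameter, and load_settings are names invented here, and the fake clock exists so that expiry can be shown without waiting ten minutes.

```python
import functools
import time


def simple_ttl_cache(ttl, clock=time.monotonic):
    """Cache the result of a zero-argument function for `ttl` seconds."""
    def decorator(func):
        state = {"expires": None, "value": None}

        @functools.wraps(func)
        def wrapper():
            now = clock()
            # Recompute on first use or once the cached value has expired.
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + ttl
            return state["value"]
        return wrapper
    return decorator


# Demo with a fake clock (a one-element list so the lambda sees updates).
fake_now = [0.0]
loads = []


@simple_ttl_cache(ttl=600, clock=lambda: fake_now[0])
def load_settings():
    loads.append(1)  # count how often the "file" is really read
    return {"version": len(loads)}


first = load_settings()   # miss: the function body runs
second = load_settings()  # hit: served from the cache
fake_now[0] = 600.0       # ten minutes later
third = load_settings()   # expired: the function body runs again
print(first, second, third)
```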

For this example, we’ll assume the hostname format shown above; that is, a host like appserve01ea1 consists of a server group followed by an instance number and then a globally unique, region-specific suffix. This server group is what matches a set of application definitions to a VM.

We can use a simple regex to extract this server group:

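import re

# The named group "basename" captures the leading run of letters (the server group).
BASENAME_RE = re.compile(r'(?P<basename>[a-zA-Z]+)')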
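As a quick check of this kind of pattern against hostnames in the format described above, the sketch below wraps it in a small helper. The helper name server_group is hypothetical, and it uses match rather than search so that a hostname starting with a digit yields None instead of picking up letters from the suffix.

```python
import re

# Leading run of letters, captured under the group name "basename".
BASENAME_RE = re.compile(r'(?P<basename>[a-zA-Z]+)')


def server_group(hostname):
    """Return the leading alphabetic part of `hostname`, or None if there is none."""
    match = BASENAME_RE.match(hostname)
    return match.group('basename') if match else None


for name in ("appserve01ea1", "confstack02st1", "42"):
    print(name, "->", server_group(name))
```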

The last piece of the puzzle is to smash this infrastructure definition together with the state returned by the Cloud API to produce the final Manifest.

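def build_manifest(env, region):
  # Server group -> list of applications for this environment/region
  application_mapping = config()['APPLICATION_MAPPINGS'][env][region]
  # DRIVERS holds factories (lambdas), so call one to get a fresh driver
  driver = DRIVERS[env][region]()
  region_manifest = {}
  for node in driver.list_nodes():
    server_group = BASENAME_RE.search(node.name).group('basename')
    applications = application_mapping.get(server_group, [])
    # Key names match the sample Manifest shown earlier
    region_manifest[node.name] = {
      "id": node.id,
      "applications": applications,
      "private ip": node.private_ips[0],
      "public ip": node.public_ips[0],
      # EC2-style extra fields; other drivers need normalizing
      "zone": node.extra['availability'],
      "fqdn": node.extra['dns_name']
    }
  return region_manifest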

This function retrieves the list of all nodes in the environment/region, extracts and maps the server_group to a list of applications, and then adds a (normalized) entry containing the machine info and its applications to the Manifest. I’m not going to delve into the normalization code, but the full code is available as a Gist.
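The cloud call is the only slow, stateful part of this step, so it can help to keep the merge logic separate and exercise it offline. The sketch below does that under stated assumptions: FakeNode is a stand-in for a Libcloud node that models only the attributes used here, merge_nodes is a hypothetical name, and the zone and fqdn fields are left out because they depend on driver-specific normalization.

```python
import re
from collections import namedtuple

# Stand-in for a Libcloud node; only the attributes used below are modelled.
FakeNode = namedtuple("FakeNode", "id name private_ips public_ips")

BASENAME_RE = re.compile(r"(?P<basename>[a-zA-Z]+)")


def merge_nodes(nodes, group_to_apps):
    """Combine node records with a server-group -> applications mapping."""
    manifest = {}
    for node in nodes:
        match = BASENAME_RE.match(node.name)
        group = match.group("basename") if match else None
        manifest[node.name] = {
            "id": node.id,
            # Unknown server groups get an empty application list.
            "applications": group_to_apps.get(group, []),
            # Guard against nodes that have no address of a given kind.
            "private ip": node.private_ips[0] if node.private_ips else None,
            "public ip": node.public_ips[0] if node.public_ips else None,
        }
    return manifest


nodes = [
    FakeNode("i-a1234bc5", "appserve01ea1", ["10.9.8.7"], ["1.2.3.4"]),
    FakeNode("i-00000000", "mystery01ea1", [], []),
]
print(merge_nodes(nodes, {"appserve": ["appserve"]}))
```

Feeding it the result of a real list_nodes() call should work the same way, as long as the nodes expose these four attributes.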

Let’s add the final cherry on top. We can expose this Manifest over HTTP to make building tools on top of it a breeze.

from flask import Flask, jsonify

app = Flask(__name__)

# e.g. GET /api/prod/us-east-1/manifest
@app.route('/api/<env>/<region>/manifest')
def list_manifest(env, region):
  return jsonify(build_manifest(env, region))

if __name__ == '__main__':
  app.run()

There you have it! A simple Python app which exposes your Infrastructure as a REST service.
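On the consuming side, a tool only needs to build the URL and parse the JSON. Here is a minimal client sketch using just the standard library. The names manifest_url and fetch_manifest are hypothetical, the base URL assumes Flask’s default development port of 5000, and the opener parameter exists so the example can run against a fake response instead of a live server.

```python
import io
import json
from urllib.request import urlopen


def manifest_url(base_url, env, region):
    """Build the URL of the manifest endpoint for one environment/region."""
    return "{}/api/{}/{}/manifest".format(base_url.rstrip("/"), env, region)


def fetch_manifest(base_url, env, region, opener=urlopen):
    """Fetch and decode the region manifest; `opener` is swappable for testing."""
    with opener(manifest_url(base_url, env, region)) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    # Pretend the server returned a single host; no network access is needed.
    return io.BytesIO(b'{"appserve01ea1": {"applications": ["appserve"]}}')


print(manifest_url("http://localhost:5000", "prod", "us-east-1"))
print(fetch_manifest("http://localhost:5000", "prod", "us-east-1", opener=fake_opener))
```

Dropping the opener argument makes the same call go over the network with urlopen.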

In my next post, I’ll describe how you can use this as the backbone for your deployments.

Posted in Tutorials.

By codyaray – November 11, 2014