

Simplify Deployment with Infrastructure Manifest (Part 1)

This is Part 1 in a short series about using a Manifest of your infrastructure for automation.

  • Part 1: Build the Infrastructure Manifest
  • Part 2: Manifest-Based Application Deployment

At the last few DevOps conferences I’ve attended, the lunchtime discussions have revolved around tying your test, build, and deploy workflows to your cloud infrastructure. A lot of people are trying to bend tools like Chef for this purpose and are generally unhappy with the result.

After a lot of trial and error, the strategy that we currently use at Signal is to

  • completely specify your infrastructure definitions in a simple JSON manifest; and
  • use your Cloud API(s) to transform this functional definition into a working Manifest which details all hosts in your infrastructure and which applications they run

Once we have the Manifest, we can build numerous tools on top of it to help automate our infrastructure. In addition to deployment, we also use this for configuration management, load balancing, and even DNS. Because it uses your Cloud API to know about the running hosts, you could even use it for service discovery in an autoscaled environment.

Here in Part 1, we’ll build the Infrastructure Manifest. In Part 2 we’ll show how we can use the Manifest to Simplify Deployment.

The important thing to note here is that this process can work regardless of which cloud(s) you’re using, what tools you fancy, or what your favorite color is.

The Manifest

What does this Manifest look like in practice? Well, it depends on how your system is organized.

Like many companies, we organize our deployments into environments and regions within an environment. This allows us to have isolated environments for development, staging, and production, as well as geographically distributed regions within each environment. Within each region, you may also have disparate availability zones for fault-tolerance and failover, if your cloud supports that sort of thing.

In this case, your Manifest may look like this:

{
  "prod": {
    "us-east-1": {
      "appserve01ea1": {
        "applications": [
          "appserve"
        ],
        "zone": "us-east-1a",
        "fqdn": "ec2-1-2-3-4.compute-1.amazonaws.com",
        "private_ip": "10.9.8.7",
        "public_ip": "1.2.3.4",
        "id": "i-a1234bc5"
      },
      "appserve02ea1": {
        "applications": [
          "appserve"
        ],
        "zone": "us-east-1b",
        "fqdn": "ec2-1-2-3-5.compute-1.amazonaws.com",
        "private_ip": "10.9.8.6",
        "public_ip": "1.2.3.5",
        "id": "i-b5678cd9"
      }
    }
  },
  "staging": {
    "rs-us-1": {
      "appserve01st1": {
        "applications": [
          "appserve"
        ],
        "zone": "rs-us-1a",
        "fqdn": "10.9.8.7",
        "private_ip": "10.9.8.7",
        "public_ip": "5.6.7.8",
        "id": "21234567"
      },
      "appserve02st1": {
        "applications": [
          "appserve"
        ],
        "zone": "rs-us-1a",
        "fqdn": "10.9.8.6",
        "private_ip": "10.9.8.6",
        "public_ip": "5.6.7.9",
        "id": "21234568"
      }
    }
  }
}

We have our environments each in different clouds, from a private OpenStack cluster to Rackspace to Amazon. The important thing is for this JSON representation to be consistent regardless of which cloud each environment and region is located in. The other important thing is that the Manifest is dynamic. When your infrastructure changes, the Manifest should immediately and automatically reflect the changes. (Okay, the Cloud APIs are slow, so go ahead and cache it, but not for too long!)
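To make the shape concrete, here is a minimal sketch of how a tool might query a Manifest like the one above. The helper name hosts_running and the trimmed sample data are illustrative assumptions, not part of our actual tooling.

```python
# Illustrative sketch: find every host that runs a given application.
# `hosts_running` is a hypothetical helper; the sample is trimmed from the Manifest above.

def hosts_running(manifest, application):
    """Yield (env, region, hostname) for each host running `application`."""
    for env, regions in manifest.items():
        for region, hosts in regions.items():
            for hostname, info in hosts.items():
                if application in info.get("applications", []):
                    yield env, region, hostname


SAMPLE_MANIFEST = {
    "prod": {"us-east-1": {
        "appserve01ea1": {"applications": ["appserve"], "zone": "us-east-1a"},
        "appserve02ea1": {"applications": ["appserve"], "zone": "us-east-1b"},
    }},
    "staging": {"rs-us-1": {
        "appserve01st1": {"applications": ["appserve"], "zone": "rs-us-1a"},
    }},
}

if __name__ == "__main__":
    for env, region, hostname in hosts_running(SAMPLE_MANIFEST, "appserve"):
        print(env, region, hostname)
```

Because the walk relies only on the environment/region/host nesting, the same helper works no matter which cloud produced each entry.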

What’s going on behind the scenes to create this Manifest? Remember the two things that constitute this strategy: infrastructure definitions and Cloud API-based transformation.

Infrastructure Definitions

Your infrastructure should have a well-defined contract for the applications it hosts. For example, you may require that “all services expose an HTTP endpoint for health checks,” or that “all services must be located behind a load balancer with at least two machines.” These two stipulations are good examples because they’re critical for many companies’ high-availability architectures.

An application definition for this minimal contract for “appserve” shown in the Manifest could be

{
  "appserve": { "type": "http", "backend": 8080, "frontend": 80, "healthcheck": {"port": 8080, "resource": "/hc"} }
}

Each service in your infrastructure should have a single application definition entry used for all environments and regions.
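As a rough illustration of why a single definition per service is handy, the sketch below derives a health-check URL for any host from the “appserve” definition. The helper healthcheck_url is hypothetical, and it assumes that a "type" of "http" implies a plain http:// scheme.

```python
# Hypothetical helper: derive a health-check URL from an application definition.
# Assumes "type": "http" means the check is served over plain HTTP.

APPLICATIONS = {
    "appserve": {
        "type": "http",
        "backend": 8080,
        "frontend": 80,
        "healthcheck": {"port": 8080, "resource": "/hc"},
    }
}


def healthcheck_url(address, app_name, applications=APPLICATIONS):
    """Build the URL a monitor or load balancer would poll for `app_name`."""
    check = applications[app_name]["healthcheck"]
    return "http://{}:{}{}".format(address, check["port"], check["resource"])


if __name__ == "__main__":
    print(healthcheck_url("10.9.8.7", "appserve"))
```

The same definition could feed a load balancer configuration or a monitoring check without any per-environment duplication.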

Now that we’ve defined the application, how do we map it to (virtual) machines running in each environment and region? Again, we have flexibility based on your organization. If you value consistency across environments and want each application to run on its own VM, then you could almost bypass this step. Let’s say you have two applications in addition to “appserve”; that is, you have a RESTful database service (“dataserve”) and a configuration UI (“uiserve”). Your mapping would essentially reduce to this:

{
  "appserve": ["appserve"],
  "dataserve": ["dataserve"],
  "uiserve": ["uiserve"]
}

Since this on its own isn’t very useful, it would likely just be coupled with your Cloud API transformation step.

However, if you’re a cost-sensitive startup like we are, you might want the flexibility to run several applications on a single VM in some environments and regions. For example, you may have a small development environment, or multiple production regions for geographic distribution that each receive different amounts of traffic. In that case, your mapping may look more like this:

{
  "staging": {
    "rs-us-1": {
      "confstack": ["dataserve", "uiserve"],
      "appserve": ["appserve"]
    }
  },
  "prod": {
    "us-east-1": {
      "dataserve": ["dataserve"],
      "uiserve": ["uiserve"],
      "appserve": ["appserve"]
    },
    "eu-west-1": {
      "confstack": ["dataserve", "uiserve"],
      "appserve": ["appserve"]
    }
  }
}

Notice that you can actually have differently named instances in each environment and region; however, consistency will help in designing the tools which consume the Manifest.
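Since the mapping refers to applications by name, it can be worth checking that every name has a matching application definition. The following is a minimal sketch of such a check; the function undefined_applications and the deliberately incomplete sample definitions are assumptions made for illustration.

```python
# Hypothetical consistency check between the mapping and the application definitions.

def undefined_applications(mappings, applications):
    """Return the sorted application names used in `mappings` but not defined."""
    missing = set()
    for regions in mappings.values():
        for server_groups in regions.values():
            for apps in server_groups.values():
                missing.update(app for app in apps if app not in applications)
    return sorted(missing)


APPLICATION_MAPPINGS = {
    "staging": {
        "rs-us-1": {
            "confstack": ["dataserve", "uiserve"],
            "appserve": ["appserve"],
        }
    }
}

# "uiserve" is left out on purpose to show what the check reports.
APPLICATIONS = {"appserve": {}, "dataserve": {}}

if __name__ == "__main__":
    print(undefined_applications(APPLICATION_MAPPINGS, APPLICATIONS))
```

Running a check like this whenever the config file is loaded would catch a typo in a server group before it turns into a host with no applications.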

Now that we have some infrastructure definitions, let’s look at how we can transform them into the Manifest.

Cloud API Transformation

This is where things get interesting. Now we inject all the “runtime” information from the VMs running in your cloud into the definitions from above to create a working Manifest. If you run all your environments in one cloud, you can use the vendor-specific APIs. Or if you’re concerned with cloud portability, then you may want to choose a cloud-agnostic API. So far, my favorite uniform Cloud API is Apache Libcloud, but it’s still far from perfect at normalizing across the different Cloud platforms.

The first thing we’re going to need is a mapping from environment and region to their respective Libcloud “driver”. This is where you specify your keys and secrets for each driver.

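from libcloud.compute.providers import get_driver
from libcloud.compute.types import Provider

# Each entry is a factory: call it to get a fresh driver instance.
DRIVERS = {
  "staging": {
    "rs-us-1": lambda: get_driver(Provider.RACKSPACE)("ima", "secret")
  },
  "prod": {
    "us-east-1": lambda: get_driver(Provider.EC2_US_EAST)("aint", "gonna"),
    "eu-west-1": lambda: get_driver(Provider.EC2_EU_WEST)("tell", "you")
  }
}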

Why do we use a lambda here? We found that some drivers (cough Rackspace cough) like to time out for no good reason after a while, so for now we just get a new driver instance every time we need it.

Let’s call the application definitions APPLICATIONS and the machine to application mappings APPLICATION_MAPPINGS. We’ll assume they’re defined as part of a single config.js file which we read, parse, and cache for 10 minutes:

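from json import load
from cachetools.func import ttl_cache  # assumption: any TTL-caching decorator works here

@ttl_cache(ttl=600)  # cache the parsed config for 10 minutes
def config():
  with open('config.js') as f:
    return load(f)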
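The ttl_cache decorator is not defined in this post. If you would rather not pull in a library for it, a standard-library-only version could look roughly like the sketch below. It only handles zero-argument functions like config(), and the injectable timer parameter is an addition of this sketch to make it easy to test.

```python
import functools
import time


def ttl_cache(ttl, timer=time.monotonic):
    """Cache a zero-argument function's result for `ttl` seconds (sketch only)."""
    def decorator(func):
        state = {"expires": None, "value": None}

        @functools.wraps(func)
        def wrapper():
            now = timer()
            # Recompute on the first call and whenever the cached value has expired.
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + ttl
            return state["value"]
        return wrapper
    return decorator


if __name__ == "__main__":
    @ttl_cache(ttl=600)
    def config():
        print("loading config")
        return {"APPLICATIONS": {}, "APPLICATION_MAPPINGS": {}}

    config()  # loads
    config()  # served from the cache
```

This version is not thread-safe, so a multi-threaded server would want a lock around the refresh.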

For this example, we’ll assume the hostname format shown above; that is, a host like appserve01ea1 consists of a server group followed by an instance number and then a globally unique, region-specific suffix. This server group is what matches a set of application definitions to a VM.

We can use a simple regex to extract this server group:

BASENAME_RE = re.compile(r'(?P<basename>[a-zA-Z]+)')
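If your tools also need the instance number or the region suffix, the same idea extends to the whole hostname. The pattern HOSTNAME_RE and the helper parse_hostname below are hypothetical, and they assume hostnames strictly follow the letters, digits, suffix layout described above.

```python
import re

# Hypothetical pattern for the full hostname: server group, instance number, suffix.
HOSTNAME_RE = re.compile(
    r'^(?P<basename>[a-zA-Z]+)(?P<instance>\d+)(?P<suffix>[a-zA-Z0-9]+)$'
)


def parse_hostname(name):
    """Split a hostname like 'appserve01ea1' into its parts, or return None."""
    match = HOSTNAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["instance"] = int(parts["instance"])
    return parts


if __name__ == "__main__":
    print(parse_hostname("appserve01ea1"))
```

Returning None for names that do not fit the convention lets a caller skip hosts, such as hand-built one-off machines, instead of crashing on them.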

The last piece of the puzzle is to smash this infrastructure definition together with the state returned by the Cloud API to produce the final Manifest.

def build_manifest(env, region):
  application_mappings = config()['APPLICATION_MAPPINGS']
  driver = DRIVERS[env][region]()  # call the lambda to get a fresh driver
  region_manifest = {}
  for node in driver.list_nodes():
    server_group = BASENAME_RE.search(node.name).group('basename')
    applications = application_mappings[env][region].get(server_group, [])
    # The extra keys below are EC2-style; other clouds need normalization (omitted here)
    region_manifest[node.name] = {
      "id": node.id,
      "applications": applications,
      "private_ip": node.private_ips[0],
      "public_ip": node.public_ips[0],
      "zone": node.extra['availability'],
      "fqdn": node.extra['dns_name']
    }
  return region_manifest

This function retrieves the list of all nodes in the environment/region, extracts and maps the server_group to a list of applications, and then adds a (normalized) entry containing the machine info and its applications to the Manifest. I’m not going to delve into the normalization code, but the full code is available as a Gist.
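Note that build_manifest returns the entries for a single region, while the example Manifest at the top is nested by environment and region. One way to assemble the full structure is sketched below. The function build_full_manifest is hypothetical; it takes the per-region builder as a parameter (build_region, standing in for build_manifest) so that it can be exercised without touching a real cloud.

```python
# Hypothetical sketch: assemble the nested env -> region -> hosts Manifest.

def build_full_manifest(drivers, build_region):
    """Call `build_region(env, region)` for every env/region pair in `drivers`."""
    return {
        env: {region: build_region(env, region) for region in regions}
        for env, regions in drivers.items()
    }


if __name__ == "__main__":
    # Stand-ins for DRIVERS and build_manifest, so the sketch runs offline.
    fake_drivers = {
        "staging": {"rs-us-1": None},
        "prod": {"us-east-1": None, "eu-west-1": None},
    }

    def fake_build(env, region):
        return {"appserve01": {"applications": ["appserve"]}}

    print(build_full_manifest(fake_drivers, fake_build))
```

Each region means a separate Cloud API call, so the full Manifest is the place where caching pays off the most.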

Let’s add the final cherry on top. We can expose this Manifest over HTTP to make building tools on top of it a breeze.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/<env>/<region>/manifest')
def list_manifest(env, region):
  return jsonify(build_manifest(env, region))

if __name__ == '__main__':
  app.run()

There you have it! A simple Python app which exposes your infrastructure as a REST service.
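A tool that consumes the service only needs an HTTP client. Here is a minimal client sketch using just the standard library. The helper names and the base URL are assumptions; http://127.0.0.1:5000 is simply where Flask's development server listens by default.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def manifest_url(base_url, env, region):
    """Build the URL of the manifest endpoint for one environment and region."""
    return "{}/api/{}/{}/manifest".format(
        base_url.rstrip("/"), quote(env, safe=""), quote(region, safe="")
    )


def fetch_manifest(base_url, env, region, timeout=10):
    """Fetch and decode the region Manifest (requires the service to be running)."""
    with urlopen(manifest_url(base_url, env, region), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(manifest_url("http://127.0.0.1:5000", "prod", "us-east-1"))
```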

In my next post, I’ll describe how you can use this as the backbone for your deployments.

Posted in Tutorials.

