

Lesson Learned: Circuit Breakers

I just finished reading Release It! by Michael T. Nygard. Unfortunately, I didn’t learn about circuit breakers until the app featured in the “Intro to Streams” series (part 1, part 2) was complete. Let’s walk through the streaming example again and add a circuit breaker to protect the integration point.

Problem

tl;dr – External integration points break in unexpected ways. Always insulate your code from these integration points with circuit breakers.

It turns out that the FTP client had some unexpected failure modes when used with certain FTP servers and cloud-based FTP providers. For example, tnftp simply stops responding when you attempt to STOR a file with an invalid path. (Broken FTP protocol?) Likewise, hostedftp.com lets you connect with expired credentials, but none of the FTP data commands (such as uploading a file) work; the data channel/socket is closed immediately. The second case was arguably better: it resulted in an unhandled exception that killed the process. The first, on the other hand, resulted in a locked system with little indication of the cause; since the FTP client never emitted any events (error or otherwise), the Node server never closed the HTTP response. (Yes, I would like to move the upload out of the request thread, but I’m going with a simplest-solution-first approach.)
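The two failure modes look quite different from Node’s point of view. The sketch below uses a plain EventEmitter as a hypothetical stand-in for the FTP client’s upload stream. It shows why the hostedftp.com case killed the process: emitting 'error' with no listener attached makes emit() throw.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for the FTP client's upload stream.
var upload = new EventEmitter();

// No 'error' listener yet: emit() throws the error. In the real app nothing
// caught it, so the process died.
var crashed = false;
try {
  upload.emit('error', new Error('data channel closed'));
} catch (err) {
  crashed = true;
}

// With a listener attached, the same error becomes an ordinary value.
var handled = null;
upload.on('error', function(err) {
  handled = err.message;
});
upload.emit('error', new Error('data channel closed'));

console.log(crashed, handled); // prints: true data channel closed
```

The tnftp case is the harder one. No event fires at all, so no listener ever runs, and only a timeout can notice the silence.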

Solution

We need a circuit breaker! Circuit breakers serve two purposes:

  • to protect clients from down, slow, or broken services
  • to protect services from excess client demand when over capacity

Circuit breakers should be applied to almost every integration point (the exception being handshake-based protocols). If the service returns too many errors or responds too slowly, the breaker trips to the Open state. While the breaker is Open, every client attempt to reach the service fails fast. Once the breaker has been Open for a specified period of time, it moves to the Half Open state and lets the client’s next request through to the service. Depending on the result of that call, the breaker moves either to the Closed state or back to the Open state.

A state transition diagram for a simple breaker might make this clearer:

[Figure: circuit breaker state transition diagram]
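To make the transitions concrete, here is a minimal sketch of a breaker in plain JavaScript. It is not Matt Weagle’s library. createBreaker, its parameters, and the injectable clock are hypothetical names chosen for illustration, and the sketch leaves out the call timeout.

```javascript
// Minimal circuit breaker sketch (not Matt Weagle's library).
// fn is a Node-style async function: fn(arg, callback(err)).
// now is an injectable clock so the reset timeout can be tested.
function createBreaker(fn, maxFailures, resetTimeoutMs, now) {
  now = now || Date.now;
  var state = 'closed';
  var failures = 0;
  var openedAt = 0;

  function trip() {
    state = 'open';
    openedAt = now();
  }

  function call(arg, callback) {
    if (state === 'open') {
      if (now() - openedAt < resetTimeoutMs) {
        // Open: fail fast without touching the service.
        return callback(new Error('circuit open'));
      }
      // Reset timeout elapsed: move to Half Open and let this call through.
      state = 'half-open';
    } else if (state === 'half-open') {
      // A trial call is already in flight; keep failing fast until it settles.
      return callback(new Error('circuit open'));
    }

    fn(arg, function(err) {
      if (err) {
        failures += 1;
        // A failed trial call, or too many failures while Closed, trips the breaker.
        if (state === 'half-open' || failures >= maxFailures) trip();
      } else if (state !== 'open') {
        // A success while Closed or Half Open (re)closes the breaker.
        state = 'closed';
        failures = 0;
      }
      callback(err);
    });
  }

  call.state = function() { return state; };
  return call;
}
```

With maxFailures set to 2, two failed calls trip the breaker to Open. A third call then fails fast without ever invoking fn. After resetTimeoutMs has elapsed, the next call becomes the Half Open trial, and a success closes the breaker again.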

With this in mind, recall the streaming publish function introduced in Intro to Streams (Part 2). The version presented here is simplified somewhat so we can focus on the relevant bits.

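// `self` is the enclosing EventEmitter that reports the outcome, and ftp()
// returns a writable stream that uploads to the FTP server and emits 'success'.
function publish(data) {
  data
    .pipe(ftp())   // pipe() returns the ftp() stream, so the handlers below attach to it
    .on('error', function(err) {
      self.emit('error', err);
    })
    .on('success', function() {
      self.emit('success');
    });
}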

Using Matt Weagle’s excellent port of Akka’s CircuitBreaker pattern to Node.js, it’s easy to wrap a circuit breaker around this function.

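// CircuitBreaker is Matt Weagle's Node.js port of Akka's CircuitBreaker.
// `self` is the enclosing EventEmitter, as in the previous example.

function _publish(name, callback) {
  var done = false;

  // Report the outcome to the breaker exactly once, whichever event comes first.
  function finish(err) {
    if (done) return;
    done = true;
    callback(err);
  }

  var source = datastore.getStream(name);
  // pipe() does not forward errors from the source stream, so listen on it too.
  source.on('error', finish);

  source
    .pipe(ftp())
    .on('error', finish)
    .on('success', function() { finish(); });
}

var breakers = {},             // one breaker per integration point, keyed by name
  MAX_FAILURES = 5,            // failures before the breaker trips to Open
  CALL_TIMEOUT_MS = 10000,     // 10 seconds; calls slower than this count as failures
  RESET_TIMEOUT_MS = 60000;    // 1 minute in Open before trying Half Open

function publish(name) {
  // Create this integration point's breaker on first use.
  breakers[name] = breakers[name] ||
    CircuitBreaker.new_circuit_breaker(_publish, MAX_FAILURES, CALL_TIMEOUT_MS, RESET_TIMEOUT_MS);

  breakers[name](name, function(err) {
    if (err) {
      self.emit('error', err);
    } else {
      self.emit('success');
    }
  });
}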

You don’t have to worry about the details of the ftp() function here. This could be any stream to an external integration point. We had to restructure things a bit so that we had a callback that accepts an error for use by the circuit-breaker library; no big deal.
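The call timeout matters most in the tnftp case, where the FTP client never emits any event at all. The sketch below shows one way such a timeout can be enforced around a Node-style callback function. withTimeout is a hypothetical helper, not part of Matt Weagle’s library, whose internals may work differently.

```javascript
// Hypothetical helper (not part of any library mentioned here): wraps a
// Node-style function fn(arg, callback(err)) so it always calls back,
// with an error if fn stays silent for longer than timeoutMs.
function withTimeout(fn, timeoutMs) {
  return function(arg, callback) {
    var settled = false;

    // Armed before fn runs, so even a synchronous callback can clear it.
    var timer = setTimeout(function() {
      settled = true;
      callback(new Error('call timed out after ' + timeoutMs + ' ms'));
    }, timeoutMs);

    fn(arg, function(err) {
      // Drop a late result that arrives after the timeout already fired.
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      callback(err);
    });
  };
}
```

Dropping the late result means the caller never hears about the same call twice. A breaker can then treat the timeout error like any other failure.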

However, there are a couple of subtle points here.

  • There’s one circuit breaker per integration point. In this case, we support an arbitrary number of integration points identified by a “name” string. This allows any integration point to fail and trip its breaker independently of other external services.
  • The publish function no longer accepts a data stream; rather, it retrieves the stream from the store itself. This is because we don’t want to create streams that aren’t going to be published (because the breaker is open and we’re failing fast). If this data stream comes from the file system, for instance, we’d soon run into a “too many open file descriptors” problem.
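Both points can be sketched together. In this hypothetical version, publishTo, openStream and send are illustrative stand-ins for publish(), datastore.getStream() and the ftp() upload. The breaker is deliberately simplified: once the reset timeout passes, it lets every request through rather than gating a single Half Open trial. Each name gets its own breaker, and the stream is opened only after the breaker allows the call.

```javascript
// Hypothetical sketch: independent breakers per integration point name,
// with the data stream opened only when the call will actually be made.
var MAX_FAILURES = 5;
var RESET_TIMEOUT_MS = 60000;
var breakers = {};

function breakerFor(name) {
  // Lazily create an independent breaker the first time a name is seen.
  breakers[name] = breakers[name] || { failures: 0, openedAt: null };
  return breakers[name];
}

function allowRequest(breaker) {
  // Closed, or Open long enough that a trial call may go through.
  return breaker.openedAt === null ||
    Date.now() - breaker.openedAt >= RESET_TIMEOUT_MS;
}

function recordResult(breaker, err) {
  if (err) {
    breaker.failures += 1;
    if (breaker.failures >= MAX_FAILURES) breaker.openedAt = Date.now();
  } else {
    breaker.failures = 0;
    breaker.openedAt = null;
  }
}

function publishTo(name, openStream, send, callback) {
  var breaker = breakerFor(name);
  if (!allowRequest(breaker)) {
    // Fail fast: no stream, and therefore no file descriptor, is opened.
    return callback(new Error('circuit open for ' + name));
  }
  var stream = openStream(name); // opened only when the call will really be made
  send(stream, function(err) {
    recordResult(breaker, err);
    callback(err);
  });
}
```

After five failed uploads to one name, further calls to that name fail fast without calling openStream. Calls to any other name still go through.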

Not too bad for automagically protecting our client from external integration points.

How do you protect your applications from integration points? Do you have any techniques or libraries I should learn about?

Posted in Tutorials.

