Monday, December 23, 2013

Ops Design Pattern: local haproxy talking to service layer

Modern Web application architectures are often composed of a front-end application layer (app servers running Java/Python/Ruby aided by a generous helping of JavaScript) talking to one or more layers of services (RESTful or not) which in turn may talk to a distributed database such as Riak.

Typically we see a pair of load balancers in HA mode deployed up front and handling user traffic, or more commonly serving as the origin for CDN traffic. In order to avoid deploying many pairs of load balancers in between the front-end app server layer and various services layers, or in between one service layer and another, one design pattern I've successfully used is an haproxy instance running locally (on 127.0.0.1) on each node that needs to talk to N other nodes running some type of service. This approach has several advantages:

  • No need to use up nodes, be they bare-metal servers, cloud instances or VMs, for the sole purpose of running yet another haproxy instance (and you actually need 2 nodes for an HA configuration, plus you need to run keepalive or something similar on each node)
  • Potentially fewer bottlenecks, as each node fans out to all other services it needs to talk to, with no need to go first through a centralized load balancer
  • Easy deployment via Chef or Puppet, by simply adding the installation of the haproxy instance to the app node's cookbook/manifest
The main disadvantage of this solution is an increased number of health checks against the service nodes behind each haproxy (1 health check from each app node). Also, as @lusis pointed out, in some scenarios, for example when the local haproxy instances talk to a Riak cluster, there is the possibility of each app node seeing a different image of the cluster in terms of the particular Riak node(s) it gets the data from (but I think with Riak this is the case even with a centralized load balancer approach).

In any case, I recommend this approach which has worked really well for us here at NastyGal. I used a similar approach in the past as well at Evite. 

Thanks to @bdha for spurring an interesting Twitter thread about this approach, and to @lusis and @obfuscurity for jumping into the discussion with their experiences. As @garethr said, somebody needs to start documenting these patterns!

Thursday, December 12, 2013

Setting HTTP request headers in haproxy and interpolating variables

I had the need to set a custom HTTP request header in haproxy. For versions up to 1.4.x, the way to do this is :

reqadd X-Custom-Header:\ some_string

However, some_string is just a static string, and I could see no way of interpolating a variable in the string. Googling around, this is possible in haproxy 1.5.x with this method:

http-request set-header X-Custom-Header %[dst_port]

where dst_port is the variable we want to interpolate and %[variable] is the syntax for interpolation.

Other examples of variables available for you in haproxy.cfg are in Section 7.3 "Fetching samples" in the haproxy 1.5 configuration manual.

Tuesday, December 10, 2013

Creating sensu alerts based on graphite data

We had the need to create Sensu alerts based on some of the metrics we send to Graphite. Googling around, I found this nice post by @ulfmansson talking about the way they did it at Recorded Future. Ulf recommends using Sean Porter's check-data.rb Sensu plugin for alerting based on Graphite data. It wasn't clear how to call the plugin, so we experimented a bit and came up with something along these lines (note that check-data.rb requires the sensu-plugin gem):

$ ruby check-data.rb -s graphite.example.com -t "movingAverage(stats.lb1.prod.assets-backend.session_current,10)" -w 100 -c 200

This run the check-data.rb script against the server graphite.example.com (-s option) requesting the value or the target metric movingAverage(stats.lb1.prod.assets-backend.session_current,10) (-t option) and setting a warning threshold of 100 for this value (-w option), and a critical threshold of 200 (-c option).  The target can be any function supported by Graphite. In this example, it is a 10-minute moving average for the number of sessions for the "assets" haproxy backend. By default check-data.rb looks at the last 10 minutes of Graphite data (this can be changed by specifying something like -f "-5mins").

To call the check in the context of sensu, you need to deploy it to the client which will run it, and configure the check on the Sensu server in a json file in /etc/sensu/conf.d/checks:

"command": "/etc/sensu/plugins/check-data.rb -s graphite.example.com -t \"movingAverage(stats.lb1.prod.assets-backend.session_current,10)\" -w 100 -c 200"

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...