Computer/Linux
Loadbalancer-less clusters on Linux
장선생™
2005. 6. 15. 11:36
(Posting the original text as-is)
When you think of implementing virtual servers, either because you need to cope with a higher load or to provide enhanced reliability, you face a new problem: how to avoid making the load balancers (or directors) the bottleneck and the single point of failure for the whole architecture. Traditional load balancing also carries an associated cost, as you need to add one more server (or better, two for redundancy) just for the load-balancer function itself.
The solution is simple: don’t use load balancers at all.
Clusterip is a relatively new iptables extension written by Harald Welte, which allows the configuration of server farms (or clusters) without load balancers (or directors in Linux Virtual Server jargon). The iptables clusterip module is included with the latest 2.6 kernels, so it will be present right out of the box in most modern Linux distributions.
The idea behind clusterip is simple: all servers in the farm will present a common ethernet MAC address (which is a multicast MAC address) for the virtual IP address (VIP), so ARP requests for this VIP will be answered by any node in the cluster using this common MAC address. The node handling any given IP packet is determined by a hashing algorithm that we’ll review in a few moments.
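You can watch this shared-MAC behavior from another machine on the same segment. This is just a verification sketch, assuming a VIP of 192.168.1.1 reachable over eth0 as in the example further down; the actual replies will of course depend on your setup:
arping -c 3 -I eth0 192.168.1.1    # the ARP reply should carry the multicast cluster MAC
tcpdump -e -n -i eth0 arp          # -e prints the link-level (MAC) headers of each ARP frame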
Clusterip is actually an iptables target extension. It supports a few parameters:
- --new
Create a new ClusterIP. You always have to set this on the first rule for a given ClusterIP.
- --hashmode mode
Specify the hashing mode. Has to be one of sourceip, sourceip-sourceport, sourceip-sourceport-destport.
- --clustermac mac
Specify the ClusterIP MAC address. Has to be a link-layer multicast address.
- --total-nodes num
Number of total nodes within this cluster.
- --local-node num
Local node number within this cluster.
While some of the parameters are self-explanatory, others may require some discussion.
Hashmode specifies the way requests will be distributed among the nodes. sourceip assigns all traffic sourced from a single IP to a single server in the farm; this means that if thousands of requests are coming from a single IP (e.g. a proxy server), all those requests will be assigned to the same server, so the traffic distribution will be less than optimal. sourceip-sourceport and sourceip-sourceport-destport will provide a more even distribution of traffic, but will require more memory to hold the larger hash tables.
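As a quick illustration, borrowing the addresses from the two-node example further down, a rule with a finer-grained hash would only differ in the --hashmode flag:
iptables -A INPUT -d 192.168.1.1 -i eth0 -p tcp --dport 80 -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:23:45:67:89:AB --total-nodes 2 --local-node 1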
Clustermac determines the virtual MAC that will be used to respond to ARP requests. The only requirements are that it needs to be the same in all the nodes of the cluster, and that it needs to belong to the range of multicast ethernet MAC addresses. A multicast MAC address is indicated by the low-order bit of the first byte (which, by the way, is the first one on the wire). If the servers are connected to an ethernet switch, the use of a multicast MAC address forces the switch to send the packet out all of its ports.
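A quick way to check the multicast bit from a bash shell (a minimal sketch; the MAC value is simply the one used in the example below):
mac=01:23:45:67:89:AB
first_byte=$((16#${mac%%:*}))                 # first byte of the MAC, as a number
# the low-order bit of the first byte marks a multicast address
if [ $((first_byte & 1)) -eq 1 ]; then echo "multicast MAC"; else echo "unicast MAC"; fi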
An example configuration for a two-node cluster would be:
Node 1
iptables -A INPUT -d 192.168.1.1 -i eth0 -p tcp --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:23:45:67:89:AB --total-nodes 2 --local-node 1
Node 2
iptables -A INPUT -d 192.168.1.1 -i eth0 -p tcp --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:23:45:67:89:AB --total-nodes 2 --local-node 2
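One detail worth noting: the CLUSTERIP rule only takes care of the ARP side and of filtering incoming packets, so the VIP itself still has to be configured on each node. Assuming the addressing above (and adjusting the prefix length to your network), something along these lines would typically be run on every node first:
ip addr add 192.168.1.1/32 dev eth0    # all nodes configure the same VIP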
In our example, 192.168.1.1 is the Virtual IP address (VIP), we load balance HTTP traffic to port 80 (web servers), our hashing algorithm is based solely on the source IP address (not as even in traffic distribution, but frugal in memory requirements), the multicast MAC address is 01:23:45:67:89:AB (the low-order bit of the first byte must be set, therefore the first byte must be an odd number), and we have two nodes. Each node receives its own node number under --local-node.
If you execute:
cat /proc/net/ipt_CLUSTERIP/192.168.1.1
in one of the nodes, you’ll see that it returns the node number of that node. This is more than just an identifier: it really makes that node attend requests addressed to that node number. If one of the nodes dies (let’s say node 1), you will want another node (in this case node 2) to respond to those queries as well, so you’ll execute on node 2:
echo "+1" > /proc/net/ipt_CLUSTERIP/192.168.1.1
Now if you take a look at /proc/net/ipt_CLUSTERIP/192.168.1.1 you’ll see that it responds to queries for both nodes (2,1).
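The same /proc interface works in the other direction. Once node 1 is back in service (reloading its CLUSTERIP rule makes it claim node number 1 again), node 2 can drop the extra responsibility; a sketch assuming the failover above was performed:
echo "-1" > /proc/net/ipt_CLUSTERIP/192.168.1.1    # on node 2: stop answering for node number 1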
So, clusterip provides you with the functionality to build loadbalancer-less clusters, avoiding the bottleneck and single point of failure that a load balancer may represent, and it does so with high availability in mind. But are there any caveats?
The first caveat is that clusterip is still marked experimental, which means that it may either work wonders for you or not work at all (nasty bugs may, and probably will, be present). On top of that, due to a recent patch to clusterip, the version in the latest kernels got out of sync with the userland tools, so some combinations of kernel and iptables won’t work (you need either an older userland iptables or a very recent kernel).
Regardless of the caveats and warnings, rest assured that the need for something like clusterip in Linux will bring enough testers to squash the current bugs very soon.