Load Balancing with HAProxy

Modified: 17 Jun 2014 18:11 UTC
Stability: Unknown

Whichever option you choose, we highly recommend using some form of load balancing in your infrastructure. If you need a solution that is cost-effective up front, you can set up an open source load balancer such as HAProxy.

HAProxy is an open source load balancer that functions as a fast proxy server and provides high availability for TCP and HTTP-based applications. HAProxy is easy to install on a SmartMachine and includes support for SMF. Although it is not installed on a SmartMachine by default, HAProxy is available as a package in pkgsrc on any SmartMachine you provision. Once it is installed, you will need to configure HAProxy to fit your environment.

This topic shows you how to install and configure the open source load balancer HAProxy in the Joyent Public Cloud.

Installing HAProxy

To install HAProxy, run this command:

pkgin in haproxy

Installation of HAProxy includes an SMF manifest that supports enabling, disabling, and restarting the persistent HAProxy daemon process. You can enable that process by running:

svcadm enable haproxy:default

For more information about using SMF, see the Joyent Public Cloud documentation.

The default location for HAProxy is:

/opt/local/sbin/haproxy

The default location for the HAProxy configuration file is:

/opt/local/etc/haproxy.cfg

Configuring HAProxy

There are two components of an HAProxy configuration that you will need to address: the configuration parameters and the load balancing algorithm your configuration will use.

HAProxy Configuration Parameters

HAProxy is installed with a configuration file that is populated with default values for various parameters. Before you start using HAProxy, you will need to set these parameters to values that support your environment. HAProxy configuration parameters come from three primary sources: command-line arguments, which always take precedence; the global section, which sets process-wide parameters; and the proxy sections (defaults, frontend, backend, and listen).

Below is one notable parameter.

Parameter   Value                  Description
mode        tcp, http, or health   Sets the mode for the proxy instance. Use one of these three values.

About Proxy Sections

Proxy sections are defined in the following way:

defaults [<name>]
Sets default parameters for all sections following its declaration. The name is optional. If you include more than one defaults section in a configuration file, each new defaults section resets the default parameter values for the sections that follow it.

frontend <name>
Describes a set of listening sockets accepting client connections.

backend <name>
Describes a set of servers to which the proxy will connect to forward incoming connections.

listen <name>
Defines a complete proxy with its frontend and backend parts combined in one section. Generally, this is most useful for TCP-only traffic.
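
As a rough sketch of how these sections fit together, the fragment below splits a proxy into a frontend and a backend. The section names (www_front, www_back), the listening port, and the server addresses are hypothetical placeholders, not values shipped with the package.

```haproxy
# Defaults apply to every proxy section declared after this point
defaults
        mode http
        timeout connect 5000
        timeout client  50000
        timeout server  50000

# Accepts client connections on port 80 (hypothetical listener)
frontend www_front
        bind *:80
        default_backend www_back

# Servers that receive the forwarded connections (placeholder addresses)
backend www_back
        balance roundrobin
        server web01 10.2.2.4:5000
        server web02 10.2.2.5:5000
```

A listen section would combine the frontend and backend parts above into a single block.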

All proxy names must be formed from upper and lower case letters, digits, '-' (dash), '_' (underscore), '.' (dot), and ':' (colon). Proxy names are case-sensitive, which means that "www" and "WWW" are two different proxies.

HAProxy Load Balancing Algorithms

The algorithm you define determines how HAProxy balances load across your servers. You set the algorithm with the balance parameter.

Round Robin

Requests are rotated among the servers in the backend.

Servers declared in the backend section also accept a weight parameter which specifies their relative weight. When balancing load, the Round Robin algorithm will respect that weight ratio.
Example:

...
option tcplog
balance roundrobin
maxconn 10000
...
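
The weight parameter can be sketched as follows. This is an illustrative fragment only: the backend name, the addresses, and the 3:1 ratio are assumptions chosen for the example.

```haproxy
backend www_back
        balance roundrobin
        # web01 receives roughly three requests for every one sent to web02
        server web01 10.2.2.4:5000 weight 3
        server web02 10.2.2.5:5000 weight 1
```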

Static Round Robin

Each server is used in turn, according to the defined weight for the server. This algorithm is a static version of the Round Robin algorithm, which means that changing the weight ratio for a server on the fly has no effect. On the other hand, it places no design limit on the number of servers you can define. In addition, when a server comes online, this algorithm ensures that the server is immediately reintroduced into the farm once the full map has been recomputed. This algorithm also consumes slightly fewer CPU cycles than the Round Robin algorithm (around 1% less).

Example:

...
option tcplog
balance static-rr
maxconn 10000
...

Least Connection

The server with the lowest number of connections receives the new connection. Round robin is performed among servers with the same load so that all servers are used. This algorithm is recommended where very long sessions are expected, such as LDAP or SQL, and is less well suited to protocols that use short sessions, such as HTTP. The algorithm is dynamic, which means that you can adjust server weights on the fly.

Example:

...
option tcplog
balance leastconn
maxconn 10000
...
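
A minimal sketch of a leastconn proxy for long-lived TCP sessions, assuming two database servers. The section name, the port, and the addresses are placeholders for illustration.

```haproxy
listen mysql
        bind 127.0.0.1:3306
        mode tcp
        option tcplog
        # Each new connection goes to the server with the fewest connections
        balance leastconn
        server db01 10.2.2.8:3306
        server db02 10.2.2.9:3306
```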

Source

A hash of the source IP is divided by the total weight of the running servers to determine which server will receive the request. This ensures that clients from the same IP address always hit the same server, which is a poor man's session persistence solution.

Example:

...
option tcplog
balance source
maxconn 10000
...

URI

This algorithm hashes either the left part of the URI (before the question mark) or the whole URI (if the whole parameter is present) and divides the hash value by the total weight of the running servers. The result designates which server will receive the request. This ensures that the proxy will always direct the same URI to the same server as long as all servers remain online.

This is used with proxy caches and anti-virus proxies in order to maximize the cache hit rate. This algorithm is static by default, which means that changing a server's weight on the fly will have no effect. However, you can change this using a hash-type parameter.

You can only use this algorithm for a configuration with an HTTP backend.

Example:

...
option tcplog
balance uri
maxconn 10000
...
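
The hash-type parameter mentioned above can be sketched like this, assuming a small farm of cache servers. The backend name and the addresses are hypothetical.

```haproxy
backend cache_back
        mode http
        balance uri
        # Consistent hashing makes the algorithm dynamic, so weight changes
        # made while the servers are up take effect
        hash-type consistent
        server cache01 10.2.2.6:3128
        server cache02 10.2.2.7:3128
```

With consistent hashing, adding or removing a server remaps only part of the URIs, at the cost of a somewhat less even distribution than the default map-based hash.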

URL Parameter

The URL parameter specified as the argument is looked up in the query string of each HTTP GET request. If the parameter is found with a value, that value is hashed and divided by the total weight of the running servers to select a server. If the parameter is missing or has no value, a round robin algorithm is applied.

You can also use this algorithm with values sent in the body of POST requests by adding the check_post modifier. For example, you can direct every request that carries the same userid value to the same server. Essentially, this is another way of achieving session persistence in some cases (see the official HAProxy documentation for more information).

Example:

...
option tcplog
balance url_param userid
maxconn 10000
...

or

...
option tcplog
balance url_param session_id check_post 64
maxconn 10000
...

Configuration File Examples

The following are example configurations for HAProxy. You can use these as a base for your own configurations.

The IP addresses noted here are just an example. If you use these examples as a template, ensure you modify relevant IPs to fit your environment.

Round Robin

global
        log 127.0.0.1 local0
        log 127.0.0.1 local1 notice
        #log loghost  local0 info
        maxconn 4096
        #chroot /usr/share/haproxy
        uid 99
        gid 99
        daemon
        #debug
        #quiet
defaults
        log global
        mode http
        option httplog
        option dontlognull
        retries 3
        redispatch
        maxconn 2000
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
listen http 127.0.0.1:8080
        mode tcp
        option tcplog
        balance roundrobin
        maxconn 10000
        server web01 10.2.2.4:5000 maxconn 5000
        server web02 10.2.2.5:5001 maxconn 5000

Round Robin with X-Forwarded-For Header

global
        log 127.0.0.1 local0
        log 127.0.0.1 local1 notice
        #log loghost local0 info
        maxconn 4096
        #chroot /usr/share/haproxy
        uid 99
        gid 99
        daemon
        #debug
        #quiet
defaults
        log global
        mode http
        option httplog
        option dontlognull
        option forwardfor
        retries 3
        redispatch
        maxconn 2000
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
listen http 127.0.0.1:8080
        # option forwardfor only takes effect in HTTP mode
        mode http
        option httplog
        balance roundrobin
        maxconn 10000
        server web01 127.0.0.1:5000 maxconn 5000
        server web02 127.0.0.1:5001 maxconn 5000

Least Connections

global
        log 127.0.0.1 local0
        log 127.0.0.1 local1 notice
        #log loghost local0 info
        maxconn 4096
        #chroot /usr/share/haproxy
        uid 99
        gid 99
        daemon
        #debug
        #quiet
defaults
        log global
        mode http
        option httplog
        option dontlognull
        option forwardfor
        retries 3
        redispatch
        maxconn 2000
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
listen http 127.0.0.1:8080
        mode tcp
        option tcplog
        balance leastconn
        maxconn 10000
        server web01 127.0.0.1:5000 maxconn 5000
        server web02 127.0.0.1:5001 maxconn 5000

References

For more information on HAProxy, see the official HAProxy documentation.