Tuesday, September 18, 2007

How I Benchmark Postfix

One goal of benchmarking an MTA is to determine how many messages it can relay, both under ideal conditions and under realistic ones.

Ideal conditions means no bottlenecks. The bottlenecks for an MTA include network latency, I/O latency, and concurrency. Under ideal conditions, the MTA would never block on reads from or writes to its peers on the network or its queue on disk, and it could accept and establish as many simultaneous connections as it needed at any given moment.

Of course, having that kind of resources is unrealistic, but understanding how each bottleneck affects the MTA's performance is important. When benchmarking the MTA, it is easy to isolate the effect of each one by introducing them one at a time. For instance, we can start a series of benchmarks by wiring the MTA directly to its peers, NIC to NIC, eliminating any latency introduced by external network devices such as hubs, switches, or routers. Likewise, we can practically eliminate I/O latency by putting the queue in RAM. Eliminating concurrency as a factor is much more difficult, since it depends on several things, such as the CPU, the OS, and the MTA's own throughput. However, by gradually ramping the concurrency parameters up (or down), we can plot throughput against concurrency.
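The concurrency ramp described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure behind the numbers in this post: it reuses the smtp-source and mailq commands introduced later, assumes a `date` that understands `+%s` (GNU date does; older Solaris date may not), and the helper names `drain_queue`, `run_at_concurrency`, and `sweep_concurrency`, plus the `SMTP_SOURCE` override, are hypothetical. It varies only the client-side session count (`-s`); Postfix's own limits stay wherever main.cf puts them.

```shell
# Hypothetical helpers: sweep smtp-source concurrency, print CSV for plotting.
# Assumes GNU-style `date +%s` and that mailq prints "Mail queue is empty"
# once nothing is queued. None of these names are part of Postfix.

# Poll once a second until the local Postfix queue has drained.
drain_queue() {
    until mailq 2>/dev/null | grep -q 'Mail queue is empty'; do
        sleep 1
    done
}

# One run at a given session count: inject, wait for relay, report a CSV row.
run_at_concurrency() {
    sessions=$1
    msgs=${2:-10000}
    start=$(date +%s)
    # Same flags as the post's command minus -c, whose running counter might
    # clutter the CSV: -4 IPv4, -d reuse connections, -N unique recipients.
    "${SMTP_SOURCE:-smtp-source}" -4dN -m "$msgs" -s "$sessions" -l 25000 \
        -f foo@bar.com -t bar@foo.com localhost:25 || return 1
    drain_queue
    secs=$(( $(date +%s) - start ))
    [ "$secs" -gt 0 ] || secs=1    # guard against division by zero
    echo "$sessions,$secs,$(( msgs * 60 / secs ))"
}

# Run each requested session count in turn, after a CSV header line.
sweep_concurrency() {
    echo "sessions,seconds,mpm"
    for s in "$@"; do
        run_at_concurrency "$s" || return 1
    done
}
```

For example, `sweep_concurrency 16 32 64 128 256 > sweep.csv` on hostA yields one row per level, ready to chart. Each run waits for the queue to drain before the next one starts, so consecutive rows don't bleed into each other.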

Once we've established baseline numbers under ideal conditions, we can methodically introduce realistic bottlenecks until we get to a state that is representative of the actual production environment.

To begin, you need two boxes with Postfix installed, including the smtp-source and smtp-sink utilities, which ship with the Postfix source code. We'll call the host you want to benchmark hostA. Modify hostA's main.cf so that it relays everything directly to the other host, hostB:

relayhost = [hostB]
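The same change can be made from the shell with `postconf -e`, which edits main.cf in place. A minimal sketch; `set_relayhost` is a hypothetical helper name, and it assumes Postfix is already running on hostA so that `postfix reload` succeeds. The square brackets, as in the line above, tell Postfix to skip the MX lookup and connect to hostB directly.

```shell
# Hypothetical helper: make hostA relay all mail to a single host.
set_relayhost() {
    if [ -z "$1" ]; then
        echo "usage: set_relayhost HOST" >&2
        return 2
    fi
    # Brackets disable the MX lookup, so mail goes straight to that host.
    postconf -e "relayhost = [$1]" || return 1
    postfix reload    # have the running Postfix pick up the new setting
}
```

Run it on hostA as `set_relayhost hostB`.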

On hostB, leave Postfix down. To eliminate the latency that relaying to another MTA would introduce, we will run only smtp-sink, an SMTP bit bucket, on port 25 of hostB. Here -4 restricts it to IPv4, -c displays running counters, and the final argument, 1024, is the listen backlog, which lets it absorb a large number of simultaneous connection attempts:

# smtp-sink -4c :25 1024

On hostA, the queue directory should be on a RAM disk. On Solaris, /tmp is a memory-backed tmpfs file system, so you can copy the queue directory there (preserving ownership and permissions) and then change the queue_directory parameter in main.cf to point to the copy. On Linux, you can create a RAM disk with the tmpfs file system. To check whether your kernel supports it, grep for tmpfs in /proc/filesystems:

# grep tmpfs /proc/filesystems
nodev tmpfs
#

If the grep prints nothing, your kernel does not support tmpfs. Otherwise, create a tmpfs file system like so, copy your queue directory to it, and point Postfix at it by changing the queue_directory parameter in main.cf:

# mount tmpfs /mnt -t tmpfs -o size=128m
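Put together, the Linux steps might look like the following sketch. It assumes it runs as root on hostA, takes the post's /mnt mount point and 128 MB size as defaults, and relies on GNU `cp -Rp` to preserve the ownership and permissions Postfix checks; `move_queue_to_tmpfs` is a hypothetical name. Anything on the tmpfs vanishes on unmount or reboot, so this is for benchmarking only.

```shell
# Hypothetical helper (Linux, run as root): move the Postfix queue to tmpfs.
# Everything on tmpfs disappears on unmount or reboot; benchmark use only.
move_queue_to_tmpfs() {
    mnt=${1:-/mnt}       # mount point used in the post
    size=${2:-128m}      # tmpfs size used in the post
    src=$(postconf -h queue_directory) || return 1
    postfix stop 2>/dev/null            # ignore the error if Postfix is already down
    mount -t tmpfs -o "size=$size" tmpfs "$mnt" || return 1
    cp -Rp "$src/." "$mnt/" || return 1 # -p keeps the owners and modes Postfix expects
    postconf -e "queue_directory = $mnt" || return 1
    postfix start
}
```

Afterwards, `postfix check` is a quick way to catch ownership or permission problems in the copied queue.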

To ensure maximum concurrency, you will probably want to raise default_process_limit in hostA's main.cf. I set mine to 1024, but that may be too much for your system. I'm benchmarking with a pair of 4-core Sun T2000s, which are designed for very high concurrency.
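A sketch of that change, again via postconf rather than a text editor. `set_process_limit` is a hypothetical helper, and 1024 is simply the value used on the T2000s above. Services whose maxproc column in master.cf is `-` inherit this default, while services with an explicit maxproc keep their own value.

```shell
# Hypothetical helper: raise the per-service process ceiling on hostA.
set_process_limit() {
    limit=${1:-1024}                   # the post's value; size it to your hardware
    postconf -e "default_process_limit = $limit" || return 1
    postfix reload || return 1         # make the running master pick up the change
    postconf default_process_limit     # print the value now in main.cf
}
```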

If you can, wire the two hosts directly to each other, NIC to NIC. Depending on your NICs, that may require a crossover cable. The T2000's NICs are auto-sensing, so a regular straight-through 8P8C cable will do.

Now the fun begins. On hostA, we use smtp-source to inject 10,000 messages of 25 kB each over 256 concurrent sessions, with -d keeping each connection open for subsequent messages instead of reconnecting:

# date; smtp-source -4dNcm 10000 -s 256 -l 25000 -f foo@bar.com -t bar@foo.com localhost:25; date

This series of commands will display the exact time, then send 10000 messages, then display the time again. The command exits when all 10000 messages have been accepted into the queue by Postfix. The benchmark is not over until every message has been relayed to hostB. To determine the exact second that happens, you will need to repeatedly check the mail queue:

# mailq |tail -1; date
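Rather than re-running that command by hand, a small loop can poll until the queue drains and then print the closing timestamp. A sketch, assuming Postfix's mailq ends its output with `Mail queue is empty` when nothing is queued; `wait_for_empty_queue` is a hypothetical name. Start it right after smtp-source returns. It never finishes if a message stays queued, for example in the deferred queue.

```shell
# Hypothetical helper: poll mailq until Postfix reports an empty queue,
# then print the time that marks the end of the benchmark run.
# Assumes mailq's last line reads "Mail queue is empty" once nothing is queued.
wait_for_empty_queue() {
    interval=${1:-1}    # seconds between checks; 1 matches date's resolution
    until mailq 2>/dev/null | tail -1 | grep -q 'Mail queue is empty'; do
        sleep "$interval"
    done
    date
}
```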

When the mail queue is finally empty, the benchmark is over. Calculate messages per minute (MPM) by taking the number of seconds between the first date (printed just before smtp-source starts) and the last date (printed when the mail queue was finally empty), dividing 10000 by that number, and multiplying the quotient by 60.
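As arithmetic, MPM = 10000 / elapsed_seconds × 60. The helper below is a hypothetical sketch of that formula in shell integer arithmetic, so the result is truncated. If your `date` supports `+%s`, as GNU date does, the elapsed seconds can come from subtracting two such timestamps instead of reading them off by hand.

```shell
# Hypothetical helper implementing the post's formula:
#   MPM = messages * 60 / elapsed_seconds   (integer result, truncated)
mpm() {
    msgs=$1
    secs=$2
    if [ "$secs" -le 0 ]; then
        echo "mpm: elapsed seconds must be positive" >&2
        return 1
    fi
    # Multiply before dividing so integer division discards as little as possible.
    echo $(( msgs * 60 / secs ))
}
```

For example, a run that drains in 40 seconds gives `mpm 10000 40`, or 15000 MPM. Working backwards, the 14600 to 16200 MPM range reported below corresponds to runs of roughly 41 down to 37 seconds.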

Typical results for my T2000s are between 14600 and 16200 MPM. Moving the queue directory to a real disk brings results down to the mid 5000s. Changing from a straight shot, MTA->smtp-sink, to MTA->MTA->smtp-sink (i.e. starting Postfix on hostB and having it relay everything to the smtp-sink process) then yields numbers comparable to what the production environment actually sees, between 4500 and 4800 MPM.

I'm cutting this blog entry short because, although I haven't yet scaled down the concurrency factor, that would only be an exercise in methodology; for all practical purposes I've already shown that I/O is by far my MTA's lowest-hanging fruit in terms of upgrade potential. I'm running on only a pair of mirrored 15K SAS drives, which also hold every other file system on the box, so I have to keep routine I/O outside the MTA to a minimum. That includes logging only to a remote host, except possibly for low-traffic logs such as the system log. Clearly, if I ever need more performance out of this box, an upgrade to a high-speed disk array is the way to go.
