Archive for August, 2007

Multi-homing and high-availability

Last night I worked on punbb a little - mainly to add support for hosting multiple sites with a single copy of the software. I was successful to a certain degree. I had to modify the cache folder layout o include a sub-directory for each forum instance, and those sub-folders have to be created manually. All in all, my changes were very minor and I’ll likely create a diff for those who are interested in doing the same.

I’ve also been looking for a new way to manage virtual hosts. I prefer to keep my web server configuration focused on the network - ip addresses and what not, and leave the handling of domains and urls for scripts which can be updated on the fly without restarting the web server.

Beyond that, I’ve been exploring multi-homing. It is a quick and easy way to
acheive high-availability. As I understand it, you assign two different ip addresses to a single a record, and supporting clients will be able to try either one, and if one doesn’t work, they’ll use the other. This provides very basic “round-robin” load balancing as well, but the distribution is hit-or-miss.

Servers that can be relatively easily multi-homed include web servers and smtp servers. Web servers are a little more challenging because you have to keep the content in sync, but mysql’s replication capabilities make that a lot easier. In the case of simple replication, all you have to do is make sure that only the primary server is updated, which is a simple task with a scripted if / then statement.

imap service would be a terrific service to multi-home, but it poses a well-known and difficult challenge: synchronized, distributed, fault-tolerant file systems. Trying to side step that challenge as referred to with a web server by only updating the master, you cannoteasily ensure that the primary file system is updated.

It might be possible to use something like DRDB to keep two file systems in syncronization, but in my experience you need a dedicated (100+ Mbit) network to make that happen. Given the size of most emails, it might be possible over a 1Mbit line.

An alternative technique would be to not use multi-homing but instead use a failover switch to a secondary and regularly syncronized backup. When the primary server fails, the imap dns entry would need to update to point towards the backup imap server, requiring a low ttl (300 seconds?) for practical use.
In this way, you can use the built-in backup mail transfer (mx) dns records to point to either multiple smtp server and the secondary and tertiary servers will hold the mail until the primary is back online.

xmmp and voip is another potential candidate for multihoming or high-availability failover.

techniques like drdb are better suited towards single location setups, to make sure that a service doesn’t go down even when the service is up. along those lines, multi-homing and failovers should be the last step in the chain of failure - meaning that everything (within reason) that could have gone wrong would not have caused interruption, even if multi-homing were not in place.

For that to happen, you’d need to have a highly available filesystem, with RAID and DRDB, multiple, a load balancer, systems monitor, and load director, as well as have those be highly available. The problem with that setup is that for the majority of the time the setup is major overkill, and for 24/7 operations, the energy costs can get substantial. I’ve been conducting a bunch of research in this area and I think I’ve come up with some pretty good solutions.

My interests are are now focused on the pfSense distribution, and its implementation of CARP - or common address routing protocol, as well as its load balancing system. I’ve recently purchased two WRAP boards to configure this with, however they only have two ethernet ports each, so I might be looking at using a wireless card for having them monitor each other, which wouldn’t be too big of a deal, unless they can monitor each other through the lan, but I’ve heard a danger in this setup is a “split brain” condition in which both routers are unable to connect to each other and are both trying to be the master.

Behind that setup, I plan to setup a cluster of:

* load-balanced, high-availability pxe-booted web, email, dns, and ntp servers
* mysql ndb cluster
* a highly available nfs drbd server used to provide the storage requirements for the above mentioned servers.

Services that “will never go down” are still subject to the possibility of a regional network outage, and that is where the multihoming and failover takes place.

Another strategy I’m exploring is the idea of outgoing load-balancing. I’m still unsure how that fits into this plan. I’m not sure if that is part of the work I’m planning for with carp.

And yet another concept I’ve been exploring is that of virtualization. In some cases it might not make sense to put separate different services (like ntp and dns) on separate machines, but it might make sense to put them on separate virtual machines for process isolation and security purposes.

By doubling up on everything and setting up multi-homing or failovers, you protect yourself to a certain degree, but the failover servers “waiting for the primary server to fail” are a waste of energy. Futhermore, as service demands ebb and flow, the number of machines in a service pool does not need to remain constant. For these reasons, it makes sense to consider the use of different power levels in each of the machines.

For example, the following techniques can be employed to save energy, yet provide backup machines in the event of failure:
* cpu frequency and voltage scaling
* hard disk standby
* sleep states
* powering off the machine

The ability to control the power given to servers is a necessity, but not easily achieved. ACPI, sleep states, wake-on-lan, and switched power distributions can help.

Monitoring, logging, and remote controls.
To-do.

Multiple A Record Priorities?

I often wondered whether priorities assigned to multiple a records for the same dns zone has any affect. I didn’t think so, and also thought that priorities only really matter for MX records (and thanks to spammers, they don’t matter that much their either!).

I did a little research today and found someone who agrees:

http://www.webhostingtalk.com/showthread.php?t=555564

So when using multiple a records, the priority and preference values have no meaning.

I also found a clearer nomenclature for “multiple a records”: round robin dns.

http://en.wikipedia.org/wiki/Round_robin_DNS

In my experiments with round robin dns, it turns out not every client can figure it out, so you really do need something like carp or bgp to step in when there is a network outage. Buckets. :-(

I just found this:
http://www.isc.org/index.pl?/sw/bind/docs/bind-load-bal.php

Towards the end of the paper is describes SRV records, and how they can be used for DNS load balancing. However, SRV records aren’t used by http clients (browsers) at the moment.

XPath and Testing




XPath is a language used in selenium to help give directions to a specific element on a page. The following path contains the coordinates to the text ASSETS +.

//td[text()='ASSETS +']

The best place to learn about XPath that I’ve found is ZVON.

Wordpress Updates

I’ve got a couple of Wordpress updates in my mind… a trigger which clears the WP-Cache when someone posts a comment.

While not really an update, I’m planning to also have a trigger which changes the database host when a post is made, enabling the use of replicated databases for selects, and a primary host for updates, deletes, or inserts.

UPDATE: I’m now using multiple databases with wordpress. A local database for reading, and a master which is hosted on another computer for inserts, updates, and deletes, as described above. It works AWESOME! So fast, highly distributed and available, and generally cool.

UPDATE 2: For remote updates, I’d like to turn on compression. Apparently PHP is able to do so with the MYSQL_CLIENT_COMPRESS flag in the mysql_connect function.

UPDATE 3: I’m getting a lot of problems with spam karma 2 and a multi-database setup. I’ll have to either rework spam karma 2, rework the wordpress config files, or setup sql-relay.

UPDATE 4 November 24, 2007: I finally got around to creating the trigger which flushes all the cached files when an ANY edit is made. I also am using a custom cache location, and need to make that configurable before sharing.