Distributing user load and doing server maintenance

If you have multiple servers you usually want to spread the user load among them. Also, if you have maintenance scheduled for a particular server, you will want to steer users away from that server. This article explains the typical way to do that in the IRC world.

Using DNS round robin
Most IRC networks use a setup like this:
 * irc1.example.net has an IP address of 1.1.1.1
 * irc2.example.net has an IP address of 2.2.2.2
 * You may or may not also have a hub server, e.g. hub.example.net
 * irc.example.net will then point to both 1.1.1.1 and 2.2.2.2

As you can see, the DNS record irc.example.net points to the IP addresses of all the IRC servers that are meant to accept clients: the IP of irc1 (1.1.1.1) and the IP of irc2 (2.2.2.2). You tell users to connect to irc.example.net, and new connections will automatically be distributed roughly 50%/50% between irc1 and irc2. This feature is called DNS round-robin, round-robin DNS, or DNS RR for short. See also the Wikipedia entry.

Also, if one of your IRC servers goes offline for some reason, say irc1, your users still have a 50% chance on each attempt to connect to irc2, which is still up. So it is also a resilient setup.

This example can be extended if you have many more servers, like an irc3.example.net, irc4.example.net, and so on. The principle stays the same.
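
To check which addresses a round-robin name such as irc.example.net currently returns, you can resolve it and list every address. The following is a minimal Python sketch, not something that ships with UnrealIRCd. The function name `resolve_all`, the injectable `resolver` parameter and the default port 6667 are assumptions made for illustration.

```python
import socket

def resolve_all(hostname, port=6667, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to.

    resolver is injectable (hypothetical parameter) so the function can be
    exercised without network access.
    """
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

For the example setup above, calling `resolve_all("irc.example.net")` would be expected to list both 1.1.1.1 and 2.2.2.2 while both servers are in the round robin, and only one of them while the other is taken out.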

Doing server maintenance
Every once in a while you will have to take down or restart an IRC server. Reasons could be:
 * Hardware maintenance
 * A kernel upgrade or distribution upgrade (e.g. Ubuntu 20.04 to 22.04)
 * An upgrade of UnrealIRCd

If you know ahead of time that there will be a down period, then you can take the server IP out of DNS round robin. So in the example we used earlier, if you are going to do maintenance on irc1 then you would take the IP address of irc1 (1.1.1.1) out of irc.example.net, so it only points to the IP address of irc2.

Any new users connecting to irc.example.net will then connect to irc2 (2.2.2.2). If you wait a while, such as a couple of days or a week, the number of users on irc1 will be much lower. Only people on long-running bouncers that have not disconnected in about a week will still be connected to irc1.

Now you do your maintenance. Take the server down, or do whatever you need to do. Start up the server again. Finally, you put the IP address of irc1 (1.1.1.1) back in the round robin DNS of irc.example.net so new users start to connect to irc1 too.

Or, of course, maybe now you would like to prepare for maintenance of the OTHER server, so you take irc2 out of the DNS RR of irc.example.net and repeat the same procedure.
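
The order of the steps can be sketched in a few lines of Python. This only models the list of addresses behind irc.example.net. How you actually change the DNS record depends on your DNS provider and is not shown. The helper names `remove_from_pool` and `add_to_pool` are hypothetical, and the refusal to empty the pool is an added safety check, not something the procedure above requires.

```python
def remove_from_pool(pool, ip):
    """Return a new list without ip; refuse to leave the pool empty."""
    remaining = [addr for addr in pool if addr != ip]
    if not remaining:
        raise ValueError("refusing to remove the last address from the pool")
    return remaining

def add_to_pool(pool, ip):
    """Return a new list that contains ip exactly once, keeping existing order."""
    return list(pool) if ip in pool else list(pool) + [ip]

# irc.example.net before maintenance (addresses from the example setup)
pool = ["1.1.1.1", "2.2.2.2"]

# Step 1: take irc1 out, so only irc2 is advertised to new users.
draining = remove_from_pool(pool, "1.1.1.1")

# Step 2: wait for users to drift away from irc1, then do the maintenance.

# Step 3: put irc1 back so it receives new users again.
restored = add_to_pool(draining, "1.1.1.1")
```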

Drawbacks
Taking a server out of DNS round robin and then doing maintenance is the more user-friendly approach, because you will disconnect fewer users (only those who have stayed connected to irc1 throughout the waiting period, e.g. a week).

Still, there are some drawbacks, especially if you only have 2 servers (the drawbacks are less if you have 3 servers or more):
 * When you take irc1 out of DNS RR and irc.example.net points to only one IP address (that of irc2), you are more vulnerable to downtime. If irc2 goes down due to a server or network problem, nobody can connect to your IRC network anymore, because it was the only listed IP address. If this period lasts about a week, that may be quite a risk.
 * By the time you restart irc1, and for quite a while before that, irc2 will have close to 100% of your users. Once irc1 is back up and re-added to irc.example.net, it will take quite a while before the load goes from 100%/0% back to 50%/50%.

For this reason, if you have a simple two-server network and are ONLY doing a quick UnrealIRCd upgrade with a downtime of a few seconds, you may still prefer the old method described in Upgrading a server without DNS round robin (see next).
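
How slowly the load returns to 50%/50% (the second drawback above) can be estimated with a simple model. The sketch below rests on assumptions that are not from this article: a fixed fraction of all users reconnects each day, reconnecting users split evenly between irc1 and irc2, and irc1 starts at 0% when it is re-added. The 10% daily reconnect rate is an arbitrary illustration value, not a measurement.

```python
def irc1_share(days, daily_reconnect_fraction):
    """Fraction of all users on irc1, `days` after it was re-added to DNS.

    Model (assumed): each day the given fraction of users reconnects, and
    half of the reconnecting users land on irc1.
    """
    # Users who have not reconnected since irc1 came back are all on irc2.
    still_on_old_connection = (1 - daily_reconnect_fraction) ** days
    # Everyone who has reconnected at least once is split evenly.
    return 0.5 * (1 - still_on_old_connection)

# Print the estimated irc1 share for a few waiting periods (10% per day assumed).
for day in (1, 3, 7, 14):
    print(day, round(irc1_share(day, 0.10), 3))
```

Under this model the share approaches 50% but the last few percent take the longest, which matches the observation that rebalancing takes quite a while on a network with many long-lived connections.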

Upgrading a server without DNS round robin
If you have multiple servers and you are only upgrading one of them, say irc1, from UnrealIRCd version X to version Y, and there will likely be only a few seconds of downtime (the IRC server restart), then you can also leave it in DNS round robin and just do the upgrade. This will be a bit messy, since more users will have to reconnect than with the procedure described under Doing server maintenance. But it is an option if you don't want to wait a week, don't really care about some users reconnecting (e.g. at night), or don't like the drawbacks mentioned above.

More advanced DNS tricks
It is possible to use more dynamic DNS rules, for example with Amazon Route 53. Then you can do things like:
 * If a server goes down, automatically take the server out of DNS round robin so new clients don't try to connect to it anymore
 * Check the user load, and send more users to the IRC servers that have fewer users
 * Geographical distribution is possible too, e.g. send European users to the European server(s)

All of this is usually only done on the biggest networks though, so it isn't used much, and the instructions very much depend on the DNS provider. This is beyond the scope of this article.
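
As a rough illustration of the first two ideas in the list above, the following Python sketch decides which addresses should stay in the round-robin record: unreachable servers are dropped and the rest are ordered from fewest to most users. It does not talk to any DNS provider. The function names, the port 6667, the 3 second timeout and the user counts are all assumptions, and turning the ordering into actual record weights is provider-specific.

```python
import socket

def is_reachable(ip, port=6667, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def addresses_to_publish(servers, check=is_reachable):
    """Pick the addresses to keep in the round-robin record.

    servers maps an IP address to its current user count (hypothetical input;
    obtaining the counts is left out). Servers that fail the check are
    dropped; the rest are ordered from fewest to most users.
    """
    healthy = [ip for ip in servers if check(ip)]
    return sorted(healthy, key=lambda ip: servers[ip])
```

A plain TCP connect is only a coarse health check. It shows that something is listening on the port, not that the IRC server is linked to the rest of the network.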