Last month I went camping with a couple of tech buddies who also share a love of horses and riding. One of the more interesting campfire discussions that came up was about the parallels between deploying a server and starting a young horse. Maybe the inspiration for this theory came from an O'Reilly book cover that subtly reminds every administrator that Linux will not always behave, and you may need to use the spurs once in a while. For both horses and computer systems, we agreed that the initial training is the most critical aspect, because it sets up the horse (and the server) for success later in its life.
Without going into the finer details, a horse is trained through a series of steps that include (but aren't limited to) leading, sacking out, longeing, saddling, driving, and the eventual first ride. After those steps are completed, the horse is put through a minimum of 30 days of riding, slowly introducing different exercises based on what you want to do with the horse. A horse used for trail riding will turn out differently from a horse used for roping cattle, so there is no single standard procedure for what you do in those 30 days.
A server, on the other hand, is mounted, booted up, installed over a network (or from an install CD), configured, documented, and then put into service. I am a firm believer that a server install process should do everything, from laying out filesystems to installing packages and setting local configuration parameters. Once that server reboots at the completion of the process, it should be production ready. There is no official 30 days of server tuning, at least not in theory. In practice, we go through those 30 days without realizing it.
It's not uncommon to find bugs creeping into both systems after we assume the job is complete. On a server, the admin may discover that partitioning wasn't sufficient to handle the amount of logging data that's generated. On a horse, the trainer may discover that the horse has learned how to use tree branches to get rid of his rider. In both cases, the issue needs to be resolved immediately. If it's ignored, we trick ourselves into accepting the problem and settling for a server that will need constant attention, or a horse that will never be cooperative. Horse trainers know how to deal with these problems; a lot of systems administrators do not.
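The logging example is the kind of bug that is easy to catch early if something is watching for it. Here is a minimal sketch in Python, assuming you simply want to flag any filesystem that is filling up faster than expected. The function name, the 80% threshold, and the mount points you pass in are my own illustrative choices, not part of any standard tool.

```python
import shutil


def partitions_over_threshold(mount_points, max_used_fraction=0.8,
                              usage_fn=shutil.disk_usage):
    """Return (mount_point, used_fraction) pairs above the threshold.

    usage_fn defaults to shutil.disk_usage; it is a parameter only so the
    check can be exercised without touching a real filesystem.
    """
    flagged = []
    for mount in mount_points:
        usage = usage_fn(mount)
        if usage.total == 0:
            # Some pseudo-filesystems report a size of zero; skip them.
            continue
        used_fraction = usage.used / usage.total
        if used_fraction > max_used_fraction:
            flagged.append((mount, round(used_fraction, 2)))
    return flagged
```

Calling `partitions_over_threshold(["/", "/var/log"])` from a daily cron job during the first 30 days would be one way to notice an undersized log partition before it fills up.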
A new server deployment should go through increased scrutiny and extra non-invasive testing during its first 30 days of deployment. Was every package installed? Was a port scan performed? What happens if I run a command outside of its normal scheduled time in cron? Invite other administrators to log on and take a look around. There isn't always a written outline or test case for everything. Sometimes you can only find problems from random observation.
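Some of these checks are easy to script. As a rough sketch, assuming a plain TCP connect test is good enough for a first look (it is no substitute for a real port scanner), the following Python compares the ports that actually answer against the ports you meant to open. The function names and the expected port list are hypothetical.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an error.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unexpected_ports(found, expected):
    """Return the ports that are open but were not meant to be."""
    return sorted(set(found) - set(expected))
```

For example, `unexpected_ports(open_ports("127.0.0.1", range(1, 1025)), expected=[22, 80])` would list any low-numbered port that is listening besides SSH and HTTP. Anything it reports is a hint that the install process enabled a service you did not plan for.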
If an issue comes up, address it immediately. Apply the appropriate change to your install process, whether it's Jumpstart, Kickstart, or whatever. It doesn't make sense to correct a problem on your first server, only to repeat that same mistake when you deploy the second server. If the problem is serious and you can afford some downtime, it doesn't hurt to re-install the server from the beginning, provided that you know you're fixing a problem that will make your job easier down the road.
Thirty days gives you a reasonable window to assess how your server will operate unattended. If you feel it's needed, extend it to 60 days. The important thing is to document what happens during that time frame, so other administrators know how the server was tested and how to handle issues that may come up.
After the 30 days are over, hand off that server to another administrator, because it's fully trained and should be handled by the operational division.