Tuesday was another colleague's birthday, so that day I was 'on call'. There were about 20 alerts from the monitoring system and 6 calls for help. So much for a break. The most serious problem was on the server that handles enquiries from listeners to radio stations or visitors to web sites. Our servers have multiple hard disks in what is called a RAID system. The theory is that if one hard disk fails, the other takes over... but... this only applies to the data, not to the 'system' disk, which cannot be RAIDed without extra, expensive hardware. Since it's the data that is critical to us, we thought that this was the best way forward.

Of course, this RAID system of multiple hard disks is the best way forward except during a Christmas break! The data was fine, but the system disk became unreliable. On Tuesday I 'patched it up', attempting to correct errors on the hard disk, with the aim of keeping it going until the following week when we were all back at work. Good theory. Didn't work in practice.
Wednesday... the hard disk failed totally. Peter was 'on call', so I could relax... hmmm... good theory? Other colleagues all round the world phoned me on my mobile instead of following the correct procedure, which would have put them in touch with him. Grrrrr... some un-Christian thoughts crossed my mind! It was a messy day, and Peter began the process of trying to sort out the mess.

In the process I have rebuilt the system in a different way. Instead of dual hard drives in a RAID system, which protected the data but left us with rebuilds whenever there were problems with the system disk, I have used the spare server as a 'mirror' for the main one. The theory is that if one or other server fails, the other can take over. We shall see!