Changed the motherboard on one of the servers yesterday... the one with the Groundhog Day problem... and some things are running much better. It now runs on a clock that stays pretty much on time, and we can correctly monitor the motherboard temperature, fan speed, power supply voltages, etc. But... the second network card doesn't seem to work with this motherboard.
We have a system for monitoring all the servers, and all the services on each server, which is supposed to send us an SMS message if something goes wrong. Peter has been off all week, so in between other jobs I have been trying to get the monitoring system working with all the new servers and services.
We have nearly 300 services to monitor, and these are continually checked and any problems reported to us. About 10% of those services are proving an absolute pain to get monitored properly. One such service is an alarm that triggers if the number of email messages waiting to be delivered exceeds a preset limit. If this happens it normally means something is wrong elsewhere, but it's a useful indicator of a potential problem. Of the 10 servers that have email going through them, I configured 5 to monitor the number of queued email messages with no problem, but the other 5, with identical configurations, refuse to work. Why?
These technical problems are extremely boring. I would rather be working on media projects, but things like the monitoring service let us use the systems without spending all day, every day, just watching the technology. When I spend a week of in-between times trying to get the monitoring working and sorting out boring motherboards, I wonder...
So... what we need is a good technical system administrator who likes keeping systems going!