Saturday, December 10, 2005

A week is a long time...

They say a week is a long time in politics... but it seems that two weeks is even longer in the work I do, since it's two weeks ago that I last wrote in this blog.

A couple of weeks ago the main email server in London had been attacked and had just been rebuilt. Well... nearly rebuilt, as we found out later. It was not that the attacker had left anything behind - we had been worried that he might have left a trojan or two. Trojan programs are the computer equivalent of the Trojan horse in Greek legend - they are programs hidden in the system that allow malicious attackers access later, without it being obvious that they are something nasty. There weren't any trojan programs left.

But... the server was totally unreliable for about 10 days. Servers like the ones we run have, as part of their Operating System, a method of self-protection. What that means is that if they start running out of resources they automatically kill off normal programs so that the server keeps going. So, for instance, if it was running short of memory the server would kill off the email service and keep going. This saves us from what Windows users call the 'blue screen of death', but it can be extremely irritating to find you have to restart the email service or whatever regularly. And we couldn't work out why we were running out of memory. The server has inside it two computers [processors] and about four times the amount of memory a home computer has. It shouldn't run out of memory!

In the middle of this trauma we thought the memory was faulty, so we got the leasing company to change it. Because we were having memory problems they tested the new memory before installing it, but after they changed it we still had memory problems.

Eventually we traced the fault to the leasing company's re-install of the system: they had installed the wrong 'kernel'. The kernel is the heart of the Operating System, and the kernel they had installed was designed for very old processors that could not handle as much memory as we had installed in the server. We remotely installed a new kernel. This worried us because, if the kernel didn't install correctly, the server would crash and we would have to request another total install of the Operating System. However, it did install correctly, and we now have the server behaving correctly, working faster and not running out of memory.
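For the technically minded, a check along the following lines is one way to spot this kind of mismatch between the memory fitted and the memory the kernel can actually use. It is only a sketch in Python, and it assumes a Linux server, where the kernel reports the memory it can see in the file /proc/meminfo. The function names, the 10% tolerance and the 4096 MB figure are made up for illustration; they are not details of our server.

```python
def parse_meminfo_kb(meminfo_text):
    """Turn text in /proc/meminfo format into a {field name: kB} dictionary."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kernel_sees_installed_memory(meminfo_text, installed_mb, tolerance=0.10):
    """Return True if MemTotal is within `tolerance` of the fitted memory.

    The kernel always reserves a little memory for itself, so MemTotal is
    slightly below the physical amount; the tolerance (an assumed value)
    allows for that.
    """
    total_kb = parse_meminfo_kb(meminfo_text).get("MemTotal")
    if total_kb is None:
        raise ValueError("MemTotal not found in meminfo text")
    return total_kb >= installed_mb * 1024 * (1 - tolerance)
```

On a Linux machine it could be used by reading the real file, for example `kernel_sees_installed_memory(open("/proc/meminfo").read(), 4096)`. A result of False would suggest the kernel is not using all of the memory that is physically installed.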

We took the opportunity of the rebuild of the server to implement some security enhancements that we had been planning to do around now anyhow. But we had been planning to do them on another server, to test them before installing them on the 'live' one. It also required writing a new user interface for part of the system... all to be done as fast as possible so that 'normal service will be resumed as soon as possible', as it used to say on the TV screens when I was a kid growing up.

In between all this I was co-ordinating a project for a new very large website we will launch in February. We have had a programmer here for a month [he leaves today actually] and there has been the need for a lot of thought about how the underlying structure will work so that it will be expandable in the future.

And... the mobile phone text message system is just about to go into phase two of development. We have proven it is both needed and doable, but we need a more reliable and expandable system that could also be easily installed in other locations around the world. We had a planning meeting about that, leading on to research about equipment we can use in the future. Phase two starts as soon as possible...

Between all this I have been editing a video training series for the organization we grew out of. Eventually we hope it will be an interactive DVD. Training videos are always very difficult to do. The reason for this is that they have to be interesting. I know all films and videos have to be interesting, but training films are somehow more difficult to get to be interesting.
