As you will all have noticed, the server has been increasingly unstable lately, and it finally reached the point where it wouldn’t even stay up for a day. Clearly not viable, and very frustrating.
It was time to try something new, so I decided to spend some downtime experimenting with kernels. However, I was unable to even finish compiling a kernel before the server crashed again. Because I was now using a local console, I started spotting an odd “Disabling Interrupt #185” line being emitted shortly before every crash. Sitting right beside the server also allowed me to hear the faint click-click of a hard disk being reset, so the obvious conclusion was that this might be hardware-related after all.
So, I pulled out the Promise controller used for the RAID array, which has been offline for a while anyway. I was instantly able to finish the kernel compilation, and the kernel that “would not last an hour” has now been running for a whopping 3 hours.
This makes me hopeful that the cause of the instability has finally been located, and that brighter days lie ahead.
I’ll keep you updated, but right now I’m off to find a replacement for the defunct controller card.