Stability seems to be back!

After eliminating the faulty hardware controller, I did see one more kernel crash. In response, I changed the kernel CPU target from AMD64 to generic i686, and the server has now been running smoothly for 8 days.

So, if anyone else is having stability problems with Linux when using an AMD-optimized kernel, I can only recommend recompiling it with a generic CPU target. I think it’s sad that a change like this can cause so many problems – it just goes to show that software development practices still have a long way to go before software “just works”. The kernel was compiled with gcc 4.1.1, which is the most likely culprit.