Amazing what can go wrong just by rebooting the server. Apart from the troubles with glibc, LDAP suddenly wasn’t working and MySQL refused to start. At first I suspected the new kernel was responsible, but after many hours of troubleshooting it turned out to be a whole series of other things.
First, openssl 0.9.8 broke everything linked to 0.9.7 due to API changes, which resulted in much recompilation and waiting.
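For anyone hitting the same breakage, here is a minimal sketch of how one might spot binaries that still reference the old library version before recompiling. It assumes you have captured the text output of `ldd` for a binary; the function name `stale_links` and the `.so.0.9.7` suffix default are my own illustrative choices, not something taken from the tools I actually used.

```python
import re

# Matches ldd lines such as
#   "libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0xb7d00000)"
# or
#   "libssl.so.0.9.7 => not found"
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def stale_links(ldd_output, old_suffix=".so.0.9.7"):
    """Return (library, resolved_path_or_None) pairs that reference the old version.

    ldd_output is the raw text printed by ldd for one binary.
    old_suffix is a hypothetical default chosen for this example.
    """
    hits = []
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and match.group(1).endswith(old_suffix):
            target = match.group(2)
            # A missing library is reported by ldd as "not found".
            hits.append((match.group(1), None if target == "not found" else target))
    return hits
```

Any binary for which this returns a non-empty list is a candidate for a rebuild against the new openssl.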
Next, MySQL needed to be started manually and the “mysql_fix_privilege_tables” script executed to upgrade the permissions tables. One wonders why it couldn’t do this automatically. More by chance and sheer persistence than anything else, I eventually came across a blog post explaining how to fix the problem, and two minutes later it was up and running again.
The system does appear to be stable and in working condition with all packages updated. However, this has certainly been a -5 score for Gentoo as a production/server platform.
I just added an additional 1 GB of memory to the server, fixed the glibc problem from the last post, and upgraded the kernel (to a hardened 2.6.17 with vserver support). Let me know if you spot any problems with the new setup.
Updating glibc of the host environment on the server seems to have broken some library dependency, causing basically everything to stop working, with the exception of processes already executing.
The weird thing about having a Linux-VServer installation is that only the root operating system is affected – all the virtual machines have their own files and aren’t affected. However, I cannot open new SSH sessions to the machine (as it runs in the root environment), and cron jobs are also likely failing (but all the interesting ones are in the guests anyway).
I need to get up early tomorrow and so cannot be bothered to boot from a CD to fix this right away, but do expect the server to be unavailable for an hour some time tomorrow (presumably late afternoon) as I perform the required recovery steps.
The folks at Verizon apparently haven’t figured out how to create a redundant network. They have a piece of equipment identified as 0.so-7-0-0.XL1.MTL1.ALTER.NET which keeps crashing every 5 seconds, which effectively ruins any attempt at making a persistent connection between here and anything on the other side of that broken hardware item (such as most of Canada, and probably other parts of North America). Apparently they also don’t know of SNMP, since it’s been doing this for hours and they still haven’t fixed it.
This just cost me a wad of money (hence the primary reason for me to rant about their blatant incompetence here). That said, if you live in North America and are unable to connect here, now you know why. Go call that 800-FCK-VRZN number!
Someone messed up and severed yet another fiber link this afternoon, causing the server to be offline from roughly 16:30. Repairs are underway and I expect the problem will be fixed shortly.
The server will be offline from September 9th 2006 20:00 until September 10th 12:00 (all times CEST = UTC+2).
The reason for this is that Global Connect needs to perform some major roadwork and repairs on the recently severed fiber links. Sorry all for the inconvenience.