Amazing what can go wrong just by rebooting the server. Apart from the troubles with glibc, LDAP suddenly wasn’t working and MySQL refused to start. At first I suspected the new kernel was responsible, but after many hours of troubleshooting it turned out to be a whole series of other things.
First, openssl 0.9.8 broke everything linked to 0.9.7 due to API changes, which resulted in much recompilation and waiting.
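For anyone hitting the same thing, here is a rough sketch of how one might spot binaries still linked against the old library before they fail at startup. It only parses the text that `ldd` prints; the function names are made up for illustration, and the sonames `libssl.so.0.9.7` and `libcrypto.so.0.9.7` are my assumption of what to look for. On Gentoo, `revdep-rebuild` from gentoolkit is the usual tool for this job, so treat this as an illustration rather than a replacement.

```python
import subprocess


def linked_sonames(ldd_output):
    """Return the set of library names listed in `ldd` output."""
    names = set()
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0x...)"
        if len(parts) >= 2 and parts[1] == "=>":
            names.add(parts[0])
    return names


def needs_rebuild(ldd_output,
                  old_prefixes=("libssl.so.0.9.7", "libcrypto.so.0.9.7")):
    """True if any listed library name starts with one of the old sonames."""
    return any(name.startswith(old_prefixes)
               for name in linked_sonames(ldd_output))


def check_binary(path):
    """Run ldd on one binary (assumes ldd is on PATH) and report stale links."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return needs_rebuild(result.stdout)
```

Calling `check_binary()` on a daemon's executable (whatever path it has on your system) returns `True` if that binary still needs to be recompiled against the new OpenSSL.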
Next, MySQL needed to be started manually and the “mysql-fix-privileges” command executed to upgrade the permissions tables. One wonders why it couldn’t do this automatically. More by chance and sheer persistence I eventually came across a blog post explaining how to fix the problem, and 2 minutes later it was up and running again.
The system does appear to be stable and in working condition with all packages updated. However, this has certainly been a -5 score for Gentoo as a production/server platform.
I just added an additional 1 GB of memory to the server, fixed the glibc problem from the last post, and upgraded the kernel (to a hardened 2.6.17 with vserver support). Let me know if you spot any problems with the new setup.
Updating glibc of the host environment on the server seems to have broken some library dependency, causing basically everything to stop working, with the exception of processes already executing.
The weird thing about having a Linux-VServer installation is that only the root operating system is affected – all the virtual machines have their own files and aren’t affected. However, I cannot open new SSH sessions to the machine (as it runs in the root environment), and cron jobs are also likely failing (but all the interesting ones are in the guests anyway).
I need to get up early tomorrow and so cannot be bothered to boot from a CD to fix this right away, but do expect the server to be unavailable for an hour some time tomorrow (presumably late afternoon) as I perform the required recovery steps.
The folks at Verizon apparently haven’t figured out how to create a redundant network. They have a piece of equipment identified as 0.so-7-0-0.XL1.MTL1.ALTER.NET which keeps crashing every 5 seconds, which effectively ruins any attempt at making a persistent connection between here and anything on the other side of that broken hardware item (such as most of Canada, and probably other parts of North America). Apparently they also don’t know of SNMP, since it’s been doing this for hours and they still haven’t fixed it.
This just cost me a wad of money (hence the primary reason for me to rant about their blatant incompetence here). That said, if you live in North America and are unable to connect here – now you know why. Go call that 800-FCK-VRZN number!
Someone messed up and severed yet another fiber link this afternoon, causing the server to be offline from roughly 16:30. Repairs are underway and I expect the problem will be fixed shortly.
The server will be offline from September 9th 2006 20:00 until September 10th 12:00 (all times CEST = UTC+2).
The reason for this is that Global Connect needs to perform some major roadwork and repairs on the recently severed fiber links. Sorry all for the inconvenience.
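If you are in another time zone and want the downtime window in UTC, the conversion can be sketched like this. It assumes both dates fall in 2006 and uses a fixed UTC+2 offset for CEST, as stated above; the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# CEST as a fixed offset of UTC+2, per the announcement above.
CEST = timezone(timedelta(hours=2), "CEST")

start = datetime(2006, 9, 9, 20, 0, tzinfo=CEST)
end = datetime(2006, 9, 10, 12, 0, tzinfo=CEST)

start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)
duration = end - start

print(start_utc.isoformat())  # 2006-09-09T18:00:00+00:00
print(end_utc.isoformat())    # 2006-09-10T10:00:00+00:00
print(duration)               # 16:00:00
```

So the window runs from 18:00 UTC on the 9th to 10:00 UTC on the 10th, 16 hours in total.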
Someone broke roughly 120 fiber cables this afternoon (at roughly 15:20), which has severed the link between my ISP and the world.
Global Connect (who owns the fiber cables) estimates that repairs will last until noon tomorrow (Thursday), but as the fibers are repaired individually there is hope that the server will be reachable some time before that.
I’ll be using this opportunity to add some additional RAM to the server (from 2GB to 3GB), hoping that this will finally be enough even for the insatiable Java apps.
The server went down Saturday at roughly 2:00 am and wasn’t rebooted until I came back Sunday at 3:45 am. Unless these issues with 64-bit and/or Linux-VServer (still not too sure which of the two is the root cause) are resolved real soon, I’ll have to look at alternatives for the server OS. This is clearly not sustainable for much longer.. sigh!
It turns out my ISP was upgrading their network yesterday – replacing the core routers with new hardware as well as migrating to a redundant setup – but ran into severe problems, causing the main link to be unavailable for several hours (from roughly 19:00 yesterday up until this morning). Let’s hope they don’t have too many upgrades like that in store for us..
My primary ISP seems to have some kind of problem (my link has been cut a few times today, even if only for a few minutes each time). You can continue to access this site using the alternate link on http://mirror.mertner.com.