Today it got worse. I bricked the box.
15 of the patches are kernel modules. "smpatch" won't let you update them unless you are in single-user mode. Um, ok, whatever. Single-user mode has no networking component. Well, the ethernet device may or may not be "plumbed" (enabled), but there is no inetd, no sshd, i.e., no way in.
So one needs console access. This can be through the VGA console (eeprom console = text) or the serial console (eeprom console = ttya).
Well, the f*cking LOM (lights-out management) device has old firmware, so the ActiveX control only works in the IE that shipped with NT4. F*cking lovely. Won't work in Firefox, Opera or any recent IE. So no remote VGA console. I tried to configure the serial console (following Sun's docs) but it never worked. As soon as the boot loader (GRUB) passed control to the kernel, the serial console went dead.
So I decided to flash the firmware of the ILOM. The errata said that the update makes the ILOM work with Firefox using a Java applet (yeah!). The errata also stated that the mainboard BIOS is tied to the ILOM firmware. Flash one, and the other gets updated too. Ok. Making me nervous, but what the hell, right?
So I flashed the BIOS remotely using the web admin for the ILOM. Rebooted the server. No server. Gone. The ILOM happily boots up with the new firmware. Remote console works great in the web browsers, except that it now reports "no video signal". Called the colo and had support monkey #17 check the VGA console. Dead. Checked the ILOM logs. The ILOM was on the new firmware, but for some unknown reason, the mainboard never updated. The mainboard never POSTs.
So we tried to reflash the ILOM back to the old firmware. This failed, but miraculously, the ILOM survived a reboot. Tried again and it flashed back to the old version. But the mainboard is still dead. There went 12 hours that I'll never get back.
So tomorrow morning at 6:00am, one of my coworkers will drive 80 miles to the colo with a magic FreeDOS bootable USB stick that contains a magic BIOS repair utility. The procedure: de-rack the server, pop the chassis lid, fiddle with some jumpers to enable an emergency alternate boot BIOS, boot from the USB stick, flash the BIOS and pray. Replace the jumper, re-rack the server and hope that the normal BIOS boots up. Oh, btw, the procedure will most likely nuke the EEPROM / NVRAM. The boot device RAID config is stored in NVRAM.
I HATE SUN HARDWARE! All of this "enterprise grade" crap and it's USELESS!
And to top it off, there is a different coworker whose line is "I would never update a server. Never flash a BIOS. It's not on the public internet, no need to ever patch it". This guy just doesn't get it. He's been on my case about it all day long. Drives me nuts.
At least this is a triply redundant web server. The load balancer is just routing traffic to the others. No customer impact. The CIO might finally spring for a Sun support contract... until he sees the price sheet again.