Well, this morning was a bit busy. We had friends over who left around 10 something, and I finally got to sleep around 1 or so. Then at 0250 my cell phone buzzed. “Main server air intake too high: 77 deg F” Well, let’s see what happens in the next few minutes… same message. As I’m getting dressed to come downstairs and have a look at things, it continues to buzz every few minutes as the environmental sensor sends new traps to the management station. I log in to it, and sure enough it’s warm – and even one of the machines in the room complained that its drives were getting hot inside the case. While I’m looking at things, I get the magical page: “Server room *WAY* too hot, shutting down Hydra” [our Beowulf cluster]. After watching to be sure Hydra shut down properly, I put on my shoes and left to head into the office, arriving just before 0400.
Sure enough, it was damned warm in there. The ecosaire (formerly Hiross) unit was cycling rapidly, an indication that it was trying to cool and hitting head pressure high enough to set off the auto-resetting control we installed so it wouldn’t trip permanently for a simple burp of air in the chilled water system. The Liebert unit was tripped on high head pressure, since we never installed such a device in it (it can run in “econo cool” mode, where it circulates chilled water through a coil inside the unit, so more often than not we don’t need the compressor in there). So I turned off the ecosaire and called public safety to find out what was going on with the chilled water.
As it turns out, there was some kind of catastrophic leak in the system, which forced them to turn off the pumps. That happened around 0200, but we didn’t see a problem until later because our unit is on a secondary loop within the building (so it was still circulating the whole time, just not getting any colder). I knew when the officer said “There’s a hazmat team on site” that it was probably going to be a long night. But actually, when I asked if she knew how long it would take, she said the system should be up shortly. So I stood and watched the thermometer on the pipes, and when it dropped from 70-something down to the 50s, I turned the A/C back on. Reset the high head pressure switch on the Liebert and it was happy, and between the two units the room cooled from ~80F down to ~65F in the time it took me to have a smoke.
Fired up Hydra, and noticed that something seemed odd about the Liebert as the heat generators in the room spun back to life. Seemed like it would come on for a few seconds, and then shudder violently and turn the compressor off again. When doing this, the display would get all weird, like it had some sort of internal computer failure, but no errors or alarms were thrown up on the panel. After it did this a couple of times (one involving the lights in the room dimming, much to my dismay) I turned Hydra back off as well as the Liebert itself and called public safety again. “Would you please page whoever from HVAC is on duty tonight? I’m having some concerns about one of the units here…” “Hah.. page them? I’ll call them on the radio, they’re all already here.” Duh, if the chilled water died, a lot of units on campus might need attention. So they showed up, and of course I couldn’t make the unit act up again. Oh well.. they left eventually, and one of them called Liebert to have a technician come out and look over the unit (sure enough, he found something way out of adjustment that kept kicking the compressor off.. and I have to wonder if one particular moron “fixed” that setting to cause the problem).
At this point, it was about 0610, and in another 50 minutes OIT would be along to change our building network connections around – providing us with a gigabit feed to campus once we upgrade our firewall, and setting us up with a nice Cisco gigabit switch for future gigabit ports to things. So I putzed around in my office for a little bit, read some emails and websites, and went out for a smoke at 0657, where I found one of OIT’s carts out front. By the time I walked around to the basement door and made it to the switch room, the work was already finished and the cables were being dressed. Ahh, something that went smoothly at least.
So, I ran a couple of tests, verified everything was okay, and finally headed for home around 0735. Not too bad for two hours of sleep I guess.