Last weekend, the speech group noticed that one batch of their compute servers was running much slower than expected, even compared with other, older machines. Exactly one machine in the newest batch was running at full speed; the rest were perhaps 50% slower than expected. All of the relevant machines were Dell R630s running Ubuntu 12.04 (precise).
During May, we had a couple of nasty power cuts (with associated over-voltage spikes).
One suggestion we received was that the machines might not be running with their BIOS “system profile” set to “performance”.
On rebooting into the BIOS setup, the relevant entry claimed profile = performance, but on attempting to quit without having changed any settings, the BIOS mysteriously asked whether it should “save the changes”. Saving and rebooting restored the expected performance (checked before and after the reboot with the MATLAB “bench” command).
The single machine that didn’t exhibit the problem was one that had been out of service and powered off during the power cuts because of an unrelated (disk) fault.
A further symptom that became apparent was that the affected machines, while in the broken state, occasionally logged kernel errors that looked like:
Uhhuh. NMI received for unknown reason 29 on CPU 14.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
This seems consistent with the hypothesis that the BIOS got into a confused state during the power spikes/cuts and wound up failing to set the system profile to “performance” (despite reporting it as such in its setup screen).
If you see similar slowdowns, it is worth checking the syslogs for the NMI error message above.
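As a rough way to do that check across a number of machines, something like the following Python sketch could be used. It only looks for the "NMI received for unknown reason" line quoted above and counts occurrences per CPU. The function names and the default path /var/log/syslog are assumptions made for illustration (kernel messages may live elsewhere, e.g. in /var/log/kern.log or in rotated files), so treat it as a starting point rather than a tested tool.

```python
import re
from collections import Counter

# Matches the kernel message quoted above. The reason code and the CPU
# number vary between occurrences, so both are captured.
NMI_PATTERN = re.compile(
    r"NMI received for unknown reason (?P<reason>[0-9a-fA-F]+) on CPU (?P<cpu>\d+)"
)


def count_nmi_errors(lines):
    """Return a Counter mapping CPU number to the number of unknown-NMI messages."""
    counts = Counter()
    for line in lines:
        match = NMI_PATTERN.search(line)
        if match:
            counts[int(match.group("cpu"))] += 1
    return counts


def scan_log(path="/var/log/syslog"):
    """Scan one log file; the default path is an assumption, adjust as needed."""
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as handle:
        return count_nmi_errors(handle)
```

Calling scan_log() on an affected machine should return a non-empty Counter, while a healthy machine would be expected to return an empty one. Reading the log usually requires root or membership of the group that owns the file.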