The server this site is hosted on has been a bit unstable of late. I’m running the server from my home, and therefore I’m solely responsible for keeping the thing running.
Running services would crash at random, and I went on a hunt to figure out the issue. Sometimes the server would be totally unresponsive, even when I used it with a monitor and keyboard hooked up to identify the problem.
It would kernel panic every now and then, and at first I assumed some driver was causing the issues, since I had changed some parts recently. I added an extra SATA PCI Express card in order to have more disks attached to the motherboard. I also swapped the Intel onboard GPU drivers to get better transcoding support. And lastly, I added a software RAID-5 to the mix.
So there were some moving parts in the setup of the server that I suspected of causing the issues, since the problems started soon after these changes.
Assumption, the mother of all f-ups
After a lot of diagnosing, I figured out that none of the changes I had initially suspected were the culprit.
So now I figured this would have to be a hardware failure of some sort. After I fired up Memtest86, the cause of all the issues was quickly explained: bad memory.
I have seen issues with bad memory before on Windows systems, but I had never tried to diagnose the symptoms on Linux. In my experience, Linux seems to handle faulty memory way better than any MS OS does. It keeps the machine running (at least most of the time), whereas Windows would’ve blue-screened long before.
When I ran Memtest, it showed errors almost immediately after startup. One of the sticks seemed to be in such a bad state that it was completely unusable.
After removing the bad stick, the server seems to be doing fine again!
Shout out to Azerty
I bought the RAM from Azerty more than 2 years ago. Azerty offers a standard warranty of 2 years on the sticks I bought (and on most products they carry, I believe).
After I contacted them and explained the issue, they made no fuss about replacing the bad memory. Since I was already past the warranty date, they didn’t have to do this.
They sent a replacement kit, which arrived the day after, even before I’d sent them the faulty memory for inspection.
So cheers to Azerty for helping me out and getting the server back up and stable in no time!