Docunext

Another MCE Machine Check Error

November 18th, 2007

One of my servers threw another MCE (Machine Check Exception) this morning, and my guess is that faulty ECC memory is the cause. I was able to capture the error codes:

CPU 0: Machine Check Exception: 0000000000000004
Bank 4: b63e200200080813 at 00000000048d7bc0
Kernel panic - not syncing: CPU context corrupt

Run through mceparse, the codes decode to this:

CPU 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(4): b63e200200080813 @ 48d7bc0
        External tag parity error
        Uncorrectable ECC error
        CPU state corrupt. Restart not possible
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Bus and interconnect error
        Participation: Local processor originated request
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Instruction
        Memory/IO : Other
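
For the curious, here is a rough Python sketch of how the flag bits in those two numbers can be picked apart by hand. The bit positions are the common x86 machine check status flags as I understand them. The names parse_panic, set_flags and the two tables are made up for illustration, and this is not how mceparse itself works. It ignores the model-specific low bits (the bus and interconnect details).

```python
import re

# Flags in the global status value (the number after "Machine Check Exception:").
MCG_STATUS_BITS = {
    0: "RIPV: restart IP valid",
    1: "EIPV: error IP valid",
    2: "MCIP: machine check in progress",
}

# Flags in the top bits of a per-bank status value (the number after "Bank N:").
BANK_STATUS_BITS = {
    63: "VAL: status register valid",
    62: "OVER: error overflow",
    61: "UC: error not corrected",
    60: "EN: error enabled in control register",
    59: "MISCV: misc register valid",
    58: "ADDRV: address register valid",
    57: "PCC: processor context corrupt",
}


def set_flags(value, table):
    """Return the descriptions of every bit in `table` that is set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def parse_panic(text):
    """Pull the global status, bank number, bank status and address out of
    kernel output shaped like the lines captured above."""
    mcg = re.search(r"Machine Check Exception:\s*([0-9a-fA-F]+)", text)
    bank = re.search(r"Bank (\d+):\s*([0-9a-fA-F]+) at ([0-9a-fA-F]+)", text)
    if mcg is None or bank is None:
        raise ValueError("no machine check lines found")
    return {
        "mcg_status": int(mcg.group(1), 16),
        "bank": int(bank.group(1)),
        "bank_status": int(bank.group(2), 16),
        "address": int(bank.group(3), 16),
    }
```

Feeding the captured lines to parse_panic and then passing mcg_status and bank_status to set_flags should give the same high-level flags that mceparse reports: machine check in progress with no valid restart IP, an uncorrected error, a valid address, and corrupt processor context.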

http://lists.us.dell.com/pipermail/linux-poweredge/2003-February/006430.html

It's probably not a good idea, since it only turns off machine check handling instead of fixing whatever is failing, but I've added nomce to my kernel boot options.

http://www.redhat.com/docs/manuals/linux/RHL-9-Manual/install-guide/ch-bootopts.html
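
To check after a reboot that the option actually reached the kernel, the running command line can be inspected. A small sketch, assuming a Linux system that exposes /proc/cmdline; has_boot_option and running_cmdline are hypothetical helper names.

```python
def has_boot_option(cmdline, option):
    """Return True if `option` appears as a whole word on the kernel command line."""
    return option in cmdline.split()


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling has_boot_option(running_cmdline(), "nomce") on the server should return True once the option is in place.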
