For compatibility with a new guest OS, I upgraded my ESXi to 5.5 today. During reboot, it crashes after a few seconds (it briefly flashes a message about starting up PCI passthrough on the yellow ESXi boot screen). The purple screen of death (PSOD) I get looks like this:
VMware ESXi 5.5.0 [Releasebuild-1474528 x86_64]
#PF Exception 14 in world 32797:helper1-2 IP 0x4180046f7319 addr 0x410818781760
PTEs:0x10011e023;0x1080d7063;0x0;
cr0=0x8001003d cr2=0x410818781760 cr3=0xb6cd0000 cr4=0x216c
frame=0x41238075dd60 ip=0x4180046f7319 err=0 rflags=0x10206
rax=0x410818781760 rbx=0x41238075deb0 rcx=0x0
rdx=0x0 rbp=0x41238075de50 rsi=0x41238075deb0
rdi=0x1878176 r8=0x0 r9=0x2
r10=0x417fc47b9528 r11=0x41238075df10 r12=0x1878176
r13=0x1878176000 r14=0x41089f07a400 r15=0x6
*PCPU2:32797/helper1-2
PCPU 0: SSSHS
Code start: 0x418004600000 VMK uptime: 0:00:00:05.201
0x41238075de50:[0x4180046f7319]BackMap_Lookup@vmkernel#nover+0x35 stack: 0xffffffff00000000
0x41238075df00:[0x418004669483]IOMMUDoReportFault@vmkernel#nover+0x133 stack: 0x60000010
0x41238075df30:[0x418004669667]IOMMUProcessFaults@vmkernel#nover+0x1f stack: 0x0
0x41238075dfd0:[0x418004660f8a]helpFunc@vmkernel#nover+0x6b6 stack: 0x0
0x41238075dff0:[0x418004853372]CpuSched_StartWorld@vmkernel#nover+0xf1 stack: 0x0
base fs=0x0 gs=0x418040800000 Kgs=0x0
When rebooting the machine now, it reverts to my previous version, ESXi 5.1-914609.
A bit of playing around revealed that this only happens while I am connected to the Intel AMT VNC server. If I connect after ESXi has finished booting, it crashes a fraction of a second after the VNC connection is established. Go figure! Apparently it’s not such a good idea to have a VNC server inside the GPU, Intel…
Before I figured this out, I booted up the old ESXi 5.1.0-914609 and even upgraded it to ESXi 5.1.0-1483097. Looking at dmesg revealed loads of weird errors while connected to the VNC server:
2014-02-13T11:23:15.145Z cpu0:3980)WARNING: IOMMUIntel: 2351: IOMMU Unit #0: R/W=R, Device 00:02.0 Faulting addr = 0x3f9bd6a000 Fault Reason = 0x0c -> Reserved fields set in PTE actively set for Read or Write.
2014-02-13T11:23:15.145Z cpu0:3980)WARNING: IOMMUIntel: 2371: IOMMU context entry dump for 00:02.0 Ctx-Hi = 0x101 Ctx-Lo = 0x10d681003
lspci | grep '00:02.0' shows that this is the integrated Intel GPU (which I’m obviously not doing PCI passthrough on).
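For anyone wanting to reproduce the diagnosis: watching the vmkernel log live while connected to AMT VNC makes the fault spam obvious, and lspci maps the faulting PCI address to a device name. A minimal sketch for the ESXi 5.x shell (the log path is the standard one; adjust if yours differs):

# Watch for IOMMU fault messages as they arrive:
tail -f /var/log/vmkernel.log | grep -i IOMMU

# Map the faulting address 00:02.0 to a device name:
lspci | grep '00:02.0'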
So:
- ESXi 5.5 panics when using Intel AMT VNC
- ESXi 5.1 handles Intel AMT VNC semi-gracefully and only spams the kernel log with dozens of messages per second
- ESXi 5.0 worked fine (if I remember correctly)
I have no idea what VMware is doing there. As far as I can tell, out-of-band management like Intel AMT should be completely invisible to the OS.
Note that this is on a Sandy Bridge generation machine with an Intel C206 chipset and a Xeon E3-1225. The Q67 chipset is almost identical to the C206, so I expect the issue to occur there as well. Newer chipsets hopefully behave better; perhaps newer firmware versions help, too.
Update November 2014: I just upgraded to the latest version, ESXi 5.5u2-2143827, and it’s working again. I still get the dmesg spam, but the PSODs are gone. These are the kernel messages I’m seeing now while connected via Intel AMT VNC:
2014-11-29T11:17:25.516Z cpu0:32796)WARNING: IOMMUIntel: 2493: IOMMU context entry dump for 0000:00:02.0 Ctx-Hi = 0x101 Ctx-Lo = 0x10ec22001
2014-11-29T11:17:25.516Z cpu0:32796)WARNING: IOMMU: 1652: IOMMU Fault detected for 0000:00:02.0 (unnamed) IOaddr: 0x5dc5aa000 Mask: 0xc Domain: 0x41089f1eb400
2014-11-29T11:17:25.516Z cpu0:32796)WARNING: IOMMUIntel: 2436: DMAR Fault IOMMU Unit #0: R/W=R, Device 0000:00:02.0 Faulting addr = 0x5dc5aa000 Fault Reason = 0x0c -> Reserved fields set in PTE actively set for Read or Write.
So basically, Intel AMT VNC is now usable again.
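If you want to confirm which build is actually running after such an upgrade, the standard ESXi shell commands are enough; nothing here is specific to this bug:

# Print the running ESXi version and build number:
vmware -v
esxcli system version get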
Update August 2015: ESXi 6.0 still spams the logs, no change over ESXi 5.5.
Hi,
I’ve got a Jetway NF9E with AMT and use it on a daily basis with no issues… ESXi 5.5 installed from scratch. If you need any extra info, I’m happy to share.
Cheers,
I should add that my board is from the Sandy Bridge generation (C206 chipset, similar to the Q67), while yours is an Ivy Bridge Q77.
I’d appreciate it if anyone with an older chipset could report whether it’s working for them. I’ll see if I can find out my Management Engine firmware version; perhaps this issue is even firmware-specific. Or it might be due to customizations by the board vendor, etc.
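The ME firmware version can usually be read from the AMT web UI (port 16992) without rebooting into the BIOS. A sketch, assuming the web UI is enabled and 'amt-host' stands in for your machine's AMT address:

# Fetch the AMT web UI status page (digest auth; you will be prompted
# for the AMT admin password) and look for the firmware version string:
curl --digest -u admin 'http://amt-host:16992/index.htm' | grep -i version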
Same issue on Gigabyte Q67M-D2H-B3…
I found some more reports of this issue on VMware’s forums. There, it has been reported on Intel Q67 (Cougar Point / Sandy Bridge / AMT 7.0), Q77 and C216 (Panther Point / Ivy Bridge / AMT 8.0).
https://communities.vmware.com/thread/464920
Glad I found your blog! I can confirm this issue also occurs on the Intel DQ67SW motherboard (latest BIOS version SWQ6710H.86A.0066.2012.1105.1504) with Intel ME 7.1.52.1176. After hooking up a physical monitor and keyboard, I was finally able to upgrade my machine from ESXi 5.1 to 5.5. Thanks!
My small experience: I’m using a Jetway NF9E and played around a bit with the NICs (82579LM and 82574L).
With ESXi’s default install, the 82579LM is not detected, so ESXi uses the 82574L for the management network.
Installing the 82579LM driver into ESXi afterwards provides a second NIC, and there is no conflict between AMT and the ESXi management network.
However, if you reconfigure ESXi to use the 82579LM for management (double-check which MAC is in use), either AMT does not work (if started after ESXi has booted) or ESXi crashes a few seconds after boot (if VNC is active).
BR
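To double-check which NIC and MAC ESXi uses for management (relevant here, because AMT shares the MAC of the 82579LM), the standard esxcli commands suffice; a quick sketch for the ESXi 5.x shell:

# List physical NICs with their MAC addresses:
esxcli network nic list

# Show the VMkernel interfaces (management network) and their MACs:
esxcli network ip interface list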
Same issue on Asus P8Q77-M
I have the same problem with my Intel DQ67OW. I also tried to install it via VNC, and it crashed. I will try the installation with a real monitor and keyboard.
My BIOS version is SWQ6710H.86A.0067.2014.0313.1347.
Some recent VMware KB entries are contradictory: one suggests the issue was resolved in ESXi 5.5 Update 2, another that it was introduced in ESXi 5.5 Update 2 and is currently unresolved. The latter KB entry suggests disabling VT-d in the BIOS. Has anyone tried that? I use PCI passthrough and cannot disable VT-d.
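Before trying the disable-VT-d workaround, it may be worth checking what is actually configured for passthrough. A sketch for the ESXi 5.x shell; from memory, passthrough capability is flagged in the PCI listing and current assignments are recorded in esx.conf:

# Show PCI devices flagged as passthrough-capable:
esxcli hardware pci list | grep -i -B 12 'passthru capable: true'

# Devices currently assigned to passthrough appear with owner "passthru":
grep -i passthru /etc/vmware/esx.conf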