Have you ever tried to power on a virtual machine only to get a cryptic error message about a file being locked? It’s frustrating, mainly because the message is so cryptic and gives you so little useful information. In other cases the machine will not power on and reports that a file cannot be found, yet a listing of the virtual machine’s directory shows that all files are present.

You can confirm the problem by searching the VM’s vmware.log for references to a locked file. For example:
DISKLIB-LINK : "/vmfs/volumes/47069165-c4ccb111-0513-001a4bbe40ba/ntadph1187m00/ntadph1187m00_1.vmdk" : failed to open (Device or resource busy).
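If you need to check a log quickly, or several of them, a small shell function can do the search. This is a minimal sketch: find_lock_errors is a hypothetical name, and the pattern assumes the error text matches the DISKLIB-LINK example above, on a single line.

```shell
# Hypothetical helper: list lines in a vmware.log that report a disk which
# could not be opened because it is busy (typically because it is locked).
# Assumes the message text matches the DISKLIB-LINK example above.
find_lock_errors() {
    # Default to vmware.log in the current directory
    local log=${1:-vmware.log}
    # -n: prefix each match with its line number; -E: allow | alternation
    grep -n -E 'failed to open|Device or resource busy' "$log"
}
```

Run it from the VM’s directory on the datastore, for example find_lock_errors vmware.log; each matching line is printed with its line number.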

This indicates that one or more members of the virtual machine’s cartel (the group of VMkernel worlds that make up a running VM) are still running, but the host running them has forgotten about them, hence the powered-off state in VC.

When you encounter this situation, first check whether the guest is still responsive by connecting to it with Remote Desktop (mstsc). If the server accepts an RDP session, you have hit the easiest case: restart the management agents on the hosts in the cluster, one host at a time, until VC reflects the correct state of the VM.
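If you want a scripted first pass at the responsiveness check, testing whether the guest accepts TCP connections on the RDP port is a rough stand-in. This is a minimal sketch assuming bash and the coreutils timeout command are available; rdp_port_open is a hypothetical name. An open port only shows that the guest’s network stack is answering, so an actual mstsc login remains the definitive test.

```shell
# Hypothetical pre-check: can we open a TCP connection to the guest's RDP
# port (3389 by default)? An open port is weaker evidence than an actual
# RDP login, but it is quick to script across several VMs.
rdp_port_open() {
    local host=$1
    local port=${2:-3389}
    # /dev/tcp/HOST/PORT is a bash-only redirection target, so bash is
    # invoked explicitly; timeout(1) abandons the attempt after 3 seconds
    # if the guest never answers. Errors such as "Connection refused" are
    # discarded; only the exit status matters.
    timeout 3 bash -c ": < /dev/tcp/$host/$port" 2>/dev/null
}
```

For example, rdp_port_open ntadph1187m00 && echo "guest is answering on 3389" (this assumes the guest’s name resolves from where you run it). On ESX 3.x hosts the management agent is typically restarted from the service console with service mgmt-vmware restart, and the VirtualCenter agent with service vmware-vpxa restart.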

If the guest is unresponsive, the VM is effectively dead. We now have to go on a search-and-destroy mission and eradicate the VM’s cartel.

On the service console of a host in the affected cluster, change to the VM’s directory and run sudo vmkfstools -D ./name-of-locked-file.

The command runs but prints nothing to the terminal; its output goes to the VMkernel log. To read it, run sudo tail /var/log/vmkernel.
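Because the VMkernel log may contain other entries around the dump, it can help to print only the dump for the file you queried. This is a minimal sketch, assuming the ESX 3.x log path /var/log/vmkernel and that each dump is bracketed by <START name> and <END name> markers, as in the sample log segment in this post; lock_dump is a hypothetical name.

```shell
# Hypothetical helper: print only the lock dump that vmkfstools -D wrote for
# one file, i.e. the lines from "<START name>" through "<END name>".
# Assumes the dump markers use the bare file name, as in the sample log.
lock_dump() {
    local name
    # The markers contain the file name without its directory part
    name=$(basename "$1")
    local log=${2:-/var/log/vmkernel}
    # index() is a literal substring match, so dots in the file name are
    # not treated as regex wildcards. Every matching dump is printed.
    awk -v s="<START $name>" -v e="<END $name>" '
        index($0, s) { inside = 1 }
        inside       { print }
        index($0, e) { inside = 0 }
    ' "$log"
}
```

Run it from a root shell (sudo cannot invoke a shell function directly), for example vmkfstools -D ./ntadph1187m00-flat.vmdk followed by lock_dump ./ntadph1187m00-flat.vmdk. If the file was queried more than once, every matching dump is printed.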

You should see several lines similar to the log segment below. The value we need is the last group of the owner field (0016357c8e59 in this example, i.e. 00:16:35:7c:8e:59): this is the MAC address of vmnic0 on the host holding the lock, or sometimes vmnic1 (whichever NIC vswif0 is configured on). Run ifconfig vmnic0 or ifconfig vmnic1 on each host in the cluster until you find a match.

Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)FS3: 130: <START ntadph1187m00-flat.vmdk>
Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)Lock [type 10c00001 offset 16216064 v 77, hb offset 3406336
Mar 18 10:34:47 vmvsph6120m00 vmkernel: gen 3913, mode 1, owner 470beb69-e4d9d99c-805a-0016357c8e59 mtime 13395847]
Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)Addr <4, 105, 8>, gen 24, links 1, type reg, flags 0x0, uid 0, gid 0, mode 100600
Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)len 7534018560, nb 7185 tbz 0, zla 3, bs 1048576
Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)FS3: 132: <END ntadph1187m00-flat.vmdk>
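Pulling the MAC out of the owner field and checking it against each host can also be scripted. This is a minimal sketch, assuming (as described above) that the last 12 hex digits of the owner UUID are the lock holder’s MAC address; lock_owner_macs and output_has_mac are hypothetical names.

```shell
# Hypothetical helpers for matching a VMFS lock owner to a host.
# Assumption: the last 12 hex digits of the "owner" UUID in the lock dump
# are the MAC address of the host holding the lock.

# Print each distinct owner MAC found in a log as aa:bb:cc:dd:ee:ff.
lock_owner_macs() {
    local log=${1:-/var/log/vmkernel}
    # grep -o prints only the matched "owner <uuid>" text, one per line;
    # awk keeps the last dash-separated group; sed inserts the colons.
    grep -o 'owner [0-9a-f]\{8\}-[0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}' "$log" |
        awk -F- '{ print $NF }' |
        sed 's/\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1:\2:\3:\4:\5:\6/' |
        sort -u
}

# Read command output (e.g. from ifconfig) on stdin and succeed if it
# contains the given MAC; -i makes the comparison case-insensitive.
output_has_mac() {
    grep -qi -- "$1"
}
```

For the sample segment above, lock_owner_macs prints 00:16:35:7c:8e:59. On each host in the cluster, ifconfig vmnic0 | output_has_mac 00:16:35:7c:8e:59 && echo "this host holds the lock" then reports the match.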

Once you have a match, run ps -efwww | grep <virtual machine name> on that host. If you get a line of output similar to:
root 10793 1 0 Mar12 ? 00:00:15 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user -@ pipe=/tmp/vmhsdaemon-0/vmxf863abd25c13a492;vm=f863abd25c13a492 /vmfs/volumes/47069165-c4ccb111-0513-001a4bbe40ba/ntadph1187m00/ntadph1187m00.vmx

You can safely kill -9 that PID (10793 in this example), provided you have already confirmed that the VM is unresponsive. kill -9 (SIGKILL) tears the legs out from under the process: it ends it immediately, with no chance to clean up, so it is not a clean kill. A normal kill PID (SIGTERM) may work, but I have had much better success with kill -9. At this point you should be able to power on the VM.
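Both steps, finding the PID and killing it, can be scripted. The sketch below is hedged: vmx_pids and stop_pid are hypothetical names, it assumes the ps -ef column layout shown above (PID in field 2) and that the .vmx file is named after the VM, and stop_pid tries a normal kill first and escalates to kill -9 only after a grace period. That escalation is a variation on the approach above, which goes straight to kill -9.

```shell
# Hypothetical helpers; the names and the 10-second default are assumptions.

# Read ps -ef output on stdin and print the PID (field 2) of any vmware-vmx
# process whose command line contains "/<vm>.vmx". index() is a literal
# substring match, so regex characters in VM names are harmless, and the
# awk process's own command line does not contain "/<vm>.vmx".
vmx_pids() {
    local vm=$1
    awk -v vm="$vm" 'index($0, "vmware-vmx") && index($0, "/" vm ".vmx") { print $2 }'
}

# Send SIGTERM, wait up to GRACE seconds, then fall back to SIGKILL (kill -9).
stop_pid() {
    local pid=$1
    local grace=${2:-10}
    local i=0

    # Fails if the PID does not exist or we lack permission (run as root).
    kill -TERM "$pid" || return 1

    while [ "$i" -lt "$grace" ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    echo "PID $pid still present after ${grace}s; sending SIGKILL" >&2
    kill -KILL "$pid"
}
```

From a root shell on the lock-holding host, for pid in $(ps -efwww | vmx_pids ntadph1187m00); do stop_pid "$pid"; done would stop the vmx process from the example above. Only run it after the responsiveness check.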

If ps -efwww does not return a vmx line, the host attempted to shut the VM down but was unable to stop the entire cartel, and the only option left at this point is to reboot the host.