A former co-worker of mine recently blogged about trying to do a disk expansion on a server with a snapshot.  As a general rule – this is something you want to avoid doing.  Fortunately, the VI Client is smart enough to know this is a bad thing and will prevent you from doing it.

Unfortunately, the Service Console is not that smart. It follows the general Unix rule that, well shucks, if you have root-level permissions, you must know what you are doing.

So let’s say you try to expand the VMDK of a virtual machine that has a snapshot, using the VI Client. You’ll get an error message. There are two different error messages you might see (as of VC Build 119598, VC 2.5 U3).

If the machine is powered off you’ll get:
[Screenshot: snapshot-poweredoff]

If the machine is powered on you’ll get:
[Screenshot: snapshot-poweredon]

Hopefully at this point you will do the intelligent thing, which is what you should have done initially, and Commit (Delete All) or Revert (Rollback) any snapshots this machine has. If you happen to be less intelligent, you might instead log on to the Service Console and execute (note the capital X, which is the extend option):
vmkfstools -X #gb /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB.vmdk
You’ll be pleased to know that your vmdk is now expanded.

What you won’t be so pleased to know is that your virtual machine won’t power on anymore. Your snapshot for that drive is invalid. The specific error message you’ll get will be along the lines of “Cannot open the disk /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB-000001.vmdk or one of the snapshot disks it depends on.” ROFLMAO, you are officially screwed now!
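
If you insist on working from the Service Console, a quick look for snapshot files before running vmkfstools would have avoided all of this. Here is a rough sketch of such a check. The function name has_snapshot_disks is my own invention, and it assumes the usual naming where snapshot extents end in -delta.vmdk.

```shell
# Hypothetical guard: returns 0 if the VM directory contains any
# snapshot delta extents (assumes the usual *-delta.vmdk naming).
has_snapshot_disks() {
    vm_dir="$1"
    for f in "$vm_dir"/*-delta.vmdk; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Example use (the path is the one from this post):
#   if has_snapshot_disks /vmfs/volumes/datastore1/OMGIRDUMB; then
#       echo "Snapshots present, do not extend this disk"
#   fi
```

If it finds anything, stop and deal with the snapshots in the VI Client first.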

Haha, okay I am done laughing at your expense and am ready to help you keep your job.

You have two ways to resolve this issue. The first is to manually re-home your disks in the VI Client so that they point to the base disks and not the deltas, and then use vmware-cmd X removesnapshots (where X is the path to the VM’s .vmx file). That leaves the VM in the state it was in when the snapshot was taken, with the added bonus of an enlarged disk.

The better, albeit more complicated, option is to fix your mistake and save the server. First things first: using the VI Client, Delete All (Commit) snapshots. This task will progress, but it will generate an error (likely the “Internal error” shown above).

At this point you need to verify that VC believes this machine has no snapshots, and that all VMDKs except the one you attempted to expand point to their base disks. This ensures that all the other drives in the system are up to the current state, and that the snapshot state of the VM reflects this.

Next, log back on to the Service Console and cd to the home directory of the virtual machine. Again, verify that the -delta files for all VMDKs have been committed, except for your naughty vmdk.

A little background. When you have a chain of VMDKs, you can use vmkfstools -i child.vmdk target.vmdk to consolidate the chain into one disk. We will be using this command to fix our problem. If you attempt this straight away, you will get an error message that ranges from “cannot read beyond end of disk” to “parent VMDK has been modified”.

In order to fix this problem, we need to make sure that the parent VMDK and all the children involved are correctly tied together. If you recall, last month I wrote an entry that described the contents of a VMDK. The entries we care about today are the parentCID value and the Extent Description.
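
One thing worth knowing before you start editing: the number after RW in the Extent Description is the disk size in 512-byte sectors, not gigabytes. A tiny helper like the one below (the name gb_to_sectors is mine, purely for illustration) makes it easier to sanity-check the values you are about to read.

```shell
# Hypothetical helper: convert a size in GB to 512-byte sectors,
# the unit used after RW in the extent description.
# 1 GB = 1024 * 1024 * 1024 / 512 = 2097152 sectors.
gb_to_sectors() {
    echo $(( $1 * 2097152 ))
}

# gb_to_sectors 10   prints 20971520
```

So a 10 GB disk should show RW 20971520 in its descriptor.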

Start off by doing a cat /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB-000001.vmdk, and make note of the parentCID value and the number that comes after RW in the Extent Description section. Now, do vi /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB.vmdk, and ensure that its CID matches the parentCID you noted from the child. Also, update the value in this file’s extent description to match the value in the -000001.vmdk.
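
If you would rather not eyeball the two descriptor files, the comparison can be scripted. This is only a sketch: the function names are hypothetical, and it assumes a simple descriptor with one CID= line, one parentCID= line, and a single RW extent line.

```shell
# Print the value of a key=value line (e.g. CID or parentCID).
# The ^ anchor keeps "CID" from also matching "parentCID".
vmdk_field() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Print the sector count from the first RW extent line.
vmdk_sectors() {
    awk '$1 == "RW" { print $2; exit }' "$1"
}

# Succeeds only if the parent's CID equals the child's parentCID
# and both descriptors report the same extent size.
vmdk_chain_ok() {
    parent_cid=$(vmdk_field "$1" CID)
    child_parent_cid=$(vmdk_field "$2" parentCID)
    [ -n "$parent_cid" ] &&
    [ "$parent_cid" = "$child_parent_cid" ] &&
    [ "$(vmdk_sectors "$1")" = "$(vmdk_sectors "$2")" ]
}

# Example use, run from the VM's directory:
#   vmdk_chain_ok OMGIRDUMB.vmdk OMGIRDUMB-000001.vmdk && echo "chain looks sane"
```

It only reads the files, so it is safe to run before and after your vi session.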

At this point you should be able to successfully execute vmkfstools -i and generate a new base disk. You’ll now want to execute vmkfstools -X #gb against the new vmdk to update its extent description field. Finally, add this new disk to your VM, and remove the disk that points to the old delta chain.
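
Put together, the clone and extend steps look roughly like this. Treat it as a sketch: the wrapper functions and the target name OMGIRDUMB-fixed.vmdk are made up for illustration, 100g is just an example size, and by default it only prints the commands so you can read them before committing to anything.

```shell
# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone the repaired chain into a new base disk, then extend the clone.
# Arguments: child descriptor, new target descriptor, new size.
rebuild_disk() {
    child="$1"
    target="$2"
    new_size="$3"
    run vmkfstools -i "$child" "$target" &&
    run vmkfstools -X "$new_size" "$target"
}

# Example use, run from the VM's directory:
#   rebuild_disk OMGIRDUMB-000001.vmdk OMGIRDUMB-fixed.vmdk 100g
```

Once the printed commands look right, set DRY_RUN=0 to run them for real.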

Power your VM on, and verify that you have your data. You may encounter some issues with the filesystem inside of this disk (e.g. the disk will be 100 GB, but the file system will report its former size and cannot be modified), so I strongly recommend that you do a backup/restore to a new vmdk, or robocopy the data, or something similar.

Once you have successfully recovered the data, clean up your vmdk mess so you don’t get confused.

Hopefully this helps you out some day. If you ever need help in a situation like this, or have questions about this process, feel free to contact me.