Comment Dissection of a vmdk - 01/8/09

Okay so I’m WAY late on this post, but it’s my site, so that’s OK.

Today’s post examines the components of a VMDK.  A virtual machine’s VMDK is made up of two files: a binary file (in conversation I refer to this as the flat-VMDK) and a text file (the text-VMDK).  Together these files define the hard disk of a virtual machine, with one pair of files per hard drive.

On ESX, it is a safe assumption that the flat-VMDK is a thickly provisioned file.  That means however large you define the disk to be, that is how big the file will be – from the get go.  By default ESX creates “zeroedthick” disks, in other words: a disk that occupies all allocated space immediately, and this space is zeroed out (but not immediately).  If you are using iSCSI or Fibre Channel storage (VMFS) – this is your best bet, and is all that is “officially supported”.  There are a couple other “thick” options, that zero the space immediately, or don’t zero it at all.

Another option that you may consider using, or encounter, is “thin”.  This is the default for virtual machines created on NFS storage.

Some other options that exist:

  • 2gbsparse – a disk that is more or less thinly provisioned, with multiple flat components, each with a maximum size of 2 GB
  • rdm/rdmp – Raw device mapping.  Allows you to assign a VM raw physical disk space.  If you are running ESX 3.5, I would suggest checking out NPIV instead.

I can’t go into too much detail regarding RDM as I have never worked with it, but it’s there if you want it.

Now time to look at the text-VMDK. 

Here is an example from one of my hosts:

# Disk DescriptorFile
version=1
CID=e39002c2
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 104857600 VMFS "florida_1-flat.vmdk"

# The Disk Data Base
#DDB

ddb.toolsVersion = "7302"
ddb.virtualHWVersion = "4"
ddb.uuid = "60 00 C2 93 54 19 90 f8-f1 8f 7b 25 7b 72 e2 3a"
ddb.geometry.cylinders = "6527"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
ddb.thinProvisioned = "1"

There is a lot of information packed into this tiny little file.  Rather than bore you with all the details, I’ll cover what you should know.

First of all there is the CID.  This line is a unique identifier for the flat-VMDK this text file describes.  For machines that do not contain a snapshot, or are not using sparse VMDKs, it’s just a piece of information (more on that when I talk about deltas tomorrow, I promise).

Next there is the parentCID.  If this is the base VMDK, it will always be ffffffff; if it is a child VMDK, the value of the field will be the CID of its parent, the next VMDK in the chain (more on this tomorrow).

The rest of the file tells ESX how to present the disk to the virtual machine: its geometry, its adapter type, and other fun stuff.
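To make the layout of the text-VMDK concrete, here is a minimal Python sketch that reads a trimmed copy of the descriptor above and pulls out the CID, the parentCID and the extent line.  The function name parse_descriptor is made up for illustration, and the size calculation assumes the extent size is counted in 512-byte sectors.

```python
import re

# Trimmed copy of the descriptor shown above (straight quotes).
DESCRIPTOR = '''# Disk DescriptorFile
version=1
CID=e39002c2
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 104857600 VMFS "florida_1-flat.vmdk"

# The Disk Data Base
#DDB

ddb.geometry.cylinders = "6527"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
'''

SECTOR_BYTES = 512  # assumption: extent sizes are counted in 512-byte sectors

# Extent lines look like: ACCESS SIZE TYPE "filename"
EXTENT_RE = re.compile(r'^([A-Z]+)\s+(\d+)\s+([A-Z0-9]+)\s+"([^"]+)"')


def parse_descriptor(text):
    """Return (settings, extents) parsed from the text of a descriptor file."""
    settings, extents = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = EXTENT_RE.match(line)
        if match:
            access, sectors, kind, filename = match.groups()
            extents.append({"access": access, "sectors": int(sectors),
                            "type": kind, "file": filename})
        elif "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    return settings, extents


settings, extents = parse_descriptor(DESCRIPTOR)
is_base = settings["parentCID"].lower() == "ffffffff"
size_gib = extents[0]["sectors"] * SECTOR_BYTES / 2**30

print(settings["CID"], "base disk" if is_base else "child disk")
print(extents[0]["file"], size_gib, "GiB")
```

Under that sector-size assumption, the 104857600 sectors in the extent line work out to a 50 GiB flat file.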

Tomorrow I’ll do a brief explanation of delta VMDKs, and how ESX assembles a chain of VMDKs.

Comment What is a Virtual Machine? - 12/2/08

 

Have you ever browsed a virtual machine’s directory on a datastore and wondered what all the files are?  What do they do, what do they mean, and which ones could you live without and still recover a functioning virtual machine?

The answer can be long and complicated, but I’m going to attempt to answer it today, and over the next few days will attempt to go into more detail on the meanings of each file, and what (if anything) you might find of value inside each file.

I took a snapshot of this VM prior to taking this screenshot, in order to show the majority of files that are important.

Here is a brief rundown of all the files here:

  • vmware.log
    • This is the current log file for this VM.  If the machine is running, this is where ESX is logging to; if the machine is stopped or suspended, this is the most recent log.
    • The vmware-#.log files are logs from previous instances of this machine.  A new log file is created each time the VM is started, be it power on, resume (from suspend), or VMotion.
  • .vmx and .vmxf
    • These files contain the configuration information for the VM, including: memory size, hard disks, CDROM information, network configuration etc.
    • You can sometimes add directives to these files to take advantage of interesting, and sometimes undocumented, features, similar to what I discussed regarding using an alternate datastore for snapshots.
  • .vmsd
    • This is essentially the snapshot database.  It contains information regarding all current snapshots or, if there are no current snapshots, the most recent one.
  • .vmsn
    • This file contains the memory state of a VM at the time a snapshot was taken.  Do not confuse this with the .vmss file described below.
  • .vmss (not pictured)
    • This file contains the memory state of a VM at the time it was suspended.
  • -flat.vmdk
    • This file is the physical file that contains the actual data stored on the drive of the VM.  There is one for each drive attached.
    • By default this is a thickly provisioned file on SCSI volumes, thinly provisioned on NFS.
  • -delta.vmdk
    • This is a thinly provisioned “flat” vmdk that acts as a change log for the .vmdk it is a child of.  It is allocated in 16MB chunks, and can only grow as large as the parent disk.  There will be one -delta.vmdk per drive, per snapshot.
  • .vmdk
    • This is a plain text descriptor file that describes the characteristics of the “flat” vmdk.  It contains information such as the SCSI device type, geometry, provisioning type, etc.
    • There is one for each hard disk attached to the virtual machine.
  • -######.vmdk
    • This is the plain text descriptor file that describes the characteristics of an associated -delta.vmdk.
    • There is one for each hard disk, per snapshot.
  • .hlog
    • This file is a brief logfile that helps track migrations
  • .nvram
    • This file contains the system’s BIOS NVRAM data.
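The list above can be turned into a rough lookup table.  The Python sketch below is only an illustration: the function name classify, the role labels and the sample file names are all made up, and the patterns assume the six-digit snapshot suffix shown in the list.

```python
import re

# Order matters: the specific disk patterns must be tried before plain ".vmdk".
# Caveat: a VM whose own name ends in "-" plus six digits would be
# misread as a snapshot descriptor by this simple table.
PATTERNS = [
    (r"^vmware(-\d+)?\.log$", "log"),
    (r"-flat\.vmdk$", "disk data (flat)"),
    (r"-delta\.vmdk$", "snapshot data (delta)"),
    (r"-\d{6}\.vmdk$", "snapshot descriptor"),
    (r"\.vmdk$", "disk descriptor"),
    (r"\.vmxf?$", "configuration"),
    (r"\.vmsd$", "snapshot database"),
    (r"\.vmsn$", "snapshot memory state"),
    (r"\.vmss$", "suspend memory state"),
    (r"\.hlog$", "migration log"),
    (r"\.nvram$", "BIOS NVRAM"),
]


def classify(filename):
    """Return a short description of what a file in a VM directory is for."""
    for pattern, role in PATTERNS:
        if re.search(pattern, filename):
            return role
    return "unknown"


# Hypothetical directory listing for a VM called "florida" with one snapshot.
for name in ["vmware.log", "vmware-3.log", "florida.vmx", "florida.vmdk",
             "florida-flat.vmdk", "florida-000001.vmdk",
             "florida-000001-delta.vmdk", "florida.nvram"]:
    print(name, "->", classify(name))
```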

Hopefully, this provides some insight into the makeup of a virtual machine.  Tomorrow or the next day I will dive into the .vmx and .vmxf files.

 

Until then!

    Comment VMware Snapshot alternate Datastore - 11/20/08

    Frustrated by the amount of disk consumed by multiple machines with snapshots on the same LUN?  Got a server with a VMDK that consumes all but a tiny fraction of the datastore it is located on?

    Here’s a potential solution leveraging a little-documented VMX statement.  Please be aware that, while this method works from a functional point of view, I would tread carefully, as its behavior could change unexpectedly in future versions of ESX and/or Virtual Center.

    Because this statement by default places the virtual machine swap file in the same alternate datastore as the snapshot, I recommend only executing this if you are running ESX 3.5, which allows you to control the swap file placement.

    I won’t hold your hand while executing this, but here is an outline of all the steps required.  Be sure to use the datastore’s GUID and not the label.

    Steps

    1. Shut down the virtual machine cleanly.
    2. Log on to the service console of the host where the VM is registered, either as root or with an ID that has root-level permissions.
    3. Edit the vmx – sudo vi /vmfs/volumes/DatastoreofVM/VMname/VMname.vmx
    4. Remove the following lines from the vmx
      • sched.swap.derivedName
      • workingDir (if present)
      • Save the file
        • [Esc] [colon] w
    5. Re-add the workingDir line as follows
      • workingDir = "/vmfs/volumes/SNAPSHOT-DATASTORE-GUID/VMname/"
      • Don’t forget the trailing /
      • Save the file and exit
        • [Esc] [colon] wq
    6. Create the snapshot directory and set the correct permissions
      • sudo mkdir /vmfs/volumes/SNAPSHOT-DATASTORE-LABEL/VMname
      • sudo chown root.root /vmfs/volumes/SNAPSHOT-DATASTORE-LABEL/VMname
      • sudo chmod 775 /vmfs/volumes/SNAPSHOT-DATASTORE-LABEL/VMname
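    Steps 4 and 5 amount to a small text transformation of the vmx file.  Here is a minimal Python sketch of that edit, working on a string rather than on the live file.  The function name retarget_working_dir and the sample vmx contents are hypothetical; try it on a copy of your vmx before trusting it.

```python
def retarget_working_dir(vmx_text, snapshot_dir):
    """Drop sched.swap.derivedName and workingDir, then add the new workingDir."""
    if not snapshot_dir.endswith("/"):
        snapshot_dir += "/"  # don't forget the trailing /
    removed = ("sched.swap.derivedname", "workingdir")
    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key in removed:
            continue  # step 4: remove these lines
        kept.append(line)
    kept.append('workingDir = "%s"' % snapshot_dir)  # step 5: re-add workingDir
    return "\n".join(kept) + "\n"


# Hypothetical vmx contents, for illustration only.
SAMPLE_VMX = '''memsize = "1024"
sched.swap.derivedName = "/vmfs/volumes/OLD-DATASTORE-GUID/VMname/VMname.vswp"
displayName = "VMname"
'''

new_vmx = retarget_working_dir(
    SAMPLE_VMX, "/vmfs/volumes/SNAPSHOT-DATASTORE-GUID/VMname")
print(new_vmx)
```

    The sketch only rewrites text; creating the directory and setting its ownership and permissions (step 6) still has to be done on the service console.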

    Checkout
    In Virtual Center, locate the VM, right-click on it, and choose Edit Settings…  On the Options tab, you should observe that the Working directory parameter is set to [SNAPSHOT-DATASTORE-LABEL]/VMname.

    I have executed these steps several times.  Sometimes I had to unregister the machine before beginning and re-register it after completing the changes; other times I just made the changes and powered the VM on.

    If you choose to do this, in addition to support concerns, don’t forget to think about redundancy of access to the datastore, the performance of the datastore, etc.

    Personally, the best use of this would be to place Snapshots on a high performance NFS mount that can be monitored for space consumption and expanded at will.

    Comment Locked VMDK? - 11/7/08

    Have you ever tried to power on a virtual machine only to get some cryptic error message about a file being locked?  It’s a frustrating message to get, primarily because it’s so cryptic and provides so little useful information.  Sometimes the machine will instead refuse to power on with an error stating that a file cannot be found, even though a listing of the virtual machine’s directory shows all files present.
     
    You can confirm this problem by examining vmware.log and looking for references to a locked file.  An example of this is:
    DISKLIB-LINK  : "/vmfs/volumes/47069165-c4ccb111-0513-001a4bbe40ba/ntadph1187m00/ntadph1187m00_1.vmdk" : failed to open (Device or resource busy).
     

    This indicates that one or more members of the virtual machine’s cartel are still running, but the host that is running them has forgotten about the VM, hence the powered-off state in VC.
     

    When you encounter this situation the first thing to do is to check to see if the server is responsive via mstsc.  If the server responds to an RDP session – we have encountered the easiest solution.  Restart the management agents of all the hosts in the cluster; one at a time until VC reflects the correct state of the VM.
     
    If the machine is unresponsive, that means it is dead.  We now have to go on a search and destroy mission and eradicate the VM’s cartel.
     
    On the service console of a host in the affected cluster, change to the VM’s directory.  Execute the command sudo vmkfstools -D ./name-of-locked-file.
     
    The command will execute, but will not print any output.  To get the output of the command, execute sudo tail /var/log/vmkernel
     
    You should see several lines of the log, similar to the log segment below.  We are looking for the last segment of the owner field (0016357c8e59 in this example); this is the MAC address for vmnic0 of the lock holder, or sometimes vmnic1 (or whatever you have vswif0 configured for).  You need to execute ifconfig vmnic1 or ifconfig vmnic0 on each host in the cluster until you locate a match.
     
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)FS3: 130: <START ntadph1187m00-flat.vmdk>
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)Lock [type 10c00001 offset 16216064 v 77, hb offset 3406336
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: gen 3913, mode 1, owner 470beb69-e4d9d99c-805a-0016357c8e59 mtime 13395847]
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)Addr <4, 105, 8>, gen 24, links 1, type reg, flags 0x0, uid 0, gid 0, mode 100600
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)len 7534018560, nb 7185 tbz 0, zla 3, bs 1048576
    Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)FS3: 132: <END ntadph1187m00-flat.vmdk>
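    Picking the MAC address out of the log by eye gets tedious.  Here is a small Python sketch that pulls it out of the owner field, on the understanding that the last 12 hex digits of the owner value are the lock holder’s MAC address.  The function name lock_owner_mac is made up for illustration; the sample text is the two Lock lines from the log segment above.

```python
import re

# The two "Lock" lines from the vmkernel log segment above.
LOG = """\
Mar 18 10:34:47 vmvsph6120m00 vmkernel: 160:18:34:52.424 cpu1:1038)Lock [type 10c00001 offset 16216064 v 77, hb offset 3406336
Mar 18 10:34:47 vmvsph6120m00 vmkernel: gen 3913, mode 1, owner 470beb69-e4d9d99c-805a-0016357c8e59 mtime 13395847]
"""

# owner <8 hex>-<8 hex>-<4 hex>-<12 hex>; the last group is treated as the MAC.
OWNER_RE = re.compile(
    r"owner\s+[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})", re.I)


def lock_owner_mac(log_text):
    """Return the lock holder's MAC as aa:bb:cc:dd:ee:ff, or None if not found."""
    match = OWNER_RE.search(log_text)
    if match is None:
        return None
    digits = match.group(1).lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


print(lock_owner_mac(LOG))  # prints 00:16:35:7c:8e:59
```

    The colon-separated form is easier to compare against ifconfig output, though you may need to ignore upper versus lower case.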
     
     
    Once you have a match, execute ps -efwww | grep name-of-virtual-machine on that host.  If you get a line of output similar to:
    root     10793     1  0 Mar12 ?        00:00:15 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user -@ pipe=/tmp/vmhsdaemon-0/vmxf863abd25c13a492;vm=f863abd25c13a492 /vmfs/volumes/47069165-c4ccb111-0513-001a4bbe40ba/ntadph1187m00/ntadph1187m00.vmx
     
    You can safely kill -9 PID (10793 in this example), but please be sure you have performed the previous check for VM responsiveness.  kill -9 essentially tears the legs out from under the process; it is not a clean kill.  A normal kill PID may work for this, but I have had much better success with kill -9.  At this point you should be able to power the VM on.
     
     
    If your ps -efwww does not return a vmx line, it means the host attempted to shut the VM down but was unable to stop the entire cartel.  The only option left at this point is to reboot the host.
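    If you would rather not eyeball the ps output, the check can be sketched in Python.  This is only an illustration: the function name find_vmx_pids is made up, the first sample line is the ps output shown above, and the second (the grep matching itself) is a hypothetical line added to show why filtering on vmware-vmx helps.  The sketch only reports PIDs; it does not kill anything.

```python
# First line: the ps output shown above.  Second line: hypothetical grep noise.
PS_OUTPUT = """\
root     10793     1  0 Mar12 ?        00:00:15 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user -@ pipe=/tmp/vmhsdaemon-0/vmxf863abd25c13a492;vm=f863abd25c13a492 /vmfs/volumes/47069165-c4ccb111-0513-001a4bbe40ba/ntadph1187m00/ntadph1187m00.vmx
root     21544 21490  0 10:41 pts/0    00:00:00 grep ntadph1187m00
"""


def find_vmx_pids(ps_text, vm_name):
    """Return PIDs of vmware-vmx processes whose last argument is <vm_name>.vmx."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a full "ps -ef" row
        command = fields[7:]  # columns: UID PID PPID C STIME TTY TIME CMD...
        if not any(arg.endswith("vmware-vmx") for arg in command):
            continue  # skips the grep line and anything that is not a vmx process
        if command[-1].endswith("/%s.vmx" % vm_name):
            pids.append(int(fields[1]))
    return pids


pids = find_vmx_pids(PS_OUTPUT, "ntadph1187m00")
print(pids if pids else "no vmx process found; a host reboot may be the only option")
```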

    Bear