Comment Snapshots and Disk Expansions - followup - 04/7/09
Here is a good follow-up to my post on snapshots and disk expansions.
The post references the following VMware KB articles:
Last night I migrated this website to a new web server. Well, I wish it were just that:
new “physical” server
CentOS4->CentOS5
Apache upgrade
mySQL upgrade
php upgrade
If you notice anything out of place, please drop me a note.
Windows 7 seems to be one of the big things to talk about lately, so I’ll join in.
I’ve been using Windows 7 as my sole desktop OS at home for about 3 months. I have used a couple different builds.
Some of the things I love:
As a day-to-day OS it’s been quite good. Quicken 2008 runs fine, as do Office 2007 and my triathlon software (SRMwin and WKO+). When I started with build 7000, I had quite a few crashes and some issues with one driver, as well as major issues whenever I had any anti-virus software installed.
I upgraded to build 7048 - and it was rock solid - no issues at all. I have since upgraded to 7057, which seems to be almost a half step back from 7048 in stability, but has a little bit of new fluff compared to 7048.
All-in-all Windows 7 is awesome, and I will likely upgrade to it once it is GA.
It just so happened that today I was browsing the VMware blog for the Windows VI Toolkit and came across a post that described, in a fair amount of detail, why get-harddisk was slow for me.
It looks like the culprit is that I typically use get-view to pull my information out of the API, so I had to pass a VM name into get-harddisk rather than a VM object, which results in some extremely slow object-name translation voodoo.
This:
get-harddisk -vm (get-vm -name s092464).name
was over a minute slower than this:
get-harddisk -vm (get-vm -name s092464)
So imagine that times 1200.
That said, I think my script will still stay with the method I wrote about last week, simply because get-view suits my data needs better, plus it’s WAY faster:
get-vm -name s092464 = 2 minutes
get-view -viewtype virtualmachine -filter @{"Name" = "s092464"} = 5 seconds
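To put "imagine that times 1200" into numbers, here is a quick back-of-the-envelope calculation in Python. It uses only the timings quoted above and treats "over a minute slower" as exactly 60 seconds per VM, so the result is a lower bound rather than a measurement.

```python
# Back-of-the-envelope cost of the slow paths, using only the timings quoted above.
VM_COUNT = 1200              # roughly the number of VMs the script walks
EXTRA_SECONDS_PER_VM = 60    # "over a minute slower" per get-harddisk call; a lower bound

GET_VM_SECONDS = 2 * 60      # get-vm -name s092464
GET_VIEW_SECONDS = 5         # get-view -viewtype virtualmachine -filter ...


def total_hours(per_vm_seconds, vm_count=VM_COUNT):
    """Total time in hours if every VM pays the same per-call cost."""
    return per_vm_seconds * vm_count / 3600


extra = total_hours(EXTRA_SECONDS_PER_VM)
speedup = GET_VM_SECONDS / GET_VIEW_SECONDS

print(f"extra time from name translation alone: at least {extra:.0f} hours")
print(f"get-view is about {speedup:.0f}x faster than get-vm for a single lookup")
```

Even with the generous rounding, the name translation alone adds most of a day to a run across the whole VDI population.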
I have been working a lot with the Windows VI Toolkit lately; it is a very straightforward way of working with your Virtual Infrastructure programmatically.
One of the things I have noticed is that a few cmdlets are exceedingly slow. These cmdlets aren’t so bad to use if you are working with a single VM, but if you start scripting against hundreds or thousands of machines, whoa boy - I mean hit enter, go make a cup of coffee, drink it, drop the kids off at the pool, and you’ll still have time to spare.
Thus far the worst cmdlets I’ve come across are get-vm and get-harddisk. get-vmhost and get-cluster are pretty pokey as well, but fortunately, even in a large environment, there aren’t that many hosts and there are even fewer clusters, so their pokeyness doesn’t annoy me much. Yet.
Here’s an example of how slow these cmdlets are. I wrote a script that looks at my VI and grabs every single VM that fits a certain naming pattern, which business rules dictate are VDI instances. Next, I take the VM collection and pull some interesting data: name, memory allocation, guest OS, IP, VLAN (portgroup), and VMDK size. I knew from the get-go that get-vm was slow, so I was using get-view. get-view doesn’t return quite as friendly an object, but if you are willing to explore, it has everything you need.
So my collection of ~1200 VMs comes back, I do a little dance with it, and then I get the portgroup:
$tempvm.vlan = (Get-View -Id $vm.network).name
That one returns fairly quickly.
Then I get the VMDK size:
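# Sum the capacity (in KB) of every disk attached to the VM
get-harddisk -VM $vm.name | foreach-object -process {$tempvm.diskGB += $_.CapacityKB}
# Convert KB to GB (1048576 KB per GB) and round to a whole number
$tempvm.diskGB = [math]::round(($tempvm.diskGB / 1048576),0)
if ($tempvm.diskGB -eq 0)
{
$tempvm.diskGB = "N-A"
}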
Let’s just say you can go take a nice nap, but it does return the total size of the VMDKs attached to the system.
I then scheduled this script to run via AutoSys, and it happily generated a CSV of all my VDI instances, and the info I wanted - in 35 hours!
I was a bit perturbed by the run time, but figured it was a batch job, and I could adjust the start times, and I’d just live with it.
Not any longer! Today I had an epiphany of sorts while exploring the get-view object returned for a VM - it was as if the skies had parted, God spoke to me, and a chorus of angels sang. Or I could have seen a narwhal, I’m not really sure, but it was that cool.
The epiphany caused me to modify two lines of code - the $tempvm.vlan line and the get-harddisk line - replacing them with:
# Take the VLAN from the network device whose backing device name starts with VLAN
foreach ($device in ($vm.config.hardware.device)) {if ($device.backing.devicename -like "VLAN*") {$tempvm.vlan=$device.backing.devicename}}
and
# Add up the capacity (in KB) of every device whose backing file name ends in vmdk
foreach ($device in ($vm.config.hardware.device)) {if ($device.backing.filename -like "*vmdk"){$tempvm.diskGB += $device.capacityinKB}}
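If you don't have a VI environment handy, the logic of the two replacement lines (plus the KB-to-GB rounding and "N-A" fallback from the original code) can be mimicked in Python against plain dictionaries. The dictionary layout below, with `backing`, `deviceName`, `fileName` and `capacityInKB` keys, is a hypothetical stand-in for the objects under `config.hardware.device`, not the VI API itself, and the VM data is made up.

```python
# Mimics the two PowerShell loops over config.hardware.device, using plain
# dictionaries as hypothetical stand-ins for the VI API device objects.

KB_PER_GB = 1048576  # 1024 * 1024, the same divisor the original script uses


def summarize_devices(devices):
    """Return (vlan, disk_gb) the way the revised script computes them."""
    vlan = None
    disk_kb = 0
    for device in devices:
        backing = device.get("backing", {})
        device_name = backing.get("deviceName") or ""
        file_name = backing.get("fileName") or ""
        # PowerShell's -like is case-insensitive, so compare lowercased strings.
        if device_name.lower().startswith("vlan"):   # -like "VLAN*"
            vlan = device_name                        # last match wins, as in the loop
        if file_name.lower().endswith("vmdk"):        # -like "*vmdk"
            disk_kb += device.get("capacityInKB", 0)
    # Python's round() and .NET's [math]::round both round halves to even by default.
    disk_gb = round(disk_kb / KB_PER_GB)
    return vlan, (disk_gb if disk_gb != 0 else "N-A")


# Hypothetical VM with two disks and one NIC on a VLAN-named portgroup.
devices = [
    {"backing": {"fileName": "[datastore1] vdi0001/vdi0001.vmdk"}, "capacityInKB": 20971520},
    {"backing": {"fileName": "[datastore1] vdi0001/vdi0001_1.vmdk"}, "capacityInKB": 10485760},
    {"backing": {"deviceName": "VLAN120"}},
]
print(summarize_devices(devices))  # ('VLAN120', 30)
```

The important design point carries over from the PowerShell: everything comes from the one view object you already have, so there is no extra round trip to the server per VM.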
After the changes, I force started my job and waited eagerly for it to complete - 2 minutes and 15 seconds later. Same result, only 35 hours less of waiting.
And if you don’t believe me, here is the run log from AutoSys:
Original Code:
| [FORCE_STARTJOB] | Fri Mar 13 11:50:24 2009 |
| Starting | Fri Mar 13 11:50:25 2009 |
| Running | Fri Mar 13 11:50:28 2009 |
| Success | Sat Mar 14 23:21:53 2009 |
Revised Code:
| Starting | Fri Mar 20 14:25:13 2009 |
| Running | Fri Mar 20 14:25:16 2009 |
| Success | Fri Mar 20 14:27:25 2009 |
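A quick way to check the before-and-after numbers is to subtract the timestamps in the log. The Python sketch below measures from Starting to Success for each run. It assumes both runs' timestamps are in the same time zone and that Python runs with an English (C) locale, since the weekday and month names are parsed with `%a` and `%b`.

```python
from datetime import datetime

# AutoSys timestamps look like "Fri Mar 13 11:50:25 2009".
# %a and %b expect English day and month names (the default C locale).
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed(start, end):
    """Time between two AutoSys log timestamps."""
    return datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)


# Starting -> Success for each run, copied from the logs above.
original = elapsed("Fri Mar 13 11:50:25 2009", "Sat Mar 14 23:21:53 2009")
revised = elapsed("Fri Mar 20 14:25:13 2009", "Fri Mar 20 14:27:25 2009")

print("original run:", original)   # original run: 1 day, 11:31:28
print("revised run:", revised)     # revised run: 0:02:12
print(f"speedup: about {round(original / revised)}x")
```

Measured this way, the revised run is a few seconds under the 2 minutes 15 seconds quoted above, because the clock starts at Starting rather than at the force start; either way it is roughly a thousandfold improvement.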
Not long ago, I posted a PowerShell script that sets a standard memory and CPU reservation across an entire VirtualCenter instance.
As cool as that is, let’s make it even cooler: let’s run it as a scheduled task, so you don’t have to enter credentials!
The first thing you need to do is create a text file that contains an encrypted password.
read-host -assecurestring | convertfrom-securestring | out-file C:\securestring.txt
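This stores the password you type, encrypted, in a text file. Because convertfrom-securestring uses the Windows Data Protection API when you don’t supply a key, only the same user account on the same computer can decrypt it, so create the file as the account the scheduled task will run under.
Next we make a small modification to the script I posted earlier. The new parts are the $user, $credentialFile, $pass and $credentials lines, the -warningaction switch on connect-viserver, and the disconnect-viserver at the end: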
add-pssnapin VMware.VimAutomation.Core -erroraction Silentlycontinue
$vcServer="virtualcenter"
$user="Cooluser"
$credentialFile="C:\securestring.txt"
# Turn the encrypted file back into a SecureString and build a credential object
$pass = cat $credentialFile | convertto-securestring
$credentials = new-object -typename System.Management.Automation.PSCredential -argumentlist $user,$pass
# -warningaction Silentlycontinue hides the certificate warning
$vcConnection=connect-viserver -server $vcServer -credential $credentials -warningaction Silentlycontinue
# Grab every VM as a view object, then reconfigure each one
$vms=get-view -viewtype VirtualMachine -server $vcConnection
$vms | % {$spec = new-object VMware.Vim.VirtualMachineConfigSpec;
# Memory: normal shares, no limit (-1), 512 MB reservation
$spec.memoryAllocation = New-Object VMware.Vim.ResourceAllocationInfo;
$spec.memoryAllocation.Shares = New-Object VMware.Vim.SharesInfo;
$spec.memoryAllocation.Shares.Level = "normal";
$spec.memoryAllocation.Limit = -1;
$spec.memoryAllocation.Reservation = 512;
# CPU: normal shares, no limit (-1), no reservation
$spec.cpuAllocation = New-Object VMware.Vim.ResourceAllocationInfo;
$spec.cpuAllocation.reservation = 0
$spec.cpuallocation.Shares = New-Object VMware.Vim.SharesInfo;
$spec.cpuAllocation.Shares.Level = "normal";
$spec.cpuAllocation.Limit= -1;
# Start the reconfigure task and return its view
Get-View($_.ReconfigVM_Task($spec))}
# Close the VC session when done
disconnect-viserver -Confirm:$false
The best part of this script is that, even if you run it interactively, it won’t display a certificate warning message, and it disconnects your VC session when it’s done - kind of like “Be kind, rewind.”
A second short, to-the-point post. This one is a disclaimer about the purpose of my blog: I am in favor of the free sharing of information.
I don’t care if that information is about triathlons, computers, home theater, or narwhals - I feel strongly that when you know something better than someone else, you shouldn’t hold it over them or hoard the knowledge, but share it. Karma is a cold-hearted mofo if you ask me.
Maybe that makes me a bit of an idealist, and/or a bit naive, but I think the old proverb sums it up best: “Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.”
Anyways, what was I getting at - yes, my blog. I will continue to do my best to post often, and with quality, but the posts will be a collection of my thoughts about real situations I encounter in my day-to-day computer shenanigans - a sudo cat /var/log/brain > blog, if you will. I can’t promise it will be easy to understand, and I definitely don’t promise that anything I post will be a turnkey solution, but I will promise to answer any questions my posts generate (publicly, of course), admit when I don’t know something, and point you at somebody I think might.
It totally pisses me off that when you use Exchange 2007 with Windows Server 2008, there is no built-in way to back up the Exchange data nicely.
Apparently they added the ability in SBS, and Microsoft has been touting a plugin for Windows Server Backup for 8 months or so, but I have yet to see anything.
Me, I’m a big fan of standards. Standard processes, standard this, standard that - standards are a great way to make your life easier when dealing with hundreds or thousands of objects.
Inside my Virtual Infrastructure, I’m a huge fan of ignoring the micro level of things and paying attention at the macro level. One of the ways I do that is by ensuring that every virtual machine is configured with identical reservations, regardless of how many resources it is allocated. If a given virtual machine (or group of virtual machines) requires a guaranteed amount of resources, I create a resource pool and toss them in there.
Not only does this prevent a “rogue” VM from stealing resources it may not need at a given moment, it calls out the big hitters and puts them out in the open.
Our virtual machine creation process defines all new virtual machines with our standard reservation, but over the course of weeks and months these drift. Machines change, an admin “tests” something and forgets to change it back, or we might even change the standard. Here is a script that will locate all virtual machines in your infrastructure and update them to whatever makes you happy (here: normal shares, no limits, a 512 MB memory reservation, and no CPU reservation).
add-pssnapin VMware.VimAutomation.Core -erroraction Silentlycontinue
$vcServer="virtualcenter"
# Prompt for the VC credentials
$credentials = get-credential
$vcConnection=connect-viserver -server $vcServer -credential $credentials
# Grab every VM as a view object, then apply the standard allocation to each one
$vms=get-view -viewtype VirtualMachine -server $vcConnection
$vms | % {$spec = new-object VMware.Vim.VirtualMachineConfigSpec;
# Memory: normal shares, no limit (-1), 512 MB reservation
$spec.memoryAllocation = New-Object VMware.Vim.ResourceAllocationInfo;
$spec.memoryAllocation.Shares = New-Object VMware.Vim.SharesInfo;
$spec.memoryAllocation.Shares.Level = "normal";
$spec.memoryAllocation.Limit = -1;
$spec.memoryAllocation.Reservation = 512;
# CPU: normal shares, no limit (-1), no reservation
$spec.cpuAllocation = New-Object VMware.Vim.ResourceAllocationInfo;
$spec.cpuAllocation.reservation = 0
$spec.cpuallocation.Shares = New-Object VMware.Vim.SharesInfo;
$spec.cpuAllocation.Shares.Level = "normal";
$spec.cpuAllocation.Limit= -1;
# Start the reconfigure task and return its view
Get-View($_.ReconfigVM_Task($spec))}
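The script above reconfigures every VM, whether it has drifted or not. If you first want to see which machines have wandered from the standard, the comparison is simple; here it is sketched in Python against hypothetical per-VM dictionaries (the field names are made up for illustration and are not VI API properties). The standard mirrors the spec in the script: normal shares, limits of -1 (unlimited), a 512 MB memory reservation, and no CPU reservation.

```python
# The standard from the script above; -1 means "unlimited" for a limit.
# Field names here are hypothetical, chosen for readability.
STANDARD = {
    "memory": {"reservation": 512, "limit": -1, "shares": "normal"},  # reservation in MB
    "cpu": {"reservation": 0, "limit": -1, "shares": "normal"},       # reservation in MHz
}


def drift(vm):
    """List the settings on one VM that differ from STANDARD."""
    problems = []
    for resource, wanted in STANDARD.items():
        actual = vm.get(resource, {})
        for key, value in wanted.items():
            if actual.get(key) != value:
                problems.append(f"{resource}.{key}={actual.get(key)!r} (standard {value!r})")
    return problems


def drifted_vms(vms):
    """Map VM name -> list of differences, only for VMs that are off-standard."""
    result = {}
    for vm in vms:
        differences = drift(vm)
        if differences:
            result[vm["name"]] = differences
    return result


# Hypothetical inventory: one VM on-standard, one that an admin "tested" on.
vms = [
    {"name": "vdi0001",
     "memory": {"reservation": 512, "limit": -1, "shares": "normal"},
     "cpu": {"reservation": 0, "limit": -1, "shares": "normal"}},
    {"name": "vdi0002",
     "memory": {"reservation": 1024, "limit": -1, "shares": "normal"},
     "cpu": {"reservation": 0, "limit": -1, "shares": "high"}},
]
print(drifted_vms(vms))
```

Reporting before reconfiguring makes the drift visible, which fits the goal of calling out the big hitters rather than silently flattening them.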
A former co-worker of mine recently blogged about trying to do a disk expansion on a server with a snapshot. As a general rule - this is something you want to avoid doing. Fortunately, the VI Client is smart enough to know this is a bad thing and will prevent you from doing it.
Unfortunately, the service console is not that smart, and follows the general Unix rule of: well, shucks, if you have root-level permissions, you must know what you are doing.
If you try to expand the VMDK of a virtual machine that has a snapshot in the VI Client, you’ll get an error message. There are two different error messages you’ll see (as of VC build 119598 - VC 2.5 U3).
If the machine is powered off you’ll get:

If the machine is powered on you’ll get:

Hopefully at this point you will do the intelligent thing, which is what you should have done initially: Commit (Delete All) or Revert (Rollback) any snapshots this machine has. If you happen to be less intelligent, log on to the Service Console, and execute a
vmkfstools -X #G /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB.vmdk, you’ll be pleased to know that your vmdk is now expanded.
What you won’t be so pleased to know is that your virtual machine won’t power on anymore. Your snapshot for that drive is invalid. The specific error message you’ll get will be along the lines of “Cannot open the disk /vmfs/volumes/datastore1/OMGIRDUMB/OMGIRDUMB-000001.vmdk or one of the snapshot disk it depends on.” ROFLMAO, you are officially screwed now!
Haha, okay I am done laughing at your expense and am ready to help you keep your job.
You have two ways to resolve this issue. The first is to manually repoint your disks, using the VI Client, at the base disks rather than the deltas, and then use vmware-cmd X removesnapshots. This reverts the VM to its state when the snapshot was taken, with the added bonus of an enlarged disk.
The better, albeit more complicated, option is to undo your mistake and save the server. First things first: using the VI Client, Delete All (Commit) snapshots. This task will progress, but it will generate an error (likely the “Internal error” shown above).
At this point you need to verify that VC believes this machine has no snapshots, and that all VMDKs except the one you attempted to expand point to the base disk. This ensures that all the other drives in the system are up to the current state and that the VM’s snapshot state reflects this.
Next, log back on to the Service Console and cd to the home directory of the virtual machine. Again, verify that the -delta files for all VMDKs have been committed, except for your naughty vmdk.
A little background: when you have a chain of vmdks, you can use vmkfstools -i child.vmdk target.vmdk to consolidate the chain into one disk. We will be using this command to fix our problem. If you attempt it straight away, though, you will get an error message ranging from “cannot read beyond end of disk” to “parent VMDK has been modified”.
To fix this, we need to make sure that the parent VMDK and all the children involved are correctly tied together. If you recall, last month I wrote an entry describing the contents of a VMDK. The entries we care about today are the parentCID value and the Extent Description.
Start off by doing a cat /vmfs/volumes/OMGIRDUMB/OMGIRDUMB-000001.vmdk and make note of the parentCID value and the number that comes after RW in the Extent Description section (the disk size in 512-byte sectors). Now do vi /vmfs/volumes/OMGIRDUMB/OMGIRDUMB.vmdk and make sure its CID matches the parentCID of the child. Also, update the number after RW in this extent description to match the value in the -000001.vmdk.
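The descriptor edits in this step can be sketched in Python. This is an illustration under stated assumptions, not a tested recovery tool: it works on descriptor text only (no files are touched), assumes one RW extent per descriptor with `CID=` and `parentCID=` at the start of their lines, and the CIDs and sizes in the two sample descriptors are hypothetical. It reports whether the parent's CID matches the child's parentCID and whether the extent sizes agree, then produces corrected parent text.

```python
import re

SECTOR_BYTES = 512  # the number after RW is the extent size in 512-byte sectors


def parse_descriptor(text):
    """Pull CID, parentCID and the first RW extent size out of a descriptor."""
    cid = re.search(r"^CID=(\w+)", text, re.M).group(1)
    parent_cid = re.search(r"^parentCID=(\w+)", text, re.M).group(1)
    sectors = int(re.search(r"^RW (\d+) ", text, re.M).group(1))
    return {"cid": cid, "parent_cid": parent_cid, "sectors": sectors}


def sectors_to_gb(sectors):
    return sectors * SECTOR_BYTES / 1024 ** 3


def is_linked(parent_text, child_text):
    """True when the parent's CID and extent size match what the child expects."""
    parent, child = parse_descriptor(parent_text), parse_descriptor(child_text)
    return parent["cid"] == child["parent_cid"] and parent["sectors"] == child["sectors"]


def fix_parent(parent_text, child_text):
    """Return parent descriptor text with CID and extent size taken from the child."""
    child = parse_descriptor(child_text)
    fixed = re.sub(r"^CID=\w+", "CID=" + child["parent_cid"], parent_text, count=1, flags=re.M)
    return re.sub(r"^RW \d+ ", f"RW {child['sectors']} ", fixed, count=1, flags=re.M)


# Hypothetical descriptors: the base disk was grown to 10 GB with the snapshot
# in place, while the delta still expects its original 4 GB parent.
PARENT = """# Disk DescriptorFile
version=1
CID=3c2a1b00
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 20971520 VMFS "OMGIRDUMB-flat.vmdk"
"""

CHILD = """# Disk DescriptorFile
version=1
CID=9f8e7d6c
parentCID=5b4a3928
createType="vmfsSparse"
parentFileNameHint="OMGIRDUMB.vmdk"

# Extent description
RW 8388608 VMFSSPARSE "OMGIRDUMB-000001-delta.vmdk"
"""

print(is_linked(PARENT, CHILD))  # False
print(sectors_to_gb(parse_descriptor(PARENT)["sectors"]), "GB base vs",
      sectors_to_gb(parse_descriptor(CHILD)["sectors"]), "GB expected by the delta")
print(is_linked(fix_parent(PARENT, CHILD), CHILD))  # True
```

Note that the parent keeps its own parentCID of ffffffff (it is the base of the chain); only its CID and extent size change. Back up the real descriptor before editing it by hand.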
At this point you should be able to successfully execute vmkfstools -i and generate a new base disk. You’ll then want to run vmkfstools -X #G against the new vmdk to grow it to the size you wanted, which updates the extent description field. Finally, add this new disk to your VM and remove the disk that points to the old delta chain.
Power your VM on and verify that you have your data. You may encounter some issues with the filesystem inside this disk (e.g. the disk will be 100 GB, but the filesystem will report its former size and can’t be modified), so I strongly recommend that you do a backup/restore to a new vmdk, or use robocopy or something similar.
Once you have successfully recovered the data, clean up your vmdk mess so you don’t get confused.
Hopefully this helps you out some day. If you ever need help in a situation like this, or have questions about this process, feel free to contact me.