profile for Gajendra D Ambi on Stack Exchange, a network of free, community-driven Q&A sites

Tuesday, December 30, 2014

AIO Powercli Script for Esxi hosts of a vCenter

Watch for updates at:
https://docs.google.com/document/d/19wPb-pUe9dwTzzvwk1O_4ygDNqn0PXN0o0yzf5kahAU/edit?usp=sharing

#AIO Script[tested ok]###
#Start of Script
############################################
#Configure DNS
#Configure NTP
#Configure Domain
#Configure Dump Collector
#NFS settings
#TCP settings
#VAAI settings
#compiled by: MrAmbiG
############################################
#prompt for vCenter details
$VC = Read-Host "vCenter IP?"
$VCuser = Read-Host "vCenter administrator username?"
$VCpass = Read-Host "vCenter Password?"
#connect to vCenter
Connect-VIServer $VC -User $VCuser -Password $VCpass -SaveCredentials
#Write-Verbose has no -ForegroundColor parameter, so use Write-Host for coloured output
Write-Host "Connected to vCenter $VC" -ForegroundColor Cyan
#prompt for DNS, Domain, Dump Collector, NTP information
$dns1 = read-host "Primary DNS address?"
$dns2 = read-host "Secondary DNS address?"
$ntp1 = read-host "Primary NTP address?"
$ntp2 = read-host "Secondary NTP address?"
$domain = read-host "The Domain name (hint: vce.local)?"
$dump = read-host "Dump Collector's address (hint: VUM IP)?"
$NFSsettings = read-host "If you want to include NFS settings then type 1 or else type 0"
$TCPsettings = read-host "If you want to include TCP settings then type 1 or else type 0"
$VAAIsettings = read-host "If you want to include the VAAI settings then type 1 or else type 0"
#listing hosts
$esxHosts = get-VMHost
#looping the script for each host
foreach ($esx in $esxHosts)
{
Write-Host "Configuring DNS and Domain Name on $esx" -ForegroundColor Magenta
Get-VMHostNetwork -VMHost $esx | Set-VMHostNetwork -DomainName $domain -SearchDomain $domain -DNSAddress $dns1 , $dns2 -Confirm:$false
Write-Host "Configuring NTP Servers on $esx" -ForegroundColor Green
Add-VMHostNTPServer -NtpServer $ntp1 , $ntp2 -VMHost $esx -Confirm:$false
Write-Host "Configuring NTP Client Policy on $esx" -ForegroundColor Green
Get-VMHostService -VMHost $esx | where{$_.Key -eq "ntpd"} | Set-VMHostService -policy "on" -Confirm:$false
Write-Host "Restarting NTP Client on $esx" -ForegroundColor Blue
Get-VMHostService -VMHost $esx | where{$_.Key -eq "ntpd"} | Restart-VMHostService -Confirm:$false
Write-Host "Configuring Dump Collector on $esx" -BackgroundColor Blue -ForegroundColor Gray
$esxcli = Get-EsxCli -vmhost $esx
$esxcli.system.coredump.network.set($null,"vmk0",$dump,6500)
$esxcli.system.coredump.network.set(1)
$esxcli.system.coredump.network.get()
}
###NFS Settings
if ($NFSsettings -eq '1')
{
Write-host "Including NFS Settings" -BackgroundColor Blue -ForegroundColor Green
foreach ($esx in $esxHosts)
{
#Set-AdvancedSetting takes the setting object piped from Get-AdvancedSetting (it has no -VMHost/-Name parameters)
Get-AdvancedSetting -Entity $esx -Name NFS.HeartbeatMaxFailures | Set-AdvancedSetting -Value 10 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name NFS.HeartbeatFrequency | Set-AdvancedSetting -Value 12 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name NFS.HeartbeatTimeout | Set-AdvancedSetting -Value 5 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name NFS.MaxVolumes | Set-AdvancedSetting -Value 256 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name NFS.HeartbeatDelta | Set-AdvancedSetting -Value 5 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name Disk.UseDeviceReset | Set-AdvancedSetting -Value 0 -Confirm:$false
Write-Host -ForegroundColor Green "NFS Settings Configured on $esx"
}
}
else
{
Write-host "Excluding NFS Settings" -ForegroundColor RED
}
###TCP Settings
if ($TCPsettings -eq '1')
{
Write-host "Including TCP Settings" -BackgroundColor Green -ForegroundColor Gray
foreach ($esx in $esxHosts)
{
Write-Host "Configuring TCP Settings on $esx" -ForegroundColor Magenta
Get-AdvancedSetting -Entity $esx -Name Net.TcpipHeapSize | Set-AdvancedSetting -Value 32 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name Net.TcpipHeapMax | Set-AdvancedSetting -Value 512 -Confirm:$false
}
}
else
{
Write-host "Excluding TCP Settings" -ForegroundColor RED
}
###VAAI Settings
if ($VAAIsettings -eq '1')
{
Write-host "Executing VAAI disable" -ForegroundColor Green
foreach ($esx in $esxHosts)
{
Write-Host -ForegroundColor Yellow "Configuring VAAI Settings on $esx"
Get-AdvancedSetting -Entity $esx -Name Disk.UseDeviceReset | Set-AdvancedSetting -Value 0 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name DataMover.HardwareAcceleratedMove | Set-AdvancedSetting -Value 0 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name DataMover.HardwareAcceleratedInit | Set-AdvancedSetting -Value 0 -Confirm:$false
Get-AdvancedSetting -Entity $esx -Name VMFS3.HardwareAcceleratedLocking | Set-AdvancedSetting -Value 0 -Confirm:$false
Write-Host -ForegroundColor Green "VAAI Setting Configured  on $esx"
}
}
else
{
Write-host -ForegroundColor RED "Excluding VAAI disable section"
}
#End Of Script

Set DNS, NTP and domain via PowerCLI on all hosts of a vCenter

#Start of Script
############################################
#Configure DNS
#Configure NTP
#Configure Domain
#compiled by : MrAmbiG
############################################
#prompt for vCenter details
$VC = Read-Host "vCenter IP?"
$VCuser = Read-Host "vCenter administrator username?"
$VCpass = Read-Host "vCenter Password?"

#connect to vCenter
Connect-VIServer $VC -User $VCuser -Password $VCpass -SaveCredentials
#Write-Verbose has no -ForegroundColor parameter, so use Write-Host for coloured output
Write-Host "Connected to vCenter $VC" -ForegroundColor Cyan

#prompt for DNS, NTP, Domain information
$dns1 = read-host "Enter Primary DNS IP"
$dns2 = read-host "Enter Secondary DNS IP"
$ntp1 = read-host "Enter Primary NTP IP"
$ntp2 = read-host "Enter Secondary NTP IP"
$domain = read-host "Enter the Domain name ex:- vce.local"

#listing hosts
$esxHosts = get-VMHost

#looping the script for each host
foreach ($esx in $esxHosts)
{
Write-Host "Configuring DNS and Domain Name on $esx" -ForegroundColor Magenta
Get-VMHostNetwork -VMHost $esx | Set-VMHostNetwork -DomainName $domain -DNSAddress $dns1 , $dns2 -Confirm:$false

Write-Host "Configuring NTP Servers on $esx" -ForegroundColor Green
Add-VMHostNTPServer -NtpServer $ntp1 , $ntp2 -VMHost $esx -Confirm:$false

Write-Host "Configuring NTP Client Policy on $esx" -ForegroundColor Green
Get-VMHostService -VMHost $esx | where{$_.Key -eq "ntpd"} | Set-VMHostService -policy "on" -Confirm:$false

Write-Host "Restarting NTP Client on $esx" -ForegroundColor Blue
Get-VMHostService -VMHost $esx | where{$_.Key -eq "ntpd"} | Restart-VMHostService -Confirm:$false
}

Write-Host "b33 is all done here, have a good one; End of Script" -foregroundcolor Blue
#End of Script

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Wednesday, December 24, 2014

Syslog Server and Powercli

#Sets a syslog server on the ESXi host named 'Host'
Set-VMHostSysLogServer -SysLogServer '<syslog ip>:133' -VMHost Host
#End
----------------------------------------------------------------------------------------------------------------------------
#######script for looping#######
#start of script
#replace IP with the vCenter IP
connect-viserver IP
Write-Host "b33 is connected to your server" -foregroundcolor Cyan -backgroundcolor DarkMagenta
#replace <syslog server ip> with the correct syslog server IP (keep the quotes, < and > are operators in PowerShell)
#the default port is 514, but you may replace it with a custom port if desired.
Get-VMHost | Set-VMHostSysLogServer -SysLogServer '<syslog server ip>' -SysLogServerPort 514
Write-Host "b33 updated the syslog on all the hosts in return for a smile on your face" -foregroundcolor DarkYellow -backgroundcolor DarkBlue
#End of Script
----------------------------------------------------------------------------------------------------------------------------
https://docs.google.com/document/d/1jTrin_2iHqE7QTwbnSLHiejAANMrvl4HaJ_sqYk7ciY/edit?usp=sharing
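If you are on the ESXi shell instead of PowerCLI, the same change can be made per host with esxcli. This is a minimal sketch assuming ESXi 5.x esxcli namespaces and UDP transport; `set_syslog` and the `RUN` dry-run hook (set `RUN=echo` to print the commands instead of running them) are hypothetical names for illustration, not part of any VMware tool.

```shell
#!/bin/sh
# Point an ESXi 5.x host at a remote syslog server with esxcli.
# set_syslog is a hypothetical helper; RUN=echo turns it into a dry run.
set_syslog() {
  server="$1"
  port="${2:-514}"   # 514 is the default syslog port
  if [ -z "$server" ]; then
    echo "usage: set_syslog <server> [port]" >&2
    return 1
  fi
  # remote log host (UDP transport assumed)
  $RUN esxcli system syslog config set --loghost="udp://$server:$port" || return 1
  # open the ESXi firewall for outgoing syslog traffic
  $RUN esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true || return 1
  $RUN esxcli network firewall refresh || return 1
  # make syslogd pick up the new configuration
  $RUN esxcli system syslog reload
}
```

For example `set_syslog 192.0.2.10` on the host, or `RUN=echo set_syslog 192.0.2.10 1514` to see what would run.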

Unattended vSphere Client Installation

If you ever need to install the vSphere Client unattended, here is a simple solution:
start /wait <path to the file inside the vpx folder>\VMware-viclient.exe /q /s /w /L1033 /v" /qr /L*v \"%TEMP%\vmvcc.log\""

Sunday, November 30, 2014

The frigging flash plugin on fedora linux with firefox





Issue:  

  •  Install Flash Player on Linux (Adobe no longer officially supports new versions on Linux)
  • make firefox work with it
  • make fullscreen work on your browser 
Solution:
install flash
 
## become root first (the rpm and yum commands need it) ##
su

## Adobe Repository 32-bit x86 (run this block on a 32-bit OS) ##
rpm -ivh http://linuxdownload.adobe.com/adobe-release/adobe-release-i386-1.0-1.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-adobe-linux

## Adobe Repository 64-bit x86_64 (run this block on a 64-bit OS) ##
rpm -ivh http://linuxdownload.adobe.com/adobe-release/adobe-release-x86_64-1.0-1.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-adobe-linux

## install the plugin, then leave the root shell ##
yum -y install flash-plugin
exit


make web browsers detect it
copy the libflashplayer.so to 
/usr/lib64/flash-plugin   (for 64bit OS)
/usr/lib/flash-plugin     (for 32bit OS)

[Don't reboot yet]
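Since the 32-bit and 64-bit steps differ only in the repository RPM and the plugin directory, here is a minimal sketch that picks both from `uname -m`. The helper names `adobe_repo_rpm` and `flash_plugin_dir` are made up for illustration; the URLs and paths are the ones listed in this post.

```shell
#!/bin/sh
# Map the CPU architecture (default: uname -m) to the Adobe repo RPM and the
# plugin directory used in this post. Both helpers are hypothetical.
adobe_repo_rpm() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64) echo "http://linuxdownload.adobe.com/adobe-release/adobe-release-x86_64-1.0-1.noarch.rpm" ;;
    i?86)   echo "http://linuxdownload.adobe.com/adobe-release/adobe-release-i386-1.0-1.noarch.rpm" ;;
    *)      echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

flash_plugin_dir() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64) echo /usr/lib64/flash-plugin ;;  # 64-bit OS
    i?86)   echo /usr/lib/flash-plugin ;;    # 32-bit OS
    *)      echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}
```

As root, `rpm -ivh "$(adobe_repo_rpm)"` installs the matching repository and `cp libflashplayer.so "$(flash_plugin_dir)"/` does the copy step.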


make the fullscreen work

yum install devilspie
$ mkdir ~/.devilspie
 
create the file "~/.devilspie/flash-fullscreen-firefox.ds" and paste the content below
 
(if
(is (application_name) "plugin-container")
(begin
(focus)
)
) 
 
Use gnome-session-properties to add devilspie to the startup programs.
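If you prefer not to paste the rule by hand, a minimal sketch that writes the same devilspie file from the shell. `write_flash_rule` and the `DEVILSPIE_DIR` override are hypothetical names; the rule text is copied verbatim from above.

```shell
#!/bin/sh
# Write the flash fullscreen rule from this post into ~/.devilspie.
# write_flash_rule is hypothetical; DEVILSPIE_DIR overrides the target directory.
write_flash_rule() {
  dir="${DEVILSPIE_DIR:-$HOME/.devilspie}"
  mkdir -p "$dir" || return 1
  # quoted heredoc delimiter: the rule is written as-is, with no shell expansion
  cat > "$dir/flash-fullscreen-firefox.ds" <<'EOF'
(if
(is (application_name) "plugin-container")
(begin
(focus)
)
)
EOF
}
```

Run `write_flash_rule` once as your normal user (not root), then add devilspie to the startup programs as described.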
 
 
 
 

Monday, November 24, 2014

2/3 Node Cluster and HA Admission Control

Everybody knows how much we love HA from VMware, which is one of VMware's key selling points and an awesome failover feature for reducing unplanned downtime. There is, however, something many VMware users overlook or give less importance to: admission control settings. Let us first see what the heck this is (I know you know it, but there are some who don't, so skip a paragraph if you want).
Admission control, as the name suggests, controls the admission of VMs onto a host when one of its co-hosts goes down unexpectedly and all of its VMs are being restarted on other hosts. To put it simply: buses 'A' (HOST A) and 'B' (HOST B) are going from X to Y; each can accommodate 52 passengers (VMs) but each is carrying only 30. Let us say bus B breaks down (HOST B fails) and its passengers (VMs) want to travel on bus A. If the company policy (admission control) does not allow such a move (admission control enabled, which is the default), these 30 passengers will be stranded until bus B is repaired (HOST B is repaired and brought back up). If the company policy okays such an adjustment, not all passengers will travel comfortably, but they will still get from X to Y (the VMs will not get all the resources promised, but they will be powered on).
Now, if you have a 2-node cluster and admission control is enabled (the default), think again. The default HA setting is to tolerate 1 host failure (in other words, always reserve enough resources to accommodate all the VMs of 1 host). But if 1 of the 2 hosts fails, only 1 host is left running and there is no other host on which HA can reserve resources, and admission control will stop the failed host's VMs from being powered on on the surviving host if that violates the HA rules, which it most probably will. So please test your 2-node cluster for failover with the default HA settings (it will most probably fail). You have 2 solutions:
1) Disable admission control
2) Add a 3rd host
I believe the 1st option is much more sensible than the 2nd one. Please let me know if you agree or differ, but before you do, please check out http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007006. I know it talks about DPM too, but the article mostly holds even when you aren't using DPM or even DRS. I myself have always advised my clients against enabling admission control on a 2-node cluster or a cluster with a resource crunch. You may differ,
but if you haven't tested your cluster then yes, you will suffer. [I like rhymes ;) ]
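To put numbers on "reserve enough resources for 1 host": with N identical hosts tolerating F host failures, roughly F/N of the cluster's capacity is held back, which is 50% for 2 hosts and 33% for 3 hosts. A minimal sketch of that arithmetic, assuming identical hosts; `ha_reserve_pct` is a hypothetical helper, and real HA "host failures tolerated" uses slot sizes, so treat the result as a rough guide only.

```shell
#!/bin/sh
# Rough percentage of cluster capacity reserved to tolerate F host failures
# in a cluster of N identical hosts (F/N, integer percent rounded down).
ha_reserve_pct() {
  hosts="$1"
  failures="${2:-1}"   # HA default: tolerate 1 host failure
  if [ "$failures" -ge "$hosts" ]; then
    echo "cannot tolerate $failures failure(s) with $hosts host(s)" >&2
    return 1
  fi
  echo $(( failures * 100 / hosts ))
}
```

`ha_reserve_pct 2` prints 50 and `ha_reserve_pct 3` prints 33, which is why adding a 3rd host frees up capacity.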

Not enough resources to failover this virtual machine. vSphere HA will retry when resources become available.
warning
1/12/2011 7:11:08 AM
EXCHANGE

Friday, August 22, 2014

manage your ntfs with your linux OS

Issue: the NTFS partitions weren't automounted on boot, and I needed a nice GUI tool to fix that.
Resolution:
Get the pysdm application.
In my case it was:

yum list pysdm
Loaded plugins: langpacks, protectbase
0 packages excluded due to repository protections
Available Packages
pysdm.noarch                         0.4.1-7.fc20                         fedora

yum install -y pysdm
Installed:
  pysdm.noarch 0:0.4.1-7.fc20

Now launch pysdm, select the partition, click Assistant and check or uncheck the options you want for the partition or drive.
[this is a note for me to look back when i need it]
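pysdm manages the mount entries in /etc/fstab. If you would rather skip the GUI, a minimal sketch that prints an equivalent ntfs-3g fstab line, assuming ntfs-3g is installed; `ntfs_fstab_line` is a hypothetical helper and the UUID and mount point in the example are placeholders.

```shell
#!/bin/sh
# Print an /etc/fstab line that mounts an NTFS partition at boot via ntfs-3g.
# ntfs_fstab_line is hypothetical; get the UUID with: blkid -s UUID -o value /dev/sdXN
ntfs_fstab_line() {
  uuid="$1"; mountpoint="$2"; opts="${3:-defaults}"
  if [ -z "$uuid" ] || [ -z "$mountpoint" ]; then
    echo "usage: ntfs_fstab_line <uuid> <mountpoint> [options]" >&2
    return 1
  fi
  # fields: device, mount point, fs type, options, dump, fsck pass (0 = no fsck)
  printf 'UUID=%s %s ntfs-3g %s 0 0\n' "$uuid" "$mountpoint" "$opts"
}
```

For example: `ntfs_fstab_line "$(blkid -s UUID -o value /dev/sda2)" /mnt/data | sudo tee -a /etc/fstab`, after creating /mnt/data.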

Sunday, August 17, 2014

Install driver/firmware on Esxi

Method 1: vCli (command line)
  1. Find your driver on VMware HCL (Hardware Compatibility List) http://www.vmware.com/resources/compatibility/search.php
    example on how to find a driver http://ambitech.blogspot.com/2012/12/how-to-find-driver-for-device-for.html?q=find+driver
  2. Upload the offline-bundle.zip file (which contains the driver, the firmware or both) to a shared datastore using the datastore browser, or to the /tmp directory of the ESXi host using WinSCP.
  3. Enter Maintenance Mode for the Esxi host in question.

  4. Run this command to install drivers using the offline bundle (this requires an absolute path):

    esxcli software vib install -d /path/offline-bundle.zip

    For example:

    esxcli software vib install -d /var/log/vmware/offline-bundle.zip
    Note:
    If this command fails, unzip the file and try running the same command again. Use localcli instead of esxcli if hostd is not responding. For example:

    localcli software vib install -d /var/log/vmware/offline-bundle.zip
    5. reboot.
    6. Exit maintenance mode.
NOTES
If you don't give the full path of the file, you will face the following error:
"MetadataDownloadError" reading:
Could not download from depot at zip:/var/log/vmware/*update name*.zip?index.xml, skipping (('zip:/var/log/vmware/*update name*.zip?index.xml', '', "Error extracting index.xml from :/var/log/vmware/*update name*.zip: [Errno 2] No such file or directory: '/var/log/vmware/*update name*.zip?index.xml'"))
url = zip:/var/log/vmware/*update name*.zip?index.xml
Please refer to the log file for more details.
solution:

http://ambitech.blogspot.com/2012/06/esxi-5-patch-installation-fails-with.html?q=full+path+driver
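A minimal sketch of step 4 that refuses relative paths up front, since a relative path is what triggers the MetadataDownloadError above. `install_offline_bundle` is a hypothetical helper; `ESXCLI=localcli` switches to localcli when hostd is not responding, and `RUN=echo` prints the command instead of running it. Maintenance mode and the reboot are still done as in the steps.

```shell
#!/bin/sh
# Install an offline bundle, insisting on an absolute path.
# install_offline_bundle is hypothetical; ESXCLI and RUN are optional overrides.
install_offline_bundle() {
  bundle="$1"
  case "$bundle" in
    /*) ;;  # absolute path: OK
    *) echo "error: '$bundle' is not an absolute path" >&2; return 1 ;;
  esac
  if [ ! -f "$bundle" ]; then
    echo "error: $bundle not found" >&2
    return 1
  fi
  $RUN "${ESXCLI:-esxcli}" software vib install -d "$bundle"
}
```

For example: `install_offline_bundle /vmfs/volumes/datastore1/offline-bundle.zip`.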
Method 2: Update Manager
  1. Find your driver on VMware HCL (Hardware Compatibility List) http://www.vmware.com/resources/compatibility/search.php
    example on how to find a driver http://ambitech.blogspot.com/2012/12/how-to-find-driver-for-device-for.html?q=find+driver
  2.  From vCenter Server, go to Home > Update Manager.
  3. Click the Patch Repository tab.
  4. Click the Import Patches link at the top right of the screen.
  5. Click Finish. The async driver is added to the patch repository.
  6. Create a Host Extension baseline and remediate the ESXi host. For more information, see the Update Manager Administration Guide.
  7. Reboot the ESXi host once the remediation is complete.

 

Friday, July 18, 2014

false GPU overheating warning on ESXi 5.5

Issue :
There are 2 clusters with 10 hosts each, a total of 20 ESXi 5.5 BL460c Gen8 blades in a c7000 enclosure.
All of them are showing a temperature warning for “add in card 10 35 gpu 2” under hardware status.
There are no graphics cards.

What worked :
Clear the IPMI logs from the command line:
localcli hardware ipmi sel clear
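With 20 hosts, running that by hand gets old. A minimal sketch that repeats it over SSH, assuming SSH is enabled on each ESXi host and root login is allowed; `clear_sel_all` and the host names in the example are hypothetical, and `SSH=echo` turns it into a dry run.

```shell
#!/bin/sh
# Clear the IPMI SEL on every host given as an argument, over SSH.
# clear_sel_all is hypothetical; returns non-zero if any host failed.
clear_sel_all() {
  rc=0
  for host in "$@"; do
    echo "clearing IPMI SEL on $host"
    if ! ${SSH:-ssh} "root@$host" localcli hardware ipmi sel clear; then
      echo "failed on $host" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `clear_sel_all esx01 esx02 esx03`.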

What didn't work :
Since it is happening on 20 hosts at the same time, and the ESXi OS or the hardware cannot go bad on all the servers at once, it must be a false positive.
/etc/init.d/hp-ams.sh stop
Disconnected and reconnected the host and refreshed the hardware status page, but no go.
Connected directly to one of the hosts via the vSphere Client, but no go.
https://communities.vmware.com/message/2379288
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2076665
Suggested installing VMware ESXi 5.5 Update 1 and then the Heartbleed bug fix as per the KB article, and monitoring the hosts for the error.
The error is not instantaneous.

Update 24/7/2014:
Apparently the fix was temporary, and we are still chasing the cause and solution.

Thursday, July 17, 2014

Mixed SIOC environment causes DRS migrations to fail

Issue : The migration of a VM failed repeatedly, causing the VM to freeze and power off.

Why ?: Apparently a few of the hosts in the cluster didn't have the Enterprise license (or whichever license is required) to enable SIOC, and DRS kept trying to migrate the VM to a host that didn't have SIOC enabled because it lacked that license.
So if the target host doesn't support SIOC while the source host the VM is moving from does, you will face this problem. I have yet to see more such incidents before concluding that DRS is not SIOC-aware. Maybe some giants can shed some light on it. Obviously they weren't using SIOC on all of the datastores, but you can see the conflict when some hosts rely on SIOC to make decisions on the same datastores on which SIOC isn't enabled/used for other hosts of the same cluster.

Resolution: upgrade the remaining few hosts with the enterprise license and enable SIOC on them too.

ESXi 5.x datastore browser shows no data

Issue: The hosts are slowly being upgraded from 4.1 to 5.1 (clean install), but the upgraded hosts don't show any data in the datastore browser.
If we browse from the datastore browser launched from a vSphere Client connected directly to the host, it shows the data for all the hosts.
If we browse from the datastore browser launched from the vCenter Server, it doesn't show the data for a few of the 5.1 hosts.
The storage is being migrated from CLARiiON to IBM Storwize V7000 FC.

Resolution : Disable ATS on the storage.
Disable VMFS3.HardwareAcceleratedLocking in the advanced settings.
kb.vmware.com/kb/2006858
Disable VAAI too:
http://kb.vmware.com/kb/1033665
Some of the volumes were mounted in ATS-only mode (designed for older storage techniques).
Changing this requires taking the datastore out of production.
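From the ESXi shell, the three VAAI primitives can be switched off with esxcli advanced settings. A minimal sketch, assuming ESXi 5.x and the option names used in KB 1033665; `disable_vaai` is a hypothetical helper, and `RUN=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Set the three VAAI advanced options to 0 (disabled) on this host.
# disable_vaai is hypothetical; stops at the first failing command.
disable_vaai() {
  for opt in /DataMover/HardwareAcceleratedMove \
             /DataMover/HardwareAcceleratedInit \
             /VMFS3/HardwareAcceleratedLocking; do
    $RUN esxcli system settings advanced set --int-value 0 --option "$opt" || return 1
  done
}
```

Afterwards, `esxcli system settings advanced list --option /VMFS3/HardwareAcceleratedLocking` should report an Int Value of 0.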

What didn't work:
Rescanning all the hosts hasn't helped.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010832
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015650
There are no special characters in the folder names since they are all alphanumeric.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005566
/sbin/services.sh restart
Refreshed the storage and tried to browse the datastores, but no go.
Disconnected and reconnected the host, rescanned all, but no go.
http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=san&productid=22034&deviceCategory=san&details=1&keyword=v7000&isSVA=0&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc
The drivers/firmware are up to date.
Re-added the hosts, but it works only temporarily and the issue comes back after a reboot.

HP Blades with ESXi 5.x fail to display hardware status!

Issue : When you go to hardware status of the esxi 5.x host we get an error "hardware monitoring service on this host is not responding or not available".

Cause : iLO 2 firmware version 2.07

Resolution: Upgrade iLO 2 firmware version to 2.09/2.15/2.25 or higher.

Image: HP Custom Esxi 5.x

What didn't work:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1013080
It is not a DMZ site.
/etc/init.d/sfcbd-watchdog restart
got an error in the command saying
< sh: bad number
sh: you need to specify whom to kill >
Reconnected the host and checked the hardware status, but got an error:
"hardware status communication error with the server"
Tried to restart the sfcbd-watchdog process, but the command hung.
https://communities.vmware.com/thread/458686
Checked the firewall settings under Configuration (Tab) -> Security Profile -> Firewall; according to the Firewall page, the CIM Server service runs on TCP ports 5988 and 5989.
Disabled Symantec Endpoint Protection, restarted the inventory service and did an update/refresh of the hardware status tab, but no go.
The hardware status plugin 5.5 is installed and enabled in the vCenter Server.
http://www.everything-virtual.com/2011/03/hardware-status-not-displaying-on-vsphere-client-fix/
but no go.
Tried restarting the CIM server, but it failed with an error saying the remote server took too long.
Tried stopping the CIM server, but it failed with the same error.
The issue seems to be with the servers rather than vCenter, since vCenter is working fine with 2 hosts.
Applied all the HP patches and updates for the host via Update Manager, but no go.

Tuesday, July 15, 2014

Red Hat P2V produces unstable VMware VM

Issue : A customer did a P2V of a Red Hat 6.x server and the VM had trouble booting up. The boot process failed saying "no fstab.sys , mounting internal default"

Resolution : Disable SELinux on the physical server before the P2V, and the P2V then produces a nice VM of its physical Red Hat counterpart.
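A minimal sketch of the SELinux step, assuming RHEL 6 where the mode is set by the `SELINUX=` line in /etc/selinux/config and takes effect at the next boot. `disable_selinux` is a hypothetical helper; the config path can be passed in so it can be tried on a copy first.

```shell
#!/bin/sh
# Set SELINUX=disabled in the SELinux config (default /etc/selinux/config).
# disable_selinux is hypothetical; uses GNU sed -i as shipped with RHEL.
disable_selinux() {
  cfg="${1:-/etc/selinux/config}"
  if [ ! -w "$cfg" ]; then
    echo "cannot write $cfg" >&2
    return 1
  fi
  # only the SELINUX= line changes; SELINUXTYPE= does not match the pattern
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Run it as root on the physical server, reboot, then start the P2V. (`setenforce 0` only switches the running system to permissive mode.)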

Tuesday, July 8, 2014

vCenter Datastore Browser is empty

Issue : When a client of ours upgraded their hosts from 4.1 to 5.1, they couldn't see the data inside the IBM V7000 FC storage datastores through the datastore browser, but they could see the data if they connected directly to the ESXi host using the vSphere Client.

What worked : Remove and Re add the host to the vcenter.

Why? : The fact that it works when connected directly to the host but not through vCenter hints that vCenter's agent (vpxa) installed on the host is not sending proper information to vCenter. We needed to reinstall it, and the only way to do that was to re-add the host to the vCenter Server.

What didn't work or might work for you:
/sbin/services.sh restart
the above command will restart all services in the host including the vpxa (vcenter's agent).
disconnect, reconnect the host to the vcenter server. This will reconfigure the vpxa agent in the host.
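A lighter option than restarting every service is to restart only the management agents, hostd and vpxa. A minimal sketch, assuming the ESXi 5.x init script paths; `restart_mgmt_agents` is a hypothetical helper and `RUN=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Restart only hostd and the vCenter agent vpxa, stopping at the first failure.
# restart_mgmt_agents is hypothetical; paths assume ESXi 5.x.
restart_mgmt_agents() {
  for svc in hostd vpxa; do
    $RUN /etc/init.d/"$svc" restart || return 1
  done
}
```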

What I referred:
http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=san&productid=22034&deviceCategory=san&details=1&keyword=v7000&isSVA=0&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010832
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015650
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005566

BTW, someone else resolved the same issue by powering back on all the hosts that DPM had put in standby mode... really crazy!







Monday, July 7, 2014

Esxi error saying tmp is full

Issue : You get an ESXi error saying /tmp is full, and if you clear it, it happens again after some time.
Cause : The storage adapter's logs fill up the space.
In http://wp.alphaparallel.com/2013/10/vmware-esxi-ram-disk-full-due-to-adaptecs-arcconf-bug/
it was Adaptec's arcconf, but in my client's case it was QLogic.
We updated the QLogic driver/firmware and boom, the log cleared itself and all is well now.
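Before deleting anything it helps to see which files are eating the ramdisk (on ESXi, `vdf -h` shows ramdisk usage). A minimal sketch using find and du; `largest_files` is a hypothetical helper.

```shell
#!/bin/sh
# List the N largest files (default 10) under a directory (default /tmp).
# largest_files is hypothetical; sizes are in KB as reported by du -k.
largest_files() {
  dir="${1:-/tmp}"
  n="${2:-10}"
  # du -k prints "<size><TAB><path>"; sort numerically, biggest first
  find "$dir" -type f -exec du -k {} \; 2>/dev/null | sort -rn | head -n "$n"
}
```

For example: `largest_files /tmp 5`.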

Fedora 20 or RHEL 7 Can't mount ntfs drives

Issue : My Windows 8 & Fedora 20 dual-boot adventure went sour when Fedora 20 wouldn't mount the Windows 8 NTFS drives.
What worked : ntfsfix /dev/sdx1
(of course you need ntfs-3g and ntfsprogs installed on Fedora for this to work)
What didn't work : disabling Fast Startup in Windows and doing a clean shutdown;
mounting the drives as read-only.
Note :
If you are using Fedora 20 or higher, I highly recommend installing the following repositories first:
http://tecadmin.net/top-5-yum-repositories-for-centos-rhel-systems/
+
RPM Fusion too, if the above don't quench your thirst.
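If several partitions are affected, a minimal sketch that runs ntfsfix on each one, or on every NTFS partition blkid reports when no arguments are given. `fix_ntfs` is a hypothetical helper; run it as root, and `RUN=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Run ntfsfix on the given partitions, or on all NTFS partitions found by blkid.
# fix_ntfs is hypothetical.
fix_ntfs() {
  if [ "$#" -eq 0 ]; then
    # blkid -t TYPE=ntfs -o device prints the device node of each NTFS filesystem
    set -- $(blkid -t TYPE=ntfs -o device)
  fi
  for dev in "$@"; do
    # ntfsfix expects an unmounted partition; ignore "not mounted" errors
    $RUN umount "$dev" 2>/dev/null
    $RUN ntfsfix "$dev" || echo "ntfsfix failed on $dev" >&2
  done
}
```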

Friday, July 4, 2014

Change the default multipathing policy to round robin on Esxi 5.x

This is something for me to look back on if I need it in the future: how to change the default multipathing policy on the host for all existing and future LUNs/datastores.
Find out the SATPs on your ESXi host:
esxcli storage nmp satp list

Set the default PSP (path selection policy) to Round Robin (VMW_SATP_EQL is the Dell EqualLogic SATP; substitute the SATP from the list above):
esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp VMW_SATP_EQL

Set the default PSP (path selection policy) to Fixed:
esxcli storage nmp satp set --default-psp VMW_PSP_FIXED --satp VMW_SATP_EQL

Set the default PSP (path selection policy) to MRU [Most Recently Used]:
esxcli storage nmp satp set --default-psp VMW_PSP_MRU --satp VMW_SATP_EQL

source: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017760
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011340
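The three commands differ only in the PSP name, so a small wrapper avoids typos like pasting the RR name for MRU. A minimal sketch; `set_default_psp` is a hypothetical helper and `RUN=echo` prints the command instead of running it. Note that devices already claimed may keep their current PSP until the host is rebooted.

```shell
#!/bin/sh
# Set the default path selection policy for one SATP.
# set_default_psp is hypothetical; policy is rr, fixed or mru.
set_default_psp() {
  satp="$1"; policy="$2"
  case "$policy" in
    rr|RR)       psp=VMW_PSP_RR ;;     # round robin
    fixed|FIXED) psp=VMW_PSP_FIXED ;;  # fixed
    mru|MRU)     psp=VMW_PSP_MRU ;;    # most recently used
    *) echo "usage: set_default_psp <SATP> rr|fixed|mru" >&2; return 1 ;;
  esac
  $RUN esxcli storage nmp satp set --default-psp "$psp" --satp "$satp"
}
```

For example: `set_default_psp VMW_SATP_EQL rr`.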

Friday, June 20, 2014

migrating vcenter 5.x express DB to SQL cluster

Issue: We need to migrate the Express Edition database of a vCenter 5.1 running on Windows 2008 to a remote SQL cluster (a cluster of 3 Windows 2008 R2 SQL servers).

What worked :
Back up the existing Express Edition DB and restore it to the primary SQL server. To do that, first create a dummy DB in the primary SQL server with the same name as the original DB (ex: VCDB), then right-click it, choose restore from backup and point it to the backup files of the Express DB. Make sure you use a username with the sysadmin role.
Now use ODBC to point SSO and the vCenter Server to the new SQL cluster IP (not the individual SQL servers).
This is for me to look back on and use when I need it in the future.

Thursday, June 12, 2014

VMware HA agent unreachable :(

Issue : One of the hosts has an HA error message:
"the vsphere ha agent on the host cannot be reached.
this condition indicates that
1)a situation exists which is preventing the agent on the host from running or exiting the uninitialized state or
2)vcenter server is unable to connect to any of the agents running on the cluster hosts due to a networking failure or total of cluster failure."

What really worked :
Disable HA on the cluster.
restarted all hosts in the cluster (one by one after moving off all the VMs).
remove hosts from the cluster.
Enable HA on the cluster and make sure SSL certificate checking is enabled.
add hosts back to the cluster.

What should have worked:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1019200
all hosts in the cluster have the same management network configuration.

It is a new installation (3 weeks old) and it hasn't worked properly since then.
Forward and reverse nslookup work from the vCenter to the hosts.
Using telnet, made sure port 902 is open from the vCenter server to the ESXi hosts.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001596
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1003735
updated the vcenter ip under runtime settings, reconnected the host but the operation timed out.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001493
the vpxa.cfg has the right ip addresses.
ntp and time sync are fine.
there are no advanced configurations set for ha.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011974 but no go.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2008609
fdm.log has Error message "[ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip" showing in /var/log/fdm.log on ESXi hosts.
http://tech.zsoldier.com/2012/06/esxi-hosts-timing-out-during-ha-cluster.html
The VM and management networks are all on the same VLAN, and there isn't a firewall configured between the hosts.

hostd.log entries
"http transaction failed on stream tcp (error:transport endpoint is not connected) with error n7vmacore15systemexceptione(connection reset by peer)"

fdm log entries
2014-06-09T13:50:32.006Z [FFEB9B90 verbose 'Cluster' opID=SWI-6058ed8] [ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip.

Found SSL related errors:
2014-06-09T13:51:23.069Z [6DD59B90 error 'Message' opID=SWI-29e297b3] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:24.842Z [6DD18B90 error 'Message' opID=SWI-5992c13d] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:42.841Z [6DE1CB90 error 'Message' opID=SWI-2f2d0b51] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:43.071Z [6DD59B90 error 'Message' opID=SWI-29e297b3] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore16TimeoutExceptionE(Operation timed out) creating ssl stream or doing handshake

2014-06-09T14:03:58.959Z [6DCD7B90 info 'Cluster' opID=SWI-4b7216e3] [ClusterManagerImpl::VerifyHost] Untrusted thumbprint (02:2D:63:09:48:E3:D8:7F:94:C1:7A:FB:11:12:B7:C7:EB:F5:20:3F) for host 10.1.100.233 - failing verify
2014-06-09T14:04:59.032Z [6DD18B90 info 'Cluster' opID=SWI-18eb3cb4] [ClusterManagerImpl::VerifyHost] Untrusted thumbprint (02:2D:63:09:48:E3:D8:7F:94:C1:7A:FB:11:12:B7:C7:EB:F5:20:3F) for host 10.1.100.233 - failing verify

2014-06-09T13:42:05.513Z [6DD9AB90 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x0d9062cc, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
2014-06-09T13:24:30.312Z [FFC92B90 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x04d1117c, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
2014-06-09T13:56:23.892Z [FFE15460 verbose 'HttpConnectionPool-000000'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x0d90316c, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 2

2014-06-09T13:32:58.357Z [FFBEE460 error 'Message' opID=SWI-14a96433] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read) creating ssl stream or doing handshake --> * unable to get local issuer certificate) on handshake
2014-06-09T13:33:59.431Z [FFF5CB90 error 'Message' opID=SWI-77ccbfb7] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read) creating ssl stream or doing handshake

vpxd log:

During election:

2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] Timed out waiting for election to complete or for host to join existing master
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] EnableDAS failed on host [vim.HostSystem:host-1476,uk-mal-esx-p05.dyson.global.corp]: class Vim::Fault::Timedout::Exception(vim.fault.Timedout)
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] Timed out waiting for election to complete or for host to join existing master
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] EnableDAS failed on host [vim.HostSystem:host-1476,uk-mal-esx-p05.dyson.global.corp]: class Vim::Fault::Timedout::Exception(vim.fault.Timedout)

FDM log:

2014-06-09T10:58:35.777Z [FFC63B90 error 'Cluster' opID=SWI-46c45c9d] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d934-a159136a-43cd-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy
2014-06-09T10:58:35.777Z [FFADEB90 error 'Cluster' opID=SWI-3bb36853] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d96e-7feaf4e4-1c30-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy
2014-06-09T10:59:05.819Z [FFD67B90 error 'Cluster' opID=SWI-6c77b0d1] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d96e-7feaf4e4-1c30-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy


http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2017233
our action plan was to
Review SSL configuration and certificates in vCenter
disable the Denial-of-Service protection feature.
Review any security scan on your ESXi host via VMware HA agent port (port 8182)
Update NIC Adapter firmware to the latest on all the hosts since they were out of date

Did the following, but that didn't work either:
1. Disable HA under Cluster settings
2. Ensure that SSL Certificate Checking is enabled.

For vCenter Server 5.1 and later:
In the vSphere Web Client, navigate to the vCenter Server instance.
Click the Manage tab.
Under Settings, click General.
Click Edit and select SSL settings.

3. Select vCenter requires verified host SSL certificates. If there are hosts that require manual validation, these hosts appear in the host list at the bottom of the dialog.
4. Click OK.
5. Click OK. Hosts that you have not selected are now disconnected.
6. Reconnect the host to vCenter Server.
7. Enable HA under Cluster setting

SSL certs have been validated – the certificates are valid and are issued from a template also used for ESX hosts which don’t have this issue.

Wednesday, June 11, 2014

MCU (Most Commonly Used) vmware commands

All you command junkies out there, don't make fun of me for writing this; I am only writing it so that I, or someone like me, can refer to it later. :P

Check dead paths on an ESXi host:
esxcfg-mpath -b | grep -i dead
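For a per-LUN count instead of raw lines, a minimal sketch with awk. It assumes the usual `esxcfg-mpath -b` layout: an unindented device line followed by indented path lines containing `state:active` or `state:dead`. `dead_paths_by_device` is a hypothetical name.

```shell
#!/bin/sh
# Read `esxcfg-mpath -b` output on stdin and print "<device> <dead path count>"
# for every device that has at least one dead path. dead_paths_by_device is hypothetical.
dead_paths_by_device() {
  awk '
    /^[^ \t]/    { dev = $1; next }   # device header line (not indented)
    /state:dead/ { dead[dev]++ }      # indented path line reporting a dead path
    END { for (d in dead) print d, dead[d] }
  '
}
```

Usage: `esxcfg-mpath -b | dead_paths_by_device`.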

Try this command to test the entire snapshot chain. It will display any errors related to the snapshots:
vmkfstools -t0 -v10 lastsnapshot-00000n.vmdk - this command has to be run for each disk of the virtual machine when the VM has more than one disk.
or
vmkfstools -q -v10 "your_disk.vmdk"
Try this command to display the CID, PID (parent CID) & parent file names for all the snapshots of a VM.
Change directory to the VM's folder, then issue this command.
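A minimal sketch that prints those fields by reading the small text descriptor files directly, skipping the large data extents. It assumes the standard descriptor keys `CID`, `parentCID` and `parentFileNameHint`; `show_vmdk_chain` is a hypothetical name.

```shell
#!/bin/sh
# Print CID, parentCID and parentFileNameHint from each VMDK descriptor in a
# VM folder (default: current directory). show_vmdk_chain is hypothetical.
show_vmdk_chain() {
  dir="${1:-.}"
  for desc in "$dir"/*.vmdk; do
    case "$desc" in
      # data extents, not descriptors: skip them (they can be huge)
      *-flat.vmdk|*-delta.vmdk|*-sesparse.vmdk|*-ctk.vmdk|*-rdm.vmdk|*-rdmp.vmdk) continue ;;
    esac
    [ -f "$desc" ] || continue   # no .vmdk files: the glob stays literal
    echo "== $desc"
    grep -E '^(CID|parentCID|parentFileNameHint)=' "$desc"
  done
}
```

Usage, from the VM's folder on the datastore: `show_vmdk_chain`.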

telnet alternative on ESXi:
nc -z <target ip> <port>
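To check several ports in one go (for example 902 and the HA agent port 8182 from the HA troubleshooting above), a minimal sketch wrapping the same `nc -z` call. `check_ports` is a hypothetical helper, and `NC` can be overridden for testing.

```shell
#!/bin/sh
# Report open/closed for each TCP port on one target using nc -z.
# check_ports is hypothetical; returns non-zero if any port is closed.
check_ports() {
  target="$1"; shift
  rc=0
  for port in "$@"; do
    if ${NC:-nc} -z "$target" "$port"; then
      echo "$target:$port open"
    else
      echo "$target:$port closed"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `check_ports 10.0.0.20 902 8182`.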


Tuesday, June 3, 2014

Windows 2008 R2 VM black screens after VMware Tools upgrade

Issue: After upgrading VMware Tools on Windows 2008 R2 VMs, some (or many) of the VMs boot to a black screen.

Cause: Instability of the SVGA video driver on the Windows 2008 R2 platform.

Resolution: Boot the VM into the BIOS, and as soon as you exit the BIOS keep tapping F8 to see the advanced boot options.
Try booting to Last Known Good Configuration, and if that doesn't work, boot to Safe Mode with Networking.
Uninstall VMware Tools.
Custom install VMware Tools without the video drivers.
Copy the C:\Program Files\Common Files\VMware\Drivers\wddm_video folder
from another VM (which is working fine) to the same location on this VM (just to be consistent).
Right-click the video adapter in Device Manager, choose Update Driver > Browse my computer for driver software, select the C:\Program Files\Common Files\VMware\Drivers\wddm_video location, and it will update the driver automatically.
Issue Resolved.