LSI 9211-8i HBA & vSAN troubles in HomeLab <-- Resolved


I finally had time to attempt configuring vSAN in my lab. I want to compare it to HP StoreVirtual. I know it’s like comparing apples to oranges, as they operate completely differently:
VMware’s vSAN – Operates at the kernel level in each host
HP’s StoreVirtual – Requires VM appliances in each host
Performance-wise, vSAN should have the upper hand since it runs directly in the kernel, but I still want to compare the two and see the IOPS difference between running in the kernel and running in an appliance. At my job we leverage HP’s StoreVirtual (all-flash, 10TB) for our VDI environment, and we haven’t had any issues with it.
Installing vSAN
Before jumping in, I checked the vSAN HCL to ensure the cards I planned to purchase were supported.
I went with three LSI 9211-8i HBA controllers. After receiving the cards, I flashed them with the latest IT firmware/BIOS (P20). Each host has two SanDisk Ultra II 240GB SSDs. With all the required components in place, let’s begin configuring vSAN:
One additional requirement for vSAN to work is having at least one SSD and one HDD in each host. Since all of my drives are SSDs, one in each host has to be tagged as an HDD with the following commands:
Identify the SSD to tag:
esxcli storage nmp device list | grep "Local ATA Disk"
Next, apply a rule to disable the SSD flag (no feedback = success):
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d naa.5001b44c8a824825 -o disable_ssd
Then reclaim the device so the rule takes effect (reboot required):
esxcli storage core claiming reclaim -d naa.5001b44c8a824825
Confirm the SSD has been tagged as an HDD:
esxcli storage core device list -d naa.5001b44c8a824825 | grep SSD
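With several hosts to check, the confirmation step can be scripted. The sketch below is a small helper (the name `ssd_flag` is made up for illustration) that pulls the `Is SSD` value out of the device listing. It assumes the listing contains a line of the form `Is SSD: false`, which is what the grep above is looking for.

```shell
# Hypothetical helper: read `esxcli storage core device list -d <device>` output
# on stdin and print the value of its "Is SSD" line (true or false).
ssd_flag() {
    awk -F': ' '/^[[:space:]]*Is SSD:/ { print $2; exit }'
}

# Usage on an ESXi host (device ID taken from the steps above):
#   esxcli storage core device list -d naa.5001b44c8a824825 | ssd_flag
```

After the tag has been applied, the helper should print false for the tagged disk.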
vSAN Configuration:
Enable Virtual SAN traffic on vmkernel interface:
Set up a vSwitch or vDS. I chose a vSwitch and attached the HP ConnectX card to it:
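The vmkernel tagging can also be done from the ESXi shell instead of the Web Client. This is a minimal sketch, assuming the vSAN vmkernel port is `vmk1` (a placeholder, since the interface name in this lab is not stated). The function name and the `RUN` dry-run switch are made up for illustration. By default the function only prints the command it would run.

```shell
# Hypothetical wrapper around `esxcli vsan network ipv4 add`.
# RUN defaults to "echo" (dry run); call with RUN= to execute on a host.
enable_vsan_vmk() {
    vmk="$1"   # vmkernel interface to tag for Virtual SAN traffic
    ${RUN-echo} esxcli vsan network ipv4 add -i "$vmk"
}

# Dry run:  enable_vsan_vmk vmk1
# Execute:  RUN= enable_vsan_vmk vmk1
# Verify:   esxcli vsan network list
```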
Enable Virtual SAN (vSAN):
Select Cluster (Homelab)
Manage tab
Settings
Virtual SAN – General
Edit…
Turn ON Virtual SAN
Manual
OK
I chose Manual to show the issue I am experiencing; normally I would go with Automatic.
Select Disk Management
Select Host (ESXI01)
Click on “Create Disk Group” icon
Select SSD
Select HDD
OK
Repeat for all remaining hosts
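The same disk group can be created per host from the ESXi shell with `esxcli vsan storage add`. This is a rough sketch rather than what was done here: the function name and the `RUN` dry-run switch are made up, and `naa.CACHE_SSD_ID` is a placeholder because the ID of the second SSD is not shown in this post. By default it only prints the command.

```shell
# Hypothetical wrapper around `esxcli vsan storage add`.
# RUN defaults to "echo" (dry run); call with RUN= to execute on a host.
create_disk_group() {
    cache="$1"      # device still detected as an SSD (cache tier)
    capacity="$2"   # device tagged as an HDD (capacity tier)
    ${RUN-echo} esxcli vsan storage add -s "$cache" -d "$capacity"
}

# Dry run (naa.CACHE_SSD_ID is a placeholder; the second ID is the disk tagged earlier):
#   create_disk_group naa.CACHE_SSD_ID naa.5001b44c8a824825
```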
Verified Disk Group Status and Resources:
Unhealthy Disk Groups in vSAN
During the creation of a VM, or any other I/O on the HBAs, the disk groups suddenly become unhealthy:
Deleting the groups and recreating them restores the healthy status. Checking the Hardware Status page shows all drives and the controller are OK:
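One way to look at the same state from the command line is `esxcli vsan storage list`, which prints an `In CMMDS` line for each claimed disk. The sketch below counts the disks reporting `false`. Treating that count as a sign of an unhealthy disk group is an assumption and was not verified in this lab, and `disks_not_in_cmmds` is a made-up name.

```shell
# Hypothetical helper: read `esxcli vsan storage list` output on stdin and
# print how many claimed disks report "In CMMDS: false".
disks_not_in_cmmds() {
    awk '/In CMMDS: false/ { n++ } END { print n + 0 }'
}

# Usage on an ESXi host:
#   esxcli vsan storage list | disks_not_in_cmmds
```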
Troubleshooting
I plan to continue troubleshooting this week by performing the following:
Disable onboard AHCI controller and test
Remove HP ConnectX cards (show up as Storage adapters & Network adapters) reconfigure vSAN network and test
Roll back firmware to P19 and test
Upgrade to vSphere 6 and test
(03/28/2015) 9:28PM:
DOH!! Just noticed the SanDisk Ultra II is not on the HCL. That’s a bummer, but I would think it should still work. Going to attempt the above and see if things work out.
Resolution
After upgrading my homelab from vSphere 5.5 to 6 and tagging one SSD in each host as capacityFlash, the unhealthy disk group issue is resolved. During the upgrade I also cleaned up and removed the old VIBs.
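The tagging command itself is not shown above. In vSphere 6 this is normally done with `esxcli vsan storage tag add`, so here is a minimal sketch under that assumption. The function name and the `RUN` dry-run switch are made up, and the device ID in the usage line is only an example reused from earlier, since which disk was tagged on each host is not stated.

```shell
# Hypothetical wrapper around `esxcli vsan storage tag add`.
# RUN defaults to "echo" (dry run); call with RUN= to execute on a host.
tag_capacity_flash() {
    device="$1"   # SSD to mark as capacity-tier flash
    ${RUN-echo} esxcli vsan storage tag add -d "$device" -t capacityFlash
}

# Dry run (example device ID, an assumption):
#   tag_capacity_flash naa.5001b44c8a824825
```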
Removed all Mellanox drivers before the upgrade:
esxcli software vib remove -n net-ib-cm -n net-ib-core -n net-ib-ipoib -n net-ib-mad -n net-ib-sa -n net-ib-umad -n net-mlx4-en -n net-mlx4-core -n net-mlx4-ib -n scsi-ib-srp
After upgrading to vSphere 6 removed the nmlx4 drivers:
esxcli software vib remove -n nmlx4-en -n nmlx4-core -n nmlx4-rdma
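Before reinstalling, it can help to confirm that no Mellanox VIBs are left behind. The sketch below filters `esxcli software vib list` output by the name prefixes used in the two remove commands above. The helper name `mellanox_vibs` is made up, and it assumes the VIB name is the first column of the listing.

```shell
# Hypothetical helper: read `esxcli software vib list` output on stdin and
# print the names of any VIBs matching the Mellanox prefixes removed above.
mellanox_vibs() {
    awk '$1 ~ /^(net-ib-|net-mlx4-|scsi-ib-|nmlx4-)/ { print $1 }'
}

# Usage on an ESXi host (no output means nothing is left to remove):
#   esxcli software vib list | mellanox_vibs
```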
Then reinstalled the Mellanox 1.8.2.4 drivers:
vSAN is working!
Written by Dave Davis
