Lost connectivity to the device backing the boot filesystem

One of our HP Gen 9 blades had the following configuration error:

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disk/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.
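If you see this alert on more than one host, it can help to pull the device identifier out of the message so you can line it up against each host's storage devices. Here is a minimal Python sketch, assuming the alert wording matches the message quoted above. The function name `parse_boot_device_alert` is made up for illustration, and other ESXi builds may word the message differently.

```python
import re

# Pattern based only on the alert text quoted in this post (assumption:
# the wording is the same on the hosts being checked).
ALERT_PATTERN = re.compile(
    r"Lost connectivity to the device (?P<device>\S+) "
    r"backing the boot filesystem (?P<path>\S+?)\.(?=\s|$)"
)


def parse_boot_device_alert(message):
    """Return (device, path) from the alert text, or None if it does not match."""
    match = ALERT_PATTERN.search(message)
    if match is None:
        return None
    return match.group("device"), match.group("path")
```

The lazy match plus the lookahead is there so the sentence-ending period is dropped while the dots inside the device name are kept.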

The boot filesystem is on an internal 8 GB SD Card. I logged into iLO, went to Diagnostics, and found the system no longer saw the SD Card as mounted:

[Screenshot: iLO 4 - Issue with media]

We evacuated the host and reseated the SD Card. It mounted without issue, but to be safe we went ahead and installed a replacement SD Card. The issue recurred. It is caused by the version of the iLO firmware running on the server.

[Screenshot: iLO 4 - Issue Resolved]

EDIT (9/14/2015): There seems to be a problem with iLO firmware version 2.20 that causes this issue. The firmware should be updated to version 2.22. HP should be releasing version 2.30 in the next few weeks, which will hopefully be a full fix for this issue! Big thanks to Mike B. for reporting his findings from HP!!!

EDIT (10/2/2015): The new HP Service Pack for ProLiant (SPP), version 2015.10.0, contains updated iLO firmware 2.30! We are rolling this through our environment over the next week.

EDIT (12/4/2015): The 5+ blades we upgraded to iLO 2.30 haven’t had the issue repeat.

EDIT (12/10/2015): Multiple reports of the issue recurring on iLO 2.30 firmware have been received. Though the 2.30 firmware seems to have helped, it isn’t a permanent fix. iLO firmware 2.22 should be used until we hear back from HP.

EDIT (2/5/2016): Bjorn reports HP gave him iLO 2.40 firmware to load. I just checked HP Downloads and it isn’t available yet. Waiting to read the release notes to see if it specifically addresses this issue.

EDIT (4/20/2016): HP released iLO 2.40 firmware on April 1st. The release notes do not mention this issue, only IPv6 enhancements. We still have not had the issue recur on any of our blades running iLO 2.30 firmware. We are going to stay on 2.30, as the enhancements in 2.40 aren’t worth the gamble of a new version.
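Since the advice in these edits hinges on which iLO firmware version each host runs, here is a rough Python sketch for flagging hosts below a chosen version. The host names and the inventory dictionary are hypothetical. The 2.22 floor only reflects the interim recommendation in the edits above; it is not an HP statement that 2.22 fixes the problem.

```python
def parse_ilo_version(text):
    """Turn a version string such as '2.22' into a comparable tuple (2, 22)."""
    return tuple(int(part) for part in text.strip().split("."))


def hosts_below(inventory, minimum):
    """Return the sorted host names whose firmware is older than `minimum`."""
    floor = parse_ilo_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_ilo_version(version) < floor
    )


# Hypothetical inventory: host name -> reported iLO 4 firmware version.
inventory = {"esx01": "2.20", "esx02": "2.22", "esx03": "2.30", "esx04": "2.10"}
print(hosts_below(inventory, "2.22"))
```

Comparing integer tuples instead of raw strings keeps the ordering numeric, so a host's version is judged by its major and minor numbers rather than by character order.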

83 thoughts on “Lost connectivity to the device backing the boot filesystem”

  1. Are you writing your logs to the SD card? Have you defined the scratch partition to be located in the SD card? This problem has happened to me twice now and I’m not sure how to avoid it.

    Ultimately, has this problem resurfaced since you proceeded with your corrective replacement of SD cards?

    • Hi Joseko!

      I have the logs being sent to our Splunk server. Once the SD card was reseated and later replaced, we haven’t had any further issues. Out of 18 hosts we have had two with this issue. I am leaning toward flaky Gen 9 SD hardware 🙂

  2. We are having the same issue in our Gen9’s. In diagnostics we don’t see the card unmounted. We will reboot the host and everything is fine for a little while. Our scratchconfig is set to a datastore so this shouldn’t be our problem. Guess it is time to call HP. 🙂

  3. I have a case opened right now. The first thing they said was to update the iLO, which I already did before these blades went into production 30 days ago. So I didn’t take that answer. They are escalating it to some higher level support. I will update this when I hear more.

  4. Same here with DL380 gen 9 servers…
    Two months ago I saw this problem. Since I had the luxury of turning off the machine without impact, I turned off the machine, took the card out, and put it back in, and the problem was gone.
    Today (14-09) I found the same problem on another DL380 gen 9 server. Interested to hear what HP tells you!

    • Same error was on the other server. (ps: since my postings are awaiting moderation, can the moderator merge all this into the original posting? thnx)

      • So they got back to me and said the iLO firmware I am on, 2.20, is bad and has been pulled from the HP site. They said to go to the 2.22 iLO firmware. They then said that on BLs the issue can come back on 2.22, and if it does, to go to the 2.10 firmware. Then in the next few weeks 2.30 is coming out. Either way they are terrible. I have 4 servers with the error right now out of my 32 GEN9’s. I am going to update them tomorrow to 2.22 and see how it goes. Hope this helps anyone.

  5. Thanks for the reply Mike!!! We have had lots of quirky issues with our Gen 9’s, the resolution seems to always be apply another firmware/VIB update!! I will edit the post and add this into it!

    • To add to the calamity. I put my first of 4 hosts in maintenance mode and updated the firmware to 2.22. The error was still present so I rebooted the host. The host is back but its secure channel with vCenter is lost so I have to do a “Connect” in vCenter and re-enter the userid and password to join it back to my vCenter. I now have phantom VMs in my cluster associated with rejoining the Host back again. They say, Unknown, Unknown 10 or 11 etc… Also my NTP settings were removed and had to re-enter them again.

      I have updated 2 more hosts. One is just fine with no issue; the third is doing the same as the first: phantom VMs and re-connecting to vCenter.

      After all this I feel a reload of each host may be necessary since the Host install to the SD card seems to be compromised.

      This is all my personal decision and what I feel needs to be done. So everyone else please make your own decisions. I do have 32 brand new Gen9’s and 4 have the problem, so far. I just feel a complete reload again is in my future for all 32 just to be safe but I don’t know if the 2.22 firmware is the final change or not?

      Let me know if you need any more details. I am blowing up my HP ASM and Account Team about this. This is very unacceptable.

  6. Thanks for the update. I’m looking at the HP site, but can’t find a newer version than the 2.20 firmware yet.
    Did you receive the links from HP or are you in a different support program? 🙂

  7. After all the back and forth our HP account team is going to replace our SD cards with USB keys. Apparently they have no issue with the iLO firmware. So if this is your case you may pursue this as an option. So yeah reloading ESXi 32 times. whoohoo!

    From our ASM: I’m told that Support strongly recommends replacing the SD cards with the USB option.

  8. We ran into this issue with BL460c Gen 8 and Gen 9. We upgraded all to 2.22 and so far no issues. That being said, we have had two failures out of 48 servers that use those new USB/SD card devices in the Gen 9s. These servers are only a month old, so I’d proceed with caution before going the USB route.

  9. Fantastic post, thanks Daniel – our HP support couldn’t figure this out and continued to lay blame on VMware. While on hold I found this article, navigated the disaster which is HP’s new support site, and applied the new firmware. When they return to my call I’ll let them know I found a solution.

    Some days I wonder how much profit they make on our support contracts when they hire such unskilled labor.

  10. So, did you deploy the 2.30 iLO FW? Did it resolve the issues?

    I have similar issues with 16 blades in 2 separate chassis. all Gen8 and Gen9. Already existing blades show no issues like this (/bootbank missing).
    I plan on deploying this next week (first week of Nov 2015), so I was wondering if you have already, and whether you have had any issues with it after deploying?

    Thanks

    S

  11. So glad to stumble (finally!) on a post with this issue. I’ve had 2 HP models, ProLiant DL380p Gen8 and DL360 Gen9 rackmount servers, hit this issue. Gen9’s have micro-SD cards and Gen8’s have mini-SD cards. Not all of my hosts have experienced the error, but 3-4 have. And, like you all, a power down, reseat SD card, power on resolves the issue for several days, then it reappears. I attempted an SD card replacement and ESXi 6 reinstall, but that didn’t work either. So, it indeed must be a hardware and/or driver issue. I guess I will attempt to install/upgrade iLO to 2.30 and see if that works. Thanks again for this post & all the additional comments! Blessings!

    • Aaaa

      With Dell servers I have the same problem but it usually requires me to reboot to get the USB stick to be recognized again and then I have to go into UEFI and tell it to boot from the USB stick again.

      For giggles I tried it: /etc/init.d/hostd restart came back up; /etc/init.d/vpxa restart stayed stopped, so I did /etc/init.d/vpxa start and it said it was running. Logged into the client and it can write to the USB stick again!

      This is a standalone machine that handles routing and firewall for like 20 zones. I have over 55 VMs running on it. It takes forever to go down and come back up. You saved my night. Now I can run the VMA assistant and make a backup config in the event this doesn’t work next time and I can do a restore.

      I really should move to Docker or maybe Photon 🙂

      Thanks for the tip

    • I’d like to confirm that Aaaa’s suggestion to restart management services also corrected the warning states on both of my HP Gen9 Proliant Servers in this condition. I’m running ILO Firmware 2.20 and have replaced SD cards on 2 of the servers without success in resolving the issue.
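The management-agent restart described in this thread can be scripted. Below is a hedged Python sketch, assuming a Python interpreter is available in the ESXi shell and using only the two init scripts named in the comments above. The function name and the `runner` parameter are illustrative, and whether a restart is safe on a given host is your call.

```python
import subprocess

# Commands taken from the comment above: restart hostd, then vpxa.
# In that report vpxa stayed stopped after 'restart', so an explicit
# 'start' follows. It mirrors the reported sequence and may be
# unnecessary on other hosts (assumption).
AGENT_COMMANDS = [
    ["/etc/init.d/hostd", "restart"],
    ["/etc/init.d/vpxa", "restart"],
    ["/etc/init.d/vpxa", "start"],
]


def restart_management_agents(runner=subprocess.call):
    """Run each command in order; return a list of (command, exit_code)."""
    results = []
    for command in AGENT_COMMANDS:
        exit_code = runner(command)
        results.append((" ".join(command), exit_code))
    return results
```

Passing a stand-in `runner` that only records the commands gives you a dry run before touching a production host.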

  12. Hello,
    we have the same issue on two of our Gen 8 servers. We are currently on iLO firmware 2.30. Installing the VMware tools fails. They can’t be found. Could it be that firmware 2.30 is based on 2.20 and does not have the fix from 2.22?
    Tonight we are restarting the hosts; if that does not give a solution we will go back to 2.22 and see if that helps. Has anyone downgraded iLO 4 firmware?
    thanks

  13. No good for me, I have HP DL380 Gen9 with SSD drives and just got this error. Have been on 2.30 iLO for a while now. No idea what is causing this!!

    • Hi Shawn,

      You are getting this error and you have SSD drives? The SSD drives are going through your RAID/motherboard controller. iLo firmware should not be affecting them. The SD cards in the Gen 9 go through iLo to mount which is why the iLo firmware would affect them. It sounds like you have an actual SSD drive issue?

  14. The newest Service Pack from HP, 2015.10, will definitely not solve the problem. After 44 days of uptime, one of our ESXi servers (DL380 Gen9 with SD card) shows the error message “Lost connectivity to the device mpx.vmhba32:C0:T0:L0…” again.

      • Sure will! It will be official “shortly” according to the support engineer. As long as you have a valid support agreement it should be available upon request, I guess.

        2.40 includes HPE and new theme. Looks pretty, but also pretty hard to read.

      • I got this error on one of the hosts running firmware 2.40 again.

        Uptime: 94 days.

        I would guess that it is probably something in the dual USB kit, rather than the iLO firmware, that is causing this. If HP doesn’t fix this soon I will also rip out the USB dual SD card kit and put a single card into the single SD slot.

  15. Hi, thanks for writing this post. I have also been having this issues with a number of blades. I logged a call with HPE to get the update and they sent me an FTP link to the bin file but then followed that up with this link when I asked for any release notes:

    https://h20566.www2.hpe.com/hpsc/swd/public/detail?idx=&action=driverDocument&itemLocale=&swItemId=MTX_d92948b2605f4dc7bc69628b56&mode=#tab1

    There isn’t anything in the release notes for this issue. Are you seeing any issues on 2.40?

    • Chris, been running 2.40 for 11 days without losing the SD card. On the other hand, I usually lose the connection after more than a month.

      One issue I encountered with 2.40 is that none of the upgraded hosts are reporting their new iLO firmware level to HP OneView. Still shows 2.30 there. Anyone else seen this?

      • Hi Bjorn, I’m in the same position. All the servers I’ve upgraded are OK currently, but it is typically a month or so before the failures appear (although some have been as quick as two weeks). We are not using OneView so I can’t help with that, but I can confirm that the chassis Onboard Administrator and iLO are reporting 2.40 correctly. I installed using the bin file and then performed a cold boot of the server.

  16. I have the same issue on my DL380 Gen9’s with iLo 2.30. I’m going to open up a case but will probably not update to iLo 2.40 for a while until the verdict is in on this issue. I’m too old to be a Beta tester anymore for people that can’t fix their own friggin product as we slip into a third world country. This issue also seems to occur after 30+ days. Doing a cold reboot or two from ILO seems to bring it back without having to reseat the SD cards.

    • The USB keys will not solve your problem, David. We’re using the keys (USB Kit) in our environment and have the same problem of losing access. iLO 2.40 has been installed on a handful of servers on HPE’s advice, without any issues for 2 weeks.

      • Hi Marco,
        Can you please let me know the SKU you are using for USB keys? We have upgraded all blades to firmware version 2.40 and are still losing access to the Micro SD cards. We have replaced the cards in two servers with USB devices on firmware 2.40, and they are working OK so far. All servers had a full power off/power on (e-fuse) reset.
        So perhaps there is a hardware defect with our particular model of Micro SD cards on all firmware versions, and a firmware-only issue with iLO versions older than 2.40 and USB keys.
        The Micro SD SKU we are having issues with is 726116-B21

        Thanks,
        David

        • Hi David
          We have the “HP Dual 8GB microSD Enterprise Midline USB Kit” installed (741279-B21). But our newest Gen9 server came with the 726116R-B21 part.

  17. Thanks for the Input Marco, we have tested with both (741279-B21) which is essentially just a Micro-SD card in a USB Key adapter, and a USB Key (737953-B21)

    Both, so far for us, are OK on iLO Firmware 2.40 because they use the USB bus and not MicroSD

    However, most servers we have will lose access to Native Micro-SD cards running SKU 726116-B21 and Firmware 2.40.
    I believe this is because the SD card slot and iLO share a controller. Any resets also affect the MicroSD slot, which removes it from ESXi, causing an APD event. This doesn’t affect VMs, but you cannot write any ESXi host configuration if this happens. Plus, any operations that need to write persistent configuration to the SD card, such as power on/power off or writing annotations, will frequently fail. As soon as the host is rebooted or management agents are restarted, it comes back OK.

    Hopefully Marco the “R” in your new 726116R-B21 SKU stands for Reliable, as you may run into issues with this part.

  18. We’re still seeing this lost boot filesystem issue even with iLO 2.40, and we have HP Dual 8GB microSD Enterprise Midline USB Kits on all servers. The issue has happened more than four times with 2.30 and once with 2.40.

    • We swapped our servers over to straight USB kits… the Dual 8GB micro SD kits were failing as were SD cards on the ilo no matter the versions we went with. Have been happy ever since the swap… i’d suggest the same for all those having this issue to be honest… good luck all

  19. We have DL380 Gen8 servers with iLO 2.30. It happened for the first time on one of 8 identical servers. We also have a Supermicro server with an internal USB flash drive, and it got the same problem. vSphere 5.5 everywhere.

  20. If you have the HP Dual 8GB microSD Enterprise Midline USB Kits in your servers, then iLO4 firmware plays absolutely no role here. On the other hand, if you have an SD card in the embedded SD card slot, you had better upgrade iLO4 to version 2.40 or later.

  21. I am seeing this error on a BL460c Gen9 – iLO 2.40 with the HP Dual 8GB MicroSD EM USB Kit. Currently have a ticket open with HP but so far they have been unable to resolve the issue. Does anyone have a resolution?

  22. There is a new BIOS for BL460 Gen9 (2.20); according to the release notes, it fixes some issues related to booting from a dual MicroSD with one bad SD card and only one working.

  23. We had 5 BL460c Gen 8 blades reporting this error with the iLO4. Four of these in our DR environment and one in the production. I opened a case with HP and support had me remove the blade, clear the NVRAM, and clear the iLO. All of that was done and there was no change – the error persisted. Support then dispatched a CE to replace the system board and that resolved. I’ve opened three other cases since then and replacing the system board has been the resolution. We have one of these remaining in the environment – the one in production. I’ve been holding off until we have a maintenance window to handle that one.

    I did see that there was a new iLO 4 firmware released this week – 2.44 –

    http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=5228286&swItemId=MTX_77bedd49fc264f04a1cf8a54f1&swEnvOid=4166

    There is nothing in the release notes of this one that explicitly states that this issue is corrected. However, I did see that there was a prior iLO release (2.42) as a customer advisory that does outline this issue and the resolution –

    http://h20566.www2.hpe.com/hpsc/doc/public/display?docId=c04996097&hprpt_id=HPGL_ALERTS_1872040&jumpid=em_alerts_us-us_Feb16_xbu_all_all_627308_1872040_Servers_critical__/

    If anyone has applied either of these yet, please share your experience. I’m looking to apply the latest in the coming week.

    • We never had these problems, running on a very old iLO version. We upgraded our servers to 2.44, and these issues suddenly started appearing.
      So safe to say it is not fixed. Reminds me of the reason why I HATE HPe firmware updates. They usually cause more issues than they solve, and then when trying to solve those issues you get into other issues.

  24. We have the same issue with a lot of Gen9 Servers in combination with the dual sd usb sticks.
    Opened a few cases and got a response telling us to install a firmware update on the USB sticks.
    http://h20564.www2.hpe.com/hpsc/swd/public/detail?swItemId=MTX_8695cbf927064323a7ff72468e&swEnvOid=4166

    We tried this but saw no change; the device still goes offline after days/weeks/months.
    This bug is really annoying, we have hundreds of HP Gen9 servers with ESXi installed and every week we have a few with a lost device.

    So far there is no solution… last words from HP.

  25. I’ve had no “device backing lost” issues since updating to p89 v2.00 (12/27/2015). I’ve had 3 of these servers without issue since April when I updated. I’ve since added 3 servers and so far, no issues although they’ve only been racked for a couple of weeks. I’m using HPE 8GB Micro SD Cards and have never tried or used USB sticks on these servers.

    Additionally; here’s a snippet of the other levels of firmware currently in use for reference:

    http://imgur.com/a/Mt62f

  26. I had the same problem on a Gen9. Updating iLO to 2.40 didn’t change anything, but after reseating, the sd card was recognized again. Before it wasn’t even recognized as a valid boot option. Server BIOS version is 1.32.

  27. We have had this issue on several USB devices. Annoyingly, the SD cards boot when on-board, but when the replacement USB comes, if you put the SD in slot 1 it comes back and says it’s failed. To me that means the USB device has written something to the card to say it’s failed when it hasn’t (SD card from slot 1 had failed). There is a newer firmware for the USB itself which needs to be applied in online mode (SPP won’t detect it, but it’s in the ISO).

  28. Still happening on iLO Firmware v2.50. VMware KB2144283 reports that the issue only affects v2.22 and v2.30; however, I have experienced the issue with all successive versions released since v2.20, to a greater or lesser extent.

    • Had these errors over and over, every few months. Micro SD’s ended up failing completely on both my Servers within a few weeks of each other. Replaced the dual micro SD on one. HP didn’t have a replacement for the second. Had them send me a couple of small SSD’s for boot mirror instead. When the micro sd stick fails again on the one server I will have them replace with SSD’s as well.
      I suggest you all do the same. List price for the two SSDs is about the same as the dual micro SDs, so HP shouldn’t complain. The dual micro SD solution is an unresolved failed boot method that needs to go away.

  29. I can confirm that I am now too experiencing complete MicroSD USB RAID device failure on affected servers with iLO v2.50. HPE are due to replace the first one in the next few days so I will attempt to ascertain what is going on with this issue ‘behind the scenes’.

  30. Engineer wouldn’t be drawn on whether this issue was something that he had been called to with increasing frequency to other customers sites who were using this combination of hardware. I am going to pursue it through HPE from this point on.

    • Problem is, when the USB controller fails, the Micro SD’s get corrupted and cannot be placed in a replacement controller.
      So in actuality you’d be better off with a single SD card boot device, because there would be less chance of a failure. and it would be easier to make a copy of the SD card for backup if needed.
      I had HP replace the dual Micro SD controller with 2 80GB SSD’s (With Trays) HP part # 805361
      I’ll do the same next time my other Gen9 gets an error or failure of the USB controller.

  31. We’re still getting this on iLo 2.50 – has anyone got anything official from VMware/HP on an actual fix? We’ve been chasing this for over 12 months now.

    Michael

  32. Hello

    Same problem for me, except that I have 3 DL380 G9 servers and only 1 has the problem.

    Despite the installation of the latest firmware (SPP 2016.10.0), the problem continues.

    I will open a call with HP to try to get a solution, but judging by your accounts, it is unlikely to be any more conclusive.

    I’ll keep you updated.

  33. I confirm the issue still persists – 2017/02/02.
    We have ProLiant DL360 Gen9 (Firmware 2.50 Sep 23 2016) with ESXi 6.0.0, 3620759 installed. The same issue on it.

  34. For the MicroSD USB Keys that we have had replaced due to this fault, they appear to be shipping with an even newer firmware that is not generally available as well, probably only an internal release at present, v1.3.2.202.

    After the HP technician’s visit, the motherboard was not replaced, but the USB key was replaced with a new-version key.

    HP now recognizes the problem and replaces the keys, so I am also changing the keys in my other 2 servers.

    However, they have already run out of stock and do not know the lead times.

  36. Do the majority of people have the 741279-B21 HP Dual 8GB microSD EM USB Kit? We have a few and we have the same issue, but restarting the agents will resolve it. Any ideas how we can extract the serial numbers from the ESXi console? They don’t appear in iLO at all.

    • Hey – Yep, we had those, rebooting does indeed fix it but the error reappears again after an undetermined amount of time. The serial numbers are on the USB Drives themselves, according to HP that is the only place they are.

      Michael

    • Rebooting usually will bring it back. It might take a couple of tries. However, I have had them fail completely. I just had another one go on the fritz again. According to HP, they came out with a new version of the Dual Micro USB kit in February 2017 and they said it resolves the issue. As I am no longer interested in Beta testing this product, I replaced it with a couple of small SSD drives.

  37. I’ve been chasing this issue since we installed our Gen 9’s in April 2016. In April 2017, HP did come out and replace the motherboard and USB flash drives in all hosts. Everything was running fine until yesterday, when one host had the error again. I looked to VMware and they have a KB that says you need to be on iLO version 2.40 or later for the fix; we are on iLO 2.50.
