#6905: Appliances of IB-14x5 and IB-8x5 hardware types may intermittently fail to boot when a soft reboot command is invoked:
---------------------------------------------------------------------------------------------------------------
Problem Summary
Appliances of hardware type IB-14x5 or IB-8x5 may intermittently fail to come back up after a soft reboot. The soft reboot may be issued via the GUI ("Product Restart") or the CLI ("Reboot"), or it may occur as part of the NIOS upgrade procedure (a soft reboot is required for the appliance to switch to the upgraded code). If the affected appliance is a grid member, it shows as an "off-line" member after the soft reboot, or remains stuck in "upgrade reboot" if it was upgrading. A standalone appliance will not respond to ping.
Customer Environment
Customers with Infoblox appliances of hardware type IB-14x5 or IB-8x5.
Versions
NIOS version independent.
Cause
The root cause is a race condition in the early BIOS boot-up process, in the area of memory detection. The issue is not related to the NIOS product; it is associated with the motherboard BIOS installed on appliances of the hardware types specified above. A unit that runs into this problem hangs after reset, before the BIOS screen is displayed.
Symptoms confirming the issue:
- Terminal/Serial console session output (note that the session output may appear up to 10 minutes after the session was opened, due to the appliance's state):
On the serial console (or terminal session), the appliance reports no further messages after the final "reboot: Restarting system" line:
May 22 07:39:40 /etc/rc.d/rc6: sending all processes the TERM signal
May 22 07:39:45 /etc/rc.d/rc6: sending all processes the KILL signal
May 22 07:39:45 /etc/rc.d/rc6: turning off swap
May 22 07:39:45 /etc/rc.d/rc6: turning off ethernet devices
May 22 07:39:45 /etc/rc.d/rc6: unmounting filesystems
May 22 07:39:48 /etc/rc.d/rc6: rebooting system
[496902.643267] reboot: Restarting system
***Note: Under normal circumstances, several early BIOS messages are displayed briefly, and about 1 minute after the "Restarting system" message the BIOS banner appears, starting with:
Version 2.17.1254. Copyright (C) 2017 American Megatrends, Inc.
- Support bundle logs:
Once the appliance has been recovered through a hard reboot (power off/on), a review of the support bundle files, specifically the IPMI "System Event Log", will almost certainly show a watchdog event (logged 10 minutes after the reboot):
8 | 03/30/2017 | 07:10:50 | Watchdog 2 #0xca | Timer interrupt () | Asserted
9 | 03/30/2017 | 07:10:51 | Watchdog 2 #0xca | Timer expired () | Asserted
These messages may also be accompanied by a "No system memory installed" System Firmware Error, logged before and/or after the watchdog messages:
5 | 03/30/2017 | 07:01:36 | System Firmware Error | No system memory installed () | Asserted
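The two symptom checks above can be sketched as a small script for anyone triaging several appliances at once. This is only an illustrative sketch, not an Infoblox tool: the function names are hypothetical, and it assumes the console output and the System Event Log have already been saved as plain text in the same layout as the excerpts above (six pipe-separated fields per log entry).

```python
# Illustrative sketch only; function names are hypothetical and the input
# format is assumed to match the excerpts shown in this article.

RESTART_MARKER = "reboot: Restarting system"
BIOS_BANNER = "American Megatrends"  # text taken from the BIOS banner line above


def console_hung_after_restart(console_text):
    """Return True if the restart message was logged and no BIOS banner follows it."""
    idx = console_text.rfind(RESTART_MARKER)
    if idx == -1:
        return False  # no soft reboot seen in this capture
    return BIOS_BANNER not in console_text[idx:]


def parse_sel(sel_text):
    """Parse pipe-separated System Event Log lines into dictionaries."""
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed layout
        record_id, date, time, sensor, description, state = fields
        events.append({
            "id": record_id,
            "date": date,
            "time": time,
            "sensor": sensor,
            "description": description,
            "state": state,
        })
    return events


def sel_symptoms(events):
    """Report which of the documented log symptoms are present."""
    watchdog_expired = any(
        e["sensor"].startswith("Watchdog 2")
        and e["description"].startswith("Timer expired")
        for e in events
    )
    no_memory_reported = any(
        e["sensor"] == "System Firmware Error"
        and "No system memory installed" in e["description"]
        for e in events
    )
    return {
        "watchdog_expired": watchdog_expired,
        "no_memory_reported": no_memory_reported,
    }
```

A watchdog expiry on its own is consistent with the issue; the "No system memory installed" entry is an optional additional indicator, so the sketch reports the two separately rather than requiring both.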
Resolution
Engineering is working on a fix for the BIOS code bug. The hotfix previously attached to this KB was removed because of a potential issue with the fix integrated in that hotfix. This is being investigated urgently, and this KB will be updated with information on a potential revised hotfix.
Workaround
Until a permanent fix is provided, Support strongly suggests powering the appliance off and on if an IB-14x5 or IB-8x5 appliance exhibits the symptoms described above.