Grid problem - HA broken
08-11-2017 07:14 AM - edited 08-11-2017 07:33 AM
Hello,
My name is Fovad, and one of my customer's grid members won't join the grid.
The customer is facing issues associating one of the Infoblox devices with the Grid.
All of a sudden they lost management of the device, high availability broke, and the device became disassociated from the grid manager.
As we didn't get console access, we rebooted the device, but we are still unable to associate it with the GM.
I asked the customer to run the following tests:
- Ping the unit from the Grid-manager - Works
2- Check that all access lists are in place. OK
4- Check IP settings and HA status. - IP confirmed, but HA is not forming and the device is not joining the grid.
We tried the following link without any result.
Here are the model and version of the devices:
Version : 6.12.24-349737
Model – IB1410
I would appreciate any advice.
Best Regards
/Fovad Adami
Re: Grid problem - HA broken
08-14-2017 03:05 AM
My two cents. Can the HA port IP addresses be pinged? Can the two nodes ping each other on the LAN1 port as well as the HA port IPs? Just a thought, as I had a situation last week where one of the nodes had the HA cable plugged into the wrong port.
Re: Grid problem - HA broken
08-15-2017 11:50 PM
Thank you RichA.
I have asked the customer to run the pings and am waiting for the results.
Re: Grid problem - HA broken
08-16-2017 07:02 AM
Here is the setting.
--------------------------
Node 1
Ha1 10.255.255.11
Lan1 10.255.255.13
MGM1 10.119.245.112
Node 2 (lost Grid connection)
Ha2 10.255.255.12
Lan2 10.255.255.14
MGM2 10.119.245.113
Vip 10.255.255.10
Grid Master
GM = 10.107.198.10 (vip)
Grid traffic to GM via 10.255.255.1
Ping from all interfaces to all interfaces on 10.255.255.0/24 = OK
SSH to MGM1 and MGM2 = OK
------------+-------+------------------------+-------+-------+
            |Lan1   |Ha1                     |Lan2   |Ha2
            |.13    |.11                     |.14    |.12
         +------------+                   +------------+
         |   Node 1   |                   |   Node 2   |
         |            |                   |            |
         +------------+                   +------------+
                |                                |
                |Mgm1 .112                       |Mgm2 .113
----------------+--------------------------------+
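As a quick sanity check of the address plan above, here is a minimal sketch that reports which of the listed addresses fall outside 10.255.255.0/24. The labels are taken from the settings list; treating 10.255.255.1 as the next hop towards the Grid Master is my reading of "Grid traffic to GM via 10.255.255.1", and the function name `off_subnet` is only illustrative.

```python
import ipaddress

# Addresses copied from the settings listed above.
HA_SUBNET = ipaddress.ip_network("10.255.255.0/24")
ADDRESSES = {
    "Ha1": "10.255.255.11",
    "Lan1": "10.255.255.13",
    "Ha2": "10.255.255.12",
    "Lan2": "10.255.255.14",
    "Vip": "10.255.255.10",
    "Next hop to GM (assumed)": "10.255.255.1",
    "GM vip": "10.107.198.10",
}


def off_subnet(addresses, subnet):
    """Return the labels whose address lies outside the given subnet."""
    return sorted(
        name
        for name, ip in addresses.items()
        if ipaddress.ip_address(ip) not in subnet
    )


# Anything listed here is only reachable through a router, so a
# successful ping inside 10.255.255.0/24 says nothing about it.
print(off_subnet(ADDRESSES, HA_SUBNET))
```

Only the GM vip is off the subnet, so the local pings reported as OK do not by themselves show that the routed path to the Grid Master is working.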
And here are some logs:
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: OpenVPN 2.1_rc20 x86_64-redhat-linux [SSL] [LZO2] [EPOLL] [PKCS11] built on Oct 14 2016
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: WARNING: --keepalive option is missing from server config
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: TUN/TAP device tun1 opened
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: /sbin/ip link set dev tun1 up mtu 1500
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: /sbin/ip addr add dev tun1 local 169.254.255.1 peer 169.254.255.2
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20835]: Close error on pid file /infoblox/var/vpn_pids/tun1.pid: No space left on device (errno=28)
2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20835]: Exiting
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: OpenVPN 2.1_rc20 x86_64-redhat-linux [SSL] [LZO2] [EPOLL] [PKCS11] built on Oct 14 2016
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: WARNING: --keepalive option is missing from server config
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: TUN/TAP device tun1 opened
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: /sbin/ip link set dev tun1 up mtu 1500
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24199]: /sbin/ip addr add dev tun1 local 169.254.255.1 peer 169.254.255.2
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24203]: Close error on pid file /infoblox/var/vpn_pids/tun1.pid: No space left on device (errno=28)
2017-08-05T02:55:58+02:00 10.119.245.113 openvpn-master[24203]: Exiting
2017-08-05T02:57:41+02:00 10.119.245.113 openvpn-member[21514]: event_wait : Interrupted system call (code=4)
2017-08-05T02:57:41+02:00 10.119.245.113 openvpn-member[21514]: SIGTERM received, sending exit notification to peer
2017-08-05T02:57:42+02:00 10.119.245.113 openvpn-member[21514]: /sbin/ip addr del dev tun2 local 169.254.0.8 peer 169.254.0.1
/Regards
/Fovad
Re: Grid problem - HA broken
08-17-2017 06:11 AM
I would look at this error first: "No space left on device (errno=28)".
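For anyone going through a longer export of the same syslog, here is a rough sketch of how the errno lines could be pulled out and mapped to their symbolic names. It assumes the line layout shown in the excerpt above (timestamp, host, then a `process[pid]:` tag); the function name `errno_hits` is made up for this example.

```python
import errno
import re

# Two lines copied from the log excerpt earlier in this thread.
SAMPLE = (
    "2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20831]: "
    "TUN/TAP device tun1 opened\n"
    "2017-08-05T02:50:50+02:00 10.119.245.113 openvpn-master[20835]: "
    "Close error on pid file /infoblox/var/vpn_pids/tun1.pid: "
    "No space left on device (errno=28)\n"
)

ERRNO_RE = re.compile(r"\(errno=(\d+)\)")


def errno_hits(log_text):
    """Return (process tag, errno symbol) for each line with '(errno=N)'."""
    hits = []
    for line in log_text.splitlines():
        match = ERRNO_RE.search(line)
        if not match:
            continue
        code = int(match.group(1))
        # The third whitespace-separated field is the "process[pid]:" tag.
        tag = line.split()[2].rstrip(":")
        hits.append((tag, errno.errorcode.get(code, "unknown")))
    return hits


print(errno_hits(SAMPLE))
```

On Linux, errno 28 is ENOSPC. It is reported per filesystem and can also mean the filesystem is out of inodes, so free space on one partition does not rule it out for the partition holding /infoblox/var/vpn_pids.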
Re: Grid problem - HA broken
08-21-2017 01:00 AM
I checked the "No space left on device (errno=28)" error first; according to the customer, they have enough space.
Regards
/Fovad
Re: Grid problem - HA broken
09-01-2017 03:50 PM
Hello Fovad,
Issues like these require active troubleshooting, so it would be best to open a case with Infoblox Support.
It is not normal for the HA interface 10.255.255.12 to respond to ping. By design, the HA interface of the passive node does not respond to ping unless you have specifically enabled the setting "Enable ARP on HA Passive Node".
If you are still facing issues on the grid, I would suggest verifying the following.
1. Verify that UDP ports 1194 and 2114 are open in both directions between the Grid Master and the member.
2. Log in to the CLI of both 10.255.255.13 & 10.255.255.14 and issue 'show status' to verify whether they display 'Active' and 'Passive' correctly. If both of them show 'Active', VRRP communication may be broken.
3. Issue 'show interface' in the CLI of both of the above nodes and verify the displayed 'Status', 'Speed' and 'Duplex' for LAN1 and HA. If this member is configured to perform grid communication via the MGMT port, you would want to verify that as well.
4. If you find anything suspicious, verify the physical cable connections and switch port link status.
5. Verify the switch port configuration to ensure that it meets the prerequisites for an HA pair to function properly. Some of the generally required settings are explained in the below Infoblox knowledgebase article.
NIOS uses the Virtual Router Redundancy Protocol (VRRP) for HA communications and HA-failover.
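The reasoning in step 2 can be written down as a small decision helper. This is only a sketch: it assumes you read the role ('Active' or 'Passive') off each node's 'show status' output by hand and pass it in as a string. It does not parse real NIOS output, and the function name `classify_ha_pair` is hypothetical.

```python
def classify_ha_pair(role_a, role_b):
    """Interpret the HA roles reported by the two nodes of a pair.

    The roles are plain strings typed in by the operator; nothing here
    talks to an appliance.
    """
    roles = sorted(role.strip().lower() for role in (role_a, role_b))
    if roles == ["active", "passive"]:
        return "ok: one Active, one Passive"
    if roles == ["active", "active"]:
        return "both Active: VRRP communication may be broken"
    return "unexpected: check cabling, switch ports and grid connectivity"


# Example: both nodes claim to be Active.
print(classify_ha_pair("Active", "Active"))
```

If both nodes report 'Active', steps 3 to 5 (interface status, cabling, switch port configuration) are the natural next places to look, since the nodes are probably not seeing each other's VRRP advertisements.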
Best Regards,
Bibin Thomas
Re: Grid problem - HA broken
10-06-2017 05:22 AM
Fovad,
I just ran into a somewhat similar situation in our lab. Unfortunately, I have never seen the GM HA pair in our lab online since I have been here. We needed to upgrade our NIOS code, but during an upgrade the passive node of the HA pair is the first to be upgraded, and with the pair broken we were unable to distribute the new code.
Here is what I did to fix our problem. First, I broke the HA pair by making the active node a standalone device. I then upgraded our Grid to the new NIOS version that we wanted to test, which succeeded. Next, I pre-configured the HA pair by selecting HA pair in the Network section of the GM and configured the VRID and all the IP addresses of the HA pair. After the system rebooted, I let it sit overnight to let everything settle. The next day I logged into the GUI of the passive node and upgraded it to the new code running in the Grid. When it rebooted after the upgrade, I attempted to rejoin the Grid as the passive node of the HA pair. It worked!
The only thing I can think of is that something was wrong with that node's previous NIOS install. Either way, this is what fixed the issue in our lab. I do not know whether you got yours fixed, but I thought I would share my experience.