02-22-2019 02:22 AM
We're running some tests on mock-up devices ahead of a major change in our production environment, and we have found strange ARP behaviour on Infoblox devices.
We have two separate GRIDs with the config below:
- GRID1 - IB-1050 in standalone mode
- GRID2 - IB-1050 in HA mode (with its passive node down)
VIP address: 192.168.1.2
Node1 LAN: 192.168.1.5
Node1 HA: 192.168.1.6
Node2 LAN (offline): 192.168.1.3
Node2 HA (offline): 192.168.1.4
Our plan is to convert the GRID1 node to HA, reusing GRID2's VIP address with the least possible impact, and to convert the GRID2 node to standalone mode. The end configuration will be as follows:
- GRID1 - IB-1050 in HA mode (with its passive node down)
VIP address: 192.168.1.2
Node1 LAN: 192.168.1.3
Node1 HA: 192.168.1.4
Node2 LAN (offline): 192.168.1.5
Node2 HA (offline): 192.168.1.6
- GRID2 - IB-1050 in standalone mode
Both GRIDs have the same configuration at DNS level, so that won't be an issue. We can easily redirect the traffic of the clients that use 192.168.1.3 as their DNS server, so to reach the final scenario with the least impact, we are trying the following steps:
- Redirect all clients using 192.168.1.3 to 192.168.1.2
- Shut down the HA switch interface of GRID1 Node1 to avoid duplicate IP errors
- Convert GRID1 Node1 to HA with the end configuration and wait for reload
- Convert GRID2 Node1 to standalone
- Shut down the HA switch interface of GRID2 Node1 to avoid duplicate IP errors
- No-shut the HA switch interface of GRID1 Node1
The problem we have found is that the Infoblox device does not seem to send a gratuitous ARP when the HA interface comes back online, so the ARP table on the router is not updated automatically. On the router this isn't an issue because we can clear the table manually, but in the real environment it will cause a major outage because the clients' ARP tables won't be updated until their entries time out (the timeout depends on the OS version).
We have tried the same procedure without shutting the ports down, simply waiting for the devices to reload, and it seems to work. The problem only occurs when the device boots without an active network connection on the HA interface.
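For anyone not familiar with it, a gratuitous ARP is an ARP packet broadcast on the segment in which the sender IP and the target IP are both the address being announced (here the VIP), so neighbours refresh their cache entry for it. Below is a minimal Python sketch of what such a frame looks like on the wire. It only builds the bytes and does not send anything; the MAC address is a made-up placeholder and the function name is just for illustration, not anything from NIOS.

```python
import socket
import struct

def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    src_mac = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth_header = broadcast + src_mac + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hw len 6, proto len 4, opcode 1 (request)
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Sender MAC/IP, then target MAC (zeroed) and target IP.
    # Sender IP == target IP is what makes the packet "gratuitous".
    arp_body = src_mac + ip_bytes + b"\x00" * 6 + ip_bytes

    return eth_header + arp_header + arp_body

# Placeholder MAC (hypothetical), VIP taken from the config above
frame = build_gratuitous_arp("00:11:22:33:44:55", "192.168.1.2")
print(len(frame), frame.hex())
```

This is the packet we would expect to see in a capture on the switch port when the HA interface comes back up, and the one that appears to be missing in our tests.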
Just to add more information, I replaced an IB-4010 standalone appliance last week and we had the same issue with the ARP table on the router: the new device was configured with the same IP that the old one had used, but the network cable wasn't connected to the new device until it was fully booted. As a result, the new device was unreachable until I manually cleared the ARP table on the router, so the problem is not limited to HA pairs.
Has anyone tried this kind of change before? Please note that the impact target is less than 10 seconds, according to our client's needs.
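To check a test run against the 10-second target, something like the following could compute the longest outage from timestamped probe results. This is only a sketch: the function name `longest_outage` and the sample data are made up for illustration, and how the probes are collected (ping, DNS query against the VIP, etc.) is left out.

```python
from typing import Iterable, Optional, Tuple

def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest gap in seconds between the first failed probe
    of an outage and the next successful probe.

    `samples` is a time-ordered sequence of (timestamp, reachable) pairs.
    An outage still in progress at the end of the samples is not counted.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp          # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None               # service is back
    return longest

# Hypothetical probe results, one per second
samples = [(0.0, True), (1.0, True), (2.0, False), (3.0, False),
           (4.0, False), (5.0, True), (6.0, True)]
worst = longest_outage(samples)
print(f"longest outage: {worst:.1f}s, within 10s target: {worst < 10}")
```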