04-29-2020 12:04 PM
We are looking to replace our 2 grid members that are in a DHCP Failover association with each other and with additional devices.
For simplicity the DHCP failover situation is like this:
Server A <--> Server B
Server A <--> Server X
Server B <--> Server Y
We are replacing Server A and B, currently VMs, with new physical devices that will have new IPs. I've outlined and completed some steps in our test environment to simulate the changes. I'm looking to validate that the process below is acceptable.
Additionally I'd like to address some legacy DHCP Ranges that are discontiguous or do not follow our standard DHCP range in a subnet.
Step 1: CSV Script - Add New Members to the Networks - Restart Grid Services. Can be done days/weeks before any of the next steps
Step 2: Run CSV Import with "Override" to change the DHCP ranges from Failover_Association = Legacy to Failover_Association = New (Do not restart services)
Step 3: Run CSV Import to "Delete" DHCP Ranges that need to be consolidated into 1 contiguous range (Do not restart services)
Step 4: Run CSV Import to "Add" 1 contiguous range (Do not restart services)
Step 5: OK to Restart Grid Services now. (Previous steps, change failover association for all dhcp ranges, delete dhcp ranges that need to be fixed, add dhcp ranges that were fixed)
Step 6: Wait and confirm all DHCP Failover Associations are "Running Ok".
- Things to Expect from DHCP Failover Associations:
- If no DHCP Ranges are assigned to a Failover Association then it will be listed as "Failure" with "Unknown" for each member
- DHCP Failover will go to a Degraded state and one of the members will be in a recovery wait state. Wait 1 hour (MCLT) for it to go back to "Running Ok"
- A client will be able to renew its DHCP lease during the above changes. Additionally, running ipconfig /release && ipconfig /renew gave the client its original DHCP lease in one test and a completely different IP in another. The DHCP servers in the original failover association are capable of managing DHCP until the new association is completed
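For steps 3 and 4 above, a minimal sketch of how the delete and add lists for a subnet could be worked out before building the CSV files. It assumes the replacement range simply spans from the lowest to the highest address of the legacy ranges, which may not match the standard range for a given subnet. The addresses and the merge_ranges helper are illustrative only and are not part of any Infoblox tooling. The gap addresses are printed because they become dynamic after the change and are worth checking for existing fixed addresses or reservations.

```python
import ipaddress


def merge_ranges(ranges):
    """Return the single range spanning all inputs, plus the gaps it absorbs."""
    parsed = sorted(
        (ipaddress.IPv4Address(start), ipaddress.IPv4Address(end))
        for start, end in ranges
    )
    new_start = parsed[0][0]
    new_end = max(end for _, end in parsed)

    # Walk the sorted ranges and record any addresses between them.
    gaps = []
    cursor = new_start
    for start, end in parsed:
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    return (new_start, new_end), gaps


# Hypothetical legacy ranges in one subnet (example addresses only).
legacy = [("10.0.0.50", "10.0.0.99"), ("10.0.0.120", "10.0.0.200")]
(new_start, new_end), gaps = merge_ranges(legacy)

print(f"delete: {legacy}")
print(f"add: {new_start}-{new_end}")
for gap_start, gap_end in gaps:
    print(f"newly dynamic: {gap_start}-{gap_end}")
```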
So far, these steps work just fine in testing. My concern is that this is a large production network, and changing multiple failover associations in bulk may cause delays or issues for clients renewing or getting new leases. Obviously this would be done after hours, but being 24x7 we still have users connecting in the evening.
Is it better to do this in stages, knowing that the MCLT of 1 hour will leave DHCP Failover associations in a degraded state? Or do a subset of test networks to validate that the process is OK in production and then do all the rest once verified?
Thanks in advance
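As a rough sketch of what the staged option could look like, the snippet below splits a list of networks into batches and spaces the batches one MCLT apart, on the assumption that each batch should be back to "Running Ok" before the next one starts. The network names, batch size, start time, and the plan_stages helper are all hypothetical. A small first batch would double as the production test subset.

```python
from datetime import datetime, timedelta

# From the post: associations return to "Running Ok" after about 1 hour (MCLT).
MCLT = timedelta(hours=1)


def plan_stages(networks, batch_size, start, settle=MCLT):
    """Split networks into batches, spacing each batch by the settle time."""
    stages = []
    for i in range(0, len(networks), batch_size):
        when = start + settle * (i // batch_size)
        stages.append((when, networks[i:i + batch_size]))
    return stages


# Hypothetical network names and maintenance window start.
networks = [f"net-{n:02d}" for n in range(1, 8)]
for when, batch in plan_stages(networks, 3, datetime(2020, 5, 1, 22, 0)):
    print(when.strftime("%Y-%m-%d %H:%M"), batch)
```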