06-14-2021 07:15 AM
I am doing a network migration: we are taking a grid database backup from an existing Infoblox grid and loading it onto a new Infoblox grid. The environment currently updates MS DNS via GSS-TSIG, and Infoblox will only be used to provide DHCP.
It will be a couple of weeks between loading the backup and enabling DHCP, ready for all the ip-helpers to be changed. That means most of the leases in the database will have expired, so Infoblox will send thousands of DDNS delete messages to MS DNS even though those devices are still active. This will cause an outage.
Is there a way to turn off "DDNS update on lease expiry"? The only way I can see to prevent this from causing an outage is to disable DDNS completely until most of the clients have switched over to the new Infoblox - I will also have to disable DDNS on the old Infoblox so that it doesn't delete DNS records either when leases expire for VLANs that have been migrated.
06-16-2021 03:16 AM
While I understand the target use case, I just wanted to let you know that your requirement might go against the recommendation in RFC 4702, though it is NOT a violation (quote from the RFC below):
"When a server detects that a lease on an address that the server leases to a client has expired, the server SHOULD delete any PTR RR that it added via DNS update. In addition, if the server added an A RR on the client's behalf, the server SHOULD also delete the A RR."
Since the RFC uses SHOULD (instead of MUST), it isn't mandatory.
Thought 1 :
Keep in mind that the DHCP server attempts to delete the DDNS record upon expiry only if the lease info contains the client hostname, so the way forward is to find a means of removing that specific piece of info from the dataset. As you know, that isn't possible from the GUI. According to the Infoblox Perl API/WAPI documentation, the 'modify' method of the 'Infoblox::Session' object doesn't let you alter the values retrieved for an 'Infoblox::DHCP::Lease' object [the client_hostname and on_expiry methods, to be specific; the get, remove & search methods work]. If you instead use the 'remove' session method to free up the lease completely (from the GUI or the API), that will trigger a DHCPEXPIRE and eventually initiate the command to remove the record from the Microsoft server (as long as client_hostname exists in the lease info).
If this were a lab, I'd try removing this piece of data from the database directly by setting the "client_hostname" PROPERTY of the lease info to NIL. It is represented as shown below inside the onedb.xml of your grid backup (one could probably remove all such instances with a few lines of bash and an extended timeout, since execution time will be higher for a production database). If the name disappears from the lease info, there is no deletion from MS upon lease expiry. However, modifying the database directly carries a high risk of corruption and is strictly not advised in production:
<PROPERTY NAME="client_hostname" VALUE="CLIENTHOSTNAME"/>
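For a lab copy only, the "few lines" idea could look roughly like the Python sketch below. It assumes the element is written exactly as `<PROPERTY NAME="client_hostname" VALUE="..."/>` with NAME before VALUE and a single pair of double quotes around the value; the function name `blank_client_hostnames` is made up for illustration, and nothing here is an Infoblox-supported procedure.

```python
import re
from typing import Tuple

# Assumption: the property is serialized as NAME first, then VALUE, both double-quoted.
_HOSTNAME_PROP = re.compile(
    r'(<PROPERTY NAME="client_hostname" VALUE=")[^"]*("\s*/>)'
)

def blank_client_hostnames(xml_text: str) -> Tuple[str, int]:
    """Empty the VALUE of every client_hostname PROPERTY.

    Returns the modified text and the number of elements changed.
    Hypothetical helper for a LAB copy of onedb.xml only.
    """
    return _HOSTNAME_PROP.subn(r"\1\2", xml_text)

if __name__ == "__main__":
    sample = (
        '<PROPERTY NAME="client_hostname" VALUE="pc-01"/>'
        '<PROPERTY NAME="ip_address" VALUE="192.0.2.10"/>'
    )
    new_text, changed = blank_client_hostnames(sample)
    print(changed)
    print(new_text)
```

The returned count is worth comparing against the number of leases you expect before loading the edited backup anywhere.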
Thought 2 :
On your existing grid, you may forcefully defer dynamic updates to specific MS domains by giving a non-existent/unreachable IP address as the AUTH for your MS domains at DHCP -> Configure DDNS, and then periodically deleting the deferred updates to avoid any capacity/performance issues. CLI commands you might need then:
Infoblox > show dhcp_dddns_updates
Infoblox> delete dhcp_dddns_updates
To make bulk changes to Configure DDNS -> External domains, we could use the Infoblox Perl API module (the WAPI schema doesn't seem to support this). I can help with a script for this if needed. The challenge here is that from the time you modify the MS IP address, all dynamic updates for that specific set of domains will pause until your new grid is live. This approach is almost like disabling DDNS updates, but the takeaway is that you're only blocking updates to your MS servers temporarily, and from what I read, the IP addresses of clients would always/mostly remain constant.
Thought 3 :
Grid A - Existing grid
Grid B - New grid
Maybe run cron jobs to periodically GET the lease file from Grid B and disable DDNS updates at Grid A for those networks for which a lease has been written in that lease file (say a lease exists in the lease file for 22.214.171.124 at Grid B: disable DDNS updates for the 126.96.36.199 network at Grid A). It sounds a bit complicated, but it seems logical to me. The main challenge with this approach is the time taken to parse the lease file each time, but it can run autonomously as long as the time complexity has been gauged.
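The parsing half of that cron job could be sketched as below. It assumes the lease file retrieved from Grid B is in ISC dhcpd.leases style (entries starting with `lease <ip> {`) and that you already have a list of the networks configured on Grid A; the function name and the sample addresses (documentation ranges) are illustrative only. Actually disabling DDNS on Grid A would be a separate step and is not shown.

```python
import ipaddress
import re
from typing import Iterable, Set

# Assumption: each lease entry begins with a line such as "lease 192.0.2.10 {".
_LEASE_LINE = re.compile(r"^\s*lease\s+(\d{1,3}(?:\.\d{1,3}){3})\s*\{", re.MULTILINE)

def networks_with_leases(lease_text: str, networks: Iterable[str]) -> Set[str]:
    """Return the networks (CIDR strings) that contain at least one leased address."""
    nets = [ipaddress.ip_network(n) for n in networks]
    hit: Set[str] = set()
    for match in _LEASE_LINE.finditer(lease_text):
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # skip anything that is not a valid IPv4 address
        for net in nets:
            if addr in net:
                hit.add(str(net))
    return hit

if __name__ == "__main__":
    leases = "lease 192.0.2.10 {\n  binding state active;\n}\n"
    print(networks_with_leases(leases, ["192.0.2.0/24", "198.51.100.0/24"]))
```

The inner loop is a linear scan over the networks for every lease; for a large grid, a sorted list or a prefix lookup structure would likely be needed to keep the run time down.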
Thought 4 :
ACL restriction for these domains on the MS side? Block updates from Grid A and allow only the nodes of Grid B? Almost the same as #2, but the least harmful.
Feel free to share any thoughts/ideas opposing mine, or something more reliable that you came up with - in any case, it benefits the Community.
06-16-2021 03:44 AM
Thank you, some interesting ideas. I was thinking along the lines of thought 3, but the customer said they don't have direct read/write access to the existing grid as it's a managed service. That makes it very difficult to schedule the disabling of DDNS updates on a per-subnet basis, as the contract with the 3rd party managed service provider is being terminated (this is why we are moving the data to a new grid). I need to see if there is some way I can get access to the grid master. If the contract is being terminated I am not sure how long they will have network access for - if the network is being terminated at the same time then the existing grid might just "disappear", meaning it won't be a problem (as there will be nothing to send DDNS delete messages!).
My problem then will be to enable DDNS on the new grid as we migrate each VLAN, but they will need to do this quickly before all the leases issued by the old DHCP server expire (assuming the network connectivity is lost). However, the customer told me yesterday that they have 800+ sites to migrate, so I don't think it's going to happen very quickly.
I thought about maybe using a router ACL to block port 53 from the old Infoblox grid so the DDNS updates are blocked, but the customer doesn't have access to the switches (again managed by the 3rd party). So we will have to see if we can block them on the MS side like you said in thought 4.
At the end of the day I don't see a clean solution so might have to plan for the fact that lots of DNS entries will go missing at some point until all the leases are re-issued from the new Infoblox grid.
One question, if I disable DDNS updates at the grid DHCP properties level on the new grid, then start the DHCP service on the new grid, I assume it will NOT send any updates, so no DDNS delete due to leases expired. But if I then enable DDNS later, will it send retrospective updates for leases that expired whilst DDNS was disabled? I hope not! :-)
06-16-2021 09:34 AM - edited 06-16-2021 12:39 PM
It should not. I ran a quick test and observed the same result in NIOS 8.4.x. The server attempts to delete the DDNS record only if it added it before. On top of this, depending on your TXT handling method, the new DHCP server MAY not be able to update the DDNS record created by the old server if the TXT handling method is set to strict ISC (prerequisite stuff, you might have already checked). I guess that would act like backup armour against unexpected deletion in your case (keeping in mind that this would add to the deferred updates stack). To permit subsequent updates, you MAY need to switch to ISC transition during the transit (if at all needed), whenever the switchover is complete.
06-17-2021 01:22 AM
I think the customer is running in ISC strict mode, but I don't see why that would stop the new DHCP server from deleting or updating the records, I thought the TXT record was a hash of the client identifier/MAC address (and host name?) and did not "tie" the record to any particular DHCP server, else a failover peer wouldn't be able to update records added by the other peer.
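To illustrate the point about the TXT value not being tied to a server, here is a toy Python sketch. It is NOT the exact encoding that ISC DHCP or NIOS writes into the TXT record; it only assumes, as described above, that the value is a digest derived from the client's identity, so any server computing it for the same client arrives at the same string. The function name and sample MAC addresses are made up.

```python
import hashlib

def toy_txt_value(client_identity: bytes) -> str:
    """Toy stand-in for a TXT ownership marker: a digest of the client identity only.

    Assumption: no server-specific input is mixed in, which is why a failover
    peer (or a replacement server) would compute the same value.
    """
    return hashlib.md5(client_identity).hexdigest()

if __name__ == "__main__":
    mac = bytes.fromhex("001122334455")
    on_old_server = toy_txt_value(mac)
    on_new_server = toy_txt_value(mac)
    print(on_old_server == on_new_server)
```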
06-17-2021 11:25 AM - edited 06-17-2021 11:29 AM
That's not what I meant. I should have expanded on the possibility I was referring to. Dynamic forward lookup records created by one DHCP server CAN be overwritten by another server, as long as the TXT record exists/matches [with DDNS Update Method = Interim and TXT verification method = ISC. Note: other combinations exist as well].
Occasionally, such migration tasks involve dealing with orphan records on the MS DNS side (ones overwritten by the clients themselves after the DHCP server wrote them, or affected by scavenging on the MS side, etc.). This eventually messes with ISC strict mode, where the server will not find a corresponding TXT RR for the record it is attempting to update. Infoblox normally advises ISC transition for a few lease cycles in order to accommodate those issues.
Apparently, I overlooked the 'GSS-TSIG updates' to MS DNS part of your initial post (which means there's likely no chance of a client update). On top of that, this is from IB to IB, where something like that could have happened in the first place.
06-28-2021 08:44 AM
OK, so I did some testing in the lab. If a client gets a short 5-minute lease and I shut down the DHCP server (and the client NIC) so that the lease expires, then when the DHCP server starts up I see a DHCPEXPIRE message in the syslog and it deletes the DNS entries, just as I expected. I then tried this with DDNS updates disabled at both the network and server level, and surprisingly the server still does updates and removes the DNS entries when the lease has expired. The only way I can get it to NOT delete the entries is to make sure the subnet is disabled for DHCP, so the subnet and scope don't appear in the DHCP config at all; then it leaves the DNS entries alone.
However, as soon as I re-enable the subnet the DHCP server will delete the DNS entries. This means that if I time the migration so that we only enable the subnets required for each batch of ip helpers, the DNS entries will only be deleted at that point, and clients will have to renew their leases to be re-added to DNS. I don't know how long the lease is for some of these VLANs; if it's several days it could take a while for clients to re-register, unless we are able to get the lease time reduced beforehand, which I think will be quite hard as we don't have access to the other grid. We will also have a problem with the old grid deleting DNS entries when leases expire there, so I'll need to see if I can get the subnets disabled on it to prevent the DDNS updates occurring.
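To get a rough feel for how long re-registration could take, the RFC 2131 default timers can be worked out per lease length: T1 (renew) defaults to 0.5 of the lease and T2 (rebind) to 0.875. The sketch below assumes the old grid does not override those defaults with explicit T1/T2 options; the function name is made up. Since a client normally unicasts its renewal at T1 to the server that issued the lease and only broadcasts at T2, T2 may be the more realistic figure if the old grid is unreachable.

```python
from datetime import timedelta
from typing import Tuple

def renewal_timers(lease: timedelta) -> Tuple[timedelta, timedelta]:
    """Return (T1, T2) using the RFC 2131 defaults of 0.5 and 0.875 of the lease time."""
    return lease * 0.5, lease * 0.875

if __name__ == "__main__":
    for days in (1, 8):
        t1, t2 = renewal_timers(timedelta(days=days))
        print(f"lease={days}d  T1={t1}  T2={t2}")
```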
Your trick of using a dummy destination IP for the DDNS server configuration might work as they only have two zones defined - I can delete the deferred updates then update them with the real IPs when we start migrating (and hope that all leases that were going to expire have expired by then!) - I'll have a look at that next.
06-28-2021 08:46 PM - edited 06-28-2021 08:49 PM
Thank you for posting your observations. While I do not recall the specifics from the Infoblox dynamic DNS updates architecture docs, I believe what you observed after disabling DDNS MAY be expected/reasonable. I say so because of the statement from the RFC that I pointed out earlier:
"When a server detects that a lease on an address that the server leases to a client has expired, the server SHOULD delete any PTR RR that it added via DNS update. In addition, if the server added an A RR on the client's behalf, the server SHOULD also delete the A RR. When a server terminates a lease on an address prior to the lease's expiration time (for instance, by sending a DHCPNAK to a client), the server SHOULD delete any PTR RR that it associated with the address via DNS update. In addition, if the server took responsibility for an A RR, the server SHOULD also delete that A RR."
As it uses SHOULD, it's indeed a recommendation (and not mandatory, of course). It's documented Infoblox behavior to remove the A, TXT and PTR RRs upon lease expiry. I'm thinking this *removal* process doesn't check whether DDNS updates are currently enabled or not (likely to adhere to the RFC recommendation - "Delete if you created it!").
Note that I could be wrong here as well, since the term "updates" would generally include both additions and deletions. So from unchecking "Enable DDNS Updates", one would generally infer that both addition and deletion actions are disabled. I'm just trying to connect the logical pieces of info together. Tech support may be able to confirm this from the wiki docs written by the engineering team.
As for what you're considering, I feel that using a dummy IP for the two external domains would be less harmful than the pain of disabling multiple DHCP networks.