Using DTC for DNS Failover in an A/P ISP Scenario
Hi Community,
I’m working on the following setup and need some guidance:
- Environment:
  - 1x Data Center with dual-homed ISPs (Active/Backup for published services)
  - 3x Infoblox grid members (v8.6.3)
    - 1x internal DDI
    - 2x external authoritative NS (NS1 & NS2)
  - NS1 & NS2 each have public IPs from both ISPs.
- Current state:
  - Through ISP1 (primary), everything works fine: NS1 & NS2 resolve published services without issues.
  - Published services currently have A records from both ISP IP pools with same priority & TTL, e.g.:
    app1.mycompany.com → 12.12.12.1 (ISP1)
    app1.mycompany.com → 55.55.55.1 (ISP2)
  - DTC is licensed and configured on all grid members.
  - Routing to both ISPs is already working and tested.
- Goal:
  I want NS1 & NS2 to always reply with ISP1 (primary) IPs for published services. Only if ISP1 fails, then they should start replying with the ISP2 IPs instead.
Question:
What’s the best way to configure Infoblox DNS/DTC in this dual-ISP setup so that authoritative NS always answer with ISP1 IPs, and only fail over to ISP2 IPs if the primary ISP is down?
Best Answer
-
I've never come across a use case like this. It's interesting. Be sure to read
as that should help with understanding the components of the solution. Also, as an Infoblox customer you now (as of this month) have access to the Infoblox Training portal. It has a short but informative DTC course which you may find very helpful. Talk to your account team for access.
So long as the NIOS nameservers have one private IP address each and the public IP addresses on ISP1 and ISP2 are both DNAT'd to the private IP addresses, public access to the nameservers will be handled by the routing at the firewall/router level, and ISP failover isn't a problem NIOS needs to worry about (for public access to the nameservers).
Make sure that the nameservers (specifically the nameservers) are available at all times on both ISP1 and ISP2. The desire to keep ALL traffic off ISP2 is a bad one if that includes stopping access to the nameserver on ISP2. There is a whole lot to unpack on that topic alone so I'll leave it at that for this post. Also, make sure you fully test ISP1 going offline as part of this project.
An LBDN (e.g. app1.mycompany.com) consists of one or more "Pools". A Pool consists of one or more "Servers". Servers typically represent the backend servers. A common scenario is that you have two web servers in DC1 and two web servers in DC2. You might use the LBDN to steer DC1 traffic to the DC1 web servers and DC2 traffic to the DC2 web servers (this is called "Topology" load balancing). Then you use the Pool to steer all incoming traffic for the DC1 pool to the first server in DC1 and only use the second server in DC1 if the first server fails (this is called "Global Availability" load balancing). In your case, it seems you only need to balance at the LBDN level (1 LBDN, 2 pools and 1 server in each pool). You could also have 1 LBDN, 1 pool and two servers in that pool and balance at the pool level. Both methods should work in your scenario, so let's assume the former for now.
In order for DTC to be able to tell if a server is up or not, both "Server" objects and "Pool" objects can have health monitors created. Simple ones are just ping (i.e. is the server "up"). More useful ones are HTTP(S) calls that test the result (i.e. is app1.mycompany.com responding with "sampletext" in the page).
The important part here is that it is often assumed that healthchecks go from the DNS server to the backend server being monitored. In your case, you don't seem to be interested in monitoring the application server; you just want to know if the ISP link is up. That means you (probably) want to monitor ICMP from the DNS servers out to an ISP router at the ISP side of your internet link as the health monitor. That will tell you if ISP1 is up.
The DTC config you are likely after is (for every published service):
2 x Server objects (app1_isp1 12.12.12.1 and app1_isp2 55.55.55.1). In the health monitors, make sure you specify the IP you want to monitor (it will likely be the ISP router on the ISP end. I've not tested the scenario, so don't take my word for it). Example images below. Actually, you might want to put the healthcheck on the first server. If the first and second server both fail healthchecks, what do you want to happen? You can configure a "default" value, but the logic could also be "if the ISP1 link fails, always provide the ISP2 IP and don't bother checking if ISP2 is up". If they are both down, who cares.
2 x Pools (e.g. app1_pool and app2_pool), each containing one server
1 x LBDN (e.g. app1.mycompany.com) configured with "Load Balancing Method" = "Global Availability" between the two pools (i.e. always use the first pool unless all members of that pool are offline, and then use pool 2). Note: you could also have 1 LBDN, 1 Pool and two Servers in that pool with Global Availability. Same outcome.
Do this with 1 LBDN per published service.
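To make the selection logic concrete, here is a minimal Python sketch of "Global Availability" across two pools. It is only a model of the behaviour described above, not NIOS code or the WAPI. The class names, the pool names, and the ISP2 router address 203.0.113.1 are assumptions made up for illustration; 9.9.9.1 is the assumed ISP1 router from this post.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Server:
    name: str
    answer_ip: str   # IP handed out in the DNS response
    monitor_ip: str  # IP the health monitor probes (the ISP-side router)


@dataclass
class Pool:
    name: str
    servers: List[Server] = field(default_factory=list)


def pick_answer(pools: List[Pool], is_up: Callable[[str], bool]) -> Optional[str]:
    """Walk pools in order and return the first healthy server's answer IP."""
    for pool in pools:
        for server in pool.servers:
            if is_up(server.monitor_ip):
                return server.answer_ip
    return None  # nothing healthy; what to answer here is a design decision


# Hypothetical objects: one pool per ISP, one server per pool.
pools = [
    Pool("app1_isp1_pool", [Server("app1_isp1", "12.12.12.1", "9.9.9.1")]),
    Pool("app1_isp2_pool", [Server("app1_isp2", "55.55.55.1", "203.0.113.1")]),
]

# Simulate ISP1's router being unreachable: the second pool's IP is chosen.
print(pick_answer(pools, lambda ip: ip != "9.9.9.1"))
```

Note that the monitor target and the answer IP are deliberately separate fields, which mirrors the point that the health monitor probes the ISP router rather than the service itself.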
Server Config Example. Assume 9.9.9.1 is the ISP1 router. A key detail here is that the health monitor is pointed at something that is not the service. It is pointing at the ISP1 router while the server (host) is still set to the ISP1 public IP of the published service.
Finally, I'm still not sure this is the best way of achieving the required business outcome, and I don't have full context, so please don't take this as confirmation that this is the correct "architecture". It's just a nudge in the direction of getting it working the way you have described. I really recommend engaging with your aligned solution architect at Infoblox.
Answers
-
So NS1 has two public IP addresses configured locally on the appliance, one for ISP1 and one for ISP2? Or does it have a single private IP, with destination NAT on both routers/firewalls for the two public IP addresses (per appliance)?
The answer is probably going to require a healthcheck between the DNS server and the ISP router (at the ISP end, not your premises) to verify connectivity. Is DTC also balancing between dual instances of the published service, or are you using it for just ISP1 vs ISP2?
The question will be how to get the healthcheck working properly. If the NS server has an IP on both ISP links, how is it routing to the Internet? You say "Routing to both ISPs is already working and tested." Have you tested what happens if you disconnect ISP1 and try to query the NS IP on ISP2? If not, it's possible that the current test queries are coming in on ISP2 and the response is going out on ISP1 (asymmetric routing; it's caught me out before).
If the nameservers have a single IP each and DNAT is used on both ISP links to permit inbound traffic flow, then you will likely need both nameservers to have an ICMP/Ping healthcheck of the ISP router (at their end of the ISP connection) to verify connectivity in a "Global Availability" configuration. If that fails, they can fall back to the second option (the IP value for ISP2).
I'd recommend running this by your aligned solution architect as more context is needed around your specific environment to give a more detailed answer.
Side note: NIOS 8.6.x is EOL and end-of-support (since April 2025). Look into upgrading to 9.0.7 sooner rather than later (as always, read the release notes carefully).
-
@bstafford
Thanks for the reply. Answering your questions:
- Yes, both NS1 and NS2 are S/DNAT'd from both ISPs, and both nameservers have A records from both ISPs as well, with the A record using the ISP2 IP currently "disabled". The same goes for all of the published services.
- So far DTC is not configured anywhere, but the requirement is to use ISP1 while ISP2 remains in standby. I think load balancing across both links would have been much easier, but this is what the requirements currently dictate.
- Routing is in place, but no test has been done for DNS queries, because of what you mentioned (if ISP1 goes down and there is no correct resolution through ISP2, ISP1 will still be used!).
- I'm thinking about an ICMP monitor to the ISP1 router egress_int (pingable), but I'm not fully familiar with how to get it working using DTC.
- I've spent considerable time in the IB KBs trying to understand how DTC would do this job, but I keep stumbling over all the kinds of DTC config objects (Pools, LBDNs, Servers, Monitors), while all the examples I found describe quite the opposite: monitoring the services themselves, then doing the load balancing.
-
Alright! I got it to work. Basically, I did the following (in a lab env):
- One LBDN, one Pool and one Server per service, using the ISP1 IP. The Server runs the HM (ICMP to the ISP1 interface IP). Once done, I found a DTC LBDN record automatically created for each service.
- In the DNS authoritative forward zone (mycompany.com), I created equivalent records with the ISP2 IPs. Each record has a scratch line through it!
- I tested it by shutting down the ISP1-facing interface, and it's working! A test user "behind ISP1&2" was able to resolve DNS records through the ISP2 IP. Once I re-enabled the connection on ISP1, the client was able to resolve through the ISP1 IP.
- I'll look into adding a second ICMP HM to 1.1.1.1 or 8.8.8.8 in an "or" fashion for more certainty.
- What I can conclude now: for x services I need x LBDNs, Pools and Servers, which can be complex to configure and manage in a large environment. Is there, or could there be, a cleaner way?
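The "or" idea for the health monitors can be sketched in a few lines of Python. This only illustrates the decision rule (the link counts as up if any probe target answers). Whether and how NIOS combines several monitors on one Server is not confirmed in this thread, and the function names and the fake probe are made up for the example.

```python
from typing import Callable, Iterable


def link_is_up(targets: Iterable[str], probe: Callable[[str], bool]) -> bool:
    """'Or' logic: the ISP1 link counts as up if at least one target answers."""
    return any(probe(target) for target in targets)


def fake_probe(reachable):
    """Stand-in for a real ICMP check; answers from a fixed set of reachable IPs."""
    return lambda target: target in reachable


# 9.9.9.1 is the assumed ISP1 router; 1.1.1.1 and 8.8.8.8 are the extra targets.
targets = ["9.9.9.1", "1.1.1.1", "8.8.8.8"]

# The ISP router ignores pings but 1.1.1.1 still answers: link is treated as up.
print(link_is_up(targets, fake_probe({"1.1.1.1"})))
```

The trade-off is that "or" logic avoids a false failover when a single target stops answering pings, but it only makes sense if the probes to the extra targets are forced out through ISP1; otherwise they would still succeed via ISP2.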
-
Nice. As you say, the scratched out record is the "real" record that any active DTC entry will override. If the healthcheck disables the DTC entry, NIOS will default to giving out the "real" entry.
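As a rough mental model of that override behaviour (a sketch only, not how NIOS is implemented), the lookup can be thought of as "DTC answer first, ordinary zone record second". The dictionary and function names below are invented for the example; the IPs are the ones from this thread.

```python
from typing import Optional

# The ordinary ("scratched out") zone record, holding the ISP2 IP.
static_records = {"app1.mycompany.com": "55.55.55.1"}

# The LBDN's single server, holding the ISP1 IP.
lbdn_servers = {"app1.mycompany.com": "12.12.12.1"}


def dtc_answer(name: str, isp1_up: bool) -> Optional[str]:
    """The DTC entry only yields an answer while its healthcheck passes."""
    return lbdn_servers.get(name) if isp1_up else None


def resolve(name: str, isp1_up: bool) -> Optional[str]:
    """An active DTC entry overrides the zone record; otherwise fall back to it."""
    return dtc_answer(name, isp1_up) or static_records.get(name)


print(resolve("app1.mycompany.com", isp1_up=True))
print(resolve("app1.mycompany.com", isp1_up=False))
```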
If you have multiple FQDNs (services) pointing at the same IP (e.g. a reverse web proxy/WAF), then you may be able to just enter those FQDNs as multiple "patterns" in the same LBDN object. However, if you have different public IPs per service, then you need a separate server object for each one (because the IP address the LBDN passes back is based on the IP configured in the server object). The API is very powerful, so consider using it to automate the creation of the configuration.
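If you do script it, the planning step might look something like the Python sketch below. It groups FQDNs that share an ISP1 IP into one LBDN with several patterns and emits one Server/Pool/LBDN set per distinct IP. The dictionary keys and object names are purely illustrative and are NOT the WAPI schema; app2, app3 and 12.12.12.2 are made-up sample data. Check the WAPI reference for the real object and field names before pushing anything to a grid.

```python
from collections import defaultdict
from typing import Dict, List


def plan_dtc_objects(services: Dict[str, str], monitor_ip: str) -> List[dict]:
    """Build one Server/Pool/LBDN plan per distinct ISP1 IP.

    services maps FQDN -> ISP1 public IP; monitor_ip is the ISP1 router to probe.
    """
    fqdns_by_ip = defaultdict(list)
    for fqdn, ip in services.items():
        fqdns_by_ip[ip].append(fqdn)

    plans = []
    for ip, fqdns in sorted(fqdns_by_ip.items()):
        fqdns = sorted(fqdns)
        label = fqdns[0].split(".")[0]  # e.g. "app1"
        plans.append({
            "server": {"name": f"{label}_isp1", "host": ip, "monitor_ip": monitor_ip},
            "pool": {"name": f"{label}_pool", "servers": [f"{label}_isp1"]},
            "lbdn": {"name": f"{label}_lbdn", "patterns": fqdns, "pools": [f"{label}_pool"]},
        })
    return plans


services = {
    "app1.mycompany.com": "12.12.12.1",
    "app2.mycompany.com": "12.12.12.1",  # hypothetical: same IP as app1
    "app3.mycompany.com": "12.12.12.2",  # hypothetical: its own IP
}

for plan in plan_dtc_objects(services, monitor_ip="9.9.9.1"):
    print(plan["lbdn"]["name"], plan["lbdn"]["patterns"], plan["server"]["host"])
```

Keeping the plan as plain data makes it easy to review before any API calls are made, and the same list can drive both creation and later cleanup.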