04-07-2020 02:05 AM
I have an Infoblox setup with 4 forwarders in the grid-level list. The documentation indicates that Infoblox doesn't round-robin but always selects the top one in the list (which is what we want). I have been having issues with a site because the 3rd forwarder in the list is sometimes selected, even though all 4 are available.
It just so happens that the 3rd and 4th forwarders in the list do not have a record for this site, so they send back the SOA, which sits in the cache until it times out, causing the site to be down.
tcpdump shows that the only one tried was the 3rd in the list, even though requests were also going to the first 2 in the list at the same time.
Does anyone have insight into how the forwarders list is utilised? Is it not always top down, or do certain conditions cause others in the list to be chosen?
04-07-2020 06:49 AM
When more than one forwarder is configured, Infoblox, just like BIND name servers, uses a metric called round-trip time, or RTT, to choose among the configured servers to forward to. Round-trip time is a measurement of how long a remote name server takes to respond to queries.
Each time a BIND Name Server sends a query to a remote server, it starts an internal stopwatch. When it receives a response, it stops the stopwatch and makes a note of how long that remote Server took to respond. When the Name Server must choose which of a group of Authoritative Name Servers / Forwarders to query, it simply chooses the one with the lowest Roundtrip time.
Before a BIND Name Server has queried a nameserver, it gives it a random Roundtrip time value lower than any real-world value. This ensures that the BIND Name Server queries all nameservers authoritative for a given zone in a random order before playing favorites.
On the whole, this simple but elegant algorithm allows BIND Name Server to "lock on" to the closest nameservers quickly and without the overhead of an out-of-band mechanism to measure performance.
To give you a clearer picture, here’s how it works:
- Initially, each forwarder’s RTT is seeded with a random, low value.
- When the recursive name server needs to forward a query, it chooses the forwarder with the lowest RTT.
- When it sends a query to the chosen forwarder, it starts an internal timer. When it receives a response, it stops the timer.
- If all the recursive name server has is the seeded value for the chosen forwarder, it replaces that value with the value from the timer.
- If the recursive name server has a real RTT based on previous responses, it updates the RTT based on the timer’s value: new RTT = (0.7 * old RTT) + (0.3 * timer).
Forwarders that aren’t selected have their RTT values “decayed” by multiplying them by .98. This enables all the configured Forwarders to eventually get their turn.
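To make the arithmetic concrete, here is a minimal Python sketch of the selection logic as described in this post. It is a toy model, not BIND's or Infoblox's actual code: the class name `ForwarderSelector`, the 0 to 1 ms seed range, and the millisecond unit are all assumptions made for illustration.

```python
import random


class ForwarderSelector:
    """Toy model of the RTT-based forwarder choice described above."""

    def __init__(self, forwarders, seed=None):
        rng = random.Random(seed)
        # Seed every forwarder with a small random RTT (assumed range, in ms).
        self.rtt = {f: rng.uniform(0.0, 1.0) for f in forwarders}
        # Track which forwarders have a real measurement yet.
        self.measured = {f: False for f in forwarders}

    def choose(self):
        # Pick the forwarder with the lowest current RTT.
        return min(self.rtt, key=self.rtt.get)

    def record(self, chosen, timer_ms):
        if self.measured[chosen]:
            # Smooth the new measurement into the existing RTT.
            self.rtt[chosen] = 0.7 * self.rtt[chosen] + 0.3 * timer_ms
        else:
            # First real measurement replaces the seeded value.
            self.rtt[chosen] = timer_ms
            self.measured[chosen] = True
        # Decay every forwarder that was not selected this time.
        for f in self.rtt:
            if f != chosen:
                self.rtt[f] *= 0.98


def simulate(latencies_ms, queries, seed=None):
    """Count how often each forwarder is chosen for fixed, made-up latencies."""
    selector = ForwarderSelector(list(latencies_ms), seed=seed)
    counts = {f: 0 for f in latencies_ms}
    for _ in range(queries):
        chosen = selector.choose()
        counts[chosen] += 1
        selector.record(chosen, latencies_ms[chosen])
    return counts


if __name__ == "__main__":
    # Hypothetical latencies; the list order plays no role in the choice.
    print(simulate({"fwd1": 20.0, "fwd2": 25.0, "fwd3": 22.0, "fwd4": 40.0}, 1000, seed=1))
```

In this model every forwarder gets tried early on because of the low seeded values, and the 0.98 decay keeps pulling the unselected ones back into contention, which is consistent with seeing the 3rd forwarder in a list answer some queries.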
In short, no. Forwarders are not always used in the order in which they are configured; the choice is based on RTT values.
Hope this helps.
10-07-2020 04:02 PM
That is an excellent explanation of how forwarders work under normal conditions.
Could you explain what occurs if one of the configured forwarders goes down? For example, if the lowest RTT forwarder is down, will another forwarder be queried? What are the timeout values? Are any of these settings configurable? Etc.
10-14-2020 08:19 AM
BIND will work around the failed forwarder. That's why you see it regularly query all the forwarders in the list, even if they aren't the nearest: it does so to keep its RTT database up to date. When one of the forwarders fails, BIND will detect this because the RTT will suddenly be very high, so it will stop sending queries to that forwarder. It will still occasionally probe it to check whether it's back, but most queries will go to the others. As for how long it takes and what retry/timeout algorithm is used, you'll have to search for that; I think there have even been whitepapers written about it and the impact it can have (or not).
PCN (UK) Ltd
All opinions expressed are my own and not representative of PCN Inc./PCN (UK) Ltd. E&OE