10-09-2019 03:17 PM
a situation i used to run into all the time: i needed to set up a new infoblox device or ha pair, the grid master and grid master candidates were in different datacenters/locations, and there were one or more firewalls between them. i would submit firewall rules, hope they were implemented correctly, wait until the stated completion time, try my grid join, and hope it worked. then, if it didn't, i'd try to figure out whether it was actually a firewall issue or something else.
i always wished infoblox had an option somewhere to "test" grid communication in some way, just to verify the lines of communication were open without doing the actual join at that point. but they didn't. (and don't, that i'm aware of.)
at some point, i became aware of the expertmode option, and access to the new/different command line tools that mode provides. using that, i was able to figure out a way to test/verify connectivity to the grid master.
caveat: the device must not be already joined to a grid. if the new device is already joined to something, then the openvpn udp ports (at least 1194) will already be in use so this trick doesn't work.
so here's what i do now to verify the firewalls are opened properly, well in advance of my implementation date, so i can follow up with the firewall teams to get things fixed before the join date and time arrives...
run a tcpdump on the grid master looking for the ip of the new device, e.g.
Infoblox> set expertmode on
<expert mode disclaimer>
Expert Mode > tcpdump -i eth2 '(udp port 2114 or udp port 1194) and src host <lan1-ip.of.new.device>'
(which interface you have to listen on depends on the infoblox device, but in general - in my experience - eth0 is mgmt, eth1 is lan1, and eth2 is ha. ha pairs build openvpn tunnels to each other on their lan1 ips, while devices talk to the grid master from their lan1 ip to the grid master's vip. if the grid master is an ha pair, the passive member talks to the active member from its lan1 ip to the active member's vip, instead of lan1 to lan1 like grid member ha pairs do with each other. so "eth2" above is the grid master's ha interface, and since you should be on the active member, the vip is riding on that interface. technically, the vip is on eth2 (or eth2 is empty on the passive member), and each member's ha ip is on eth2:1 - though i know i used to have some devices that didn't match that layout exactly.)
leave this running on the grid master, then on the new device run a couple of traceroutes.
(i originally was going to use ping, but the options on ping didn't allow me to set what i needed. i forget what it was though, and i don't feel like looking it up right now.)
first specify the source and destination ports as 1194/udp, then on a second traceroute specify 2114/udp. e.g. for 1194:
Expert Mode > traceroute -i eth1 -U --port=1194 --sport=1194 <vip.of.grid.master>
(so this is going to send out of eth1, the grid member's lan1 interface, to the vip of the grid master. the -U is for udp, --port is destination port, --sport is source port.)
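both runs can be generated from one small loop so you don't typo a port. this is just a sketch that prints the two commands to paste into expert mode - the vip here is a placeholder, not a real address from this thread:

```shell
# hypothetical helper: print the two traceroute invocations, one per
# grid port, with matching source and destination ports each time.
GM_VIP="192.0.2.10"   # placeholder for the grid master's vip

for port in 1194 2114; do
  echo "traceroute -i eth1 -U --port=${port} --sport=${port} ${GM_VIP}"
done
```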
if there is nothing blocking the communication, then once the traceroute packets get through the other hops in the path to the grid master you'll see some packets show up in the tcpdump screen on the grid master. something like...
19:20:14.125494 IP <ip.of.new.device>.1194 > <ip.of.grid.master>.1194: UDP, length 32
19:20:19.130534 IP <ip.of.new.device>.1194 > <ip.of.grid.master>.1194: UDP, length 32
if not, then the screen will stay blank and something (like a firewall) isn't set up right and is blocking connectivity.
repeat for 2114.
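if you redirect the tcpdump output on the grid master into a file, a quick grep can confirm both ports actually showed up. here's a sketch - the file name and ip are assumptions, and the two sample lines just stand in for a real capture (1194 seen, 2114 not):

```shell
# hypothetical check: did packets from the new member arrive on both
# udp ports? expects plain-text tcpdump output redirected into a file.
capture="/tmp/gridtest.txt"
new_member="192.0.2.50"    # placeholder for the new device's lan1 ip

# example lines standing in for a real capture
cat > "$capture" <<'EOF'
19:20:14.125494 IP 192.0.2.50.1194 > 192.0.2.10.1194: UDP, length 32
19:20:19.130534 IP 192.0.2.50.1194 > 192.0.2.10.1194: UDP, length 32
EOF

for port in 1194 2114; do
  if grep -q "${new_member}\.${port} >" "$capture"; then
    echo "udp/${port}: seen"
  else
    echo "udp/${port}: MISSING - check the firewall rule for this port"
  fi
done
```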
anyone please feel free to correct, expand, question, etc.
i just thought i'd mention this as an option, since i've found it helpful myself in making sure everything is ready to go before the actual night of a grid join and finding out the firewalls weren't actually opened correctly.
as a bonus: if you are doing the join and it says it's successful, but then it never finishes joining (syncing, usually), you can go into expert mode and look at a tcpdump for 2114 and 1194. a properly working openvpn tunnel will have a lot of 1194 traffic of varying sizes. if instead you see 1194 packets of the same smaller size, showing up with a fairly consistent time gap between them, then the openvpn tunnel is having some problems. in my experience, going into the grid member settings and lowering the vpn tunnel mtu from 1450 down to 1000 or so has a decent chance of fixing this.
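the "same size, same spacing" pattern can be eyeballed, but a quick awk pass over saved tcpdump output makes it concrete: count the distinct packet lengths seen. this is just a sketch with example lines standing in for a real capture of a stuck tunnel (the file name and ips are assumptions):

```shell
# example lines standing in for a real capture of a stuck tunnel
cat > /tmp/vpntest.txt <<'EOF'
19:20:14.125494 IP 192.0.2.50.1194 > 192.0.2.10.1194: UDP, length 32
19:20:19.130534 IP 192.0.2.50.1194 > 192.0.2.10.1194: UDP, length 32
19:20:24.131021 IP 192.0.2.50.1194 > 192.0.2.10.1194: UDP, length 32
EOF

# hypothetical sketch: one repeated length across many packets matches
# the stuck-tunnel pattern; a healthy tunnel shows many different lengths.
awk '/UDP, length/ {
       if (!($NF in seen)) distinct++
       seen[$NF] = 1; total++
     }
     END {
       printf "packets=%d distinct_lengths=%d\n", total, distinct
       if (total > 1 && distinct == 1)
         print "suspect: every packet is the same size (possible mtu issue)"
     }' /tmp/vpntest.txt
```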
10-10-2019 01:16 AM
If I want to join a new member to the Grid Master, I first try to ping the Grid Master address.
You can try that as a first step.
After that, you can check the Grid Master log or the firewall log.
Sometimes PING is okay but the VPN connectivity (UDP 1194 or 2114) is not running well.
I hope in a new NIOS release Infoblox can support a "telnet command" and a "Grid Master Connection Test" feature.
10-11-2019 02:01 AM
I would like to add the following option for a connectivity test:
1. enter maintenance mode (set maintenancemode)
2. use the following command to test connectivity towards in this example the grid master:
Maintenance Mode > show network_connectivity proto udp x.x.x.x 1194
Starting Nmap 7.31 ( https://nmap.org ) at 2019-10-11 08:58 UTC
Nmap scan report for x.x.x.x
Host is up (0.00033s latency).
PORT STATE SERVICE
1194/udp open openvpn
MAC Address: xx:xx:xx:xx:xx:xx (VMware)
Nmap done: 1 IP address (1 host up) scanned in 7.09 seconds
You can just run "show network_connectivity" in maintenance mode to check the correct syntax. Thank you.
Escalations Engineer EMEA
10-15-2019 11:41 AM
whaaa... infoblox has nmap hidden in the bowels of maintenance mode? how is it that there is all this cool stuff secreted away?
do i need to be a part of some secret society to know about it? i mean, if i have to wear a signet ring or know some secret handshake or something, i could probably be okay with that. having to get an infoblox tattoo or brand might be a deal breaker for me.
thanks for the tip, jelle!
(and i'll just go into maintenance mode and start poking around with all the commands instead of getting an infoblox tattoo.)
10-17-2019 08:23 AM
I wonder how long that command has been around. I've taken the advanced admin class that covers a bunch of the maintenance mode commands in the book, and it wasn't included in the 2008 or 2013 versions of the class materials. But my RFE-1737 for this type of feature - testing a GMC's availability (firewall rules) without actually failing over to it - was still open when I checked several years ago. So hopefully it's new-ish.
I'd messed with attempting to do this with tcpdump and dig generating traffic on the VPN ports at one time, and never finished the script. I think this will make it much easier to script a GMC firewall rule validation without doing a failover. It's nice to see it finally available / made public.
10-22-2019 10:06 AM
i've been messing with expertmode for some time, but i'd never poked around in maintenancemode.
while there are some interesting things in there, i've lost some of my excitement about my use of show network_connectivity. while it runs nmap, it's a wrapper, and it severely limits some of nmap's functionality that would provide what i need.
like i would assume is the case in most environments, the firewall rules where i've worked are punched very specifically, port-to-port. by default, nmap uses a random udp source port, and the network_connectivity wrapper doesn't let you override that. so the firewall will block those random source ports - it *only* allows 1194<->1194 and 2114<->2114 - and the nmap test through network_connectivity will fail to show/prove anything. (and if any udp source port can reach 1194/2114 on the other end, then it's not really a firewall-rule test so much as a general connectivity test.)
so i guess right now i still have to stick with my abuse of traceroute and tcpdump to test that firewall rules have been properly entered. but my technique doesn't work if the system is already grid joined, since openvpn will be using 1194 and usually 2114 so i can't specify them as source ports.
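for what it's worth, raw nmap does have the knob the wrapper hides: `--source-port` (alias `-g`) pins the source port of the probes. the wrapper doesn't expose it, and whether you can reach raw nmap on the appliance at all is another question, so this sketch just prints what the equivalent port-to-port invocations would look like (the vip is a placeholder):

```shell
# hypothetical: the port-to-port udp probes the wrapper can't do today.
# printed rather than executed, since raw nmap isn't exposed on-box.
GM_VIP="192.0.2.10"   # placeholder for the grid master's vip

for port in 1194 2114; do
  echo "nmap -sU -p ${port} --source-port ${port} ${GM_VIP}"
done
```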
10-22-2019 10:26 AM
It would depend on how well the firewall rules are written. With the default being "any" for the source port in the vast majority of applications, it is generally a one-off to lock down the source port on a rule. This wouldn't test well-written firewall rules, but I would question how many customers took the time to actually lock down the source port on the rule as they should (could) have for this VPN connectivity.
On a related tangent, having the same source and destination port is a very good way to test NAT code. I've found several vendors' NATs that fail in specific ways when you have multiple connections through the NAT that share the same source and destination ports (a GM on one side to multiple nodes on the other side) - specifically, any kind of clustering/HA handoff of connections for NAT redundancy. Somewhere in some likely reused NAT code, someone made some bad assumptions about how likely that was to occur.
10-22-2019 10:30 AM
In a scenario where your appliance/member is already grid joined and you are perhaps trying to check its connectivity to a grid master candidate on the grid (with potential master promotion in mind), I would recommend just using a random source port for the test and 1194/2114 as destination ports (and vice versa).
10-22-2019 11:08 AM
i guess i've just had the...fortune?...to work in shops where the firewall rules implemented for infoblox grid communication have always been locked down to one port on both the source and destination sides. i guess since the infoblox documentation lists that as the case, that's what we keep submitting. maybe i'll just have to make sure everyone on the ddi team starts relaxing the specificity of their firewall requests, so easier testing can ensue. : )
that's an interesting nat failure behaviour. i guess i can see why same source and destination port wouldn't be immediately thought of as a scenario, but you'd think it would have been noticed at some point and had to be dealt with. glad i haven't had to run across that one!
10-29-2020 11:55 AM
When using this command obviously a response of "Open" is good. I have found a response of "Filtered" is bad. What does the response "Open | Filtered" mean?
10-29-2020 03:41 PM
those are nmap responses. doing some google (or your search engine of choice) searches on nmap should give you a lot more detail about them. here's a bit from an nmap man page from the nmap.org site:
The state is either open, filtered, closed, or unfiltered. Open means that an application on the target machine is listening for connections/packets on that port. Filtered means that a firewall, filter, or other network obstacle is blocking the port so that Nmap cannot tell whether it is open or closed. Closed ports have no application listening on them, though they could open up at any time. Ports are classified as unfiltered when they are responsive to Nmap's probes, but Nmap cannot determine whether they are open or closed. Nmap reports the state combinations open|filtered and closed|filtered when it cannot determine which of the two states describe a port.
and here is some info specifically related to nmap for udp ports:
Probe Response                                                   -> Assigned State
Any UDP response from target port (unusual)                      -> open
No response received (even after retransmissions)                -> open|filtered
ICMP port unreachable error (type 3, code 3)                     -> closed
Other ICMP unreachable errors (type 3, code 1, 2, 9, 10, or 13)  -> filtered
The most curious element of this table may be the open|filtered state. It is a symptom of the biggest challenges with UDP scanning: open ports rarely respond to empty probes. Those ports for which Nmap has a protocol-specific payload are more likely to get a response and be marked open, but for the rest, the target TCP/IP stack simply passes the empty packet up to a listening application, which usually discards it immediately as invalid. If ports in all other states would respond, then open ports could all be deduced by elimination. Unfortunately, firewalls and filtering devices are also known to drop packets without responding. So when Nmap receives no response after several attempts, it cannot determine whether the port is open or filtered. When Nmap was released, filtering devices were rare enough that Nmap could (and did) simply assume that the port was open. The Internet is better guarded now, so Nmap changed in 2004 (version 3.70) to report non-responsive UDP ports as open|filtered instead.