10-09-2019 03:17 PM
a situation i used to always run into in the past: i needed to set up a new infoblox device or ha pair, the grid master and grid master candidates were in different datacenters/locations, and there were one or more firewalls between them. i would submit firewall rules, hope they were implemented correctly, wait until the stated time to try my grid join, and hope it worked. then, if it didn't, i'd try to figure out whether it was a firewall issue or something else.
i always wished infoblox had an option somewhere to "test" grid communication in some way, just to verify the lines of communication were open without doing the actual join at that point. but they didn't. (and don't, that i'm aware of.)
at some point, i became aware of the expertmode option, and access to the new/different command line tools that mode provides. using that, i was able to figure out a way to test/verify connectivity to the grid master.
caveat: the device must not be already joined to a grid. if the new device is already joined to something, then the openvpn udp ports (at least 1194) will already be in use so this trick doesn't work.
so here's what i do now to verify the firewalls are opened properly, well in advance of my implementation date, so i can follow up with the firewall teams to get things fixed before the join date and time arrives...
run a tcpdump on the grid master looking for the ip of the new device, e.g.
Infoblox> set expertmode on
<expert mode disclaimer>
Expert Mode > tcpdump -i eth2 '(udp port 2114 or udp port 1194) and src host <lan1-ip.of.new.device>'
(which interface you have to listen on depends on the infoblox device, but in general eth0 is mgmt, eth1 is lan1, and eth2 is ha, in my experience. ha pairs build openvpn tunnels between each other on their lan1 ips, while members talk to the grid master from their lan1 ip to the grid master's vip. if the grid master is an ha pair, the passive member talks to the active member from its lan1 ip to the active member's vip, instead of lan1-to-lan1 like grid member ha pairs do with each other. so "eth2" above is the grid master's ha interface, and since you should be on the active member, the vip is riding on that interface. technically, the vip is on eth2 (or eth2 is empty on the passive member), and each member's ha ip is on eth2:1. i know i used to have some devices that didn't follow this layout exactly, though.)
leave this running on the grid master, then on the new device run a couple of traceroutes.
(i originally was going to use ping, but its options didn't allow me to set what i needed. i forget what it was, though, and i don't feel like looking it up right now.)
first specify the source and destination ports as 1194/udp, then on a second traceroute specify 2114/udp. e.g. for 1194:
Expert Mode > traceroute -i eth1 -U --port=1194 --sport=1194 <vip.of.grid.master>
(so this is going to send out of eth1, the grid member's lan1 interface, to the vip of the grid master. the -U is for udp, --port is destination port, --sport is source port.)
if there is nothing blocking the communication, then once the traceroute packets get through the other hops in the path to the grid master you'll see some packets show up in the tcpdump screen on the grid master. something like...
19:20:14.125494 IP <ip.of.new.device>.1194 > <ip.of.grid.master>.1194: UDP, length 32
19:20:19.130534 IP <ip.of.new.device>.1194 > <ip.of.grid.master>.1194: UDP, length 32
if not, then the screen will stay blank and something (like a firewall) isn't set up right and is blocking connectivity.
repeat for 2114.
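if you'd rather script the sending side than abuse traceroute, the same fixed-source-port probes can be generated from any linux box with python sitting in the new member's network. (you can't run python like this on the appliance itself, so this is just a sketch for a separate test host; everything here is my own placeholder, not anything from infoblox.)

```python
#!/usr/bin/env python3
# send a few udp datagrams with a FIXED source port equal to the
# destination port, so port-to-port firewall rules actually get exercised.
# watch for them arriving in the tcpdump running on the grid master.
import socket
import sys
import time

GRID_PORTS = [1194, 2114]   # openvpn ports used for grid communication

def probe(port, dest, count=3, payload=b"grid-fw-test", src=""):
    """send `count` datagrams from source port `port` to dest:`port`.
    `src` lets you pick a local address on a multi-homed box."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((src, port))     # fixed source port == destination port
    for _ in range(count):
        s.sendto(payload, (dest, port))
        time.sleep(1)
    s.close()

if __name__ == "__main__" and len(sys.argv) > 1:
    dest = sys.argv[1]      # pass the grid master's vip on the command line
    for p in GRID_PORTS:
        print(f"probing {dest} udp/{p} -> udp/{p}")
        probe(p, dest)
```

this only generates the packets; whether they actually arrive is still judged by the tcpdump on the far side, same as with the traceroute trick.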
anyone please feel free to correct, expand, question, etc.
i just thought i'd mention this as an option, since i've found it helpful myself in making sure everything is ready to go before the actual night of a grid join and finding out the firewalls weren't actually opened correctly.
as a bonus: if you are doing the join and it says it's successful, but then it never finishes joining (syncing, usually)...you can go into expert mode and look at a tcpdump for 2114 and 1194. a properly working openvpn tunnel will have a lot of 1194 traffic of varying sizes. if instead you see 1194 packets of the same smaller size with a fairly consistent time gap between them, then the openvpn tunnel is having problems. in my experience, going into the grid member settings and lowering the vpn tunnel mtu from 1450 down to 1000 or so has a decent chance of fixing this.
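as a rough aid for spotting that pattern, here's a python sketch that chews on saved tcpdump output and flags the "same size, steady interval" signature. the line format matches standard tcpdump output, but the thresholds are just my own guesses, not anything official:

```python
import re

# matches tcpdump lines like:
# 19:20:14.125494 IP 10.0.0.5.1194 > 10.0.0.9.1194: UDP, length 32
LINE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP \S+\.1194 > \S+\.1194: UDP, length (\d+)")

def tunnel_looks_stuck(lines):
    """heuristic: a healthy openvpn tunnel carries packets of many sizes;
    a stuck one keeps retransmitting same-size packets at a steady rate."""
    times, sizes = [], []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        h, mi, sec, length = m.groups()
        times.append(int(h) * 3600 + int(mi) * 60 + float(sec))
        sizes.append(int(length))
    if len(sizes) < 4:
        return False                  # not enough 1194 traffic to judge
    gaps = [b - a for a, b in zip(times, times[1:])]
    uniform_size = len(set(sizes)) == 1
    steady_gaps = max(gaps) - min(gaps) < 0.5   # arbitrary tolerance
    return uniform_size and steady_gaps
```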
10-10-2019 01:16 AM
If I want to join a new member to a Grid Master, I first try to ping the Grid Master's address.
You can try that as a first step.
After that, you can check the Grid Master log or the firewall log.
Sometimes PING is okay but the VPN connectivity (UDP 1194 or 2114) is not running well.
I hope in a new NIOS release Infoblox can support a "telnet command" and a "Grid Master Connection Test" feature.
10-11-2019 02:01 AM
I would like to add the following option for a connectivity test:
1. enter maintenance mode (set maintenancemode)
2. use the following command to test connectivity towards in this example the grid master:
Maintenance Mode > show network_connectivity proto udp x.x.x.x 1194
Starting Nmap 7.31 ( https://nmap.org ) at 2019-10-11 08:58 UTC
Nmap scan report for x.x.x.x
Host is up (0.00033s latency).
PORT STATE SERVICE
1194/udp open openvpn
MAC Address: xx:xx:xx:xx:xx:xx (VMware)
Nmap done: 1 IP address (1 host up) scanned in 7.09 seconds
You can just run "show network_connectivity" in maintenance mode to check the correct syntax. Thank you.
Escalations Engineer EMEA
10-15-2019 11:41 AM
whaaa... infoblox has nmap hidden in the bowels of maintenance mode? how is it that there is all this cool stuff secreted away?
do i need to be a part of some secret society to know about it? i mean, if i have to wear a signet ring or know some secret handshake or something, i could probably be okay with that. having to get an infoblox tattoo or brand might be a deal breaker for me.
thanks for the tip, jelle!
(and i'll just go into maintenance mode and start poking around with all the commands instead of getting an infoblox tattoo.)
10-17-2019 08:23 AM
I wonder how long that command has been around. I've taken the advanced admin class that covers a bunch of the maintenance mode commands in the book, and it wasn't included in the 2008 or 2013 versions of the class materials. But my RFE-1737 for this type of feature, to test a GMC's availability (firewall rules) without actually failing over to it, was still open when I checked several years ago. So hopefully it's new-ish.
I'd messed around with doing this via tcpdump and dig generating traffic on the VPN ports at one time, but never finished the script. I think this will make it much easier to script a GMC firewall rule validation without doing a failover. It's nice to see it finally available / made public.
10-22-2019 10:06 AM
i've been messing with expertmode for some time, but i'd never poked around in maintenancemode.
while there are some interesting things in there, i've lost some of my excitement about show network_connectivity. it does run nmap, but it's a wrapper, and it cuts off some of the nmap functionality that would provide what i need.
like i would assume is true in most environments, firewall rules where i've worked are punched very specifically, port-to-port. so if i can't specify both a source and a destination port, the test won't show connectivity: the firewall will block traffic from random udp source ports to the port specified with the network_connectivity wrapper. (by default, nmap uses a random udp source port.) so unless network_connectivity sets the source port to match the destination port, the firewall will *only* allow 1194<->1194 and 2114<->2114, and the nmap test will fail to show/prove anything. (and if any udp source port can reach 1194/2114 on the other end, then it's not really a firewall-rule test so much as a general connectivity test.)
so i guess right now i still have to stick with my abuse of traceroute and tcpdump to test that firewall rules have been properly entered. but my technique doesn't work if the system is already grid-joined, since openvpn will already be using 1194 (and usually 2114), so i can't specify them as source ports.
10-22-2019 10:26 AM
It would depend on how well the firewall rules are written. With "any" being the default source port for the vast majority of applications, it is generally a one-off to lock down the source port on a rule. Well-written firewall rules are exactly what this would not test, but I would question how many customers actually took the time to lock down the source port on the rule, as they should (could) have, for this VPN connectivity.
On a related tangent, having the same source and destination port is a very good way to test NAT code. I've found several vendors' NATs that fail in specific ways when you have multiple connections through the NAT with the same source and destination ports (a GM on one side to multiple nodes on the other side), specifically with any kind of clustering/HA hand-off of connections for NAT redundancy. Somewhere in some likely-reused NAT code, someone made bad assumptions about how likely that was to occur.
10-22-2019 10:30 AM
In a scenario where your appliance/member is already grid joined and you are perhaps trying to check its connectivity to a grid master candidate on the grid (with potential master promotion in mind), I would recommend just using a random source port for the test and 1194/2114 as destination ports (and vice versa).
10-22-2019 11:08 AM
i guess i've just had the...fortune?...to work in shops where the firewall rules implemented for infoblox grid communication have always been locked down to one port on both the source and destination sides. i guess since the infoblox documentation lists that as the case, that's what we keep submitting. maybe i'll just have to make sure everyone on the ddi team starts relaxing the specificity of their firewall requests, so easier testing can ensue. : )
that's an interesting nat failure behaviour. i guess i can see why same source and destination port wouldn't be immediately thought of as a scenario, but you'd think it would have been noticed at some point and had to be dealt with. glad i haven't had to run across that one!
11-01-2019 09:50 AM
ping is a fine basic step to *possibly* verify that there is any level of connectivity at all. unfortunately, if there are firewalls in the path, a successful ping (which might not even work, since the firewalls might not allow icmp echo) is no guarantee that they've been set up to allow bi-directional traffic on udp/1194 and udp/2114, the ports openvpn uses for grid communication. and if ping doesn't work, the firewalls might still be set up correctly for openvpn traffic and the join will work anyway.

the *only* way to verify for sure that a grid member will be able to communicate and join is to actually send udp traffic from the new member's udp/1194 to the gm's udp/1194, and from udp/2114 to udp/2114. (even that doesn't prove the gm or another gmc will be able to contact the new member if you attempt a gm promotion, just that the new member can contact the gm and join the grid. you can use the trick i described above to test 1194/2114 from the new member to all gms and gmcs before you join it, though, so at least you know the new member can reach any possible gm.) believe me, this scenario has happened to me numerous times, and infoblox provides no way to test it. (it would be cool if they did.)
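one way to rehearse the full round trip before the appliance is even racked: put a throwaway test host in each firewall zone and have them speak udp on the matched ports. this is a hypothetical sketch (plain python on ordinary linux boxes, nothing to do with NIOS itself, and the file name is made up); run the responder on the gm-side host and the probe on the member-side host:

```python
#!/usr/bin/env python3
# pre-join firewall rehearsal with two test hosts, one per firewall zone.
# the responder echoes each datagram back, so a successful probe proves
# the path works in BOTH directions on the matched port.
import socket
import sys

PORTS = [1194, 2114]   # openvpn ports NIOS uses for grid communication

def responder(port, addr=""):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((addr, port))
    print(f"listening on udp/{port}")
    while True:
        data, peer = s.recvfrom(1024)
        s.sendto(data, peer)         # echo back: exercises the return path

def probe(dest, port, timeout=3.0, src=""):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((src, port))              # fixed source port == destination port
    s.settimeout(timeout)
    s.sendto(b"grid-fw-test", (dest, port))
    try:
        data, _ = s.recvfrom(1024)
        return data == b"grid-fw-test"
    except socket.timeout:
        return False

if __name__ == "__main__" and len(sys.argv) > 2:
    if sys.argv[1] == "responder":   # e.g. ./fwtest.py responder 1194
        responder(int(sys.argv[2]))
    else:                            # e.g. ./fwtest.py probe <gm-side-host>
        for p in PORTS:
            ok = probe(sys.argv[2], p)
            print(f"udp/{p} -> udp/{p}: {'open both ways' if ok else 'blocked or filtered'}")
```

a timed-out probe only tells you "blocked or filtered somewhere", not which firewall did it, so you'd still chase the path with traceroute; but a successful echo is about as close to proof as you can get without the appliances.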