The operational trouble with the IPv6 address format
One of the interesting aspects of my job in advocating, teaching, and implementing IPv6 is that I get to experience firsthand some of the struggles that people have with IPv6. Because of how long I have been working with it, these subtle points often don’t occur to me until I mention something in a presentation or technical training. That typically leads into a wide-ranging impromptu whiteboarding session in which I back up my opinions or explain issues that might come up and that the audience had perhaps not considered.
One of those commonly confusing topics is the actual IPv6 address format. There are some interesting stories around some of the reasoning for how the address format was chosen, how it was implemented, and some of the challenges it might present to a company trying to implement a dual-stack network today. I won’t go into all the stories (there isn’t enough room to do that!) but we will go into some of the operator challenges you might have to address.
As a brief refresher, let’s review the IPv6 address format itself. An IPv6 address is 128 bits in length and is broken into eight 16-bit sections. I call each section a quibble, or quad-nibble, with a nibble being 4 bits; technically it is a hexadectet, or hextet for short. When written down, the IPv6 address uses colons “:” between each of these quibbles to make it easier to read. It is far easier to show an address than to describe one, so we will use the documentation prefix range to build out some examples. In this first example, we will look at a global unicast address.
The address above is an example of a fully expanded IPv6 address. It neither removes leading zeros nor uses zero compression. There is a hex character displayed for every nibble, and the address is from the global unicast address range as defined by IANA http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xh.... Because we don’t have a continuous sequence of zeros in this example address, the only simplification available is to remove leading zeros. This would result in:
Not particularly easier or shorter (we only removed two nibbles), but technically those two address formats represent exactly the same address.
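To check that two written forms really are the same address, here is a minimal sketch using Python's standard ipaddress module. The address used is a made-up value from the 2001:db8::/32 documentation prefix chosen for illustration; it is not necessarily the address from the example above.

```python
import ipaddress

# Hypothetical address from the documentation prefix, fully expanded.
full = "2001:0db8:1234:5678:9abc:def0:1234:5678"
addr = ipaddress.IPv6Address(full)

# exploded: every nibble written out; compressed: leading zeros removed
# and (when present) one run of zero quibbles collapsed to "::".
expanded_text = addr.exploded
short_text = addr.compressed

# Parsing the shortened text yields an equal address object.
same = addr == ipaddress.IPv6Address(short_text)

print(expanded_text)  # 2001:0db8:1234:5678:9abc:def0:1234:5678
print(short_text)     # 2001:db8:1234:5678:9abc:def0:1234:5678
print(same)           # True
```

Only the leading zero of 0db8 disappears here; the trailing zero in def0 is significant and stays.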
The other address you will have on every interface is a link-local address. All link-local addresses start with fe80:: (from the fe80::/10 prefix reserved for link-local). The rightmost 64-bits are either randomly generated (what is commonly called a privacy address), deterministically generated using a method called modified EUI-64, or manually assigned. A common fully expanded link-local IPv6 address would look like:
Because we have a sequence of zeros, we will leverage zero compression to condense the written address. We use a double colon (::) as a placeholder for the continuous sequence of zeros we are leaving out. We can only use zero compression once in a given address. If an address contains multiple sequences of zeros, there is a bit of debate about which one to compress. Truth be told, this comes down to a bit of personal preference and RFC 5952 (https://tools.ietf.org/html/rfc5952). Networking teams that do some clever work with their addressing plans may get some benefit from doing zero compression on the low-order prefix portion of the address. The RFC 5952 recommendation is to compress the longest sequence of contiguous zero quibbles and, if two sequences are equally long, the leftmost one. For link-local addresses, we almost always perform the zero compression on the prefix. So the link-local address in the example would end up as:
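The compression rules can be tried out with a short sketch. It assumes Python's standard ipaddress module, whose text output for the sample values below matches the longest-run, leftmost-on-tie recommendation. All addresses are illustrative values I picked, and the helper name canonical is hypothetical.

```python
import ipaddress

def canonical(text: str) -> str:
    """Return the compressed, lower-case text form of an IPv6 address."""
    return str(ipaddress.IPv6Address(text))

# Illustrative link-local address: the run of zeros in the prefix is compressed.
link_local = canonical("fe80:0000:0000:0000:0202:b3ff:fe1e:8329")

# Two runs of equal length: the leftmost run is the one replaced by "::".
tie = canonical("2001:db8:0:0:1:0:0:1")

# Runs of different length: the longest run wins, even though it is on the right.
longest = canonical("2001:0:0:1:0:0:0:1")

# A single zero quibble is written as "0" and is not compressed.
single = canonical("2001:db8:0:1:1:1:1:1")

print(link_local)  # fe80::202:b3ff:fe1e:8329
print(tie)         # 2001:db8::1:0:0:1
print(longest)     # 2001:0:0:1::1
print(single)      # 2001:db8:0:1:1:1:1:1
```

If your team prefers a different compression choice for readability, a helper like this at least gives scripts one consistent form to compare against.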
It is not uncommon for networking teams to decide they would like to assign a manual value for the link-local address on networking devices. For instance, they can leverage zero compression to simplify the link-local addresses on routers. If we wanted to embed a router ID (RID) and a VLAN ID in a link-local address we could end up with something like:
Router ID: 10
VLAN ID: 145
The manually configured link-local address options:
fe80::a:91
fe80::10:145
As you can see, the address is a lot shorter and provides us with some useful information (embedded in the address). It can also come in handy for making the next-hop addresses in the IPv6 routing table more meaningful and easier to troubleshoot. At the same time, the two example addresses present an interesting dilemma.
The first address, fe80::a:91, uses the hexadecimal representations of the decimal router ID and VLAN ID values (10 is 0xa and 145 is 0x91). The second address, fe80::10:145, writes the decimal digits directly into the last two quibbles. These are technically two different IPv6 addresses. The second one is perhaps more readable for operators (which is a perfectly fine reason for using it), but it isn’t technically representing what you think it is: read as hexadecimal, 0x10 is 16 and 0x145 is 325. It is certainly not the same as the first address. This is where format and operator standards are important for companies. You shouldn’t have operators and designers using different methods of embedding information in IPv6 addresses without coordinating with each other. Consensus on all hexadecimal versus decimal (or where you mix the two), embedding IPv4 info, attempting to spell out names with hexadecimal characters, zero compression ordering, and leading zero treatment should be worked out before you start implementing IPv6 throughout your network. Remember, you should try to stick as close as practical to RFC 5952 to avoid issues with those who are following the standard. Have a really good reason for deviating, document why, and realize you might have to work around it later.
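The difference between the two conventions is easy to see in a few lines of Python. This is only a sketch: the function names are hypothetical, and the decimal-digits variant assumes each value has at most four decimal digits so that it fits in one quibble.

```python
import ipaddress

def link_local_hex(router_id: int, vlan_id: int) -> ipaddress.IPv6Address:
    """Embed the numeric values themselves, written in hexadecimal."""
    return ipaddress.IPv6Address(f"fe80::{router_id:x}:{vlan_id:x}")

def link_local_decimal_digits(router_id: int, vlan_id: int) -> ipaddress.IPv6Address:
    """Embed the decimal digits as text, so the address 'reads' as decimal.

    Assumes each value is at most 9999 (four digits per quibble).
    """
    return ipaddress.IPv6Address(f"fe80::{router_id}:{vlan_id}")

def last_two_quibbles(addr: ipaddress.IPv6Address) -> tuple[int, int]:
    """Return the numeric values actually carried in the last two quibbles."""
    value = int(addr)
    return (value >> 16) & 0xFFFF, value & 0xFFFF

hex_form = link_local_hex(10, 145)
dec_form = link_local_decimal_digits(10, 145)

print(hex_form, last_two_quibbles(hex_form))  # fe80::a:91 (10, 145)
print(dec_form, last_two_quibbles(dec_form))  # fe80::10:145 (16, 325)
print(hex_form == dec_form)                   # False
```

Any script that decodes the embedded values has to know which convention was used; applying the wrong one silently returns 16 and 325 instead of 10 and 145.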
Hopefully this makes sense. This consensus is especially important if you have teams doing some sort of automation or using a regular expression to evaluate and act on the information embedded in an IPv6 address. Even with something as simple as embedding IPv4 addresses in an IPv6 address (which I mostly advise against), you still have to choose: do you represent that address in hexadecimal or decimal format?
Down the rabbit hole we go, because now that we have worked out the simple stuff we get to tackle some more interesting format issues. If we switch gears and talk about how applications use IPv6 addresses and how we input them into common applications, some interesting things come up. Most of these issues are easier to talk about with examples, so we will start with our old friend IPv4. For IPv4, we use a dot or period as a delimiter between octets. To describe the TCP or UDP port we would like to connect to, or run a service on, we use a different delimiter, and it happens to be a colon (:). So in IPv6, how in the world do we differentiate between the address portion and the port portion of an address? The solution was to “wrap” the IPv6 address in square brackets and continue to use the colon as the port delimiter (RFC 3986). This makes for some very strange formatting.
For example, assuming we were running a special web service on port 8080 on a server that was dual-stacked with an IPv4 address of 10.10.10.1 and an IPv6 address of 2001:db8::a:a:a:1, we would use the following values in our browser to connect to that web server:
IPv4 – http://10.10.10.1:8080
IPv6 – http://[2001:db8::a:a:a:1]:8080
If we wanted to use a simple UNIX tool like curl to test our web server, our command syntax would look like:
curl -4 http://127.0.0.1:8080
curl -4 http://10.10.10.1:8080
Compare the above to this:
curl -6 http://[::1]:8080
curl -6 http://[2001:db8::a:a:a:1]:8080
Clearly, understanding and implementing these format differences is pretty important for things like scripts. If you forget to enclose an IPv6 address in square brackets, many applications will not understand the value being passed to them, and this will cause a failure.
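For scripts that build URLs from addresses, a small helper can add the brackets only when they are needed. The sketch below uses Python's standard ipaddress and urllib.parse modules; the helper name host_port is hypothetical, and the addresses and port are the ones from the example above.

```python
import ipaddress
from urllib.parse import urlsplit

def host_port(host: str, port: int) -> str:
    """Return 'host:port', wrapping IPv6 literals in square brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal (for example a hostname)
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"

v4_url = "http://" + host_port("10.10.10.1", 8080)
v6_url = "http://" + host_port("2001:db8::a:a:a:1", 8080)

# Going the other way: urlsplit strips the brackets and separates the port.
parts = urlsplit(v6_url)

print(v4_url)          # http://10.10.10.1:8080
print(v6_url)          # http://[2001:db8::a:a:a:1]:8080
print(parts.hostname)  # 2001:db8::a:a:a:1
print(parts.port)      # 8080
```

Splitting the URL by hand on ":" would break on the IPv6 form, which is why the sketch leans on a URL parser instead.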
With this in mind, the question for many application developers is how they should store IPv4 or IPv6 values and how they should display them. It is possible to represent all IPv4 addresses in IPv6, and we have a range allocated for that purpose. This prefix is ::ffff:0:0/96, and the last two quibbles (32 bits) are where the IPv4 address is embedded. This method allows developers to store all addresses, regardless of family type, in a single 128-bit format. Ideally, if you need to store an IPv6 address, the safest way is to store it as a binary value that takes up 128 bits. Alternatively, you could potentially use an integer. However, storing the address as a string presents a problem: do you store the version with leading zeros removed and zero compression applied, or the uncompressed version? Do you use upper case or lower case for the a-f characters in hex?
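One way to picture the single 128-bit storage approach is the following sketch, which assumes Python's standard ipaddress module. The function names to_storage and from_storage are hypothetical, and how the 16 bytes are persisted (database column, file, and so on) is left open.

```python
import ipaddress

def to_storage(text: str) -> bytes:
    """Normalize any IPv4 or IPv6 literal to a 16-byte value for storage."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Map IPv4 into ::ffff:0:0/96 so both families share one format.
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    return addr.packed

def from_storage(raw: bytes) -> str:
    """Turn the 16 stored bytes back into display text."""
    addr = ipaddress.IPv6Address(raw)
    mapped = addr.ipv4_mapped  # None unless the address is in ::ffff:0:0/96
    return str(mapped) if mapped is not None else str(addr)

v4_raw = to_storage("10.10.10.1")
v6_raw = to_storage("2001:DB8::A:A:A:1")

print(v4_raw.hex())          # 00000000000000000000ffff0a0a0a01
print(from_storage(v4_raw))  # 10.10.10.1
print(from_storage(v6_raw))  # 2001:db8::a:a:a:1
```

Because the stored value is binary, the questions about case, leading zeros, and zero compression only come up at display time. One design choice to be aware of: this sketch displays anything in the mapped range as IPv4, even if it originally arrived written as an IPv6 literal.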
Regular expression matching against IPv6 string values is complex and error-prone. Matching upper and lower case is relatively easy to solve with regex: simply use the /i (or “ignore case”) flag. But try to do matching when using a shortened IPv6 address and things start getting interesting. How do we account for missing leading zeros and zero compression? More importantly, how are systems like logging servers and monitoring systems doing address matching? In a dual-stack situation, is the IPv4 address being logged and matched as an IPv6 address, or is it stored as a 32-bit value or string using a separate matching expression? This is why agreeing on and understanding how your address data is represented, stored, and analyzed is pretty important, even for supposedly simple things like a networking protocol.
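One alternative to pattern matching on text is to parse both sides and compare the parsed values. This is a minimal sketch assuming Python's standard ipaddress module; the helper names are hypothetical and the sample strings are illustrative.

```python
import ipaddress

def same_address(a: str, b: str) -> bool:
    """Compare two address strings by value rather than by spelling."""
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)

def in_prefix(address: str, prefix: str) -> bool:
    """Check whether an address falls inside a prefix such as 2001:db8::/32."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)

logged = "2001:DB8:0:0:A:A:A:1"    # as one system might write it
wanted = "2001:db8::a:a:a:1"       # as an operator might type it

naive = logged == wanted                 # string comparison misses the match
by_value = same_address(logged, wanted)  # parsed comparison finds it
inside = in_prefix(logged, "2001:db8::/32")

# Dual-stack caveat: a plain IPv4 address and its IPv4-mapped IPv6 form
# are different objects here, so pick one representation before comparing.
mixed = same_address("10.10.10.1", "::ffff:10.10.10.1")

print(naive, by_value, inside, mixed)  # False True True False
```

The last line illustrates the dual-stack question raised above: unless every system agrees on how IPv4 addresses are represented, even value-based matching can miss.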
Believe it or not, there are additional things to consider around the format and display of IPv6. IPv4 is ubiquitous in our daily operations and we are very familiar with it. Until you build up the same skills and capabilities around IPv6, you might want to plan, coordinate, and spend a bit more time thinking in detail about the address format and its impact on your operations.
You can find me on twitter as @ehorley and remember…
IPv6 is the future and the future is now!