How to work with IP addresses correctly?

Hacker · Oct 17, 2021

This article is written for educational purposes only. We are not urging anyone to do anything; it is for information only. The author is not responsible for your actions.

An IP address (from Internet Protocol) is a unique numeric identifier of a device on a computer network that runs over TCP/IP.

On the Internet, addresses must be globally unique; on a local network, an address only has to be unique within that network. An IPv4 address is 4 bytes long, and an IPv6 address is 16 bytes long. Every application that works with the network in one way or another must verify that the IP addresses it is given are correct. This is harder than it sounds, and it is easy to go to extremes: with overly strict validation the user cannot enter correct data, and with insufficient validation the user is left alone with low-level error messages (if they are shown at all). In this article we will look at a number of difficulties that arise when validating addresses, and then at ready-made libraries that help with this.

Validating addresses
Errors in addresses can appear in three ways:

typos;
misunderstanding;
deliberate attempts to break the application.

Address validation alone will not protect against hacking attempts. It can make such attempts harder, but it does not replace proper authorization checks and error handling at every stage of the program, so the added security should be seen as a useful side effect. The main goal is to make life easier for users who accidentally entered an incorrect address or misunderstood what is required of them.

Checks can be roughly divided into checks of form and checks of content. The purpose of a form check is to make sure that the string entered by the user could be a valid address at all. Many programs stop there. We will go further and see how to verify that an address is not only well-formed but also suitable for a specific purpose, but more on that later.

Form checks
Only at first glance does format validation look like a job for a simple regular expression. In fact, it is not that simple.

In IPv4, the difficulty begins with the standard for the format: there is none. Dotted-decimal notation (0.0.0.0–255.255.255.255) is customary, but it is not a standard. The IPv4 standard does not mention a textual format for addresses at all, and no other RFC defines one, so the accepted format is nothing more than a convention.

And it is not even the only convention. The inet_aton() function accepts addresses with fewer than four parts, treating the last part as a number that fills all the remaining bytes, so 192.0.2 is read as 192.0.0.2 rather than 192.0.2.0. It also accepts an address written as a single integer: 511 = 0.0.1.255.
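The short forms are easy to try from Python, whose socket.inet_aton() normally delegates to the C library function of the same name. This is only a sketch: the helper name legacy_parse is made up for the illustration, and the exact set of accepted spellings depends on the platform's C library.

```python
import socket

def legacy_parse(text):
    """Parse text the way inet_aton() does and return the canonical dotted-decimal form."""
    packed = socket.inet_aton(text)   # raises OSError for input it cannot parse
    return socket.inet_ntoa(packed)   # always prints four decimal parts

# A full address, a two-part address and a single integer
for text in ("192.0.2.1", "127.1", "511"):
    print(text, "->", legacy_parse(text))
```

If an application wants to accept only the full four-part form, this is a reason not to use inet_aton() as the validator.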

Can a host address end in zero? Of course it can: any network larger than a /24 contains at least one such address. For example, 192.168.0.0/23 contains host addresses 192.168.0.1–192.168.1.254, including 192.168.1.0.
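The /23 example can be checked with the standard ipaddress module. A minimal sketch; the variable names are chosen only for this illustration.

```python
import ipaddress

net = ipaddress.ip_network("192.168.0.0/23")
hosts = list(net.hosts())   # usable host addresses, without the network and broadcast addresses

# Host addresses whose last octet is zero
zero_ending = [str(h) for h in hosts if int(h) & 0xFF == 0]

print(hosts[0], hosts[-1])
print(zero_ending)
```

A validator that rejects every address ending in .0 would therefore reject a perfectly usable host in this network.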

If we restrict ourselves to the full dotted-decimal form with four groups and no omitted parts, then the expression (\d+)\.(\d+)\.(\d+)\.(\d+) will already catch a large number of typos. With enough determination you can write an expression that matches only valid addresses, but it will be rather complicated. It is better to take advantage of the fact that the address is easy to split into groups, and to check explicitly that each group is in the range 0-255:

Code:

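def check_ipv4(s):
    groups = s.split('.')
    # A full dotted-decimal address has exactly four groups
    if len(groups) != 4:
        raise ValueError("Invalid address format")
    for g in groups:
        num = int(g)  # also raises ValueError if the group is not a number
        if (num > 255) or (num < 0):
            raise ValueError("Invalid octet value")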
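One caveat about check_ipv4: int() is quite forgiving and also accepts strings such as "+1", " 1 " or "1_0". If that is too lenient, the shape of the string can be checked with a regular expression first and the range afterwards. The following is a sketch of that idea, not the article's method; check_ipv4_strict and IPV4_SHAPE are hypothetical names, and accepting leading zeros (as it does) is a policy decision you may want to change.

```python
import re

# Four groups of one to three digits separated by dots.
# re.ASCII keeps \d from matching non-ASCII digit characters.
IPV4_SHAPE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})", re.ASCII)

def check_ipv4_strict(s):
    """Return the four octets as integers or raise ValueError."""
    match = IPV4_SHAPE.fullmatch(s)   # the whole string must match, no extra characters
    if match is None:
        raise ValueError("Invalid address format")
    octets = [int(g) for g in match.groups()]
    if any(o > 255 for o in octets):
        raise ValueError("Invalid octet value")
    return octets
```

Returning the parsed octets instead of nothing is a convenience for the caller; the validation logic does not depend on it.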

With IPv6, everything is both simpler and more complex at the same time. Simpler because the IPv6 authors took the IPv4 experience into account and included the address notation in RFC 4291, so it is safe to say that alternative formats violate the standard and can be ignored. On the other hand, the formats themselves are more complex. The main difficulty lies in the abbreviated form: a run of all-zero groups can be replaced by the symbol ::, for example 2001:db8::1 instead of 2001:db8:0:0:0:0:0:1. This is convenient for the user, of course, but not at all for the developer: simply splitting the string on colons no longer yields the groups, and much more complex logic is required. In addition, the standard forbids using :: more than once in an address, which complicates the task further.
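To see how a real parser treats the abbreviated form, here is a small sketch using Python's standard ipaddress module (the variable names are only for illustration). It shows that both spellings denote the same address and that a second :: is rejected.

```python
import ipaddress

short = ipaddress.IPv6Address("2001:db8::1")
full = ipaddress.IPv6Address("2001:db8:0:0:0:0:0:1")

print(short == full)     # the two spellings are the same address
print(short.exploded)    # all eight groups, zero-padded
print(full.compressed)   # shortest form, with "::"

try:
    ipaddress.IPv6Address("2001:db8::1::2")   # "::" used twice
except ipaddress.AddressValueError as err:
    print("rejected:", err)
```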

So, if the application supports IPv6, a full-fledged parser is needed to validate addresses. There is no point in writing one yourself, since there are several ready-made libraries, which also provide other useful functions.

Content checks
Since we are already pulling in a library and parsing addresses, let's see what additional checks we can perform to filter out invalid values and make error messages more informative.

The necessary checks depend on how the address is used. For example, suppose a user tried to enter 124.1.2.3 in the DNS server address field, but a typo made it 224.1.2.3. The format checker does not recognize this typo - the format is correct. However, this address can in no way be the address of the DNS server, since the 224.0.0.0/4 network is reserved for multicast routing, which DNS never uses.
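A check of this kind for the DNS server field might look like the sketch below. It uses the standard ipaddress module; the function name check_dns_server_address and the wording of the message are assumptions made for the example.

```python
import ipaddress

def check_dns_server_address(text):
    """Parse a DNS server address and reject values that are well-formed but unusable."""
    ip = ipaddress.ip_address(text)   # raises ValueError if the format itself is wrong
    if ip.is_multicast:
        # 224.0.0.0/4 for IPv4, ff00::/8 for IPv6
        raise ValueError(f"{ip} is a multicast address and cannot be a DNS server")
    return ip
```

With this in place, 124.1.2.3 is accepted, while the mistyped 224.1.2.3 produces a message that tells the user what is actually wrong.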

If you want to filter out all addresses that cannot be hosts on the public Internet, see RFC 5735 (Special Use IPv4 Addresses) for an almost complete list of reserved networks. It is "almost complete" because it does not include the CG-NAT network 100.64.0.0/10 (RFC 6598). A complete list of all reserved IPv4 and IPv6 ranges can be found in RFC 6890, but it is not organized as conveniently.
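One way to turn such a list into informative error messages is to keep the networks in a table together with a short description. The sketch below uses the standard ipaddress module and a deliberately incomplete excerpt of the IPv4 ranges; SPECIAL_USE_NETWORKS and describe_special_use are hypothetical names, and a real application should take the full list from the RFCs.

```python
import ipaddress

# A deliberately incomplete excerpt; see the RFCs for the full list.
SPECIAL_USE_NETWORKS = [
    (ipaddress.ip_network("10.0.0.0/8"), "private network"),
    (ipaddress.ip_network("100.64.0.0/10"), "CG-NAT shared address space (RFC 6598)"),
    (ipaddress.ip_network("127.0.0.0/8"), "loopback"),
    (ipaddress.ip_network("169.254.0.0/16"), "link-local"),
    (ipaddress.ip_network("172.16.0.0/12"), "private network"),
    (ipaddress.ip_network("192.0.2.0/24"), "documentation (TEST-NET-1)"),
    (ipaddress.ip_network("192.168.0.0/16"), "private network"),
    (ipaddress.ip_network("224.0.0.0/4"), "multicast"),
    (ipaddress.ip_network("240.0.0.0/4"), "reserved for future use"),
]

def describe_special_use(text):
    """Return a description of the special-use network containing the address, or None."""
    ip = ipaddress.ip_address(text)
    for network, description in SPECIAL_USE_NETWORKS:
        if ip in network:
            return description
    return None
```

A caller can then report, for example, that the entered address belongs to the CG-NAT range instead of just saying "invalid address".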

Here you need to pay attention to the network masks. Some people assume that the private network is 172.16.0.0/16 (172.16.0.0–172.16.255.255). Reading RFC 5735 easily dispels this myth: the network is actually much larger, 172.16.0.0/12 (172.16.0.0–172.31.255.255). A real-life example of this error was in GoatCounter, a statistics collection script that counted visits from the local network incorrectly.
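The difference between the two masks is easy to demonstrate with the standard ipaddress module. In this sketch the sample address 172.20.1.1 and the variable names are chosen only for illustration.

```python
import ipaddress

assumed = ipaddress.ip_network("172.16.0.0/16")   # the common misconception
actual = ipaddress.ip_network("172.16.0.0/12")    # the real private range

ip = ipaddress.ip_address("172.20.1.1")
print(ip in assumed)             # False: a /16 check lets this private address through
print(ip in actual)              # True
print(actual[0], actual[-1])     # first and last address of the /12
```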

Note that networks "reserved for future use" may stop being reserved. The networks listed in RFC 5735 are reserved permanently and are safe in this sense. But the authors of Hamachi, a virtual network once popular among gamers, decided that they could use the 5.0.0.0/8 network for their own needs because it was reserved for future use. Then the future arrived, and IANA handed this network over to RIPE.

Libraries
netaddr
The Python 3 standard library already includes the ipaddress module, but if you can afford a third-party dependency, netaddr can make life a lot easier. For example, it has built-in methods for checking whether an address belongs to a reserved range.

Code:

>>> import netaddr
>>> def is_public_ip(s):
...     ip = netaddr.IPAddress(s)
...     return (ip.is_unicast() and not ip.is_private() and not ip.is_reserved())
...
>>> is_public_ip('192.0.2.1') # Reserved for documentation
False
>>> is_public_ip('172.16.1.2') # Reserved for private networks
False
>>> is_public_ip('224.0.0.5') # Multicast
False
>>> is_public_ip('8.8.8.8')
True
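If a third-party dependency is not an option, a roughly similar check can be sketched with the standard ipaddress module. This is an approximation under the assumption that "public" means "globally reachable unicast"; the name is_public_ip_stdlib is made up for the example, and the exact set of ranges covered by is_global may differ between Python versions.

```python
import ipaddress

def is_public_ip_stdlib(s):
    ip = ipaddress.ip_address(s)
    # is_global is False for private, documentation and other special-use ranges;
    # multicast is checked explicitly rather than relying on is_global.
    return ip.is_global and not ip.is_multicast

for s in ("192.0.2.1", "172.16.1.2", "224.0.0.5", "8.8.8.8"):
    print(s, is_public_ip_stdlib(s))
```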

Even if these functions were not there, we could easily implement them ourselves. The library uses magic methods very cleverly to make the interface as easy to use as Python's built-in objects. For example, you can check if an address belongs to a network or a range using the in operator, so working with them is no more difficult than with lists or dictionaries.

Code:

import netaddr

def is_public_ip(s):
    loopback_net = netaddr.IPNetwork('127.0.0.0/8')
    multicast_net = netaddr.IPNetwork('224.0.0.0/4')
    ...
    ip = netaddr.IPAddress(s)
    # The "in" operator tests whether the address belongs to the network
    if ip in multicast_net:
        raise ValueError("Multicast address found")
    elif ip in loopback_net:
        raise ValueError("Loopback address found")
    ...

libcidr
Even for pure C there is a handy library: Matthew Fuller's libcidr. On Debian it can be installed from the repositories. As an example, let's write a check for whether an address belongs to the multicast network and put it in the is_multicast.c file.

Code:

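#include <stdio.h>
#include <libcidr.h>
int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s ADDRESS\n", argv[0]);
        return 1;
    }
    const char* ipv4_multicast_net = "224.0.0.0/4";
    CIDR* ip = cidr_from_str(argv[1]);
    CIDR* multicast_net = cidr_from_str(ipv4_multicast_net);
    if (ip == NULL || multicast_net == NULL) {
        fprintf(stderr, "Could not parse the address\n");
        return 1;
    }
    /* cidr_contains() returns 0 if the second block lies entirely within the first */
    if (cidr_contains(multicast_net, ip) == 0) {
        printf("The argument is an IPv4 multicast address\n");
    } else {
        printf("The argument is not an IPv4 multicast address\n");
    }
    cidr_free(ip);
    cidr_free(multicast_net);
    return 0;
}

$ sudo aptitude install libcidr-dev
$ gcc -o is_multicast ./is_multicast.c -lcidr
$ ./is_multicast 8.8.8.8
The argument is not an IPv4 multicast address
$ ./is_multicast 239.1.2.3
The argument is an IPv4 multicast address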

Conclusion
Checking addresses and showing informative messages about incorrect settings may seem like an insignificant part of the interface, but attention to detail is a sign of professionalism, especially since ready-made libraries make this task much easier.
