IP address without errors. Learning how to work with IP addresses
Any application that works with the network in any way must validate IP addresses. This is harder than it might seem, and it is easy to go to extremes: if validation is too strict, users will not be able to enter correct data; if it is too lax, they will be left with low-level error messages (if those reach them at all). In this article, we will analyze a number of difficulties that arise when validating addresses, and then look at ready-made libraries that help with this.
ADDRESS VALIDATION
Errors in addresses can occur in three ways:
- typos;
- misunderstanding;
- deliberate attempts to break the app.
Address validation alone won't help you stop attempts to break the app. It can make such attempts more difficult, but it is not a substitute for full-fledged authorization verification and error handling at all stages of the program, so improving security should be considered more as a useful side effect. The main goal is to make life easier for users who accidentally entered the wrong address or misunderstood what is required of them.
Checks can be divided into formal and substantive ones. The purpose of a formal check is to make sure that the string entered by the user can be a valid address at all. Many programs limit themselves to just that. We'll go further and see how to check that an address is not only well-formed but also suitable for a specific purpose; more on that later.
Formal checks
Checking that an address merely looks correct may seem like a task for a simple regular expression. In fact, it is not that simple.
In IPv4, the difficulties start with the standard for the format: there isn't one. The dot-decimal format (0.0.0.0–255.255.255.255) is generally accepted, but it is not a standard. The IPv4 standard makes no mention of a textual address format at all, and no other RFC specifies one either, so the generally accepted format is nothing more than a convention.
And it is not even the only convention. The inet_aton() function allows you to omit groups, in which case the last number fills all the remaining bytes: for example, 127.1 = 127.0.0.1 and 192.0.2 = 192.0.0.2 (not 192.0.2.0, as one might expect). In addition, it allows you to enter the address as a single integer: 511 = 0.0.1.255.
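The permissive parsing described above can be observed from Python, whose socket.inet_aton() is a thin wrapper around the platform function. The sketch below is illustrative only: the helper names aton_dotted and is_strict_dotted_quad are made up for this example, and the exact set of accepted forms depends on the platform's C library (the results in the comments are what glibc on Linux produces).

```python
import socket

def aton_dotted(s):
    """Parse s with the permissive inet_aton() rules and return
    the result in full dot-decimal form."""
    return socket.inet_ntoa(socket.inet_aton(s))

def is_strict_dotted_quad(s):
    """Return True only if inet_pton() accepts s as an IPv4 address.
    inet_pton() accepts just the full four-group form."""
    try:
        socket.inet_pton(socket.AF_INET, s)
    except OSError:
        return False
    return True

print(aton_dotted('511'))                   # single integer: 0.0.1.255
print(aton_dotted('127.1'))                 # omitted groups: 127.0.0.1
print(is_strict_dotted_quad('511'))         # False
print(is_strict_dotted_quad('192.0.2.1'))   # True
```

If the application should accept only the full four-group form, a strict parser such as inet_pton() is the safer starting point.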
INFO
Can a host address end in zero? Of course it can: any network larger than a /24 contains at least one such address. For example, 192.168.0.0/23 contains host addresses 192.168.0.1–192.168.1.254, including 192.168.1.0.
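This is easy to verify with the standard ipaddress module. A minimal sketch, using the /23 network from the note above:

```python
import ipaddress

net = ipaddress.ip_network('192.168.0.0/23')
# hosts() leaves out the network address and the broadcast address
hosts = list(net.hosts())

print(hosts[0], hosts[-1])                            # 192.168.0.1 192.168.1.254
print(ipaddress.ip_address('192.168.1.0') in hosts)   # True
```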
If we limit ourselves to supporting only the full dot-decimal form of four groups, without the ability to omit groups, then the expression (\d+)\.(\d+)\.(\d+)\.(\d+) can catch a significant share of typos. With enough effort you can write an expression that matches only valid addresses, although it will be quite cumbersome. It is better to take advantage of the fact that the address is easy to split into groups, and explicitly check that each group falls in the range 0–255:
Code:
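def check_ipv4(s):
    groups = s.split('.')
    if len(groups) != 4:
        raise ValueError("Expected four dot-separated groups")
    for g in groups:
        # int() itself raises ValueError if the group is not a number
        num = int(g)
        if (num > 255) or (num < 0):
            raise ValueError("Invalid octet value")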
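One caveat with relying on int() alone is that it is more lenient than an address field should be: it accepts a leading sign, surrounding whitespace and underscores, so a string like 1.2.3.+4 would pass. A possible stricter variant combines the regular expression with the range check. This is only a sketch, and the name check_ipv4_strict is made up for this example:

```python
import re

# Four groups of one to three ASCII digits and nothing else
_DOTTED_QUAD = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})', re.ASCII)

def check_ipv4_strict(s):
    """Return the four octets as integers, or raise ValueError."""
    m = _DOTTED_QUAD.fullmatch(s)
    if m is None:
        raise ValueError("Expected four dot-separated decimal groups")
    octets = [int(g) for g in m.groups()]
    if any(o > 255 for o in octets):
        raise ValueError("Invalid octet value")
    return octets

print(check_ipv4_strict('192.0.2.1'))   # [192, 0, 2, 1]
```

Using fullmatch() rather than match() means that trailing garbage, including a stray newline, is rejected as well.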
With IPv6, everything is both simpler and more complex. It is simpler because the authors of IPv6 took the experience of IPv4 into account and defined the textual address format in RFC 4291. You can safely say that any alternative formats are against the standard and ignore them. On the other hand, the format itself is more complex. The main difficulty is the shortened notation: a run of zero groups can be replaced with ::, for example 2001:db8::1 instead of 2001:db8:0:0:0:0:0:1. For the user this is certainly convenient, but for the developer it is exactly the opposite: you can no longer simply split the address into groups at the colons, and noticeably more complex logic is needed. In addition, the standard prohibits using :: more than once in a single address, which further complicates the task.
So if your application supports IPv6, you need a full-fledged parser to validate addresses. There is no point in writing it yourself, since there are ready-made libraries that also provide other useful functions.
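As an illustration, the standard ipaddress module already contains such a parser. The sketch below wraps it in a hypothetical helper, parse_ipv6, and shows that the shortened and the full notation produce the same address, while a second :: is rejected:

```python
import ipaddress

def parse_ipv6(s):
    """Return an IPv6Address, or raise ValueError with the parser's message."""
    return ipaddress.IPv6Address(s)

addr = parse_ipv6('2001:db8::1')
print(addr.exploded)   # 2001:0db8:0000:0000:0000:0000:0000:0001
print(addr == parse_ipv6('2001:db8:0:0:0:0:0:1'))   # True

try:
    parse_ipv6('2001:db8::1::2')   # '::' used twice
except ValueError as e:
    print('rejected:', e)
```

The exception raised is ipaddress.AddressValueError, a subclass of ValueError, and its message can often be shown to the user as is.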
Substantive checks
Since we are already pulling in a library and parsing addresses, let's see what additional checks we can perform to catch erroneous values and make error messages more informative.
The necessary checks depend on how the address is used. For example, suppose the user wanted to enter 124.1.2.3 in the DNS server address field, but a typo turned it into 224.1.2.3. A format check won't catch this typo, because the format is correct. However, this address cannot possibly be a DNS server address, because the network 224.0.0.0/4 is reserved for multicast, which DNS never uses.
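A check of this kind for the DNS server field could look roughly like the sketch below. It uses the standard ipaddress module; the function name validate_dns_server and the error wording are assumptions made for this example.

```python
import ipaddress

def validate_dns_server(s):
    """Return the parsed address, or raise ValueError with a message
    that can be shown to the user."""
    ip = ipaddress.ip_address(s)   # raises ValueError on a malformed address
    if ip.is_multicast:
        raise ValueError(f"{ip} is a multicast address and cannot be a DNS server")
    if ip.is_unspecified:
        raise ValueError(f"{ip} is the unspecified address and cannot be a DNS server")
    return ip

print(validate_dns_server('124.1.2.3'))   # 124.1.2.3
try:
    validate_dns_server('224.1.2.3')
except ValueError as e:
    print(e)
```

The message now tells the user what is wrong with the value instead of leaving them with a low-level network error later.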
If you want to filter out all addresses that cannot be host addresses on the public Internet, an almost complete list of reserved networks can be found in RFC 5735 (Special Use IPv4 Addresses). It is "almost complete" because it does not include the network 100.64.0.0/10, allocated for CG-NAT (RFC 6598). A complete list of all reserved IPv4 and IPv6 ranges can be found in RFC 6890; however, it is not so conveniently organized.
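One way to use such a list is to keep it as a table of networks and report which one an address falls into. The table below is deliberately partial and only illustrates the approach; a real one should be filled in from the RFCs mentioned above. The names SPECIAL_NETWORKS and special_purpose are made up for this sketch.

```python
import ipaddress

# Partial table for illustration; extend it from RFC 5735, RFC 6598 and RFC 6890
SPECIAL_NETWORKS = [
    (ipaddress.ip_network('10.0.0.0/8'), 'private use'),
    (ipaddress.ip_network('100.64.0.0/10'), 'shared address space (CG-NAT)'),
    (ipaddress.ip_network('127.0.0.0/8'), 'loopback'),
    (ipaddress.ip_network('169.254.0.0/16'), 'link-local'),
    (ipaddress.ip_network('172.16.0.0/12'), 'private use'),
    (ipaddress.ip_network('192.0.2.0/24'), 'documentation'),
    (ipaddress.ip_network('192.168.0.0/16'), 'private use'),
    (ipaddress.ip_network('224.0.0.0/4'), 'multicast'),
    (ipaddress.ip_network('240.0.0.0/4'), 'reserved for future use'),
]

def special_purpose(s):
    """Return a description of the special-purpose network that contains
    the address, or None if it is not in the table."""
    ip = ipaddress.ip_address(s)
    for net, purpose in SPECIAL_NETWORKS:
        if ip in net:
            return purpose
    return None

print(special_purpose('100.64.1.1'))   # shared address space (CG-NAT)
print(special_purpose('8.8.8.8'))      # None
```

Returning the description rather than a bare boolean makes it easy to build an informative error message.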
At the same time, you need to pay attention to subnet masks. Some believe that the network for private use is 172.16.0.0/16 (172.16.0.0–172.16.255.255). Reading RFC 5735 easily dispels this myth: in fact, it is noticeably larger, 172.16.0.0/12 (172.16.0.0–172.31.255.255). A real example of this error occurred in GoatCounter, where the statistics collection script mistakenly counted sessions from inside the local network.
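The difference between the two prefixes is easy to see with the standard ipaddress module. In this sketch, 172.20.1.1 is just an arbitrary example of a private address that the /16 assumption misses:

```python
import ipaddress

wrong = ipaddress.ip_network('172.16.0.0/16')   # the common misconception
right = ipaddress.ip_network('172.16.0.0/12')   # the actual private-use block
ip = ipaddress.ip_address('172.20.1.1')

print(right[0], right[-1])         # 172.16.0.0 172.31.255.255
print(ip in wrong, ip in right)    # False True
print(right.num_addresses // wrong.num_addresses)   # 16
```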
You should also keep in mind that networks "reserved for future use" may not stay reserved. The RFC 5735 networks are reserved forever and are safe in this sense. But the authors of Hamachi, a virtual network once popular among gamers, believed that they could use the network 5.0.0.0/8 for their own needs because it was reserved for future use, until the future arrived and IANA allocated this network to RIPE.
LIBRARIES
netaddr
The Python 3 standard library already has the ipaddress module, but if you can install a third-party library, netaddr can greatly simplify your life. For example, it has built-in functions for checking whether an address belongs to reserved ranges.
Code:
>>> import netaddr
>>> def is_public_ip(s):
...     ip = netaddr.IPAddress(s)
...     return (ip.is_unicast() and not ip.is_private() and not ip.is_reserved())
...
>>> is_public_ip('192.0.2.1') # Reserved for documentation
False
>>> is_public_ip('172.16.1.2') # Reserved for private networks
False
>>> is_public_ip('224.0.0.5') # Multicast
False
>>> is_public_ip('8.8.8.8')
True
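If installing a third-party library is not an option, a rough equivalent can be sketched with the standard ipaddress module. This is an approximation rather than a drop-in replacement: the stdlib classifies addresses with its own tables, and the name is_public_ip_stdlib is made up for this example.

```python
import ipaddress

def is_public_ip_stdlib(s):
    """Approximate the check above using only the standard library."""
    ip = ipaddress.ip_address(s)
    # Multicast is excluded explicitly rather than relying on is_global
    return ip.is_global and not ip.is_multicast

print(is_public_ip_stdlib('192.0.2.1'))    # False (documentation)
print(is_public_ip_stdlib('172.16.1.2'))   # False (private networks)
print(is_public_ip_stdlib('224.0.0.5'))    # False (multicast)
print(is_public_ip_stdlib('8.8.8.8'))      # True
```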
Even if these functions didn't exist, we could easily implement them ourselves. The library makes good use of magic methods to make its interface as convenient as that of built-in Python objects. For example, the in operator can check whether an address belongs to a network or range, so working with them is no harder than working with lists or dictionaries.
Code:
import netaddr

def is_public_ip(s):
    loopback_net = netaddr.IPNetwork('127.0.0.0/8')
    multicast_net = netaddr.IPNetwork('224.0.0.0/4')
    ...
    ip = netaddr.IPAddress(s)
    # The "in" operator tests whether the address belongs to the network
    if ip in multicast_net:
        raise ValueError("Multicast address found")
    elif ip in loopback_net:
        raise ValueError("Loopback address found")
    ...
libcidr
Even for pure C, you can find a library with a user-friendly interface, such as libcidr by Matthew Fuller. In Debian, it can be installed from the repositories. As an example, let's write a check for whether an address belongs to the multicast network and put it in the file is_multicast.c.
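Code:
#include <stdio.h>
#include <libcidr.h>

int main(int argc, char** argv) {
    const char* ipv4_multicast_net = "224.0.0.0/4";
    if (argc < 2) {
        fprintf(stderr, "Usage: %s ADDRESS\n", argv[0]);
        return 2;
    }
    CIDR* ip = cidr_from_str(argv[1]);
    CIDR* multicast_net = cidr_from_str(ipv4_multicast_net);
    if (ip == NULL || multicast_net == NULL) {
        fprintf(stderr, "Cannot parse address\n");
        return 2;
    }
    /* cidr_contains(big, little) returns 0 if little lies within big */
    if (cidr_contains(multicast_net, ip) == 0) {
        printf("The argument is an IPv4 multicast address\n");
    } else {
        printf("The argument is not an IPv4 multicast address\n");
    }
    return 0;
}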
Code:
$ sudo aptitude install libcidr-dev
$ gcc -o is_multicast ./is_multicast.c -lcidr
$ ./is_multicast 8.8.8.8
The argument is not an IPv4 multicast address
$ ./is_multicast 239.1.2.3
The argument is an IPv4 multicast address
CONCLUSION
Validating addresses and showing informative messages about erroneous settings may seem an insignificant part of the interface, but attention to detail is a sign of professionalism, especially since ready-made libraries significantly simplify this task.