Biometrics in payments - basic technology

Today we constantly hear in the news that "a facial recognition project has been launched in public transport," "facial recognition payment has been introduced at the NN cafe," "company ZZ has set up face-based access to its office," and other "bright" headlines. Many people are also accustomed to using FaceID on their smartphones. But, as is often the case, what is familiar is not thereby understood. Let's start with the basics to understand the subject.

Authentication Factors

Authentication is confirmation that "I am who I claim to be." All known methods of authenticating a subject (the subject is the one who acts) fall into three classes of factors:

Factor 1. What a person knows. This refers to knowledge of unique or secret information: a password, the answer to a question, a date of birth, a passport number, etc.

Factor 2. What a person has. This refers to possession of some item, for example, a document, a car key, a plastic access card, a key fob, etc.

Factor 3. What a person is. This refers to the inherent natural characteristics of a person: face, fingerprint, signature (handwriting) and many others.

We will talk about the last, third authentication factor as applied to solving practical problems.

What is biometrics

When people say "biometrics," they mean technologies that make it possible to measure, classify and compare the natural physical characteristics inherent in each person. Ultimately, all this is done to recognize a person among many others (identification) or to confirm that a subject is who he claims to be (authentication). In short, biometric technologies allow you to determine and confirm a person's identity based on a specific set of data.

What data can be used to achieve this goal? I will list the main requirements:
  • Universality: the data can be captured from any person, regardless of age, height, gender, etc.
  • Uniqueness: the resulting data set must be unique for each individual, i.e. the likelihood of finding two different people with identical or similar data must be minimal

Here are a few examples of characteristics, or biometric modalities, that are common to all people, but at the same time are unique to each person:
  • Fingerprint
  • Face
  • Voice
  • Palm vein pattern
  • Iris of the eye
  • Handwriting
  • Gait

The list goes on. At the same time, in practical tasks, for example, identifying a person when making a payment or passing through passport control, additional requirements on biometric technologies become important:
  • Speed and ease of collection and processing: the data must be captured and processed quickly, with the entire process taking about a second. If it takes longer, users quickly become irritated, because they are already accustomed to the speed of contactless payments.
  • Low cost of the sensors that capture biometric data and of the systems that process them
The first three biometric modalities from the list above can be captured and processed quite quickly and relatively simply. By contrast, to photograph a vein pattern you need a special device, which today cannot be installed on every inexpensive smartphone. Even though this device (essentially an ordinary camera that sees slightly beyond the visible range) is very simple, it is not commonly installed on consumer devices.

In addition, it should be noted that for recognition algorithms to work reliably, the feature in question must be little susceptible to change, both over time and in the presence of various interference. For example, if a finger gets stained with paint, it can no longer unlock a device via the fingerprint sensor; if a voice becomes hoarse during a cold, i.e. its timbre changes, voice recognition becomes temporarily unavailable.

Some human characteristics are subject to very little change: for example, the pattern of the iris remains practically unchanged over a person's lifetime. A face changes with time, but the process is quite slow and, as a rule, continuous. In addition, these two features (iris and face) are little susceptible to accidental changes such as injuries, scratches, etc. Thus, one more requirement for biometric characteristics can be added:
  • Stability: the data must remain consistent over time or change very slowly
Finally, biometric technologies can be divided into contact and contactless, based on how the data is collected. For example, taking a fingerprint assumes contact between a part of the human body and the surface of a device, which is not always convenient. In contrast, no direct contact is required to record a voice or take a photograph of a person. Hence another requirement:
  • Convenience: the data should be collected in a simple, preferably contactless way that requires little effort from the user to provide his biometric data

Thus, we can distinguish six main requirements for biometric characteristics: universality, uniqueness, low cost, speed and ease of collection and processing, stability, and convenience.

I will focus on technologies that make it possible to recognize a person by face, i.e. facial biometrics, or facial recognition, since this biometric modality sufficiently satisfies all the requirements we have put forward.

Identification and authentication based on biometric characteristics

It is important to understand the difference between two concepts: identification and authentication.

Identification is the selection or finding of one object among many similar ones. A person faces this task regularly: for example, seeing the familiar face of an actor on screen, a person immediately identifies him, that is, recognizes him, finds him in his memory among many other familiar faces. Having recognized the actor by his face, we immediately recall, for example, a list of films with him, his age, his name, etc. In other words:
  • The task of identification is to find an object, based on a certain characteristic, among many similar ones. This task is also called a 1:N (one-to-many) comparison, where N is the total number of objects searched.

Authentication is confirmation that a newly obtained characteristic of an object is similar to a characteristic recorded earlier, i.e. getting an answer to the question "is X really in front of us?" A person faces a similar task just as often: for example, having seen the caller's name "Y" on the phone screen, we assume in advance that we will talk with Y, but only after hearing the familiar voice, that is, having compared what we heard with what is in our memory, do we make sure that we are really talking to Y. In other words:
  • The task of authentication is to confirm that an object is sufficiently similar to its previously recorded image. This task is also called a 1:1 (one-to-one) comparison.

It should be noted that complete similarity cannot be achieved with biometric authentication, since each person is, in a sense, never exactly identical to himself. Having smiled once for a photograph, it is impossible to repeat exactly the same smile later: the smallest details of the face will still differ. However, this does not prevent reliable facial authentication: by setting a sufficiently high similarity threshold for making the decision, very high accuracy can be achieved.

Solving the problem of facial image recognition and classification

Modern methods of facial recognition are based on computer vision technologies, which allow a computer program to independently recognize and classify certain details of an image without human assistance. For example, it is possible to identify a human face among many details in an image (recognition) and compare it with existing samples of faces to establish possible similarities (classification).

The development of individual algorithms for recognizing objects in images began in the 1960s, but it was only towards the end of the 1980s that significantly increased computing power made it possible to test the ideas behind these algorithms in practice. In the 21st century, namely since the mid-2010s, the ubiquity of camera-equipped smartphones enabled a dramatic improvement in the quality of the technology, thanks to the huge amount of input data for the algorithms (photos of faces) and to devices capable of putting the results to use. According to Gemalto research, since 2013, 500 smartphone models have been released that support recognition of at least one biometric modality, and the total number of payments made from mobile devices reached almost 2 billion in 2017 [1]; in other words, we can speak of widespread availability of devices that support biometric authentication/identification for payments.

The entire process of applying computer vision technologies to the problem of identifying a person by face can be divided into three stages:
  1. Finding a face in an image (photo)
  2. Extracting facial features and characteristics to obtain a biometric sample
  3. Comparing the sample against those stored in a database, or searching the database of samples
Let us consider the basic principles of the algorithms underlying each stage of solving the problem.
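Before doing that, it is useful to see how the three stages fit together. Below is a minimal Python skeleton (my own illustration, not any particular product's API; the function names are hypothetical, and the first two stages are left as stubs that the following sections sketch out):

```python
import numpy as np

def find_face(photo: np.ndarray) -> np.ndarray:
    """Stage 1: locate a face in the photo and return the cropped face image
    (the principle is sketched in the Viola-Jones section below)."""
    raise NotImplementedError

def make_sample(face: np.ndarray) -> np.ndarray:
    """Stage 2: convert the face image into a biometric sample, i.e. a feature
    vector (the principle is sketched in the eigenfaces section below)."""
    raise NotImplementedError

def match_sample(sample: np.ndarray, db: dict[str, np.ndarray]) -> str:
    """Stage 3: 1:N search -- return the enrolled user whose stored sample
    lies closest to the given one."""
    return min(db, key=lambda user: float(np.linalg.norm(db[user] - sample)))
```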

Finding a face in an image

A classic algorithm that underlies many modern approaches to finding faces in images, and that illustrates well the general principle of searching for objects in photographs, is the Viola-Jones method, published in 2001 [2].

The algorithm has the following advantages:
  • Robustness: a high true-positive rate and very few false positives
  • High speed of operation
  • It can be used purely to find a face in a photograph, without identifying/authenticating the user
The algorithm begins by searching the image for certain features: Fig. 1 shows 4 different types of feature. The value assigned to each feature is computed as follows: the sum of the values of all pixels in the light area is subtracted from the sum of the values of all pixels in the shaded area. The value of a pixel can, for example, be the numerical brightness of the color recorded in it. Thus, the value assigned to feature A can be the difference in illumination (brightness) between the right and left parts of the rectangle.

Fig. 1. Example of rectangular feature templates
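As a rough sketch of this computation (my own toy code, not Viola and Jones's reference implementation; which half of template A is shaded is my assumption), the value of a two-rectangle feature can be computed from pixel sums. Real implementations precompute an "integral image" so that the sum over any rectangle costs only four lookups:

```python
import numpy as np

def integral_image(gray: np.ndarray) -> np.ndarray:
    # integral[i, j] = sum of all pixels in gray[:i, :j]; with this table,
    # the sum over any rectangle takes just four lookups.
    return np.pad(gray, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    return int(ii[top + h, left + w] - ii[top, left + w]
               - ii[top + h, left] + ii[top, left])

def feature_a(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    # Two-rectangle feature: the sum over the light (here: left) half is
    # subtracted from the sum over the shaded (here: right) half.
    half = w // 2
    return rect_sum(ii, top, left + half, h, half) - rect_sum(ii, top, left, h, half)

# Usage: ii = integral_image(gray_photo); value = feature_a(ii, 40, 40, 20, 20)
```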

How can this principle be used to find faces in an image? All human faces have more or less similar properties, or features, and searching with suitable feature templates allows us to locate them in photographs:
  • The nose area is always lighter than the eye area
  • The eye area is always slightly darker than the top of the cheeks
Feature template resembling the nose area

Feature template resembling the eye area

Using other rectangular feature templates, as well as feature templates of different sizes, you can create a map of the locations and sizes of facial features: eyes, mouth, nose, facial boundaries.

Let's give an example of how the algorithm works. Suppose we are searching for the four features from Fig. 1 in an image measuring 100 by 100 pixels. The image is divided into 25 segments of 20 by 20 pixels, and the search is carried out using templates of the same size. We apply each template in turn to each segment of the image, look at the resulting value, and write the result into a table:

A, 40 | B, 51 | B, 29 | B, 44 | B, 23
B, -41 | A, -4 | B, 322 | B, 40 | B, -15
B, -5 | A, 311 | C, -283 | A, -309 | B, -4
B, -77 | C, 5 | C, -276 | B, 103 | C, 87
C, 49 | B, 38 | B, 40 | B, 50 | C, 3

In this table map, each cell contains the type of template that received the largest absolute value in the given segment, along with that value. It can be seen that the central part of the map has strongly pronounced features corresponding to particular templates; for example, it is very likely that there is a nose in the central part of the image.
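A toy version of this segment-by-segment scan might look as follows (a sketch under strong assumptions: the three template masks below are simplified stand-ins for the templates of Fig. 1, and the input is a 100 by 100 grayscale array):

```python
import numpy as np

def template(kind: str) -> np.ndarray:
    # Toy 20x20 masks: +1 marks the shaded area, -1 the light area.
    m = np.ones((20, 20))
    if kind == "A":      # light left half vs. shaded right half
        m[:, :10] = -1
    elif kind == "B":    # light top half vs. shaded bottom half
        m[:10, :] = -1
    elif kind == "C":    # light vertical stripe between shaded sides
        m[:, 7:13] = -1  # (a nose between darker eye areas)
    return m

def feature_map(gray: np.ndarray) -> list[list[tuple[str, int]]]:
    """For each 20x20 segment of a 100x100 image, record the template with the
    largest absolute response, together with that response."""
    rows = []
    for top in range(0, 100, 20):
        row = []
        for left in range(0, 100, 20):
            seg = gray[top:top + 20, left:left + 20]
            # Response = sum over shaded pixels minus sum over light pixels.
            responses = {k: int((template(k) * seg).sum()) for k in "ABC"}
            best = max(responses, key=lambda k: abs(responses[k]))
            row.append((best, responses[best]))
        rows.append(row)
    return rows
```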

This is the basic principle of mapping facial features in an image. Note that, firstly, it does not account for faces that are tilted, in whole or in part; secondly, a more (or, conversely, less) detailed partitioning of the original image may be required; thirdly, the partitioning elements may overlap; and so on. All these points are addressed in modern approaches: for example, far more templates are used, and "strong" feature classifiers are built as linear combinations of simple, "weak" ones, etc.

Obtaining a biometric sample from a photograph

After the face in the photograph has been found and cropped, we can talk about converting the facial image into a biometric sample, i.e. into a form in which it can be compared with others by formal criteria, automatically (by machine). It is obvious, for example, that a simple pixel-by-pixel comparison of two images is unlikely to succeed: in addition to differences in the smallest details of the face, there will certainly be differences in shooting conditions. Therefore, it is necessary to identify a certain set of facial features and compare faces based on those features.

I will describe an algorithm published in 1991 [3] that clearly demonstrates the basic principle of creating a biometric template: "eigenfaces" (by analogy with eigenvectors).

First, a set of images to be used for training is assembled: the so-called training set. The images must be taken under identical conditions (the same lighting, the same head position, etc.) and must all be the same size, for example 100 by 100 pixels.

Then all images are rewritten in vector form: a column vector is created in which the values taken from the pixels (color numbers) are recorded; the vectors will thus have 10,000 components each. A matrix T is formed from these column vectors. After this, the "average" image is calculated and subtracted from each of the column vectors; we can say that in this way we extract everything "common" from each image and discard it, leaving only the distinctive features.

Next, the covariance matrix S is formed (informally, it captures how a change in one vector component depends on the others), and its eigenvalues and eigenvectors are computed. This step is the most computationally expensive, but it can be simplified by finding the eigenvectors of S without explicitly computing S itself: if A is the 10,000 × n matrix of mean-subtracted images, then S = AAᵀ is 10,000 × 10,000, but it suffices to take the eigenvectors v of the much smaller n × n matrix AᵀA, since (AᵀA)v = λv implies S(Av) = λ(Av).

Each eigenvector obtained in our example will have 10,000 components, i.e. it can itself be "deciphered" as an image. These images form the basis of "eigenfaces." That is, every image from the training set, and in general every image in the original format, can now be written in the form:

Face N = 0.0007 × Eigenface 1 + 0.0002 × Eigenface 2 + 0.0005 × Eigenface 3 + ...

It is clear that 10,000 eigenvalues are too many: storing and using them is no better than storing the 10,000 pixels of the original image. Therefore, only the principal eigenvalues and their corresponding "eigenfaces" are kept. The choice is made by sorting the eigenvalues by magnitude and setting a threshold value t for their total variation, i.e.:

(λ₁ + λ₂ + … + λₖ) / (λ₁ + λ₂ + … + λₙ) ≥ t

In other words, we choose a set of eigenvalues whose sum makes up a significant share of the sum of all eigenvalues (t can be set, for example, to 0.85). In practice k turns out to be relatively small; for 100 by 100 images it is often close to 30.

Now we can speak of a principal set of "eigenfaces" that contains the most significant features of all the faces in the training set, and any face image, whether from the training set or from outside it, can be decomposed into components, i.e. written as a weighted set (linear combination) of "eigenfaces." That is, any face image can be represented as a vector of, say, 30 components (a1, a2, ..., a30). In this form it can already be stored and compared with others by formal criteria; this is the biometric sample.
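Putting the whole procedure together, here is a compact numpy sketch (the image size and t = 0.85 follow the example in the text; everything else, including the function names, is my own simplification of the method):

```python
import numpy as np

def train_eigenfaces(images: list[np.ndarray], t: float = 0.85):
    # Rewrite each image as a column vector; T gets one 10,000-component
    # column per training face.
    T = np.stack([img.ravel() for img in images], axis=1).astype(float)
    mean_face = T.mean(axis=1, keepdims=True)
    A = T - mean_face                 # drop what is "common" to all faces

    # The covariance trick: instead of the 10,000 x 10,000 matrix S = A A^T,
    # take eigenvectors v of the small n x n matrix A^T A; A v is then an
    # eigenvector of S with the same eigenvalue.
    lam, v = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]     # sort eigenvalues by magnitude
    lam, v = lam[order], v[:, order]

    # Keep the smallest k whose eigenvalues cover a fraction t of the total.
    k = int(np.searchsorted(np.cumsum(lam) / lam.sum(), t)) + 1
    eigenfaces = A @ v[:, :k]
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)  # normalize the columns
    return mean_face, eigenfaces

def biometric_sample(img: np.ndarray, mean_face: np.ndarray,
                     eigenfaces: np.ndarray) -> np.ndarray:
    # Project onto the eigenfaces: the weights (a1, ..., ak) are the sample.
    return (eigenfaces.T @ (img.ravel().astype(float)[:, None] - mean_face)).ravel()
```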

Two additional points should be noted: firstly, without access to the set of "eigenfaces," it is impossible to recreate a face image from a biometric sample. Secondly, the "eigenfaces" themselves, rendered as images, do not always even resemble faces in the usual sense:

Example of "eigenfaces" from AT&T Laboratories

This algorithm certainly has drawbacks: for example, the principal eigenvalues almost always turn out to be those responsible for illumination; the method copes poorly with facial expressions and is highly dependent on shooting conditions. However, it illustrates well the general approach to generating biometric samples from facial images.

Modern solutions for identifying a person by face mainly use approaches based on deep learning, in which many of the steps are performed implicitly by neural networks, for example, deriving the set of facial features whose weighted combination forms the biometric sample. As a result, the algorithms in general, and the sample formats in particular, vary from solution to solution; much is determined by the quality (diversity, size) of the set of faces on which the solution is trained, the architecture of the neural network, and other parameters.

Search and comparisons in the database of biometric samples

As I already wrote, a biometric sample is a vector of facial characteristics; at the moment, for many well-known solutions, it consists of roughly 100 components (values). In other words, a biometric sample is a point in, say, 100-dimensional space. So how can you determine which person is shown in a photograph taken with your smartphone camera?

First, an image of the person's face is extracted from the photograph; then the image is converted into a biometric sample vector. The vector is then sent to the recognition server, where sample records for different users are stored. If you imagine a sample as a point in 100-dimensional space, then the question "what person is shown in the picture?" is equivalent to the question "which stored point is closest to the given one?" Proximity between points can be defined in different ways depending on the algorithm; for example, it can be the usual Euclidean distance. Thus, by finding the point (vector) closest to the given one, we learn whom the person in the photograph most resembles.

For each user, several samples can be stored in the database for subsequent comparison; then we can speak of the vector falling into a certain region of 100-dimensional space identified with a specific user. In this way it is possible to answer the question "is a person who he says he is?", i.e. to perform authentication. In our language, the question becomes "does the sample vector fall into the desired region?"
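In code, both questions reduce to distance computations over stored vectors. A minimal sketch (the Euclidean distance and the threshold value are placeholder choices, and the function names are mine; production systems tune both):

```python
import numpy as np

def identify(sample: np.ndarray, db: dict[str, list[np.ndarray]]) -> str:
    # 1:N -- whose stored samples is this vector closest to?
    return min(db, key=lambda user: min(float(np.linalg.norm(sample - s))
                                        for s in db[user]))

def authenticate(sample: np.ndarray, db: dict[str, list[np.ndarray]],
                 claimed: str, threshold: float = 0.8) -> bool:
    # 1:1 -- does the vector fall into the claimed user's region, i.e. is it
    # close enough to at least one of their stored samples?
    return min(float(np.linalg.norm(sample - s)) for s in db[claimed]) <= threshold
```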

The quality of the algorithms can be assessed, for example, by the number of correct identifications. By setting a high similarity threshold, i.e. by requiring a smaller distance between compared samples before deciding "this is the same person," you can achieve a very small number of type I errors (false accepts). Obviously, however, this will significantly increase the number of type II errors (false rejects). The better the algorithm, the more balanced its errors; in other words, it produces an acceptably small number of errors of both types.
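To make the tradeoff concrete, both error rates can be counted over labeled comparison pairs at a given threshold (a sketch with made-up distances, not real measurements):

```python
def error_rates(genuine: list[float], impostor: list[float], threshold: float):
    # genuine: distances between samples of the SAME person;
    # impostor: distances between samples of DIFFERENT people.
    far = sum(d <= threshold for d in impostor) / len(impostor)  # false accepts
    frr = sum(d > threshold for d in genuine) / len(genuine)     # false rejects
    return far, frr

# Lowering the threshold drives false accepts down but false rejects up; a good
# algorithm keeps both acceptably small at the same time.
print(error_rates(genuine=[0.3, 0.5, 0.9], impostor=[0.7, 1.2, 1.4], threshold=0.8))
```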

In the next article, I will look at ways of evaluating such solutions, namely, how to determine which algorithm does a better job of identification and authentication.

Nikita Lukyanov

Analyst-Developer of the Innovation Department of NSPK

Links:
[1] https://www.gemalto.com/biometrics
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.6807
[3] https://www.cin.ufpe.br/~rps/Artigos/Face Recognition Using Eigenfaces.pdf
 