Face recognition for dummies

Tomcat · Jun 2, 2024

Facial recognition systems are very common today. They are widely used in fields such as security, entertainment and social media, and the technology is developing at an incredible pace every year. This article breaks down how a facial recognition system works in very simple terms.

Introduction

There are other articles that cover this topic quite well, but they use a lot of formulas, scientific terms, etc. In my article, I wanted to talk about facial recognition systems from the point of view of an ordinary developer, in the simplest form for the average person, hence the name. But don't be misled by the title: some not-so-simple things will be described here. And honestly, I don't think dummies would be interested in such topics anyway. The name itself was inspired by the title of Stephen Davis's book, C++ for Dummies.

Since this topic is too large, it will be divided into 3 parts.

The basis of facial recognition systems and how FaceID works.
The role of neural networks in face recognition systems.
Problems and prospects for the development of face recognition systems.

Why is facial recognition needed at all?

Of course, before explaining everything, it's worth understanding why we need it.

Facial recognition is needed to find out who is in front of us. This can be useful in different situations, for example:

When we want to put some kind of mask on ourselves on Snapchat or Instagram.
When we want to open our phone or computer using our face and not our password.
When we want to find our friends or celebrities in photos or videos.
When we want to check that the person who shows us his passport or driver's license is really who he says he is.
When we want to monitor security and crime by identifying the faces of suspects or criminals.

And in many other specific areas and needs.

What is facial recognition anyway?

Imagine a routine situation: you go to work, school or university and see hundreds of different faces in front of you. Suddenly, among the unfamiliar faces, you recognize your friend. But how is it that you can pick out one or several people among hundreds of others? In fact, our brain automatically analyzes each face in a split second, namely its eye color, nose shape, hairstyle and other details. Next, the brain compares them with the features of the faces that it remembers.

Based on this, we can say that facial recognition is the ability to recognize and distinguish the faces of different people from each other based on certain facial features.

How does a computer recognize faces?

We know how our brain recognizes people's faces, but how can a piece of metal understand who is standing in front of it: its ~~victim~~ owner Ivan, or Styopa from the neighboring village? In fact, there is no need to reinvent the wheel here. We can simply ~~steal~~ use the methods that Mother Nature invented.

The legend about black and white images

A long time ago, when people did not yet know how to recognize faces, they lived in a world where everything was colored. They loved different colors and decorated themselves and their homes with them. But they could not distinguish each other by their faces, because the colors confused them. They often confused their friends and relatives, and sometimes even their enemies. This led to many misunderstandings and conflicts.

The moral of the story is: ~~being emo,~~ always convert your images to black and white before recognizing them.

Of course, the question arises: why?

And the answer to this question is painfully simple. Images are converted to black and white (more precisely, grayscale) before recognition to make them easier and faster for the computer to process. A grayscale image stores one brightness value per pixel instead of three color values, so there is less data to look at. It also shows only the shape and lightness of objects and not their color, which can be redundant or confusing during recognition. For example, if you want to recognize words in an image, the color of the words and background is not as important as the contrast between them. The same thing works with faces.
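As a minimal sketch of this step, the snippet below turns a tiny color image into a grayscale one. It assumes the image is a plain list of rows of (r, g, b) tuples and uses the common luma weights 0.299, 0.587 and 0.114. The function name `to_grayscale` and the sample pixels are made up for illustration; in a real project an image library would do this for you.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels (0-255) into rows of brightness values (0-255)."""
    gray = []
    for row in rgb_image:
        # Weighted sum: green contributes most to perceived brightness, blue least.
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray


# Hypothetical 2x2 image: red, green, blue and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray_image = to_grayscale(image)
print(gray_image)
```

Each pixel shrinks from three numbers to one, which is exactly the "less detail" the computer benefits from.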

Who is who?

Face detection

First of all, of course, we must find where the face is, as opposed to a wall or a book. Once we have received the webcam stream or photo and converted it to black and white, we can begin the face detection process. To do this we can use different methods.

Haar features (more precisely, Haar-like features) were proposed in 2001 by Paul Viola and Michael Jones. These features describe what a face looks like based on its parts, such as the eyes, nose and mouth. They do this with black and white rectangles that are placed over a picture of a face. For each feature, the computer measures how light or dark the image is inside these rectangles and compares the results. This way it can find where on the face there are transitions from light to dark or vice versa.

Haar features come in different types, depending on the number and location of the rectangular areas. The simplest are 2-rectangle features, which consist of two adjacent areas. There are also 3-rectangle and 4-rectangle features, which consist of three or four areas, respectively. In addition, there are tilted Haar features, which are rotated by 45 degrees. They increase the coverage of the feature space and improve the quality of detection.

These features have a number of advantages, such as high computation speed, low sensitivity to changes in lighting, and the ability to detect objects at different scales. However, they also have disadvantages, such as low accuracy when objects are rotated or deformed, difficulty in selecting the optimal set of features, and the need for a large amount of training data.
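To make the "compare light and dark rectangles" idea concrete, here is a rough sketch of a 2-rectangle feature. It assumes a grayscale image stored as a list of rows and uses an integral image (a table of running sums), the trick that lets the Viola-Jones approach add up any rectangle in four lookups. The function names and the 4x4 patch are hypothetical, and this is only an illustration of the idea, not how any particular library implements it.

```python
def integral_image(gray):
    """Build a table where ii[y][x] is the sum of all pixels above and left of (x, y)."""
    height, width = len(gray), len(gray[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Brightness of the bottom half of a window minus brightness of its top half."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


# Hypothetical patch: a dark band (like the eyes) above a bright band (like the cheeks).
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(patch)
feature_value = two_rect_feature(ii, 0, 0, 4, 4)
print(feature_value)
```

A large value means a strong dark-to-light transition inside the window. A uniform wall would give a value close to zero.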

Haar filters

How filters are applied to the face

These features are stored in files, usually haar-cascade.xml files, that describe the geometry of the face.

Example haar-cascade.xml file

Code:

<?xml version="1.0"?>
<opencv_storage>
<cascade type_id="opencv-cascade-classifier"><stageType>BOOST</stageType>
  <featureType>HAAR</featureType>
  <height>20</height>
  <width>20</width>
  <stageParams>
    <maxWeakCount>93</maxWeakCount></stageParams>
  <featureParams>
    <maxCatCount>0</maxCatCount></featureParams>
  <stageNum>24</stageNum>
  <stages>
    <_>
      <maxWeakCount>6</maxWeakCount>
      <stageThreshold>-1.4562760591506958e+00</stageThreshold>
      <weakClassifiers>
        <_>
          <internalNodes>
            0 -1 0 1.2963959574699402e-01</internalNodes>
          <leafValues>
            -7.7304208278656006e-01 6.8350148200988770e-01</leafValues></_>
        <_>
          <internalNodes>
            0 -1 1 -4.6326808631420135e-02</internalNodes>
          <leafValues>
            5.7352751493453979e-01 -4.9097689986228943e-01</leafValues></_>
        <_>
          <internalNodes>
            0 -1 2 -1.6173090785741806e-02</internalNodes>
          <leafValues>
            6.0254341363906860e-01 -3.1610709428787231e-01</leafValues></_>
        <_>
          <internalNodes>
            0 -1 3 -4.5828841626644135e-02</internalNodes>
          <leafValues>
            6.4177548885345459e-01 -1.5545040369033813e-01</leafValues></_>
        <_>
          <internalNodes>
            0 -1 4 -5.3759619593620300e-02</internalNodes>
          <leafValues>
            5.4219317436218262e-01 -2.0480829477310181e-01</leafValues></_>
        <_>
          <internalNodes>
            0 -1 5 3.4171190112829208e-02</internalNodes>
          <leafValues>
            -2.3388190567493439e-01 4.8410901427268982e-01</leafValues></_></weakClassifiers></_>
    <!-- remaining stages and the <features> section omitted -->
  </stages>
</cascade>
</opencv_storage>

This is only a small part of the data. The files themselves are very large and consist of approximately 12-100 thousand lines.
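To give a feel for how these numbers are used, here is a simplified sketch that evaluates the first stage from the excerpt above. It assumes each weak classifier is a one-level decision: if the feature value is below the threshold, the first leaf value is added to the score, otherwise the second. The window passes the stage when the total reaches stageThreshold. The feature values fed in are invented for the example, and real detectors also normalize for window contrast, which is skipped here.

```python
# Values copied from the first stage of the XML excerpt.
STAGE_THRESHOLD = -1.4562760591506958

# (feature index, threshold, left leaf value, right leaf value)
WEAK_CLASSIFIERS = [
    (0, 1.2963959574699402e-01, -7.7304208278656006e-01, 6.8350148200988770e-01),
    (1, -4.6326808631420135e-02, 5.7352751493453979e-01, -4.9097689986228943e-01),
    (2, -1.6173090785741806e-02, 6.0254341363906860e-01, -3.1610709428787231e-01),
    (3, -4.5828841626644135e-02, 6.4177548885345459e-01, -1.5545040369033813e-01),
    (4, -5.3759619593620300e-02, 5.4219317436218262e-01, -2.0480829477310181e-01),
    (5, 3.4171190112829208e-02, -2.3388190567493439e-01, 4.8410901427268982e-01),
]


def stage_passes(feature_values):
    """Return (passed, score) for one image window, given its Haar feature values."""
    score = 0.0
    for index, threshold, left, right in WEAK_CLASSIFIERS:
        # Assumed rule: below the threshold takes the left leaf, otherwise the right.
        score += left if feature_values[index] < threshold else right
    return score >= STAGE_THRESHOLD, score


# Hypothetical feature values for two windows (not taken from a real image).
face_like_values = [0.2, -0.1, -0.1, -0.1, -0.1, 0.1]
flat_values = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(stage_passes(face_like_values))
print(stage_passes(flat_values))
```

A window that fails any stage is thrown away immediately, so most of the image is rejected after only a few cheap checks. Only windows that survive all stages are reported as faces.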

It is also worth noting that Haar features are not the only way to detect a face, but they are very common. They are still used, but in my personal opinion, this algorithm is not very effective by today's standards.

A better option is neural networks, but we will talk about them in detail in the second part of the article.

Obtaining facial geometries

So, we know where the face is in the image. Now we need to take the necessary data from it, namely: the distance between the eyes, the shape of the cheekbones, the contours of the lips and other features that distinguish this face from others.

Visualization of the data we take

In this picture we see Interest points on the left and Landmarks on the right.

Let's talk about them in detail:

Interest points are parts of the face that are easy to see and remember, for example, the corners of the eyes, the tip of the nose, the edges of the lips, etc. For each face, you can select from 68 to 194 such points. These points help make a unique code (feature vector) for each face, which, relatively speaking, serves as its passport in the face recognition system.

Landmarks are lines on the face that connect these points and show its shape and location. They can outline, for example, the eyes, nose, mouth, etc. Landmarks help measure the face, that is, how far one part of the face is from another. They also help to rotate and scale the face so that it always has the same position and size. Landmarks can be found in different ways, for example, using neural networks or classical algorithms (for example, Haar features).

A feature vector is a list of numbers (coordinates) that describes what a face looks like. These numbers can describe different parts of the face, for example, the eyes, ears, nose, mouth, etc. When faces are compared, the feature vectors show which faces are similar and which are different.
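As a toy illustration of how points on a face can become a feature vector, the sketch below measures a few distances between points and divides them by the distance between the eyes, so the result does not depend on how large the face is in the picture. The point names, coordinates and the choice of distances are all made up for the example. Real systems use many more points or a neural network.

```python
import math


def feature_vector(points):
    """Turn named (x, y) face points into distances scaled by the eye-to-eye distance."""
    eye_distance = math.dist(points["left_eye"], points["right_eye"])
    pairs = [
        ("left_eye", "nose_tip"),
        ("right_eye", "nose_tip"),
        ("nose_tip", "mouth_center"),
        ("left_eye", "mouth_center"),
    ]
    return [math.dist(points[a], points[b]) / eye_distance for a, b in pairs]


# Hypothetical pixel coordinates of four points on one face.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose_tip": (50, 60),
    "mouth_center": (50, 80),
}
# The same face photographed twice as large.
bigger_face = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

vector = feature_vector(face)
bigger_vector = feature_vector(bigger_face)
print(vector)
print(bigger_vector)
```

Both photos produce the same vector, which is why normalizing the face to one size before measuring it matters.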

Another example with points of interest

Saving received data

After we have found a face and obtained its feature vector, we must, of course, save the received data. You can save both the images (in PNG, JPG or JPEG format) and the feature vectors (in JSON or other files), depending on the logic of your code.

Example of saving feature vectors in JSON

Code:

{
  "faces": [
    {
      "name": "Alice",
      "vector": [-0.096, 0.123, 0.045, ..., 0.067]
    },
    {
      "name": "Bob",
      "vector": [0.087, -0.134, -0.056, ..., -0.078]
    },
    {
      "name": "Charlie",
      "vector": [-0.034, 0.098, 0.012, ..., 0.055]
    }
  ]
}
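A small sketch of writing and reading such a file with Python's standard json module is shown below. The helper names `save_faces` and `load_faces` are hypothetical, the vectors are shortened to four made-up numbers to keep the example readable, and the file goes into a temporary folder so the snippet does not leave anything behind.

```python
import json
import os
import tempfile


def save_faces(path, faces):
    """Write a list of {"name": ..., "vector": [...]} records to a JSON file."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump({"faces": faces}, file, indent=2)


def load_faces(path):
    """Read the records back from the JSON file."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)["faces"]


# Shortened, made-up vectors; real ones are much longer.
known_faces = [
    {"name": "Alice", "vector": [-0.096, 0.123, 0.045, 0.067]},
    {"name": "Bob", "vector": [0.087, -0.134, -0.056, -0.078]},
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "faces.json")
    save_faces(path, known_faces)
    loaded = load_faces(path)

print(loaded[0]["name"], loaded[0]["vector"])
```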

The recognition process itself

So, finally, after we have converted the image to black and white, determined where the face is in the image, obtained the facial geometry and saved the faces we need, we can begin the recognition process itself. When we "scan" a new face, we obtain its geometry and compare it with the geometry we have already saved. The comparison process itself depends on your algorithm and application logic.
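One simple way to do the comparison, shown here only as a sketch, is to measure the straight-line (Euclidean) distance between the new vector and every saved vector, then pick the closest one if it is close enough. The function name `identify`, the threshold of 0.1 and all the vectors are assumptions for this example. A real system would tune the threshold on its own data and might use a different distance measure.

```python
import math


def identify(new_vector, known_faces, threshold=0.1):
    """Return (name, distance) of the closest saved face, or (None, distance) if too far."""
    best_name, best_distance = None, float("inf")
    for face in known_faces:
        distance = math.dist(new_vector, face["vector"])
        if distance < best_distance:
            best_name, best_distance = face["name"], distance
    if best_distance <= threshold:
        return best_name, best_distance
    return None, best_distance


# Shortened, made-up vectors standing in for the saved data.
known_faces = [
    {"name": "Alice", "vector": [-0.096, 0.123, 0.045, 0.067]},
    {"name": "Bob", "vector": [0.087, -0.134, -0.056, -0.078]},
]

# A new scan that is close to Alice, and one that matches nobody.
match_name, match_distance = identify([-0.09, 0.12, 0.05, 0.06], known_faces)
stranger_name, stranger_distance = identify([0.5, 0.5, 0.5, 0.5], known_faces)

print(match_name, stranger_name)
```

The threshold is what separates "this is Alice" from "I have never seen this person". Set it too high and strangers get accepted, too low and the owner gets rejected.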

Visualization of the comparison process. Source: NetchLab

Visualization of the comparison process

Congratulations, we've just gone over the most basic but fundamental things about facial recognition systems.

How does FaceID work?

Of course, when facial recognition comes up, many people will mention Face ID. Indeed, Apple did a good job with its facial recognition system. But how does Face ID work? I would like to say in advance that all further information is just speculation, since the source code of Face ID is closed and we can only guess how it works. Be that as it may, it is worth looking into in more detail as a good example of facial recognition.

Face ID uses a TrueDepth infrared camera that sees a face in 3D and compares it to the one it previously registered. Face ID works even in the dark because the camera uses infrared light. Face ID can recognize a face even if it changes over time, for example when the owner grows a beard or changes hairstyle.

How does TrueDepth work?

The TrueDepth camera projects small dots of infrared light onto your face, which bounce off your skin and come back to the camera. The camera reads these dots and puts them together to create a picture of the face, like a puzzle. This picture helps the phone find out who is in front of the camera.

Example of TrueDepth in action

How to build a 3D face model

In this case, Apple's neural network is used for training and facial recognition, and all facial data is stored in the Secure Enclave, a protected module inside the iPhone's processor.

Conclusion

The article turned out to be quite large, but even so it does not describe in detail the training methods, the use of false images for training, datasets and much more. All of that will be described in the second part of the article.

