Artificial intelligence. Detectron.

Tomcat



Detectron is software that lets you learn about a person's character before you enter into open dialogue with them.
This article discusses the development of a software module that analyzes physiological parameters and the audio channel to assess a person's emotional state from a video stream in real time using artificial intelligence models.

The project helps solve communication problems with strangers or acquaintances without entering into a dialogue with them. The program works online and in real time. Detectron measures and analyzes human behavior, which lets you adapt to any person and achieve your goals in sales, management, monitoring, marketing, medicine and other areas of human activity. The program takes 20 to 90 seconds to diagnose and solve a people-related problem that the customer has set. In most cases an ordinary smartphone with a built-in video camera (equivalent in technical characteristics to an iPhone 6 or above) is sufficient. In rare cases, additional audio and video equipment is required.

Why is Detectron useful?
The product makes it possible to predict a person's reaction to stimuli and questions from their appearance and prevailing emotional state. The following indicators are analyzed: the interlocutor's manner of speech, the script of the dialogue, the subjects offered, and the interlocutor's stress level. All of these data are used together, so that one prediction confirms or corrects another. This yields an architecture that can analyze the input data stream at high speed, about 60 frames per second.
A separate software package is offered for security services (SB) and personnel departments, designed to optimize their work and make it more reliable.

The package provides an initial assessment of an employee's reliability, motivation and abilities, immediately weeding out unreliable personnel and drawing specialists' attention to personnel who, for various reasons, call for more attention or caution. The program also tracks the dynamics of each individual employee and signals deviations before they show up as problems in the team and organization, so that the situation can be kept from escalating. It also assesses the abilities and motivation of existing employees for more effective formation of project teams and the personnel reserve, and scales these functions to a staff of any size.
For sales departments, the package assesses the interests and values of a potential client based on their psychotype and recommends which values to appeal to, so that a matching product can be offered immediately, gaining customer loyalty and interest. It also lets managers evaluate the potential effectiveness of communication from the client's mood and state, and thus communicate only when the outcome is likely to be worthwhile.

Methods and ways of solving the assigned tasks to obtain the expected characteristics
At the moment, the beta version of the program recognizes a person's emotions, psychotype, baseline behavior, stress level, and a number of medical indicators. Next, 3-5 models based on fast neural networks are built to predict the subject's behavior. A higher-order model either confirms or neutralizes the conclusions of the lower-order models. About 20,000 video recordings are used as initial data. In them, time intervals are marked where characteristic reactions took place, where there was no reaction, and where reactions were uncharacteristic, that is, what the program needs to ignore. The model is then supplemented with calculated physiological characteristics (pulse, breathing, blinking, facial expressions, gestures, voice timbre, pauses in speech, etc.) and their rates of change.

The system is designed as a web service that accepts a video or a sequence of frames and returns an array of timestamped numeric values in JSON format. An intermediate dispatcher between the system and the end user distributes the load and displays the answers in visual form. Calculations run on an array of servers equipped with NVidia GTX 1080 graphics cards. The emotional state assessment subsystem can be conditionally divided into 2 parts: the part responsible for identifying the components of behavior is implemented in the form of a neural network using algorithms;

Subsystems for structuring, storing and analyzing results are developed as software services that use the output of the emotional state assessment subsystem. The following technical tools and practices will also be introduced during the work:
  • recognition of emotions from facial expressions using Active Appearance Model technology based on P. Ekman's Facial Action Coding System (FACS);
  • determination of heart rate using PCA video decomposition;
  • determination of respiratory rate using the Eulerian Video Magnification method;
  • detection of changes in complexion using PCA video decomposition;
  • determination of the direction of gaze (computer vision technology);
  • determination of the position of the body, arms, legs and head (computer vision technology);
  • recognition of items of clothing, shoes and accessories (computer vision technology);
  • gait determination (computer vision technology);
  • recognition of voice patterns (acoustic modeling systems);
  • analysis of the amplitude-frequency characteristics of the voice;
  • Mel-cepstral analysis of voice recordings (MFCC);
  • modern approaches and practices for building neural networks.
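As an illustration of the heart-rate item above, here is a minimal sketch of one step that video-based pulse estimation usually needs: finding the dominant frequency of a brightness trace within a plausible pulse band. It does not implement the PCA video decomposition named in the text. It assumes a single per-frame brightness value has already been extracted from the face region, and the function name, band limits and synthetic trace are all illustrative assumptions.

```python
import math

def dominant_frequency_bpm(samples, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.01):
    """Return the strongest periodic component in [lo_hz, hi_hz] as beats per minute."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the constant offset
    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        freq = lo_hz + k * step_hz
        # Correlate the trace with a cosine and a sine at this frequency
        re = sum(x * math.cos(2 * math.pi * freq * n / fps) for n, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * n / fps) for n, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0

# Synthetic 10-second trace at 40 frames per second (hypothetical data):
# a 1.2 Hz pulse (72 beats per minute) on top of a slow brightness drift.
fps = 40
trace = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * n / fps) + 0.002 * n for n in range(400)]
bpm = dominant_frequency_bpm(trace, fps)
print(f"estimated pulse: {bpm:.0f} bpm")
```

Real footage adds motion and lighting noise, which is where a decomposition step such as PCA would come in before the frequency search.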
The volume and capacity of the product market, analysis of the current state and development prospects of the industry in which the project is being implemented.

The emotion detection market is now booming and, according to Western experts, by 2021 it will reach, according to various estimates, between $19 billion and $37 billion. Emotion detection and recognition systems (EDRS) and affective computing form their own ecosystem within the artificial intelligence (AI) development field. Estimates of the size of this market and its prospects for the period up to 2022 differ, since they are based on different metrics and calculation formulas. According to MarketsandMarkets, the global market was $6.72 billion in 2016 and is expected to reach $36.07 billion by 2021, with annual growth of 39.9%. Reportlinker and Orbis Research give more conservative forecasts: $29.17 billion at 27.4% and $19.96 billion at 21.2% by 2022, respectively. Gartner claims that in 2021-2022 our smartphones will know us better than our friends and relatives do, and will interact with us on a subtle emotional level.

Three geographic zones remain the reference regions for the industry: the Asia-Pacific region (APR), North America (USA and Canada) and the European Union. The highest growth rates are still shown by two channels of emotion analysis: facial microexpression recognition and biosensors built into wearable devices. These are followed by voice / speech analysis and video oculography (eye tracking). Emotional and behavioral technologies are in demand in various fields, including medicine. The Israeli company Beyond Verbal, together with the Mayo Clinic, is looking for vocal biomarkers in the human voice that not only reveal emotions but may also help predict coronary heart disease and Parkinson's and Alzheimer's diseases, which brings emotion technology into gerontology and the search for ways to slow down aging. As for applicability, the B2B segment is driven mainly by sectors such as intelligent transport, retail, advertising, HR, IoT and gaming.
But there is also demand in B2C: EaaS (Emotion as a Service), or a cloud-based analytical solution (human data analytics), will allow any user to upload a video file and receive full emotional and behavioral statistics for each segment of the recording. If we are talking about pre-election presidential debates (be it in Russia or the United States), then there is hardly anything that can be hidden from the algorithm. Moreover, in a couple of years emotion recognition technology will be in every smartphone. The trend will be the creation of smart interfaces for recognizing human emotions: the software will determine the state of the user at an arbitrary point in time using a regular webcam. This is a promising niche, since the determination of human emotions can be used for commercial purposes, from analyzing the perception of video and audio content to investigating criminal cases.
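The MarketsandMarkets figures quoted earlier can be sanity-checked with compound growth arithmetic. This is a minimal sketch, assuming the 39.9% rate compounds once a year over the five years from 2016 to 2021; the function names are illustrative.

```python
def project(value, annual_growth, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + annual_growth) ** years

def implied_annual_growth(start, end, years):
    """Annual growth rate that turns start into end over the given number of years."""
    return (end / start) ** (1 / years) - 1

# Figures from the text: $6.72 billion in 2016, 39.9% per year, $36.07 billion in 2021
projected_2021 = project(6.72, 0.399, 5)
rate = implied_annual_growth(6.72, 36.07, 5)
print(f"projected 2021 market: ${projected_2021:.2f} billion")
print(f"implied annual growth: {rate:.1%}")
```

Under that assumption the projection lands close to the quoted $36.07 billion, so the starting value, end value and growth rate are mutually consistent.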

On the other hand, there are endless possibilities in the entertainment industry. For example, the new iPhones have built-in Face ID facial recognition technology, which not only unlocks your phone but can also create emoji with your facial expressions. The bulk of new products in the field of emotion science are based on seven basic emotions and on facial micro-expressions, which reflect our emotions at a level beyond conscious control. We can consciously suppress a smile, but a slight twitching of the corners of the lips will remain, and this will be a signal for emotion recognition technologies. There is also a block of technologies specializing in the analysis of speech, voice and gaze. Used in psychiatry or criminal proceedings, these methods can reveal as much as possible about a person's emotional state and true mood from the smallest changes in facial expressions and body movements. Companies and teams can now take open scientific data on emotion recognition and combine it with technology to shape the field of affective computing. The FAANG five (Facebook, Apple, Amazon, Netflix, Google) and tech giants like IBM have made a huge contribution to the development of the emotion technology market. At the same time, universal digitalization, the proliferation of gadgets and devices of every kind, the ubiquitous use of images and videos (several billion videos hit the Web every day), and publicity in social networks make it possible to extract emotional data from the general flow and use it to analyze a person as a consumer of goods and services and as a user. All of this must take place within the law, correctly and ethically.

Health and Healthtech
The health industry is actively adopting the most modern methods for collecting and analyzing data from patients or users, as machine algorithms identify symptoms by drawing on hundreds or thousands of similar cases. There are already mobile applications that analyze psycho-emotional state from photos and text, and the more a person communicates with the program, the better it learns, “understands” the person and gives accurate treatment predictions. It is one thing when a device simply picks up and “understands” your mood at its own level and, accordingly, turns on music, adjusts the light or prepares coffee. It is another when it assesses your degree of fatigue from your appearance or detects deviations from the norm, or diseases such as Alzheimer's or Parkinson's.
Long before it manifests, the disease begins to affect the muscles of the face, the speed of eye movement, and seemingly imperceptible changes in the voice and micromovements.

As for advertising, global retail networks are already integrating online into offline as much as possible, trying to find out what the buyer wants and is most likely to buy. When neurointerfaces reach the level of accurate, highly sensitive emotion recognition, advertising in a shopping center window will adjust in a split second to the mood of people passing by. In April 2017, a San Francisco-based research team taught an LSTM neural network to recognize the emotional component of text more accurately. The machine now recognizes mood almost without error in Amazon customer reviews and Rotten Tomatoes movie reviews.

Game industry
When the first model of Google Glass was released, it was assumed that gesture control would reach a new level: to read text on the inside of the lens, it was enough to move your eyes from top to bottom for the system to understand that you had read the paragraph and could be shown the next one. Although the gadget itself did not go beyond the prototype stage, the study of eye movements moved to a new field - gaming.

Competitive advantages of the created product, comparison of technical and economic characteristics with world analogues
Loom.ai is creating a new era of virtual communication through animation and the sharing of personalized 3D avatars. A Y Combinator Fellowship and Academy Award-winning team formed in San Francisco to create a best-in-class solution based on Deep Learning and Computer Vision. Binary VR develops real-time face detection technology built by leading computer vision and Deep Learning experts. The technology covers face recognition, facial landmark tracking and facial expression recognition. A 3D character is generated in real time: using an AR filter, a VR avatar is created and given the facial expressions of the person it is modeled on.

Affectiva grew out of the MIT Media Lab. The company is a pioneer in emotion recognition using artificial intelligence. Affectiva understands how important a role emotions play in all aspects of our lives: they shape experiences, interactions and decisions. Yet in today's technological world, emotions are either absent or greatly simplified.
Cyntient AI is a software platform that uses artificial intelligence to simulate human behavior in video games and simulations. It makes it possible to create virtual characters that react to the player's behavior, analyze the situation and learn as the game progresses. The result is realistic personalities that are intelligent, intuitive and emotional.
Target segments of consumers of the created product and assessment of effective demand
Target segments: company sales departments, HR services, security services, medical and educational institutions.

Human Resources Department
Selecting an employee costs a company, on average, 1.5-2 times the employee's salary (average total monthly income). This amount consists of:
  • Direct costs are what we pay to get a candidate to whom we are ready to make an offer. This includes the cost of job sites and other resources where candidates are sought, the time of the employee doing the selection, and the time of other specialists involved in assessing the candidate. Here we also include the time it takes to discuss the need for hiring, draw up a candidate profile and vacancy, and evaluate the test task, as well as the cost of preparing personnel documents at hiring, etc. This already approaches an amount roughly equal to the desired employee's average monthly income.
  • Indirect costs should take into account all additional costs, for example a bonus or an extra day of vacation for someone who carries an increased workload while a new candidate is being selected, equipment for the newcomer's workplace, and dismissal costs and compensation if we are not talking about expanding the staff. In total, selecting, for example, an average developer with a salary of 60,000 rubles will cost about 100,000 rubles. Taking the first candidate who comes along will not reduce the costs. More precisely, they will decrease at first and then come back with interest in the form of a failed project, money returned to the client and huge reputational losses. Adaptation is often overlooked.

In any case, adaptation as a process will take place whether we like it or not, but it is within our power to influence the process and its duration. It lasts from 3 to 6 months and is conventionally divided into two parts, production and psychological. During this time the employee requires more attention and works at 50 to 70% efficiency, which means the salary is paid partly in advance. Add the time colleagues spend explaining and answering questions rather than completing their current tasks. Communication on tasks is also more difficult, because the arrival of a new person is stressful for the whole team. What a company loses if an employee leaves during probation can be calculated as: recruitment costs + probationary salaries + colleagues' time + indirect costs, including office fees.
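The loss formula above can be sketched as a small calculation. The salary (60,000 rubles) and recruitment cost (about 100,000 rubles) come from the text; the monthly figures for colleagues' time and indirect costs are hypothetical placeholders chosen only to make the example run.

```python
def probation_loss(recruitment_cost, monthly_salary, months,
                   colleague_time_per_month, indirect_per_month):
    """Sum the four loss components for an employee who leaves after `months`."""
    salaries = monthly_salary * months
    colleague_time = colleague_time_per_month * months
    indirect = indirect_per_month * months
    return recruitment_cost + salaries + colleague_time + indirect

# From the text: 100,000 RUB recruitment cost, 60,000 RUB monthly salary.
# Hypothetical: 15,000 RUB/month of colleagues' time, 10,000 RUB/month indirect costs.
loss_3_months = probation_loss(100_000, 60_000, 3, 15_000, 10_000)
loss_6_months = probation_loss(100_000, 60_000, 6, 15_000, 10_000)
print(loss_3_months, loss_6_months, round(loss_6_months / loss_3_months, 2))
```

Because the recruitment cost is paid once while the other components accumulate monthly, a longer stay before leaving raises the total loss substantially.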

And if a person did not adapt and left after working not 3 months, as we calculated, but 6, this amount will almost double. It is unlikely that anyone wants to simply throw this money away. The Russian market for recruiting services began to grow in 2017 after a protracted stagnation, according to a study by the Association of Private Employment Agencies (ACHAZ), which brings together the largest Russian recruiting companies. In 2017, Russian recruiting agencies filled many more vacancies than in the previous three years, the study showed. According to ACHAZ, in 2017 recruiters filled 44,000 vacancies worth RUB 7.4 billion, which is 15% higher than in 2016 (RUB 6.5 billion).

The total number of vacancies transferred to agencies increased by 28% compared with 2016, and the recruitment services market as a whole grew to 66.3 billion rubles. According to the study, the number of vacancies closed by recruiters increased for the first time in three years, since 2014. In 2017 the labor market revived thanks to slight economic growth: by then, demand for development had built up in companies, and at the first opportunity they began to look for people, says Rustam Barnokhodzhaev, Director of Key Accounts at Unity.

In addition, he notes, in 2017 candidates also became more active in the market: during the crisis years they held on to their jobs, and now they decided to look for something better. In 2017, employers mostly continued to recruit staff on their own, but in industries where the economic recovery was felt more strongly they more often filled vacancies with the help of external recruiters, says Tatyana Baskina, deputy general director for work with the professional community at the Ankor HR holding. According to her, this happened in the production and sale of consumer goods, in heavy industry, energy and the agro-industrial complex. With the help of agencies, in 2017 companies most often hired specialists - 36% (36% in 2016) and, less often, line managers - 30% (33% in 2016).
The above describes one vector of target customer demand. We can say with confidence that the areas described, such as company sales departments, HR services, security services, and medical and educational institutions, need and are interested in the project, as it will significantly reduce the cost of evaluating candidates and increase the accuracy and speed of decision-making. For sales managers, the problem is the inability to quickly find a common language with the client, which leads to losing the client. For companies in the security sector, a lie detector test takes three hours or more, while the program needs no more than 3 minutes to detect a lie. The program can monitor the psycho-emotional state of any employee around the clock. For medicine, it offers a primary diagnosis within 2 minutes based on a video recording made by the patient. For education, it offers constant monitoring of audience interest in offline and online learning. Thus, the need for product development lies in market demand in the following areas: increasing sales through an individualized approach without long-term training of low- and medium-skilled employees; optimizing HR department staffing by reducing the time spent on job interviews without losing quality in the applicant's assessment; receiving objective feedback from the client without direct questions about the product; identifying fraudulent schemes in the organization and taking preventive measures against them; and automatic monitoring of employees' predisposition to deviant behavior.

There are several areas of activity:
  • Business. Here the program can replace specialists who recognize lies and accompany clients to business meetings to determine whether a prospective partnership is a scam.
  • Insurance companies. Insurance companies also turn to the services of verifiers. The work comes down to identifying fraud in insurance claims.
  • Banking organizations. Creation of an online credit scoring system.
  • Auditing companies. When checking the integrity of accountants, this is an effective addition to the traditional checking of accounting documents.
  • Transport sector. Traffic safety, anti-terror measures.
  • Hotel business. Aimed at preventing crime in hotels.
  • Recruiting. The program helps the recruiter recognize deception on the part of a candidate and uncover unflattering facts in their biography (large debts, criminal history, gambling addiction, etc.).

The main technical parameters that determine the quantitative (numerical) and qualitative characteristics of the finished product
The video stream is processed online. The results of its analysis are returned several times per second. In the process, we:
  • determine the current emotional state and its severity on the emotion / strength scale;
  • determine the current emotional state and its severity on the positive-negative / excitement-inhibition scale;
  • predict a person's response to stimuli based on the prevailing emotional state;
  • predict a person's response to stimuli based on the analysis of appearance and behavior;
  • compare both predictions to see which reactions are confirmed and which are neutralized;
  • build a graph of the change in emotional states over time;
  • build a graph reflecting increases in heart rate and blinking, rapid or held breathing, and the appearance of facial reactions characteristic of stress;
  • analyze changes in the speed of speech and in the pitch and timbre of the voice.
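The comparison step in the list above, where one prediction confirms or neutralizes another, could look roughly like the following. This is a minimal sketch under stated assumptions: the source does not describe the actual rule, so the agreement threshold, the averaging and the reaction labels are all hypothetical.

```python
def reconcile(state_based, appearance_based, min_confidence=0.5):
    """Compare two sets of predicted reactions.

    Each argument maps a reaction label to a confidence in [0, 1].
    A reaction is confirmed when both predictors report it with enough
    confidence; otherwise it is treated as neutralized.
    """
    confirmed, neutralized = {}, []
    for label in sorted(set(state_based) | set(appearance_based)):
        a = state_based.get(label, 0.0)
        b = appearance_based.get(label, 0.0)
        if a >= min_confidence and b >= min_confidence:
            confirmed[label] = (a + b) / 2  # hypothetical: average the two confidences
        else:
            neutralized.append(label)
    return confirmed, neutralized

# Hypothetical predictions from the two predictors
from_state = {"irritation": 0.8, "interest": 0.6}
from_appearance = {"irritation": 0.7, "withdrawal": 0.9}
confirmed, neutralized = reconcile(from_state, from_appearance)
print(confirmed, neutralized)
```

Here only the reaction reported by both predictors survives, and the two reported by just one of them are dropped.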

Output information:
  • emotional state and its strength (a normalized matrix with a value for each factor);
  • stress level (heart rate, breathing, blinking, facial expressions - numbers, excitement - yes / no);
  • anticipated response to stimuli (text).
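Since the service is described as returning timestamped numeric values in JSON, one output record covering the three items above might be shaped as follows. This is a sketch only: every field name, the normalization to a sum of 1 and the sample values are assumptions, not the project's actual schema.

```python
import json

def build_record(timestamp_s, emotions, heart_rate, breathing_rate,
                 blink_rate, expression_score, excited, anticipated_response):
    """Assemble one timestamped output record; all field names are hypothetical."""
    total = sum(emotions.values())
    # Normalize factor values so they sum to 1 (one possible reading of "normalized")
    normalized = {name: round(value / total, 3) for name, value in emotions.items()} if total else dict(emotions)
    return {
        "timestamp": timestamp_s,
        "emotional_state": normalized,
        "stress": {
            "heart_rate": heart_rate,
            "breathing_rate": breathing_rate,
            "blink_rate": blink_rate,
            "facial_expressions": expression_score,
            "excitement": excited,  # yes / no
        },
        "anticipated_response": anticipated_response,  # free text
    }

record = build_record(12.25, {"joy": 1.0, "anger": 3.0}, 88, 18, 24, 0.4, True,
                      "likely to object")
payload = json.dumps([record])  # the service is said to return an array of such values
print(payload)
```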

Parameters:
  • continuous processing of the input stream at a resolution of at least 1280 × 720;
  • input stream processing speed - at least 40 frames per second;
  • forecasting accuracy - not less than 90%;
  • output rate of numerical data - at least 4 values per second;
  • time interval shown on charts - at least 30 seconds.
Prototypes of tools to improve accuracy have been prepared (testing and training work units are available).
Design requirements for the finished product: video camera resolution of 1280 × 720 or higher.
Minimum system requirements: cloud disk storage, purchased or rented computing power and RAM.
Functional requirements: the ability to use all software functions.
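The numeric minimums listed above lend themselves to a simple configuration check. This is a minimal sketch, assuming a hypothetical `StreamSettings` structure; only the thresholds (1280 × 720, 40 frames per second, 4 outputs per second) come from the text.

```python
from dataclasses import dataclass

@dataclass
class StreamSettings:
    """Hypothetical description of one processing setup."""
    width: int
    height: int
    frames_per_second: float
    outputs_per_second: float

def unmet_requirements(s):
    """Return a list of the stated minimums that the settings fail to meet."""
    problems = []
    if s.width < 1280 or s.height < 720:
        problems.append("resolution below 1280x720")
    if s.frames_per_second < 40:
        problems.append("processing speed below 40 frames per second")
    if s.outputs_per_second < 4:
        problems.append("fewer than 4 numeric outputs per second")
    return problems

def frames_per_output(s):
    """How many processed frames feed each numeric output."""
    return s.frames_per_second / s.outputs_per_second

minimum = StreamSettings(1280, 720, 40, 4)
too_slow = StreamSettings(1920, 1080, 25, 4)
print(unmet_requirements(minimum), frames_per_output(minimum))
print(unmet_requirements(too_slow))
```

At the stated minimums, each numeric output would summarize ten processed frames.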

Non-functional requirements
Security requirements fall into three broad categories: requirements related to access control; requirements related to working with private data; and requirements aimed at mitigating risks from external attacks.
Plans for the creation and protection of intellectual property: during the implementation of the project, it is planned to obtain the patent "Method for measuring and analyzing human behavior" in 2020-2021.

Additional development in this project is planned by 2021:
  • a subsystem that uses video and audio data to assess the emotional state of a person;
  • a subsystem that structures the emotional state with the help of context (modules of structured conversation, interview, arbitrary context);
  • subsystems for regular collection and storage of information about the emotional state of a person;
  • subsystems for analyzing the collected data to identify deviations.

It is also planned to create a software package for medical services, designed to optimize and improve the reliability of the work of medical care institutions.
As a result, the package will make it possible to:
  • highlight deviations or the absence of deviations (only for diseases for which the neural network has been trained);
  • keep statistics, save states, analyze changes over time and signal potentially critical changes (only for diseases for which the neural network has been trained);
  • scale the listed functions to any region and any number of patients;
  • scale the listed functions to any diseases and disorders for which face and body movements, skin condition and other external signs allow referral for additional diagnostics.
 

Artificial intelligence taught to search for a person by description using video cameras




A group of Indian researchers taught an algorithm to search for people in video surveillance footage on its own. It is enough to tell the system a person's height, gender and clothing color, and after a while it will flag those who match the description.

The developers say that their system most resembles a search engine, only unlike the usual Google, this one works with a video stream. The request can be a set of a person's distinguishing features. The scientists were able to get the algorithm to associate data about a person's height, gender and clothing color with what actually happens in the video. To do this, they used a convolutional neural network architecture, one of the deep learning tools that works effectively for image recognition.

In practice, it looks like this: in response to the request “find men 180 cm tall, wearing red shirts,” the algorithm returns a set of frames that satisfy the request. There is no need for a person to view the entire video. At the moment, the system correctly identifies 28 people out of 41.

For an early stage this is a decent result, especially since the algorithm was also tested on low-quality and complex samples, The Next Web reports. The researchers promised that in the future the list of criteria will grow and the quality of recognition will improve. In a few days, a person on the wanted list can pass under hundreds of such cameras. The total amount of video footage from them is tens of thousands of hours.
 