How we listen to and analyze every call to the bank

First, a DSS LAB model converts voice to text; then come LSTM classifiers, Spacy and Yargy entity extraction (the Natasha library), Pymorphy2 lemmatization, Fasttext and Word2Vec corrections, three different summarizers, and our own solutions. We can analyze a call not only to understand what the dialogue is about, but also to find places to improve banking products after the dialogue ends.

For example, when certain keywords such as “letter of credit” or “escrow” are recognized in speech, the operator is shown a reference card; when deposit names come up, their exact tariffs are shown, and so on. There is no need to press anything. This feature is currently in beta testing.
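As an illustration only (the article does not describe the hint engine's internals), the keyword-to-reference mechanism can be sketched as a lookup over the recognized text; the card names and terms below are hypothetical:

```python
# Hypothetical sketch of the keyword-to-reference hint: when a known term
# appears in the recognized utterance, the matching help card is surfaced
# to the operator without any clicks. All names here are illustrative.
HINT_CARDS = {
    "letter of credit": "Reference card: letters of credit (terms, processing steps)",
    "escrow": "Reference card: escrow accounts (opening and settlement rules)",
}

def hints_for(utterance: str) -> list[str]:
    """Return help cards triggered by keywords found in the utterance."""
    text = utterance.lower()
    return [card for term, card in HINT_CARDS.items() if term in text]
```

In production such a lookup would run on the streaming transcript, but the idea is the same: the operator sees the card, and can use it or ignore it.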

Example:
At the input: “...why is an ATM with a transfer to another bank for me?”
Correction: “why does the ATM refuse to transfer me to another bank?”
Highlighted key message: “the ATM fails.”
Action: the operator is offered call routing, and the call is classified for statistics.

The recognition itself works as follows:
  1. The voice is split into phonemes; phonemes are assembled into words by the same solution.
  2. Various client data are removed from the result: card numbers, code words, and so on.
  3. The resulting stream of words is then given punctuation (periods and commas) and capital letters: the downstream neural networks are very sensitive to this. Typos are corrected, and terms (for example, geographic names) are fixed.
  4. The output is text dialogues, as in a chat: a neural network analyzes them, trying to attach meaning in real time.
  5. After the call ends, the texts are also analyzed by the neural networks responsible for collecting various metrics for voice and chat support.
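Step 2 above can be sketched as a simple regex pass. The real redaction is an in-house solution the article does not detail; the pattern and placeholder below are purely illustrative:

```python
import re

# Illustrative sketch of step 2: removing client payment data from the
# recognized word stream before any further analysis. A run of 13-19
# digits (optionally separated by spaces or hyphens) is treated as a
# card-number candidate and masked.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def redact(text: str) -> str:
    """Replace card-number-like digit runs with a placeholder."""
    return CARD_RE.sub("[REDACTED]", text)
```

A real system would also mask code words and other personal data, likely with entity recognition rather than regexes alone.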

Let's show real (anonymized) examples of dialogues to make this clearer.
The first and main use of post-analytics is scoring which features are needed in the application. Whatever tops the call center in the number of questions or requests is automated, or made fastest to complete via the interface.

The second most important feature is automatically highlighting the key points in dialogues and building a communication history in the form of short summaries such as “the client wanted courier delivery” or “lives in a village”, rather than pages of text.

Practical use​

The main practical application is seeing a real-time picture of what is happening with the products. That is, after rolling out to the entire contact center (it is now in beta for several dozen operators), we will see the following:
  • what questions clients are asking right now;
  • where the bottlenecks in the product are;
  • where something is going wrong.

Naturally, operators already know this, but they cannot always accurately and quickly assess how widespread a problem is (since each sees only their own requests). Also, it is not always possible to do proper post-analytics on some new segment. If we have the texts of all calls, we can simply rerun the analysis and get results fairly quickly.

The next practical application is assessing the value of automation features in the app (to relieve the contact center and give clients a better way to solve their problems on their own).

What’s at the input: “... the only thing is that in ours we live in a village, well, the closest store in the city is Kurchatova, next door, we don’t have your bank, so I won’t be able to get such a card.”
Highlighted key message: “We don’t have your bank near us.”

When a client rejects a product for some reason, marketers want to know why: this is needed to understand whether the product is competitive. Here the conversation was about getting a card. It is important for the product specialist to know the reason: there are no branches or ATMs nearby. Based on such recognitions, a decision is made to extend the classifier of reasons for rejecting the product. The neural network recognizes this as a refusal by the absence of the key messages that would confirm the client wants to receive the card.

Next: we have a neural network that builds a summary of the dialogue. Thanks to it, the operator can understand the client’s call history in 10 seconds instead of reading entire dialogues. These summaries are not perfect, but they still help to understand what happened. This potentially removes one of the most frustrating things about phone support: having to re-explain to each operator what happened in previous calls. Or it simply spares long pauses on the line. For example, it is good if the previous dialogue looks like this in the summary: “The client took out a debit card for his wife.” Here’s an example:

What’s at the input: “yes, that’s true, yeah yeah, and the career is better, yeah, yeah, but it’s better, let’s say in the morning and there until 4 something like this, and if let’s say tomorrow, but it turns around after 6, yeah, no, no child’s school "

Highlighted key message: “courier is better.”

If we already know that it is more convenient for the client to use a courier, then next time he will be offered this method of receiving the card first. For such records there is no need to train an operator: these tags are generated automatically by the robot.

Here's a not-so-good example of summarization:

At the input: “Oh yes, hello, yeah, and then how to turn off SMS notifications Another client may be, yes, I understood the key well. No, no, no. Yes, in principle, this will probably be the kind of day off that will forever be. Yes, you can for Lenin Avenue 15, well, tell me which (last name). Yes, yes, well. Yes, of course, uh-huh uh-huh. Yes, I wanted 1000 N on a card from this card, let’s say, to others and you can transfer it to someone else for free or with a commission. So, yes, for any banks. Uh-huh, everything’s fine, thank you, uh-huh, goodbye.”
Key messages: “Well, yes, for any banks Uh-huh, everything is fine, thank you.”

Here the topic was defined as “transfers to any banks” (correctly), but in the summary of the dialogue the robot put: “Uh-huh, everything is fine, thank you.”

You can also provide context hints.

At the input:
“...tell me, I once saw a question like this in the bank’s app, I wanted to clarify what the conditions are now for my installment plan offer...”
Highlighted key message: “what are the conditions for the installment plan offer.”

This is still in the works; when we finish it, the operator will see in real time the specific conditions for this particular client. There is no need to click anything to get the hint; it works like a pop-up notification: you can use it or ignore it. Previously, the operator would manually search for these conditions in the interface or, for a complex request, go to the internal portal to figure it out. Right now the hint is correct about 70% of the time. This reduces hold time, which means a more satisfied customer.

Finally, you can monitor operators.

At the input: “yes, just a minute, please wait for approval for the product, what’s ready for you, your application has been approved for the amount of 400,000 rubles with an interest rate...”
Highlighted key message: “400,000 rubles.”

You may have heard stories where a client is quoted one loan amount (and specific conditions) over the phone, but comes to the bank and gets a different result. To prevent this, you need to track exactly which numbers were given to which client and how they relate to what the software showed the operator from the preliminary loan assessment. Here the key message is automatically matched against the number from the software: if they match, everything is in order; if they differ, a person listens to the dialogue and makes a decision.
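The matching step can be sketched as follows. This is an assumption-laden illustration, not the bank's code: the amount format, the function names, and the review policy are all hypothetical:

```python
import re

# Hedged sketch of the amount check: amounts the operator voiced
# (extracted from the transcript) are compared with the amount the
# loan-assessment software showed. Format and names are illustrative.
AMOUNT_RE = re.compile(r"\b(\d{1,3}(?:,\d{3})*|\d+)\s*rubles\b")

def amounts_mentioned(transcript: str) -> list[int]:
    """Extract all ruble amounts spoken in the transcript."""
    return [int(m.group(1).replace(",", "")) for m in AMOUNT_RE.finditer(transcript)]

def needs_review(transcript: str, approved_amount: int) -> bool:
    """Flag the call for a human listener if any voiced amount differs."""
    mentioned = amounts_mentioned(transcript)
    return bool(mentioned) and any(a != approved_amount for a in mentioned)
```

If `needs_review` fires, the dialogue goes to a person, exactly as the text above describes.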

Another monitoring feature: track whether the client confirmed their identity against their credentials and, if those checks did not pass, verify whether any amounts were mentioned in the dialogue. The operator has no right to disclose, for example, the amount of a debt to third parties, including the client’s spouse and children, and they are often very persuasive.

Recognition Features​

Voice-to-text recognition is a model from DSS LAB. To classify texts, we use LSTM models on Keras, as well as fine-tuned BERT models. Before classification with the “heavy” models, we assess the quality of the training data with fast models (logistic regression, naive Bayes, etc.) in order to identify possible problem spots, for example, when two classes are poorly separable, i.e. when the provided examples do not cover the topic with sufficient completeness. Such cases occur quite often for rare classes, for example, when bank products have similar meanings and names. In these cases, for better separability, we try to use meta-information about the client (which products the client has, previous dialogues with the client), as well as a semi-manual approach with thresholds set for such classes.
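A crude, stdlib-only stand-in for that fast sanity check can make the idea concrete. This is not the bank's method: instead of logistic regression or naive Bayes, it approximates class separability by cosine similarity of each class's bag-of-words profile; a similarity near 1.0 suggests the training examples do not cover the two topics distinctly enough:

```python
from collections import Counter
import math

def bow(texts: list[str]) -> Counter:
    """Bag-of-words counts over a class's training examples."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def class_similarity(class_a: list[str], class_b: list[str]) -> float:
    """Cosine similarity of two classes' word profiles (1.0 = indistinguishable)."""
    a, b = bow(class_a), bow(class_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Classes scoring near 1.0 would be the ones where, per the text above, client meta-information or manually set thresholds have to step in.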

To isolate entities (full name, cities, streets, currencies), Spacy is used, as well as Yargy parsers (Natasha library), and for lemmatization (reducing words to their initial form) - Pymorphy2.

Client data (personal data and payment data, secret words) is cut out by our own solution before analysis begins.

To correct critical typos (cities, names, addresses, names of bank products), Fasttext and Word2Vec models are used, as well as statistical approaches, for example, Levenshtein distance, dictionaries/rules.
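The Levenshtein/dictionary part of this can be sketched in a few lines of stdlib Python. The vocabulary and distance threshold below are illustrative, not the production values:

```python
# Sketch of the dictionary + Levenshtein approach to typo correction:
# a recognized token is snapped to the closest known product or street
# name when the edit distance is small enough.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(token: str, vocabulary: list[str], max_dist: int = 2) -> str:
    """Snap a token to the nearest dictionary entry, or keep it as-is."""
    best = min(vocabulary, key=lambda w: levenshtein(token, w))
    return best if levenshtein(token, best) <= max_dist else token
```

Embedding models such as Fasttext and Word2Vec would handle the harder cases where the misrecognized word is acoustically similar but not edit-distance close.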

To restore grammar (punctuation marks), a BERT model is used.

To summarize texts, we use the Python libraries Sumy and Summa, as well as T5 based on transformers.
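Sumy and Summa are extractive summarizers: they pick the most representative sentences rather than generating new text. As a hedged, stdlib-only illustration of that idea (not the libraries' actual algorithms, which use LSA, TextRank and similar methods), one can rank sentences by the overall frequency of their words:

```python
from collections import Counter
import re

# Minimal extractive summarizer: score each sentence by how frequent its
# words are across the whole dialogue, and keep the top-scoring ones.
def summarize(text: str, n_sentences: int = 1) -> str:
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(w for s in sentences for w in s.lower().split())
    ranked = sorted(sentences, key=lambda s: -sum(freq[w] for w in s.lower().split()))
    return ". ".join(ranked[:n_sentences])
```

The production summaries additionally run a T5 model, which can rewrite rather than merely select, and is what produces the short "the client took a debit card for his wife" style of digest.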

Dialogues are recorded in two channels: the bank operator in one, the client (subscriber) in the other, so we can always tell everyone’s speech apart, even when it overlaps. The operator’s side is structured enough to mark a lot of key points: in effect, each dialogue is a script with fixed branches, which can be taken almost directly from the CRM. For example, when collecting the reasons why a client chose our banking product over others on the market, the operator usually says something like: “Why did you choose us?” Binary choices are even more common: product selection scripts contain things like “What do you want more: to receive cashback, or to use the bank’s money interest-free?” The client makes a fairly clear choice at each stage, from which the operator understands what is best to offer, and we receive very well classified data sets.

Naturally, the operator does not speak in strict, fixed phrases, and neither do the clients. The neural network should be able to classify equally well both “I’m happy with your cashback” and “Uh-huh, yes” by analyzing the context.

We currently process 100% of conversations this way. Previously, without automation, 10% of the dialogues were listened to selectively, and statistics were noted on them by hand. Now there are specific numbers, like: “70% of conversations were recognized with quality above 95%; ‘Benefit card’ was said 793 times, which means an estimated 1,132 customers chose the product for this reason.”

The specific recognition quality is 86%: of 100 characters of text that a person listening to the conversation would type, 86 will match the output of the first layer of the model, before corrections. The correction model adds about another five percentage points of quality. Further: the Russian language is rich, and the same meaning can be restored from context quite accurately, but a number or a surname cannot be. The practical result is that 80% of dialogues can be automated; these are beta results, where the client may be outside, other people may be talking in the background, a baby may scream, and the caller may have an accent or a speech defect. Up to 97% of dialogues can be automated in terms of tracking certain important key points.
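The 86% figure is a character-level match rate between the model transcript and a human reference. One simple way to compute such a metric (an approximation with `difflib`, not necessarily the metric the team used) looks like this:

```python
import difflib

# Character-level accuracy: the fraction of reference characters that
# appear in matching blocks shared with the model's transcript.
def char_accuracy(reference: str, hypothesis: str) -> float:
    matcher = difflib.SequenceMatcher(None, reference, hypothesis)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(reference), 1)
```

On this scale, a transcript scoring 0.86 against the human version corresponds to the "86 of 100 characters" quality described above.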

Additional Applications​

Based on the corpus of texts and the outcomes of dialogues, we can quickly identify the things that lead to client negativity or that work well. This directly drives changes to operator scripts and the scaling of good practices.

Understanding the distribution of customer requests at any moment makes it possible to respond to the situation correctly. There is a predictable period when bank clients need to submit reports: at that moment, requests for certificates grow by 300%. That period is predictable, but if something unexpected sharply increases the importance of some functionality, that functionality can, for example, “pop up” at the top of the interface based on demand, so that the client does not call but does everything right away themselves.

Speaking of the abnormal: quite recently, during the acute period of late February and early March, when the situation was constantly changing, speech analytics let us quickly identify the top issues worrying our clients (not only exchange rates and product processing, but also worries about savings, about companies’ future operations, and so on). Taking this data into account, we quickly prepared our operators to answer the new difficult questions and gave them all the necessary information.

Another example: clients used to have questions about the composition of the loan payment. Most often they wanted a breakdown: why is it this amount? What does it include? Speech analytics showed us how much this mattered, so we immediately added payment details to the app, exactly the way people ask for them. Not just the amount and the due date, but the actual components: what part goes to the principal and is available for purchases, what part to commissions, interest, and so on. Naturally, we could have done this without analytics, but when the statistics gather themselves, exact numbers are much easier to justify.

Unprofessional operator behavior can also be monitored. For example, there is a stop list of jargon and internal terms (which the client does not understand) that can be recognized, with the analytics results then sent to the senior team in the contact center. Not to mention “greeted correctly”: we have one of the best NPS scores on the market in terms of politeness, but the automation still recognizes and tracks this.
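The stop-list check itself is trivial to sketch; the terms below are invented placeholders, since the article does not name the real ones:

```python
# Illustrative jargon stop-list check: flag operator utterances that
# contain internal terms a client would not understand, so the senior
# team can review the call. The stop list itself is hypothetical.
STOP_LIST = {"chargeback form 42", "kyc flag", "backoffice ticket"}

def jargon_found(operator_utterance: str) -> list[str]:
    """Return stop-list terms present in the operator's utterance."""
    text = operator_utterance.lower()
    return sorted(term for term in STOP_LIST if term in text)
```

Any non-empty result would be attached to the call record and surfaced in the analytics report for the contact-center leads.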

Among our mistakes: we tried to check the call checklist (script adherence) on the fly, as the call progressed. It turned out that colleagues from Yandex had also tried this, and it did not work out well for them either. They also shared a recommendation: do not show the operator the progress status of every stage of the script. Only the essentials are needed, such as the next step; otherwise the operator gets confused.

We were also, of course, concerned whether operators in the beta group would shift their speech toward greater intelligibility and more “protocol” phrasing, knowing that not the smartest robot in the world was listening to them. We analyzed speech rate: it did not change during the experiment, that is, everyone continues to speak as before.

Beta results​

Post-analytics feeds 20 NLP reports: analysis of the quality of customer interactions, classifiers of customer requests, evaluation of Home Credit Bank product offers, identification of refusal reasons, compliance with federal laws, development of online hints, and so on. Based on the results of data processing by ML models or Python scripts on the business side, final reports are generated for each task.

We process 1.5 million calls monthly, 100% are converted to text.