Fixing ATMs: fixing hardware and software

Tomcat · May 24, 2024

Software errors are a common reason an ATM gets stuck. The software is constantly being updated and developed. For example, we learned a lot after the NFC update, when customers started using phones instead of cards. The biometric updates were especially interesting, when money was withdrawn not by card but by face. One of them needed a distance-measurement hotfix: a customer chose to withdraw money, bent down to tie a shoelace, and the money was withdrawn from the account of the next person in line.

The second reason is that every piece of hardware has a finite time to failure. We are one of the first companies on the market to practice preventive maintenance. We try to anticipate when something will break and service it in advance, to avoid unexpected breakdowns.

The third cause of malfunctions is external circumstances, including customers. A wad of banknotes held together by an elastic band or pierced with a staple is an everyday occurrence. Bent and taped cards are also common.

The ATM is constantly being improved to filter out foreign objects. There are metal sensors to catch paper clips. In some models the tray is designed so that the money is first shaken, so to speak, and anything that could be stuck between the bills falls into a lower tray where debris is collected. The bills are then counted inside the ATM.

Once the debris has been removed and the money counted, the ATM validates the notes against all the Central Bank's parameters to confirm that they are genuine.

If everything succeeds, it credits them to the client's account. If validation fails, some of the bills may be returned to the client with a message that they were not recognized. If a fatal technical failure occurs, for example a bill jam, and part of the money remains in the ATM, the client has to contact the bank's hotline.

I'll tell you how we handle all these challenges and maintain our ATM fleet.

Let me briefly remind you what an ATM consists of:

a system unit running Linux or Windows;
a monitor;
a printer for receipts;
a card reader that accepts and returns the card;
a cash module for dispensing and accepting money;
a keypad;
the interior, which holds the money, paint and many other interesting things.

From an architectural point of view, this is almost an ordinary PC, except for the parts that accept and dispense money and cards. On top of the OS sit the driver layer for the connected devices and the base software layer. The latter ties together all the "insides" of the ATM. Above the base software is the application software, which renders information from the browser on the monitor and provides remote control.

Longer!

Task #1 is to ensure the longest possible error-free operation of the ATM, in other words, its availability for the client. To do this, we rely on two principles. The first is to increase the uptime between failures. The second is to respond quickly to errors.

An error can be resolved:

by rebooting;
by a software operation (for example, clearing a huge log), performed by a robot or a remote operator;
on site, by an engineer working with the hardware.
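The three options above can be read as an escalation ladder. Here is a minimal sketch, assuming the options are tried from cheapest to most expensive; the names `Step` and `resolve` and the `attempt` callback are hypothetical and are not taken from the bank's actual software.

```python
from enum import Enum
from typing import Callable, Optional


class Step(Enum):
    """Resolution options, listed from cheapest to most expensive (assumed order)."""
    REBOOT = 1
    REMOTE_SOFTWARE = 2   # robot or remote operator, e.g. clearing a huge log
    ENGINEER_VISIT = 3    # on-site work with the hardware


def resolve(attempt: Callable[[Step], bool]) -> Optional[Step]:
    """Try each step in order and return the first one that fixes the error.

    `attempt` is a hypothetical callback that performs the step and
    reports whether the ATM is healthy afterwards.
    """
    for step in Step:  # Enum members iterate in definition order
        if attempt(step):
            return step
    return None  # nothing helped; the incident stays open
```

For example, `resolve(lambda s: s is Step.REMOTE_SOFTWARE)` returns `Step.REMOTE_SOFTWARE` after the reboot attempt has failed.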

An on-site visit is a complicated matter, because there is money inside and the engineer often has to go together with the cash collectors. Few people would want to work with a screwdriver while surrounded by 8 million rubles. Preventive maintenance is also difficult, because all of it has to be coordinated and carried out.

One repeated failure costs us as much as ten overdue service requests. In this situation, long but thorough preventive maintenance is actually more economical than a quick "switch it off and on again" fix, because the error-free operation time increases and the risk of a repeated failure decreases.

Jarvis the robot, our assistant

We have developed robots that manage software delivery to ATMs. A person gets involved only if an error occurs. When an error occurs and the defect cannot be fixed remotely, we take a software snapshot and automatically send the engineer a message describing the nature of the error. If necessary, we then create a request to close the ATM and send it to the engineer.

Ideally, though, the goal is not the fastest possible reactive fixing of errors but proactive maintenance:

Diagnose errors in order to learn to solve them more and more effectively, and tell the engineer before departure where the error occurred, what the problem is, and what to bring.
Analyze device operating time, time between failures, and error categories.
Launch event-based monitoring driven by signals from a running ATM.

The goal is to prevent a fatal failure, in which the function is completely lost.

It is precisely for this kind of proactive service that we turn to robots.

The smart robot we created, called Jarvis, monitors events on the live network and raises proactive incidents for devices. Even if a failure does occur, it is not fatal, and it can be fixed in a planned manner at a convenient time. This rules out situations where the ATM installed in an office swallows someone's card on payday or fails to dispense money.

Nero Burning ROM

An ATM is no longer just a machine for accepting and dispensing money. Paying for utilities and phone service has already become routine functionality. With the advent of biometrics it became possible to identify the client by face, and QR codes make it possible to perform financial transactions without a plastic bank card. For example, a "topless" ATM, where a large monitor replaces the usual upper part of the machine, is not designed to take a card at all. The customer simply scans the QR code on such an ATM and withdraws cash via the mobile app.

The functionality will continue to improve so that the client can get as many services as possible, conveniently, through this iron box. The ATM has become a kind of sales channel, with a personal account for each user and its own set of settings.

Most common failures

First among the three most frequent incident types are failures of the bill acceptance module. This is the most vulnerable part of the ATM, because it is the part most exposed to the customer. Crumpled bills mixed with candy wrappers and paper clips are the root cause of its failures.

Once in the ATM, the money does not fall to the bottom as in a piggy bank and lie there in one pile. The bills have to be sorted and placed into cassettes by denomination.

All this is done by the ATM's hardware mechanism. Repairing it or replacing any broken parts is a long and expensive process. On top of the cost of spare parts and the losses from ATM downtime, there is the cost of bringing in cash collectors to fully unload the ATM and the cost of an on-site visit by the service department.

The second most frequent category is cards that are forgotten by the client or bent. A forgotten card does not stop the ATM: the customer forgets the card and walks away, the card is retracted into the tray, and the customer later goes to the bank. But if the customer somehow forces a bent card into the ATM and it does not come out again, or it gets caught in the module, the ATM stops working.

The third category is failures related to communication problems, including those caused by power outages, for example when someone unplugs the ATM from the wall outlet to charge their phone.

The ATM is connected to the central host via radio communication. While the connection is lost, it cannot provide services to customers.

Types of communication errors:

DOWN: communication error;
RSRV: backup channel error;
CHNL: primary communication channel error.

With RSRV and CHNL the ATM stays connected and the incidents are resolved in the background, but with DOWN no operations are possible. In that case the downtime may be even longer than when the bill acceptance module fails. At the same time, communication problems are resolved almost without engineers or cash collectors: everything can be repaired through remote diagnostics by robots, without operators. On average, 80-90% of ATMs in the network recover from such failures within a few minutes. And even if a service center has to be called, the work is done by lower-skilled, inexpensive specialists. This category of errors is therefore much cheaper than failures caused by incorrect customer actions.
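The difference between the three codes can be sketched as follows. This is only an illustration under assumptions: the codes and their meanings come from the list above, while the `CommError` type and the `can_serve_customers` function are hypothetical names.

```python
from enum import Enum
from typing import Set


class CommError(Enum):
    DOWN = "communication error"
    RSRV = "backup channel error"
    CHNL = "primary communication channel error"


def can_serve_customers(active_errors: Set[CommError]) -> bool:
    """Return True if the ATM can still perform operations.

    Per the description above, RSRV and CHNL leave the ATM connected,
    while DOWN makes any operation impossible.
    """
    return CommError.DOWN not in active_errors
```

So an ATM with only a CHNL incident keeps working over the backup channel, while any set of incidents that includes DOWN takes it out of service.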

Simply put, everything that requires disassembling and reassembling equipment is a costly failure.

It is expensive not only because people's work must be paid for but also because customer transactions are lost. We therefore need to reduce the cost of solving each problem and minimize the number of failures. Best of all is to learn to predict where and why a failure may occur.

I have already mentioned the concept of time to failure. Each module has its own rated time to failure: the number of bills the module can pass without an error under laboratory conditions. This parameter has to be evaluated for each specific line and each specific region in order to predict as accurately as possible what time between failures is normal for a particular ATM.

This indicator lets us estimate when to expect a failure. Of course, we do not wait for it and then rush in a panic to fix it. We start preventive work when it is cheapest for us, for example during a planned cash collection outside the payday period.
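As a rough illustration of this kind of forecast, here is a minimal sketch. It assumes a constant average number of bills per day and picks the last planned collection before the expected failure. Both function names and all parameters are hypothetical, and a real model would also account for the line, the region and payday periods.

```python
from datetime import date, timedelta
from typing import List, Optional


def expected_failure_date(rated_bills: int, bills_since_service: int,
                          bills_per_day: int, today: date) -> date:
    """Estimate when the module reaches its rated number of bills."""
    if bills_per_day <= 0:
        raise ValueError("bills_per_day must be positive")
    remaining = max(rated_bills - bills_since_service, 0)
    return today + timedelta(days=remaining // bills_per_day)


def pick_maintenance_date(planned_collections: List[date],
                          failure_date: date, today: date) -> Optional[date]:
    """Choose the latest planned collection that falls before the expected failure."""
    candidates = [d for d in planned_collections if today <= d < failure_date]
    return max(candidates) if candidates else None
```

If no planned collection falls before the expected failure, the function returns `None`, which in practice would mean scheduling a separate visit.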

Another way to make predictions is ATM diagnostics. In the event log we note when, for example, warnings appear that the card reader failed to read a card. If the event repeats, there is probably a problem in the card reader and it may soon break, for example with the message that the chip station does not work. For now, it is either dirty or working with intermittent warnings.

Thus we plan the work to restore all of the device's functions before it breaks down completely, so that the failure does not come as a surprise.
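The idea of spotting repeated warnings can be sketched like this. The event format (pairs of device name and level), the `"WARNING"` label and the threshold of three are assumptions made for the example, not the real log format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def proactive_incidents(events: Iterable[Tuple[str, str]],
                        threshold: int = 3) -> List[str]:
    """Return devices whose warning count has reached the threshold.

    `events` is an iterable of (device, level) pairs taken from the log.
    """
    warnings = Counter(device for device, level in events if level == "WARNING")
    return sorted(device for device, count in warnings.items() if count >= threshold)
```

A device that shows up in the result has not failed yet, but it is a candidate for cleaning or replacement during the next planned visit.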

A very unpredictable source of problems for ATMs turned out to be the small stickers from supermarkets, the ones the cashier means when asking: "Do you collect stickers?"

These tiny stickers get lost among customers' bills, end up in the ATM and wreck the entire system. When the ATM starts counting bills, the stickers peel off the money and stick to the mechanisms. This leads to a fatal failure and subsequent disassembly of the entire hardware. The stickers are small and hard to detect. They slip between the sensors and the ATM simply does not see them.

To minimize failures of this kind, the ATM mechanism has to be constantly improved. For example, in a number of devices the trays are arranged so that the money is first shaken, so to speak, so that anything stuck between the bills falls into a special debris tray, and only then are the bills counted and sorted. The mechanism can repeat this procedure several times with the same bills, and if something still prevents it from recognizing the money, it returns the money to the client with a corresponding message. It can also happen, however, that the money is neither returned nor credited to the account.

We have already worked through this negative scenario. The money got stuck in the ATM and never reached the client's account. An angry customer is about to call the bank, but we get ahead of them: they automatically receive an SMS or a notification from the bank's mobile app saying that we know about the ATM error, that the funds have been credited to the account, and that we will fix everything.

We tracked the error remotely, and by analyzing logs we automated the process of returning funds to the client. We are moving toward robotizing all these processes. At the moment, almost all incidents are resolved by robots, except those where the decision is based on reports submitted by cash collectors and engineers when they close their requests. The report is free text written by people: abbreviated and with errors. The robot cannot always interpret it correctly. But here too we are already working on understanding this text without people: we will use machine learning to convert it into a formalized format for robots, on which automated incident-resolution scenarios can be built. This area, about 10-15% of incidents, is so far only partially covered, so operators still have to be involved. Overall, though, 85-87% of incidents, depending on the month, are resolved automatically. There are a number of real cases where the team never even learns that an error occurred: everything is fixed automatically.

Advantages of robotization

Robots have already allowed us to:

Reduce the cost of processes: if the number of defects from the robot is the same as or lower than the number caused by the human factor, we move the process to automation entirely.
Increase coverage: the robot handles more tasks than a human, and we do not waste operator resources.

The main thing is availability for customers

In 2012, we started working on automation, intelligent business process development, and training of our partners and engineering services. With the move to automation, the availability of our ATMs (the share of time when all functionality works properly) has tripled compared to 2012. It is now more than 97%.

During this time, both the ATM network and the service itself have become more complex. Initially we had rather low network availability even on ATMs that only dispense cash, despite the fact that such a device is technically much more available than an ATM that accepts cash. Now we have one of the highest network availability figures for ATMs that accept money.

In the future, ATMs should be serviced like cars: once a year the car goes in for maintenance to replace filters, oil and so on, and the rest of the time it reliably takes its owner wherever they need to go. The same should apply to an ATM: maintenance once a year, and the rest of the time it just works. For now this is only a dream, but it is something to strive for.
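Availability in this sense is simply the share of time during which everything works. Here is a minimal sketch of the calculation. The function name and the sample figures are illustrative assumptions, not the bank's data.

```python
def availability(total_hours: float, downtime_hours: float) -> float:
    """Share of time (0.0 to 1.0) during which all ATM functionality worked."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return (total_hours - downtime_hours) / total_hours
```

For example, an ATM that was down for 21.6 hours in a 720-hour month has an availability of about 97%.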
