Tomcat
Inspired by this post. I love this genre; reading it reminded me of my own stories, and I wanted to share them. It wasn't always my fault, but I was always more or less involved. I hope this makes for an interesting Friday read.
As it happens, I've spent most of my life in fintech, so all the stories are related to banks and/or finance, where it was sometimes possible to cause quite real damage. The stories are in chronological order, with the "heaviest" one last, which is logical: the more experience an engineer has, the more damage they can do. These stories are old, so I may have forgotten or misremembered some details; this is not on purpose.
1. About the benefits of data validation
In the first half of the 90s, I wrote a shareholder register for what was then a fairly large holding company. People eagerly brought in their money on the promise of 60% per annum (they really paid it, no cheating), and to make it even more attractive, a lottery was announced: the owner of the share with the winning number would receive a significant sum. I don't remember exactly how much, but it was decent; you could buy an apartment with it.

In the last few days before the draw, there was such a rush that the operators who knew how to work with the system could not cope, so everyone who happened to be at hand was roped in to help. Each person was given a range of numbers, and people simply wrote down on paper who got which numbers. Naturally, all this data then had to be entered into the system, and the operators spent another half a night typing it all in. Since we could no longer keep the numbering sequence consistent, I had to quickly disable some validations in the data entry form. The data was being entered by people who knew the system well, so what could go wrong?
The end was near, there was almost nothing left to enter, and I was already mentally heading home, when one of our operators came in on shaky legs, close to hysterics, and said she had done something wrong. I ran to look, and it turned out that she had accidentally swapped the "starting number" and "quantity" input fields, so instead of a couple of shares, the client had bought tens if not hundreds of thousands. From the way the computer started thinking it over, when everything had been instant before, she realized something was off and immediately switched the computer off (yes, the logic was in the client; the server only held the data). Still, the records for a small number of shares, maybe a couple of thousand, had already been overwritten (each share was a separate record in the database; that was the data model), so we had to find the correct information in the paper records and re-enter everything by hand. I sat and dictated the data to the operator, and she typed it in. We went home in the early morning. No one was beaten, although I did get an earful (and rightly so, of course).
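The lesson of this story can be sketched as a plausibility check on each entry. This is a minimal Python sketch, not the original register: the `ShareEntry` type, the `validate_entry` function and the `MAX_SHARES_PER_ENTRY` limit are all hypothetical. It keeps only checks that don't depend on numbering order (a bound on quantity and an overlap test against blocks already issued), so a swapped start number and quantity is rejected instead of stored.

```python
from dataclasses import dataclass

# Hypothetical limit on how many shares a single entry may register.
# The real register's rules are not described in the story; 1,000 is illustrative.
MAX_SHARES_PER_ENTRY = 1_000


@dataclass(frozen=True)
class ShareEntry:
    holder: str
    first_number: int  # number of the first share in the block
    quantity: int      # how many consecutive shares the block contains


def validate_entry(entry: ShareEntry, issued: list[range]) -> list[str]:
    """Return a list of problems; an empty list means the entry looks plausible."""
    problems = []
    if entry.first_number < 1:
        problems.append("first share number must be positive")
    if not 1 <= entry.quantity <= MAX_SHARES_PER_ENTRY:
        problems.append(f"quantity {entry.quantity} outside 1..{MAX_SHARES_PER_ENTRY}")
    if problems:
        return problems  # no point checking overlaps for an implausible block
    block = range(entry.first_number, entry.first_number + entry.quantity)
    for taken in issued:
        # Two half-open ranges overlap if each one starts before the other ends.
        if block.start < taken.stop and taken.start < block.stop:
            problems.append(f"overlaps issued shares {taken.start}..{taken.stop - 1}")
            break
    return problems
```

With the fields swapped, an entry meant as `ShareEntry("holder", 150_000, 2)` arrives as `ShareEntry("holder", 2, 150_000)` and is rejected on the quantity bound alone, before it can touch the database.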
2. About the benefits of testing in production
This story happened right after the first one, the very next morning. A small digression about the architecture of my application is needed here. As I said, all the logic was in the client, and the data was stored on the server in Btrieve, an ISAM engine built into NetWare, very fast. Btrieve came in two versions, for NetWare (network) and for DOS (standalone), and, importantly, the data file format was the same. So the plan was to copy the server data to a laptop and look up the winning number locally, since the application could work with local Btrieve too (the whole procedure had been tested repeatedly). Have you guessed what happens next?

At some version, Novell dropped support for the DOS version of Btrieve, and shortly before that we had upgraded NetWare, and Btrieve along with it. So when I saw a message about a corrupted data file, I immediately realized that the upgrade had changed the file format and none of this could work locally. Somewhere at the TV studio, the spotlights were already warming up to show the whole world the lucky draw, not knowing that there would be no draw, because we could not quickly find out who the winning number belonged to. I decided this was my last day at work, so I sat down on my desk and lit a cigarette, right in the office, gazing thoughtfully out the window.
From my unusual behavior, the others realized that something was wrong. And the same operator to whom I had just spent half the night dictating shareholder data said something brilliant: "make a report of all shareholders with their share numbers to a text file, and just search the text file." A simple and obvious solution, especially since such a report had already been written, and running it took a minute. Within 15 minutes we were on our way to the TV station, where everything went like clockwork, because even I can't break a text file search that quickly.
3. About the harm of having a boss with a strong hand
Several large financial organizations decided to set up a currency auction. My company contributed its share: software that our department developed according to requirements from management. The system turned out to be very high-end: networked, multi-user, with fashionable Turbo Vision interfaces, data again in Btrieve, clients sending notifications to the server via SPX, data shown on a ticker, and nice transaction confirmations printed automatically (how we struggled with PCL5e!). And it was fast. At a time when almost everything was written in Clipper or FoxPro, it was out of this world.

The day of the first auction arrives. The equipment is installed, the ticker is full of numbers, and the printer is loaded with expensive watermarked paper: cool beyond belief. Half an hour before the start, the big boss comes over and says that the bank managers have just talked it over and decided to change how the auction is run. I don't remember exactly what the change was, but the point is that it didn't fit the logic I had programmed. I said I would not change anything half an hour before launch and was about to sit down on the desk and light a cigarette, but then I felt the strong hand of my immediate supervisor on my shoulder, pressing me into a chair, and heard his angry voice: "show me the code!" I don't know why I had brought floppy disks with the source code and the compiler; call it instinct. I quickly installed the compiler and opened the code, my boss and I looked it over, and we decided that the simplest and most reliable option was to add a goto in one place. Yes, a goto. In a Pascal program! Good thing old Wirth wasn't around at that moment.
Have you guessed what happens next? Wrong! Everything worked as it should. This was full-on Agile at a time when the word didn't even exist yet.
4. About what programmers can do with hardware
Some time later, my company went under, and I went to work for a well-known company that was mostly a hardware vendor at the time, though I did software support there, again in finance.

Then one day a colleague of mine who worked on ATM hardware told me about a problem ATM that kept losing its connection. The problem was clearly not the hardware, so I had to go and figure it out. If it has to be done, it has to be done: the next morning we got in the car and drove 150 km to the ATM. We arrive: the ATM is offline. We call the cash collectors and wait for them; they take the money out and leave us an open ATM. Just in case, I reinstall all the software, call the X.25 network operator and check all the communication parameters: everything is fine, the connection is up. We wait half an hour, an hour: still connected. We call the cash collectors, they load the money, enter the encryption keys into the ATM, and close it.
We leave the bank branch and sit down across the street for a bite. While waiting for my order, I notice that the picture on the ATM screen has stopped changing. I walk over: sure enough, offline. I call the operator: yes, the connection has dropped on our side. Lunch is canceled; we go back to the branch and call the cash collectors, who take the money out and leave us an open ATM. After a while, the connection magically restores itself! But what changed? The money cassettes can't really have an effect, can they? Maybe the door? I close the door: five minutes later the connection drops. I open it, and the connection comes back. That's something. We examine the door: it doesn't pinch anything, and yet it clearly matters. What else can a closed door affect? Temperature and humidity. Maybe there is a bad contact somewhere, a cold solder joint? Since the diagnostics don't report anything, the problem is most likely right at the output; maybe the wires to the RS-232 connector are badly soldered? I tug on the bundle of wires, and one wire comes off. We solder it, close the door, and wait: the connection holds! We call the cash collectors yet again. They arrive visibly annoyed, load the money and glare at us, because we've worn them out, and I don't like annoying stern men with guns, so I promise them this is the last time, and I hope it's true.
We try once more to grab a bite across the street, and this time we actually manage it; half an hour later the ATM is still online, so we leave.
P.S. Speaking of cassettes: once I visited a client who asked me to take a couple of broken cassettes back to my colleagues. I'm walking down the street carrying the cassettes, and I hear some little boys talking behind me, and one says to the other: "those are ATM cassettes!" And then it dawned on me that I was walking down the street alone with cassettes that the people around me recognized, while they had no way of knowing the cassettes were empty. And those were rough times, the mid-90s. But it was fine: I made it to the office and delivered the cassettes.
5. About how programmers can't (always) handle hardware
There was a card network called Europay, the European operator for MasterCard, which also had its own cards. To connect to their network, a Euromodule based on the IBM Series/1, a very ancient minicomputer, was used. This Euromodule served a dozen large local banks, so the bulk of MasterCard/Eurocard/Maestro transactions went through it. Something had to be changed in this Euromodule (a larger disk installed, I think), and it had to be done at night, when there are few transactions.

For some reason, they sent me to swap the disk, even though we had three dedicated hardware engineers. Probably just because I was the youngest and didn't mind going to the data center at 3 a.m. I arrive. They shut down the Euromodule, I swap the disk, and I power it up according to the instructions (a real hello from the 70s: first you power on the disk, wait until you can hear that it has spun up, then power on the system unit), and an error code appears on the panel. I look it up in the error code reference: a problem with the network interface. I pull out my "favorite" X.25 card and reseat it, and now the Euromodule reports that the card isn't detected at all. I struggled with it for another couple of hours, until the banks started calling to ask what was going on. Then the data center guys told me to get lost; they would sort it out themselves. As far as I know, they ordered a new card from Europay, which was shipped to them the same day, but I wasn't the one who installed it. Either way, there were no card transactions all day.
I don't know how I managed to break that card: I did everything very carefully and even wore an anti-static wrist strap like a grown-up. To this day I faintly hope it was just an unlucky coincidence.
6. About how you never know where trouble will come from
With the spread of cards and ATMs, fraud was inevitable. One day Bank A caught a carder withdrawing money from their ATM. It happens. What made the story piquant was that he turned out to be an employee of Bank B's card department, the person who managed the ATMs at that very Bank B.

I later pieced the picture together bit by bit, and the scheme looked roughly like this. This employee installed ATMs, including entering the ATM keys, and at some point he had both key components in his hands. Since he worked with ATMs, he either used logs (this was long before PCI DSS, which forbade storing PIN blocks in any form) or simply captured all ATM traffic with a sniffer, where he was able to find the transport key, and then used it to decrypt PIN blocks from the same traffic. The magnetic stripe contents were in there too, of course. Then he churned out a few cards and went to a Bank A ATM on the very same street, so I wouldn't rule out that he was simply caught on camera; there are a lot of them in that area.
How did this story affect me? Unfortunately, directly. Bank B was my client, and I had trained this guy: I told him about the Diebold 912 protocol (the protocol their ATMs used to communicate with the controller), about possible attacks and how to prevent them, so it's quite possible that he got the idea from my stories and the documentation I gave him. Afterwards I had a long conversation with the bank's IT manager, who wanted to find out whether I was involved in this story. It was very unpleasant.
7. About the benefits of attentiveness
Later I worked at another company, where we wrote software for card processing and ATM management. We had a TCP/IP driver for communicating with ATMs, and it was a bit glitchy. Networks in those days were unreliable and connections were often lost, and lost badly, without an RST being sent, so we ended up with the classic half-open connection.

To handle such situations, the driver started listening again after accept (for some reason we closed the first listening socket), and when a new connection arrived, it checked the IP address: if it came from an ATM, the old socket was closed and the new one was used. In theory this should have worked, and in all unit tests it did. But it didn't work as part of the full system. I took it upon myself to deal with the problem, and I remember doing a lot of work: I effectively duplicated the listen queue in the application (because the kernel's queue didn't seem to work as well as we needed), and allocated a structure in memory for each slot of this queue. In the test environment with one ATM, everything worked perfectly. We deployed the driver at the client, and the swearing started immediately: each driver instance consumed about 3 MB of memory, and one instance ran per ATM, that is, hundreds of them. This was around 2000, when a decent HP 9000 L-class typically had 1 to 2 GB of memory, and here 200 ATMs immediately ate 600 MB. I sat down to investigate, and it turned out that the ATM subsystem, when calling the TCP/IP driver, passed the parameters in the wrong order: the timeout value of 10,000 milliseconds ended up in the queue length parameter, and the driver dutifully allocated memory for 10,000 structures, as requested. That also solved the mystery of the original driver: when it requested such a huge queue size, the HP-UX kernel did not return an error, but it did not accept incoming connections either.
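The reconnect rule described above (a new connection from a known ATM address replaces the old, presumably half-open one) can be sketched as follows. This is a minimal Python illustration under assumptions: the set of known ATM addresses, the `AtmConnectionTable` class and its method names are hypothetical, and the story does not show the real driver's code. Any object with a `close()` method, such as a `socket.socket`, can be registered.

```python
from typing import Optional, Protocol


class Connection(Protocol):
    """Anything with a close() method, e.g. a socket.socket."""

    def close(self) -> None: ...


class AtmConnectionTable:
    """Hypothetical helper: keeps at most one live connection per ATM, keyed by peer IP."""

    def __init__(self, known_atm_ips: set[str]) -> None:
        self._known = set(known_atm_ips)
        self._active: dict[str, Connection] = {}

    def on_accept(self, peer_ip: str, conn: Connection) -> bool:
        """Register a freshly accepted connection.

        Returns False (and closes conn) if the peer is not a known ATM.
        If that ATM already had a connection, the old one is assumed to be
        half-open (the peer vanished without sending RST) and is closed.
        """
        if peer_ip not in self._known:
            conn.close()
            return False
        old = self._active.get(peer_ip)
        if old is not None and old is not conn:
            old.close()
        self._active[peer_ip] = conn
        return True

    def current(self, peer_ip: str) -> Optional[Connection]:
        """The connection currently used for this ATM, if any."""
        return self._active.get(peer_ip)
```

Note that this logic was not where the bug lay: as the story shows, the failure came from the queue length passed to the driver, which this sketch does not model.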
In other words, if I had read the code carefully instead of rushing to fix the problem, and simply swapped the two constants in the function call, the driver would not have needed rewriting at all. Still, the new driver fixed a few minor problems, so we kept it, and the parameter order was corrected as well.
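The bug came down to two integer arguments passed in swapped positions. One common defense, sketched here in Python, is to make such parameters keyword-only and to range-check them. `open_atm_listener` and the `MAX_BACKLOG` bound of 128 are hypothetical, chosen for illustration rather than taken from the original driver or from HP-UX.

```python
# Hypothetical upper bound on the listen queue; real limits are OS-specific.
MAX_BACKLOG = 128


def open_atm_listener(*, port: int, backlog: int, timeout_ms: int) -> dict:
    """Validate listener settings; the bare * makes every argument keyword-only.

    A positional call such as open_atm_listener(5000, 10_000, 5) fails with
    TypeError, so a timeout cannot silently land in the backlog slot.
    """
    if not 1 <= backlog <= MAX_BACKLOG:
        raise ValueError(f"backlog {backlog} outside 1..{MAX_BACKLOG}; swapped with a timeout?")
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    # A real driver would now create, bind and listen on a socket;
    # this sketch only returns the validated settings.
    return {"port": port, "backlog": backlog, "timeout_ms": timeout_ms}
```

As a rough cross-check of the figures in the story, and assuming the queue accounts for most of each driver's 3 MB, 10,000 slots works out to about 300 bytes per slot, and 200 drivers × 3 MB gives the 600 MB the client complained about. Either the keyword-only signature or the range check alone would have flagged the swapped call.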
8. About the benefits of checking buffer sizes
Some time later, I was working somewhere else and no longer dealt with ATMs. One day, the ATM network of a large bank suddenly went down, and their cards stopped being accepted in stores. As usual, we laughed at the ham-fisted programmers and forgot about it. After a couple of hours, everything was working again.

And a couple of hours after that, a former colleague called me and told me a story. But this story needs some background.
The thing is, a card number consists of a 6-digit prefix (called the BIN), 9 arbitrary digits, and one check digit. BINs cost money, so banks try to use them sparingly and use the first few of the 9 arbitrary digits as a product code (different products may have different sets of allowed operations, limits, and fees, and may even be logged differently). So they usually allocate a card prefix consisting of the BIN and a 2-3-digit product code, leaving 6-7 digits arbitrary. Using this prefix, the card system identifies the product and decides how to process a transaction with that card; this is done by an internal router.
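The card number layout above can be made concrete. In payment cards, the single check digit is computed with the Luhn (mod 10) algorithm, so the sketch below validates it that way. `BIN_LENGTH`, `card_prefix` and the other names are illustrative, and a real router may derive its prefixes differently.

```python
BIN_LENGTH = 6  # per the text: the first 6 digits of the card number are the BIN


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits (without the check digit)."""
    total = 0
    # Walk from the rightmost payload digit; that one is doubled, because the
    # check digit will be appended to its right.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def is_valid_pan(pan: str) -> bool:
    """True if pan is all digits and its last digit is the correct Luhn check digit."""
    return pan.isdigit() and len(pan) >= 2 and luhn_check_digit(pan[:-1]) == int(pan[-1])


def card_prefix(pan: str, product_code_len: int) -> str:
    """Hypothetical routing prefix: the BIN plus the product code that follows it."""
    return pan[:BIN_LENGTH + product_code_len]
```

For a 16-digit number, a 3-digit product code gives a 9-character prefix, and a 4-digit code gives 10.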
So, as my former colleague told me, whoever wrote this router allocated a 9-byte array for the card prefix, assuming the bank would configure at most a 3-digit product code, and everything worked fine. Then at some point the bank set up a 4-digit product code for a new product, so the prefix became 10 characters long and no longer fit in the buffer. And that lazy programmer hadn't added a length check: who would ever think of making such a long product code? On the first transaction with a card carrying the new long prefix, the router crashed, and without it nothing works at all.
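The missing check is a single length comparison before copying into the fixed buffer. Python's `bytearray` would simply grow on an oversized slice assignment rather than overrun memory the way a fixed array in C can, so this sketch makes the bound explicit. `store_prefix` and `PrefixTooLong` are hypothetical names; only the 9-byte size comes from the story.

```python
# From the story: the router's prefix buffer held 9 bytes, enough for a
# 6-digit BIN plus a 3-digit product code.
PREFIX_BUF_SIZE = 9


class PrefixTooLong(ValueError):
    """Hypothetical error raised instead of writing past the end of the buffer."""


def store_prefix(buf: bytearray, prefix: str) -> None:
    """Copy an ASCII card prefix into a fixed-size buffer, checking the length first."""
    data = prefix.encode("ascii")
    if len(data) > len(buf):
        # The check that was missing: refuse the prefix instead of overrunning.
        raise PrefixTooLong(f"prefix {prefix!r} is {len(data)} bytes; buffer holds {len(buf)}")
    buf[:len(data)] = data                         # same-length slice, size unchanged
    buf[len(data):] = bytes(len(buf) - len(data))  # zero-fill the remainder
```

How a real router should react to such an error (for example, rejecting the product configuration when it is loaded rather than failing on the first transaction) is a design choice the story doesn't cover.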
Well, you can guess who that ham-fisted programmer was. I was very ashamed. If I were Japanese, it would have been the perfect moment to commit seppuku.
There are actually more stories, but from the start I limited myself to the round number 8 and dropped the less interesting ones. It's been 20 years since the last story, so I must have learned something after all, since I don't have any fresh ones. Or maybe life has just gotten more boring.