How do banks use machine learning to combat carding?

Mutt

For educational purposes, I will explain in more detail how banks use machine learning (ML) to combat carding, a type of fraud that relies on stolen credit card data. I will cover the data banks collect, the algorithms they use, how anomalies are detected, and how stolen card data is blocked, followed by examples, challenges, and future trends. The goal is a deep understanding of the topic that stays accessible in an educational context.

1. What is carding and why is it important for banks?

Carding is a type of fraud in which criminals use stolen bank card data (card number, CVV, owner name, expiration date) to make unauthorized transactions. The data can be obtained through:
  • Skimming: Devices that read card data from ATMs or terminals.
  • Phishing: Deceptive websites or emails that trick users into providing information.
  • Data leaks: Hacking of databases of retailers, banks or payment systems.
  • Darknet: Buying stolen data on black markets.

For banks, carding is a threat not only because of direct financial losses (reimbursing customers), but also because of reputational risk and regulatory fines. Machine learning helps banks reduce these risks by analyzing huge amounts of data in real time and flagging likely fraudulent transactions.

2. Data types used to combat carding

Banks collect and analyze a lot of data to identify suspicious transactions. This data can be divided into several categories:

a) Transactional data

  • Transaction Amount: The amount of the payment (e.g. $10 or $10,000).
  • Time and date: When the transaction was made.
  • Transaction location: Geographical location (country, city) or online store.
  • Transaction type: Offline (POS terminal), online (e-commerce), cash withdrawal.
  • Merchant Category (MCC Code): For example, supermarkets, electronics, travel.

b) Behavioral data

  • Customer transaction history: Typical amounts, frequency, purchase categories.
  • Behavior patterns: For example, a customer typically buys groceries within a 10 km radius of their home or uses the card only on weekends.
  • Interaction with the bank: Frequency of logging into the mobile application, requests for support.

c) Technical data

  • Device: Type (smartphone, PC), operating system, browser version, screen resolution.
  • Device Fingerprinting: A unique identifier based on device characteristics (e.g. browser, fonts, plugins combination).
  • IP address: Geolocation, VPN/proxy usage, IP history.
  • Data entry speed: Time taken to complete the payment form.
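As a rough illustration of the device fingerprinting idea above, the sketch below hashes a set of device attributes into one stable identifier. The attribute names and values are hypothetical, and real fingerprinting products use many more signals and fuzzy matching; this only shows the basic principle.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Return a stable SHA-256 hex digest for a dict of device attributes."""
    # Sorting keys makes the serialization independent of insertion order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute sets, for illustration only.
known = device_fingerprint({
    "os": "Android 14",
    "browser": "Chrome 126",
    "screen": "1080x2400",
    "fonts": ["Noto Sans", "Roboto"],
})
seen = device_fingerprint({
    "fonts": ["Noto Sans", "Roboto"],
    "screen": "1080x2400",
    "browser": "Chrome 126",
    "os": "Android 14",
})
new_device = device_fingerprint({
    "os": "Windows 11",
    "browser": "Firefox 127",
    "screen": "1920x1080",
    "fonts": ["Arial", "Segoe UI"],
})

print("same device recognized:", seen == known)
print("new device matches profile:", new_device == known)
```

An exact hash like this changes whenever any attribute changes (for example, after a browser update), which is one reason production systems tend to compare attributes individually rather than rely on a single digest.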

d) External data

  • Blacklists: Lists of compromised cards, IP addresses, devices obtained from payment systems (Visa, Mastercard) or law enforcement agencies.
  • Darknet data: Information on the sale of stolen cards (banks may cooperate with companies monitoring the darknet).
  • Social data: Linking with social media accounts to verify user authenticity.

This data is collected in real time and stored on big-data platforms such as Hadoop or Spark, or in cloud services (AWS, Google Cloud). Machine learning models analyze it to build a profile of each customer and identify deviations from it.

3. How machine learning analyzes data

Machine learning to combat carding uses several approaches, each of which solves specific problems. Let's consider them in detail:

a) Supervised Learning

  • How it works: The model is trained on historical data, where each transaction is labeled as either "legitimate" or "fraudulent." The model learns to find patterns that distinguish fraudulent transactions.
  • Algorithms:
    • Logistic Regression: A simple model for estimating the probability of fraud.
    • Decision Trees and Random Forest: Efficient for handling many features (e.g. amount, IP, device).
    • Gradient Boosting (XGBoost, LightGBM, CatBoost): High accuracy by successively improving predictions.
    • Neural Networks: Used for complex data such as transaction sequences.
  • Example: If a customer typically buys a $5 coffee at a local coffee shop, and suddenly a $2,000 transaction appears at an online electronics store in another country, the model might assign it a high probability of fraud (e.g. 95%).
  • Process:
    1. Feature engineering: For example, the distance between the last transactions, the deviation of the amount from the average.
    2. Training a model on labeled data.
    3. Applying the model to evaluate new transactions in real time.
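The three steps above can be sketched with a tiny logistic regression written in plain Python. Everything here is an assumption made for illustration: the two features (amount as a multiple of the customer's average, and distance from the previous transaction in thousands of km), the toy labeled rows, and the learning rate. Banks would normally use a library implementation and far more data.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def fraud_probability(x, weights, bias) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Step 1 (feature engineering), toy data: [amount / customer average,
# distance from previous transaction in thousands of km].
rows = [
    [1.0, 0.0], [0.8, 0.01], [1.2, 0.005], [1.5, 0.02], [0.9, 0.0],  # legitimate
    [20.0, 7.0], [15.0, 5.0], [30.0, 8.0], [10.0, 3.0],              # fraudulent
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

# Step 2: train on labeled data.
weights, bias = train_logistic(rows, labels)

# Step 3: score new transactions.
p_usual = fraud_probability([1.1, 0.003], weights, bias)   # usual coffee purchase
p_odd = fraud_probability([400.0, 7.0], weights, bias)     # $2,000 vs. a $5 average, abroad
print(f"usual purchase: {p_usual:.3f}, unusual purchase: {p_odd:.3f}")
```

Real fraud data is heavily imbalanced (fraud is a tiny fraction of transactions), so production training usually adds class weighting or resampling, which this sketch leaves out.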

b) Unsupervised Learning

  • How it works: The model looks for anomalies in data without prior labeling. It groups transactions by similarity and highlights those that do not fit typical clusters.
  • Algorithms:
    • K-means or DBSCAN: Clustering transactions to identify outliers.
    • Isolation Forests: Effective for detecting anomalies in large datasets.
    • Autoencoders: Neural networks that "compress" data and detect deviations while trying to reconstruct it.
  • Example: If a transaction occurs from a new device, through an IP associated with multiple suspicious transactions, and at an unusual time, the model may flag it as an anomaly even if there are no obvious signs of fraud.
  • Advantage: Allows you to identify new, previously unknown carding schemes.
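A minimal sketch of the unlabeled approach: flag transactions that have too few close neighbors in feature space. This borrows the density idea behind DBSCAN but is not the full algorithm (it skips cluster expansion, so points on the edge of a cluster could also be flagged). The features, `eps`, and `min_neighbors` values are assumptions chosen for the toy data.

```python
import math


def find_outliers(points, eps, min_neighbors):
    """Return indices of points with fewer than min_neighbors other points within eps."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers


# Hypothetical transactions as (scaled amount, scaled hour of day), both in 0..1.
transactions = [
    (0.05, 0.50),
    (0.06, 0.52),
    (0.04, 0.55),
    (0.07, 0.48),
    (0.05, 0.58),
    (0.95, 0.10),  # large amount at an unusual hour
]

anomalies = find_outliers(transactions, eps=0.15, min_neighbors=2)
print("anomalous transaction indices:", anomalies)
```

No labels are needed here, which is why this family of methods can surface schemes that have never been seen before; the trade-off is that "unusual" is not always "fraudulent".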

c) Reinforcement Learning

  • How it works: The model learns to make decisions (e.g. block or approve a transaction) based on feedback from the system (e.g. customer confirmation of fraud).
  • Usage: Less common in practice, but it can be used to tune blocking rules and reduce false positives.
  • Example: A model might "experiment" by temporarily lowering the threshold for blocking certain types of transactions, and adjusting it based on the results.
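The "experimenting with thresholds" idea can be sketched as an epsilon-greedy bandit, one of the simplest reinforcement-learning setups. The candidate thresholds and the reward table below are hypothetical; in practice the feedback (chargebacks, customer complaints) arrives late and is noisy, which this toy ignores.

```python
import random


class ThresholdBandit:
    """Epsilon-greedy choice among candidate blocking thresholds."""

    def __init__(self, thresholds, epsilon=0.1, seed=0):
        self.thresholds = list(thresholds)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {t: 0 for t in self.thresholds}
        self.mean_reward = {t: 0.0 for t in self.thresholds}

    def choose(self):
        # Try every threshold once before exploiting the estimates.
        for t in self.thresholds:
            if self.counts[t] == 0:
                return t
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.thresholds)  # explore
        return self.best()                           # exploit

    def best(self):
        return max(self.thresholds, key=lambda t: self.mean_reward[t])

    def update(self, threshold, reward):
        self.counts[threshold] += 1
        n = self.counts[threshold]
        # Incremental running mean of observed rewards.
        self.mean_reward[threshold] += (reward - self.mean_reward[threshold]) / n


# Hypothetical daily reward: minus (fraud losses + cost of blocked legitimate payments).
simulated_reward = {70: -8.0, 80: -3.0, 90: -5.0}

bandit = ThresholdBandit(thresholds=[70, 80, 90])
for _ in range(200):
    chosen = bandit.choose()
    bandit.update(chosen, simulated_reward[chosen])

print("preferred threshold:", bandit.best())
```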

d) Deep Learning

  • How it works: Neural networks such as recurrent neural networks (RNNs) or transformers analyze complex dependencies, including temporal sequences of transactions.
  • Application: Detecting complex schemes such as "test" transactions (small amounts to test a card) before a major fraud occurs.
  • Example: If an attacker makes a series of small transactions ($1–$5) from different cards on the same site, a deep neural network can flag this sequence as suspicious.

e) Real-time analysis (Online Learning)

  • How it works: The model updates in real time, adapting to new data.
  • Application: Rapid response to mass attacks, such as a batch of skimmed cards being used within a short period.
  • Example: If multiple transactions with different cards are received from one IP within an hour, the model may temporarily increase the "suspiciousness" of transactions from that address.
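The one-IP, many-cards example can be approximated with a sliding-window counter that is updated on every transaction. This is a simple streaming rule rather than a learned model, and the one-hour window, the limit of three cards, and the IP and card identifiers are all assumptions for illustration. It also assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class CardVelocityMonitor:
    """Track distinct cards seen per IP inside a sliding time window."""

    def __init__(self, window_seconds=3600, max_cards=3):
        self.window_seconds = window_seconds
        self.max_cards = max_cards
        self.events = defaultdict(deque)  # ip -> deque of (timestamp, card_id)

    def observe(self, ip, card_id, timestamp):
        """Record a transaction; return True if the IP now looks suspicious."""
        queue = self.events[ip]
        queue.append((timestamp, card_id))
        # Drop events that have fallen out of the window.
        while queue and queue[0][0] <= timestamp - self.window_seconds:
            queue.popleft()
        distinct_cards = {card for _, card in queue}
        return len(distinct_cards) > self.max_cards


monitor = CardVelocityMonitor()
# Five different cards from one IP, ten minutes apart (timestamps in seconds).
flags = [monitor.observe("203.0.113.7", f"card-{i}", 600 * i) for i in range(5)]
# Much later, the old events have expired and a single card is not suspicious.
later = monitor.observe("203.0.113.7", "card-9", 10_000)
print(flags, later)
```

In a real system the returned flag would typically feed into the risk score as one feature among many rather than block the address outright, since shared IPs (offices, mobile carriers) can legitimately carry many cards.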

4. The process of detecting anomalies

Machine learning identifies anomalies by comparing the current transaction to normal customer behavior or global patterns. Here are the key aspects:

a) Key signs of anomalies

  • Geographic: A transaction from another country or region where the customer has not previously transacted.
  • Temporal: Unusual times (e.g. buying at 4am when the customer is usually active during the day).
  • Behavioral: A sudden change in the type of purchases (e.g. switching from groceries to expensive electronics).
  • Technical: Using a new device, suspicious IP (e.g. associated with a VPN or darknet), or device fingerprint mismatch.
  • Transaction speed: Many transactions in a short period of time.
  • Amounts: Unusually high or low amounts (e.g. microtransactions to test a card).

b) Metrics for analysis

  • Distance between transactions: Physical (e.g. 5000 km between two transactions in an hour) or virtual (different website domains).
  • Deviation from mean: Comparison of the current transaction with average values (amount, frequency).
  • Data entry speed: If card data is entered implausibly fast (e.g. pasted in or filled by a script), this may indicate an automated process.
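Two of these metrics are easy to compute directly. The sketch below uses the haversine formula for the physical distance between two transactions, derives the travel speed that distance would imply, and computes how many standard deviations an amount sits from the customer's history. The coordinates and purchase history are made-up example values.

```python
import math
from statistics import mean, stdev

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implied_speed_kmh(distance_km, hours_between):
    """Speed a cardholder would need to make both transactions in person."""
    return distance_km / hours_between if hours_between > 0 else float("inf")


def amount_z_score(amount, history):
    """Standard deviations between this amount and the customer's mean."""
    sigma = stdev(history)
    return (amount - mean(history)) / sigma if sigma > 0 else 0.0


# Example: a card-present purchase in Moscow, then one in Bangkok an hour later.
distance = haversine_km(55.7558, 37.6173, 13.7563, 100.5018)
speed = implied_speed_kmh(distance, hours_between=1.0)
z = amount_z_score(1000.0, history=[40.0, 55.0, 60.0, 45.0, 50.0])

print(f"distance: {distance:.0f} km, implied speed: {speed:.0f} km/h, z-score: {z:.1f}")
```

An implied speed well above that of a passenger jet (roughly 900 km/h) is a strong hint that at least one of the two transactions was not made by the cardholder in person.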

c) Example of a process

  1. A customer attempts to make a $1,000 purchase from an online electronics store.
  2. The ML model analyzes:
    • Geolocation: IP from Thailand, while the client is usually in Moscow.
    • Device: New smartphone, not associated with the client.
    • Behavior: The customer rarely buys electronics and typically spends no more than $200.
    • Time: 2:00 am customer's local time.
  3. The model assigns a high risk (e.g. 92%) to the transaction and sends a request for two-factor authentication (e.g. SMS code) or blocks the transaction.
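The walkthrough above can be mimicked with a hand-weighted score. This is only a stand-in for a trained model: the points per signal are hypothetical (picked so the example adds up to the 92 mentioned above), as are the thresholds for requesting 2FA and for blocking.

```python
from dataclasses import dataclass


@dataclass
class TransactionContext:
    amount: float
    typical_max_amount: float
    ip_country: str
    home_country: str
    device_known: bool
    local_hour: int  # 0-23 in the customer's time zone


# Hypothetical points per signal; a real model learns such weights from data.
SIGNAL_POINTS = {
    "foreign_ip": 30,
    "new_device": 25,
    "amount_spike": 25,
    "night_time": 12,
}


def risk_score(tx: TransactionContext) -> int:
    """Sum the points of every signal that fires, capped at 100."""
    score = 0
    if tx.ip_country != tx.home_country:
        score += SIGNAL_POINTS["foreign_ip"]
    if not tx.device_known:
        score += SIGNAL_POINTS["new_device"]
    if tx.amount > tx.typical_max_amount:
        score += SIGNAL_POINTS["amount_spike"]
    if tx.local_hour < 6:
        score += SIGNAL_POINTS["night_time"]
    return min(score, 100)


def decide(score: int, step_up_at: int = 60, block_at: int = 90) -> str:
    """Map a score to an action; thresholds are illustrative."""
    if score >= block_at:
        return "block"
    if score >= step_up_at:
        return "step_up_2fa"
    return "approve"


suspicious = TransactionContext(1000.0, 200.0, "TH", "RU", False, 2)
routine = TransactionContext(150.0, 200.0, "RU", "RU", True, 14)
travelling = TransactionContext(80.0, 200.0, "TH", "RU", False, 3)

for tx in (suspicious, routine, travelling):
    score = risk_score(tx)
    print(score, decide(score))
```

The middle tier matters: asking for a second factor on medium-risk payments lets a travelling customer complete a legitimate purchase instead of being blocked outright.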

5. Blocking skimmed data

Once a suspicious transaction is identified, banks take the following measures:

a) Fraud Detection Systems (FDS)

  • Platforms such as FICO Falcon, SAS Fraud Management or banks’ own developments use ML to assess the risk of each transaction in real time.
  • Models assign a "risk score" (e.g. 0 to 100) to a transaction. If the score exceeds a threshold (e.g. 90), the transaction is blocked or sent for manual review.
  • Example: Visa Advanced Authorization analyzes up to 500 features in milliseconds to make a decision.

b) Two-factor authentication (2FA)

  • If a transaction is marked as suspicious, the bank may request additional identification (SMS code, biometrics, answer to a secret question).
  • Example: The customer receives an SMS with a code to confirm the purchase.

c) Blacklists

  • ML helps update lists of compromised cards, IP addresses and devices.
  • Sources: Data from payment systems, law enforcement agencies, cybersecurity companies (e.g. Group-IB, ThreatMetrix).
  • Example: If a card is spotted on the darknet, all transactions with it are automatically blocked.

d) Adaptive rules

  • ML models update blocking rules based on new data. For example, if a new carding scheme (massive microtransactions) emerges, the model can temporarily tighten controls on certain types of transactions.
  • Example: If there is a spike in suspicious transactions at a particular merchant within a day, the bank may temporarily restrict transactions with that merchant.

e) Feedback

  • When the customer confirms that a flagged transaction was legitimate, or reports one as fraudulent, that label is fed back into the model for further training, improving its accuracy.

6. Practical examples

  • Visa and Mastercard: Their systems (Visa Advanced Authorization, Mastercard Decision Intelligence) use ML to score billions of transactions per year, each within milliseconds. For example, they can detect that a card is being used in two countries at nearly the same time and block the suspicious transaction.
  • Sberbank (Russia): Uses ML to analyze transactions, including geolocation, device, and behavior. If a customer from Moscow suddenly makes a purchase in Brazil, the system requests 2FA or blocks the transaction.
  • PayPal: Uses ML to analyze online payments, including IP, account history, and associated devices. For example, PayPal may notice that an account is being used from a new device via a suspicious IP and temporarily freeze it.
  • Revolut: Uses ML to monitor transactions in real time, including checking geolocation and behavioral patterns. Example: If a card is used in a store and the customer’s phone is in another country, the transaction is blocked.

7. Problems and limitations

  • False Positives: Models that are too strict can block legitimate transactions, causing customer dissatisfaction. For example, a purchase made while on holiday abroad may be flagged as suspicious.
  • Evolution of fraud: Carders are constantly developing new schemes (for example, using "clean" IPs via VPN or emulating legitimate devices), which requires constant updating of models.
  • Privacy: Data collection and analysis must comply with laws such as GDPR (Europe) or Federal Law No. 152-FZ (Russia). Banks must balance security against the protection of customer data.
  • Resources: Processing large amounts of data requires powerful computing resources and qualified specialists (data scientists, ML engineers).
  • Processing delays: In rare cases, ML models may slow down transaction processing, especially if additional validation is required.
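The false-positive problem in the list above is a threshold trade-off, and it can be made concrete by counting outcomes at different cut-offs. The scores and labels below are invented for illustration (label 1 means confirmed fraud).

```python
def confusion_at(scores, labels, threshold):
    """Count (tp, fp, fn, tn) when every score >= threshold is blocked."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged and not is_fraud:
            fp += 1
        elif not flagged and is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (recall, false-positive rate)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical risk scores (0-100) with ground-truth labels.
scores = [5, 12, 20, 35, 55, 65, 72, 88, 93, 97]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

for threshold in (60, 90):
    recall, fpr = rates(*confusion_at(scores, labels, threshold))
    print(f"threshold {threshold}: recall={recall:.2f}, false-positive rate={fpr:.2f}")
```

On this toy data, the lower threshold catches every fraud but also blocks one legitimate customer, while the higher one blocks no legitimate customers and misses half of the fraud. Choosing the threshold is a business decision about which error costs more.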

8. Future trends

  • Artificial Intelligence (AI) and Deep Learning: More sophisticated neural networks such as transformers will be used to analyze complex transaction sequences.
  • Biometrics: Integrating biometric data (fingerprints, facial recognition) with ML to improve authentication accuracy.
  • Federated learning: Banks can collaborate by training a shared model on their own data and exchanging only model updates, so raw customer data never leaves each bank.
  • Darknet monitoring: ML will be used more actively to monitor the darknet and block stolen cards before they are used.
  • Real time: Faster transaction scoring with optimized models and, possibly in the longer term, quantum computing.
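For the federated learning item above, the core aggregation step is usually a weighted average of locally trained model parameters (often called federated averaging). The sketch below shows only that step; the two banks, their weight vectors, and their sample counts are hypothetical, and real deployments add secure aggregation and many training rounds.

```python
def federated_average(client_updates):
    """Average model weights across clients, weighted by local sample count.

    client_updates: list of (weights, n_samples) pairs, where weights is a
    list of floats of the same length for every client.
    """
    total_samples = sum(n for _, n in client_updates)
    n_params = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(n_params)
    ]


# Hypothetical locally trained weights; no transaction data is exchanged.
bank_a = ([1.0, 2.0], 100)   # trained on 100 local samples
bank_b = ([3.0, 4.0], 300)   # trained on 300 local samples

global_weights = federated_average([bank_a, bank_b])
print("shared model weights:", global_weights)
```

Only the weight vectors and sample counts cross institutional boundaries here, which is what makes the approach attractive under privacy rules.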

9. Conclusion

Machine learning is a powerful tool in the fight against carding, allowing banks to analyze huge amounts of data (transactions, IP, devices, behavior) in real time. Algorithms such as gradient boosting, clustering, and deep neural networks help identify anomalies, block skimmed data, and adapt to new threats. Despite challenges such as false positives and fraud evolution, ML continues to evolve, balancing security and customer experience. For educational purposes, it is important to understand that success depends on data quality, algorithm selection, and continuous model updating in the face of rapidly changing fraud patterns.