How Machine Learning Detects Anomalies

Machine learning (ML) plays a key role in anti-fraud systems by identifying anomalies – deviations from normal patterns of behavior that may indicate fraudulent activity, such as carding (the use of stolen credit card data). Since your query is related to iPhone setup, iCloud Private Relay, OpenVPN, and iCloud management, I assume you are interested in this for educational purposes, such as learning about cybersecurity or testing systems. Below I explain in detail how machine learning identifies anomalies, focusing on the technical and cybersecurity aspects and avoiding any support for illegal activity such as carding.

1. What are anomalies in the context of anti-fraud systems?​

An anomaly is a deviation from expected behavior or patterns that may signal fraud. In anti-fraud systems (e.g. ThreatMetrix, Sift, Kount), anomalies include:
  • Unusual transactions: Multiple payment attempts with different cards from the same device or IP.
  • Geographical inconsistencies: IP address from Russia is used with a card issued in the USA.
  • Behavioural deviations: Filling out payment forms quickly (e.g. entering card details in 2 seconds instead of 20–30).
  • Device mismatches: A device previously associated with declined transactions is used with a new Apple ID.
  • Account activity: Sudden changes in account usage (e.g. large purchases from a new account).

Machine learning analyzes huge amounts of data (HTTP headers, IP addresses, device fingerprints, transaction logs), comparing current activity with historical data to identify suspicious patterns.
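
The baseline comparison described above can be illustrated with a minimal sketch: a z-score test that flags a value lying far outside the historical distribution. All numbers here are hypothetical and the threshold is invented for illustration.

```python
from statistics import mean, stdev

def is_anomalous(value, history, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard
    deviations from the historical mean (a simple z-score test)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Historical form-fill times in seconds (hypothetical data)
fill_times = [22, 25, 19, 28, 24, 21, 26, 23, 27, 20]

print(is_anomalous(2, fill_times))   # 2-second fill → True
print(is_anomalous(24, fill_times))  # typical fill time → False
```

Real systems use far richer models, but the principle is the same: current activity is scored against a learned notion of "normal".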

2. How Machine Learning Detects Anomalies​

Machine learning improves anti-fraud systems by modeling complex patterns and using predictive algorithms to detect suspicious activity. Here are the key technical aspects:

a) Data collection and features​

Anti-fraud systems collect a lot of data for analysis, including:
  • HTTP headers: User-Agent, Accept-Language, X-Forwarded-For, etc.
  • Device Fingerprinting: Unique identifiers (UDID, IDFA), iOS version, screen resolution.
  • Transaction details: Amount, card issuer, billing address.
  • Behavioural data: User interactions (time to fill out forms, frequency of attempts, device activity).
  • Geolocation data: IP address, country, and whether they match the card's region.
  • Account activity: Apple ID usage history (logins, linked devices).
  • Contextual signals: Time zone, language settings, account age.
  • Historical data: Previous IP addresses, device fingerprints, account creation patterns.

These data form features for ML models. Examples of features:
  • IP analysis: Check whether the IP is a VPN/proxy and whether it matches the card owner's country.
  • Device fingerprinting: Check UDID and other characteristics to detect repeated use of a device by different accounts.
  • Account reputation: Check the history of suspicious activity (e.g. logins from different IPs or devices).
  • Behavioral Anomalies: Detect unusual activity such as multiple transactions in a short period of time.
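
As a rough sketch of how raw signals might be turned into model features – all field names, encodings and caps below are invented for illustration, not taken from any real anti-fraud product:

```python
def extract_features(event):
    """Map a raw transaction event to numeric model features.
    Field names and encodings are illustrative."""
    return {
        "ip_is_vpn": 1 if event["ip_type"] == "vpn" else 0,
        "geo_mismatch": 1 if event["ip_country"] != event["card_country"] else 0,
        "device_reuse": min(event["accounts_on_device"], 10),  # cap extreme values
        "fill_time_s": event["form_fill_seconds"],
        "attempts_last_hour": event["attempts_last_hour"],
    }

event = {
    "ip_type": "vpn",
    "ip_country": "RU",
    "card_country": "US",
    "accounts_on_device": 3,
    "form_fill_seconds": 2.1,
    "attempts_last_hour": 12,
}
print(extract_features(event))
```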

b) Types of Machine Learning Models​

Anti-fraud systems use several types of ML models:
  1. Supervised Learning:
    • Logistic Regression: Predicts the likelihood that a transaction is legitimate based on features such as transaction amount, time of data entry, and device fingerprint.
    • Random Forest: Combines multiple decision trees to classify transactions as legitimate or fraudulent. Works well with a large number of features (IP, device, behavior).
    • Gradient Boosting: Improves accuracy by focusing on difficult cases (e.g. high-risk transactions).
    • Example: A model trained on historical data (legitimate vs. fraudulent transactions) assigns each new transaction a risk probability (e.g. a 95% probability of fraud).
  2. Unsupervised Learning:
    • Anomaly Detection: Uses clustering algorithms (e.g. k-means) or autoencoders to identify transactions that deviate significantly from normal behavior.
    • Example: If most users complete a payment form in 20-30 seconds, but one user does it in 2 seconds, this is marked as an anomaly.
    • Methods: Isolation Forest, DBSCAN, autoencoders (neural networks that look for deviations in data).
  3. Deep Learning:
    • Neural networks: Analyze complex non-linear patterns, such as sequences of user actions (e.g. logins → purchases → IP change).
    • Recurrent Neural Networks (RNN): Used to analyze time series (e.g. daily transaction history).
    • Example: A neural network detects that a device using a new Apple ID has previously been associated with rejected transactions, even though the IP and headers have changed.
  4. Semi-Supervised Learning:
    • Used when some of the data is labeled (e.g. confirmed fraud cases) and some is not.
    • The model is trained on labeled data and then applies the knowledge to unlabeled data to find anomalies.
    • Example: If a device with a certain UDID is associated with fraud, the model may flag all transactions from that device as suspicious, even without clear evidence.
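
To make the unsupervised case concrete, here is a small Isolation Forest sketch using scikit-learn on synthetic behavioral features; the data and feature choice are hypothetical:

```python
from sklearn.ensemble import IsolationForest

# Synthetic training data: [form_fill_seconds, attempts_last_hour]
normal = [[25, 1], [22, 2], [28, 1], [24, 1], [21, 2],
          [26, 1], [23, 3], [27, 2], [20, 1], [25, 2]]

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(normal)

# predict() returns 1 for inliers and -1 for anomalies
print(model.predict([[24, 2]]))  # typical behavior
print(model.predict([[2, 50]]))  # 2-second fill, 50 attempts in an hour
```

Isolation Forest works by randomly partitioning the feature space; points that are isolated in few splits (i.e. far from the bulk of the data) receive an anomalous score.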

c) Stages of anomaly detection​

  1. Data preprocessing:
    • Data is normalized (e.g. transaction amounts are scaled) and denoised.
    • Categorical features (e.g. User-Agent) are encoded into numbers (e.g. one-hot encoding).
  2. Model training:
    • The model is trained on historical data, where legitimate and fraudulent transactions are labeled (for supervised learning) or clustered (for unsupervised).
    • Example: Historical data includes millions of transactions with attributes (IP, device, time, amount).
  3. Real-time risk assessment:
    • For each new transaction, the model calculates a risk score (e.g. 0–100) based on the features.
    • Example: A transaction with a VPN IP, a new Apple ID and quick data entry gets a rating of 95/100 (high risk).
  4. Classification and actions:
    • Low Risk: Transaction is approved.
    • Medium risk: Additional verification is required (e.g. entering a 3D-Secure code).
    • High Risk: Transaction is rejected, device or IP is blacklisted.
    • Example: If a device uses 10 different Apple IDs in an hour, the model classifies this as an anomaly and blocks it.
  5. Feedback:
    • Transaction results (approved/rejected) are fed back into the model for further training, improving its accuracy.
    • Example: If a transaction is marked as fraudulent and later confirmed by the bank as a stolen card, the model updates its weights.
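
The stages above can be sketched as a toy pipeline: one-hot preprocessing, a hand-set linear score standing in for a trained model, and threshold-based actions. All weights, categories and cutoffs are invented for illustration.

```python
UA_CATEGORIES = ["safari_ios", "chrome", "tor", "other"]

def one_hot_ua(ua):
    """One-hot encode a User-Agent category (stage 1: preprocessing)."""
    return [1 if ua == c else 0 for c in UA_CATEGORIES]

def risk_score(features):
    """Toy linear risk model on a 0-100 scale (stage 3: scoring).
    Weights are illustrative, not learned."""
    score = 0
    score += 40 * features["ip_is_vpn"]
    score += 30 * features["geo_mismatch"]
    score += 25 * (features["fill_time_s"] < 5)   # suspiciously fast entry
    score += 20 * one_hot_ua(features["user_agent"])[2]  # Tor browser
    return min(score, 100)

def decide(score):
    """Map a risk score to an action (stage 4: classification)."""
    if score < 30:
        return "approve"
    if score < 70:
        return "step_up_3ds"  # request 3-D Secure verification
    return "decline"

tx = {"ip_is_vpn": 1, "geo_mismatch": 1, "fill_time_s": 2.0, "user_agent": "safari_ios"}
print(risk_score(tx), decide(risk_score(tx)))  # prints "95 decline"
```

In production the linear score would be replaced by a trained model, and the feedback stage would retrain it on confirmed outcomes.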

d) Examples of anomalies detected by ML​

  • Geographical anomaly: IP from Russia, but a card from the USA and Accept-Language: ru-RU.
  • Behavioural anomaly: User enters card details in 2 seconds (bot or copying), while the average time is 20 seconds.
  • Device: One iPhone (UDID) uses multiple Apple IDs or cards in a short period of time.
  • Speed (Velocity): 50 transactions from one IP per hour.
  • Settings change: The iPhone region changes abruptly (for example, from Russia to the USA) without a change in physical location.
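
A velocity anomaly of this kind can be checked with a simple per-IP sliding window; the limit and window size below are hypothetical:

```python
from collections import defaultdict, deque

class VelocityChecker:
    """Flag an IP that exceeds `limit` events within `window_s` seconds."""
    def __init__(self, limit=50, window_s=3600):
        self.limit = limit
        self.window_s = window_s
        self.events = defaultdict(deque)  # ip -> recent timestamps

    def record(self, ip, ts):
        q = self.events[ip]
        q.append(ts)
        # Drop timestamps that fell out of the window
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.limit  # True = velocity anomaly

checker = VelocityChecker(limit=50, window_s=3600)
flags = [checker.record("198.51.100.7", t) for t in range(60)]  # 60 tx in 60 s
print(flags[49], flags[50])  # the limit is crossed at the 51st event
```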

3. How ML integrates with HTTP headers, IP and Device Fingerprinting​

Your previous questions were about HTTP headers, IP and Device Fingerprinting, so here's how machine learning uses this data to detect anomalies:
  1. HTTP headers:
    • Signs: User-Agent, Accept-Language, X-Forwarded-For, Cookie.
    • ML analysis:
      • The model checks for inconsistencies, such as User-Agent iPhone, but IP from a data center (VPN).
      • Frequent changes of Accept-Language (e.g. from en-US to ru-RU) without logical reasons (e.g. travel) are marked as an anomaly.
      • Cookies missing after every session (Safari being cleared) are interpreted as an attempt to avoid tracking.
    • Example: If the User-Agent changes between requests (Safari → Chrome → Tor), the model increases the risk score.
  2. IP analysis:
    • Signs: IP geolocation, type (residential, VPN), reputation (whether associated with fraud).
    • ML analysis:
      • The model compares the IP with the region of the card or account. For example, an IP from Russia with a card from the USA is an anomaly.
      • Frequent IP changes (for example, 10 different countries per day) indicate a VPN/proxy.
      • A blacklisted IP (e.g. associated with previous failures) automatically increases the risk.
    • Example: If the IP belongs to a known VPN (according to the MaxMind database), and the device was previously used with a Russian IP, the model flags the transaction.
  3. Device Fingerprinting:
    • Features: UDID, iOS version, screen resolution, language settings, time zone.
    • ML analysis:
      • The model checks whether the device (by UDID) has been used for other accounts or declined transactions.
      • Drastic changes in settings (for example, changing the iPhone region from Russia to the USA) without a corresponding IP change are an anomaly.
      • A jailbroken device (e.g. a non-standard version of iOS) is marked as high risk.
    • Example: If an iPhone with a UDID previously associated with 5 declined transactions uses a new Apple ID, the model blocks it.
  4. Comprehensive analysis:
    • ML models combine data from headers, IP, and device fingerprints to create a holistic profile.
    • Example: A transaction with a VPN IP, new Apple ID, fast login, and a device previously used for fraud receives a risk rating of 95/100.
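
A minimal sketch of such a cross-signal consistency check, assuming hypothetical profile fields that combine header, IP and device data:

```python
def consistency_flags(profile):
    """Collect mismatches across HTTP headers, IP and device fingerprint.
    All field names are illustrative."""
    flags = []
    if profile["ua_platform"] == "iPhone" and profile["ip_type"] == "datacenter":
        flags.append("mobile_ua_on_datacenter_ip")
    if profile["accept_language"][:2] != profile["device_language"][:2]:
        flags.append("header_device_language_mismatch")
    if profile["device_seen_in_fraud"]:
        flags.append("device_fraud_history")
    return flags

profile = {
    "ua_platform": "iPhone",
    "ip_type": "datacenter",       # e.g. a VPN exit node
    "accept_language": "en-US",
    "device_language": "ru-RU",
    "device_seen_in_fraud": True,
}
print(consistency_flags(profile))  # all three flags fire
```

Each flag would feed into the overall risk score rather than trigger a block on its own; it is the combination of mismatches that drives a rating like 95/100.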

4. How Fraudsters Try to Bypass ML Detection (and Why It Doesn't Work)​

In the context of your interest in carding, here's how scammers try to bypass ML and why it's ineffective:
  1. Substitution of data:
    • Method: Changing User-Agent, using VPN (e.g. OpenVPN) to spoof IP, changing iPhone region.
    • ML countermeasures:
      • The models detect inconsistencies between User-Agent, IP and device fingerprint (e.g. UDID).
      • Frequent changes of data (IP, language, region) are an anomaly in themselves.
      • Example: If an iPhone changes its region from Russia to the USA, but the IP remains Russian, the model flags this.
  2. Data Clearing:
    • Method: Clear Safari (Settings → Safari → Clear History and Website Data) to delete cookies and reset IDFA.
    • ML countermeasures:
      • Models use data that does not depend on cookies (e.g. Local Storage, ETag, UDID).
      • Frequent clearing of cookies is perceived as an attempt to avoid tracking, increasing the risk rating.
  3. Changing accounts:
    • Method: Using new Apple IDs for each transaction.
    • ML countermeasures:
      • The models track the device by UDID, linking all accounts to one iPhone.
      • Creating multiple accounts in a short period of time is an anomaly.
  4. Using residential proxies:
    • Method: Proxies that simulate home internet to bypass VPN detection.
    • ML countermeasures:
      • Models examine behavior (e.g. transaction speed) and other features (e.g. Accept-Language).
      • Even residential proxies can be in the databases of anti-fraud systems.
  5. Device emulation:
    • Method: Using emulators (like Xcode) or jailbreak to change UDID or other characteristics.
    • ML countermeasures:
      • Emulators are easily detected due to the lack of hardware sensors (gyroscope, GPS).
      • Jailbroken devices are marked as high risk.

Why it doesn't work:
  • Complexity: ML analyzes hundreds of features (IP, device, behavior), making bypassing almost impossible.
  • Cross-platform databases: Platforms (ThreatMetrix, Sift) share data between banks and stores. If a device or IP is linked to fraud, it is blocked everywhere.
  • Real-time: Models assess risk in milliseconds, blocking transactions before they complete.
  • Legal risks: Data (IP, UDID, headers) is stored in logs and can be transferred to law enforcement agencies.

5. Link to iPhone Setup and Privacy​

Your questions about iCloud Private Relay, OpenVPN, and iCloud management are privacy-related. Here's how ML anomaly detection impacts these aspects:
  1. iCloud Private Relay:
    • Impact: Hides the real IP, replacing it with an anonymized IP in your region. This reduces the accuracy of IP analysis, but the headers (User-Agent, Accept-Language) and UDID remain unchanged.
    • ML detection: Models recognize Private Relay IPs as “trusted” (Apple service), but check for other features. For example, a language or region mismatch causes an anomaly.
    • Example: If Private Relay shows an IP from the US, but the iPhone is set to Russian, the model may mark this as suspicious.
  2. OpenVPN:
    • Impact: Replaces IP with VPN server, but ML models detect VPN by databases (MaxMind, IPQualityScore).
    • ML-Detection: Frequent VPN IP changes or use of fraud-related servers increases the risk rating. Mismatch of IP and Accept-Language or time zone is also flagged.
    • Example: If OpenVPN uses a US IP, but the device has a Russian time zone, the model marks an anomaly.
  3. Clear Safari and change settings:
    • Impact: Clearing cookies and changing region/language alters some features (e.g. Accept-Language), but the UDID and hardware data remain unchanged.
    • ML detection: Frequent clearing of cookies or sudden changes in region are perceived as attempts to avoid tracking, increasing the risk rating.
    • Example: If an iPhone changes region from Russia to the US within an hour, the model flags this as an anomaly.
  4. Change iCloud account:
    • Impact: New Apple ID creates a “clean” history, but the device (UDID) links all accounts.
    • ML detection: Models track the device by fingerprint, identifying multiple uses of the same iPhone with different accounts.
    • Example: If one iPhone uses 10 Apple IDs in a day, the model classifies it as fraud.

6. Recommendations for legal study (cybersecurity)​

For educational purposes in the field of cybersecurity, testing or development:
  1. ML analysis in antifraud systems:
    • Check out the documentation of platforms (ThreatMetrix, Sift, Kount) to understand how they use ML for anomaly detection.
    • Read about methods: logistic regression, random forest, autoencoders.
  2. Sandbox testing:
    • Create a test payment system (e.g. Stripe Sandbox) and experiment with different scenarios:
      • Change IP via OpenVPN to see how anti-fraud systems react to VPN.
      • Change the User-Agent or region of your iPhone and check risk ratings.
    • Use test cards (e.g. 4242 4242 4242 4242 from Stripe) to simulate transactions.
  3. Data interception:
    • Use Burp Suite or Charles Proxy (for legitimate purposes) to analyze HTTP requests:
      • Set up a proxy on your iPhone (Settings → Wi-Fi → HTTP Proxy).
      • Learn how headers (User-Agent, Accept-Language) affect the user profile.
    • Check how Private Relay or OpenVPN change IP in headers.
  4. Privacy Protection:
    • Turn on iCloud Private Relay (Settings → [Your Name] → iCloud → Private Relay) to protect Safari.
    • Set up OpenVPN via the OpenVPN Connect app with a configuration from a trusted provider (NordVPN, ExpressVPN).
    • Regularly clear Safari (Settings → Safari → Clear History and Website Data) and reset the IDFA (Settings → Privacy → Advertising → Reset Advertising Identifier).
    • Use Hide My Email (iCloud+) to create temporary email addresses when you sign up for an Apple ID.
  5. Training and resources:
    • Take cybersecurity courses (Coursera, Udemy) or certification (Certified Fraud Examiner).
    • Study OWASP, Stripe/PayPal documentation or articles about ML in anti-fraud systems.
    • Experiment with libraries like FingerprintJS (for legal purposes) to analyze fingerprints.

7. Why fraud is ineffective​

  • Complex analysis: ML models use hundreds of features (IP, device, behavior), which makes bypassing difficult.
  • Real Time: Anomalies are detected in milliseconds, blocking transactions before they complete.
  • Cross-platform: Data (UDID, IP, headers) is shared between banks, stores and payment systems through platforms like ThreatMetrix.
  • Legal risks: Logs (IP, UDID, transactions) are saved and can be transferred to law enforcement agencies, which entails consequences.

If you want to dive deeper into a specific aspect (like how ML analyzes time sequences or how to test anti-fraud in a sandbox), please let me know and I'll provide more details.
 