Using PageRank in Banks to Fight Fraud

Cloned Boy

Educational analysis of the algorithm and real cases.

1. What is PageRank?​

The PageRank algorithm, originally developed by Google's founders to rank web pages, has been adapted in banks to assess the "importance" and riskiness of entities (cards, accounts, IP addresses).

The key idea:
The more "incoming connections" (transactions, associations) an object has, and the higher the weight of the objects they come from, the higher its own "weight" in the system.

2. How do banks use PageRank?​

2.1. Identifying key nodes of fraud​

Example Graph:
Code:
graph LR
A[Card 111] -->|Transfer $500| B[Account X]
C[Card 222] -->|Transfer $500| B
D[Card 333] -->|Transfer $500| B
B -->|Cashout| E[Crypto Exchange Y]
Analysis:
  • Account X receives money from multiple cards → high PageRank → flagged as a possible "money mule".
  • Crypto exchange Y is the final cash-out point and inherits weight from Account X → also high risk.

2.2. Ranking of suspicious BIN/cards​

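  • Connections: Many accounts (50+) link to the same card → its PageRank grows. PageRank follows incoming edges, so the graph must be directed Account → Card (or treated as undirected) for this to hold.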
  • Action: The bank blocks the card and checks the associated accounts.
PageRank formula (simplified):
Code:
PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Where:
  • PR(A) is the weight of object A (for example, a card).
  • T1...Tn are the objects with connections pointing to A.
  • C(Ti) is the number of outgoing connections at Ti.
  • d is the damping factor (~0.85).
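
To make the formula concrete, here is a minimal sketch that iterates it directly on the example graph from section 2.1. The function name `pagerank` and the fixed iteration count are illustrative assumptions, not a bank's actual implementation. This unnormalized form gives nodes without outgoing edges no special treatment, so its numbers will differ from libraries such as NetworkX, which normalize scores to sum to 1.

```python
def pagerank(edges, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all T linking to A."""
    nodes = {n for edge in edges for n in edge}
    out_count = {n: 0 for n in nodes}   # C(T): outgoing connections of T
    incoming = {n: [] for n in nodes}   # T1...Tn for every node
    for src, dst in edges:
        out_count[src] += 1
        incoming[dst].append(src)

    pr = {n: 1.0 for n in nodes}        # arbitrary starting weights
    for _ in range(iterations):
        pr = {
            n: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[n])
            for n in nodes
        }
    return pr


# Example graph from section 2.1: three cards -> Account X -> exchange
edges = [
    ("Card 111", "Account X"),
    ("Card 222", "Account X"),
    ("Card 333", "Account X"),
    ("Account X", "Crypto Exchange Y"),
]
scores = pagerank(edges)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

The cards have no incoming edges and stay at the baseline 1 - d, while Account X and the exchange accumulate weight from the nodes upstream of them.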

3. Real cases in banks​

Case 1. Detection of "botnets"​

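  • Data: 1000 accounts are registered from one IP address.
  • Graph: Account → IP (each account links to the IP it was registered from, so the IP collects the incoming edges).
  • PageRank: The IP gets the maximum weight → the entire network of accounts is blocked.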
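
As a rough worked example, the simplified formula from section 2.2 can be evaluated by hand for this star-shaped graph. The sketch assumes every edge points from an account to the IP and that each account has exactly one outgoing edge. The variable names are illustrative.

```python
d = 0.85            # damping factor from the formula in section 2.2
n_accounts = 1000   # accounts registered from the same IP

# An account has no incoming edges, so the sum in the formula is empty.
pr_account = 1 - d

# Each account has exactly one outgoing edge (to the IP), so C(Ti) = 1.
pr_ip = (1 - d) + d * n_accounts * (pr_account / 1)

print(f"account: {pr_account:.2f}, IP: {pr_ip:.2f}")
# prints "account: 0.15, IP: 127.65"
```

Under these assumptions the IP's weight is several hundred times that of any single account, which is what makes it stand out.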

Case 2. Search for fraud organizers​

  • Data: Money from victims is transferred to 10 cards, then combined into one.
  • PageRank: The resulting account has the highest weight → priority for investigation.
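
The funnel in Case 2 has no cycles, so the simplified formula from section 2.2 can be applied in a single pass from victims to cards to the final account. The sketch below uses made-up data (20 victims, two per card, and a node named `collector`), all of which are assumptions for illustration only.

```python
d = 0.85  # damping factor

victims = [f"victim_{i}" for i in range(20)]   # hypothetical victims
cards = [f"card_{i}" for i in range(10)]       # the 10 intermediate cards
# Assumption: each victim pays exactly one card, two victims per card.
victim_to_card = {v: cards[i % len(cards)] for i, v in enumerate(victims)}

# Layer 1: victims have no incoming edges.
pr = {v: 1 - d for v in victims}

# Layer 2: each card sums the weight of the victims paying into it (C = 1 each).
for card in cards:
    senders = [v for v, c in victim_to_card.items() if c == card]
    pr[card] = (1 - d) + d * sum(pr[v] / 1 for v in senders)

# Layer 3: all cards forward to one account (C = 1 for every card).
pr["collector"] = (1 - d) + d * sum(pr[c] / 1 for c in cards)

top = max(pr, key=pr.get)
print(top, round(pr[top], 4))
```

With this data the collector account ends up with the highest weight by a wide margin, matching the prioritization described above.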

4. Technical implementation​

4.1. Tools​

  • Neo4j (graph DB):
    cypher:
    Code:
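    // Assumes the Graph Data Science (GDS) plugin and a graph already projected as 'fraudGraph' (hypothetical name)
    CALL gds.pageRank.stream('fraudGraph') YIELD nodeId, score
    WITH gds.util.asNode(nodeId) AS a, score
    MATCH (c:Card)-[:USED_BY]->(a)
    RETURN c.number, score AS risk_score
    ORDER BY risk_score DESC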
  • Apache Spark GraphX: for big data.

4.2 Python Code (NetworkX)​

Python:
import networkx as nx

# Create transaction graph
G = nx.DiGraph()
G.add_edges_from([
("Card 1", "Account A"),
("Card 2", "Account A"),
("Account A", "Crypto exchange X")
])

# Calculate PageRank
pagerank = nx.pagerank(G, alpha=0.85)
print(pagerank)

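Output (approximate, rounded to three decimals):
Python:
{'Card 1': 0.125, 'Account A': 0.338, 'Card 2': 0.125, 'Crypto exchange X': 0.412}
→ The exchange, as the final sink, scores highest, and Account A comes next, well above the cards. Account A is the node that aggregates funds, so it is the main money mule candidate.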

5. Limitations and improvements​

  • Problem: False positives (e.g. a shared IP in a coworking space).
  • Solution:
    • Combination with ML models (behavior analysis).
    • Temporal graphs: Taking into account the "age" of connections (new connections are riskier).
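
One possible way to take the "age" of connections into account is to give each edge a weight that decays over time. The sketch below assumes an exponential decay with a 30-day half-life. The constant, the function name `recency_weight`, and the sample edges are illustrative assumptions, not values from a real system.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 30.0  # assumption: an edge loses half its weight every 30 days


def recency_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Return 1.0 for a brand-new edge, halving every `half_life` days."""
    return 0.5 ** (age_days / half_life)


# (source, target, age of the connection in days), made-up data
edges = [
    ("Card 1", "Account A", 2),
    ("Card 2", "Account A", 5),
    ("Card 3", "Account B", 400),
    ("Card 4", "Account B", 380),
]

# Recency-weighted incoming strength per target node
weighted_in = defaultdict(float)
for src, dst, age in edges:
    weighted_in[dst] += recency_weight(age)

for node, strength in weighted_in.items():
    print(f"{node}: {strength:.4f}")
```

Both accounts have two incoming edges, but Account A's are recent and therefore count far more. The same weights could be stored as edge attributes and passed to a weighted PageRank, for example through the `weight` parameter of `nx.pagerank`.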

6. Where to study in depth?​

  1. Documentation:
  2. Courses:
    • "Graph Analytics for Financial Services" (Coursera).
  3. Research:
    • IEEE articles on the use of PageRank in fraud detection.

Summary​

PageRank in banks is:
✅ Ranking the risk of cards, accounts and IP addresses.
✅ Finding central nodes in fraudulent schemes.
✅ A tool for investigations (tracing the chain "victim → mule → cash-out").

An example of a request for analysis:
"Show 10 cards with the highest PageRank in the last month that have received transfers from 50+ accounts."
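
A request like this could be expressed roughly as follows. The sketch assumes PageRank scores have already been computed and that transfers are available as `(source_account, card, timestamp)` tuples. The function name `top_cards`, the 30-day reading of "last month", and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def top_cards(transfers, scores, now, min_senders=50, days=30, limit=10):
    """Return up to `limit` cards by score that had >= `min_senders` distinct senders."""
    cutoff = now - timedelta(days=days)
    senders = {}
    for src_account, card, ts in transfers:
        if ts >= cutoff:  # keep only transfers inside the time window
            senders.setdefault(card, set()).add(src_account)
    eligible = [c for c, s in senders.items() if len(s) >= min_senders]
    return sorted(eligible, key=lambda c: scores.get(c, 0.0), reverse=True)[:limit]


now = datetime(2024, 6, 30)
recent = now - timedelta(days=3)
old = now - timedelta(days=90)

# Made-up transfers: only card_A and card_B satisfy both conditions
transfers = (
    [(f"acc_{i}", "card_A", recent) for i in range(60)]
    + [(f"acc_{i}", "card_B", recent) for i in range(55)]
    + [(f"acc_{i}", "card_C", recent) for i in range(3)]
    + [(f"acc_{i}", "card_D", old) for i in range(70)]
)
# Hypothetical precomputed PageRank scores
scores = {"card_A": 0.02, "card_B": 0.05, "card_C": 0.09, "card_D": 0.07}

result = top_cards(transfers, scores, now)
print(result)  # ['card_B', 'card_A']
```

card_C has too few senders and card_D's transfers fall outside the window, so both are excluded despite their higher scores.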

Want to understand how to combine PageRank with machine learning? Or examples from practice? Ask!
 