Cloned Boy
Educational analysis of the algorithm and real cases.
1. What is PageRank?
PageRank was originally developed at Google to rank web pages. Banks have adapted it to score the "importance" and riskiness of entities such as cards, accounts and IP addresses.
The key idea:
The more incoming connections (transactions, associations) an object has, and the higher the weight of the objects those connections come from, the higher its own weight in the system.
2. How do banks use PageRank?
2.1. Identifying key nodes of fraud
Example Graph:
Code:
graph LR
A[Card 111] -->|Transfer $500| B[Account X]
C[Card 222] -->|Transfer $500| B
D[Card 333] -->|Transfer $500| B
B -->|Cashout| E[Crypto Exchange Y]
- Account X receives money from multiple cards → high PageRank → marked as "money mule".
- Crypto exchange Y is the final point of cashing out → also high risk.
2.2. Ranking of suspicious BIN/cards
- Connections: The card is used in 50+ accounts → its PageRank grows.
- Action: The bank blocks the card and checks the associated accounts.
Code:
PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Where:
- PR(A) is the weight of object A (for example, a card).
- T1...Tn are the objects with connections pointing to A.
- C(Ti) is the number of outgoing connections of Ti.
- d is the damping factor (typically about 0.85).
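To make the formula concrete, here is a minimal pure-Python sketch that applies it iteratively to the example graph from section 2.1. The function name `pagerank`, the iteration count and the node labels are illustrative assumptions, not part of any bank's system.

```python
def pagerank(edges, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all nodes."""
    nodes = {node for edge in edges for node in edge}
    out_degree = {node: 0 for node in nodes}   # C(T): outgoing connections
    incoming = {node: [] for node in nodes}    # T1...Tn for each node
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)

    pr = {node: 1.0 for node in nodes}         # arbitrary starting weights
    for _ in range(iterations):
        pr = {
            node: (1 - d) + d * sum(pr[t] / out_degree[t] for t in incoming[node])
            for node in nodes
        }
    return pr


# The graph from section 2.1: three cards pay into one account, which cashes out.
edges = [
    ("Card 111", "Account X"),
    ("Card 222", "Account X"),
    ("Card 333", "Account X"),
    ("Account X", "Crypto Exchange Y"),
]
scores = pagerank(edges)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

The cards have no incoming connections, so each keeps the base weight 1 - d = 0.15. Account X collects the weight of all three cards, and the exchange inherits the weight of Account X. Note that this form of the formula is not normalized, so the scores do not sum to 1. Libraries such as NetworkX use a normalized variant and produce different absolute numbers.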
3. Real cases in banks
Case 1. Detection of "botnets"
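- Data: 1000 accounts are registered from one IP address.
- Graph: each account is linked to the IP address it was registered from (Account → IP), so the IP collects the incoming connections.
- PageRank: the IP gets the maximum weight → the entire cluster of accounts behind it is blocked.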
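Case 1 can be reproduced on synthetic data. The sketch below assumes edges that point from each account to its registration IP; the addresses come from documentation ranges and the account names are made up.

```python
import networkx as nx

SHARED_IP = "203.0.113.7"  # illustrative address, not real data

G = nx.DiGraph()
# 1000 accounts registered from the same IP address.
for i in range(1000):
    G.add_edge(f"bot_account_{i}", SHARED_IP)
# A handful of ordinary accounts, each with its own IP address.
for i in range(20):
    G.add_edge(f"regular_account_{i}", f"198.51.100.{i}")

scores = nx.pagerank(G, alpha=0.85)

# The node with the highest score and every account pointing at it.
top_node = max(scores, key=scores.get)
cluster = sorted(G.predecessors(top_node))
print(top_node, len(cluster))
```

The list in `cluster` is the set of accounts a reviewer would look at before blocking anything. The score alone does not tell a botnet apart from a legitimately shared address.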
Case 2. Search for fraud organizers
- Data: Money from victims is transferred to 10 cards, then consolidated into a single account.
- PageRank: The consolidating account has the highest weight → priority for investigation.
4. Technical implementation
4.1. Tools
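- Neo4j (graph DB). Cypher has no built-in pagerank() function; the score comes from the Graph Data Science (GDS) library. The query below assumes a graph named 'fraudGraph' has already been projected with gds.graph.project:
Code:
CALL gds.pageRank.stream('fraudGraph', {dampingFactor: 0.85}) YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS a, score AS risk_score
MATCH (c:Card)-[:USED_BY]->(a:Account)
RETURN c.number, risk_score ORDER BY risk_score DESC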
- Apache Spark GraphX: for big data workloads.
4.2 Python Code (NetworkX)
Python:
import networkx as nx
# Create transaction graph
G = nx.DiGraph()
G.add_edges_from([
("Card 1", "Account A"),
("Card 2", "Account A"),
("Account A", "Crypto exchange X")
])
# Calculate PageRank
pagerank = nx.pagerank(G, alpha=0.85)
print(pagerank)
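Output (values rounded):
Python:
{'Card 1': 0.125, 'Account A': 0.338, 'Card 2': 0.125, 'Crypto exchange X': 0.412}
→ The crypto exchange, where the money is cashed out, gets the highest score. Account A comes second and is the riskiest account on the bank's side.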
5. Limitations and improvements
- Problem: false positives (e.g. a shared IP address in a coworking space).
- Solution:
- Combine PageRank with ML models (behavior analysis).
- Use temporal graphs that take the "age" of connections into account (new connections are riskier).
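One possible way to take the age of connections into account is to give each edge a weight that decays over time and pass it to PageRank. The sketch below is an assumption about how this could look: the 30-day half-life, the `recency_weight` helper and the sample data are all hypothetical.

```python
import networkx as nx

HALF_LIFE_DAYS = 30  # hypothetical: a connection loses half its weight every 30 days


def recency_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Newer connections get a weight close to 1, old ones fade towards 0."""
    return 0.5 ** (age_days / half_life)


# (source, target, age of the connection in days)
connections = [
    ("Card 1", "Account A", 2),
    ("Card 1", "Account B", 300),
    ("Card 2", "Account A", 5),
    ("Card 2", "Account B", 400),
]

G = nx.DiGraph()
for src, dst, age_days in connections:
    G.add_edge(src, dst, weight=recency_weight(age_days))

plain = nx.pagerank(G, alpha=0.85, weight=None)      # ignores age
aged = nx.pagerank(G, alpha=0.85, weight="weight")   # favors recent links

print("without age:", round(plain["Account A"], 3), round(plain["Account B"], 3))
print("with age:   ", round(aged["Account A"], 3), round(aged["Account B"], 3))
```

Without weights the two accounts score the same. With recency weights almost all of each card's score flows to Account A, whose connections are a few days old. PageRank normalizes weights per source node, so the weights only change how a node splits its score among its outgoing links. A node with a single outgoing link passes on the same amount regardless of age.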
6. Where to study in depth?
- Documentation:
- Courses:
- "Graph Analytics for Financial Services" (Coursera).
- Research:
- IEEE articles on the use of PageRank in fraud detection.
Summary
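PageRank in banks is used for:
- Ranking the risk of cards, accounts and IP addresses.
- Finding central nodes in fraud schemes.
- Supporting investigations by tracing the chain "victim → mule → cash-out".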
An example of a request for analysis:
"Show 10 cards with the highest PageRank in the last month that have received transfers from 50+ accounts."
Want to understand how to combine PageRank with machine learning? Or examples from practice? Ask!
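The example request above could be answered roughly as follows. This is a sketch on synthetic data: the `top_risky_cards` function, the `(sender, card, date)` tuple layout and the `kind` node attribute are assumptions for illustration only.

```python
from datetime import date, timedelta

import networkx as nx


def top_risky_cards(transfers, today, days=30, min_senders=50, top_n=10):
    """Rank cards by PageRank over recent transfers, keeping cards with many senders."""
    cutoff = today - timedelta(days=days)
    G = nx.DiGraph()
    for sender, card, when in transfers:
        if when >= cutoff:
            G.add_node(card, kind="card")
            G.add_edge(sender, card)  # repeat transfers collapse into one edge
    scores = nx.pagerank(G, alpha=0.85)
    # in_degree counts distinct sending accounts because DiGraph has no parallel edges.
    candidates = [
        node for node in G
        if G.nodes[node].get("kind") == "card" and G.in_degree(node) >= min_senders
    ]
    return sorted(candidates, key=scores.get, reverse=True)[:top_n]


today = date(2024, 6, 30)
transfers = (
    # 60 accounts paid into this card within the last week.
    [(f"acc_{i}", "card:hot", today - timedelta(days=3)) for i in range(60)]
    # 60 accounts paid into this card three months ago: outside the window.
    + [(f"acc_{i}", "card:old", today - timedelta(days=90)) for i in range(60)]
    # Only 5 recent senders: below the 50-account threshold.
    + [(f"acc_{i}", "card:small", today - timedelta(days=1)) for i in range(5)]
)

result = top_risky_cards(transfers, today)
print(result)
```

Only `card:hot` passes both the time window and the sender threshold, so it is the single card returned.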