Phishing emails, deceptive messages designed to steal sensitive information, remain a significant cybersecurity threat. As attackers devise increasingly sophisticated tactics, traditional detection methods often fall short. Researchers from the University of Auckland have introduced a new approach to the problem. Their paper, “MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection,” by Yinuo Xue, Eric Spero, Yun Sing Koh, and Giovanni Russello, describes a system in which multiple artificial intelligence (AI) agents work together to identify malicious emails more accurately and to explain each decision clearly. The study aims to create a more robust and adaptive defense against the evolving landscape of phishing attacks.
The persistent challenge of evolving phishing attacks
Phishing is not a new problem, but it’s a persistent one that continuously adapts. Attackers are moving beyond simple scam emails to more complex strategies involving social engineering, highly personalized “spear-phishing” messages, and even AI-generated content to make their fake communications look incredibly legitimate. The Anti-Phishing Working Group (APWG) noted a rise in phishing attacks to over 930,000 in the third quarter of 2024 alone, underscoring the scale of the issue.
Traditional defenses like rule-based filters or denylists (blacklists of known malicious senders or sites) struggle to keep up with these new tactics. They are often too rigid to adapt to novel threats like domain spoofing (faking a legitimate sender’s address) or dynamic URL obfuscation (hiding malicious links in complex ways). While machine learning (ML) models have offered improvements, they too can be slow to adapt to entirely new strategies if they rely heavily on historical data and predefined features. Even more advanced deep learning models and Large Language Models (LLMs)—AI systems trained on vast amounts of text data—have typically been used as single tools focusing on one aspect, like email text. These single-model approaches might miss crucial clues in other parts of an email, such as the metadata or the URLs, and often operate as “black boxes,” making it hard to understand why they’ve flagged an email as phishing.
Introducing MultiPhishGuard
To address these limitations, the researchers developed MultiPhishGuard. Instead of relying on a single AI model, MultiPhishGuard uses a multi-agent system. Think of it like a team of specialized detectives, each an expert in a different area, all working on the same case. This system is built using LLMs, specifically leveraging their advanced natural language understanding capabilities.
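The core of MultiPhishGuard consists of three primary “Basic Agents,” each tasked with analyzing a distinct component of an incoming email: a text agent for the message body, a URL agent for the links it contains, and a metadata agent for header information such as the sender.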
Each of these agents independently assesses the email and produces its own verdict (phishing or legitimate), a confidence score for its decision, and a rationale explaining its findings. These individual reports are then combined to form a more comprehensive and robust final decision.
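To make the shape of these per-agent reports concrete, here is a minimal sketch in Python. The `AgentReport` class, its field names, and the equal-weight averaging in `combine_equal` are assumptions made for illustration; the article does not describe the system's actual data format, and the real system weighs agents dynamically rather than equally.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """Hypothetical container for one agent's output."""
    agent: str         # "text", "url", or "metadata"
    verdict: str       # "phishing" or "legitimate"
    confidence: float  # confidence in the agent's own verdict, 0.0 to 1.0
    rationale: str     # short explanation of the finding


def phishing_score(report: AgentReport) -> float:
    """Convert a verdict plus confidence into a phishing score in [0, 1]."""
    if report.verdict == "phishing":
        return report.confidence
    return 1.0 - report.confidence


def combine_equal(reports: list[AgentReport], threshold: float = 0.5) -> tuple[str, float]:
    """Naive baseline: average the phishing scores with equal weights."""
    score = sum(phishing_score(r) for r in reports) / len(reports)
    return ("phishing" if score >= threshold else "legitimate", score)


# Made-up reports for a single email.
reports = [
    AgentReport("text", "phishing", 0.9, "Urgent language pressures the reader."),
    AgentReport("url", "phishing", 0.8, "Link imitates a bank domain."),
    AgentReport("metadata", "legitimate", 0.6, "Sender headers look consistent."),
]
label, score = combine_equal(reports)
print(label, round(score, 2))
```

In this made-up case two agents lean toward phishing and one leans the other way, so the averaged score lands above the 0.5 threshold.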
Dynamic decision-making and continuous learning
MultiPhishGuard doesn’t just collect opinions from its agents; it intelligently weighs their input and continuously learns to improve its defenses. This is achieved through two key mechanisms: dynamic weight adjustment and adversarial training.
Dynamic weight adjustment using reinforcement learning
Not all emails are the same, and for some, the text might be the biggest giveaway, while for others, a suspicious URL or a forged sender in the metadata is the critical clue. MultiPhishGuard uses a technique from reinforcement learning called Proximal Policy Optimization (PPO) to dynamically adjust how much importance (or “weight”) is given to the output of each specialized agent. The PPO module considers features from the email (like the number of URLs or specific keywords) and the confidence scores from each agent. It then learns over time which agents’ opinions are most reliable for different types of emails. For example, if an email has many suspicious links, the system might learn to give more weight to the URL Analysis Agent’s findings for that particular email. This adaptive weighting helps reduce false positives and makes the system more attuned to the specific characteristics of each potential threat.
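The weighting step can be illustrated with a small Python sketch. It does not implement PPO: `stand_in_policy` is a hand-written, hypothetical placeholder for whatever the trained policy would output, and the feature names (`num_urls`, `urgent_keywords`) and all numbers are assumptions. The sketch only shows how per-email weights, once produced, could be normalized and applied to the agents' phishing scores.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw preference values into weights that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def stand_in_policy(features: dict[str, int], confidences: dict[str, float]) -> dict[str, float]:
    """Hypothetical placeholder for a learned policy (NOT PPO).

    It simply favors the URL agent when an email has many links and
    the text agent when it has many urgent keywords.
    """
    return {
        "text": confidences["text"] + 0.5 * features["urgent_keywords"],
        "url": confidences["url"] + 0.5 * features["num_urls"],
        "metadata": confidences["metadata"],
    }


def fuse(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the agents' phishing scores."""
    return sum(weights[agent] * scores[agent] for agent in scores)


# Made-up email: many links, no urgent keywords.
features = {"num_urls": 4, "urgent_keywords": 0}
confidences = {"text": 0.6, "url": 0.9, "metadata": 0.7}  # confidence in own verdict
scores = {"text": 0.4, "url": 0.9, "metadata": 0.3}       # phishing scores in [0, 1]

weights = softmax(stand_in_policy(features, confidences))
fused = fuse(scores, weights)
print({k: round(w, 3) for k, w in weights.items()}, round(fused, 3))
```

Because the URL agent receives most of the weight for this link-heavy email, the fused score sits well above the plain average of the three scores, which is the kind of per-email shift in emphasis the paragraph above describes.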
Strengthening defenses with an adversarial training loop
To ensure MultiPhishGuard remains robust against new and evolving phishing tactics, the researchers incorporated an adversarial training module. This module includes an “adversarial agent,” which is another LLM (GPT-4o in their experiments) tasked with creating tricky email variations. This adversarial agent generates subtle, context-aware variants of both known phishing emails and legitimate emails. The goal for phishing variants is to bypass detection, while for legitimate variants, it’s to subtly mimic phishing characteristics to test the system’s ability to distinguish fine lines. These generated emails are then fed back into the MultiPhishGuard system. If the system is fooled, it learns from this “mistake,” effectively creating a self-improving defense ecosystem. This process helps the system become more resilient to sophisticated evasion techniques attackers might use in the real world. The researchers note that these adversarially generated emails are used strictly for internal testing within a controlled environment to prevent any misuse.
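The control flow of such a loop can be sketched in a few lines of Python. Everything here is a toy stand-in: `toy_detector` is a keyword matcher rather than the multi-agent system, `toy_adversary` is a fixed string rewrite rather than an LLM, and the seed emails are made up. The point is only to show the generate, test, and collect cycle.

```python
def toy_detector(email: str, keywords: set[str]) -> str:
    """Toy stand-in for the detection system: flags emails containing a keyword."""
    words = set(email.lower().split())
    return "phishing" if words & keywords else "legitimate"


def toy_adversary(email: str, label: str) -> str:
    """Hypothetical stand-in for the LLM adversarial agent."""
    if label == "phishing":
        # Try to evade detection by swapping a trigger word.
        return email.replace("verify", "confirm")
    # Make a legitimate email look slightly more like phishing.
    return email + " please verify by friday"


seed_emails = [
    ("please verify your account now", "phishing"),
    ("meeting notes attached", "legitimate"),
]
keywords = {"verify"}

hard_examples = []
for email, label in seed_emails:
    variant = toy_adversary(email, label)
    prediction = toy_detector(variant, keywords)
    if prediction != label:
        # The detector was fooled: keep the variant with its true label
        # so it can be used when the detector is next updated.
        hard_examples.append((variant, label))

print(hard_examples)
```

In this toy run both variants fool the keyword matcher, so both are collected. How the collected examples are then used to update the real system is not specified in enough detail here to sketch, so that step is left out.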
Making AI decisions understandable
A common criticism of AI systems, especially in security, is their lack of transparency. If a system flags an email as dangerous, users (whether they are everyday email users or security analysts) need to understand why. MultiPhishGuard addresses this with an Explanation Simplifier Agent.
This agent takes the potentially technical and separate rationales from the Text, URL, and Metadata agents and synthesizes them into a single, coherent, and easy-to-understand explanation. The aim is to provide a clear, jargon-free summary of why an email was classified as phishing or legitimate. For instance, instead of just saying “phishing,” it might explain: “This email is likely a phishing attempt because the sender’s address appears forged, it uses urgent language to pressure you into clicking a suspicious link, and the link itself tries to mimic a well-known banking website but has subtle differences.” The researchers also plan an “Expert Mode” to provide more detailed technical explanations for security professionals. This focus on interpretability is crucial for building user trust and for enabling more informed decision-making.
Testing MultiPhishGuard: How it performed
The researchers conducted extensive experiments to evaluate MultiPhishGuard’s effectiveness. They tested it on six public datasets, encompassing nearly 4,000 emails (a mix of phishing and legitimate messages), including the Nazario phishing corpus, Enron-Spam dataset, and SpamAssassin public corpus.
The results were promising. MultiPhishGuard achieved a high accuracy of 97.89%. Importantly, it demonstrated a low false positive rate of 2.73% (meaning very few legitimate emails were incorrectly flagged as phishing) and an even lower false negative rate of 0.20% (meaning very few actual phishing emails were missed). This balance is critical for a practical email security system.
When compared to other approaches, including Chain-of-Thought prompting (a technique that has LLMs spell out their reasoning steps), a single-agent LLM, and a strong baseline model called RoBERTa-base, MultiPhishGuard performed significantly better on most metrics. For instance, while the Chain-of-Thought and single-agent models had high recall (they caught most phishing emails), they also had much higher false positive rates, leading to lower precision and F1-scores (F1 being a combined measure of precision and recall).
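The relationship between these metrics is easy to see with a short Python calculation. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not the paper's results. The formulas are the standard definitions, with phishing treated as the positive class.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics; phishing is the positive class."""
    precision = tp / (tp + fp)  # flagged emails that really are phishing
    recall = tp / (tp + fn)     # phishing emails that were caught
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),  # legitimate emails wrongly flagged
        "fnr": fn / (fn + tp),  # phishing emails missed
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Two hypothetical detectors, each tested on 100 phishing and 100 legitimate emails.
# Both catch the same number of phishing emails, but B flags many more legitimate ones.
a = detection_metrics(tp=98, fp=3, tn=97, fn=2)
b = detection_metrics(tp=98, fp=30, tn=70, fn=2)

for name, m in (("A", a), ("B", b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both hypothetical detectors have identical recall, yet B's higher false positive rate drags down its precision and therefore its F1-score, which mirrors the pattern described for the Chain-of-Thought and single-agent baselines.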
Ablation studies, where individual components of MultiPhishGuard were temporarily removed to see their impact, confirmed the value of each part. For example, removing the URL agent or the metadata agent led to a noticeable decline in performance. Similarly, using static (fixed) weights instead of the PPO-based dynamic weighting, or removing the adversarial training agent, also reduced the system’s effectiveness. The absence of the explanation simplifier agent, while not affecting detection accuracy, resulted in explanations that were less fluent, coherent, and readable.
Furthermore, a human evaluation involving a cybersecurity expert assessed the quality of explanations. MultiPhishGuard’s explanations were found to be more aligned with expert reasoning, more complete, and more readable compared to those from other methods, achieving higher scores on metrics like ROUGE-1 (which measures content overlap with expert analysis) and cosine similarity (which measures semantic similarity).
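For readers unfamiliar with these two metrics, the following Python sketch computes both on a made-up pair of sentences. Two simplifications are assumed: tokens are split on whitespace with no stemming, and cosine similarity is taken over simple word-count vectors. Semantic similarity is more commonly measured on sentence embeddings, and the article does not say which representation the researchers used.

```python
import math
from collections import Counter


def rouge1(candidate: str, reference: str) -> tuple[float, float, float]:
    """ROUGE-1 precision, recall, and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per word
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between word-count vectors (an assumed simplification)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up expert explanation and system explanation.
expert = "the sender address is forged"
system = "the sender address looks forged"

precision, recall, f1 = rouge1(system, expert)
similarity = cosine_similarity(system, expert)
print(round(f1, 3), round(similarity, 3))
```

The two sentences share four of their five words, so both the ROUGE-1 F1 and the word-count cosine similarity come out at about 0.8. An embedding-based cosine similarity could score paraphrases higher than this word-level version does.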
Implications and the future of AI in phishing defense
The development of MultiPhishGuard represents a significant step forward in the fight against phishing. Its multi-agent architecture allows for a more comprehensive analysis of emails by looking at various components in parallel. The use of reinforcement learning for dynamic weighting makes the system adaptive, capable of tailoring its focus based on the specific characteristics of each email. The adversarial training component helps it stay ahead of attackers’ evolving tactics by continuously learning from challenging, artificially generated examples.
One of the key real-world benefits is the enhanced interpretability. By providing clear explanations, the system can help educate users about phishing threats and enable security teams to understand and verify detections more effectively. The modular design is also a practical advantage; for example, an organization with strict privacy constraints could potentially deploy a version with only the text agent active, avoiding the processing of sensitive metadata if necessary.
While the researchers acknowledge limitations, such as the difficulty in having “ground truth” for explanations (since datasets don’t typically come with pre-defined reasons for why an email is phishing), their work showcases a robust pathway to more effective and trustworthy AI-powered security tools. The principles behind MultiPhishGuard could potentially be extended to other cybersecurity challenges, such as malware detection or malicious website classification.