Cyber Offense

Key Evidence

PromptLock

First known AI-powered ransomware with the ability to exfiltrate, encrypt, and possibly even destroy data, using OpenAI's gpt-oss-20b model to generate malicious scripts on the fly.

Demo - ESET

Vibe hacking

Claude Code used to scale up a data extortion operation across all stages of the attack, from reconnaissance and initial access to evasive malware development and data exfiltration.

Incident - Anthropic

No-code malware

Claude used to develop, market, and distribute ransomware with advanced evasion and persistence capabilities as part of a Ransomware-as-a-Service operation.

Incident - Anthropic

Long multi-stage attack

Chinese actor systematically leveraging Claude to support nearly all phases of the attack lifecycle over a 9-month campaign targeting Vietnamese critical infrastructure.

Incident - Anthropic

LAMEHUG

First known malware integrating LLM capabilities (Qwen2.5) for real-time command generation, delivered through phishing emails impersonating Ukrainian public figures.

Incident - Cato Networks

Incalmo

Cyber toolkit that helps LLMs plan and execute complex attacks by translating their high-level intentions into specific commands, fully or partially compromising 9 out of 10 test networks.

Demo - Singer et al.

Zero-day discovery

o3 with no scaffolding used to find CVE-2025-37899, a remote zero-day vulnerability in the Linux kernel’s SMB implementation.

Demo - Sean Heelan

Hacking Cable

Prototype of a system using an LLM as an autonomous cyber operator that executes post-exploitation tasks from reconnaissance to lateral movement.

Demo - Palisade Research

Multi-host hacking

o3-based agent breaching a simulated corporate network and pivoting deeper into it to extract sensitive system data.

Demo - Palisade Research

HexStrike-AI

Framework providing threat actors with an orchestration “brain” that can direct more than 150 specialized AI agents to scan, exploit, and persist inside targets.

Demo - Check Point

Remote worker fraud

North Korean operatives using AI to generate application materials and deliver technical work in order to secure remote positions at Western technology companies.

Incident - Anthropic

AgentHopper

Proof-of-concept of an AI virus that can propagate by exploiting prompt injection flaws in coding agents, infect GitHub repositories, and spread between systems.

Demo - Embrace The Red

Overview

Cyber offense refers to hostile actions targeting a computer system, network, or digital device, undermining its confidentiality, integrity, or availability. As AI systems grow increasingly capable, they may enhance tasks across all phases of an attack lifecycle, such as social engineering, vulnerability discovery, and automated exploitation.

AI amplifies cyber risks in two key ways. First, it lowers the barrier to entry, enabling less skilled actors to carry out cyberattacks with limited technical expertise. Second, it raises the ceiling of potential harm, allowing advanced adversaries to increase the scale, speed, and sophistication of their operations.

Importantly, the severity of AI-enabled cyber offense also depends on the offense-defense balance, namely, the extent to which offensive applications of AI capabilities outpace or are contained by their defensive uses. Moreover, the prospect of AI-enabled attacks carries significant strategic implications, as they may threaten critical infrastructure, erode public trust, and fuel political tensions.

Existing AI systems are not capable of autonomously conducting end-to-end cyberattacks. However, they have proven useful for impactful tasks, both in experimental settings and in actual cyber operations. For instance, evaluation reports indicate that they excel at Capture-The-Flag challenges, while threat groups are already using state-of-the-art models for malware development and for continuous assistance throughout months-long campaigns.

Key Capabilities

Reconnaissance

This capability spans both passive and active data collection methods and is essential for tailoring subsequent phases of an intrusion. Reconnaissance informs not only technical execution but also social engineering and campaign-level strategy, making it foundational to all advanced cyber operations.

Information Gathering

Ability to compile technical and contextual intelligence from public and private sources, including IP ranges, DNS records, infrastructure metadata, employee roles, organizational hierarchies, software versions, and vendor relationships. Sources may include OSINT platforms, corporate disclosures, leaked databases, and telemetry artifacts, and collection may be automated or manual depending on the complexity of the target.
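
Lookups of this kind are routinely automated. As a minimal, benign illustration (the hostnames are placeholders and only Python's standard library is used), the sketch below resolves a short list of names to IP addresses, the same kind of query a defender might run when auditing an organization's own public footprint:

```python
# Minimal sketch: resolve placeholder hostnames to IP addresses using only the
# Python standard library. Equivalent to running nslookup/dig in a loop.
import socket

hostnames = ["www.example.com", "mail.example.com", "vpn.example.com"]  # placeholders

for name in hostnames:
    try:
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
        # sockaddr[0] is the resolved IP address.
        addresses = {info[4][0] for info in socket.getaddrinfo(name, None)}
        print(f"{name}: {', '.join(sorted(addresses))}")
    except socket.gaierror:
        print(f"{name}: no DNS record found")
```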

Vulnerability Discovery

Ability to detect, categorize, and prioritize weaknesses in a target’s attack surface, including misconfigurations, exposed services, outdated libraries, and unpatched systems. This includes the use of active scanners, passive monitoring, web crawlers, and fingerprinting techniques. Advanced capability involves identifying zero-day vulnerabilities or exploiting patterns of systemic mismanagement across large infrastructures.

Target Profiling

Ability to construct a high-resolution behavioral and technical profile of the target to inform deception, timing, and delivery mechanisms. This includes mapping roles to access levels, identifying habitual software usage, assessing technical literacy, and determining trust relationships. Tools range from social graph analytics to NLP pipelines applied to employee communications, allowing for granular personalization of attack vectors.

Social Engineering

This includes both direct interaction (e.g., phishing) and indirect manipulation (e.g., impersonation), and spans a wide range of contexts—from enterprise environments to individual consumers. It is especially powerful when combined with reconnaissance outputs.

Phishing & Spear Phishing

Ability to craft persuasive, contextualized messages designed to elicit targeted actions—such as clicking a link, downloading a file, or submitting credentials—under the illusion of legitimacy. Spear phishing enhances this with tailored psychological targeting based on individual traits, routines, or recent activity. Successful phishing depends on timing, tone, and visual mimicry of trusted entities.

Deepfakes

Ability to synthesize audio, visual, or textual content that mimics real individuals or official communications with high fidelity. This enables impersonation of executives, IT personnel, or family members, either in real-time or asynchronous settings. Deepfakes dramatically increase credibility in deception campaigns and can bypass voice-based verification or video calls meant to ensure authenticity.

Artifact Development

These artifacts serve as the technical payloads of a cyberattack and are tailored to specific targets, objectives, and constraints (e.g., stealth, persistence, or exfiltration). This capability is closely linked to exploit research and automation pipelines that allow rapid adaptation to changing defensive postures.

Malware Development

Ability to engineer malicious software tailored to objectives such as surveillance, sabotage, encryption, or credential theft. Malware may include trojans, worms, ransomware, RATs (remote access tools), or rootkits, and is often obfuscated or polymorphic to evade detection. Advanced variants may include modular loaders, command-and-control capabilities, or hardware-level persistence.

Exploit Development

Ability to craft code that abuses specific vulnerabilities to trigger unauthorized actions such as remote code execution, privilege escalation, sandbox escape, or logic manipulation. This includes adapting exploits to specific environments (e.g., OS versions, patch levels), and chaining multiple vulnerabilities into a single payload (e.g., exploit chains or ROP gadgets).

Scripting & Automation

Ability to produce automation tools and orchestration logic that streamline payload execution, system interaction, and post-exploitation procedures. This includes custom scripts, macro payloads, command-line sequences, and full attack frameworks that ensure reliability, repeatability, and scalability across different infrastructures or phases of the attack.

Intrusion, Evasion & Persistence

This capability underpins an attacker’s ability to operate post-breach without disruption, often over extended durations, and is critical for executing multi-phase operations or strategic goals.

Initial Access

Ability to breach a target environment by exploiting entry points such as unpatched vulnerabilities, misconfigured services, compromised credentials, or social engineering. Techniques include drive-by downloads, malicious attachments, supply chain compromise, or initial footholds gained through third-party systems.

Credential Theft

Ability to extract or harvest authentication data such as usernames, passwords, tokens, and MFA secrets, using methods such as keylogging, credential stuffing, browser hijacking, or cloning of login interfaces. Stolen credentials can be used for direct access, privilege escalation, or lateral movement, and may be resold or stockpiled for future campaigns.

Evasion & Obfuscation

Ability to actively avoid detection by security controls, analysts, or monitoring systems. Techniques may include code obfuscation, DLL sideloading, process hollowing, encrypted communications, timing-based delivery, or manipulation of logging and telemetry. Advanced actors may deploy tailored evasion methods per defensive tool encountered.

Persistence

Ability to remain embedded within a target environment after initial compromise, surviving reboots, updates, or access control changes. Mechanisms include scheduled tasks, registry hijacking, bootkits, firmware implants, and redundancy in access vectors (e.g., secondary C2 channels). Persistence increases long-term leverage and enables multi-stage operations.

Exfiltration & Stealth

Ability to extract data or interact with systems covertly, minimizing the risk of triggering alerts or human review. This involves the use of covert channels (e.g., DNS tunneling, HTTPS beacons), traffic shaping, timed payload drops, or encryption of outbound content to bypass DLP (Data Loss Prevention) mechanisms.

Attack Orchestration

This includes orchestrating distributed infrastructure, adapting to changing conditions, and ensuring continuity across different attack phases. It transforms disparate capabilities into an integrated, adaptive threat operation.

Campaign Planning

Ability to define strategic intent, select high-impact targets, sequence actions over time, allocate resources, and account for risk/reward trade-offs. This involves both technical planning (e.g., staging exploits) and psychological planning (e.g., message timing, public impact), often across months or years.

Coordination & Automation

Ability to integrate tools, actors, and infrastructure into a cohesive operational stack. This includes C2 systems, distributed implants, automated recon/exploitation loops, and centralized monitoring interfaces. Effective coordination enables synchronized attacks, redundancy, and failover, increasing resilience and reach.

Execution & Monitoring

Ability to launch and dynamically adjust operations based on real-time feedback from sensors, telemetry, or adversary responses. This includes runtime monitoring of defensive activity, adaptive payload delivery, automated rollback or escalation, and decision-making pipelines that mimic OODA loops (Observe–Orient–Decide–Act) in complex threat environments.

Risk Thresholds

Model Capabilities at 'Low' Risk

Models can provide general cybersecurity knowledge, help with basic scripting tasks, assist in understanding security concepts, and support defensive security operations. They can explain vulnerabilities and attack methods, but require significant human expertise to operationalize malicious activities.

Threat Scenario at 'Low' Risk

Security professionals and researchers benefit from enhanced productivity in defensive operations, threat hunting, and security education. The models support legitimate cybersecurity work but provide minimal independent offensive capability.
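
To make the scripting and threat-hunting support at this level concrete, the sketch below shows the kind of defensive task a model might help automate: tallying failed SSH login attempts per source address from an OpenSSH authentication log. The log path and message format are assumptions and vary by system.

```python
# Minimal defensive sketch: count failed SSH logins per source IP for triage.
# LOG_PATH and the regex assume a typical Debian/Ubuntu OpenSSH auth log.
import re
from collections import Counter

LOG_PATH = "/var/log/auth.log"  # assumption; adjust for your distribution
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1

# Show the ten noisiest source addresses for analyst review.
for ip, hits in counts.most_common(10):
    print(f"{ip}\t{hits}")
```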

Risk matrix: rows cover the Low, Medium, High, and Critical risk levels; columns cover Reconnaissance, Social Engineering, Artifact Development, Intrusion, Evasion & Persistence, and Attack Orchestration.

Scenarios

A criminal syndicate builds a self-improving AI that continuously finds, weaponizes, and mass-deploys zero-day exploits. It watches repositories and releases in real time, fuzzes new software 24/7, and produces working exploits hours after a vulnerability appears. Crucially, the system automates the whole kill chain: it assesses exploitability, crafts payloads, identifies high-value targets running vulnerable versions, and launches attacks before defenders can react. It learns from each run and adapts to defensive countermeasures. Sold as a dark-web service, customers simply specify objectives—“access hospital records,” “disrupt competitor systems”—and the AI handles reconnaissance, exploitation, and execution. The practical effect: the window from disclosure to widespread exploitation compresses from months to hours.

An attacker uses multimodal AI to impersonate a CEO across channels. Over months, models analyze public and intercepted material—speeches, video, voice, writing style, even decision patterns. At a critical moment (a merger, earnings call, or crisis), the attacker substitutes convincingly contextualized video calls, emails, and messages that mimic the CEO’s tone and reference private context. The deepfakes can even respond in real time to unexpected questions. Finance teams wire funds to fraudulent accounts. Engineers grant access to “auditors.” Boards act on fabricated directives. By the time the fraud is uncovered, the organization can’t reliably tell which past communications were genuine, and operational trust collapses.

A nation-state quietly corrupts the common data feeds used to train security tools—malware repositories, traffic captures, vulnerability reports—so that AI defenders learn to ignore specific attack signatures. Poisoning is sparse and subtle: certain traffic patterns are labeled “normal,” certain code constructs are marked benign. Over the years, this bias becomes baked into models. When the adversary later uses those same signatures, affected EDR (endpoint detection and response) tools, network monitors, and code scanners systematically miss the attack. Human analysts, overwhelmed and trusting their AI, fail to spot the campaign. The compromise is only revealed months later through non-AI signals, exposing that the defense models were effectively blinded.

A state-backed threat actor deploys an AI system designed to find and exploit vulnerabilities in critical national infrastructure (CNI) networks. Its algorithms continuously adapt, learning from failed attempts and testing new intrusion vectors until it finds viable entry points. The attack begins with the infiltration of a regional power grid operator. After identifying vulnerable points in the network, the system crafts phishing emails mimicking legitimate corporate communications, tricking employees into providing access credentials. Once inside, the AI automates privilege escalation and moves laterally through the system, embedding itself into supervisory control and data acquisition (SCADA) environments. Once embedded, the AI manipulates voltage and frequency settings to cause rolling blackouts, equipment malfunctions, and cascading outages across interconnected regions. To delay detection, the AI generates false diagnostic reports that show normal operating conditions, misleading engineers into misattributing failures to routine technical issues. Emergency response teams struggle to differentiate between genuine equipment faults and deliberate interference, slowing their ability to restore stability.

An APT group trains models on another state’s tactics—code style, infrastructure patterns, operational timing—and then uses those models to produce attacks engineered for misattribution. The malware carries the mimic’s coding quirks, uses similar deployment channels, and even plants apparently credible intelligence in places analysts check. Forensic indicators point to the framed party, and the broader community converges on that attribution. Diplomatic responses follow—sanctions, counteroperations—while the true actor watches strategic goals being achieved through deception. The damage is geopolitical as much as technical: trust between states and within intelligence ecosystems is eroded.

Glossary

LLM-Aided Cyberattack: Any hostile action undermining the confidentiality, integrity, and/or availability of a computer system or network that is facilitated, generated, or scaled using a large language model, including social engineering, malware generation, or prompt exploitation.

Frequently Asked Questions

How are threat actors currently using AI?

Most threat actors are leveraging AI for artifact development and social engineering. Common use cases within these two categories include developing malware, scripting, technical analysis and debugging, fraudulent schemes, impersonation of trusted figures, and personalized phishing. There are also a significant number of incidents involving the use of AI for reconnaissance, including vulnerability discovery and research into organizational or technical details. For more information, see Threat Watch.

How could AI-enabled cyberattacks cause large-scale harm?

There are several ways in which cyberattacks could cause large-scale harm. One example is the disruption of critical national infrastructure, such as systems providing energy, water, transport, communications, healthcare, or security. Another would be self-replicating AI-powered malware that infects thousands of devices, causing widespread damage. Finally, AI may also be leveraged in cyberattacks intended to steal another AI model, which could have severe consequences if that model is then used for malicious purposes.

How does AI change the nature of cyber operations?

AI alters the speed, scale, and sophistication of cyber operations. First, it has the potential to compress the time from vulnerability discovery to active exploitation from months to hours, stripping defenders of the buffer they once relied on. Second, it reduces the human resources needed—tasks that once required teams of experts may now be managed by a single attacker with the right tools. Finally, AI can enhance the complexity of cyber offense, for example, finding zero-day vulnerabilities, evading oversight in innovative ways, or exploiting subtle psychological weaknesses for advanced deception.

Could AI discover attack strategies that humans have not anticipated?

Yes. AI can uncover attack strategies that humans never considered, including subtle system interactions or weaknesses in how rules are defined. Even systems deployed for defensive purposes can behave unpredictably when optimizing complex goals. Once both attackers and defenders use autonomous AI, the interaction itself can become unstable, similar to flash crashes in financial markets, but with critical infrastructure at risk.

Which defensive measures matter most against AI-enabled attacks?

Several measures stand out: building assume-breach architectures that emphasize containment and resilience; using behavioral detection to catch novel attack patterns that don’t match signatures; strengthening supply chain security for AI models and training data; applying capability controls such as rate limiting, permissions, and emergency shut-offs; preserving human oversight for high-impact decisions; and improving coordinated disclosure and response, since AI-enabled exploits can spread quickly.
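
Of these measures, rate limiting is the most directly codifiable. As an illustrative sketch (the parameters and the surrounding integration are assumptions, not a complete control), a token-bucket limiter caps how many actions an automated component may take per unit of time:

```python
# Minimal sketch of a capability control: a token-bucket rate limiter that
# throttles how often an automated component (e.g., an agent's tool calls or
# outbound requests) is allowed to act. Parameter values are illustrative.
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec       # tokens replenished per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if one action may proceed, False if it should be throttled."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limiter = TokenBucket(rate_per_sec=2.0, capacity=5)
for i in range(10):
    print(f"action {i}: {'allowed' if limiter.allow() else 'throttled'}")
    time.sleep(0.1)
```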

Are the broader effects of AI-enabled attacks worse than individual incidents?

The wider effects may be more damaging than any single incident. If AI makes advanced attacks widely accessible, the sheer volume could overwhelm defenders. Attribution becomes harder when attacks imitate others’ styles, increasing the risk of retaliation against the wrong party. Organizations may also rush to deploy autonomous defenses, creating new vulnerabilities in the process. Over time, only the best-resourced firms may be able to cope, leaving digital infrastructure fragmented and trust in systems diminished.

Should we worry more about malicious use or accidental failures?

Both are serious concerns, but accidental failures often get less attention. Malicious use dominates headlines, yet defensive AIs could also trigger large-scale problems if they act on flawed assumptions at machine speed. A system trying to “protect” infrastructure might isolate it in ways that cause outages. Multiple defensive AIs could interfere with one another, creating instability without an attacker’s involvement. The danger is not just hostile actors with AI but also the risk of losing control over our own security tools.