Two new research papers, both published today, May 15, 2026, reveal a deepening chasm in AI safety: while new methods promise faster ways to probe AI vulnerabilities, existing assurance techniques are fundamentally insufficient to verify critical safety claims. This suggests that the pace of innovation in AI exploitation may outstrip our ability to secure these systems, leaving society to absorb risks that no one can quantify.
For years, policymakers and ethicists have warned about the rapid deployment of artificial intelligence systems without corresponding advances in safety and accountability. Between 2019 and early 2026, a succession of governance frameworks emerged, aiming to mandate reviewable evidence for AI properties such as the absence of hidden objectives and resistance to catastrophic failure. However, the very methods designed to evaluate these systems are now being called into question.
The Dual Edge of Adversarial Research
On one side, the authors of a new preprint (arXiv, cs.LG) announce a family of adversarial attacks designed to generate "adversarial examples at scale." These attacks, detailed in a paper published today, dramatically speed up the process by predicting input gradients from forward-pass hidden states, eliminating the costly backward pass. While the authors argue these techniques are crucial for "robustness evaluation, adversarial training, and red-teaming," the same efficiency also makes probing deployed systems for exploitable weaknesses cheaper and faster.
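To make the mechanism concrete, here is a minimal sketch, in PyTorch, of what skipping the backward pass might look like. It is not the paper's implementation: the `return_hidden=True` interface and the `grad_predictor` module (assumed here to be pre-trained to map hidden states to input-gradient estimates) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GradientPredictingAttack(nn.Module):
    """FGSM-style attack that replaces the backward pass with a learned gradient predictor.

    A sketch under assumptions: `target` is a frozen classifier that (hypothetically)
    returns (logits, hidden_state) when called with return_hidden=True, and
    `grad_predictor` maps that hidden state to an estimate of the input gradient.
    """

    def __init__(self, target: nn.Module, grad_predictor: nn.Module, eps: float = 8 / 255):
        super().__init__()
        self.target = target                  # frozen model under attack
        self.grad_predictor = grad_predictor  # hidden states -> estimated input gradients
        self.eps = eps

    @torch.no_grad()  # forward passes only; no autograd graph is ever built
    def perturb(self, x: torch.Tensor) -> torch.Tensor:
        # One forward pass that also exposes an intermediate hidden state (assumed API).
        _, hidden = self.target(x, return_hidden=True)
        grad_estimate = self.grad_predictor(hidden)   # assumed to match x's shape
        x_adv = x + self.eps * grad_estimate.sign()   # single FGSM-style step
        return x_adv.clamp(0.0, 1.0)                  # keep pixels in valid range
```

The point of the sketch is the decorator: everything runs under `torch.no_grad()`, so the expensive backward pass of a standard FGSM or PGD attack is replaced by one extra forward pass through a small predictor, which is what makes generation "at scale" plausible.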
Assurance Gaps and Governance Failures
Simultaneously, another preprint (arXiv, cs.LG), also published today, starkly concludes that "behavioural assurance cannot verify the safety claims governance now demands." Current assurance methodology, primarily behavioral evaluation and red-teaming, is "being asked to carry safety claims it cannot verify." This means that despite growing regulatory demands for proof of AI safety, covering properties like "absence of hidden objectives" and "bounded catastrophic capability," the industry's go-to methods are inadequate. We are building powerful systems and then attempting to verify their safety with tools that are fundamentally insufficient.
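The core limitation is easy to see in code. The sketch below is a hypothetical behavioural evaluation harness (none of these names come from the paper): it returns a pass rate over a finite prompt suite, and whatever that number is, it is evidence only about the prompts that were actually sampled, not a verification of properties such as the absence of hidden objectives across all possible inputs.

```python
from typing import Callable, Iterable


def behavioural_pass_rate(model: Callable[[str], str],
                          prompts: Iterable[str],
                          violates_policy: Callable[[str], bool]) -> float:
    """Fraction of sampled prompts whose outputs pass a behavioural safety check.

    Illustrative only: `model`, `prompts`, and `violates_policy` stand in for
    whatever evaluation suite an organisation actually runs.
    """
    prompt_list = list(prompts)
    failures = sum(violates_policy(model(p)) for p in prompt_list)
    return 1.0 - failures / len(prompt_list)
```

A pass rate of 100% on ten thousand prompts is still compatible with arbitrarily bad behaviour on inputs the suite never touched. That distance between what such a harness measures and what governance frameworks now ask it to certify is precisely the gap the second paper highlights.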
This convergence of accelerated attack generation and deficient safety verification poses a profound challenge to the AI industry. Companies deploying AI systems must reconcile their stated commitment to safety with the increasing ease of exposing vulnerabilities and the confirmed limitations of their current assurance practices. The frameworks enacted since 2019 are demanding a level of proof that the current technology stack cannot provide. This creates a regulatory gap, leaving both industry and the public exposed. The promise of "safe AI" rings hollow when the mechanisms for proving that safety are demonstrably broken.
We are at a critical juncture. The rapid development of AI capabilities, including ever-cheaper ways to exploit weaknesses, continues unabated. Yet the foundational methods meant to guarantee the safety of these systems are failing. Who bears the cost when an AI system, deemed "safe" by inadequate methods, fails in the real world? We must demand more than "behavioral assurance" and faster red-teaming if we truly wish to build AI that serves rather than harms. The question remains: how long can we keep building increasingly powerful systems without the means to truly ensure they won't turn against us?