Anthropic and OpenAI just exposed SAST's structural blind spot with free tools

AllTopicsToday
Published: March 10, 2026
Last updated: March 10, 2026 6:33 pm

Contents
  • How Anthropic and OpenAI reached the same conclusion from different architectures
  • What the vendor responses show
  • 7 things to do before the next board meeting

OpenAI launched Codex Security on March 6, entering the application security market that Anthropic disrupted 14 days earlier with Claude Code Security. Both scanners use LLM inference instead of pattern matching, and both demonstrated that traditional static application security testing (SAST) tools are structurally incapable of recognizing entire vulnerability classes. The enterprise security stack is caught in the middle.

Anthropic and OpenAI each launched their own inference-based vulnerability scanners, and both discovered classes of bugs that pattern-matching SAST tools were never designed to detect. Competitive pressure between two laboratories with a combined private-market valuation of more than $1.1 trillion means detection quality will improve faster than any single vendor could deliver alone.

Neither Claude Code Security nor Codex Security is meant to replace your existing stack, but both tools permanently change the procurement calculus, and both are currently free to enterprise customers. Before your board asks which scanner you are using and why, you need a head-to-head comparison and these seven actions.

How Anthropic and OpenAI reached the same conclusion from different architectures

Anthropic announced zero-day research on February 5, coinciding with the release of Claude Opus 4.6. According to Anthropic, Claude Opus 4.6 discovered more than 500 previously unknown high-severity vulnerabilities in production open-source codebases that have withstood decades of peer review and millions of hours of fuzzing.

Claude discovered a heap buffer overflow in the CGIF library by inferring the behavior of its LZW compression algorithm, a flaw that coverage-guided fuzzing could not detect even with 100% code coverage. Anthropic shipped Claude Code Security as a limited research preview on February 20. It is available to Enterprise and Team customers, with free, expedited access for open-source maintainers. Gabby Curtis, head of communications at Anthropic, told VentureBeat in an exclusive interview that Anthropic built Claude Code Security to make its defenses more broadly available.

OpenAI’s numbers come from a different architecture and a wider scanning surface. Codex Security evolved from Aardvark, an internal tool powered by GPT-5 that entered private beta in 2025. During the Codex Security beta, OpenAI’s agents scanned more than 1.2 million commits across external repositories; OpenAI reported 792 critical findings and 10,561 high-severity findings. OpenAI reported vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, resulting in 14 CVE assignments. According to OpenAI, Codex Security’s false-positive rate fell by more than 50% across all repositories over the beta period, and over-reported severity fell by more than 90%.

Checkmarx Zero researchers have demonstrated that moderately complex vulnerabilities sometimes escape detection by Claude Code Security, and that developers can trick the agents into ignoring vulnerable code. In a full scan of a production-grade codebase, Checkmarx Zero found that Claude identified eight vulnerabilities, but only two were true positives. If the scanner can be defeated by moderately complex obfuscation, the detection ceiling is lower than the headline numbers suggest. Neither Anthropic nor OpenAI has submitted its detection claims to independent third-party audits; security leaders should treat the reported numbers as signals, not audited results.

Merritt Baer, CSO at Enkrypt AI and former deputy CISO at AWS, told VentureBeat that the scanner competition is compressing everyone’s timelines. Baer advised security teams to prioritize patches based on exploitability in the runtime context rather than CVSS scores alone, to shorten the gap between discovery, triage, and patching, and to maintain visibility into the software bill of materials so they know immediately where vulnerable components are running.
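Baer's triage advice reduces to scoring findings by runtime context rather than raw CVSS alone. A minimal sketch of that idea, with entirely hypothetical field names, weights, and CVE labels:

```python
# Minimal sketch: rank findings by runtime exploitability, not CVSS alone.
# Field names, weights, and the example findings are hypothetical.

def priority(finding: dict) -> float:
    """Boost CVSS by runtime context: is the vulnerable component
    actually deployed, internet-reachable, and known to be exploitable?"""
    score = finding["cvss"]
    if finding.get("deployed"):           # present in a running service (from the SBOM)
        score *= 1.5
    if finding.get("internet_facing"):    # reachable by external traffic
        score *= 1.5
    if finding.get("exploit_available"):  # public PoC or active exploitation
        score *= 2.0
    return score

findings = [
    {"id": "CVE-A", "cvss": 9.8, "deployed": False},
    {"id": "CVE-B", "cvss": 6.5, "deployed": True,
     "internet_facing": True, "exploit_available": True},
]

# CVE-B (6.5 * 1.5 * 1.5 * 2.0 = 29.25) outranks CVE-A (9.8) despite lower CVSS.
ranked = sorted(findings, key=priority, reverse=True)
print([f["id"] for f in ranked])
```

The design point is exactly Baer's: a critical-severity bug in a component that is never deployed matters less than a medium-severity bug that is reachable and actively exploited.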

Although the methods differ and there is little overlap in the scanned codebases, the conclusion is the same: pattern-matching SAST has an upper limit, and LLM inference extends detection beyond it. The dual-use calculus becomes uncomfortable when two competing labs ship the same capability at the same time. Financial institutions and fintechs running commercial codebases should assume that if Claude Code Security and Codex Security can find these bugs, so can an adversary with API access.

Baer put it bluntly: open-source vulnerabilities surfaced by inference models should be treated more like zero-day-class discoveries than backlog items. The window between discovery and exploitation has only shortened, and most vulnerability management programs still triage by CVSS alone.

What the vendor responses show

Snyk, a developer security platform used by engineering teams to find and fix vulnerabilities in code and open-source dependencies, acknowledged the technological advance but insisted that finding vulnerabilities was never the hard part; fixing them at scale across hundreds of repositories without breaking anything is the bottleneck. Snyk also pointed to Veracode’s 2025 GenAI Code Security Report, which found that AI-generated code is 2.74 times more likely to introduce security vulnerabilities than human-written code. The same models that detect hundreds of zero-days introduce new classes of vulnerabilities as they write code.

Cycode CTO Ronen Slavin writes that while Claude Code Security represents a genuine technological advance in static analysis, AI models are inherently probabilistic. Slavin argued that security teams need consistent, reproducible, audit-grade results, and that while scanning capabilities built into IDEs are useful, they do not constitute infrastructure. Slavin’s position: SAST is one area within a broader scope, and free scanning is not a replacement for a platform that handles governance, pipeline integrity, and runtime behavior at enterprise scale.

“Once code inference scanners from major AI labs become effectively free to enterprise customers, static code scanning becomes a commodity overnight,” Baer told VentureBeat. Over the next 12 months, Baer expects budgets to move toward three areas.

Runtime and exploitability layers, including runtime protection and attack-path analysis.

AI governance and model security (guardrails, prompt-injection protection, agent monitoring, and so on).

Automated remediation. “The net effect is that AppSec spending probably won’t shrink, but the center of gravity has shifted away from traditional SAST licensing and toward tools that shorten remediation cycles,” Baer said.

7 things to do before the next board meeting

Run both scanners on a representative subset of the codebase. Compare Claude Code Security and Codex Security results to existing SAST output. Start with a single representative repository rather than the entire codebase; both tools are in research preview, with access restrictions that make full-estate scanning premature. The delta is your blind-spot inventory.
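Computing that blind-spot inventory is a set difference over normalized findings. A minimal sketch, assuming each tool's output has been normalized to (file, vulnerability-class) pairs; every finding below is invented for illustration:

```python
# Minimal sketch: the delta between inference scanners and existing SAST
# output is the blind-spot inventory. All findings are invented examples,
# normalized to (file, vulnerability-class) pairs.

existing_sast = {
    ("auth.c", "sql-injection"),
    ("upload.c", "path-traversal"),
}
claude_code_security = {
    ("auth.c", "sql-injection"),
    ("gif.c", "heap-buffer-overflow"),   # semantic bug: requires algorithm inference
}
codex_security = {
    ("auth.c", "sql-injection"),
    ("session.c", "logic-flaw"),         # multi-file state-transition bug
}

inference_findings = claude_code_security | codex_security
blind_spots = inference_findings - existing_sast      # what pattern matching missed
model_delta = claude_code_security ^ codex_security   # what only one model caught

print(sorted(blind_spots))
```

The `model_delta` term is worth tracking separately: per Baer's point later in this list, findings that only one model surfaces are evidence for running both scanners rather than picking a winner.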

Build your governance framework up front, not after the pilot. Baer told VentureBeat to treat both tools as new data processors for your crown jewels: your source code. Baer’s governance model includes a formal data processing agreement with clear statements about training exclusions, data retention, and subprocessor usage; a segmented submission pipeline that ensures only the repositories you intend to scan are submitted; and internal classification policies that distinguish code that can leave the boundary from code that cannot. In interviews with more than 40 CISOs, VentureBeat found that formal governance frameworks for inference-based scanning tools are still rare. Baer flagged derivative IP as a blind spot most teams have not addressed: can model providers retain traces of embeddings and inferences, and are those artifacts considered intellectual property? Another gap is data residency for code; while code has historically not been regulated the way customer data is, it is increasingly subject to export controls and national-security reviews.
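The segmented submission pipeline Baer describes amounts to a fail-closed allowlist check before any code leaves the boundary. A sketch under that assumption, with invented repository names and classification labels:

```python
# Minimal sketch of a segmented submission pipeline: only repositories
# explicitly classified as allowed to leave the boundary are submitted
# to an external scanner. Repo names and labels are invented.

CLASSIFICATION = {
    "public-website": "may-leave-boundary",
    "internal-tools": "may-leave-boundary",
    "trading-engine": "must-stay-internal",   # crown jewels
}

def submittable(repos):
    """Return repos cleared for external scanning.
    Unknown or unclassified repos are rejected (fail closed)."""
    return [r for r in repos
            if CLASSIFICATION.get(r) == "may-leave-boundary"]

queue = ["public-website", "trading-engine", "unclassified-repo"]
print(submittable(queue))   # only "public-website" clears the gate
```

Failing closed on unclassified repositories is the important design choice: the internal classification policy, not the scan queue, decides what crosses the boundary.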

Map what neither tool covers. Software composition analysis. Container scanning. Infrastructure as code. DAST. Runtime detection and response. Claude Code Security and Codex Security work at the code inference layer; the existing stack handles everything else. What has changed is that stack’s pricing power.
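One way to make that coverage map concrete for a board deck is a simple layer table. The coverage flags below are illustrative assumptions drawn from the list above, not vendor claims:

```python
# Minimal sketch: which layers of the security stack the new inference
# scanners address. Coverage flags are illustrative assumptions.

COVERED_BY_INFERENCE_SCANNERS = {
    "code inference (SAST)":         True,   # Claude Code Security / Codex Security
    "software composition analysis": False,
    "container scanning":            False,
    "infrastructure as code":        False,
    "DAST":                          False,
    "runtime detection & response":  False,
}

# Everything the existing stack must still provide:
gaps = [layer for layer, covered in COVERED_BY_INFERENCE_SCANNERS.items()
        if not covered]
print(len(gaps), "layers still depend on the existing stack")
```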

Quantify dual-use exposure. The zero-days Anthropic and OpenAI have surfaced sit in the open-source projects that enterprise applications depend on. Both labs handle disclosure and patching responsibly, but the window between discovery and patching is exactly where attackers operate. AI security startup AISLE independently discovered all 12 zero-day vulnerabilities fixed in OpenSSL’s January 2026 security patch, including a stack buffer overflow (CVE-2025-15467) that could be remotely exploited without valid key material. Fuzzers have been run against OpenSSL for years and missed every one of them. Assume the adversary is running the same models against the same codebases.

Have a board comparison ready before they ask. Claude Code Security reasons about your code in context, tracks data flows, and uses multi-step self-verification. Codex Security builds a project-specific threat model before scanning and validates results in a sandbox environment. Each tool is in research preview and requires human approval before patches can be applied. This calls for a parallel evaluation rather than a single vendor pitch. When the conversation turns to why existing suites missed what Anthropic found, Baer proposed a framework that works at the board level: pattern-matching SAST solved a different generation of problems, Baer told VentureBeat. It was designed to detect known anti-patterns, a capability that remains essential and reduces risk. But many modern bugs live where only inference models can evaluate multi-file logic, state transitions, and developer intent. Baer’s summary for the board: “We bought the right tools to deal with the threats of the last decade. The technology has simply advanced.”

Track the competitive cycle. Both companies are heading toward IPOs, and enterprise security wins help fuel that growth. If one scanner misses a blind spot, it becomes part of the other lab’s feature roadmap within weeks. Both labs ship model updates on a monthly cycle, a pace that will outrun any single vendor’s release calendar. Baer says running both is the right choice: “Different models infer differently, and deltas between models can reveal bugs that neither tool alone detects consistently. In the short term, using both is not redundant; it is defense through diversity of inference systems.”

Set a 30-day pilot window. Before February 20, this test did not exist. Run Claude Code Security and Codex Security against the same codebase, and let the delta drive procurement conversations with empirical data rather than vendor marketing. You have 30 days to get that data.

Fourteen days separated Anthropic’s launch from OpenAI’s. The interval between subsequent releases will be shorter. Attackers are watching the same calendar.
