
BGTC Voice

The Secure Spark: Why Code Security is the Unsung Triumph of GPT-5


When OpenAI launched its latest chatbot, GPT-5, CEO Sam Altman spoke boldly of "PhD-level" capabilities. The consumer experience, however, was less dazzling: users complained about a perceived lack of conversational warmth and saw no obvious leap forward. Altman himself later admitted the release had been "screwed up."

Yet, buried beneath the noise of unmet user expectations, a significant, consequential improvement was delivered in an area that matters deeply to global infrastructure: writing secure code. This underreported success represents a quiet, strategic win for the field of AI and a critical shift in how we think about the tool’s role in software development.


The Unseen Audit: Veracode's Findings

The cybersecurity company Veracode, a $2.5 billion industry force, conducted a revealing study. They subjected over 100 large language models (LLMs) to a demanding test: 80 code completion tasks. These tasks were specifically structured to be completed in two ways: one path was clean and secure, while the other would introduce a known security weakness.

The results, particularly for OpenAI’s latest models, were stark. The GPT-5 Mini model wrote code free of vulnerabilities in 72% of the tasks. This is a considerable jump from earlier models, which managed just under 60% in the same test earlier that year. The standard GPT-5 model was close behind at 70%.

While the competition is fierce, the lead is notable. Google’s Gemini 2.5 Pro performed respectably at 59%, with xAI’s Grok 4 at 55%. Interestingly, Anthropic’s Claude Sonnet 4.5 saw a slight decline to 50% from its predecessor’s 53% earlier in the year.

The fact remains that even the best models still introduced basic flaws, such as the infamous SQL injection, which lets an attacker read or alter database contents by slipping crafted input into a query. The systems are imperfect, but the directional change is undeniable.
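To make the SQL injection flaw concrete, here is a minimal sketch of the two paths a code completion can take: one builds the query by pasting user input into the SQL string, the other passes the input as a bound parameter. The table, the function names, and the payload are hypothetical illustrations using Python's built-in sqlite3 module; they are not taken from Veracode's test suite.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_insecure(conn, name):
    # Vulnerable path: user input is pasted straight into the SQL text,
    # so quote characters in `name` can change the meaning of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_secure(conn, name):
    # Safe path: the input is passed as a bound parameter and is only
    # ever treated as a value, never as SQL syntax.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"  # hypothetical attacker-supplied input

print("insecure:", find_user_insecure(conn, payload))
print("secure:  ", find_user_secure(conn, payload))
```

With this payload, the insecure version returns every row in the table, while the parameterised version returns nothing because no user has that literal name. The two functions differ by a single line, which is why this flaw is so easy for a human or a model to introduce.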


Why the Leap in Security?

It is not entirely clear why the GPT-5 models showed such a significant improvement, but cybersecurity experts point to a change in the model's internal mechanics. According to Veracode, the likely reason is the introduction of added reasoning steps by OpenAI.

The AI now performs more internal checks before finalising an output. In effect, it is doing something akin to a rudimentary code review on its own work. Veracode CTO and lead researcher Jens Wessling acknowledged the effort, stating, "A lot of kudos to them for making an actual investment in making the security better."

This represents a philosophical shift in AI design. It’s a move away from merely predicting the most statistically probable text (the next line of code that looks correct) toward actively engaging in a process of verification and error elimination (the next line of code that is also safe). It suggests that the "PhD-level" intelligence Altman touted may not reside in better cocktail party conversation, but in more rigorous, multi-layered problem-solving. To borrow an analogy, the artist has been given a basic internal editor.


The Inherent Flaw in the Training Data

Despite OpenAI’s progress, the study results deliver a necessary dose of caution. Even with a 72% success rate, GPT-5 Mini was still introducing a known vulnerability in more than one in four coding tasks.
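The flip side of each score can be worked out directly. The short sketch below takes the secure-completion percentages quoted in this article and converts them into failure rates; the dictionary and the function name are illustrative only, and the figures are simply those reported above.

```python
# Secure-completion rates (percent) as quoted in this article.
secure_rate = {
    "GPT-5 Mini": 72,
    "GPT-5": 70,
    "Gemini 2.5 Pro": 59,
    "Grok 4": 55,
    "Claude Sonnet 4.5": 50,
}


def failure_summary(rates):
    """Return {model: (insecure_percent, tasks_per_flaw)}."""
    summary = {}
    for model, secure in rates.items():
        insecure = 100 - secure  # share of tasks with a known weakness
        summary[model] = (insecure, round(100 / insecure, 1))
    return summary


for model, (insecure, per_flaw) in failure_summary(secure_rate).items():
    print(f"{model}: {insecure}% insecure, about 1 flaw every {per_flaw} tasks")
```

For GPT-5 Mini this gives a 28% failure rate, a little more than one task in four, and for the lowest score listed it is one task in two.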

Wessling stressed that there is still significant ground to cover before AI can be trusted with fully autonomous coding jobs. "While it's a big improvement... it still doesn't get it to what I'd consider something I'd be comfortable deploying without reviewing," he warned.

The core problem, argues Veracode founder Chris Wysopal, lies in the foundational training material. LLMs learn by consuming vast repositories of existing code. This corpus is riddled with flaws. "It's learning from things that have been vibe coded, student projects," he noted. "You're going to get a lot of stuff that hasn't gone through a security process."

In a sense, we are asking these powerful systems to learn perfect grammar and rhetoric from a library filled with both Shakespeare and scrawled, half-finished notes. The vulnerability is baked into the curriculum.

The connection is worth dwelling on: historic flaws in human-written code are directly compromising the future output of AI. The sins of the past are literally encoded into the tools of the future. The way forward is not just to make the models smarter, but to make their input data cleaner, or to force the models to apply security principles that override their learned probabilistic tendencies.

The real takeaway here is not that AI is perfect, but that targeted engineering can improve security drastically. The next logical step is to see these companies compete not just on conversational fluency or speed, but on achieving a 99% security score, a benchmark that would truly merit the moniker of a "breakthrough."