Building with Guardrails Before Acceleration – O’Reilly

It’s been less than three years since OpenAI released ChatGPT, setting off the GenAI boom. But in that short time, software development has transformed: code-complete assistants evolved into chat-based “vibe coding,” and now we’re entering the agent era, where developers may soon be managing fleets of autonomous coders (if Steve Yegge’s predictions are correct). Writing code has never been easier, but securing it hasn’t kept pace. Bad actors have wasted no time targeting vulnerabilities in AI-generated code. For AI-native organizations, lagging security isn’t just a liability—it’s an existential risk. So the question isn’t just “Can we build?” It’s “Can we build safely?”

Security conversations still tend to center around the model. In fact, a new working paper from the AI Disclosures Project finds that corporate AI labs focus most of their research on “pre-deployment, pre-market, concerns such as alignment, benchmarking, and interpretability.”¹ Meanwhile, the real threat surface emerges after deployment. That’s when GenAI apps are vulnerable to prompt injection, data poisoning, agent memory manipulation, and context leakage—today’s version of SQL injection. Unfortunately, many GenAI apps have minimal input sanitization or system-level validation. That has to change. As Steve Wilson, author of The Developer’s Playbook for Large Language Model Security, warns, “Without a deep dive into the murky waters of LLM security risks and how to navigate them, we’re not just risking minor glitches; we’re courting major catastrophes.”

And if you’re “fully giv[ing] in to the vibes” and running AI-generated code you haven’t reviewed, you’re compounding the problem. When insecure defaults get baked in, they’re difficult to detect—and even harder to unwind at scale. You have no idea what vulnerabilities may be creeping in.

Security may be “everyone’s responsibility,” but in AI systems, not everyone’s responsibilities are the same. Model providers should ensure their systems resist prompt-based manipulation, sanitize training data, and mitigate harmful outputs. But most AI risk emerges once those models are deployed in live systems. Infrastructure teams must lock down data authentication and interagent access using zero trust principles. App developers hold the frontline, applying traditional secure-by-design principles in entirely new interaction models.

Microsoft’s recent work on AI red teaming shows how guardrail strategies should be adapted (in some cases radically so) depending on use case: What works for a coding assistant might fail in an autonomous sales agent, for instance. The shared stack doesn’t imply shared responsibility; it requires clearly delineated roles and proactive security ownership at every layer.

Right now, we don’t know what we don’t know about AI models—and as Bruce Schneier recently pointed out (in response to new research on emergent misalignment): “The emergent properties of LLMs are so, so weird.” It turns out, models tuned on insecure prompts develop other misaligned outputs. What else might we be missing? One thing is clear: Inexperienced coders are introducing vulnerabilities as they vibe, whether those security risks turn up in the code itself or in biased or otherwise harmful outputs. And they may not catch, or even be aware of, the dangers—new developers often fail to test for adversarial inputs or agentic recursion. Vibe coding may help you quickly spin up a project, but as Steve Yegge warns, “You can’t trust anything. You have to validate and verify.” (Addy Osmani puts it a little differently: “Vibe Coding is not an excuse for low-quality work.”) Without an intentional focus on security, your fate may be “Prototype today, exploit tomorrow.”

The next evolutionary step—agent-to-agent coordination—only widens the threat surface. Anthropic’s Model Context Protocol and Google’s Agent2Agent enable agents to act across multiple tools and data sources, but this interoperability can deepen vulnerabilities if assumed secure by default. Layering A2A into existing stacks without red teams or zero trust principles is like connecting microservices without API gateways. These platforms must be designed with security-first networking, permissions, and observability baked in. The good news: Fundamental skills still work. Layered defenses, red teaming, least-privilege permissions, and secure model interfaces are still your best tools. The guardrails aren’t new. They’re just more essential than ever.

O’Reilly founder Tim O’Reilly is fond of quoting designer Edwin Schlossberg, who noted that “the skill of writing is to create a context in which other people can think.” In the age of AI, those responsible for keeping systems safe must broaden the context within which we all think about security. The task is more important—and more complex—than ever. Don’t wait until you’re moving fast to think about guardrails. Build them in first, then build securely from there.

Footnotes

Ilan Strauss, Isobel Moure, Tim O’Reilly, and Sruly Rosenblat, “Real-World Gaps in AI Governance Research,” The AI Disclosures Project, 2024. The AI Disclosures Project is co-led by O’Reilly Media founder Tim O’Reilly and economist Ilan Strauss.

Join Tim O’Reilly and Steve Wilson on June 3 for Building Secure Code in the Age of Vibe Coding—it’s free and open to all. After an introductory conversation with Tim on how AI-assisted coding (and vibe coding in particular) introduces new classes of security vulnerabilities, Steve will respond to questions from attendees, giving you a chance to better understand how his insights apply to your own situation and experiences. Register now to save your spot.

Source link