Radar Trends to Watch: May 2025 – O’Reilly

Anthropic’s Model Context Protocol (MCP) has received a lot of attention for standardizing the way models communicate with tools, making it much easier to build intelligent agents. Google’s Agent2Agent (A2A) now adds features that were left out of the original MCP specification: security, agent cards for describing agent capabilities, and more. Is A2A competitive or complementary? Is it another layer in a developing protocol stack for agentic applications? Similarly, Claude Code has been the flagship for agentic coding, the next step beyond cut-and-paste and comment-completion tools such as GitHub Copilot. Now, with OpenAI’s terminal-based Codex and Google’s Firebase Studio IDE, it has competition. The upside for Anthropic? These tools implicitly acknowledge that Anthropic is the AI vendor to beat.

Artificial Intelligence

OpenAI’s latest image generation model (gpt-image-1) is now available via the company’s API.
The European Space Agency and IBM have created TerraMind, a generative AI model of the Earth. Among other things, the model has been trained for climate forecasting. It’s available on Hugging Face.
WhaleSpotter is an AI-enabled thermal camera that ships can use to spot whales in time to change course and avoid collisions. The system detects the heat from a whale’s spout.
Google’s latest reasoning model, Gemini 2.5 Flash, is now available in preview. Flash is a “hybrid reasoning model” that allows users to specify a “thinking budget” so they can control how much money (and time and tokens) is spent on reasoning.
MCP Run Python is an MCP server from Pydantic for running LLM-generated Python code in a sandbox. Simon Willison has a couple of fascinating demos.
OpenAI has launched its o3 and o4-mini models. o3 is its most advanced reasoning model, and o4-mini is a smaller reasoning model designed to be faster and more cost-efficient. These new models replace o1 and o3-mini.
A model for maritime navigation has demonstrated that explaining the reason for navigational decisions increases trust and reduces human error.
OpenAI has released GPT-4.1, including mini and nano versions. OpenAI claims that GPT-4.1 improves significantly on code generation and instruction following. All the models have a 1M-token context window. The 4.1 series models are currently only available via the API. GPT-4 is slated to be retired, as is GPT-4.5 preview.
A new paper from DeepMind describes some strategies for defending against prompt injection attacks. As Simon Willison writes, prompt injection has been around for two and a half years; this may be the first significant progress in defeating it.
ChatGPT can now reference your entire chat history. This is a significant extension of its older Memory feature, which could only remember a few pieces of information.
MCP may be the basis for the next generation of AI-driven technology, but it’s important to remember security. Protocol vulnerabilities are as dangerous as SQL injection—and MCP has many of them. (No doubt A2A does too; it goes with the territory.)
Anthropic has announced a new Max Plan for Claude users to mitigate complaints that users are bumping into their usage limits too often. Max is $100 or $200 a month, for 5x or 20x more usage than Pro. It’s not cheap, but bumping into limits is frustrating.
For those of us who like keeping our AI close to home, there’s now DeepCoder, a 14B model that specializes in coding and that claims performance similar to OpenAI’s o3-mini. Dataset, code, training logs, and system optimizations are all open.
Two important papers from Anthropic give some clues about how agents think. And an article by Google’s Blaise Agüera y Arcas challenges our notions of how we think.
Google has announced its Agent2Agent protocol (A2A) to facilitate communications between intelligent agents. It covers agent-to-agent messaging, agent discovery, and asynchronous task management. The company stresses that A2A is complementary to MCP.
The Model Context Protocol (MCP) is taking the AI world by storm. There are several projects listing MCP servers, including mcpservers.org, the awesome-mcp-servers GitHub repo, Glama’s list, and Cline’s MCP Marketplace (accessible through its plug-in).
OpenAI is rolling out watermarks for its image generation model, possibly in response to reactions to its “Studio Ghibli” filter. Users with a paid account can apparently save images without watermarks.
Meta has released the Llama 4 “herd” of open models. They’re all mixture-of-experts models with large context windows. Scout and Maverick both have 17B active parameters, with 16 and 128 “experts,” respectively; they’re available on llama.com and Hugging Face. Behemoth is a 288B active parameter (2T total) “teacher” model used to train other models.
OpenAI is actually planning to release an open model? Surprise, surprise. Needless to say, it hasn’t been released yet. But they want feedback already.
Gemini 2.5 is now available to free users; select Gemini 2.5 Pro (Experimental) in the Gemini app. Some of its capabilities are restricted (for example, free users can’t upload documents).
Can an AI be a trusted third party? Can it make a judgment based on information from two sources without revealing the information on which the judgment was based? The answer may be “yes.” It helps that models can be deleted.
Google’s open Gemma 3 models have taken several steps forward. They now support function calling and larger (128K) context windows. Quantization-aware training optimizes their performance to make the models accessible for less-powerful hardware: a single GPU or even a GPU-less laptop.

Programming

We do code reviews. Should we also do data reviews? As we become more dependent on AI and massive data pipelines, we need to know that our data is trustworthy.
When using Claude Code, the thinking budget is evidently controlled by using the words “think,” “think hard,” “think harder,” and “ultrathink” in prompts.
Kelsey Hightower sees the Nix project as a possible complement to Docker. Using Nix inside of Dockerfiles leads to more efficient and reproducible builds.
OpenAI has also released Codex, a coding agent that runs in the terminal. It appears to be similar to Claude Code, but it has an open source license.
The kro project (Kubernetes Resource Orchestrator) allows developers to build groups of Kubernetes resources that can be used to simplify Kubernetes cluster configurations in a vendor-independent way.
Python now has a tariff package to tax imports! 50% on NumPy, 200% on pandas. As in the real world, you only tax yourself.
Google’s Firebase Studio is a generative AI-native IDE for building full stack web applications. It’s getting good reviews online. In addition to integration with Git and GitHub, it’s integrated into Google Cloud, so it can deploy applications automatically.
OpenAI will require organization verification for developers to gain API access to future models. Despite the name, this status applies to individual developers and will require a valid government-issued ID; IDs from over 200 countries are acceptable.
Amazon’s Alexa has lost its shine, but the new Alexa+ is based on generative AI. The company is looking for developers to test its AI-native SDKs.
Although Rust code is still a small part of the Linux kernel, its presence is growing—and Rust’s memory safety is paying off.
NVIDIA is adding native support for Python to CUDA, its toolkit for programming GPUs.
NVIDIA has also announced that a future version of CUDA will allow developers to treat large clusters of GPUs as a single virtual GPU. There’s no estimate for when these new features will be released.
Microsoft has published a paper about giving a code-generating LLM access to a Python debugger. Agentic vibe debugging, here we come!
Run a server in the browser? With Wasm, why not? It’s not a good production environment, but it could be ideal for development and debugging.
Rust finally has a formal language specification! The spec was developed and donated to the Rust Foundation by Ferrous Systems, a company that develops Rust compilers. I’m shocked that one didn’t already exist—but apparently one didn’t.

Security

Policy Puppetry is a new prompt injection attack technique that works against all major LLMs. The attack works by writing the malicious prompt in a form that can be interpreted as a policy file that the LLM would be required to obey.
Windows Recall is back. It’s in the preview channel. Many of the problems appear to have been fixed. It’s not on by default, it can be uninstalled, and it can be used without a network connection. But it’s still creepy, and Microsoft’s reputation is a problem that remains.
Mitre’s CVE program (Common Vulnerabilities and Exposures) was almost defunded. Funding expired on April 15 and was only extended for 11 months on April 17. CVE has been essential in disseminating information about security weaknesses in computer systems.
Google has announced end-to-end encryption (e2e) for Gmail. While this reduces the burden of implementing e2e encryption for IT departments, it’s debatable whether this is truly e2e. Recipients who don’t use Gmail can use a special subset of Gmail to read encrypted mail.
OpenPubkey SSH simplifies using SSH with single sign-on. It adds SSH public keys to the ID tokens used by OpenID Connect. Short-lived SSH keypairs are created automatically when users sign in, and don’t need to be managed by users.

Infrastructure

Web

Could OpenAI be the new Twitter? The company’s apparently in the early stages of creating a social network that integrates with ChatGPT.
xkcd’s annual belated April Fools’ joke on push notifications is a masterpiece.
Mozilla is looking past its Thunderbird email client to Thundermail Pro, a full email service that’s designed to compete with Gmail. It will include a calendaring service and an AI tool for help writing messages.

Quantum Computing

Quantum messages have been sent over commercial communications infrastructure. The distance (254 km) almost doesn’t matter; what’s more important is that the experiment used commercial optical fiber with no cooling or other quantum-specific support.
An Australian company has developed an alternative to GPS that uses quantum sensors to pinpoint locations based on the Earth’s magnetic field. The device doesn’t emit signals, can filter out noise, and, unlike GPS, isn’t vulnerable to outages or attacks.
Phasecraft has developed an algorithm that makes quantum simulations more efficient. This advance could help quantum computers to model chemical reactions and create new materials.

Robotics

Hugging Face has acquired Pollen Robotics and is planning to sell robots. Its first offering, Reachy 2, is a humanoid robot that can be programmed using Hugging Face’s LeRobot models.
RoboBee is a tiny flying robot (roughly an inch long) that can land safely on a leaf.
