Last month, DeepSeek released its R1 reasoning model (now apparently named DeepThink), with capabilities similar to OpenAI o1. What’s important about DeepSeek isn’t its benchmark results; there are a number of models on the same level as o1. What’s important is that it appears to have been trained with one-tenth the resources of comparable models. Throwing more hardware at a problem is rarely the best way to get good results.
Artificial Intelligence
- Anthropic has added a Citations API to Claude. Citations builds RAG directly into the model: users add documents to the context, and when generating an answer, Claude includes citations that show exactly which parts of those documents were used in the response. (A minimal request sketch appears at the end of this list.)
- OpenAI has released a research preview of Operator, its competitor to Anthropic’s Computer Use. Like Computer Use, Operator is a general-purpose agent: It can use a browser to navigate the web, bring back information, and take actions to accomplish the user’s request.
- Berkeley has released Sky-T1-32B-Preview, a small reasoning model that cost under $450 to train. It’s based on Alibaba’s Qwen2.5-32B-Instruct. Sky’s performance is similar to OpenAI o1-preview, and it’s fully open: Training data, weights, code, and infrastructure are all open source.
- DeepSeek has released its R1 reasoning model, which is based on its V3 model. R1 has performance equivalent or superior to OpenAI o1 and is significantly less expensive. DeepSeek has also released several other models derived from R1, including a number of smaller models based on Llama and Alibaba’s Qwen. All of these models have open code and weights.
- The key to using OpenAI o1 effectively is context, not clever prompting. “Don’t write prompts, write briefs”; give it all the information it needs to solve a problem.
- OpenAI has announced a new technique for training its reasoning models to be safe. Deliberative alignment trains the models to reason over the safety policies themselves rather than requiring humans to grade model responses.
- Meta has introduced SeamlessM4T, a multimodal (speech and text) model designed for translation. It can translate speech-to-speech and text-to-speech for nearly 100 input languages and 35 output languages.
- Anthropic has received ISO 42001 certification. This certification covers responsible AI and addresses AI design and deployment processes, transparency, testing and monitoring, and oversight.
- Google has released a paper on a new LLM architecture called Titans (a.k.a. Transformers 2.0). The primary advantage of Titans is its ability to scale to very large context windows. In effect, it adds persistent long-term memory to the transformer architecture.
- ChatGPT can now schedule recurring tasks, making it more like a personal assistant. Tasks can include generating reminders, scheduling, summarizing news, and other chores.
- AI systems may “think” using a variant of Occam’s razor, which prioritizes simpler solutions to problems.
- Mistral has released Codestral 25.01, a language model that’s optimized for code generation. It claims proficiency in over 80 programming languages. This new release is faster, supports a larger context window, and gives better benchmark results than similarly sized models.
- Harvard’s Institutional Data Initiative has assembled a large dataset of digitized copyright-free works for training language models. The collection currently has roughly 1 million books; it’s significantly larger than the Books3 dataset that was used to train earlier models.
- Microsoft’s Phi-4 model is now available on Hugging Face and Ollama. It’s yet another impressive model that can run on a reasonably well-equipped laptop.
- 4M is an open source framework for training multimodal AI models.
- NVIDIA has announced Project DIGITS, a personal supercomputer for running AI models up to 200B parameters locally. The system comes with 128GB of RAM. Systems will be available in May; the starting price is $3,000.
- O2 (the company, not the skipped OpenAI model number) has announced Daisy, a language model of its own. It answers fraudulent phone calls in real time, wasting the scammer’s time by impersonating a vulnerable elderly person.
- Fast-LLM is an open source library for training large language models. It can scale to run on anything from a single GPU to large clusters and can train models up to (and exceeding) 70B parameters.
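For the Citations item above, here is a minimal sketch of what a citation-enabled request might look like with Anthropic’s Python SDK. The document text, question, and model name are placeholders, and the exact request and response shapes may differ from the current API; treat this as an illustration of the pattern, not a definitive implementation.

```python
# Hedged sketch of a citation-enabled request with Anthropic's Python SDK.
# Document text, question, and model name are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "The grassland studies were conducted in 2021...",
                    },
                    "title": "Field Report",          # optional metadata
                    "citations": {"enabled": True},   # ask Claude to cite this document
                },
                {"type": "text", "text": "When were the grassland studies conducted?"},
            ],
        }
    ],
)

# Text blocks in the answer may carry citations that point back to specific
# spans of the supplied document.
for block in response.content:
    if block.type == "text":
        print(block.text, getattr(block, "citations", None))
```

Because the citations refer back to spans of the documents you supplied, this is effectively RAG with built-in attribution: the retrieval step is yours, but the grounding is visible in the response.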
Programming
- Puppet joins the group of formerly open source projects that have an open source fork: OpenVox. OpenVox promises to be fully Puppet-compatible. The project is looking for sponsors.
- Stratoshark is a new tool for analyzing system calls on Linux. It’s a companion to Wireshark, with a similar user interface that’s designed to help users capture system calls and analyze what they’re doing.
- Need to write applications for the Cray X-MP in your basement? You’ll need a compiler. Here’s one that runs on Linux and macOS.
- Sigstore is a project that simplifies digitally signing and managing open source software components. It reduces the burden of establishing provenance for software you’ve developed, along with checking the provenance of software dependencies you use.
- If you generate more code, there will be more code to debug and review. Two-thirds of developers in groups that use AI are spending more time debugging and resolving security vulnerabilities.
- Do you really need a new terminal emulator? Ghostty is getting rave reviews. It’s worth trying.
- Forgejo is an open source software forge: a decentralized platform for collaborative software development that can serve as a self-hosted alternative to GitHub.
- A startup is building digital twins of cities. These will be very useful to city planners—and possibly also for emergency response.
- Leptos is a new web framework for Rust. Like Sycamore, another Rust web framework, Leptos compiles Rust to WebAssembly.
- The International Obfuscated C Code Contest is back! (Did you miss it?) For more information, follow @ioccc on Mastodon (fosstodon.org).
- A chess engine in 84,688 regular expressions: It’s a regex masterpiece. As the author says, more people should do entirely pointless things.
Security
- Cybercriminals are distributing malware through Roblox mods. Discord, Reddit, GitHub, and other communications channels are used to attract users to malware-containing packages.
- Cloudflare has successfully mitigated the largest DDoS attack ever seen: 5.6 terabits/second from the Mirai botnet. An important new twist: Attacks are very short-lived, making human response impossible.
- Phishing doesn’t always start with an email. Cybercriminals are placing Google search advertisements that direct victims to phishing sites that steal their credentials.
- The FBI has forced the PlugX malware to delete itself from over 4,200 computers. Since roughly 2014, PlugX has been used by the Chinese government to steal data from victims. One suspects that the next version of PlugX won’t have a “self-delete” command.
- A new ransomware attack called Codefinger encrypts AWS S3 buckets. The attack abuses AWS’s server-side encryption with customer-provided keys (SSE-C): the attacker supplies the encryption key, which Amazon doesn’t store, so only the attacker can decrypt the data. (A brief sketch of the SSE-C mechanism appears at the end of this list.)
- Microsoft has sued a group of unnamed (and unknown) developers for compromising legitimate user accounts and using those accounts to generate harmful content.
- An incorrect certificate is causing macOS to treat Docker Desktop as malware, preventing it from starting. The problem can be fixed by upgrading to Docker 4.37.2.
- An attack against the cryptocurrency transaction simulation mechanism tricks victims into approving transactions that strip their wallet of cryptocurrency.
- The Cyber Trust Mark is a certification intended to assure consumers that connected devices meet certain cybersecurity standards developed by the US National Institute of Standards and Technology (NIST); the labeling program is administered by the Federal Communications Commission (FCC).
- Apple is discovering that errors aren’t the only problem with consumer-facing AI; the company is also having problems with email and chat summaries that make spam and fraud messages look legitimate.
- Security products based on fear, along with security sales and marketing practices, are counterproductive.
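To make the Codefinger item above concrete, here is a hedged sketch of how SSE-C works with boto3: the caller supplies a 256-bit key with every request, S3 stores only a salted HMAC of the key for validation, and whoever holds the key controls the data. The bucket and object names are placeholders, and the comments describe boto3 behavior as I understand it.

```python
# Illustrative sketch of SSE-C (server-side encryption with customer-provided keys),
# the S3 feature the Codefinger attack reportedly abuses. Bucket and object names
# are placeholders; don't run this against data you care about.
import os

import boto3

s3 = boto3.client("s3")

# The caller generates and holds the 256-bit key; S3 keeps only a salted HMAC
# of it for request validation, not the key itself.
key = os.urandom(32)

# Upload an object encrypted with the customer-provided key. boto3/botocore
# handles the base64 encoding and the key-MD5 header for SSE-C parameters.
s3.put_object(
    Bucket="example-bucket",
    Key="example-object",
    Body=b"data",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)

# Reading the object back requires presenting the same key; without it the data
# is unrecoverable, which is exactly the property the ransomware relies on.
obj = s3.get_object(
    Bucket="example-bucket",
    Key="example-object",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())
```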
Web
- Regardless of the future of TikTok, Pixelfed—a decentralized application for sharing photos and videos—looks like a good alternative. Like Mastodon, Pixelfed is part of the fediverse and is built on the federated ActivityPub protocol.
- Mercator: Extreme allows you to put the North Pole anywhere you want, and draws the corresponding Mercator map. Aside from being a web masterpiece, it shows just how distorted the Mercator projection is. Sadly, almost all of our maps are still based on it.
- Marimo playgrounds are notebooks (like Jupyter) that run entirely in the browser using WebAssembly. They can easily be created and shared on GitHub or on marimo.app.
- Most online organizations provide some kind of web-based API. Now that AI is in the picture, those APIs must be usable by AI agents: They need to be properly documented in a machine-readable fashion (e.g., with OpenAPI) and as uniform as possible. (See the sketch at the end of this list.)
- A new fork of the Flutter project, called Flock, intends to provide features and bug fixes that users have wanted but that have never made it into the release.
- Streets is a 3D version of OpenStreetMap. It takes a long time to load and many of the labels aren’t up-to-date, but it’s impressive.
- What’s the future of the web? If the web is to be a data source for AI, it will need to get much simpler, shedding megabytes of JavaScript and CSS in favor of text.
- Something new in CAPTCHAs: Play Doom and kill at least three monsters. It was built with prompt-driven AI using Vercel’s v0 and runs in the browser with Wasm. Unfortunately, I doubt it will keep bots out for long.
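As a small, hypothetical example of the machine-readable documentation point above: frameworks like FastAPI generate an OpenAPI description of a service automatically, which is the kind of artifact an AI agent can consume to discover endpoints and schemas. The Orders service and its fields below are invented for illustration.

```python
# Minimal sketch: a FastAPI service whose OpenAPI description is generated
# automatically, so an agent (or any client) can discover the API from
# /openapi.json. The /orders endpoint and its fields are invented.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Orders API", version="1.0.0")

class Order(BaseModel):
    id: int
    item: str
    quantity: int

@app.get("/orders/{order_id}", response_model=Order, summary="Fetch a single order")
def get_order(order_id: int) -> Order:
    """Return the order with the given ID (stubbed data for this sketch)."""
    return Order(id=order_id, item="widget", quantity=1)

# Running this with `uvicorn app:app` serves the machine-readable spec at
# /openapi.json and human-readable docs at /docs.
```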
Quantum Computing
- A new quantum computing technology enables trapped ions to move around on a quantum computing chip. This allows the developers to build chips that support more qubits efficiently.
- A new kind of quantum refrigerator makes it possible to cool qubits to 22 millikelvin. At lower temperatures, they will be less vulnerable to errors from noise.
Robotics
- Researchers have developed a robotic hand that can train pianists to perform very difficult movements more effectively.
Biology
- AI can be used to sharpen biological images that have been distorted by light passing through layers of tissue. In the past, this problem has been solved with expensive adaptive optics.