AI Coding News

March 24, 2026

Key Signals

Claude Code's new "auto mode" shifts permission decisions from the developer to an AI safety layer, marking a turning point in how agentic coding tools handle autonomy. Rather than forcing developers to choose between approving every action by hand or running fully unchecked with the --dangerously-skip-permissions flag, auto mode uses an AI reviewer to classify each action as safe or risky before execution, automatically blocking prompt injection and unauthorized operations. The feature is currently a research preview limited to Claude Sonnet 4.6 and Opus 4.6, and Anthropic recommends running it in sandboxed environments, a sign that the industry now accepts that full agent autonomy requires new trust architectures, not just better models. [1][2]
Anthropic expands Claude's computer use to let AI agents directly control desktops — opening apps, navigating browsers, and editing files — intensifying a multi-vendor race for AI-controlled workstations. The feature follows similar releases from Perplexity, Manus, and Nvidia, all arriving within weeks of each other. This wave traces directly back to the viral success of OpenClaw earlier in 2026, which prompted OpenAI to hire its creator. Anthropic warns its safeguards "aren't perfect" and restricts access to sensitive app categories like trading platforms by default, underscoring the unresolved tension between agent capability and safety. [3][4]
GitHub Copilot's coding agent can now be invoked directly within any pull request via @copilot, a significant workflow change that embeds AI assistance deeper into the code review loop. Previously, Copilot would open a separate PR; now it pushes changes directly to the existing branch after validating against tests and linters in a cloud environment. Combined with new REST APIs for managing coding agent repository access at scale, GitHub is systematically removing friction for organizations adopting AI-assisted development. [5][6]
The "coding was never the bottleneck" thesis gains empirical support: Faros AI data from 10,000+ developers shows 21% more completed tasks with AI adoption, but PR review time increased 91%, which moves the constraint from writing code to specification and verification. Agoda's engineering analysis proposes a "grey box" model in which developers own the spec and acceptance criteria while treating generated code as an intermediate artifact, neither reviewing every line nor shipping blindly. A complementary tutorial on freeCodeCamp introduces "spec-writer," a Claude Code skill that generates structured specs with explicit [ASSUMPTION] tags and is compatible with Claude Code, Cursor, GitHub Copilot, and Gemini CLI. Taken together, these pieces point toward Spec-Driven Development as a way for teams to integrate AI coding agents. [7][8]
Gemini CLI ships two major releases in one day, advancing a subagent architecture with native OS sandboxing and parallel tool execution. The v0.36.0-preview.0 release introduces multi-registry tool filtering for subagents, macOS Seatbelt sandboxing, native Windows sandboxing, Git worktree support for isolated parallel sessions, and an experimental memory manager agent. The stable v0.35.0 adds customizable keyboard shortcuts, a model-driven parallel tool scheduler, and concurrent safe-tool execution. Together, these releases signal Google's push toward a fully sandboxed, multi-agent CLI environment. [9][10]
Mozilla's "cq" project proposes a shared knowledge commons for coding agents, tackling the expensive problem of thousands of agents independently re-solving the same issues. The system lets agents query a shared pool before starting unfamiliar work and contribute novel discoveries back, with knowledge earning trust through validated use rather than authority. This directly addresses two persistent pain points — training-cutoff staleness and the limitations of static .md instruction files — and could fundamentally change how agents accumulate and share expertise if it achieves adoption. [11]
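The query-then-contribute loop described above can be illustrated with a small in-memory sketch. This is not cq's actual API or data model: the Entry and Commons classes, their fields, and the min_confirmations threshold are all hypothetical names chosen for illustration. The only idea taken from the article is that an entry gains trust through confirmed use rather than through who wrote it.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One piece of shared knowledge (hypothetical shape, not cq's schema)."""
    topic: str
    advice: str
    confirmations: int = 0  # times another agent reported the advice worked


@dataclass
class Commons:
    """In-memory stand-in for a shared knowledge pool (illustrative only)."""
    entries: list[Entry] = field(default_factory=list)

    def contribute(self, topic: str, advice: str) -> Entry:
        # New knowledge enters the pool with zero trust.
        entry = Entry(topic, advice)
        self.entries.append(entry)
        return entry

    def confirm(self, entry: Entry) -> None:
        # Trust grows only when the advice is validated in use.
        entry.confirmations += 1

    def query(self, topic: str, min_confirmations: int = 1) -> list[Entry]:
        # Return sufficiently trusted entries, most-confirmed first.
        hits = [e for e in self.entries
                if e.topic == topic and e.confirmations >= min_confirmations]
        return sorted(hits, key=lambda e: e.confirmations, reverse=True)


commons = Commons()
tip = commons.contribute("ci-cache", "Key the cache on the lockfile hash.")
unverified = commons.query("ci-cache")  # the tip is not yet trusted
commons.confirm(tip)
trusted = commons.query("ci-cache")     # the tip now clears the threshold
```

A real system would also need to address the security and data-poisoning questions the project itself lists as open; this sketch ignores them.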

AI Coding News

Claude's computer use capability now extends to desktop control within Cowork, letting users dispatch tasks from their phone and have the AI execute them by opening apps, navigating browsers, and filling spreadsheets. Anthropic positions this as a research preview with built-in safeguards: Claude requests permission before accessing new apps and blocks categories like cryptocurrency and trading platforms. The company acknowledges that training-based safety measures are imperfect, warning that "Claude may occasionally act outside these boundaries." The feature competes directly with Perplexity's Personal Computer, Manus's My Computer, and Nvidia's NemoClaw, all launched within weeks of each other. [3][4]
Mozilla developer Peter Wilson introduces "cq," a knowledge-sharing platform for AI coding agents that functions as "Stack Overflow for agents." Agents query the cq commons before tackling unfamiliar APIs, CI/CD configs, or frameworks, and contribute novel solutions back to the pool. The system aims to replace static claude.md and agents.md files with a living knowledge base where entries earn trust through confirmed use. Security, data poisoning, and accuracy remain open challenges that will determine whether the project achieves meaningful adoption. [11]
Kubernetes co-founder Brendan Burns argues that AI-generated code will become as invisible as assembly — a transient artifact nobody reads, validated by test suites rather than human reviewers. Burns, who now runs Azure's 1,400-person container infrastructure organization, pushes back on the current focus on scaling code review: "Did you forget that 100% of your code was machine-generated if you used a compiler? We stopped caring about that code." He suggests that future programming languages may be designed for AI rather than human ergonomics, optimizing for formal guarantees over readability. [12]
Agoda's engineering team publishes an analysis arguing that AI coding assistants have not meaningfully accelerated project-level delivery, because coding was never the primary bottleneck. The analysis proposes a "grey box" taxonomy: white-box line-by-line review doesn't scale, black-box "vibe coding" is brittle for production, and the preferred middle ground treats specification and verification as the engineer's primary deliverables. Faros AI research across 1,255 teams supports the conclusion — teams with high AI adoption merged 98% more PRs while review time nearly doubled. [7]
A freeCodeCamp tutorial introduces "spec-writer," a Claude Code skill implementing Spec-Driven Development that generates structured specifications with explicit assumption tagging before any code is written. The skill produces three outputs — a SPEC, PLAN, and TASKS — and marks every implicit decision with [ASSUMPTION: ...] tags ranked by architectural impact. It uses the Agent Skills standard, working across Claude Code, Cursor, GitHub Copilot, and Gemini CLI, and is compatible with GitHub Spec Kit and OpenSpec frameworks. [8]
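Because the assumption tags follow a fixed textual pattern, a reviewer could pull them out of a generated spec for a quick audit. The sketch below is a minimal illustration of that idea, not part of the spec-writer skill: the function name, the regular expression, and the sample spec text are assumptions, and only the [ASSUMPTION: ...] tag syntax comes from the article.

```python
import re

# Matches tags of the form [ASSUMPTION: some text]. The tag syntax comes from
# the article; the pattern and helper below are illustrative assumptions.
ASSUMPTION_TAG = re.compile(r"\[ASSUMPTION:\s*([^\]]+)\]")


def extract_assumptions(spec_text: str) -> list[str]:
    """Return the text of every assumption tag, in document order."""
    return [match.strip() for match in ASSUMPTION_TAG.findall(spec_text)]


# Hypothetical spec fragment, invented for this example.
sample_spec = """\
Users sign in with email. [ASSUMPTION: passwords are hashed with bcrypt]
Sessions expire after 30 days. [ASSUMPTION: no refresh tokens]
"""

assumptions = extract_assumptions(sample_spec)
```

Listing the tags this way gives a reviewer the implicit decisions in one place before any code is written.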
WebAssembly emerges as a leading candidate for sandboxing AI agent-generated code, offering isolation advantages over containers and microVMs that rely on shared kernels. At Wasm I/O in Barcelona, systems engineer Dan Phillips demonstrated that Wasm modules start with zero capabilities and add from there, making entire classes of exploits "unavailable by construction." The open-source Boxer project bridges the adoption gap by letting developers package existing Dockerfiles as Wasm distributions without code rewrites, potentially lowering the barrier for teams seeking safer agent execution environments. [13]
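The "start with zero capabilities and add from there" model can be illustrated by analogy in plain Python. This sketch is not WebAssembly and provides no real isolation; the Sandbox class, the CapabilityError exception, and the "fs.read" capability name are hypothetical. It only shows the deny-by-default shape of the design: an operation fails unless its capability was explicitly granted.

```python
class CapabilityError(PermissionError):
    """Raised when code uses a capability it was never granted."""


class Sandbox:
    """Deny-by-default capability holder (an analogy, not a Wasm runtime)."""

    def __init__(self) -> None:
        self._granted: set[str] = set()  # starts empty: nothing is allowed

    def grant(self, capability: str) -> None:
        self._granted.add(capability)

    def require(self, capability: str) -> None:
        if capability not in self._granted:
            raise CapabilityError(f"capability not granted: {capability}")

    def read_file(self, path: str) -> str:
        self.require("fs.read")
        return f"(contents of {path})"  # placeholder instead of real I/O


box = Sandbox()
try:
    box.read_file("notes.txt")
    denied = False
except CapabilityError:
    denied = True  # no capability was granted yet, so the read is refused

box.grant("fs.read")
contents = box.read_file("notes.txt")  # succeeds once the capability exists
```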

Feature Update

Claude Code auto mode lets the AI autonomously classify and execute safe actions while blocking risky ones, replacing the binary choice between constant manual approval and the --dangerously-skip-permissions flag. The feature uses an AI safety layer that reviews each action for unauthorized behavior and prompt injection patterns. Auto mode is currently limited to Claude Sonnet 4.6 and Opus 4.6 and is rolling out to Team plan users first, with Enterprise and API access to follow. Anthropic recommends isolated environments during the preview period. [1][2]
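The classify-then-execute flow can be sketched as a small gate function. This is an illustration of the general pattern only, not Anthropic's implementation: the Action type, the gate function, and the rule-based toy_reviewer are all hypothetical, and the fixed rule list stands in for the AI reviewer purely so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str      # hypothetical tool name, e.g. "read_file" or "shell"
    argument: str  # e.g. a path or a command line


def toy_reviewer(action: Action) -> str:
    """Stand-in for the AI reviewer: returns "safe" or "risky".

    A fixed rule list is used only so the sketch runs offline; the real
    feature is described as using an AI reviewer, not rules like these.
    """
    risky_markers = ("rm -rf", "curl ", "sudo ")
    if action.tool == "shell" and any(m in action.argument for m in risky_markers):
        return "risky"
    return "safe"


def gate(action: Action,
         reviewer: Callable[[Action], str],
         execute: Callable[[Action], str]) -> str:
    """Run the action only if the reviewer classifies it as safe."""
    if reviewer(action) != "safe":
        return f"blocked: {action.tool} {action.argument}"
    return execute(action)


def fake_execute(action: Action) -> str:
    # Placeholder executor: reports what would have run.
    return f"ran: {action.tool} {action.argument}"


allowed = gate(Action("read_file", "README.md"), toy_reviewer, fake_execute)
blocked = gate(Action("shell", "rm -rf /tmp/build"), toy_reviewer, fake_execute)
```

Passing the reviewer in as a parameter keeps the gate independent of how the classification is made, which is the part the article says is handled by a model.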
GitHub Copilot coding agent can now be mentioned with @copilot in any pull request to fix failing workflows, address review comments, or make arbitrary changes directly on the PR branch. The agent works in a cloud-based environment where it validates changes against tests and linters before pushing. Previously, Copilot opened a new PR on top of the existing one; developers who prefer that behavior can still request it in natural language. The feature is available with all paid Copilot plans, and Business and Enterprise tiers require an administrator to enable it. [5]
GitHub releases Copilot coding agent management REST APIs in public preview, enabling organization owners to manage repository access for the coding agent programmatically at scale. This addresses a key enterprise adoption requirement: the ability to control which repositories the coding agent can access without manual per-repo configuration. [6]
Copilot SDK v0.2.1-preview.0 adds slash commands and UI elicitation for Node.js, custom model listing for BYOK mode across all four SDKs, and blob attachments for inline image data. The Node.js SDK now supports session.ui.confirm(), session.ui.select(), and session.ui.input() dialogs, and tools can set skipPermission: true to bypass per-use prompts. Notable fixes include CJS compatibility for VS Code extensions and a C# AOT serialization crash. The release also makes a breaking change to Go enum naming, which now follows the TypeNameValue convention. [14]
Gemini CLI v0.36.0-preview.0 introduces a multi-registry subagent architecture with tool filtering, native macOS Seatbelt and Windows sandboxing, Git worktree support for isolated parallel sessions, and an experimental memory manager agent. Additional features include task tracker protocol integration into the core system prompt, A2A agent acknowledgment commands, plan mode support in non-interactive mode, and admin-forced MCP server installations. The release contains over 50 merged PRs spanning core subagent execution, security, and UI improvements. [9]
The Gemini CLI v0.35.0 stable release delivers customizable keyboard shortcuts, extended vim mode, and a model-driven parallel tool scheduler that allows safe tools to execute concurrently. Other additions include an --admin-policy flag for supplemental policies, a browser input blocker overlay during automation, custom base URL support via environment variables, and a SandboxManager interface and config schema. The release also fixes critical issues with subagent context propagation, session resume, and API error retry. [10]
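The idea of running safe tools concurrently can be sketched with asyncio. This is not Gemini CLI's scheduler, which is described as model-driven: the ToolCall type, its safe flag, and the schedule function are hypothetical, and the safe/unsafe split is hard-coded here instead of being decided by a model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    safe: bool  # hypothetical flag: True if the call has no side effects


async def run_tool(call: ToolCall) -> str:
    await asyncio.sleep(0)  # placeholder for real tool work
    return f"{call.name}:done"


async def schedule(calls: list[ToolCall]) -> list[str]:
    """Run safe calls concurrently, then unsafe calls one at a time.

    Note that this simple policy reorders the calls: all safe calls
    finish before any unsafe call starts.
    """
    safe = [c for c in calls if c.safe]
    unsafe = [c for c in calls if not c.safe]
    # gather preserves the order of its inputs in the returned results.
    results = list(await asyncio.gather(*(run_tool(c) for c in safe)))
    for call in unsafe:
        results.append(await run_tool(call))
    return results


calls = [ToolCall("read_a", True), ToolCall("write_b", False), ToolCall("read_c", True)]
results = asyncio.run(schedule(calls))
```

Serializing the unsafe calls is one conservative design choice; a production scheduler would also need to respect dependencies between calls, which this sketch does not model.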
Kiro upgrades Claude Opus 4.6 and Sonnet 4.6 to a 1M context window (up from 200K), marking the models as generally available for Pro, Pro+, and Power tier subscribers. Separately, MiniMax 2.5 is now available in eu-central-1, extending regional availability beyond us-east-1 with a 0.25x credit multiplier and 200K context window across all subscription tiers. [15]
OpenCode v1.3.1 adds Poe as a built-in authentication provider, token caching for Amazon Bedrock, and syntax highlighting for Kotlin, HCL, Lua, and TOML. The release includes 14 bug fixes covering session timeline scrolling, GitLab Duo Workflow identity, theme mode switching, and sidecar process cleanup. The command palette shortcut changes to Cmd+K. OpenCode v1.3.2 follows with heap snapshot functionality for TUI and server process debugging. [16][17]
OpenAI Codex ships four Rust CLI alpha releases (0.117.0-alpha.11 through alpha.14) in a single day, continuing rapid iteration on the Rust-based CLI replacement. The releases are incremental builds without detailed changelogs. [18]