Weekly Notes
Personal Reflection
This week the "agents as search" mental model clicked into focus while Anthropic fought simultaneously on three fronts: a Pentagon ultimatum, a coordinated Chinese distillation attack on 16M+ exchanges, and a Trump administration pressure campaign designed to make the lab an example. The labor story landed too — Block cut 40% of its workforce citing AI tools directly — and the week's remarkable Anthropic research output (coding skills RCT, model retirements, persona selection) made clear the pace of change is now structural.

🧠 Main
- Agents are not thinking, they are searching — A sharp essay reframing AI agents as RL policies doing reward-maximizing search through action space, not "reasoning" in any cognitive sense. The practical payoff is Agent Field Theory: environment design and reward shaping beat prompt engineering every time.
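The distinction is easy to see in code: a search-style agent does no modelling of meaning, it just scores candidate actions against a reward estimate and keeps the best one. A toy sketch, with all names invented for illustration (not from the essay):

```python
def toy_agent_step(state, actions, reward_estimate, n_samples=16):
    """Pick the action whose sampled scores are highest.

    No 'reasoning' happens here: the agent evaluates candidate
    actions under a reward estimate and returns the argmax.
    reward_estimate(state, action) may be stochastic, hence the
    averaging over n_samples.
    """
    best_action, best_score = None, float("-inf")
    for action in actions:
        score = sum(reward_estimate(state, action) for _ in range(n_samples)) / n_samples
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Note that swapping in a different `reward_estimate` changes the policy entirely while the "prompt" (the state) stays fixed, which is the essay's point about reward shaping beating prompt engineering.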
- Pentagon Gives Anthropic Ultimatum — The DoD issued Anthropic a deadline to allow Pentagon use of Claude without AI-safety restrictions, or face consequences including potential loss of government contracts. This is the first time a US defense agency has issued an ultimatum to an AI lab over safety guardrails.
- Anthropic CEO says Pentagon threats don't change position — Dario Amodei is holding the line: no blanket DoD authorization for autonomous weapons or mass-surveillance use cases, even under political pressure. The standoff remains unresolved and is the most significant AI-governance moment of the year so far.
- The Trump Administration Is Trying to Make an Example of Anthropic — The Center for American Progress argues the DoD pressure campaign is political: the administration wants to dismantle AI safety norms using Anthropic as a high-profile test case. Provides the policy context the WSJ and CNBC reports lack.
- Detecting and preventing distillation attacks — Anthropic caught DeepSeek, Moonshot, and MiniMax running 16M+ exchanges through roughly 24,000 fraudulent accounts to systematically extract Claude's capabilities. The scale and coordination make this a national-security incident, not just a terms-of-service violation.
- Anthropic Banned Third-Party Tools. Here's What It Means. — Anthropic revoked OAuth access for third-party Claude integrations, effectively ending a class of community-built tools. The OpenClaw developer community is scrambling to adapt. Raises sharp questions about platform risk for anyone building on Claude's API ecosystem.
- Claude Cowork Enterprise Expansion — Anthropic moves Claude Cowork out of research preview and into full enterprise availability, adding native connectors for Google Drive, Gmail, DocuSign, and FactSet. The gap between AI assistant and enterprise workflow platform is closing fast.
- Jack Dorsey's Block to Lay Off 40% of Workforce in AI Remake — More than 4,000 employees are being cut. Dorsey explicitly cited "intelligence tools" changing what it means to build and run a company. The clearest corporate admission yet that AI is the proximate cause of mass layoffs, not a backdrop.
- Head of Amazon's AGI lab is leaving the company — The director of Amazon's AGI research division is departing without a replacement named. The exit comes as Amazon lags behind Google, OpenAI, and Anthropic on frontier model capabilities — a talent signal that's hard to interpret charitably.
- Meta and AMD Agree to AI Chips Deal Worth More Than $100 Billion — Meta and AMD signed a multi-year partnership worth over $100 billion for AI accelerator chips — the largest chip procurement deal in history. A direct challenge to Nvidia's GPU monopoly and a signal that Meta is building vertical hardware independence.
- SpaceX, OpenAI & Anthropic IPOs: A $3 Trillion Stress Test — Tom Tunguz analyzes the prospect of three $1T+ companies going public in 2026. The combined float would be the largest single-year tech IPO wave in history, stress-testing venture capital exit math and public-market AI valuations simultaneously.
- Can OpenAI Build Alexa Before Amazon Can Build ChatGPT? — The smart-speaker race is now a three-way sprint: OpenAI's Jony Ive device (camera, no screen, $200–$300), Alexa+, and Apple HomePad. The twist: Amazon may partner with OpenAI on the retail layer, making competitors into collaborators.
- Intrinsic joins Google — Alphabet folds its robotics software subsidiary directly into Google to accelerate physical AI, where it will work alongside DeepMind and Gemini on industrial automation. The Alphabet moonshot portfolio is consolidating around AI.
- OpenAI prepares new ChatGPT Pro Lite tier at $100 monthly — OpenAI is testing a $100/month Pro Lite tier, splitting the $200/month Pro offering. Likely targets power users who want extended reasoning but don't need full o3 access. Pricing strategy is converging across the major labs.
- Non-Code Moats — Rich Mironov argues that in an AI-assisted world, the defensible moats are operational rather than technical: customer relationships, domain expertise, trust, and distribution, not code. A useful counterweight to the "anyone can ship software now" narrative.
- Open Source in the age of AI — John O'Nolan (Ghost founder) argues AI is simultaneously open source's biggest threat and biggest opportunity: LLMs commoditize what open source once uniquely offered, but they also accelerate contributions. Worth reading alongside the Mironov moats piece.
- Why Developers Keep Choosing Claude Over Every Other AI — A detailed breakdown of why developers systematically prefer Claude for coding tasks — instruction following, refusal calibration, long context handling, and code quality. Complements the Anthropic coding skills RCT from the Research section.
- OpenAI's Kevin Weil on the Future of Scientific Discovery — OpenAI's CPO argues AI will compress the research cycle from years to days across biology, chemistry, and materials science. One of the clearest articulations of the "AI as scientific multiplier" thesis from inside a lab building it.
🧪 Research
- Does Your Reasoning Model Implicitly Know When to Stop Thinking? — SAGE and SAGE-RL show that large reasoning models already have an internal signal for when to stop, but current sampling obscures it. Aligning sampling with that signal yields +2.1% accuracy and 44.1% token reduction on math benchmarks — efficiency gains without retraining.
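The mechanism can be illustrated with a toy decode loop that halts once the model's own stop signal crosses a threshold. Both function names and the signal itself are hypothetical stand-ins, not the paper's actual method:

```python
def decode_with_stop_signal(step_fn, stop_signal_fn, max_tokens=512, threshold=0.9):
    """Generate reasoning tokens, halting early when an internal
    stop signal says further thinking is unlikely to help.

    step_fn(tokens)        -> next token (stand-in for the model)
    stop_signal_fn(tokens) -> float in [0, 1], confidence it can stop
    """
    tokens = []
    for _ in range(max_tokens):
        tokens.append(step_fn(tokens))
        # Align decoding with the implicit stop signal instead of
        # always running to max_tokens: this is where the token
        # savings come from.
        if stop_signal_fn(tokens) >= threshold:
            break
    return tokens
```

The interesting empirical claim is that the signal already exists inside the model; the sketch only shows why exposing it at sampling time cuts tokens without retraining.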
- Codex Prompting Guide — OpenAI's official guide for GPT-5.2-codex covers autonomy settings, the apply_patch workflow, context compaction for multi-hour sessions, and personality modes. Essential reading before building anything serious on Codex.
- Long horizon tasks with Codex — OpenAI's guide to running multi-hour Codex sessions, covering context compaction strategies, AGENTS.md patterns, and how to structure tasks that exceed a single context window. The practical counterpart to the Codex Prompting Guide.
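The core compaction idea is simple: when the transcript approaches the context budget, fold the oldest messages into a summary and continue with that in their place. A minimal sketch with hypothetical helper names (not OpenAI's actual API):

```python
def compact_context(messages, token_count, summarize, budget=100_000, keep_recent=10):
    """If the transcript exceeds the token budget, replace everything
    but the most recent messages with a single summary message.

    token_count(message) -> int, summarize(messages) -> str are
    placeholders for a tokenizer and a summarization call.
    """
    if sum(token_count(m) for m in messages) <= budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # The summary stands in for the full history on the next turn,
    # so tasks can run past a single context window.
    return [{"role": "system", "content": summarize(old)}] + recent
```

In practice the choice of what to keep verbatim (recent turns, file state, the task spec) matters more than the summarizer itself, which is what the AGENTS.md patterns in the guide address.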
- DeepSeek withholds latest AI model from US chipmakers — DeepSeek gave Huawei a weeks-long head start for hardware optimization of its newest model. The report also suggests the model may have been trained on Blackwell GPUs in potential violation of US export controls — a deliberate geopolitical signal, not an oversight.
- The First Fully General Computer Action Model (FDM-1) — SI's FDM-1 claims to be the first model that can execute arbitrary computer tasks without task-specific training — a step toward truly general computer-use agents. Early benchmark results are strong, though the "fully general" claim needs independent verification.
- Claude Opus 3 retirement + persona selection model — The @AnthropicAI thread announces Claude Opus 3's retirement date and introduces a persona selection model that lets users choose Claude's communication style. The persona feature is more significant than the deprecation — it's a UX bet on personality as a product differentiator.
- AI assistance impacts formation of coding skills — Anthropic published a randomized controlled trial measuring whether AI coding assistance impairs skill development in new programmers. Results show mixed effects: productivity improves immediately, but foundational skill formation slows for beginners. Methodologically rigorous and likely to be cited in education policy debates.
🛠️ Tools
- Introducing Perplexity Computer — A unified multi-model worker that orchestrates Opus 4.6, Gemini, Grok, ChatGPT, and specialist models as sub-agents running hours-long workflows with real filesystem, browser, and tool integrations. The most ambitious single-product take on the "AI as your second computer" idea yet.
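The orchestration pattern described here is essentially a router over specialist workers. A toy illustration, with the sub-agent names and routing rule invented for the sketch (nothing here is Perplexity's actual design):

```python
def orchestrate(task_steps, subagents, route):
    """Dispatch each step of a task to whichever sub-agent the
    router picks, collecting results in order.

    subagents: dict name -> callable(step) -> result
    route:     callable(step) -> sub-agent name
    """
    results = []
    for step in task_steps:
        agent = subagents[route(step)]
        results.append(agent(step))
    return results
```

The hard parts a real product adds on top — shared filesystem state, browser sessions, and hours-long retries — are exactly what distinguishes "a router" from "a second computer."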
- The AI is the Computer — Perplexity CEO Aravind Srinivas frames the Computer launch as a thesis, not just a product: the AI model becomes the operating system, and apps are just plugins. The most direct articulation of the "post-app" worldview yet from a major AI company CEO.
- KiloClaw — Deploy OpenClaw agents in 60 seconds — Kilo's hosted platform lets anyone deploy production OpenClaw agents in 60 seconds with no DevOps work. Lowers the barrier for shipping agents from "you need a backend team" to "you need a browser tab."
- Google Nano Banana 2 — Google's new image generation model promises professional-quality outputs at flash speed, positioned against Midjourney and DALL-E 3. The naming is distinctly un-Google, suggesting a brand repositioning toward consumer appeal.
- Perplexity tests Messages integration and usage credits — Perplexity is testing a direct Messages app integration and usage-credit sharing across accounts. If shipped, it would make Perplexity the first AI assistant with native SMS/Messages reach on iOS — a meaningful distribution unlock.
🌅 Closing Reflection
Best revisits: the "agents as search" essay for the clearest conceptual reframe of the week, Anthropic's distillation-attack report for a concrete picture of AI capability theft at scale, and the Non-Code Moats piece for the most useful counterpoint to the "now anyone can ship software" narrative. The Pentagon standoff and the coding-skills RCT are the two stories most likely to look significant in hindsight.
🙏 Thanks & Contact
Thanks for reading! If you have suggestions or feedback, I'd love to hear from you via my contact form. See you next week!