What Changed This Week
The AI landscape moved fast this week. Here are six developments that matter if you write prompts for a living — or just want better results from AI models.
1. GPT-5.4 Is the New Default
OpenAI retired GPT-5.1 on March 11 and rolled users forward to GPT-5.4. The new model favors structured prompts over conversational phrasing: JSON schemas, XML scaffolding, and the CTCO framework (Context, Task, Constraints, Output).
What to do: Pin your production apps to specific model snapshots. If you rely on structured outputs like SQL or JSON, GPT-5.4 can emit them directly without markdown wrappers — just ask.
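A minimal sketch of both tips, assuming a hypothetical snapshot ID (check your provider's model list for the real names) and rendering a prompt in the CTCO shape with XML-style scaffolding:

```python
# Hypothetical snapshot ID -- pin production traffic to a dated snapshot
# rather than a floating alias like "gpt-5.4", which moves under you.
PINNED_MODEL = "gpt-5.4-2026-03-11"

def build_ctco_prompt(context: str, task: str,
                      constraints: list[str], output_spec: str) -> str:
    """Render a prompt in the CTCO shape (Context, Task, Constraints,
    Output) using XML-style scaffolding."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"<context>{context}</context>\n"
        f"<task>{task}</task>\n"
        f"<constraints>\n{constraint_lines}\n</constraints>\n"
        f"<output>{output_spec}</output>"
    )
```

The output spec is where you ask for raw SQL or JSON with no markdown fences, per the tip above.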
2. Claude 4.6 Introduces the Effort Parameter
Anthropic's Claude Opus 4.6 and Sonnet 4.6 now ship with adaptive thinking — the model decides when and how deeply to reason. But there is a catch: Sonnet 4.6 defaults to high effort, which can spike latency if you are migrating from Sonnet 4.5 without adjusting settings.
What to do: Set the effort parameter explicitly. Use high effort for complex reasoning tasks, and dial it down for simpler ones where speed matters more.
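One way to make the setting explicit is to decide it per request at the call site. The parameter name and the "high"/"low" values below are assumptions based on the announcement; verify them against the current Anthropic API reference before shipping:

```python
def build_request(prompt: str, complex_task: bool) -> dict:
    """Return request kwargs with an explicit effort setting, so a model
    migration never silently changes your latency profile.
    NOTE: the "effort" key and its values are assumed, not confirmed."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 2048,
        "effort": "high" if complex_task else "low",
        "messages": [{"role": "user", "content": prompt}],
    }
```

Pass the resulting dict straight to your client's message-creation call.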
3. GEPA: A Better Way to Optimize Prompts Automatically
GEPA (Genetic-Pareto Reflective Prompt Evolution) was accepted as an oral presentation at ICLR 2026. It is an evolutionary prompt optimizer that doubles the gains of MIPROv2 — the previous best — while using far fewer compute resources than reinforcement learning alternatives.
What to do: If you use DSPy, try dspy.GEPA as your first optimizer. It works best when you have measurable quality metrics to optimize against.
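The main prerequisite is a measurable metric. A sketch under assumptions: the metric below uses plain dicts for gold and prediction (DSPy's own example types differ), and the commented-out wiring shows the general shape; check the dspy.GEPA docs for the exact signature in your installed version.

```python
def exact_match_metric(gold: dict, pred: dict, trace=None) -> float:
    """Return 1.0 on an exact answer match, 0.0 otherwise. GEPA evolves
    prompts against whatever scalar score you hand it, so a crisp,
    automatable metric like this is what makes the optimizer useful."""
    return float(gold["answer"].strip() == pred["answer"].strip())

# Wiring sketch -- argument names may differ across DSPy releases:
# import dspy
# optimizer = dspy.GEPA(metric=exact_match_metric, auto="light")
# optimized_program = optimizer.compile(program, trainset=train_examples)
```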
4. Context Engineering Is Now a Real Discipline
The shift from prompt engineering to context engineering is no longer just a buzzword. Concrete best practices are now documented across the industry:
- Just-in-time retrieval beats stuffing your entire knowledge base into the prompt. Identify intent first, then fetch only what is relevant.
- Keep the number of tools you expose to the model under 30. Applying RAG to tool descriptions — retrieving only the tools relevant to the current query — yields 3x better tool selection accuracy.
- Watch where you place critical info. Models show a U-shaped attention curve — accuracy drops over 30% for information buried in the middle of long contexts. Put the important stuff at the beginning or end.
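Two of those practices can be sketched in a few lines. The tool shortlister below uses naive keyword overlap as a stand-in for real embedding retrieval, and the context assembler is one simple way to work around the U-shaped attention curve; both are illustrations, not a library API:

```python
def shortlist_tools(query: str, tools: dict[str, str], cap: int = 30) -> list[str]:
    """RAG over tool descriptions, simplified: rank tools by keyword
    overlap with the query and expose at most `cap` of them. A real
    system would use embeddings instead of word overlap."""
    q = set(query.lower().split())
    ranked = sorted(
        tools,
        key=lambda name: len(q & set(tools[name].lower().split())),
        reverse=True,
    )
    return ranked[:cap]

def assemble_context(instructions: str, documents: list[str]) -> str:
    """Counter the 'lost in the middle' dip: put the task instructions
    at both ends of the prompt, retrieved documents in the middle."""
    return "\n\n".join(
        [instructions, *documents, f"Reminder of the task:\n{instructions}"]
    )
```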
5. Multi-Agent Orchestration Goes Mainstream
Multi-agent systems are no longer experimental. Specialized agent roles — Planner, Implementer, Tester, Reviewer — working in coordinated teams are now the default architecture for production AI systems. MCP (Model Context Protocol) has crossed 10,000 active public servers and is supported by ChatGPT, Cursor, Gemini, and VS Code.
What to do: If you are building with agents, focus on clear role definitions and handoff protocols between agents rather than cramming detailed instructions into each one.
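A minimal sketch of that idea: short role prompts plus a structured handoff object, rather than one giant instruction block per agent. Everything here (role wording, the `Handoff` shape, the `call_agent` hook) is illustrative, not a framework API:

```python
from dataclasses import dataclass, field

# Each role gets a short, focused prompt instead of the full system spec.
ROLES = {
    "planner": "Break the request into ordered steps. Output a numbered plan.",
    "implementer": "Execute the current step. Output the artifact plus assumptions.",
    "tester": "Check the artifact against the step's acceptance criteria.",
    "reviewer": "Approve, or return the work with specific change requests.",
}

@dataclass
class Handoff:
    """Structured payload passed between agents -- each agent sees its own
    role prompt plus this object, not the other agents' transcripts."""
    task: str
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

def run_pipeline(request: str, call_agent=None) -> Handoff:
    """Orchestration loop; `call_agent` is a hypothetical hook where your
    LLM client plugs in, so the loop itself stays model-agnostic."""
    handoff = Handoff(task=request)
    for role, role_prompt in ROLES.items():
        if call_agent is not None:
            handoff.artifacts[role] = call_agent(role_prompt, handoff)
        handoff.log.append(role)
    return handoff
```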
6. Open-Weight Models Are Closing the Gap
Llama 4 ships in mixture-of-experts configurations of up to 128 experts with 17B active parameters, with strong reasoning. Mistral Small 3.2 hits 92.9% on HumanEval Plus while running 3x faster than Llama 3.3 70B. For most enterprise use cases, good prompting with few-shot examples on open-weight models now gets the job done without fine-tuning.
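The few-shot pattern itself is simple enough to sketch; the labels and formatting below are one common convention, not a requirement of any particular model:

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a plain few-shot prompt: labeled input/output pairs
    followed by the new input. On strong open-weight models this often
    replaces fine-tuning for classification and extraction tasks."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```

The trailing `Output:` cues the model to complete the pattern rather than comment on it.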
The Takeaway
The theme across all six updates is the same: structure wins over cleverness. Whether it is structured prompts for GPT-5.4, explicit effort settings for Claude 4.6, or well-defined agent roles in multi-agent systems — the models are getting better at following clear instructions, and the best results come from giving them exactly that.