⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI - Latent Space Recap
Podcast: Latent Space
Published: 2025-12-26
Duration: 28 minutes
Guests: Brian Fioca, Bill Chen
Summary
The episode explores Codex Max, OpenAI's new coding agent, highlighting its ability to work continuously for over 24 hours, manage its context, and spawn sub-agents. It also discusses the training of AI models to develop personality and trust, emphasizing practical use cases over academic benchmarks.
What Happened
OpenAI's Codex Max, a new coding agent, is designed to operate for over 24 hours, manage its own context, and spawn sub-agents, enabling parallel work across codebases. Brian and Bill discuss the importance of building a model that engineers can trust, emphasizing that personality, communication, and self-checking are crucial. They describe how Codex Max prefers certain tools, such as 'rg' over 'grep', and how renaming tools to match the names Codex saw during training can dramatically improve performance.
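The tool-naming effect described above can be sketched as a thin aliasing layer in an agent harness. This is a hypothetical illustration, not the actual Codex API: the `ToolRegistry` class, its methods, and `search_files` are all invented names for the sake of the example.

```python
# Hypothetical sketch: expose a harness tool under the name the model
# saw during training (e.g. "rg" rather than "grep"), while keeping
# other names as aliases. None of these names come from the real Codex harness.

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, aliases=()):
        """Register a tool under its canonical name plus any aliases."""
        self._tools[name] = fn
        for alias in aliases:
            self._tools[alias] = fn

    def call(self, name, *args):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](*args)


def search_files(pattern):
    # Stand-in for a real ripgrep-style file search.
    return f"searching for {pattern!r}"


registry = ToolRegistry()
# Expose the search tool as "rg", the name the model reportedly prefers,
# with "grep" kept as an alias for compatibility.
registry.register("rg", search_files, aliases=("grep",))
```

Both names route to the same underlying function, so the harness can present whichever spelling the model performs best with.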
The episode contrasts Codex with the more general GPT-5 model, noting that Codex is optimized for specific coding tasks within its harness while GPT-5 is broader and more adaptable to different tools and modalities. The hosts explain how OpenAI collaborates with coding partners to co-develop tool integrations and discover unexpected model habits, such as the significance of tool naming conventions.
Brian and Bill discuss the shift from academic benchmarks to applied evaluations that capture real-world use cases, noting that approximately 50% of OpenAI employees now use Codex daily. They also touch on multi-turn evaluations and the idea of using language models as judges for entire task trajectories, introducing Brian's concept of a 'job interview eval.'
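The trajectory-judging idea can be sketched roughly as follows, assuming the judge grades a whole multi-turn transcript rather than a single answer. The function names and the stub judge are invented for illustration; a real implementation would call an actual language model.

```python
# Hypothetical sketch of an LLM-as-judge over a full task trajectory,
# in the spirit of the "job interview eval" idea. The judge here is a
# stub function standing in for a real model call.

def format_trajectory(steps):
    """Flatten a multi-turn trajectory into a single transcript for the judge."""
    return "\n".join(f"[{s['role']}] {s['content']}" for s in steps)

def judge_trajectory(steps, judge_fn):
    """Ask a judge to score the whole trajectory, not just the final answer."""
    transcript = format_trajectory(steps)
    prompt = (
        "You are interviewing a candidate engineer. Grade the following work "
        "session on correctness, communication, and self-checking (0-10):\n\n"
        + transcript
    )
    return judge_fn(prompt)

# Stub judge: counts mentions of testing as a crude proxy for self-checking.
def stub_judge(prompt):
    return min(10, prompt.lower().count("test") * 3)

steps = [
    {"role": "user", "content": "Fix the failing build."},
    {"role": "agent", "content": "Reproduced the failure, patched it, reran the tests."},
]
score = judge_trajectory(steps, stub_judge)
```

The point of the pattern is that the judge sees the agent's whole working session, so process qualities like self-checking and communication can influence the grade.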
The discussion expands into how coding agents are moving beyond code, integrating personal automation and organizing workflows like email and file management. They envision a future where coding agents are trusted enough to handle complex refactors and general enough to build integrations and unlock new capabilities.
The hosts note the growing trend of sub-agents and agents using agents, with Codex Max being designed to spawn instances that can handle context handoffs and parallel work. They highlight the role of Slack as a potential ultimate user interface for work, where coding agents could automate a wide range of tasks.
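The sub-agent pattern described above can be sketched with ordinary concurrency primitives. This is an illustration of the general idea (parallel sub-agents receiving a condensed context handoff), not Codex Max's internals; every name here is hypothetical.

```python
# Hypothetical sketch of a parent agent spawning sub-agents that work
# in parallel on different tasks, each starting from a condensed
# context handoff rather than the parent's full history.

from concurrent.futures import ThreadPoolExecutor

def sub_agent(task, context):
    """Stand-in sub-agent: receives a task plus a context handoff."""
    return f"{task}: done (context: {context!r})"

def spawn_sub_agents(tasks, shared_context, handoff_limit=50):
    # Condense the parent's context before handing it off, so each
    # sub-agent starts with a small summary instead of everything.
    handoff = shared_context[:handoff_limit]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(sub_agent, t, handoff) for t in tasks]
        return [f.result() for f in futures]

results = spawn_sub_agents(
    ["refactor auth module", "update API docs"],
    "Summary of the parent agent's work so far...",
)
```

Results come back in task order, so the parent can merge them into its own context once the parallel work finishes.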
Looking to the future, Brian and Bill predict that by 2026, coding agents will be capable enough for any company to use, democratizing access to top-tier developer capabilities. They express a desire for increased trust in AI models, allowing them to take on tasks traditionally reserved for elite engineers.
Key Insights
- Codex Max, a new coding agent by OpenAI, can operate autonomously for over 24 hours and manage its own context, enabling it to spawn sub-agents for parallel work across different codebases.
- Renaming tools to match Codex Max's internal training names can significantly enhance its performance, as the model has specific preferences, such as using 'rg' over 'grep'.
- Approximately 50% of OpenAI employees use Codex daily, marking a shift from academic benchmarks to applied evaluations that reflect real-world use cases.
- By 2026, coding agents are expected to be advanced enough for widespread use across companies, democratizing access to high-level developer capabilities traditionally reserved for elite engineers.