[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor - Latent Space Recap
Podcast: Latent Space
Published: 2025-12-30
Duration: 45 minutes
Guests: Ashvin Nair
Summary
Ashvin Nair discusses his journey from Berkeley robotics to leading model development at Cursor, emphasizing the evolution and challenges of reinforcement learning (RL) in AI. He highlights Cursor's unique approach to continual learning and the potential for LLM agents to outpace robotics in market impact.
What Happened
Ashvin Nair traces his path from a Berkeley robotics PhD through OpenAI to leading model development at Cursor. He recounts his early days at OpenAI during the Dota era, when a small team's belief in the potential of RL paid off: initial prototypes showed promise and were scaled up aggressively. That bet laid the groundwork for the reasoning team to grow from a dozen people to over 300, focused on scaling up models and reasoning capabilities.
Nair discusses the paradox of IOI Gold: once seen as a pinnacle of AI achievement, winning it did not bring the expected transformative impact on daily life. He points to the limits of RL when applied beyond its training distribution, and argues for building economically useful tasks directly into AI development so that capabilities translate into practical applications.
The episode delves into why much of the RL research from 2017 to 2022 failed to generalize: the community overfit to benchmarks and rewarded complexity over simplicity. Nair emphasizes co-designing products and models to ensure real-world applicability, a strategy he believes Cursor is uniquely positioned to execute.
Cursor's approach to continual learning involves shipping policy updates every two hours, keeping product and model development tightly coupled. The result is a workflow where engineers stay engaged rather than constantly context switching, which Nair identifies as a crucial factor in Cursor's ability to execute continual learning at scale.
Ashvin Nair also shares his vision for the future of AI, predicting a paradigm shift toward continual learning with effectively infinite memory. He envisions models that can learn from single experiences, absorbing vast numbers of deployment tokens into their weights without running out of capacity.
Cursor's Composer model exemplifies product-model co-design: intelligent yet efficient enough that engineers can stay in the loop without disruption. Strong internal tooling lets Cursor integrate closely with its data and user environments, positioning it as a leader in automating software engineering workflows.
Ashvin Nair expresses skepticism about the current goalposts for AGI, suggesting that while predictions are broadly on track, the definition and expectations of AGI remain fluid. He remains optimistic about the potential for AI to transform industries, with Cursor leading the charge in marrying product development with advanced AI capabilities.
Key Insights
- OpenAI's reasoning team expanded from a dozen to over 300 members, driven by early successes in reinforcement learning during the Dota era, which demonstrated the potential for scaling AI models.
- From 2017 to 2022, much of the reinforcement learning research failed to generalize beyond benchmarks due to a tendency to overfit and prioritize complexity over simplicity.
- Cursor's Composer model exemplifies a co-design approach, integrating product and model development to automate software engineering processes efficiently, keeping engineers engaged and reducing context switching.
- Cursor implements policy updates every two hours as part of its continual learning strategy, allowing for seamless integration of product and model development and maintaining workflow momentum.