The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI - Latent Space Recap

Podcast: Latent Space

Published: 2026-02-06

Duration: 1 hr 8 min

Guests: Myra Deng, Mark Bissell

Summary

Goodfire AI is pioneering the use of mechanistic interpretability to address fundamental flaws in the AI development lifecycle. By building bi-directional interfaces between humans and models, the team aims to make AI behavior both understandable and directly controllable.

What Happened

Goodfire AI, led by Mark Bissell and Myra Deng, is at the forefront of making mechanistic interpretability a practical tool in AI development. The company recently raised a $150 million Series B at a $1.25 billion valuation, underscoring its growing role in the industry. Goodfire treats interpretability not just as a post-training diagnostic but as a foundational part of model development: the team builds lightweight probes and token-level safety filters that run with near-zero latency, enabling real-time monitoring and adjustment of trillion-parameter models like Kimi K2.
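
To make the probe idea concrete, here is a minimal sketch of a token-level linear probe that reads a host model's hidden states. The hidden size, layer choice, and label set are assumptions for illustration, not Goodfire's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, chosen for illustration only.
HIDDEN_SIZE = 4096   # hidden dimension of the host model
NUM_LABELS = 2       # e.g. {safe, unsafe} per token

class TokenProbe(nn.Module):
    """A linear classifier read off intermediate activations.

    Because it is a single matrix multiply per token, it adds almost
    no latency on top of the forward pass the model already performs.
    """
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size), captured from an
        # intermediate layer of the host model, e.g. via a forward hook.
        return self.classifier(hidden_states)  # (batch, seq_len, num_labels)

# Score a batch of activations (random stand-ins for real captures).
probe = TokenProbe(HIDDEN_SIZE, NUM_LABELS)
activations = torch.randn(1, 16, HIDDEN_SIZE)
token_scores = probe(activations).softmax(dim=-1)
flagged = token_scores[..., 1] > 0.9  # boolean mask of high-risk tokens
```

Because the probe is just one linear layer evaluated on activations the model computes anyway, it can run on every token of every request without a separate guardrail model in the loop.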

Mark and Myra argue that the current AI lifecycle is broken: developers pour in training data with no reliable way to verify that models have actually learned the intended behaviors. Goodfire's answer is a bi-directional interface between humans and models, one that supports precise, surgical edits to remove unwanted behaviors and biases. This contrasts with traditional black-box methods, yielding models that are more transparent and controllable.
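
One common mechanism for this kind of surgical edit in the interpretability literature is projecting a learned feature direction out of a layer's activations. The sketch below illustrates that general technique with a PyTorch forward hook; the feature direction here is random, and this is an assumed illustration rather than Goodfire's exact method.

```python
import torch

def make_ablation_hook(direction: torch.Tensor):
    """Build a forward hook that removes one feature direction.

    In practice the direction would come from a trained probe or a
    sparse autoencoder feature, not random initialization.
    """
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_size) residual-stream activations.
        coeff = output @ unit                      # projection strength per token
        return output - coeff.unsqueeze(-1) * unit # activations minus the feature
    return hook

# Standalone check with random data (no model needed):
feature_dir = torch.randn(4096)
acts = torch.randn(2, 8, 4096)
edited = make_ablation_hook(feature_dir)(None, None, acts)
# Projection onto the ablated direction is now ~0 for every token.
print((edited @ (feature_dir / feature_dir.norm())).abs().max())

# Usage sketch on a real model (layer index is hypothetical):
# handle = model.layers[20].register_forward_hook(make_ablation_hook(feature_dir))
# ... generate with the behavior suppressed ...
# handle.remove()
```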

One of the most notable deployments of Goodfire's technology is with Rakuten, where its interpretability tools perform real-time PII detection across multiple languages without ever being trained on actual customer data, a practical demonstration of interpretability in a high-stakes production environment.
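
As a rough illustration of how token-level probe scores could drive redaction at inference time, consider the sketch below. The tokenization granularity, scores, and threshold are all hypothetical.

```python
# Map per-token PII scores (e.g. from a probe like TokenProbe above)
# to a redacted output string. Threshold is an illustrative choice.
def redact(tokens: list[str], pii_scores: list[float],
           threshold: float = 0.8) -> str:
    return " ".join(
        "[PII]" if score > threshold else tok
        for tok, score in zip(tokens, pii_scores)
    )

tokens = ["Contact", "me", "at", "555-0199", "after", "5pm"]
scores = [0.01, 0.02, 0.03, 0.97, 0.02, 0.01]  # hypothetical probe outputs
print(redact(tokens, scores))  # -> Contact me at [PII] after 5pm
```

Because a probe classifies in activation space rather than over raw text, it can plausibly be trained on synthetic examples and still generalize across languages, which is consistent with the claim that no actual customer data is needed for training.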

The episode also covers the operational economics of interpretability: because probes read the model's internal activations directly, classifying a token costs a tiny fraction of an extra LLM call. This lets Goodfire's approach replace far more expensive LLM-based oversight, sharply reducing the compute that other guardrail methods typically require.
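
A back-of-envelope comparison shows why. A dense transformer forward pass costs roughly 2 × n_params FLOPs per token, while a linear probe costs roughly 2 × hidden_size FLOPs per token; the model sizes below are hypothetical.

```python
# All numbers are illustrative assumptions, not figures from the episode.
judge_params = 7e9   # a separate 7B-parameter guardrail/judge model
hidden_size = 4096   # probe input dimension on the host model

llm_judge_flops_per_token = 2 * judge_params
probe_flops_per_token = 2 * hidden_size

print(f"LLM judge: {llm_judge_flops_per_token:.1e} FLOPs/token")
print(f"Probe:     {probe_flops_per_token:.1e} FLOPs/token")
print(f"Ratio:     {llm_judge_flops_per_token / probe_flops_per_token:,.0f}x")
# -> roughly a million-fold difference under these assumptions
```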

Goodfire's work extends beyond language models, as they explore applications in genomics and medical imaging. Their interpretability techniques are being used to debug AI models and extract valuable scientific insights, accelerating discoveries in fields like healthcare.

The conversation also touches on the philosophical and theoretical side of interpretability, referencing sci-fi author Ted Chiang's work to illustrate the idea of AI models that can analyze themselves. This connects to Goodfire's vision of intentional model design, in which experts directly impart goals and constraints rather than relying solely on what emerges from the training data.

Key Insights