Read What Matters

Every week our team explores the most relevant research in GenAI. Check out our curated selection.

RAG is Dead, Context Engineering is King

👥 Jeff Huber of Chroma

📅 August 19, 2025

⏱️ 10 mins


Why It Matters

Many AI applications involve RAG in some form, but most teams use it inefficiently. Jeff Huber discusses how we can build smarter RAG in this Latent Space podcast.

Key Findings

✓ Context quality matters more than the amount of context,

✓ Hybrid retrieval systems are more effective than plain RAG (see the sketch after this list),

✓ Large context windows in LLMs may be less useful than we think.
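One common way to build the hybrid retrieval Huber describes is to run a lexical retriever (e.g. BM25) and a dense embedding retriever in parallel, then fuse their rankings. A minimal sketch using reciprocal rank fusion; the document IDs and retriever outputs below are hypothetical:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) per list it appears in; summing
    across lists rewards documents that both retrievers agree on.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top hits from a keyword retriever and an embedding retriever.
bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7'] - documents both retrievers rank highly rise to the top.
```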

Context Engineering

RAG

Retrieval Systems

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

👥 Jiejun Tan; Zhicheng Dou; Yan Yu; Jiehan Cheng; Qiang Ju; Jian Xie; Ji-Rong Wen

📅 August 11, 2025

⏱️ 25 mins

Abstract

Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web.

Why It Matters

Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both the local corpus and the Web.

Key Findings

✓ Coupling a local deep search agent with a web search agent under a planner agent is effective, especially for searching and reasoning (a minimal sketch follows this list),

✓ Using reasoning models for agentic tasks greatly increases token consumption, as expected.
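To make the hierarchical layout concrete, here is a minimal sketch of a planner loop that dispatches a query to a local-corpus agent and a web agent, then decides whether to stop. Every function here is a hypothetical stand-in, not HierSearch's actual code:

```python
# All agents below are hypothetical stand-ins for LLM-backed components.

def local_search_agent(query: str) -> list[str]:
    # Stand-in for deep search over a private enterprise corpus.
    return [f"[local] passage about {query!r}"]

def web_search_agent(query: str) -> list[str]:
    # Stand-in for deep search over the open Web.
    return [f"[web] page about {query!r}"]

def enough_evidence(evidence: list[str]) -> bool:
    # Stand-in for an LLM judgment call; here, a crude size threshold.
    return len(evidence) >= 4

def refine_query(query: str, evidence: list[str]) -> str:
    # Stand-in for the planner rewriting its follow-up question.
    return query + " (follow-up)"

def planner_agent(query: str, max_rounds: int = 3) -> str:
    evidence: list[str] = []
    for _ in range(max_rounds):
        evidence += local_search_agent(query)   # consult the local agent
        evidence += web_search_agent(query)     # consult the web agent
        if enough_evidence(evidence):
            break
        query = refine_query(query, evidence)
    # Stand-in for an LLM writing the final answer from gathered evidence.
    return f"Answer to {query!r} from {len(evidence)} pieces of evidence"

print(planner_agent("Which suppliers are mentioned in our 2024 contracts?"))
```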

Deep Research

Team of Agents

Enterprise

Agentic Enterprise: AI-Centric User to User-Centric AI

👥 Arpit Narechania; Alex Endert; Atanu R. Sinha

📅 June 28, 2025

⏱️ 35 mins

Abstract

After a very long winter, the Artificial Intelligence (AI) spring is here. Or, so it seems over the last three years. AI has the potential to impact many areas of human life - personal, social, health, education, professional.

Why It Matters

Current practices in the world of AI are AI-centric: the user must adapt to the model. This paper highlights six tenets to start the shift toward user-centric AI.

Key Findings

✓ User-centric AI is crucial for strategic decision-making, given that some areas are still inaccessible to LLMs,

✓ The paper underlines how the six proposed tenets, realized through agents, can help with this shift.

UI Design

Human-Computer Interaction

AI Era

Society

Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

👥 Yijia Shao; Humishka Zope; Yucheng Jiang; Jiaxin Pei; David Nguyen; Erik Brynjolfsson; Diyi Yang

📅 June 6, 2025

⏱️ 45 mins

Abstract

The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape.

Why It Matters

This is a first-of-its-kind large-scale audit of worker desires and AI agent capabilities across occupational tasks. It moves beyond a simple automation dichotomy, introducing the Human Agency Scale (HAS) to quantify preferred human involvement. The research offers actionable insights for prioritizing AI agent development that aligns with human needs, revealing critical mismatches between current investments and areas with high potential for productivity and societal gains.

Key Findings

✓ A novel auditing framework and database, built on worker preferences and AI expert assessments,

✓ Identification of four task zones (Green Light, Red Light, R&D Opportunity, Low Priority) to guide AI development,

✓ Revelation of a disconnect between worker desires for automation and current LLM usage patterns,

✓ Insights into how AI agent integration may shift core human skills from information processing to interpersonal competence.

AI Agents

Future

Human-Centred AI

Work Automation

Why Language Models Hallucinate

👥 Adam T. Kalai; Ofir Nachum; Santosh S. Vempala; Edwin Zhang

📅 September 4, 2025

⏱️ 20 mins

Abstract

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust.

Why It Matters

LLMs are useful for many real-world applications, but their probabilistic nature makes them unreliable and at times hard to interpret, with hallucinations being one such issue. OpenAI investigates the causes of hallucinations.

Key Findings

✓ One cause of hallucinations is how these models are benchmarked: answering "I don't know" is penalized relative to guessing (see the worked example after this list),

✓ During pre-training LLMs ingest vast amounts of text. Since the content is not labelled as "correct" or "incorrect" in this phase, LLMs do not learn to recognize false statements the way they learn to recognize, for example, spelling mistakes (which is something they do quite well).
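To make the benchmarking incentive concrete, here is a toy expected-score calculation under binary grading; the numbers and grading scheme are our illustration, not the paper's exact formulation:

```python
# Under right=1 / wrong=0 grading, abstaining scores 0, so guessing is
# always the score-maximizing strategy, even at low confidence.
def expected_score(p_correct: float, abstain: bool) -> float:
    return 0.0 if abstain else p_correct  # wrong answers cost nothing

for p in (0.1, 0.5):
    print(f"p(correct)={p}: guess={expected_score(p, False):.2f}, "
          f"say 'I don't know'={expected_score(p, True):.2f}")
# A grader that rewards abstention or penalizes wrong answers (e.g. wrong=-1)
# would flip this incentive and discourage confident guessing.
```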

Hallucinations

LLMs Training

Universal Deep Research: Bring Your Own Model and Strategy

👥 Peter Belcak; Pavlo Molchanov

📅 August 29, 2025

⏱️ 20 mins

Abstract

Deep research tools are among the most impactful and most commonly encountered agentic systems today. We observe, however, that each deep research agent introduced so far is hard-coded to carry out a particular research strategy using a fixed choice of tools.

Why It Matters

Universal Deep Research (UDR) is a flexible agentic system that overcomes the limitations of current deep research tools. Unlike existing tools with rigid strategies tied to specific language models, UDR allows users to create and customize research strategies without extra training. This enhances the efficiency and quality of research output, automating high-value workloads in industries like finance, legal, and healthcare.

Key Findings

✓ UDR proves that a flexible research tool can be built on any generative model, giving users agency by letting them "program" agentic behavior in natural language,

✓ The system separates control logic from model reasoning, which reduces GPU usage, latency, and cost,

✓ It improves reliability by converting natural language strategies into structured, executable code, ensuring coherent and interpretable results (sketched after this list).
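A minimal sketch of that idea as we read it: an LLM compiles a natural-language strategy into a structured plan, and ordinary host code (not the model) executes the control loop. The call_llm stub and the two tools are hypothetical stand-ins, not UDR's API:

```python
import json

TOOLS = {
    "search": lambda arg: [f"result for {arg!r}"],  # stand-in search tool
    "summarize": lambda docs: " | ".join(docs),     # stand-in summarizer
}

def call_llm(prompt: str) -> str:
    # Hypothetical model call, stubbed to emit a fixed two-step plan.
    return json.dumps([
        {"tool": "search", "arg": "enterprise deep search"},
        {"tool": "summarize", "arg": None},  # None = use the previous output
    ])

def compile_strategy(strategy_text: str) -> list[dict]:
    # In UDR the model turns the user's strategy into executable steps.
    return json.loads(call_llm(f"Convert to JSON steps: {strategy_text}"))

def run_plan(plan: list[dict]):
    # Deterministic control logic: the model planned, plain code executes.
    state = None
    for step in plan:
        arg = step["arg"] if step["arg"] is not None else state
        state = TOOLS[step["tool"]](arg)
    return state

plan = compile_strategy("search the topic, then summarize what you find")
print(run_plan(plan))
```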

Deep Research

Agentic Systems

LLMs

Automation

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

👥 Yuxian Gu; Qinghao Hu; Shang Yang; Haocheng Xi; Junyu Chen; Song Han; Han Cai

📅 September 8, 2025

⏱️ 35 mins

Abstract

We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation throughput.

Why It Matters

This paper introduces Jet-Nemotron, a new family of hybrid-architecture LMs combining high accuracy with exceptional efficiency for real-world applications. These models achieve state-of-the-art accuracy with substantially higher generation throughput. This efficiency gain significantly reduces operational costs and improves service responsiveness, making powerful LLMs more practical and accessible.

Key Findings

✓ The newly proposed attention mechanism outperforms prior ones in accuracy on tasks like math reasoning and retrieval while maintaining similar efficiency,

✓ KV cache size is a critical factor for long-context and long-generation throughput, and optimizing it can lead to significant efficiency gains (a back-of-the-envelope calculation follows this list).
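As a rough illustration of why cache size dominates long-context throughput, here is the standard KV-cache memory estimate; the model shape below is a made-up example, not Jet-Nemotron's actual configuration:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    # 2x for the separate K and V tensors cached at every layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical 32-layer model with grouped-query attention at fp16.
gib = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                     seq_len=128_000, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # ~15.6 GiB for a single 128k-token sequence
```

Shrinking kv_heads or replacing full-attention layers with linear-attention blocks, as hybrid architectures do, cuts this figure directly, which is why cache size is such a strong lever on throughput.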

LLMs

Neural Architecture Search

Model Efficiency

Hybrid Models

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

👥 Joel Becker; Nate Rush; Elizabeth Barnes; David Rein

📅 July 25, 2025

⏱️ 60 mins

Abstract

Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied. We conduct a randomized controlled trial (RCT) to understand how AI tools at the February-June 2025 frontier affect the productivity of experienced open-source developers.

Why It Matters

This paper challenges the assumption that AI tools enhance developer productivity. It provides real-world evidence that, contrary to common belief, AI tooling can actually slow down task completion time. The findings highlight a disconnect between perceived and actual AI utility, suggesting that we need a more nuanced understanding of AI's impact in practical settings beyond synthetic benchmarks.

Key Findings

✓ AI Tools Slowed Developers Down: Experienced developers using early-2025 AI tools took 19% longer on average to complete tasks, showing that the tooling hindered their performance,

✓ Overestimated Impact: Both developers and AI experts significantly overestimated the AI's helpfulness, incorrectly predicting it would speed them up by 24% and 39%, respectively,

✓ Reasons for Slowdown: The study suggests this was caused by over-optimism, high developer familiarity with the code, repository complexity, and low AI reliability.

AI Productivity

Software Development

Dev Tools

Agents

The Hidden Costs of AI: A Review of Energy, E-Waste, and Inequality in Model Development

👥 Jenis Winsta

📅 July 13, 2025

⏱️ 15 mins

Abstract

Artificial intelligence (AI) has made remarkable progress in recent years, yet its rapid expansion brings overlooked environmental and ethical challenges. This review explores four critical areas where AI's impact extends beyond performance: energy consumption, electronic waste (e-waste), inequality in compute access, and the hidden energy burden of cybersecurity systems.

Why It Matters

This review explores four critical areas where AI's impact extends beyond performance: energy consumption, electronic waste (e-waste), inequality in compute access, and the hidden energy burden of cybersecurity systems, highlighting systemic issues such as high emissions from model training, rising hardware turnover, global infrastructure disparities, and the energy demands of securing AI.

Key Findings

✓ Training large models today can emit hundreds of tons of CO2, while the hardware used accelerates e-waste generation,

✓ Access to the compute resources needed to build frontier models remains concentrated in a handful of institutions and nations, raising concerns about fairness and inclusion,

✓ Without meaningful reforms, the gap between AI’s creators and the communities affected by it will continue to widen.

Environment

Green AI

Sustainability

Future

Thought Anchors: Which LLM Reasoning Steps Matter?

👥 Paul C. Bogdan; Uzay Macar; Neel Nanda; Arthur Conmy

📅 August 5, 2025

⏱️ 45 mins

Abstract

Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges as each generated token depends on all previous ones, making the computation harder to decompose.

Why It Matters

Chain-of-thought (CoT) reasoning has improved LLM performance on various complex tasks. This paper analyzes how reasoning LLMs work at the sentence level.

Key Findings

✓ Some sentences in a reasoning trace carry more weight than others in shaping the response (a sketch of how to measure this follows this list),

✓ These “thought anchors” are critical reasoning steps that guide the rest of the reasoning trace.
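In the spirit of the paper's counterfactual analysis, one way to estimate a sentence's importance is to remove it, resample continuations, and count how often the final answer changes. This is a stubbed sketch of that idea, not the authors' code; generate_from stands in for a real LLM call:

```python
import random

def generate_from(prefix: list[str]) -> str:
    # Hypothetical stand-in for "sample a continuation of this partial
    # trace and return its final answer"; a real version queries an LLM.
    return random.choice(["42", "41"])

def anchor_score(trace: list[str], i: int, samples: int = 20) -> float:
    """Fraction of resamples whose answer flips when sentence i is removed."""
    baseline = generate_from(trace)         # answer with the full trace
    ablated = trace[:i] + trace[i + 1:]     # trace without sentence i
    flips = sum(generate_from(ablated) != baseline for _ in range(samples))
    return flips / samples

trace = ["Let x be the unknown.", "Then 2x = 84.", "So x = 42."]
print([anchor_score(trace, i) for i in range(len(trace))])
# With a real model, a high score marks a "thought anchor": removing that
# sentence frequently changes the final answer.
```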

Chain-of-Thought

Visual Representation

Education Tools