Research

The Language, Cognition, and Computation (LCC) Lab investigates research questions at the intersection of Psychology and AI. We use tools like large language models (LLMs) as “model organisms” to study human cognition; in turn, we apply theories and methods from Psychology to better understand how LLMs work under the hood.

A (non-exhaustive) sample of ongoing research programs is listed below.

Research areas: Mental state reasoning · Lexical ambiguity · Epistemology · Embodiment · Calibration and trust

Mental state reasoning in humans and language models

What can we learn by comparing the behavior of humans and language models on mental state reasoning tasks?

Humans regularly reason about the belief states of others. But how do we acquire this capacity? One hypothesis is that it emerges in part from exposure to language. We use LLMs trained on the distributional statistics of language to ask whether, when, and why sensitivity to belief states reliably emerges from this training protocol. We also investigate the developmental trajectory and mechanistic underpinnings of this behavior in LLMs to better understand the phenomenon in both LLMs and humans.
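As a rough illustration of this style of comparison (not the lab's actual protocol), the sketch below scores two candidate completions of a classic false-belief vignette under an off-the-shelf causal language model. The model name, vignette, and completions are illustrative assumptions.

```python
# Minimal sketch: does a language model prefer the belief-consistent completion
# of a false-belief vignette? Model, vignette, and completions are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM could be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

context = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box. "
    "Sally comes back and looks for her ball in the"
)

def completion_logprob(context: str, completion: str) -> float:
    """Sum of token log-probabilities for `completion` given `context`.
    Assumes the context's tokens form a prefix of the full string's tokens."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the completion tokens, each predicted from the preceding position.
    for pos in range(ctx_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# A belief-sensitive model should prefer the location Sally *believes* the ball is in.
print("basket:", completion_logprob(context, " basket"))
print("box:   ", completion_logprob(context, " box"))
```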

Selected papers:

  • Trott, S., Jones, C., Chang, T., Michaelov, J., & Bergen, B. (2023). Do Large Language Models know what humans know? Cognitive Science. [Link]
  • Jones, C. R., Trott, S., & Bergen, B. (2024). Comparing Humans and Large Language Models on EPITOME. Transactions of the Association for Computational Linguistics, 12, 803-819. [Link]

Representation and processing of lexical ambiguity

How do humans and language models represent the multiple meanings of ambiguous words?

Most words are ambiguous, i.e., they mean different things in different contexts. Ambiguity poses a challenge for comprehension, but it also presents an opportunity for research: it provides a window into the representations and mechanisms underlying the contextualization of meaning. In humans, our work focuses on the format of meaning representations (e.g., are they discrete or continuous?). In language models, we focus on the internal mechanisms responsible for disambiguation (e.g., specific attention heads).
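For a concrete sense of what comparing contextualized representations of an ambiguous word can look like, here is a minimal sketch assuming a BERT-style encoder; it compares embeddings of "bank" across same-sense and cross-sense sentence pairs. The model and sentences are illustrative assumptions, not materials from the papers below.

```python
# Minimal sketch: contextualized embeddings of an ambiguous word should be more
# similar across same-sense contexts than across cross-sense contexts.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any encoder would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Mean of final-layer hidden states over the target word's subword tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    word_ids = tokenizer(word, add_special_tokens=False).input_ids
    tokens = enc.input_ids[0].tolist()
    # Find the target word's subword span in the encoded sentence.
    for i in range(len(tokens) - len(word_ids) + 1):
        if tokens[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"'{word}' not found in: {sentence}")

cos = torch.nn.functional.cosine_similarity

same_sense = cos(embed_word("She deposited cash at the bank.", "bank"),
                 embed_word("The bank approved her loan.", "bank"), dim=0)
cross_sense = cos(embed_word("She deposited cash at the bank.", "bank"),
                  embed_word("They fished along the river bank.", "bank"), dim=0)
print(f"same-sense: {same_sense.item():.3f}  cross-sense: {cross_sense.item():.3f}")
```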

Selected papers:

  • Rivière, P., & Trott, S. (2026). Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity. Transactions of the Association for Computational Linguistics. [Link] [Code and data]
  • Rivière, P. D., Beatty-Martínez, A. L., & Trott, S. (2025, April). Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (pp. 8322-8338). [Link] [Code] [HuggingFace dataset]
  • Trott, S., & Bergen, B. (2023). Word meaning is both categorical and continuous. Psychological Review, 130(5), 1239. [Link] [Code and data]
  • Trott, S., & Bergen, B. (2021, August). RAW-C: Relatedness of Ambiguous Words in Context (A New Lexical Resource for English). In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 7077-7087). [Link] [Dataset and code] [HuggingFace dataset]

Epistemological foundations of “LLM-ology”

Can we develop a systematic, generalizable science of LLMs?

The use of LLMs in Cognitive Science research is relatively novel, as is the scientific study of LLMs themselves (“LLM-ology”). This raises a number of epistemological challenges:

  • Challenges of measurement: does the same test “mean the same thing” when applied to humans and LLMs, or does it exhibit differential construct validity? What is the functional scope of a particular mechanism?
  • Challenges of generalizability: what does research on a particular LLM tell us about LLMs more broadly? And in what sense are LLMs “model organisms” for research on human cognition?
  • Challenges of ontology: what “kind of thing” is an LLM in the first place and what conceptual frameworks are most useful for understanding these systems?

Selected papers:

  • Trott, S. (2025). Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research. In Mechanistic Interpretability Workshop at NeurIPS 2025. [Link] [Code and data]
  • Trott, S. (2024). Large language models and the wisdom of small crowds. Open Mind, 8, 723-738. [Link] [Code and data]
  • Trott, S., Jones, C., Chang, T., Michaelov, J., & Bergen, B. (2023). Do Large Language Models know what humans know? Cognitive Science. [Link]

Embodiment and conceptual knowledge

How is conceptual knowledge shaped by sensorimotor and linguistic experience?

Humans have bodies and move through space. What role do these perceptual and motor experiences play in shaping our conceptual representations? And how do these experiences interact with our knowledge of the symbolic, categorical nature of human language? We use neural networks trained on various input modalities (e.g., vision models, language models, and vision-language models) to ask which best accounts for human conceptual knowledge.
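As one hedged example of the kind of model-to-human comparison involved, the sketch below correlates a model's pairwise concept similarities with human similarity ratings, in the spirit of representational similarity analysis. The model, concept list, and human ratings here are all placeholder assumptions.

```python
# Minimal sketch: correlate a model's pairwise concept similarities with
# (made-up) human similarity ratings. All inputs are placeholders.
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any encoder yielding concept vectors
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

concepts = ["apple", "banana", "hammer", "wrench"]

def concept_vector(word: str) -> torch.Tensor:
    """Crude concept vector: mean over all token embeddings (incl. special tokens)."""
    enc = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).last_hidden_state[0].mean(dim=0)

vectors = {c: concept_vector(c) for c in concepts}

# Pairwise model similarities over all concept pairs.
pairs = [(a, b) for i, a in enumerate(concepts) for b in concepts[i + 1:]]
model_sims = [torch.nn.functional.cosine_similarity(
    vectors[a], vectors[b], dim=0).item() for a, b in pairs]

# Hypothetical human similarity ratings for the same six pairs (placeholder values).
human_sims = [0.9, 0.2, 0.1, 0.2, 0.1, 0.8]

rho, p = spearmanr(model_sims, human_sims)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```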

Selected papers:

  • Rivière, P. D., Parkinson-Coombs, O., Jones, C., & Trott, S. (2025, July). Does Language Stabilize Quantity Representations in Vision Transformers? In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 47, No. 47). [Link]
  • Jones, C. R., Bergen, B., & Trott, S. (2024). Do multimodal large language models and humans ground language similarly? Computational Linguistics, 50(4), 1415-1440. [Link]
  • Trott, S., & Bergen, B. (2022). Contextualized sensorimotor norms: Multi-dimensional measures of sensorimotor strength for ambiguous English words, in context. [Link]

Calibration and trust

What kinds of mental models do individuals have about the capabilities and limitations of AI tools, and how can we encourage appropriate calibration?

AI tools like LLMs display seemingly remarkable abilities, but also, in some cases, surprising and inexplicable failures. This makes it difficult to calibrate our expectations of what these systems can and can't do. We study potential use cases of LLMs (e.g., in scientific research), as well as the metaphors and mental models individuals form from observing the behavior of LLMs.

Selected papers:

  • Trott, S. (2024). Can large language models help augment English psycholinguistic datasets? Behavior Research Methods, 56(6), 6082-6100. [Link] [Code and data]