Mansi Sakarvadia

399 John Crerar Library

5730 S Ellis Ave

Chicago, IL 60637

Hello! I am a second-year Department of Energy Computational Science Graduate Fellow and a Computer Science Ph.D. student at the University of Chicago. I am a member of Globus Labs, where I am co-advised by Ian Foster and Kyle Chard. I completed my Bachelor's degree in Computer Science and Mathematics, with a minor in Environmental Science, at the University of North Carolina at Chapel Hill, and I previously interned at Argonne National Laboratory.

Currently, I am very interested in Machine Learning Interpretability. My research aims to systematically reverse engineer neural networks to interpret their weights. Specifically, I love to investigate how neural networks are able to do things like:

Factual recall
Multi-hop and common sense reasoning
Question answering
Knowledge retrieval
Catastrophic forgetting
In-context learning
Anomalous behavior
(and much more!)

By understanding how neural networks implement the behaviors above in their weights, I hope to develop interventions that better align AI systems with human goals. Some examples are:

Editing/correcting learned concepts/associations
Localizing/mitigating bias
Obscuring/unlearning sensitive information
De-parameterizing over-parameterized models
Patching ML vulnerabilities (e.g. backdoors)
Developing more efficient/targeted learning strategies
(and the list goes on!)

news

Apr 5, 2024	Excited to announce that I passed my qualifier exams/Master’s defense! Check out a recording of my talk here.
Feb 12, 2024	Served as a reviewer for ICDCS 2024.
Oct 27, 2023	Excited to announce that Attention Lens was accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS. Looking forward to presenting this work and learning about all the other exciting work at this venue!
Oct 8, 2023	Excited to announce that two of my papers were accepted to BlackboxNLP this year. Memory Injections was accepted as a full paper and Attention Lens was accepted as an extended abstract. Looking forward to presenting my work!
Sep 28, 2023	Became an acting member of the Diversity, Equity, and Inclusion (DEI) committee for the UChicago CS Department for the 2023-24 academic year.

selected publications

BlackboxNLP
Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 5 more authors

2023

Work accepted to BlackboxNLP 2023.


Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.
@article{sakarvadia2023memory,
  title = {Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models},
  author = {Sakarvadia, Mansi and Ajith, Aswathy and Khan, Arham and Grzenda, Daniel and Hudson, Nathaniel and Bauer, André and Chard, Kyle and Foster, Ian},
  year = {2023},
  note = {Work accepted to BlackboxNLP 2023.},
}
ATTRIB
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

Mansi Sakarvadia, Arham Khan, Aswathy Ajith, and 5 more authors

2023

Accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS.


Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Much recent work has attempted to decode the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks, including by reverse-engineering the role of linear layers. Yet little is known about the role of attention heads in producing the final token prediction. We propose the Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized and specific roles in language models.
@article{sakarvadia2023attention,
  title = {Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism},
  author = {Sakarvadia, Mansi and Khan, Arham and Ajith, Aswathy and Grzenda, Daniel and Hudson, Nathaniel and Bauer, André and Chard, Kyle and Foster, Ian},
  year = {2023},
  note = {Accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS.},
}