Mansi Sakarvadia

Computer Science Ph.D. Student

399 John Crerar Library

5730 S Ellis Ave

Chicago, IL 60637

Hello! I am a second-year Department of Energy Computational Science Graduate Fellow and a Computer Science Ph.D. student at the University of Chicago. I am a member of Globus Labs, where I am co-advised by Ian Foster and Kyle Chard. I completed my Bachelor's in Computer Science and Mathematics with a minor in Environmental Science at the University of North Carolina, Chapel Hill, and previously interned at Argonne National Laboratory.

Currently, I am very interested in Machine Learning Interpretability. My research aims to systematically reverse engineer neural networks to interpret their weights. Specifically, I love to investigate how neural networks are able to do things like:

  • Factual recall
  • Multi-hop and common sense reasoning
  • Question answering
  • Knowledge retrieval
  • Catastrophic forgetting
  • In-context learning
  • Anomalous behavior
  • (and much more!)

By understanding how neural networks implement the behaviors above in their weights, I hope to develop interventions to better align AI systems with human goals. Some examples of such interventions are:

  • Editing/correcting learned concepts/associations
  • Localizing/mitigating bias
  • Obscuring/unlearning sensitive information
  • De-parameterizing over-parameterized models
  • Patching ML vulnerabilities (e.g. backdoors)
  • Developing more efficient/targeted learning strategies
  • (and the list goes on!)

news

Apr 5, 2024 Excited to announce that I passed my qualifier exams/Master’s defense! Check out a recording of my talk here.
Feb 12, 2024 Served as a reviewer for ICDCS 2024.
Oct 27, 2023 Excited to announce that Attention Lens was accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS. Looking forward to presenting this work and learning about all the other exciting work at this venue!
Oct 8, 2023 Excited to announce that two of my papers were accepted to BlackboxNLP this year. Memory Injections was accepted as a full paper and Attention Lens was accepted as an extended abstract. Looking forward to presenting my work!
Sep 28, 2023 Became an acting member of the Diversity, Equity, and Inclusion (DEI) committee for the UChicago CS Department for the 2023-24 academic year.

selected publications

  1. BlackboxNLP
    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 5 more authors
    2023
    Work accepted to BlackboxNLP 2023.
  2. ATTRIB
    Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
    Mansi Sakarvadia, Arham Khan, Aswathy Ajith, and 5 more authors
    2023
    Accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS.