Mansi Sakarvadia

Computer Science Ph.D. Student


Hello! I am a third-year Department of Energy Computational Science Graduate Fellow and a Computer Science Ph.D. student at the University of Chicago, where I am co-advised by Ian Foster and Kyle Chard.

I develop machine learning interpretability methods. My research aims to systematically reverse engineer neural networks to interpret their weights. For example, much of my work focuses on localizing sources of model failure within weight-space and developing efficient methods to correct model behavior.

Prior to my Ph.D., I completed my Bachelor's in Computer Science and Mathematics, with a minor in Environmental Science, at the University of North Carolina at Chapel Hill.

news

Jan 15, 2025 I was interviewed on the Department of Energy Science in Parallel podcast about the recent Nobel Prizes in Physics and Chemistry and their implications for ML and the domain sciences.
Dec 6, 2024 I was honored to give a talk on my recent work, Mitigating Memorization in Language Models, at the UChicago/TTIC NLP seminar!
Nov 21, 2024 Congrats to my summer student, Jordan Pettyjohn, for winning 1st place in the ACM Student Research Competition at Supercomputing 2024 for his work on detoxifying LM outputs at scale, “Mind Your Manners: Detoxifying Language Models via Attention Head Intervention”!
Oct 18, 2024 I presented my poster, Mitigating Memorization in Language Models, at the University of Chicago Communication & Intelligence Symposium.
Oct 3, 2024 Excited to announce that our work on detoxifying LM outputs, Mind Your Manners: Detoxifying Language Models via Attention Head Intervention, was accepted to BlackboxNLP 2024 as an extended abstract!

selected publications

  1. Preprint
    Mitigating Memorization In Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 6 more authors
    2024
  2. Preprint
    SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
    Arham Khan, Todd Nief, Nathaniel Hudson, and 6 more authors
    2024
  3. BlackboxNLP
    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 5 more authors
    2023
    Work accepted to BlackboxNLP 2023.