Mansi Sakarvadia

Computer Science Ph.D. Student

Hello! I am a third-year Department of Energy Computational Science Graduate Fellow and a Computer Science Ph.D. student at the University of Chicago, where I am co-advised by Ian Foster and Kyle Chard.

I develop machine learning interpretability methods. My research aims to systematically reverse-engineer neural networks in order to interpret their weights. For example, much of my work focuses on localizing sources of model failure within weight space and developing efficient methods to correct model behavior.

Prior to my Ph.D., I completed my Bachelor's degree in Computer Science and Mathematics, with a minor in Environmental Science, at the University of North Carolina at Chapel Hill.

news

Oct 18, 2024 Presented my poster “Mitigating Memorization in Language Models” at the University of Chicago Communication & Intelligence Symposium.
Oct 3, 2024 Excited to announce that our work on detoxifying LM outputs, “Mind Your Manners: Detoxifying Language Models via Attention Head Intervention”, was accepted to BlackboxNLP 2024 as an extended abstract.
Sep 1, 2024 Had a great time this summer at Lawrence Berkeley National Laboratory’s ML and Analytics group, developing methods to mitigate memorization in language models.
Jul 16, 2024 Presented my poster “Mitigating Memorization in Language Models” at the CSGF Program Review in Washington, DC.
Apr 5, 2024 Excited to announce that I passed my qualifier exams/Master’s defense! Check out a recording of my talk here.

selected publications

  1. Preprint
    Mitigating Memorization in Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 6 more authors
    2024
  2. Preprint
    SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques
    Arham Khan, Todd Nief, Nathaniel Hudson, and 6 more authors
    2024
  3. BlackboxNLP
    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, and 5 more authors
    2023
    Work accepted to BlackboxNLP 2023.