Harshita Narnoli
Hi there! I'm currently a graduate student at the University of Arizona (Tucson), working with Professor Mihai Surdeanu on research involving large language models and machine learning. My research focuses on protecting text-to-image models from adversarial attacks, specifically the "Divide-and-Conquer Attack" (DACA). By using text summarization, our method significantly improves the detection and moderation of obfuscated prompts.
I earned my Master's degree in May 2024, and during this time, I also actively collaborated with Professor Joshua Levine at The Humans, Data, and Computers Lab.
Before this, I earned a Bachelor's degree in Computer Science and Engineering through MAKAUT in 2017. I began my career as a web developer, focusing on Java and the Spring MVC framework. During this time, I led a team on an NRI Fintech project targeting the Shanghai and Shenzhen stock exchanges. Afterward, I moved to a Senior Software Developer role at Alien Brains, where I worked with Ruby on Rails, managed frontend and backend teams, and served as a Scrum Master.
Publications
-
The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification
In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do not truly understand how ICL works. In this paper, we compare the behavior of in-context learning with supervised classifiers trained on ICL demonstrations to investigate three research questions: (1) Do LLMs with ICL behave similarly to classifiers trained on the same examples? (2) If so, which classifiers are closer, those based on gradient descent (GD) or those based on k-nearest neighbors (kNN)? (3) When they do not behave similarly, what conditions are associated with differences in behavior? Using text classification as a use case, with six datasets and three LLMs, we observe that LLMs behave similarly to these classifiers when the relevance of demonstrations is high. On average, ICL is closer to kNN than logistic regression, giving empirical evidence that the attention mechanism behaves more similarly to kNN than GD. However, when demonstration relevance is low, LLMs perform better than these classifiers, likely because LLMs can back off to their parametric memory, a luxury these classifiers do not have.
-
Text-to-image models are vulnerable to the stepwise “Divide-and-Conquer Attack” (DACA), which utilizes a large language model to obfuscate inappropriate content in prompts by wrapping sensitive text in a benign narrative. To mitigate stepwise DACA attacks, we propose a two-layer method involving text summarization followed by binary classification. We assembled the Adversarial Text-to-Image Prompt (ATTIP) dataset (N = 940), which contained DACA-obfuscated and non-obfuscated prompts. From the ATTIP dataset, we created two summarized versions: one generated by a small encoder model and the other by a large language model. Then, we used an encoder classifier and a GPT-4o classifier to perform content moderation on the summarized and unsummarized prompts. When compared with a classifier that operated over the unsummarized data, our method improved F1 score performance by 31%. Further, the highest recorded F1 score achieved (98%) was produced by the encoder classifier on a summarized ATTIP variant. This study indicates that pre-classification text summarization can inoculate content detection models against stepwise DACA obfuscations.
-
Activity-Aware Data Rate Tuning in Wireless Body Area Networks
This work proposes an Activity-Aware Data Rate Tuning (A2D) scheme for Wireless Body Area Networks (WBANs) that considers the criticality of the sensed physiological data. We consider different physical activities of the patients and then compute their health criticality. On the basis of the health criticality value, the data rate of the physiological sensors is tuned. Depending on the physical activity of a patient, the value sensed by the physiological sensors may change. For example, when a healthy person runs, a particular sensor value may be significantly high even though it is normal, whereas the same reading may be critical for a person who is sitting or standing. Thus, a WBAN is required to be activity-aware in order to measure the correct criticality values. We implemented the scheme on a real hardware platform to show its effectiveness. Experimental results show that the proposed scheme is capable of tuning the data rate of different physiological sensors, based on human activity and critical conditions, while ensuring a packet delivery ratio of more than 90% in intra-BAN communication and 93% in inter-BAN communication.
-
The aim of this work is to develop a graph-theoretical computer vision framework that partitions the shape of an image object into parts using a heuristic approach, such that the partitioning remains consistent with human perception. The proposed framework employs a special polygonal approximation scheme to represent a shape in a simpler graph form, where each polygonal side represents a graph edge. The shape-representative graph is explored to determine the vertex-visibility graph by a simple algorithm presented in this paper. We propose a heuristic-based iterative clique extraction strategy to decompose the shape-representative graph depending on its vertex-visibility graph. The framework is evaluated on the MPEG-7 shape data set, and according to our observations, its performance is comparable with existing schemes.
Academics
2024
-
August 2022 - May 2024
2017
Experiences
2022-2024
-
May 2023 - Jan 2024
• Worked on the generation of isosurfaces from Implicit Neural Representations (INRs).
• Collaborated with Josh Levine on research involving visualization and machine learning, focusing on the construction of Implicit Neural Representations (INRs) and data analysis utilizing latent embeddings.
-
College of Science, The University of Arizona, Tucson
Spring '23 - CSC 473 Automata, Grammars, and Languages (Instructor: Eric Anson)
Fall '22 - CSC 346 Cloud Computing (Instructor: Mark Fischer)
2020-2022
-
Jan 2020 - July 2022
Served as a backend developer using Ruby on Rails, while also taking on the role of Scrum Master. Assigned tasks to both frontend and backend teams in alignment with the design and the client's expectations.
2017-2019
-
Aug 2017 - Dec 2019
Engaged in projects utilizing Spring MVC and Java technologies, directly interfaced with clients, crafted models, and managed production support.
2016
-
Dec 2015 - Jan 2016
Contributed to the development of an interactive web page that displayed real-time data from various sensors on the human body using an Arduino.
Volunteering
2024
-
Fall 2023 - Present
In my role, I assist students by explaining ideas, guiding them through projects, and giving presentations in our meetings. I focus on making complex concepts easy to understand and on creating a supportive learning environment. I have also been assigned to assist with the development of a website and with coding problems in Python.
Academic Projects
2023
-
A PyQt5-based GUI that serves as an integrated tool for visualizing the Theory of Mind-based Cognitive Architecture for Teams (ToMCAT) dataset, part of a collaborative project aiming to create a comprehensive multimodal visualization tool. It accepts experimental data in CSV and PNG formats and facilitates the analysis of various data types (fNIRS, eye tracking, EEG), providing insights into cognitive processes during tasks and helping researchers understand team interactions across diverse environments.






