I am a second-year Ph.D. student in the School of Interactive Computing at Georgia Tech, advised by
Professor Humphrey Shi.
I completed my Bachelor's in Computer Science and Engineering in 2023 at IIT Roorkee.
In the past, I have interned at Microsoft Research, Redmond (Summer 2024, with
Dr. Jianwei Yang) and Picsart AI Research (Summer 2021-22, with Dr. Humphrey Shi).
My current research interests revolve around multimodal systems.
Presently, I am looking into developing Agent Models that leverage LLMs and principles from cognitive neuroscience.
I am also interested in representation learning, efficiency, and various real-world applications of multimodal systems.
My recent work focuses on analyzing and improving the visual perception abilities of Multimodal Large Language Models
[OLA-VLM, VCoder], building upon my experience developing models for dense prediction tasks
[OneFormer, SeMask].
Reach out if you are interested in my research or would like to discuss any ideas. If you are a self-motivated researcher
looking for guidance on one of your projects, feel free to drop me an email with a brief description of your
(manifested) research project.
I am seeking internship opportunities starting in Summer 2025. If you have any openings, please reach out to me!
Professional Life Happenings
- [December 2024]: Check out OLA-VLM, the result of my internship at Microsoft Research, Redmond!
- [May 2024]: Excited to start my summer internship at Microsoft Research, Redmond!
- [February 2024]: VCoder is accepted to CVPR 2024! See you in Seattle!
- [August 2023]: SeMask is accepted to the NIVT Workshop at ICCV 2023!
- [July 2023]: Graduated from IIT Roorkee with a Bachelor's in Computer Science and Engineering!
- [June 2023]: I will be joining Georgia Tech as a Ph.D. student in Computer Science in Fall 2023!
- [February 2023]: OneFormer is accepted to CVPR 2023!