Jitesh Jain

OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Dec 2024 · Jitesh Jain, Zhengyuan Yang, Humphrey Shi, Jianfeng Gao, Jianwei Yang
Abstract
TBD
Type
Preprint
Publication
Under Review
Last updated on Dec 2024
Authors
Jitesh Jain
Ph.D. Student


© 2024 Jitesh Jain
