Ph.D. Candidate


University of California, Los Angeles
Advisors: Prof. Ying Nian Wu and Prof. Song-Chun Zhu

Email: xudehong1996@ucla.edu

Google Scholar · LinkedIn · GitHub

Bio

I am a final-year Ph.D. student in Machine Learning at UCLA, advised by Prof. Ying Nian Wu and Prof. Song-Chun Zhu. I was a member of the Center for Vision, Cognition, Learning, and Autonomy (VCLA). Previously, I conducted research on the Amazon Rufus and Amazon AGI teams.

My research explores the intersections of language modeling, representation learning, and decision-making.

🌟 Actively seeking full-time Research Scientist/MLE position for 2025. 🌟

 

News

 

Selected Publications

* denotes equal contribution.

Scalable Language Models with Posterior Inference of Latent Thought Vectors
Preprint
We introduce Latent-Thought Language Models (LTMs), a novel language model family that incorporates explicit latent thought vectors. LTMs leverage dual-rate optimization, rapidly updating local latent vectors while gradually refining global decoder parameters. This approach unlocks new scaling dimensions, achieving superior efficiency, perplexity, and zero-shot performance over traditional models. They also exhibit emergent few-shot reasoning, highlighting their potential for advanced language tasks.
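The dual-rate idea can be sketched on a toy objective (this is my own illustration of the training loop, not the authors' code; the quadratic loss and learning rates are stand-ins for the actual language model loss):

```python
# Dual-rate optimization sketch: several fast gradient steps on a local
# latent vector z per iteration, then one slow step on a global decoder
# parameter w. The "decoder" here is simply w * z with a scalar target.

def loss(w, z, target=3.0):
    return (w * z - target) ** 2

def grad_z(w, z, target=3.0):
    return 2 * (w * z - target) * w

def grad_w(w, z, target=3.0):
    return 2 * (w * z - target) * z

def dual_rate_step(w, z, fast_lr=0.1, slow_lr=0.01, fast_steps=5):
    # fast inner loop: rapidly infer the per-sequence latent vector
    for _ in range(fast_steps):
        z -= fast_lr * grad_z(w, z)
    # slow outer step: gradually refine the shared decoder parameter
    w -= slow_lr * grad_w(w, z)
    return w, z

w, z = 1.0, 0.0
for _ in range(100):
    w, z = dual_rate_step(w, z)
# w * z approaches the target as both timescales converge
```

The separation of timescales is the point: the latent is re-inferred cheaply per sequence, while the decoder accumulates slow, global updates.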

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
ICLR 2025 [Oral Presentation (1.8%)]
This paper explores the conformal isometry hypothesis as a unifying explanation for the hexagonal firing patterns of grid cells. It posits that an animal’s 2D location is encoded as a high-dimensional neural vector lying on a 2D manifold, where local distances in physical space are preserved up to a scaling factor. We provide the first theoretical proof that conformal isometry leads to the emergence of grid cell hexagonality, and our numerical experiments confirm that such local distance preservation naturally produces the observed hexagonal grids.
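The distance-preservation property can be checked numerically on a standard hexagonal Fourier embedding (my own minimal illustration, not the paper's code): small displacements in any direction of physical space should map to embedding displacements of the same length, up to one shared scale factor.

```python
import math

# Embed a 2D position with cos/sin pairs over three unit wave vectors
# separated by 60 degrees (the hexagonal arrangement), then verify that the
# local scale factor is identical in every probed direction.

def embed(x, y, wavevectors):
    v = []
    for kx, ky in wavevectors:
        phase = kx * x + ky * y
        v += [math.cos(phase), math.sin(phase)]
    return v

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

ks = [(math.cos(a), math.sin(a)) for a in (0.0, math.pi / 3, 2 * math.pi / 3)]

eps = 1e-4
base = embed(0.3, 0.7, ks)
scales = []
for theta in (0.0, 0.5, 1.1, 2.0, 3.0):  # probe several directions
    dx, dy = eps * math.cos(theta), eps * math.sin(theta)
    moved = embed(0.3 + dx, 0.7 + dy, ks)
    scales.append(dist(base, moved) / eps)

# all directional scales agree: local distances preserved up to one factor
conformal = max(scales) - min(scales) < 1e-6
```

For this arrangement the common scale is sqrt(3/2), independent of direction, which is exactly the conformal isometry condition.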

Latent Plan Transformer: Planning as Latent Variable Inference
NeurIPS 2024
Decision-making via sequence modeling can be viewed as return-conditioned autoregressive behavior cloning. Unaware of their own future behaviors, such models were thought to be susceptible to drifting errors. Decision Transformer alleviates this issue by additionally predicting the return-to-go labels. We propose an unsupervised solution, where a latent variable is first inferred from a target return and then guides the policy throughout the episode, functioning as a plan. Our model discovers improved decisions from suboptimal trajectories.
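The plan-as-latent-variable idea can be sketched as follows (a toy of my own construction, not the authors' implementation): a latent plan z is inferred from a target return by gradient descent against a learned return predictor, then held fixed to condition the policy for the whole episode.

```python
# Toy latent-plan inference: r(z) stands in for the learned return model,
# and the policy is conditioned on the inferred plan z for every step.

def predicted_return(z):
    # stand-in return model: best achievable return is 10, at z = 2
    return -(z - 2.0) ** 2 + 10.0

def infer_plan(target_return, z=0.0, lr=0.05, steps=200):
    # gradient descent on the squared gap between predicted and target return
    for _ in range(steps):
        err = predicted_return(z) - target_return
        dr_dz = -2.0 * (z - 2.0)
        z -= lr * 2.0 * err * dr_dz
    return z

def policy(state, z):
    # stand-in conditioned policy: the fixed plan shapes every action
    return state + z

z = infer_plan(target_return=10.0)
actions = [policy(s, z) for s in range(3)]
```

Because z is inferred once and then fixed, it functions as a plan rather than a per-step conditioning signal, which is what lets the model stitch improved behavior out of suboptimal trajectories.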

Aligning Large Language Models via Fine-grained Supervision
ACL 2024
We propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used for training our fine-grained token-level Proximal Policy Optimization (PPO) model.
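The minimal-edit supervision can be illustrated with a simple diff (the function name and the diff heuristic are my own simplification, not the paper's exact procedure): comparing a response with its minimally edited, preferred version pinpoints which tokens carry the negative preference signal.

```python
from difflib import SequenceMatcher

# Assign token-level labels by diffing the rejected response against its
# minimally edited (preferred) version: unchanged tokens carry no signal (0),
# tokens the annotator removed or replaced get the negative label (-1).

def token_level_labels(rejected_tokens, edited_tokens):
    labels = [0] * len(rejected_tokens)
    sm = SequenceMatcher(a=rejected_tokens, b=edited_tokens)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op != "equal":
            for i in range(i1, i2):
                labels[i] = -1
    return labels

rejected = "the answer is probably maybe 42".split()
edited = "the answer is 42".split()
labels = token_level_labels(rejected, edited)  # -1 only on the hedging words
```

A token-level reward model trained on such labels can then supply dense, per-token rewards to PPO instead of a single sequence-level score.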

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
ICML 2023
In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution. This straightforward and intuitive inference procedure of SPI directly queries the response generation model, allowing for accurate knowledge selection and generation of faithful responses.

 

Experience

Applied Scientist Intern
Amazon Inc. - Search M5 Team, 2024.06 - 2024.09
Improving Instruction-following Capability of Multi-modal Embedding Models
(Preprint)
  • Developed a multi-modal, decoder-only framework for learning representations with instruction-following capabilities.
  • Designed and implemented a two-stage training approach: a pre-training phase for modality alignment, followed by instruction fine-tuning.
  • Our method achieved SoTA performance on multi-modal information retrieval benchmarks.
Applied Scientist Intern
Amazon Inc. - Alexa AGI Team & Rufus Team, 2023.06 - 2023.10
Aligning Large Language Models via Fine-grained Supervision and Token-level RLHF
(Paper published in ACL 2024)
  • Developed a fine-grained data collection method for reward training via minimal editing, which pinpoints the exact output segments that affect user choices.
  • Proposed token-level RLHF by training a token-level reward model with fine-grained supervision and incorporated it into PPO training.
  • Our method outperformed LLaMA2-chat-7B and achieved the SoTA performance on AlpacaFarm.

 

Professional Service

 

Teaching
