
Ph.D. Candidate

Department of Statistics
University of California, Los Angeles
Advisors: Prof. Ying Nian Wu and Prof. Song-Chun Zhu

Email: xudehong1996@ucla.edu

Google Scholar | LinkedIn | GitHub

Bio

I am a final-year Ph.D. student in the Department of Statistics at UCLA, advised by Prof. Ying Nian Wu and Prof. Song-Chun Zhu. I was a member of the Center for Vision, Cognition, Learning, and Autonomy (VCLA). Previously, I conducted research with the Amazon Rufus and Amazon AGI teams.

My research explores the intersection of language modeling, representation learning, and decision-making.

🌟 Actively seeking full-time Research Scientist positions starting in 2025. 🌟

 

News

 

Selected Publications

* denotes equal contribution.

A Minimalistic Representation Model for Head Direction System
NeurIPS Workshop on Symmetry and Geometry in Neural Representations
We present a model for the head direction (HD) system that captures essential HD cell properties through a high-dimensional U(1) representation. This model reveals Gaussian-like tuning and 2D circular geometry, accurately supporting path integration in both fully connected and convolutional forms.

Latent Plan Transformer: Planning as Latent Variable Inference
NeurIPS 2024
Decision-making via sequence modeling can be viewed as return-conditioned autoregressive behavior cloning. Because such models are unaware of their own future behaviors, they were thought to be susceptible to drifting errors; Decision Transformer alleviates this by additionally predicting return-to-go labels. We propose an unsupervised alternative in which a latent variable is first inferred from a target return and then guides the policy throughout the episode, functioning as a plan. Our model discovers improved decisions from suboptimal trajectories.

Aligning Large Language Models via Fine-grained Supervision
ACL 2024
We propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used for training our fine-grained token-level Proximal Policy Optimization (PPO) model.

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
ICML 2023
In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require an inference network or assume a simple geometry of the posterior distribution. SPI's straightforward and intuitive inference procedure directly queries the response generation model, enabling accurate knowledge selection and the generation of faithful responses.

 

Experience

Applied Scientist Intern
Amazon Inc. - Search M5 Team, 2024.06 - 2024.09
Improving Instruction-following Capability of Multi-modal Embedding Models (In submission to CVPR 2025)
  • Developed a multi-modal, decoder-only framework for learning representations with instruction-following capabilities.
  • Designed and implemented a two-stage training approach: a pre-training phase for modality alignment, followed by instruction fine-tuning.
  • Achieved state-of-the-art (SoTA) performance on multi-modal information retrieval benchmarks.
Applied Scientist Intern
Amazon Inc. - Alexa AGI Team & Rufus Team, 2023.06 - 2023.10
Aligning Large Language Models via Fine-grained Supervision and Token-level RLHF (Paper published in ACL 2024)
  • Developed a fine-grained data collection method for reward training via minimal editing, which pinpoints the exact output segments that affect user choices.
  • Proposed token-level RLHF by training a token-level reward model with fine-grained supervision and incorporated it into PPO training.
  • Outperformed LLaMA2-chat-7B and achieved SoTA performance on the AlpacaFarm benchmark.

 

Professional Service

 

Teaching
