We present a model of the head direction (HD) system that captures essential HD cell properties through a high-dimensional U(1) representation. The learned representation exhibits Gaussian-like tuning and a 2D circular geometry, and it accurately supports path integration in both the fully connected and convolutional forms of the model.
Cite A minimalistic representation model for head direction system
@article{zhao2024head,
title={A minimalistic representation model for head direction system},
author={Zhao, Minglu and Xu, Dehong and Kong, Deqian and Zhang, Wen-Hao and Wu, Ying Nian},
journal={NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations (NeurReps)},
year={2024}
}
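For intuition, here is a minimal illustrative sketch of the kind of U(1) population code described above; it is not the paper's learned model, and the cell count, tuning width, and function names are assumptions.

# Illustrative sketch only: a hand-crafted U(1) population code with
# Gaussian-like tuning on the circle, not the learned model from the paper.
import numpy as np

N = 100                                                    # number of model HD cells (assumed)
preferred = np.linspace(0, 2 * np.pi, N, endpoint=False)   # preferred directions on the circle
kappa = 10.0                                               # tuning sharpness (assumed)

def encode(theta):
    """High-dimensional representation of heading theta with Gaussian-like (von Mises) tuning."""
    return np.exp(kappa * (np.cos(theta - preferred) - 1.0))

def path_integrate(theta, angular_velocity, dt=0.1):
    """Toy path integration: advancing the heading by w*dt shifts the population bump."""
    return (theta + angular_velocity * dt) % (2 * np.pi)

theta = 0.3
v = encode(theta)
v_next = encode(path_integrate(theta, angular_velocity=2.0))
print(np.argmax(v), np.argmax(v_next))                     # the activity bump shifts along the ring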
Decision-making via sequence modeling can be viewed as return-conditioned autoregressive behavior cloning. Because such models are unaware of their own future behaviors, they are thought to be susceptible to drifting errors. Decision Transformer alleviates this issue by additionally predicting return-to-go labels. We propose an unsupervised alternative, in which a latent variable is first inferred from a target return and then guides the policy throughout the episode, functioning as a plan. Our model discovers improved decisions from suboptimal trajectories.
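A minimal sketch of the plan-as-latent-variable idea, under assumed shapes, names, and architecture (this is not the released implementation): a latent z is inferred from a target return and then conditions the per-step policy.

# Minimal sketch with assumed architecture and dimensions, not the paper's code.
import torch
import torch.nn as nn

class LatentPlanPolicy(nn.Module):
    def __init__(self, state_dim=4, action_dim=2, z_dim=16):
        super().__init__()
        self.return_head = nn.Linear(z_dim, 1)                # predicts return from the latent plan z
        self.policy = nn.Sequential(                          # pi(a | s, z)
            nn.Linear(state_dim + z_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim))

    def infer_plan(self, target_return, steps=100, lr=0.1):
        """Infer z by pushing the predicted return toward the target (MAP-style
        inference with a standard-normal prior); z then guides the whole episode."""
        z = torch.zeros(1, self.return_head.in_features, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            loss = (self.return_head(z) - target_return).pow(2).mean() + 1e-3 * z.pow(2).sum()
            opt.zero_grad(); loss.backward(); opt.step()
        return z.detach()

    def act(self, state, z):
        return self.policy(torch.cat([state, z], dim=-1))

model = LatentPlanPolicy()
z = model.infer_plan(target_return=torch.tensor([[1.0]]))     # plan once per episode
action_logits = model.act(torch.zeros(1, 4), z)               # condition every step on z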
We propose a method to enhance LLM alignment through fine-grained, token-level supervision. Specifically, we ask annotators to minimally edit the less preferred responses in a standard reward-modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used to train our fine-grained, token-level Proximal Policy Optimization (PPO) model.
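A sketch of how minimal edits can yield token-level supervision, using Python's difflib; the data format and function names are assumptions, not the paper's code. The resulting per-token labels are the kind of signal a token-level reward model can be trained on before PPO.

# Illustrative sketch: derive token-level preference labels from a minimally
# edited response pair by marking the tokens the annotator changed.
from difflib import SequenceMatcher

def token_level_labels(original_tokens, edited_tokens):
    """Return one label per edited token: 1.0 where the annotator changed the
    text (insert/replace), 0.0 where the original content was kept."""
    labels = [0.0] * len(edited_tokens)
    matcher = SequenceMatcher(a=original_tokens, b=edited_tokens)
    for op, _, _, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "insert"):
            for j in range(j1, j2):
                labels[j] = 1.0
    return labels

original = "the movie was fine overall".split()
edited = "the movie was excellent overall".split()
print(list(zip(edited, token_level_labels(original, edited))))
# [('the', 0.0), ('movie', 0.0), ('was', 0.0), ('excellent', 1.0), ('overall', 0.0)]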
In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI requires neither a separate inference network nor a simple assumed geometry for the posterior distribution. SPI's straightforward and intuitive inference procedure directly queries the response generation model, allowing for accurate knowledge selection and the generation of faithful responses.
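A minimal sketch of posterior knowledge selection by querying the generator, under assumptions: p(k | context, response) ∝ p(response | context, k) p(k), with the likelihood scored by the response generation model. The function response_loglik below is a hypothetical placeholder, not an API from the paper.

# Illustrative sketch: approximate posterior sampling of the knowledge snippet
# by re-weighting a prior with the generator's response likelihood.
import math, random

def response_loglik(context, knowledge, response):
    # Placeholder: in practice, the generation model's log-likelihood of
    # `response` given `context` and the candidate `knowledge`.
    return -float(len(set(response.split()) - set((context + " " + knowledge).split())))

def sample_knowledge(context, response, candidates, prior=None):
    prior = prior or [1.0 / len(candidates)] * len(candidates)
    scores = [response_loglik(context, k, response) + math.log(p)
              for k, p in zip(candidates, prior)]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]          # softmax-normalized posterior
    probs = [w / sum(weights) for w in weights]
    return random.choices(candidates, weights=probs, k=1)[0], probs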
Cite Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference
@article{kong2024latent,
title={Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference},
author={Kong, Deqian and Xu, Dehong and Zhao, Minglu and Pang, Bo and Xie, Jianwen and Lizarraga, Andrew and Huang, Yuhao and Xie, Sirui and Wu, Ying Nian},
journal={Advances in Neural Information Processing Systems},
year={2024}
}
Cite Aligning Large Language Models via Fine-grained Supervision
@article{xu2024aligning,
title={Aligning Large Language Models via Fine-grained Supervision},
author={Xu, Dehong and Qiu, Liang and Kim, Minseok and Ladhak, Faisal and Do, Jaeyoung},
journal={arXiv preprint arXiv:2406.02756},
year={2024}
}
Cite Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
@inproceedings{xu2023diverse,
title={Diverse and faithful knowledge-grounded dialogue generation via sequential posterior inference},
author={Xu, Yan and Kong, Deqian and Xu, Dehong and Ji, Ziwei and Pang, Bo and Fung, Pascale and Wu, Ying Nian},
booktitle={International Conference on Machine Learning},
pages={38518--38534},
year={2023},
organization={PMLR}
}
Improving Instruction-following Capability of Multi-modal Embedding Models (in submission to CVPR 2025)
Developed a multi-modal, decoder-only framework for learning representations with instruction-following capabilities.
Designed and implemented a two-stage training approach: a pre-training phase for modality alignment, followed by instruction fine-tuning (a toy sketch of the alignment stage is given below).
Our method achieved SoTA performance on multi-modal information retrieval benchmarks.
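A toy sketch of the modality-alignment idea, with assumed shapes, model sizes, and names; this is not the submitted system's code. A decoder-only backbone reads (instruction + content) tokens and uses the final token's hidden state as the embedding, and an InfoNCE loss aligns paired modalities during pre-training.

# Illustrative sketch only: decoder-only embedder with last-token pooling and
# an InfoNCE loss for the modality-alignment pre-training stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderOnlyEmbedder(nn.Module):
    def __init__(self, vocab=1000, dim=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)    # made causal via the mask below
        self.out = nn.Linear(dim, dim)

    def forward(self, token_ids):
        T = token_ids.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.backbone(self.embed(token_ids), mask=causal)
        return F.normalize(self.out(h[:, -1]), dim=-1)          # last-token embedding

def info_nce(text_emb, image_emb, temperature=0.07):
    """Contrastive loss aligning paired text/image embeddings within a batch."""
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(text_emb.size(0))
    return F.cross_entropy(logits, targets)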
Applied Scientist Intern, Amazon Inc. - Alexa AGI Team & Rufus Team, 2023.06 - 2023.10
Aligning Large Language Models via Fine-grained Supervision and Token-level RLHF (paper published in ACL 2024)
Developed a fine-grained data collection method for reward training via minimal editing, which pinpoints the exact output segments that affect user choices.
Proposed token-level RLHF by training a token-level reward model with fine-grained supervision and incorporating it into PPO training.
Our method outperformed LLaMA2-chat-7B and achieved SoTA performance on AlpacaFarm.
Professional Service
Conference Reviewer: NeurIPS, ICLR, ICML, IJCAI, AISTATS, ACM MM
Journal Reviewer: TMLR, IEEE TNNLS, IEEE TIP, Stat
Teaching
STATS 100A Introduction to Probability
STATS 102C Introduction to Monte Carlo Methods
STATS 202C Monte Carlo Methods for Optimization
STATS 231A Pattern Recognition and Machine Learning