Hi, I am Zihan Gu (顾子涵), a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences (IIE, CAS), working with Prof. Yue Hu and Hua Zhang. I received my Bachelor of Science degree in Mathematics and Applied Mathematics from Fudan University. My current research interests include interpretable AI, pre-training and post-training of multimodal models, and diffusion LLMs.

My research centers on the development of algorithms and theoretical foundations for interpretable attribution. I investigate hidden-layer representations throughout training and inference from a symbolic and structural perspective, and apply these insights to pre-training, post-training, and continual learning.

My primary objective is to enhance model capabilities through principled theoretical frameworks. This pursuit follows two complementary directions. First, I study the physics of AI: what classes of models give rise to what capabilities under which data regimes. This line of inquiry aims to provide general laws that inform training strategies. Second, I analyze the model’s decision-making pathways, including its dominant computational loops and the dependency structure between inputs and outputs. This perspective leads to effective forms of regularization grounded in interpretability.

I view many black-box behaviors—particularly the emergence and generalization of higher-order capabilities—as phenomena that are currently entangled but, in principle, decouplable. Achieving such decoupling is a key step toward the next generation of AI systems.

Beyond machine learning, I have a strong background in linguistics and classical Chinese poetry. A selection of my own poetry is available at https://huaiqi.site.

I am open to research collaborations. If these directions resonate with your interests, or if you would like to exchange ideas, please feel free to get in touch.

🔥 News

2026.04: 🎉🎉 Two papers were accepted to ACL 2026.
2026.02: 🎉🎉 One paper was accepted to CVPR 2026.
2026.01: 🎉🎉 One paper was accepted to ICLR 2026.

📝 Publications

CVPR 2026

PhaseWin Search Framework Enable Efficient Object-Level Interpretation

Zihan Gu, Ruoyu Chen, Junchi Zhang, Yue Hu, Hua Zhang, Xiaochun Cao

🌐 Project 🐙 Code 📄 arXiv

By conjecturing the form of the decision function of visual models, we propose a near-first-order black-box attribution algorithm and validate it on attribution tasks for object detection and visual grounding.

ICLR 2026

Deconstructing Positional Information: From Attention Logits to Training Biases

Zihan Gu, Ruoyu Chen, Han Zhang, Hua Zhang, Yue Hu

🌐 Project 🐙 Code 📄 arXiv

Starting from the expression for how position encoding is applied to the attention logits, we conjecture an inherent characteristic of RoPE during training, the deposit pattern, and design experiments to verify it.

ACL 2026

Diagnosing Hidden Instabilities in Model Editing via Uncertainty Quantification

Zihan Gu, Tianyi Zhang, Xinyan Zhang, Zhiyuan Wang, Han Zhang, Yuhao Wei, Jiacheng Lu, Tianyi Ma, Xingsheng Zhang, Hua Zhang, Yue Hu

We analyze single-edit stability in locate-then-edit models, show inherent geometric interference in least-squares updates, and introduce an uncertainty-based metric that exposes hidden instabilities beyond standard evaluations.

ACL 2026

Neo-Classic: A Benchmark for Evaluating Linguistic-Aesthetic Reasoning in Classical Chinese Poetry

Han Zhang, Zihan Gu (Equal Contribution), Zhiyuan Wang, Tianyi Ma, Jiacheng Lu, Xinyan Zhang, Yuhao Wei, Cheng Hua

🌐 Project 🐙 Code

Neo-Classic, a contamination-free dataset of 1406 modern classical Chinese poems, reveals that LLMs rely heavily on memorization: they suffer a 20-50% performance drop and struggle with aesthetic reasoning and global planning compared to human experts.