Hi, I am Zihan Gu (顾子涵), a Ph.D. candidate at the Institute of Information Engineering, Chinese Academy of Sciences (IIE, CAS), working with Prof. Yue Hu and Hua Zhang. I received my Bachelor of Science degree in Mathematics and Applied Mathematics from Fudan University. My current research interests include interpretable AI, pre-training and post-training of multimodal models, and diffusion LLMs.
My research centers on the development of algorithms and theoretical foundations for interpretable attribution. I investigate hidden-layer representations throughout training and inference from a symbolic and structural perspective, and apply these insights to pre-training, post-training, and continual learning.
My primary objective is to enhance model capabilities through principled theoretical frameworks. This pursuit follows two complementary directions. First, I study the physics of AI: what classes of models give rise to what capabilities under which data regimes. This line of inquiry aims to provide general laws that inform training strategies. Second, I analyze the model’s decision-making pathways, including its dominant computational loops and the dependency structure between inputs and outputs. This perspective leads to effective forms of regularization grounded in interpretability.
I view many black-box behaviors—particularly the emergence and generalization of higher-order capabilities—as phenomena that are currently entangled but, in principle, decouplable. Achieving such decoupling is a key step toward the next generation of AI systems.
Beyond machine learning, I have a strong background in linguistics and classical Chinese poetry. A selection of my own poetry is available at https://huaiqi.site.
I am open to research collaborations. If these directions resonate with your interests, or if you would like to exchange ideas, please feel free to get in touch.
🔥 News
- 2026.04: 🎉🎉 Two papers have been accepted to ACL 2026.
- 2026.02: 🎉🎉 One paper has been accepted to CVPR 2026.
- 2026.01: 🎉🎉 One paper has been accepted to ICLR 2026.
📝 Publications

PhaseWin Search Framework Enable Efficient Object-Level Interpretation
Zihan Gu, Ruoyu Chen, Junchi Zhang, Yue Hu, Hua Zhang, Xiaochun Cao
By conjecturing the decision function of visual models, we propose a near-first-order black-box attribution algorithm and validate it on attribution tasks for object detection and visual grounding.

Deconstructing Positional Information: From Attention Logits to Training Biases
Zihan Gu, Ruoyu Chen, Han Zhang, Hua Zhang, Yue Hu
Starting from the expression for how position encoding is applied to attention logits, we conjecture an inherent characteristic of RoPE during the training phase, the deposit-pattern, and design experiments to verify it.

Diagnosing Hidden Instabilities in Model Editing via Uncertainty Quantification
Zihan Gu, Tianyi Zhang, Xinyan Zhang, Zhiyuan Wang, Han Zhang, Yuhao Wei, Jiacheng Lu, Tianyi Ma, Xingsheng Zhang, Hua Zhang, Yue Hu
We analyze single-edit stability in locate-then-edit models, show inherent geometric interference in least-squares updates, and introduce an uncertainty-based metric that exposes hidden instabilities beyond standard evaluations.

Neo-Classic: A Benchmark for Evaluating Linguistic-Aesthetic Reasoning in Classical Chinese Poetry
Han Zhang, Zihan Gu (Equal Contribution), Zhiyuan Wang, Tianyi Ma, Jiacheng Lu, Xinyan Zhang, Yuhao Wei, Cheng Hua
Neo-Classic, a contamination-free dataset of 1406 modern classical Chinese poems, reveals that LLMs rely heavily on memorization: they suffer a 20-50% performance drop and struggle with aesthetic reasoning and global planning compared with human experts.