Tuo Zhao 赵拓

I am an Associate Professor of ISyE and CSE at Georgia Tech. I received my Ph.D. in Computer Science from Johns Hopkins University.

My research focuses on methodologies and theories of machine learning, especially deep learning. My recent efforts have been primarily dedicated to Large Language Models (LLMs), collaborating closely with Microsoft and Amazon.

I am working with talented alchemists in the FLASH (Foundations of LeArning Systems for alcHemy) group. If you are interested in joining my group, please see more information here.

Google Scholar | Twitter | 知乎 (Zhihu)

Recent News (Check all the past news)
  • Apr. 2025: Alexander Bukharin has successfully defended his Ph.D. Dissertation: Robust and Flexible Reward Modeling for LLM Alignment. He will join NVIDIA as a research scientist.
  • Apr. 2025: Qingru Zhang has successfully defended his Ph.D. Dissertation: On the Efficiency and Steerability of Self-Attention Mechanism of LLMs. He will join Microsoft as a research scientist.
  • Jun. 2024: Yan Li has successfully defended his Ph.D. Dissertation: Theories and Algorithms for Efficient and Robust Sequential Decision Making. He will join the Department of Industrial and Systems Engineering at Texas A&M University as a tenure-track assistant professor in Fall 2024.
  • Feb. 2024: Minshuo Chen has accepted a tenure-track assistant professor position in the Department of Industrial Engineering and Management Sciences at Northwestern University. He will start in Fall 2024.
  • Nov. 2023: Chen Liang has successfully defended her Ph.D. Dissertation: On Parameter Efficiency of Neural Language Models. She will join Microsoft as a senior research scientist.
  • Oct. 2023: Prof. Shihao Yang and I co-organized Georgia Statistics Day 2023.
  • Apr. 2023: Simiao Zuo has successfully defended his Ph.D. Dissertation: On Training, Inference and Sample Efficiency of Language Models. He will join Microsoft as a research scientist.
  • Mar. 2023: Qingru Zhang's recent collaborative work with Microsoft Azure AI on parameter-efficient fine-tuning is available on Hugging Face now. See more information here.
  • Oct. 2022: One Ph.D. position is available in my group. Prof. Yongsheng Chen in the School of Civil and Environmental Engineering and I are recruiting a Ph.D. student to work on the interface of computational chemistry and machine learning. See more information here. Please contact us if you are interested and have a background in molecular dynamics simulation.
  • Sep. 2022: One Ph.D. position is available in my group. Prof. Hua Wang at ETH Zurich and I are recruiting a Ph.D. student to work on the interface of modern circuit design and machine learning. See more information here. Please contact me if you are interested and have a background in electromagnetics, especially EM simulation.
  • Sep. 2022: Two Ph.D. positions are available in my group. See more information here. Please contact me if you are interested in deep learning theory or natural language processing.
  • Jul. 2022: Minshuo Chen has successfully defended his Ph.D. Dissertation: Representation and Statistical Properties of Deep Neural Networks for Structured Data. He will join Princeton University as a postdoctoral fellow.
  • Jul. 2022: Siawpeng Er has successfully defended his Ph.D. Dissertation: Deep Learning in Biomedical Informatics and Modern Circuit Design. He will join Home Depot as a Data Scientist.

  • Preprints and Working Papers (* indicates equal contributions, and ‡ indicates advisees)
    • Adversarial Training of Reward Models
      Alexander Bukharin‡, Haifeng Qian, Shengyang Sun, Adithya Renduchintala, Soumye Singhal, Zhilin Wang, Oleksii Kuchaiev, Olivier Delalleau and Tuo Zhao
      Preprint available on arXiv [Link]
    • IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
      Yixiao Li‡, Xianzhi Du, Ajay Jaiswal, Tao Lei, Tuo Zhao, Chong Wang and Jianyu Wang
      Preprint available on arXiv [Link]
    • LLMs can generate a better answer by aggregating their own responses
      Zichong Li‡, Xinyu Feng, Yuheng Cai‡, Zixuan Zhang‡, Tianyi Liu, Chen Liang, Weizhu Chen, Haoyu Wang and Tuo Zhao
      Preprint available on arXiv [Link]
    • A Minimalist Example of Edge-of-Stability and Progressive Sharpening
      Liming Liu‡, Zixuan Zhang‡, Simon Du and Tuo Zhao
      Preprint available on arXiv [Link]
    • COSMOS: A hybrid adaptive optimizer for memory-efficient training of LLMs
      Liming Liu‡, Zhenghao Xu‡, Zixuan Zhang‡, Hao Kang, Zichong Li‡, Chen Liang, Weizhu Chen and Tuo Zhao
      Preprint available on arXiv [Link]
    • RoseRAG: Robust retrieval-augmented generation with small-scale LLMs via margin-aware preference optimization
      Tianci Liu, Haoxiang Jiang, Tianze Wang, Ran Xu, Yue Yu, Linjun Zhang, Tuo Zhao and Haoyu Wang
      Preprint available on arXiv [Link]
    • Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
      Qingru Zhang‡, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth and Hao Cheng
      Preprint available on arXiv [Link]
    • GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs
      Hao Kang, Qingru Zhang‡, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna and Tuo Zhao
      Preprint available on arXiv [Link]
    • Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
      Yuqing Wang, Zhenghao Xu‡, Tuo Zhao and Molei Tao
      Preprint available on arXiv [Link]
    • Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
      Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao and Mengdi Wang
      Preprint available on arXiv [Link]
    • First-order Policy Optimization for Robust Markov Decision Process
      George Lan, Yan Li‡ and Tuo Zhao
      Preprint available on arXiv [Link]
    • DiP-GNN: Discriminative Pre-Training of Graph Neural Networks
      Simiao Zuo‡, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin and Tuo Zhao
      Preprint available on arXiv [Link]
    • Differentially Private Estimation of Hawkes Process
      Simiao Zuo‡, Tianyi Liu‡, Tuo Zhao and Hongyuan Zha
      Preprint available on arXiv [Link]
    • Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
      Yan Li‡, Caleb Ju, Ethan Fang and Tuo Zhao
      Preprint available on arXiv [Link]
    • Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
      Minshuo Chen‡*, Hao Liu*, Wenjing Liao and Tuo Zhao
      Preprint available on arXiv [Link]
    • Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation
      Minshuo Chen‡, Wenjing Liao, Hongyuan Zha and Tuo Zhao (Alphabetical order)
      Preprint available on arXiv [Link]

    Selected Publications (* indicates equal contributions, # indicates alphabetical order, and ‡ indicates advisees)
    • Deep Reinforcement Learning with Hierarchical Preference Design
      Alexander Bukharin‡, Yixiao Li‡, Pengcheng He, Weizhu Chen and Tuo Zhao
      International Conference on Machine Learning (ICML), 2025 [arXiv]
    • Discriminative Finetuning of Generative LLMs without Reward Models and Preference Data
      Siqi Guo, Ilgee Hong‡, Vicente Balmaseda, Tuo Zhao and Tianbao Yang
      International Conference on Machine Learning (ICML), 2025 [arXiv]
    • Robust Reinforcement Learning from Corrupted Human Feedback
      Alexander Bukharin‡, Ilgee Hong‡, Haoming Jiang, Zichong Li‡, Qingru Zhang‡, Zixuan Zhang‡ and Tuo Zhao#
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
    • Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
      Zixuan Zhang*‡, Kaiqi Zhang*, Minshuo Chen, Mengdi Wang, Tuo Zhao and Yuxiang Wang
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
    • Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
      Ilgee Hong‡*, Zichong Li‡*, Alexander Bukharin‡, Yixiao Li‡, Haoming Jiang, Tianbao Yang and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
    • Provable Acceleration of Nesterov's Accelerated Gradient for Asymmetric Matrix Factorization and Linear Neural Networks
      Zhenghao Xu‡, Yuqing Wang, Tuo Zhao, Rachel Ward and Molei Tao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
    • RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
      Haoyu Wang, Tianci Liu, Tuo Zhao and Jing Gao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
    • BlendFilter: Advancing Retrieval-Augmented LLMs via Query Generation Blending and Knowledge Filtering
      Haoyu Wang, Tuo Zhao and Jing Gao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
    • Data Diversity Matters for Robust Instruction Tuning
      Alexander Bukharin‡ and Tuo Zhao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
    • Efficient Long Sequence Modeling via State Space Augmented Transformer
      Simiao Zuo*‡, Xiaodong Liu*, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao and Jianfeng Gao
      Conference on Language Modeling (COLM), 2024 [arXiv]
    • Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
      Zhenghao Xu‡, Xiang Ji, Minshuo Chen‡, Mengdi Wang and Tuo Zhao
      Accepted by Journal of Machine Learning Research (JMLR), 2024+ [arXiv]
    • Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer
      Yunhai Han, Rahul Batra, Nathan Boyd, Tuo Zhao, Yu She, Seth Hutchinson and Ye Zhao
      Accepted by IEEE/ASME Transactions on Mechatronics (TMECH), 2024+ [arXiv]
      International Conference on Advanced Intelligent Mechatronics (AIM), 2023 (short version)
    • Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification
      Zichong Li‡, Qunzhi Xu, Zhenghao Xu‡, Yajun Mei, Tuo Zhao and Hongyuan Zha
      International Conference on Machine Learning (ICML), 2024 [arXiv]
    • To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
      Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang and Tianbao Yang
      International Conference on Machine Learning (ICML), 2024 [arXiv]
    • Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
      Qingru Zhang‡, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2024 [arXiv]
    • LoftQ: LoRA-Fine-Tuning-Aware Quantization for LLMs
      Yixiao Li‡, Yifan Yu‡, Chen Liang‡, Pengcheng He, Nikos Karampatziakis, Weizhu Chen and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2024 [arXiv]
    • Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces
      Hao Liu, Haizhao Yang, Minshuo Chen‡, Tuo Zhao and Wenjing Liao
      Accepted by Journal of Machine Learning Research (JMLR), 2024 [arXiv]
    • Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
      Qingru Zhang‡, Dhananjay Ram, Cole Hawkins, Sheng Zha and Tuo Zhao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 [arXiv]
    • HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference
      Haoyu Wang, Yaqing Wang, Tianci Liu, Tuo Zhao and Jing Gao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    • Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
      Alexander Bukharin‡, Yan Li‡, Yue Yu, Qingru Zhang‡, Zhehui Chen‡, Simiao Zuo‡, Chao Zhang, Songan Zhang and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
    • Module-wise Adaptive Distillation for Multimodality Foundation Models
      Chen Liang‡, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong and Tianyi Zhou
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
    • Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
      Shenao Zhang, Boyi Liu, Zhaoran Wang and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
    • Pivotal Estimation of Linear Discriminant Analysis in High Dimensions
      Ethan Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu and Tuo Zhao
      Journal of Machine Learning Research (JMLR), 2023 [arXiv]
    • High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization
      Jiahui Cheng*, Minshuo Chen*‡, Hao Liu, Tuo Zhao and Wenjing Liao
      Sampling Theory, Signal Processing, and Data Analysis, 2023 [arXiv]
    • Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
      Yan Li‡, George Lan and Tuo Zhao
      Mathematical Programming Series A, 2023+ [arXiv]
    • LightToken: a Task and Model-agnostic Lightweight Token Embedding Framework for Pre-trained Language Models
      Haoyu Wang, Ruirui Li, Haoming Jiang, Zhengyang Wang, Xianfeng Tang, Bin Bi, Monica Cheng, Bing Yin, Yaqing Wang, Tuo Zhao and Jing Gao
      SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023 [arXiv]
    • County augmented transformer for COVID‐19 state hospitalizations prediction
      Siawpeng Er‡, Shihao Yang and Tuo Zhao
      Scientific Reports, 2023 [arXiv]
    • Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites
      Simiao Zuo‡, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang and Tuo Zhao
      Annual Meeting of the Association for Computational Linguistics (ACL), 2023 [arXiv]
    • Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
      Zixuan Zhang‡, Minshuo Chen‡, Mengdi Wang, Wenjing Liao and Tuo Zhao
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • Machine Learning Force Fields with Data Cost Aware Training
      Alexander Bukharin‡, Tianyi Liu, Shengjie Wang, Simiao Zuo‡, Weihao Gao, Wen Yan and Tuo Zhao
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
      Zichong Li‡, Yanbo Xu, Simiao Zuo‡, Haoming Jiang, Chao Zhang, Tuo Zhao and Hongyuan Zha
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • LoSparse: Structured Compression of LLMs based on Low-Rank and Sparse Approximation
      Yixiao Li*‡, Yifan Yu*‡, Qingru Zhang‡, Chen Liang‡, Pengcheng He, Weizhu Chen and Tuo Zhao
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
      Minshuo Chen‡, Kaixuan Huang, Tuo Zhao and Mengdi Wang
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • Less is More: Task-aware Layer-wise Distillation for Language Model Compression
      Chen Liang‡, Simiao Zuo‡, Qingru Zhang‡, Pengcheng He, Weizhu Chen and Tuo Zhao
      International Conference on Machine Learning (ICML), 2023 [arXiv]
    • A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks
      Jie Wang, Minshuo Chen‡, Tuo Zhao, Wenjing Liao and Yao Xie
      Information and Inference: A Journal of the IMA, 2023 [arXiv]
    • Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
      Xiang Ji, Minshuo Chen‡, Mengdi Wang and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2023 [arXiv]
    • Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
      Qingru Zhang‡, Minshuo Chen‡, Alexander Bukharin‡, Pengcheng He, Yu Cheng, Weizhu Chen and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2023 [arXiv]
    • HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
      Chen Liang‡, Haoming Jiang, Zheng Li, Xianfeng Tang, Bing Yin and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2023 [arXiv]
    • Reinforcement Learning for Adaptive Mesh Refinement
      Jiachen Yang‡, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson and Daniel Faissol
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2023 [arXiv]
    • Block Policy Mirror Descent
      George Lan, Yan Li‡ and Tuo Zhao
      SIAM Journal on Optimization (SIOPT), 33(3):2341-2378, 2023 [arXiv]
    • On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds
      Biraj Dahal, Alexander Havrilla, Minshuo Chen‡, Tuo Zhao and Wenjing Liao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2022 [arXiv]
    • Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
      Hao Liu, Minshuo Chen‡, Siawpeng Er‡, Wenjing Liao, Tong Zhang and Tuo Zhao
      International Conference on Machine Learning (ICML), 2022 [arXiv]
    • PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
      Qingru Zhang‡, Simiao Zuo‡, Chen Liang‡, Alex Bukharin‡, Pengcheng He, Weizhu Chen and Tuo Zhao
      International Conference on Machine Learning (ICML), 2022 [arXiv]
    • MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
      Simiao Zuo‡, Qingru Zhang‡, Chen Liang‡, Pengcheng He, Tuo Zhao and Weizhu Chen
      Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]
    • CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data
      Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao and Chao Zhang
      Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]
    • Self-Training with Differentiable Teacher
      Simiao Zuo‡, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao and Hongyuan Zha
      Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]
    • Adversarially Regularized Policy Learning Guided by Trajectory Optimization
      Zhigen Zhao, Simiao Zuo‡, Tuo Zhao and Ye Zhao
      Annual Learning for Dynamics & Control Conference (L4DC), 2022 [arXiv]
    • CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
      Chen Liang‡, Pengcheng He, Yelong Shen, Weizhu Chen and Tuo Zhao
      Annual Meeting of the Association for Computational Linguistics (ACL), 2022 [arXiv]
    • No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
      Chen Liang‡, Haoming Jiang‡, Simiao Zuo‡, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2022 [arXiv]
    • Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits
      Yan Li‡, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao and Guanghui Lan
      International Conference on Learning Representations (ICLR), 2022 [arXiv]
    • Taming Sparsely Activated Transformer with Stochastic Experts
      Simiao Zuo‡, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao and Jianfeng Gao
      International Conference on Learning Representations (ICLR), 2022 [arXiv]
    • Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
      Yuqing Wang, Minshuo Chen‡, Tuo Zhao and Molei Tao
      International Conference on Learning Representations (ICLR), 2022 [arXiv]
    • Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
      Tianyi Liu‡, Yan Li‡, Enlu Zhou and Tuo Zhao
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 [arXiv]
    • Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning
      Jiachen Yang‡, Ethan Wang‡, Rakshit Trivedi, Tuo Zhao and Hongyuan Zha
      International Conference on Autonomous Agents and Multiagent Systems, 2022 [arXiv]
    • Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks
      Minshuo Chen‡, Haoming Jiang‡, Wenjing Liao and Tuo Zhao#
      Information and Inference: A Journal of the IMA, 2022 [arXiv, Poster]
    • Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
      Minshuo Chen‡, Yan Li‡, Zhuoran Yang, Zhaoran Wang and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2021 [arXiv]
    • Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
      Haoming Jiang‡, Bo Dai, Mengjiao Yang, Tuo Zhao and Wei Wei
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [arXiv]
    • Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach
      Simiao Zuo‡, Chen Liang‡, Haoming Jiang‡, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao and Tuo Zhao
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [arXiv]
    • Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
      Hao Liu, Minshuo Chen‡, Tuo Zhao and Wenjing Liao
      International Conference on Machine Learning (ICML), 2021 [arXiv]
    • How Important is the Train-Validation Split in Meta-Learning?
      Yu Bai, Minshuo Chen‡, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang and Caiming Xiong
      International Conference on Machine Learning (ICML), 2021 [arXiv]
    • Fine-Tuning Pre-trained Language Models with Weak Supervision: A Contrastive-Regularized Self-Training Approach
      Yue Yu*, Simiao Zuo‡*, Haoming Jiang‡, Wendi Ren, Tuo Zhao and Chao Zhang
      Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021 [arXiv]
    • Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network
      Siawpeng Er‡, Edward Liu, Minshuo Chen‡, Yan Li‡, Yuqi Liu, Tuo Zhao and Hua Wang
      International Microwave Symposium (IMS), 2021
      [The Finalist of IMS 2021 Best Student Paper Competition]
    • Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective
      Kaixuan Huang*‡, Yuqing Wang*, Molei Tao and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2020 [arXiv, Poster]
    • Deep Reinforcement Learning with Smooth and Robust Policy
      Qianli Shen*‡, Yan Li*‡, Haoming Jiang‡, Zhaoran Wang and Tuo Zhao
      International Conference on Machine Learning (ICML), 2020 [arXiv]
    • Transformer Hawkes Process
      Simiao Zuo‡, Haoming Jiang‡, Zichong Li‡, Tuo Zhao and Hongyuan Zha
      International Conference on Machine Learning (ICML), 2020 [arXiv]
    • BOND: Bert-Assisted Open-Domain Named Entity Recognition with Distant Supervision
      Chen Liang*‡, Yue Yu*, Haoming Jiang*‡, Siawpeng Er, Ruijia Wang, Tuo Zhao and Chao Zhang
      SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020 [arXiv]
    • SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
      Haoming Jiang‡, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
      Annual Meeting of the Association for Computational Linguistics (ACL), 2020 [arXiv]
    • Residual Network Based Direct Synthesis of EM Structures: A Study on One-to-One Transformers
      David Munzer, Siawpeng Er‡, Minshuo Chen‡, Yan Li‡, Naga Mannem, Tuo Zhao and Hua Wang
      IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020 [arXiv]
    • Implicit Bias of Gradient Descent based Adversarial Training on Separable Data
      Yan Li‡, Ethan Fang, Huan Xu and Tuo Zhao
      International Conference on Learning Representations (ICLR), 2020 [arXiv, Poster]
    • Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
      Minshuo Chen‡, Haoming Jiang‡, Wenjing Liao and Tuo Zhao#
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2019 [arXiv, Poster]
    • Towards Understanding the Importance of Shortcut Connections in Residual Networks
      Tianyi Liu*‡, Minshuo Chen*‡, Mo Zhou‡, Simon Du, Enlu Zhou and Tuo Zhao
      Annual Conference on Neural Information Processing Systems (NeurIPS), 2019 [arXiv, Poster]
    • Towards Understanding the Importance of Noise in Training Neural Networks
      Mo Zhou*‡, Tianyi Liu*‡, Yan Li‡, Dachao Lin, Enlu Zhou and Tuo Zhao
      International Conference on Machine Learning (ICML), 2019 [arXiv, Poster]
    • Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
      Jason Ge*‡, Xingguo Li*‡, Haoming Jiang‡, Han Liu, Tong Zhang, Mengdi Wang and Tuo Zhao
      Journal of Machine Learning Research (JMLR), 20(44):1−5, 2019 [PDF, Software]
      [2016 ASA Best Student Paper Award on Statistical Computing]
    • Symmetry, Saddle Points and Global Optimization Landscape of Nonconvex Matrix Factorization
      Xingguo Li‡, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang and Tuo Zhao
      IEEE Transactions on Information Theory, 65(6):3489-3514, 2019 [arXiv]
    • Pathwise Coordinate Optimization for Nonconvex Sparse Learning: Algorithm and Theory
      Tuo Zhao, Han Liu and Tong Zhang
      The Annals of Statistics, 46(1):180-218, 2018 [arXiv, Software]

    Software Packages
    • Picasso: Pathwise Calibrated Sparse Shooting Algorithm
      with Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang and Mengdi Wang
      [GitHub (Python), GitHub (R), Download (CRAN)]
    • PRIMAL: PaRametric sImplex Method for spArse Learning
      with Qianli Shen, Zichong Li, Yujia Xie [GitHub (R)]
    • Flare: Family of Lasso Regression
      with Xingguo Li, Lie Wang, Xiaoming Yuan and Han Liu [Download (CRAN)]
    • Huge: High-dimensional Undirected Graph Estimation
      with Haoming Jiang, Xinyu Fei, Xingguo Li, Han Liu, Kathryn Roeder, John Lafferty and Larry Wasserman
      [GitHub (R), Download (CRAN)]
    • SAM: Sparse Additive Modeling
      with Haoming Jiang, Yukun Ma, Xingguo Li, Han Liu and Kathryn Roeder
      [GitHub (R), Download (CRAN)]

    Selected Awards and Honors
    • Google Faculty Research Award [2020]
    • 2016 INFORMS SAS Best Paper Award on Data Mining [2016]
    • 2016 ASA Best Student Paper Award on Statistical Computing [2016]
    • Baidu Fellowship [2015]
    • Siebel Scholarship [2014, Siebel Scholar Profile]
    • Google Summer of Code Award [2011-2013]
    • Winner of INDI ADHD-200 Global Competition [2011]

    Alchemists in My Group
  • Yixiao Li -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present)
  • Zhenghao Xu -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present, Coadvised by Molei Tao)
  • Zixuan Zhang -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present)
  • Ilgee Hong -- Ph.D. Student, ISyE, Georgia Tech (2023.8--Present)
  • Zichong Li -- Ph.D. Student, ISyE, Georgia Tech (2023.8--Present)
    Former Visiting Student, Georgia Tech (2019.7--2019.9)
  • Liming Liu -- Ph.D. Student, ISyE, Georgia Tech (2024.8--Present)

  • FLASH Alumni
    • Alexander Bukharin -- Ph.D. in Machine Learning, Georgia Tech (2021.8--2025.4)
      Current Position: Research Scientist, NVIDIA
    • Qingru Zhang -- Ph.D. in Machine Learning, Georgia Tech (2021.8--2025.4)
      Current Position: Research Scientist, Microsoft
    • Yan Li -- Ph.D. in Operations Research, Georgia Tech (2018.12--2024.7)
      Current Position: Assistant Professor of ISE, Texas A&M University
    • Chen Liang -- Ph.D. in Machine Learning, Georgia Tech (2018.8--2023.11)
      Current Position: Senior Research Scientist, Microsoft
    • Simiao Zuo -- Ph.D. in Machine Learning, Georgia Tech (2019.8--2023.4)
      Current Position: Senior Research Scientist, Microsoft
    • Minshuo Chen -- Ph.D. in Machine Learning, Georgia Tech (2017.6--2022.7)
      Current Position: Assistant Professor of IEMS, Northwestern University
    • Siawpeng Er -- Ph.D. in Bioinformatics, Georgia Tech (2019.8--2022.7)
      Current Position: Senior Data Scientist, Home Depot
    • Jiachen Yang -- Ph.D. in Machine Learning, Georgia Tech (2020.01--2021.12)
      Current Position: Co-Founder, Simular.ai
    • Yujia Xie -- Ph.D. in Computational Science and Engineering, Georgia Tech (2018.12--2021.8)
      Current Position: Principal Research Scientist, Microsoft
    • Zhehui Chen -- Ph.D. in Industrial Engineering, Georgia Tech (2016.8--2021.4)
      Current Position: Senior Software Development Engineer, Google
    • Haoming Jiang -- Ph.D. in Machine Learning, Georgia Tech (2017.8--2021.4)
      Current Position: Member of Technical Staff, OpenAI
    • Tianyi Liu -- Ph.D. in Operations Research, Georgia Tech (2017.9--2021.4, Coadvised by Enlu Zhou)
      Current Position: Senior Research Scientist, Amazon
    • Xingguo Li -- Visiting Student, Georgia Tech (2017.3--2018.6)
      Current Position: Quantitative Researcher, Radix Trading LLC
    • Lin Yang -- Visiting Student, Georgia Tech (2017.3--2017.6)
      Current Position: Associate Professor of ECE, University of California Los Angeles
    • Yuheng Cai -- Graduate Researcher, Georgia Tech (2024.7--2025.4)
      Current Position: Software Development Engineer, Google
    • Yuezhou Hu -- Visiting Student, Georgia Tech (2024.7--2024.9)
      Current Position: Ph.D. Student, University of California Berkeley
    • Jiaxin Guo -- Visiting Student, Georgia Tech (2024.7--2024.9)
      Current Position: Undergraduate Student, Tsinghua University
    • Yifan Yu -- Undergraduate Student Researcher, Georgia Tech (2021.8--2024.5)
      Current Position: Ph.D. Student, University of Illinois Urbana-Champaign
    • Ethan Wang -- Undergraduate Student Researcher, Georgia Tech (2020.01--2021.11, Coadvised by Hongyuan Zha)
      Current Position: Software Development Engineer, Jane Street
    • Jie Lyu -- Undergraduate Student Researcher, Georgia Tech (2020.1--2020.5)
      Current Position: Senior Machine Learning Engineer, Meta
    • Xinyu Fei -- Visiting Student, Georgia Tech (2018.7--2018.9)
      Current Position: Research Scientist, Amazon
    • Mo Zhou -- Visiting Student, Georgia Tech (2018.7--2018.9)
      Current Position: Postdoctoral Fellow, University of Washington
    • Yizhou Wang -- Visiting Student, Georgia Tech (2019.1--2019.5)
      Current Position: Ph.D. Student, Northeastern University
    • Kaixuan Huang -- Visiting Student, Georgia Tech (2019.7--2019.9)
      Current Position: Ph.D. Student, Princeton University
    • Qianli Shen -- Visiting Student, Georgia Tech (2019.7--2019.9)
      Current Position: Ph.D. Student, National University of Singapore

    About Alchemy
    • Back When We were Kids
      Ali Rahimi - NeurIPS 2017 Test-of-Time Award Presentation [Link]
    • My Take on Ali Rahimi's "Test of Time" Award Talk at NeurIPS
      Quoted from Yann LeCun's Facebook [Link]
    • Ali Rahimi's Response to Yann LeCun
      Quoted from Ali Rahimi's Facebook [Link]
    • An Addendum to Alchemy
      Quoted from Ben Recht's Blog [Link]
    • The Role of Theory in Deep Learning
      Quoted from David McAllester's Blog [Link]

    Teaching
    • Basic Statistical Methods ISYE3030 -- 2019 Summer, 2019 Fall, 2020 Spring, 2020 Fall, Georgia Tech
    • Advanced Machine Learning ISYE8803 -- 2018 Spring, 2019 Spring, 2020 Fall, Georgia Tech
    • Introduction to Machine Learning ISYE4803 -- 2018 Fall, Georgia Tech
    • Machine Learning ISYE6740/CSE6740/CS7641 -- 2017 Spring, Fall, Georgia Tech

    NSF Projects
    • IIS-1717916: Topics in Temporal Marked Point Processes: Granger Causality, Imperfect Observations and Intervention (2017.9 - 2021.8) [Link]
    • DMS-2012652: Deep Neural Networks for Structured Data: Regression, Distribution Estimation, and Optimal Transport (2020.9-2024.8) [Link]
    • IIS-2008334: Go Beyond Short-term Dependency and Homogeneity: A General-Purpose Transformer Recipe for Multi-Domain Sequential Data Analysis (2020.9-2025.8) [Link]
    • DMS-2134037: Bridging Statistical Hypothesis Tests and Deep Learning for Reliability and Computational Efficiency (2022.1-2024.12) [Link]
    • IIS-2226152: RI: Small: Taming Massive Pre-trained Models under Label Scarcity via an Optimization Lens (2022.9-2026.8) [Link]

    Contact
    Tuo Zhao
    H. Milton Stewart School of Industrial and Systems Engineering
    Groseclose 344
    755 Ferst Dr. NW
    Atlanta, GA 30332
    Email: tourzhao (at) gatech (dot) edu