CV
Education
- Ph.D. in Computer Science and Engineering, University of Texas at Arlington, 2017-2023
- Focus: Deep Learning Frameworks, GPU Computing
- M.E. in Information, Production, and Systems, Waseda University, 2014-2016
- Focus: 3D Integrated Circuit Design
- B.E. in Electronic Science and Engineering, Southeast University, 2011-2016
- Focus: Electronic Science
Work Experience
- 2023-Present: Assistant Professor
- Faculty of Data Science, City University of Macau
- Research focus: Deep Learning Systems, Performance Optimization
- Location: Macau, China
Internship Experience
- 2021: Software Engineer Intern
- ProtagoLabs (Remote)
- Developed a stable, elastic distributed deep learning system for large-scale NLP model training
- 2020: Software Engineering Intern
- 2015: Research Intern
- Hitachi Research Laboratory, Japan
- Data analysis for the ROPITS autonomous vehicle
- Determination of the vehicle's safe operating range from vehicle-speed and tire-rotation analysis
- 2014: Research Intern
- Institute of Electronics, Chinese Academy of Sciences, Beijing
- Implemented a GPU-accelerated target detection algorithm
Research Experience
- Large Model Training AI Infrastructure (2024-Present)
- Working with NVIDIA Megatron-LM
- Developing AI infrastructure based on Huawei's Ascend 910 series accelerators
- Brain-inspired Deep Learning (2023-Present)
- Algorithm optimization for Spiking Neural Networks to approach CNN-level model accuracy
- Deep Learning Systems (2018-Present)
- Atom System for Large Language Model Training
- Elastic fault-tolerant system for asynchronous training in decentralized environments
- Memory swapping and sub-model preloading for improved resource utilization (a minimal sketch follows this list)
- Data Loader and Data Reuse Acceleration
- Novel optimization method for data reuse without accuracy degradation (see the batch-reuse sketch after this list)
- Deep Learning Compilation
- TVM integration and automatic optimization for various models
- SwitchFlow Framework
- Preemptive multitasking framework for deep learning
- Best Paper Award at Middleware 2021 (1/107)
- JVM HotSpot Heap Management (2017-2018)
- Side-channel attack research on the Parallel Scavenge garbage collector
- Investigation of time-stretching effects on heap size adjustment
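A minimal sketch of the memory-swapping and sub-model-preloading idea mentioned in the Atom System item above, assuming PyTorch; the SwapManager class and its offload/preload methods are hypothetical illustrations, not the Atom system's actual API.

```python
import torch

class SwapManager:
    """Hypothetical helper: swap idle sub-model weights between GPU and host memory."""

    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def offload(self, module: torch.nn.Module) -> None:
        # Move parameters to (pinned) host memory so device memory is freed while
        # the sub-model is idle; pinned memory allows asynchronous copies later.
        for p in module.parameters():
            cpu_tensor = p.data.to("cpu", non_blocking=True)
            p.data = cpu_tensor.pin_memory() if torch.cuda.is_available() else cpu_tensor

    def preload(self, module: torch.nn.Module) -> None:
        # Copy parameters back to the accelerator just before the sub-model runs.
        for p in module.parameters():
            p.data = p.data.to(self.device, non_blocking=True)

if __name__ == "__main__":
    layer = torch.nn.Linear(1024, 1024)
    mgr = SwapManager()
    mgr.preload(layer)                                   # make weights resident
    _ = layer(torch.randn(8, 1024, device=mgr.device))   # run the sub-model
    mgr.offload(layer)                                   # free device memory again
```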
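A minimal sketch of the batch-reuse idea behind the data loader item above: each fetched batch is repeated for a few consecutive steps to amortize loading and preprocessing cost. The reuse_factor parameter and the simple repeat policy are illustrative assumptions, not the published optimization method.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def reuse_batches(loader: Iterable[T], reuse_factor: int = 2) -> Iterator[T]:
    """Yield each fetched batch `reuse_factor` times before loading the next one."""
    for batch in loader:
        for _ in range(reuse_factor):
            yield batch

if __name__ == "__main__":
    # Stand-in for a real data loader: three "batches" of four sample ids each.
    fake_loader = (list(range(i, i + 4)) for i in range(0, 12, 4))
    for step, batch in enumerate(reuse_batches(fake_loader, reuse_factor=2)):
        print(f"step {step}: batch {batch}")
```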
Publications
Talks
- 2024: Guangdong Institute of Intelligent Science and Technology - FTBC: Forward Temporal Bias Correction
- 2024: Pengcheng Laboratory - Decentralized Distributed Deep Learning Training Framework
- 2023: University of Macau - Bridging the Resource Scheduling Gap in Deep Learning System
- 2023: Zhejiang Lab - Decentralized Distributed Deep Learning Training Framework
- 2023: Microsoft Research DeepSpeed Team
- 2021: Middleware’21 Conference - SwitchFlow: Preemptive Multitasking for Deep Learning
- 2018: HotCloud’18 Conference - HotSpot Side-channel Attack
Skills
- Programming Languages: C/C++, Python, CUDA, Java
- Frameworks & Tools: TensorFlow, PyTorch, Huggingface NLP, Keras, TVM, Docker, Ray, Bazel, CMake, gRPC
- Hardware Development:
- Second Prize in National College Student Electronic Design Competition (2013)
- Second Prize in Southeast University Embedded System Design Competition (2014)
- Book Chapter: “National College Student Electronic Design Competition Excellent Works Collection”