Junxiao Song is a principal researcher at DeepSeek AI, where he has played a pivotal role in developing cutting-edge language models that challenge state-of-the-art systems while maintaining exceptional cost efficiency.
Academic Background and Early Career
Song completed his PhD at the Hong Kong University of Science and Technology (HKUST) under the supervision of Prof. Palomar. His research focused on optimization methods for signal processing, with several highly cited papers in IEEE Transactions on Signal Processing.
Key Contributions at DeepSeek [1]
Proposed the novel reinforcement learning algorithm GRPO (Group Relative Policy Optimization), which has been applied to train nearly all models in the DeepSeek series, e.g., DeepSeek-R1.
Co-developed DeepSeek-V3 (a 671B-parameter MoE model) and DeepSeek-V2, achieving GPT-4-level performance at 1/10 of the training cost.
Created novel reinforcement learning pipelines for DeepSeek-R1, reducing the need for supervised fine-tuning.
Pioneered resource-efficient training, enabling 671B-parameter models on a $5.5M compute budget.
Developed model distillation techniques producing state-of-the-art 7B/70B variants.
Led DeepSeek-Prover-V1.5 integrating Lean 4 for theorem proving.
Contributed to DeepSeek-Coder-V2 surpassing closed models in code intelligence.
[1] This biography was prepared with the assistance of DeepSeek-R1.
PhD in Electronic and Computer Engineering, 2015
The Hong Kong University of Science and Technology (HKUST)
BSc in Automation, 2011
Zhejiang University