Junxiao Song

PhD
2011 – 2015

DeepSeek AI

Junxiao Song is a principal researcher at DeepSeek AI, where he has played a key role in developing large language models that compete with state-of-the-art systems at a substantially lower training cost.

Academic Background and Early Career

Song completed his PhD at the Hong Kong University of Science and Technology (HKUST) under the supervision of Prof. Palomar. His research focused on optimization methods for signal processing, and he published several highly cited papers in IEEE Transactions on Signal Processing.

Key Contributions at DeepSeek: ¹

Proposed GRPO (Group Relative Policy Optimization), a reinforcement learning algorithm first introduced in DeepSeekMath that has since been used to train nearly all models in the DeepSeek series, including DeepSeek-R1.
Co-developed DeepSeek-V2 and DeepSeek-V3 (a 671B-parameter Mixture-of-Experts model with 37B parameters activated per token), achieving GPT-4-level performance at roughly 1/10 of the training cost.
Created the reinforcement learning pipelines for DeepSeek-R1; its precursor, DeepSeek-R1-Zero, was trained with large-scale reinforcement learning and no supervised fine-tuning stage.
Pioneered resource-efficient training methods that allowed the 671B-parameter DeepSeek-V3 to be trained for roughly $5.5M in compute (the reported cost of the final training run).
Developed model distillation techniques that produced state-of-the-art 7B and 70B distilled variants of DeepSeek-R1.
Led DeepSeek-Prover-V1.5, which uses the Lean 4 proof assistant for formal theorem proving.
Contributed to DeepSeek-Coder-V2, an open model that surpassed closed-source models in code intelligence.
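The core idea of GRPO can be illustrated with a small, framework-free sketch. As described in the DeepSeekMath paper, GRPO samples a group of outputs for each prompt, scores them, and uses each output's reward normalized by the group's mean and standard deviation as its advantage, which removes the need for a separate learned value (critic) model. The sketch below assumes the outcome-reward variant, where every token of an output shares that output's advantage, combined with a PPO-style clipped ratio and a KL penalty toward a reference policy. The function names, the use of the population standard deviation, and the default values of `clip_eps` and `beta` are illustrative assumptions; a real implementation works on tensors and backpropagates through `logp_new`.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each output's reward against its own group (no critic model)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp, logp_ref):
    """Per-token KL estimate: pi_ref/pi_theta - log(pi_ref/pi_theta) - 1 (always >= 0)."""
    log_ratio = logp_ref - logp
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_objective(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, beta=0.04):
    """Objective to maximize for one group of sampled outputs.

    logp_new, logp_old, logp_ref: one list of per-token log-probabilities per
    output, under the current, sampling (old), and reference policies.
    rewards: one scalar reward per output. Defaults are illustrative only.
    """
    advantages = group_relative_advantages(rewards)
    per_output = []
    for new, old, ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        token_terms = []
        for lp, lp_old, lp_ref in zip(new, old, ref):
            ratio = math.exp(lp - lp_old)  # importance ratio pi_theta / pi_old
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            surrogate = min(ratio * adv, clipped * adv)  # PPO-style clipping
            token_terms.append(surrogate - beta * kl_estimate(lp, lp_ref))
        per_output.append(mean(token_terms))  # average over the output's tokens
    return mean(per_output)  # average over the group
```

For a group whose rewards are `[1, 0, 0, 1]`, `group_relative_advantages` returns approximately `[1, -1, -1, 1]`: outputs that scored above the group average are reinforced and those below it are penalized, with the group itself serving as the baseline.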

¹ This biography was prepared with the assistance of DeepSeek-R1.

Interests

Convex Optimization
Reinforcement Learning
Mixture-of-Experts Architectures
Mathematical Reasoning in LLMs

Education

PhD in Electronic and Computer Engineering, 2015
The Hong Kong University of Science and Technology (HKUST)

BSc in Automation, 2011
Zhejiang University
