Junxiao Song

PhD
2011 – 2015

DeepSeek AI

Junxiao Song is a principal researcher at DeepSeek AI, where he has played a pivotal role in developing cutting-edge language models that challenge state-of-the-art systems while maintaining exceptional cost efficiency.

Academic Background and Early Career

Song completed his PhD at the Hong Kong University of Science and Technology (HKUST) under the supervision of Prof. Daniel P. Palomar. His research focused on optimization methods for signal processing, with several highly cited papers in IEEE Transactions on Signal Processing.

Key Contributions at DeepSeek: 1

  • Proposed the novel reinforcement learning algorithm GRPO (Group Relative Policy Optimization), which has been applied to train nearly all models in the DeepSeek series, e.g., DeepSeek-R1.

  • Co-developed DeepSeek-V3 (a 671B-parameter MoE model) and DeepSeek-V2, achieving GPT-4-level performance at 1/10 of the training cost.

  • Created novel reinforcement learning pipelines in DeepSeek-R1, eliminating the need for supervised fine-tuning.

  • Pioneered resource-efficient training, enabling 671B-parameter models with a $5.5M compute budget.

  • Developed model distillation techniques producing state-of-the-art 7B/70B variants.

  • Led DeepSeek-Prover-V1.5, which integrates Lean 4 for theorem proving.

  • Contributed to DeepSeek-Coder-V2 surpassing closed models in code intelligence.


  1. This biography was prepared with the assistance of DeepSeek-R1.

Interests
  • Convex Optimization
  • Reinforcement Learning
  • Mixture-of-Experts Architectures
  • Mathematical Reasoning in LLMs
Education
  • PhD in Electronic and Computer Engineering, 2015

    The Hong Kong University of Science and Technology (HKUST)

  • BSc in Automation, 2011

    Zhejiang University