Sitong Zhang

postdoc · aalto university · espoo, finland

Hi — I’m Sitong Zhang, a Postdoctoral Researcher at Aalto University, Department of Computer Science, working with Prof. Bo Zhao.

Before Aalto I was a Postdoctoral Researcher at CityU-Oxford Joint CIMDA, City University of Hong Kong, working with Prof. Hong Yan (IEEE Fellow). I received my PhD (2023) and BEng (2018) from Harbin Engineering University, advised by Prof. Yibing Li.

Currently building
HarnessKit: A control plane for your AI coding agents. See, secure, and manage every extension and config from one place.
ItsMyPod: A personal podcast channel that turns your reading backlog into a daily audio edition, built on a full LLM + TTS pipeline.

Research Interests

Deep Reinforcement Learning · MLSys · LLM Infra

My research lies at the intersection of machine learning and systems. I currently focus on infrastructure for distributed reinforcement learning and LLM post-training, with particular interest in the runtime designs that make these workloads fast, adaptive, and cost-aware at cluster scale.

This builds on my doctoral work in reinforcement learning itself, where I developed deep reinforcement learning methods for UAV autonomous navigation. After years inside the training loop, I now work on the systems that run it at scale.

RL Training Stack
Algorithm (2018–2024): policy design, learning objectives
Runtime (2025–): scheduling (when), placement (where), orchestration (how)
past work · 2018–2024
Deep reinforcement learning for UAV autonomous navigation — obstacle avoidance, long-distance trajectory planning, and human-in-the-loop motion planning.
current focus · 2025–
Runtime systems for distributed RL and LLM post-training. The right balance of compute, memory, and interconnect shifts continuously with model scale, workload characteristics, and hardware conditions. Scheduling, placement, and orchestration are the runtime levers that keep the system responsive to these dynamics at cluster scale.

Selected Publications

Systems papers currently under submission.

    1. 2022
      Autonomous navigation of UAV in multi-obstacle environments based on a deep reinforcement learning approach
      Applied Soft Computing
    2. 2023
      A hybrid human-in-the-loop deep reinforcement learning method for UAV motion planning for long trajectories with unpredictable obstacles
    3. 2023
      Dynamic redeployment of UAV base stations in large-scale and unreliable environments
      Internet of Things
view all publications on Google Scholar →