Sitong Zhang

postdoc · aalto university · espoo, finland

Hi — I’m Sitong Zhang, a Postdoctoral Researcher at Aalto University, Department of Computer Science, working with Prof. Bo Zhao.

Before Aalto, I was a Postdoctoral Researcher at CityU-Oxford Joint CIMDA, City University of Hong Kong, working with Prof. Hong Yan (IEEE Fellow). I received my PhD (2023) and BEng (2018) from Harbin Engineering University, advised by Prof. Yibing Li.

Currently building

HarnessKit: a control plane for your AI coding agents. See, secure, and manage every extension and config from one place.

ItsMyPod: a personal podcast channel that turns your reading backlog into a daily audio edition, built as a full LLM + TTS pipeline.

Research Interests

Deep Reinforcement Learning · MLSys · LLM Infra

My research lies at the intersection of machine learning and systems. I currently focus on infrastructure for distributed reinforcement learning and LLM post-training, with particular interest in the runtime designs that make these workloads fast, adaptive, and cost-aware at cluster scale.

This builds on my doctoral work in reinforcement learning itself, where I developed deep reinforcement learning methods for UAV autonomous navigation. After years inside the training loop, I now work on the systems that run it at scale.

RL Training Stack

Algorithm (2018–2024)

policy design, learning objectives

Runtime (2025–)

▸ scheduling: when

▸ placement: where

▸ orchestration: how

past work · 2018–2024

Deep reinforcement learning for UAV autonomous navigation — obstacle avoidance, long-distance trajectory planning, and human-in-the-loop motion planning.

current focus · 2025–

Runtime systems for distributed RL and LLM post-training. The right balance of compute, memory, and interconnect shifts continuously with model scale, workload characteristics, and hardware conditions. Scheduling, placement, and orchestration are the runtime levers that keep the system responsive to these dynamics at cluster scale.

Selected Publications

Systems papers are currently under submission.

2022

Autonomous navigation of UAV in multi-obstacle environments based on a deep reinforcement learning approach

Applied Soft Computing

@article{zhang2022autonomous,
  title = {Autonomous navigation of UAV in multi-obstacle environments based on a deep reinforcement learning approach},
  author = {Zhang, Sitong and Li, Yibing and Dong, Qianhui},
  journal = {Applied Soft Computing},
  volume = {115},
  pages = {108194},
  year = {2022},
  publisher = {Elsevier},
  html = {https://doi.org/10.1016/j.asoc.2021.108194},
  code = {https://github.com/RealZST/TD3-based_UAV_Collision_Avoidance},
  bibtex_show = {true},
  selected = {true},
  preview = {asoc.png},
  video = {https://youtu.be/1zL-srwnoZE?si=GUKcP2WIIknG30tJ},
  podcast = {https://open.spotify.com/episode/0LpUkC6t0rKS1zCn9S490v?si=1b94c81991564f9e}
}

2023

A hybrid human-in-the-loop deep reinforcement learning method for UAV motion planning for long trajectories with unpredictable obstacles

Drones

@article{zhang2023hybrid,
  title = {A hybrid human-in-the-loop deep reinforcement learning method for UAV motion planning for long trajectories with unpredictable obstacles},
  author = {Zhang, Sitong and Li, Yibing and Ye, Fang and Geng, Xiaoyu and Zhou, Zitao and Shi, Tuo},
  journal = {Drones},
  volume = {7},
  number = {5},
  pages = {311},
  year = {2023},
  publisher = {MDPI},
  html = {https://doi.org/10.3390/drones7050311},
  code = {https://github.com/RealZST/DRL-based_UAV_Motion_Planning},
  bibtex_show = {true},
  selected = {true},
  preview = {drones.png},
  podcast = {https://open.spotify.com/episode/3XhLrCE2SYyKiYiZw4LDBr?si=fbacadb8a7a44721}
}

2023

Dynamic redeployment of UAV base stations in large-scale and unreliable environments

Internet of Things

@article{zhang2023dynamic,
  title = {Dynamic redeployment of UAV base stations in large-scale and unreliable environments},
  author = {Zhang, Sitong and Li, Yibing and Tian, Yuan and Zhou, Zitao and Geng, Xiaoyu and Shi, Tuo},
  journal = {Internet of Things},
  volume = {24},
  pages = {100985},
  year = {2023},
  publisher = {Elsevier},
  html = {https://doi.org/10.1016/j.iot.2023.100985},
  bibtex_show = {true},
  selected = {true},
  preview = {iot2.png},
  podcast = {https://open.spotify.com/episode/2fDkatDUjJp66TtVeJVPKu?si=13378ee051e746df}
}
