Below is a short summary and detailed review of this video written by FutureFactual:
ACE: Sony AI's Table Tennis Robot Challenges Elite Players
Overview
In this piece, Sony AI Zurich unveils ACE, presented as the first physical AI system designed to challenge elite table tennis players. The system combines learning in simulated practice with optimization-based control to ensure safety and speed at the real table.
- Key insight 1: ACE leverages reinforcement learning in simulation paired with optimization-based control for safety.
- Key insight 2: Perception uses multiple cameras and an event-based gaze system to measure ball position and spin in real time.
- Key insight 3: The end-to-end loop, from observing the ball in flight to commanding racket torque, takes about 20 milliseconds, far faster than human reaction times.
- Key insight 4: Real-world tests with licensed umpires and standard equipment show ACE competing with and outperforming elite players on a level playing field.
Introduction and Problem Setup
The video features Peter Dürr, director of Sony AI Zurich, describing the fascination of elite table tennis and the dream of building a robot that can play it. ACE is introduced as the first physical AI system capable of challenging elite athletes in a real sport. Dürr emphasizes that in table tennis the ball travels at high speed and spins aggressively, making it a demanding test bed for both physics and reaction time. A key theme is safety: unlike in a virtual simulation, a robot in the real world can injure players or damage itself, so learning must happen safely.
"Spin is an essential element of the game of table tennis, as when the ball spins, that influences the trajectory of the ball" - Peter Dürr
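To make the quote concrete, here is a minimal sketch (not from the video) of how spin bends a ball's flight via the Magnus effect. The Magnus coefficient, launch conditions, and spin rate are illustrative assumptions, not values measured on ACE:

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity, m/s^2
CM = 4.0e-6                       # illustrative Magnus coefficient (assumed)
MASS = 2.7e-3                     # regulation ball mass, kg

def simulate(v0, omega, dt=1e-3, t_max=1.0):
    """Euler-integrate ball flight with gravity plus a Magnus term F = CM * (omega x v)."""
    p = np.array([0.0, 0.0, 0.3])          # launch 0.3 m above the table
    v = np.asarray(v0, dtype=float)
    omega = np.asarray(omega, dtype=float)
    for _ in range(int(t_max / dt)):
        f_magnus = CM * np.cross(omega, v)  # force perpendicular to spin and velocity
        v = v + (G + f_magnus / MASS) * dt
        p = p + v * dt
        if p[2] <= 0.0:                     # ball reached table height
            break
    return p

# Same launch velocity, with and without heavy topspin (about +y axis).
flat = simulate([10.0, 0.0, 1.0], omega=[0.0, 0.0, 0.0])
topspin = simulate([10.0, 0.0, 1.0], omega=[0.0, 150.0, 0.0])
```

With topspin the Magnus force points downward, so the ball dips and lands shorter than the spinless shot, which is exactly why a return strategy must estimate spin, not just position.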
ACE System Architecture and Real-Time Performance
The ACE robot is built around three core components: perception, control, and hardware. The perception system relies on high-frame-rate cameras (around 200 fps) to triangulate the ball in 3D space, while an event camera with a fast gaze-control mechanism measures the ball's spin in flight. To capture spin accurately, the team uses specialized optics capable of rapid field-of-view changes and tracks the printed logo on ITTF-certified balls to infer the spin's magnitude and axis. The control system converts these observations into torque commands for a custom arm with six degrees of freedom plus two more for spatial reach, achieving end-to-end latency of about 20 ms from ball observation to robot action, roughly ten times faster than human reaction time.
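The triangulation step can be sketched with a standard two-view linear (DLT) method. The camera matrices and ball position below are toy values; ACE's actual multi-camera calibration and pipeline are not described in the video:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)     # null-space vector minimizes algebraic error
    X = vt[-1]
    return X[:3] / X[3]             # de-homogenize

def project(P, X):
    """Project a 3D point into pixel coordinates with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy pinhole cameras with a 0.5 m horizontal baseline.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

ball = np.array([0.2, -0.1, 3.0])   # hypothetical true ball position, metres
est = triangulate(P1, P2, project(P1, ball), project(P2, ball))
```

With noise-free projections the estimate recovers the true point; at 200 fps, real detections are noisy, so a filter (e.g. a Kalman filter over successive frames) would typically smooth the track.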
"Reinforcement learning with an optimization-based part lets our robot control run in the real world in situations that we haven't seen in simulation" - Peter Dürr
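One way to read this quote is as a learned policy whose raw output passes through an optimization-based safety layer before reaching the motors. The sketch below uses the simplest such layer, Euclidean projection onto box constraints (absolute torque limits plus a rate limit); the limit values and function names are hypothetical, not ACE's actual controller:

```python
import numpy as np

# Hypothetical per-joint torque limits for a 6-DOF arm (illustrative only).
TORQUE_MAX = np.array([60.0, 60.0, 40.0, 20.0, 10.0, 10.0])

def safe_torque(policy_action, prev_torque, max_delta=5.0):
    """Project a learned policy's raw torque command onto a feasible set.

    For a box-shaped feasible set the Euclidean projection reduces to
    two clips; richer constraints (self-collision, joint limits) would
    need a real QP solver running inside the 20 ms control loop.
    """
    # Rate limit: stay within max_delta of the previous command.
    tau = np.clip(policy_action, prev_torque - max_delta, prev_torque + max_delta)
    # Absolute per-joint torque limit.
    return np.clip(tau, -TORQUE_MAX, TORQUE_MAX)

raw = np.array([80.0, -10.0, 3.0, 30.0, 0.0, -12.0])  # raw policy output
tau = safe_torque(raw, prev_torque=np.zeros(6))
```

The appeal of this split is that the policy can be trained aggressively in simulation while the projection guarantees the hardware never receives a command outside its safe envelope, even in states the policy never saw during training.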